
Azure Data Lake Gen2 From the Command Line

Hands-On Lab

 

Will Boyd

DevOps Team Lead in Content

Length: 00:30:00
Difficulty: Intermediate

Azure Data Lake Gen2 is built on Azure Blob Storage but adds features such as a hierarchical namespace, which lets you store unstructured blob data in directories and gives you greater flexibility in how that data is organized. In this lab, you will work with Azure Data Lake Gen2 storage from a Linux command line, retrieving, editing, and uploading Data Lake Gen2 data from within the Bash Azure Cloud Shell.

In this lab, our company, Store All the Things!, is using Azure Data Lake Gen2 to manage configuration data for some internal applications. One of these applications, a backend inventory data processor, needs a configuration change to increase the number of threads it uses.

We will download the configuration data from Azure Data Lake Gen2, make the requested change to the file, and re-upload the edited file. We'll do all of this from the Bash Azure Cloud Shell.

Some additional details:

  • The configuration file is located in a storage account with a name that begins with sattconfigs.
  • The configuration file is in a container called configuration.
  • The configuration file is called inventory/processor/invprocessor.conf.
  • In the configuration file, change the line that begins with numThreads= to numThreads=100.

Before We Begin

Before we can get started, we need to open an Incognito window in our preferred web browser and then, using the link provided, log in to the Azure portal with the supplied credentials. In a separate tab, log in to the Azure Cloud Shell (Bash) at shell.azure.com.

Download the Configuration File

Once logged in to the Cloud Shell, we are prompted to set up storage for it. Leave the Subscription field at its default, and then select Advanced Settings. Here, leave the Subscription and Resource group at their defaults and fill in the remaining fields as follows:

  • For Cloud Shell region, select West US.
  • For Storage Account, select Use existing, then choose the storage account with the name that begins with cloudshell.
  • For File Share, select Use existing and enter cloudshell.

With everything filled in, select Create storage.

With our storage created, we need to authenticate with the Azure Storage service using the azcopy login command:

azcopy login

The command prints a URL and an authentication code. Open the URL in a browser and enter the code provided by the shell to authenticate the azcopy CLI tool. Back in the Cloud Shell, the output confirms that the login is complete.

Next, we need to make our storage account easy to reference by storing its name in an environment variable. The account name is shown in the Azure portal and begins with sattconfigs. Enter the following command, replacing <storage account name> with that name:

storage_account=<storage account name>
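Rather than copying the name from the portal by hand, we could look it up with the Azure CLI, which is available in Cloud Shell. The sketch below is one way to do it, assuming exactly one account name begins with sattconfigs; first_with_prefix is a hypothetical helper that prints the first name read from standard input that starts with a given prefix.

```shell
# Hypothetical helper: print the first line on standard input that starts
# with the given prefix. Returns 1 if no line matches.
first_with_prefix() {
  local prefix="$1" name
  # "|| [ -n "$name" ]" also handles a final line with no trailing newline.
  while IFS= read -r name || [ -n "$name" ]; do
    case "$name" in
      "$prefix"*)
        printf '%s\n' "$name"
        return 0
        ;;
    esac
  done
  return 1
}

# Usage in Cloud Shell (assumes az is signed in, as it is by default there):
#   storage_account=$(az storage account list --query "[].name" --output tsv \
#     | first_with_prefix sattconfigs)
```

If the lookup prints nothing, no account name starts with the prefix, and it is safer to fall back to copying the name from the portal.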

Now we can download the configuration file from Azure Data Lake Gen2 using azcopy and the variable we just set:

azcopy copy "https://${storage_account}.dfs.core.windows.net/configuration/inventory/processor/invprocessor.conf" invprocessor.conf
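Because the same URL is needed again for the upload, a small helper can build it from its parts: the storage account, the container (configuration), and the file's path within that container. This is a sketch; dfs_url is a hypothetical name, and it assumes storage_account has already been set as above.

```shell
# Hypothetical helper: build the Data Lake Gen2 (dfs endpoint) URL for a file.
# Usage: dfs_url <container> <path-within-container>
# Aborts with an error message if $storage_account is unset or empty.
dfs_url() {
  local container="$1" file_path="$2"
  # ${file_path#/} drops one leading slash so the URL has no "//".
  printf 'https://%s.dfs.core.windows.net/%s/%s\n' \
    "${storage_account:?storage_account is not set}" \
    "$container" "${file_path#/}"
}

# Usage in this lab:
#   azcopy copy "$(dfs_url configuration inventory/processor/invprocessor.conf)" invprocessor.conf
```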

To confirm that the file downloaded successfully, view its contents:

cat invprocessor.conf

If configuration data appears, the file was downloaded correctly.
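To check only the setting we plan to change, rather than scanning the whole file, a helper like the hypothetical get_setting below prints the value of a single key=value line. It assumes settings are written as key=value with the key at the very start of the line.

```shell
# Hypothetical helper: print the value of the first "key=value" line whose key
# matches exactly. Returns non-zero if the key is not found.
# Usage: get_setting <key> <file>
get_setting() {
  awk -v k="$1" '
    # index() == 1 means the line starts with "key=" (no regex involved).
    index($0, k "=") == 1 { print substr($0, length(k) + 2); found = 1; exit }
    END { exit !found }
  ' "$2"
}

# Usage in this lab:
#   get_setting numThreads invprocessor.conf
```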

Make Changes and Upload the Configuration File

With the configuration file downloaded, we can now edit it. Open it in vi:

vi invprocessor.conf

Change the numThreads configuration value to 100:

...
numThreads=100
...

Save the file and exit vi (for example, press Esc and then type :wq).
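If you'd rather not edit the file interactively, the same change can be scripted. This sketch assumes GNU sed (as found in Cloud Shell) for in-place editing with -i, a key with no regex metacharacters, and a value with no / or & characters; set_setting is a hypothetical helper name.

```shell
# Hypothetical helper: replace the value of an existing key=value line in place.
# Usage: set_setting <key> <value> <file>
# Assumes GNU sed (-i with no backup suffix) and simple keys and values.
set_setting() {
  local key="$1" value="$2" file="$3"
  # Fail early if the key is missing, so a typo doesn't silently change nothing.
  grep -q "^${key}=" "$file" || { echo "no ${key}= line in ${file}" >&2; return 1; }
  sed -i "s/^${key}=.*/${key}=${value}/" "$file"
}

# Usage in this lab:
#   set_setting numThreads 100 invprocessor.conf
```

Running cat invprocessor.conf afterward shows whether the line now reads numThreads=100.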

Finally, we upload the edited file to Azure Data Lake Gen2, replacing the existing file (azcopy copy overwrites the destination by default):

azcopy copy invprocessor.conf "https://${storage_account}.dfs.core.windows.net/configuration/inventory/processor/invprocessor.conf"
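To double-check that the upload replaced the remote file, one option is to download it again to a temporary location and compare it byte for byte with the local copy. The sketch below assumes azcopy is still logged in from earlier; verify_upload is a hypothetical helper name.

```shell
# Hypothetical helper: re-download the remote file to a temp directory and
# compare it byte for byte with the local file using cmp.
# Usage: verify_upload <local-file> <remote-url>
verify_upload() {
  local local_file="$1" remote_url="$2" tmp_dir rc
  tmp_dir=$(mktemp -d) || return 1
  if azcopy copy "$remote_url" "${tmp_dir}/remote_copy" >/dev/null &&
     cmp -s "$local_file" "${tmp_dir}/remote_copy"; then
    echo "Remote copy matches ${local_file}"
    rc=0
  else
    echo "Remote copy differs from ${local_file}, or the download failed" >&2
    rc=1
  fi
  rm -rf "$tmp_dir"   # always clean up the temporary download
  return "$rc"
}

# Usage in this lab:
#   verify_upload invprocessor.conf \
#     "https://${storage_account}.dfs.core.windows.net/configuration/inventory/processor/invprocessor.conf"
```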

Conclusion

Having completed this lab, we can now download, edit, and re-upload data in Azure Data Lake Gen2 from the command line. Congratulations on finishing the lab!