Microsoft Azure Data Lake Store connection
To access your data in Microsoft Azure Data Lake Store, create a connection asset for it.
Azure Data Lake Store (ADLS) is a scalable data storage and analytics service that is hosted in Azure, Microsoft's public cloud. The Microsoft Azure Data Lake Store connection supports access to both Gen1 and Gen2 Azure Data Lake Storage repositories.
Create a connection to Microsoft Azure Data Lake Store
To create the connection asset, you need these connection details:
- WebHDFS URL: The WebHDFS URL for accessing HDFS.
To connect to a Gen2 ADLS, use the format https://<account-name>.dfs.core.windows.net/<file-system>, where <account-name> is the name you used when you created the ADLS instance. For <file-system>, use the name of the container you created. For more information, see the Microsoft Data Lake Storage Gen2 documentation.
- Tenant ID: The Azure Active Directory tenant ID.
- Client ID: The client ID for authorizing access to Microsoft Azure Data Lake Store.
- Client secret: The authentication key that is associated with the client ID for authorizing access to Microsoft Azure Data Lake Store.
For Credentials, you can use secrets if a vault is configured for the platform and the service supports vaults. For information, see Using secrets from vaults in connections.
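As an illustration of the Gen2 URL format described above, the following minimal sketch assembles the URL from an account name and a container name. The function name `build_gen2_url` and the sample values are hypothetical and are not part of the platform.

```python
def build_gen2_url(account_name: str, file_system: str) -> str:
    """Assemble a URL of the form
    https://<account-name>.dfs.core.windows.net/<file-system>.

    Hypothetical helper for illustration only.
    """
    account_name = account_name.strip()
    file_system = file_system.strip().strip("/")
    if not account_name or not file_system:
        raise ValueError("Both the account name and the file system (container) are required.")
    return f"https://{account_name}.dfs.core.windows.net/{file_system}"


# Sample values are placeholders, not real resources.
example_url = build_gen2_url("mystorageaccount", "mycontainer")
print(example_url)
```

You would paste the resulting value into the WebHDFS URL field when you create the connection.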
Choose the method for creating a connection based on where you are in the platform:
In a project
Click Add to project > Connection. See Adding a connection to a project.
In a catalog
Click Add to catalog > Connection. See Adding a connection asset to a catalog.
In a deployment space
Click Add to space > Connection. See Adding connections to a deployment space.
In the Platform assets catalog
Click New connection. See Adding platform connections.
Next step: Add data assets from the connection
Where you can use this connection
You can use Microsoft Azure Data Lake Store connections in the following workspaces and tools:
Analytics projects
- DataStage (DataStage service)
- Metadata import (Watson Knowledge Catalog)
- Notebooks (Watson Studio). Use the insert-to-code function to get the connection credentials and load the data into a data structure. See Load data from data source connections.
- SPSS Modeler (SPSS Modeler service)
Catalogs
- Platform assets catalog
- Other catalogs (Watson Knowledge Catalog)
Azure Data Lake Store authentication setup
To set up authentication, you need a tenant ID, client (or application) ID, and client secret.
- Gen1:
- Create an Azure Active Directory (Azure AD) web application, and get an application ID, an authentication key, and a tenant ID.
- Then assign the Azure AD application to the Azure Data Lake Store account file or folder. Follow Steps 1, 2, and 3 at Service-to-service authentication with Data Lake Store using Azure Active Directory.
- Gen2:
- Follow the instructions in Acquire a token from Azure AD for authorizing requests from a client application. These steps create a new identity. After you create the identity, set permissions to grant the application access to your ADLS. The Microsoft Azure Data Lake Store connection uses the Client ID, Client secret, and Tenant ID that are associated with the application.
- Give the Azure App access to the storage container using Storage Explorer. For instructions, see Use Azure Storage Explorer to manage directories and files in Azure Data Lake Storage Gen2.
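To check outside the platform that the tenant ID, client ID, and client secret work together, you can request a token with the OAuth 2.0 client credentials flow. The following sketch only builds the request and does not send it. It assumes the Microsoft identity platform v2.0 token endpoint and the Azure Storage scope `https://storage.azure.com/.default`, which applies to Gen2 storage; the function name and the placeholder values are hypothetical, and the connection itself might obtain tokens differently.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) a client credentials token request.

    Assumptions: the v2.0 token endpoint and the Azure Storage scope.
    Hypothetical helper for illustration only.
    """
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form_body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": "https://storage.azure.com/.default",
        }
    ).encode("utf-8")
    return Request(
        token_url,
        data=form_body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values; replace them with the details of your own application.
token_request = build_token_request("my-tenant-id", "my-client-id", "my-client-secret")
print(token_request.get_method(), token_request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` and parse the JSON response. A successful response contains an `access_token` field, which suggests that the three values are valid. It does not confirm that the application has permission on the storage container.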
Supported file types
The Microsoft Azure Data Lake Store connection supports these file types: Avro, CSV, Delimited text, Excel, JSON, ORC, Parquet, SAS, SAV, SHP, and XML.
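As a rough pre-check before you add data assets, the following sketch maps a file path to one of the supported file types listed above. The extension-to-type mapping is an assumption made for illustration (the documentation lists type names, not extensions), and the function name is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed mapping from common file extensions to the supported type names.
EXTENSION_TO_TYPE = {
    ".avro": "Avro",
    ".csv": "CSV",
    ".txt": "Delimited text",
    ".tsv": "Delimited text",
    ".xls": "Excel",
    ".xlsx": "Excel",
    ".json": "JSON",
    ".orc": "ORC",
    ".parquet": "Parquet",
    ".sas7bdat": "SAS",
    ".sav": "SAV",
    ".shp": "SHP",
    ".xml": "XML",
}


def supported_file_type(path: str) -> Optional[str]:
    """Return the assumed supported type name for a path, or None if unrecognized."""
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_TO_TYPE.get(suffix)


print(supported_file_type("sales/2023/report.CSV"))
print(supported_file_type("sales/2023/notes.docx"))
```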