Microsoft Azure Data Lake Storage connection
To access your data in Microsoft Azure Data Lake Storage, create a connection asset for it.
Azure Data Lake Storage (ADLS) is a scalable data storage and analytics service that is hosted in Azure, Microsoft's public cloud. The Microsoft Azure Data Lake Storage connection supports access to both Gen1 and Gen2 Azure Data Lake Storage repositories.
Create a connection to Microsoft Azure Data Lake Storage
To create the connection asset, you need these connection details based on your deployment:
Common connectivity
- WebHDFS URL: The WebHDFS URL for accessing HDFS.
To connect to a Gen2 ADLS, use the format https://<account-name>.dfs.core.windows.net/<file-system>
Where <account-name> is the name you used when you created the ADLS instance.
For <file-system>, use the name of the container you created. For more information, see the Microsoft Data Lake Storage Gen2 documentation.
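The URL format above can be assembled programmatically. The following is a minimal sketch, not part of the product: the function name `build_adls_gen2_url` and the sample values are hypothetical, and the account-name check reflects the Azure naming rule for storage accounts (3 to 24 lowercase letters and digits).

```python
import re


def build_adls_gen2_url(account_name: str, file_system: str) -> str:
    """Return a URL of the form https://<account-name>.dfs.core.windows.net/<file-system>.

    Hypothetical helper for illustration only.
    """
    # Azure storage account names are 3-24 characters, lowercase letters and digits.
    if not re.fullmatch(r"[a-z0-9]{3,24}", account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    # The file system is the container name; this sketch only rejects an empty value.
    if not file_system:
        raise ValueError("file system (container) name must not be empty")
    return f"https://{account_name}.dfs.core.windows.net/{file_system}"


if __name__ == "__main__":
    # Sample values are placeholders, not real resources.
    print(build_adls_gen2_url("mystorageacct", "raw-data"))
```

Paste the resulting value into the WebHDFS URL field.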
Select Server proxy to access the Microsoft Azure Data Lake Storage data source through a proxy server. Depending on the setup, a proxy server can provide load balancing, increased security, and privacy. The proxy server settings are independent of the authentication credentials and the personal or shared credentials selection.
- Proxy host: The proxy URL. For example, https://proxy.example.com.
- Proxy port number: The port number to connect to the proxy server. For example, 8080 or 8443.
- Proxy protocol: The proxy server protocol. You can choose one of two protocols: HTTP or HTTPS.
- Encrypted proxy communication: If you choose HTTP, you can enable this option. It enables layered tunneling for HTTP proxy communication if the target server supports HTTPS and the proxy is configured for encrypted tunneling.
- No proxy: A comma-separated list of hosts to bypass the proxy configured in the connection.
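To illustrate how a comma-separated No proxy list might be evaluated, here is a small sketch. The function names are hypothetical, and the matching rule (exact, case-insensitive host comparison) is an assumption; this page does not state how the connection matches hosts against the list.

```python
def parse_no_proxy(value: str) -> set[str]:
    """Split a comma-separated host list into a normalized set of host names."""
    # Strip whitespace around each entry and ignore empty entries.
    return {host.strip().lower() for host in value.split(",") if host.strip()}


def bypasses_proxy(target_host: str, no_proxy: str) -> bool:
    """Return True if target_host appears in the no-proxy list.

    Assumption: exact, case-insensitive matching. Real products may also
    support wildcards or domain suffixes.
    """
    return target_host.strip().lower() in parse_no_proxy(no_proxy)


if __name__ == "__main__":
    no_proxy_list = "localhost, internal.example.com"
    print(bypasses_proxy("internal.example.com", no_proxy_list))
    print(bypasses_proxy("mystorageacct.dfs.core.windows.net", no_proxy_list))
```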
StreamSets
- WebHDFS URL: The WebHDFS URL for accessing HDFS.
To connect to a Gen2 ADLS, use the format https://<account-name>.dfs.core.windows.net/<file-system>
Where <account-name> is the name you used when you created the ADLS instance.
For <file-system>, use the name of the container you created. For more information, see the Microsoft Data Lake Storage Gen2 documentation.
Select Secure connection to enable a secure connection that uses the Azure Blob File System Secure (ABFSS) protocol.
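For orientation, the ABFS driver addresses Gen2 data with URIs of the form abfss://<file-system>@<account-name>.dfs.core.windows.net/<path>. The sketch below converts the HTTPS form of the URL into that form. The function name `to_abfs_uri` is hypothetical, and whether the connection performs this conversion internally is an assumption, not something this page states.

```python
from urllib.parse import urlparse


def to_abfs_uri(https_url: str, secure: bool = True) -> str:
    """Convert https://<account>.dfs.core.windows.net/<file-system>[/<path>]
    into abfs[s]://<file-system>@<account>.dfs.core.windows.net/<path>.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(https_url)
    # The first path segment is the file system (container); the rest is the path.
    segments = parsed.path.lstrip("/").split("/", 1)
    file_system = segments[0]
    if not parsed.netloc or not file_system:
        raise ValueError(f"expected https://<account-host>/<file-system>, got {https_url!r}")
    remainder = segments[1] if len(segments) > 1 else ""
    # "abfss" is the TLS-secured scheme; "abfs" is the unsecured one.
    scheme = "abfss" if secure else "abfs"
    return f"{scheme}://{file_system}@{parsed.netloc}/{remainder}"


if __name__ == "__main__":
    # Placeholder account and container names.
    print(to_abfs_uri("https://mystorageacct.dfs.core.windows.net/raw-data/sales/2024"))
```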
Credentials
The available authentication methods depend on your deployment:
Microsoft Entra ID is a cloud-based identity and access management service. To obtain connection values for the Entra ID authentication method, sign in to the Microsoft Azure portal and go to your storage account. For information about Microsoft Entra ID, see What is Microsoft Entra ID?.
Common connectivity
Choose the authentication method:
Client credentials
- Tenant ID: The Microsoft Entra tenant ID. To find the tenant ID, go to Microsoft Entra ID > Properties. Scroll down to the Tenant ID field. For more information, see How to find your Microsoft Entra tenant ID.
- Client ID: The client ID for authorizing access to Microsoft Azure Data Lake Storage. To find the client ID for your application, select Microsoft Entra ID. From App registrations, select your application. Click Copy to copy the client ID of your application. For more information, see Register a Microsoft Entra app and create a service principal.
- Client secret: The authentication key that is associated with the client ID for authorizing access to Microsoft Azure Data Lake Storage. To find the client secret for your application, select Microsoft Entra ID. From App registrations, select your application. Go to Certificates & secrets > Client secrets. Click Copy to copy the existing client secret or click New client secret to create a new client secret and copy it. For more information, see Register a Microsoft Entra app and create a service principal.
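To show how the tenant ID, client ID, and client secret fit together, the sketch below builds (but does not send) an OAuth 2.0 client credentials token request against the Microsoft identity platform. The function name and placeholder values are hypothetical. The endpoint and the https://storage.azure.com/.default scope are the standard ones for Azure Storage; whether the connection issues exactly this request internally is an assumption.

```python
from urllib.parse import urlencode


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> tuple[str, str]:
    """Return (token_url, form_body) for the OAuth 2.0 client credentials flow.

    Hypothetical helper for illustration; it only constructs the request.
    """
    # The tenant ID selects the Microsoft Entra directory that issues the token.
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    # The body is sent as application/x-www-form-urlencoded in a POST request.
    form_body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            # Scope that requests access to Azure Storage resources.
            "scope": "https://storage.azure.com/.default",
        }
    )
    return token_url, form_body


if __name__ == "__main__":
    # Placeholder values; never hard-code a real client secret.
    url, body = build_token_request("<tenant-id>", "<client-id>", "<client-secret>")
    print(url)
    print(body)
```

The function only constructs the request, so you can inspect the values without contacting Azure.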
Username and password
- Tenant ID: The Microsoft Entra tenant ID. To find the tenant ID, go to Microsoft Entra ID > Properties. Scroll down to the Tenant ID field. For more information, see How to find your Microsoft Entra tenant ID.
- Client ID: The client ID for authorizing access to Microsoft Azure Data Lake Storage. To find the client ID for your application, select Microsoft Entra ID. From App registrations, select your application. Click Copy to copy the client ID of your application. For more information, see Register a Microsoft Entra app and create a service principal.
- Username and Password: Username and password for the Microsoft Azure Data Lake Storage account. You need permission to access the file without multi-factor authentication.
Certificates
- SSL certificate: The SSL certificate of the host to be trusted when the host certificate was not signed by a known certificate authority.
StreamSets
Choose the authentication method:
Azure Managed Identity
- Client ID: The client ID for authorizing access to Microsoft Azure Data Lake Storage. To find the client ID for your application, select Microsoft Entra ID. From App registrations, select your application. Click Copy to copy the client ID of your application. For more information, see Register a Microsoft Entra app and create a service principal.
Client credentials
- Tenant ID: The Microsoft Entra tenant ID. To find the tenant ID, go to Microsoft Entra ID > Properties. Scroll down to the Tenant ID field. For more information, see How to find your Microsoft Entra tenant ID.
- Client ID: The client ID for authorizing access to Microsoft Azure Data Lake Storage. To find the client ID for your application, select Microsoft Entra ID. From App registrations, select your application. Click Copy to copy the client ID of your application. For more information, see Register a Microsoft Entra app and create a service principal.
- Client secret: The authentication key that is associated with the client ID for authorizing access to Microsoft Azure Data Lake Storage. To find the client secret for your application, select Microsoft Entra ID. From App registrations, select your application. Go to Certificates & secrets > Client secrets. Click Copy to copy the existing client secret or click New client secret to create a new client secret and copy it. For more information, see Register a Microsoft Entra app and create a service principal.
Azure Data Lake Storage authentication setup
To set up authentication, you need a tenant ID, client (or application) ID, and client secret.
- Gen1:
- Create an Azure Active Directory (Azure AD) web application, and get an application ID, authentication key, and tenant ID.
- Assign the Azure AD application to the Azure Data Lake Storage account file or folder. Follow Steps 1, 2, and 3 at Service-to-service authentication with Azure Data Lake Storage using Azure Active Directory.
- Gen2:
- Follow the instructions in Acquire a token from Azure AD for authorizing requests from a client application. These steps create a new identity. After you create the identity, set permissions to grant the application access to your ADLS. The Microsoft Azure Data Lake Storage connection uses the client ID, client secret, and tenant ID that are associated with the application.
- Give the Azure App access to the storage container using Storage Explorer. For instructions, see Use Azure Storage Explorer to manage directories and files in Azure Data Lake Storage Gen2.
Supported file types
The Microsoft Azure Data Lake Storage connection supports these file types: Avro, CSV, Delimited text, Excel, JSON, ORC, Parquet, SAS, SAV, SHP, and XML.
Table formats
In addition to Flat file, the Microsoft Azure Data Lake Storage connection supports these Data Lake table formats: Delta Lake and Iceberg.