Remote storage connectivity for Datalake tables
Before any user can create or access a Datalake table, a storage access alias must be created by a database administrator to establish connectivity to the remote storage where the table resides. When the storage access alias is created, the remote storage account credentials are safely stored in an encrypted keystore.
In addition to storing the credentials for storage connectivity, the alias can optionally include a location in the container where the table will be created.
The usage of the storage access alias to create a table is subject to access control and can be granted to a single user, a user group or a database role.
Before you begin
You must have database administration authority (DBADM) to create or delete a storage access alias.
A container must be provisioned on one of the supported object storage services before the storage access alias is created.
The following information must be available to create the storage access alias:
- The name of the container
- The API endpoint used to access the container
- The credentials to access the container. The credentials must have read AND write access to allow table creation.
Creating the storage access alias
CALL SYSIBMADM.STORAGE_ACCESS_ALIAS.CATALOG('alias-name', 'vendor', 'endpoint', 'user', 'password', 'container', 'object', 'grantee-type', 'grantee')with
the following parameters: - alias-name
- The name of the new storage access alias.
- vendor
- Specifies the type of the remote storage. For accessing Datalake tables, only S3 or AZURE is supported.
- endpoint
- Specifies the remote storage endpoint to access the container.
- user
- Specifies the S3 access key or AZURE account name of the remote storage account.
- password
- Specifies the S3 secret key or AZURE account key of the remote storage account.
- container
- Specifies the name of the S3 bucket or AZURE container where the tables are to be created or accessed.
- object
- [Optional] Specifies a parent directory in the container for the Datalake table. If a path is specified for the object parameter, the Datalake table must be located in a subdirectory of that path.
- grantee-type
- Specifies the type of grantee. Use 'U' for a user ID, 'G' for a group name, and 'R' for a database role name.
- grantee
- The user, group, or role granted usage access of the storage access alias. Only users with this access are allowed to use the alias-name to create a Datalake table.
Using the storage access alias to create a Datalake table
DB2REMOTE://alias-name//object/path- alias-name
- Must be the name of an alias to which the user that is creating the table has access. The alias contains the information required to access the container where the table data is to reside.
- object
- Must be the object specified when creating the storage access alias. It represents a path on the container. If no object was specified when creating the storage access alias, this part can remain empty.
- path
- An optional additional path. Appended to the object path, it represents the location of the table data in the container.
For more information, see the CREATE DATALAKE TABLE statement.
Security considerations for Datalake tables
- Only users who have been granted access to a storage access alias, either explicitly or through a group or role can use it to create a table. However, once a table has been created, the access control on the table itself is independent of the storage access alias and is only governed by standard SQL access controls.
- As a single grantee can be given access to any storage access alias, and because it is not possible to create groups on Db2, it is recommended to use database roles to control access to storage access alias for flexibility.
Additional configuration properties required to access the remote storage
Depending on the type of object storage used and the configuration, you might need to provide
additional configuration parameters2 to enable a successful connection. Those
parameters can be set by using the SET_DATALAKE_CONFIG stored procedure. The property file name to
use is CORE.
Deleting the storage access alias
CALL SYSIBMADM.STORAGE_ACCESS_ALIAS.UNCATALOG('alias-name')with
the following parameters:- alias-name
- The name of the storage access alias.
After the storage access alias is deleted, the remote storage service cannot be accessed. In particular, data stored in Datalake tables that were created by using this alias will not be accessible (unless another storage access alias exists that provides access to it).
Example
'saleslake' to access an Amazon
Web Services S3 bucket named 'companybucket' and on this bucket path
'datalake/salesteam'. The alias will be accessible to all members of the database
role
'salesteam_member':CALL SYSIBMADM.STORAGE_ACCESS_ALIAS.CATALOG(
'saleslake', 'S3', 's3.us-east-2.amazonaws.com',
'****', '****', 'companybucket', 'datalake/salesteam',
'R', 'salesteam_member')CALL SYSIBMADM.STORAGE_ACCESS_ALIAS.CATALOG(
'saleslake', 'AZURE', 'https://companyaccount.blob.core.windows.net',
'****', '****', 'companycontainer', 'datalake/salesteam',
'R', 'salesteam_member')Remarks
- By setting the object parameter to the path
'datalake/salesteam'in the CATALOG stored procedure call, the administrator restricts access to the container to only subpaths of that path. - The consumers of the storage alias can create Datalake tables in any subpath that they want underneath it, but cannot access locations outside of this path.
- This gives the administrator the capability to isolate datasets for each team or group of users, while giving the administrator the flexibility to also share that data when desired (by granting access to the tables after they are created through SQL GRANT and REVOKE).