Remote storage connectivity for Datalake tables
Before any user can create or access a Datalake table, a storage access alias must be created by a database administrator to establish connectivity to the remote storage where the table resides. When the storage access alias is created, the remote storage account credentials are safely stored in an encrypted keystore.
In addition to storing the credentials for storage connectivity, the alias can optionally include a location in the container where the table will be created.
The usage of the storage access alias to create a table is subject to access control and can be granted to a single user, a user group or a database role.
Before you begin
You must have database administration authority (DBADM) to create or delete a storage access alias.
A container must be provisioned on one of the supported object storage services before the storage access alias is created.
The following information must be available to create the storage access alias:
- The name of the container
- The API endpoint used to access the container
- The credentials to access the container. The credentials must have read AND write access to allow table creation.
Creating the storage access alias
CALL SYSIBMADM.STORAGE_ACCESS_ALIAS.CATALOG('alias-name', 'vendor', 'endpoint', 'user', 'password', 'container', 'object', 'grantee-type', 'grantee')
with
the following parameters: - alias-name
- The name of the new storage access alias.
- vendor
- Specifies the type of the remote storage. For accessing Datalake tables, only 'S3' is supported.
- endpoint
- Specifies the remote storage endpoint to access the container.
- user
- Specifies the S3 access key of the remote storage account.
- password
- Specifies the S3 secret key of the remote storage account.
- container
- Specifies the name of the S3 container where the tables are to be created or accessed.
- object
- [Optional] Specifies a parent directory in the container for the Datalake table. If a path is specified for the object parameter, the Datalake table must be located in a subdirectory of that path.
- grantee-type
- Specifies the type of grantee. Use 'U' for a user ID, 'G' for a group name, and 'R' for a database role name.
- grantee
- The user, group, or role granted usage access of the storage access alias. Only users with this access are allowed to use the alias-name to create a Datalake table.
Using the storage access alias to create a Datalake table
DB2REMOTE://alias-name//object/path
- alias-name
- Must be the name of an alias to which the user that is creating the table has access. The alias contains the information required to access the container where the table data is to reside.
- object
- Must be the object specified when creating the storage access alias. It represents a path on the container. If no object was specified when creating the storage access alias, this part can remain empty.
- path
- An optional additional path. Appended to the object path, it represents the location of the table data in the container.
For more information, see the CREATE DATALAKE TABLE statement.
Security considerations for Datalake tables
- Only users who have been granted access to a storage access alias, either explicitly or through a group or role can use it to create a table. However, once a table has been created, the access control on the table itself is independent of the storage access alias and is only governed by standard SQL access controls.
- As a single grantee can be given access to any storage access alias, and because it is not possible to create groups on Db2, it is recommended to use database roles to control access to storage access alias for flexibility.
A user can list the storage access aliases he has access to with the SYSIBMADM.STORAGE_ACCESS_ALIAS.LIST table function.
Deleting the storage access alias
CALL SYSIBMADM.STORAGE_ACCESS_ALIAS.UNCATALOG('alias-name')
with
the following parameters:- alias-name
- The name of the storage access alias.
After the storage access alias is deleted, the remote storage service cannot be accessed. In particular, data stored in Datalake tables that were created by using this alias will not be accessible (unless another storage access alias exists that provides access to it).
Example
'saleslake'
to access an
Amazon Web Services S3 bucket named 'companybucket'
and on this bucket path
'datalake/salesteam'
. The alias will be accessible to all members of the database
role
'salesteam_member'
:CALL SYSIBMADM.STORAGE_ACCESS_ALIAS.CATALOG('saleslake', 'S3', 's3.us-east-2.amazonaws.com', '****', '****', 'companybucket', 'datalake/salesteam', 'R', 'salesteam_member')
'items'
in the bucket at location
'datalake/salesteam/landing/items'
by running the following
statement:CREATE DATALAKE TABLE items(id int, name varchar(32)) LOCATION 'DB2REMOTE://saleslake//datalake/salesteam/landing/items'
Remarks
- By setting the object parameter to the path
'datalake/salesteam'
in the CATALOG stored procedure call, the administrator restricts access to the container to only subpaths of that path. - The consumers of the storage access alias can create Datalake tables in any subpath that they want underneath it, but cannot access locations outside of this path.
- This gives the administrator the capability to isolate datasets for each team or group of users, while giving the administrator the flexibility to also share that data when desired (by granting access to the tables after they are created through SQL GRANT and REVOKE).