Remote storage connectivity for Datalake tables
Before any user can create or access a Datalake table, a storage access alias must be created by a database administrator to establish connectivity to the remote storage where the table resides. When the storage access alias is created, the remote storage account credentials are safely stored in an encrypted keystore.
In addition to storing the credentials for storage connectivity, the alias can optionally include a location in the container where the table will be created.
The usage of the storage alias to create a table is subject to access control and can be granted to a single user, a user group or a database role.
Before you begin
You must have database administration authority (DBADM) to create or delete a storage alias.
A container must be provisioned on one of the supported object storage services before the storage alias is created.
The following information must be available to create the storage alias:
- The name of the container
- The API endpoint used to access the container
- The credentials to access the container. The credentials must have read AND write access to allow table creation.
Creating the storage alias
CALL SYSIBMADM.STORAGE_ACCESS_ALIAS.CATALOG('alias-name', 'vendor', 'endpoint', 'user', 'password', 'container', 'object', 'grantee-type', 'grantee')with
the following parameters: - alias-name
- The new storage alias name.
- vendor
- Specifies the type of the remote storage. For accessing Datalake tables, only 'S3' is supported.
- endpoint
- Specifies the remote storage endpoint to access the container.
- user
- Specifies the S3 access key of the remote storage account.
- password
- Specifies the S3 secret key of the remote storage account.
- container
- Specifies the name of the S3 container where the tables are to be created or accessed.
- object
- [Optional] Specifies a parent directory in the container for the Datalake table. If a path is specified for the object parameter, the Datalake table must be located in a subdirectory of that path.
- grantee-type
- Specifies the type of grantee. Use 'U' for a user ID, 'G' for a group name, and 'R' for a database role name.
- grantee
- The user, group, or role granted usage access of the access storage alias. Only users with this access are allowed to use the storage alias to create a Datalake table.
Using the storage alias to create a Datalake table
DB2REMOTE://alias-name//object/path- alias-name
- Must be the name of an alias to which the user that is creating the table has access. The alias contains the information required to access the container where the table data is to reside.
- object
- Must be the object specified when creating the storage alias. It represents a path on the container. If no object was specified when creating the storage alias, this part can remain empty.
- path
- An optional additional path. Appended to the object path, it represents the location of the table data in the container.
For more information, see the CREATE DATALAKE TABLE statement.
Security considerations for Datalake tables
- Only users who have been granted access to a storage alias, either explicitly or through a group or role can use it to create a table. However, once a table has been created, the access control on the table itself is independent of the storage alias access and is only governed by standard SQL access controls.
- As a single grantee can be given access to any storage alias, and because it is not possible to create groups on Db2, it is recommended to use database roles to control access to storage alias for flexibility.
A user can list the storage aliases he has access to with the SYSIBMADM.STORAGE_ACCESS_ALIAS.LIST table function.
Deleting the storage alias
CALL SYSIBMADM.STORAGE_ACCESS_ALIAS.UNCATALOG('alias-name')with
the following parameters:- alias-name
- The name of the storage alias.
After the access storage alias is deleted, the remote storage service cannot be accessed. In particular, data stored in Datalake tables that were created by using this alias will not be accessible (unless another access storage alias exists that provides access to it).
Example
'saleslake' to access an Amazon
Web Services S3 bucket named 'companybucket' and on this bucket path
'datalake/salesteam'. The alias will be accessible to all members of the database
role
'salesteam_member':CALL SYSIBMADM.STORAGE_ACCESS_ALIAS.CATALOG('saleslake', 'S3', 's3.us-east-2.amazonaws.com', '****', '****', 'companybucket', 'datalake/salesteam', 'R', 'salesteam_member')'items' in the bucket at location
'datalake/salesteam/landing/items' by running the following
statement:CREATE DATALAKE TABLE items(id int, name varchar(32)) LOCATION 'DB2REMOTE://saleslake//datalake/salesteam/landing/items'Remarks
- By setting the object parameter to the path
'datalake/salesteam'in the CATALOG stored procedure call, the administrator restricts access to the container to only subpaths of that path. - The consumers of the storage alias can create Datalake tables in any subpath that they want underneath it, but cannot access locations outside of this path.
- This gives the administrator the capability to isolate datasets for each team or group of users, while giving the administrator the flexibility to also share that data when desired (by granting access to the tables after they are created through SQL GRANT and REVOKE).