Remote storage connectivity for Datalake tables

Before any user can create or access a Datalake table, a storage access alias must be created by a database administrator to establish connectivity to the remote storage where the table resides. When the storage access alias is created, the remote storage account credentials are safely stored in an encrypted keystore.

In addition to storing the credentials for storage connectivity, the alias can optionally include a location in the container where the table will be created.

The usage of the storage access alias to create a table is subject to access control and can be granted to a single user, a user group or a database role.

Before you begin

You must have database administration authority (DBADM) to create or delete a storage access alias.

A container must be provisioned on one of the supported object storage services before the storage access alias is created.

The following information must be available to create the storage access alias:

  • The name of the container
  • The API endpoint used to access the container
  • The credentials to access the container. The credentials must have read AND write access to allow table creation.

Creating the storage access alias

To create a storage alias to access Datalake tables, issue the following command:
CALL SYSIBMADM.STORAGE_ACCESS_ALIAS.CATALOG('alias-name', 'vendor', 'endpoint', 'user', 'password', 'container', 'object', 'grantee-type', 'grantee')
with the following parameters:
alias-name
The name of the new storage access alias.
vendor
Specifies the type of the remote storage. For accessing Datalake tables, only 'S3' is supported.
endpoint
Specifies the remote storage endpoint to access the container.
user
Specifies the S3 access key of the remote storage account.
password
Specifies the S3 secret key of the remote storage account.
container
Specifies the name of the S3 container where the tables are to be created or accessed.
object
[Optional] Specifies a parent directory in the container for the Datalake table. If a path is specified for the object parameter, the Datalake table must be located in a subdirectory of that path.
grantee-type
Specifies the type of grantee. Use 'U' for a user ID, 'G' for a group name, and 'R' for a database role name.
grantee
The user, group, or role granted usage access of the storage access alias. Only users with this access are allowed to use the alias-name to create a Datalake table.

Using the storage access alias to create a Datalake table

Once a storage access alias has been created, a user who has been granted access to it can use it to create a Datalake table. When creating a Datalake table, a LOCATION clause must be specified. The syntax of the LOCATION clause must be as follows:
DB2REMOTE://alias-name//object/path
alias-name
Must be the name of an alias to which the user that is creating the table has access. The alias contains the information required to access the container where the table data is to reside.
object
Must be the object specified when creating the storage access alias. It represents a path on the container. If no object was specified when creating the storage access alias, this part can remain empty.
path
An optional additional path. Appended to the object path, it represents the location of the table data in the container.

For more information, see the CREATE DATALAKE TABLE statement.

Security considerations for Datalake tables

  • Only users who have been granted access to a storage access alias, either explicitly or through a group or role can use it to create a table. However, once a table has been created, the access control on the table itself is independent of the storage access alias and is only governed by standard SQL access controls.
  • As a single grantee can be given access to any storage access alias, and because it is not possible to create groups on Db2, it is recommended to use database roles to control access to storage access alias for flexibility.

A user can list the storage access aliases he has access to with the SYSIBMADM.STORAGE_ACCESS_ALIAS.LIST table function.

Deleting the storage access alias

To disconnect Db2 from the remote storage service and prevent access, you can delete the storage access alias by running the following command:
CALL SYSIBMADM.STORAGE_ACCESS_ALIAS.UNCATALOG('alias-name')
with the following parameters:
alias-name
The name of the storage access alias.

After the storage access alias is deleted, the remote storage service cannot be accessed. In particular, data stored in Datalake tables that were created by using this alias will not be accessible (unless another storage access alias exists that provides access to it).

Example

The following command creates a storage access alias 'saleslake' to access an Amazon Web Services S3 bucket named 'companybucket' and on this bucket path 'datalake/salesteam'. The alias will be accessible to all members of the database role 'salesteam_member':
CALL SYSIBMADM.STORAGE_ACCESS_ALIAS.CATALOG('saleslake', 'S3', 's3.us-east-2.amazonaws.com', '****', '****', 'companybucket', 'datalake/salesteam', 'R', 'salesteam_member')
It is then possible for a member of the salesteam_member role to create a Datalake table named 'items' in the bucket at location 'datalake/salesteam/landing/items' by running the following statement:
CREATE DATALAKE TABLE items(id int, name varchar(32)) LOCATION 'DB2REMOTE://saleslake//datalake/salesteam/landing/items'

Remarks

  • By setting the object parameter to the path 'datalake/salesteam' in the CATALOG stored procedure call, the administrator restricts access to the container to only subpaths of that path.
  • The consumers of the storage access alias can create Datalake tables in any subpath that they want underneath it, but cannot access locations outside of this path.
  • This gives the administrator the capability to isolate datasets for each team or group of users, while giving the administrator the flexibility to also share that data when desired (by granting access to the tables after they are created through SQL GRANT and REVOKE).