IBM Db2 Big SQL considerations for GDPR readiness

This document is intended to help you in your preparations for GDPR readiness. It provides information about features of Db2 Big SQL that you can configure, and aspects of the product’s use, that you should consider to help your organization with GDPR readiness. This information is not an exhaustive list, due to the many ways that clients can choose and configure features, and the large variety of ways that the product can be used in itself and with third-party applications and systems.

Notice

Clients are responsible for ensuring their own compliance with various laws and regulations, including the European Union General Data Protection Regulation. Clients are solely responsible for obtaining advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulations that may affect the clients’ business and any actions the clients may need to take to comply with such laws and regulations.

The products, services, and other capabilities described herein are not suitable for all client situations and may have restricted availability. IBM does not provide legal, accounting, or auditing advice or represent or warrant that its services or products will ensure that clients are in compliance with any law or regulation.

GDPR
Product configuration - considerations for GDPR Readiness
Data life cycle
Data collection
Data storage
Data access
Data processing
Data deletion
Data monitoring
Responding to data subject rights

GDPR

The General Data Protection Regulation (GDPR) has been adopted by the European Union ("EU") and applies from May 25, 2018.

Why is GDPR important?

GDPR establishes a stronger data protection regulatory framework for processing of personal data of individuals. GDPR brings:

New and enhanced rights for individuals
Widened definition of personal data
New obligations for processors
Potential for significant financial penalties for non-compliance
Compulsory data breach notification

Read more about GDPR

Product configuration - considerations for GDPR Readiness

Db2 Big SQL provides the features and capabilities that are needed to help IBM customers meet their GDPR responsibilities. This document is intended to provide guidance for the capabilities that are relevant to the needs of IBM customers under this legislation.

Configuration to support data handling requirements

The GDPR legislation requires that personal data is strictly controlled and that the integrity of the data is maintained. This requires the data to be secured against loss through system failure and also through unauthorized access or via theft of computer equipment or storage media.

For information on protecting your system against loss through system failure, see Backing up and recovering.

For information on protecting against unauthorized access or theft of computer equipment or storage media, see Security and governance.

Data life cycle

GDPR requires that personal data is:

Processed lawfully, fairly and in a transparent manner in relation to individuals.
Collected for specified, explicit and legitimate purposes.
Adequate, relevant and limited to what is necessary.
Accurate and, where necessary, kept up to date. Every reasonable step must be taken to ensure that inaccurate personal data are erased or rectified without delay.
Kept in a form which permits identification of the data subject for no longer than necessary.

Whether a Db2 Big SQL database contains personal data depends on the business needs and objectives of an IBM customer.

IBM customers are responsible for ensuring that appropriate consent is in place for the collection and storage of personal data within a Db2 Big SQL system. IBM customers are also responsible for configuring Db2 Big SQL to ensure that the data is secured throughout its residence in a Db2 Big SQL database.

This document is intended to give insight into how Db2 Big SQL interacts with personal data when it is stored in a Db2 Big SQL database. This document also identifies specific aspects that might need to be considered by IBM customers.

Personal data used for online contact with IBM

Db2 Big SQL customers can submit online comments/feedback/requests to contact IBM about Db2 Big SQL subjects in a variety of ways, primarily:

Public comments area on pages of Db2 Big SQL documentation in IBM Knowledge Center
Public comments area on pages in the IBM Db2 Big SQL and Hadoop Developer Community

Typically, only the client name and email address are used, to enable personal replies for the subject of the contact, and the use of personal data conforms to the IBM Privacy Statement.

Data collection

IBM Db2 Big SQL (or simply Big SQL) is the IBM SQL engine for the Apache Hadoop environment. You can use familiar standard SQL syntax in Big SQL, as well as SQL extensions, with Hadoop-based technologies. You can use Big SQL to query, analyze, and summarize data. The specific data collected and stored in the Hadoop file system is determined by the customer based on their business requirements; this data may or may not include personal data for clients of the customer.

Db2 Big SQL itself interacts with a specific subset of personal data related to those individuals (referred to as users) who establish a direct connection to the database system, either through an application or by using other Db2 Big SQL interfaces such as the Command Line Processor (CLP). These users may or may not be clients, depending on how the customer has decided to engage with their clients. The personal information that Db2 Big SQL collects includes the external user ID and credentials (e.g. password) used to establish the connection, the IP address for the connection, and the Db2 Big SQL authorization IDs associated with the user ID.

Customer business logic and processes: The customer decides how and when client data is collected within their business process(es). SQL statements and Db2 Big SQL utilities are used by the customer to present the data to Db2 Big SQL for any desired processing and access within Db2 Big SQL as required by their business.
Authentication: Any attempt to connect to a database must be authenticated and Db2 Big SQL requires the presentation of an external user ID and credentials (e.g. password) for this purpose. This information is passed by Db2 Big SQL to the authentication service configured by the Db2 Big SQL customer (during Db2 Big SQL configuration) and, assuming a successful authentication, this service will then provide Db2 Big SQL with the individual and group Db2 Big SQL authorization IDs associated with that specific user in its records. This information is associated with that specific connection for the life of the connection.
Backup: Db2 Big SQL provides its customers with the ability to back up metadata contents to an independent file in a customer-defined location. This backup file will contain any customer data located in the specified Db2 Big SQL source location (e.g. database or tablespace) at the time of the backup request.
Transaction logs: Db2 Big SQL transaction logs, used for recovering a database after a failure or recovering to a specific point in time from a database backup, can contain some of the personal information collected by Db2 Big SQL about the connection that made the change(s), as well as the customer data that was changed. Customers have the ability to define a location where Db2 Big SQL will archive old transaction logs for long-term storage.
Audit logs: If Big SQL Audit is enabled by the customer, the audit logs will contain some of the personal information associated with the connection by Big SQL. Also, if the EXECUTE category of audit is configured by the customer, records for these events could contain any customer data revealed through SQL statement text in the form of literals or data arguments provided for host variables or parameter markers in the SQL statement.
Diagnostic information: For diagnostic purposes, the contents of the Db2 Big SQL diagnostic log (db2diag.log) can contain some of the personal information associated with the Db2 Big SQL connection. As well, in the event of a service event occurring within Db2 Big SQL (e.g. an unexpected error or termination), additional diagnostic files can be created; these files can contain the personal information associated with the Db2 Big SQL connection as well as any customer data revealed through SQL statement text in the form of literals or data arguments provided for host variables or parameter markers in the SQL statement.
Monitoring: Some Db2 Big SQL monitoring interfaces can be used to access both the personal information associated with the connection by Db2 Big SQL as well as any customer data revealed through SQL statement text in the form of literals or data arguments provided for host variables or parameter markers in the SQL statement.
Database catalog tables: Certain actions can result in Db2 Big SQL recording the Db2 Big SQL authorization ID associated with the currently connected user into its internal catalog tables as a record of ownership or permanent permission for that authorization ID. Examples of these actions include the creation of a database object or the granting of a database permission by or for the connected user.
Db2 Big SQL configuration files: As part of its database manager and database configuration files, Db2 Big SQL can request information related to IP addresses and user IDs needed to access other (non-Db2 Big SQL) services.

Data storage

Db2 Big SQL uses the following data storage mechanisms, which users may wish to consider when assessing their GDPR readiness.

Storage of account data
Storage of client data
Storage in backups
Storage in archives

Storage of account data

Account information used by Db2 Big SQL to authenticate individuals is stored in a security facility outside of the product. The security facility can be part of the operating system or a separate product and the customer is responsible for determining how and where this information is controlled within the security facility. For more information, refer to Authentication.

Storage of client data

The customer explicitly inserts or loads any Hadoop data to be assigned to specific tables that they have created within the database. The physical location of the data in those tables is determined by the table definition. For more information, refer to the CREATE TABLE statement.

Storage in backups

The customer determines when and where database backups will occur through their configuration of Db2 Big SQL and/or their use of the BACKUP command. For more information, refer to Backing up and recovering.

Storage in archives

Any archiving of Db2 Big SQL metadata is handled by the backup and restore utility mentioned above.

Data access

The customer has complete control over what authorities and privileges are made available to any user who can connect to the database. For more information, refer to Authorization.

When Db2 Big SQL is installed, a default database is created, and its default privileges are not considered restrictive. For more information, see Default privileges granted on the bigsql database. If strict control over access is desired, it is recommended to use a security product such as IBM Guardium Data Protection for Databases, or a similar product, to evaluate the access control model in place for Hadoop environments.

Separation of duties

While Db2 Big SQL provides the ability to implement separation of duties through its granular authorization model, it does not enforce this policy. The customer is responsible for ensuring that this policy is properly implemented and maintained.

Privileged administrators

The DATAACCESS authority gives users the ability to access the customer data in any table in the database and this authority should only be granted when absolutely needed. Customers should be aware that the DATAACCESS authority is granted by default when the DBADM authority is granted unless the WITHOUT DATAACCESS clause is used in the GRANT statement. The DATAACCESS authority also gives the user the ability to execute interfaces to collect monitoring data and to review the contents of the Big SQL diagnostic log. For more information see Authorization and GRANT (database authorities) statement.

If it is necessary to have users with the DATAACCESS authority in place, the customer can implement additional protection against inappropriate access by this authority to customer data in tables by implementing row and column access control (RCAC) on the tables containing sensitive data. For more information about row level and column level access control, respectively, refer to CREATE PERMISSION statement and CREATE MASK statement.
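The row and column access control described above can be sketched as follows. This is a minimal illustration rather than a recommended policy: the table SALES.CUSTOMERS, its EMAIL column (assumed to be a VARCHAR long enough to hold the mask value), and the roles CUSTOMER_SERVICE and PRIVACY_OFFICER are all hypothetical names. The statements follow Db2 syntax and are assumed to be run by a user who holds the SECADM authority.

```sql
-- Hypothetical names: SALES.CUSTOMERS, EMAIL, CUSTOMER_SERVICE, PRIVACY_OFFICER.

-- Row permission: rows are visible only to members of either role.
-- Every other user, including one who holds DATAACCESS, sees no rows.
CREATE PERMISSION customers_row_access ON sales.customers
  FOR ROWS WHERE VERIFY_ROLE_FOR_USER(SESSION_USER, 'CUSTOMER_SERVICE') = 1
              OR VERIFY_ROLE_FOR_USER(SESSION_USER, 'PRIVACY_OFFICER') = 1
  ENFORCED FOR ALL ACCESS
  ENABLE;

-- Column mask: only PRIVACY_OFFICER members see the real EMAIL value.
-- The masked value must be compatible with the data type of the column.
CREATE MASK customers_email_mask ON sales.customers
  FOR COLUMN email RETURN
    CASE
      WHEN VERIFY_ROLE_FOR_USER(SESSION_USER, 'PRIVACY_OFFICER') = 1 THEN email
      ELSE 'REDACTED'
    END
  ENABLE;

-- The permission and mask take effect only after access control is
-- activated on the table.
ALTER TABLE sales.customers ACTIVATE ROW ACCESS CONTROL;
ALTER TABLE sales.customers ACTIVATE COLUMN ACCESS CONTROL;
```

Once row access control is activated, rows that no enabled permission exposes are hidden from every user, so it is worth confirming that each legitimate access path is covered by a permission before activation.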

The SECADM authority gives users the security administration authority for the database which allows them to create and manage security-related database objects as well as grant and revoke all database authorities and privileges. A user with this authority can also extract data from the Db2 Big SQL audit files. This authority should only be granted when absolutely needed. For more information, refer to Database authorization.

The ACCESSCTRL authority gives users the ability to grant and revoke privileges on objects within the database and should only be granted when absolutely needed. Customers should be aware that the ACCESSCTRL authority is granted by default when the DBADM authority is granted unless the WITHOUT ACCESSCTRL clause is used in the GRANT statement. For more information, refer to Database authorization.
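As an illustration of the WITHOUT DATAACCESS and WITHOUT ACCESSCTRL clauses mentioned above, the following sketch grants DBADM without the two implicit authorities and then lists who currently holds the sensitive authorities. The authorization ID DBA_OPS is a hypothetical name, and the query assumes the standard Db2 catalog view SYSCAT.DBAUTH.

```sql
-- Hypothetical authorization ID: DBA_OPS.
-- Grant database administration without access to table data and
-- without the ability to grant or revoke privileges.
GRANT DBADM WITHOUT DATAACCESS WITHOUT ACCESSCTRL
  ON DATABASE TO USER dba_ops;

-- Review which authorization IDs hold the privileged authorities.
SELECT grantee, granteetype, dbadmauth, dataaccessauth,
       accessctrlauth, securityadmauth
  FROM syscat.dbauth
  WHERE dataaccessauth = 'Y'
     OR accessctrlauth = 'Y'
     OR securityadmauth = 'Y';
```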

Administrators

The DBADM authority and SQLADM authority both allow the user to execute interfaces to collect monitoring data and to review the contents of the Big SQL diagnostic log. For more information refer to Database authorization.

Activity logs

Db2 Big SQL provides the ability to configure and enable audit logs through its Audit facility. Access to the data in these audit log files is controlled by the file permissions on the files and the EXECUTE privilege on the procedure.
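A minimal sketch of an audit policy is shown below, assuming that the Db2 audit policy statements are available in your Db2 Big SQL environment and that they are run by a user with the SECADM authority. The policy name is hypothetical. In Db2, the EXECUTE category records statement text, so literals in the SQL text are still captured; the input values supplied for host variables and parameter markers are recorded only when WITH DATA is specified.

```sql
-- Hypothetical policy name: EXEC_NO_DATA_POLICY.
-- Audit successful and failed statement execution. WITH DATA is
-- deliberately omitted so that input data values for host variables
-- and parameter markers are not written to the audit log.
CREATE AUDIT POLICY exec_no_data_policy
  CATEGORIES EXECUTE STATUS BOTH
  ERROR TYPE AUDIT;

-- Associate the policy with the database.
AUDIT DATABASE USING POLICY exec_no_data_policy;
```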

Data processing

Encryption in motion

The Db2 Big SQL server supports over-the-wire encryption for database client applications that use the IBM Data Server Driver for JDBC and SQLJ (type 4 connections) and connect through SSL. You can also use the IBM-provided Kerberos security plug-in library (IBMkrb5) to provide Kerberos authentication for remote JDBC/ODBC connections to Db2 Big SQL. For more information, refer to Authentication.

Similarly, secure communications should also be considered in an HADR environment where personal data may appear in the transaction logs flowing between the primary and standby databases. For more information, refer to Configuring SSL for the communication between primary and standby HADR servers.

Encryption at rest

In order to protect database files, transaction logs, and backups while they are at rest on external storage media, it is recommended that this data be encrypted. For more information, refer to Encryption.

Db2 Big SQL supports the HDFS transparent encryption that is available in the Hadoop ecosystem, which means that data is decrypted as it is read, but the files themselves remain encrypted.

Data deletion

Client data deletion

Personal data collected by the customer about their clients and stored within Db2 Big SQL can be deleted from individual tables by using the DELETE or TRUNCATE SQL statements or by dropping the tables that contain the data. This makes the data inaccessible within the database, although it can still exist in existing database backups, archived transaction logs, or archived audit files, and will continue to do so until those files are removed.

To remove references to a user Db2 Big SQL authorization ID from the internal catalog tables, action must be taken to revoke any authorization granted to that authorization ID and to drop or transfer ownership of any object created by that ID.
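The following sketch shows one way to locate and remove such catalog references. It assumes the standard Db2 catalog views SYSCAT.TABLES and SYSCAT.TABAUTH; the authorization IDs JDOE and APP_OWNER and the tables SALES.CUSTOMERS and SALES.ORDERS are hypothetical. Other object types and other authorization views would need the same treatment, and whether TRANSFER OWNERSHIP applies to a particular table type in your environment should be verified first.

```sql
-- Hypothetical names: JDOE, APP_OWNER, SALES.CUSTOMERS, SALES.ORDERS.

-- Tables and views owned by the authorization ID.
SELECT tabschema, tabname, type
  FROM syscat.tables
  WHERE owner = 'JDOE';

-- Table privileges granted to the authorization ID.
SELECT tabschema, tabname, selectauth, insertauth, updateauth, deleteauth
  FROM syscat.tabauth
  WHERE grantee = 'JDOE';

-- Revoke a privilege that the queries above reported.
REVOKE SELECT ON TABLE sales.customers FROM USER jdoe;

-- Transfer ownership of an object instead of dropping it.
TRANSFER OWNERSHIP OF TABLE sales.orders
  TO USER app_owner PRESERVE PRIVILEGES;
```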

It is also possible that some references to the deleted personal data could still exist in the Db2 Big SQL diagnostic log or in additional diagnostic files. The customer is responsible for managing these files and removing them if they are no longer needed.

Account data deletion

To remove the Db2 Big SQL database, the customer needs to remove the complete Db2 Big SQL installation. The customer can uninstall the product and then follow the standard clean-up practices detailed in Uninstalling Db2 Big SQL.

Data monitoring

Log monitoring

The transaction scope for Db2 Big SQL is very limited. Data definition statements that affect Db2 Big SQL tables are auto-committed by default: if the statement is successful, it is committed automatically; if it fails, it is rolled back automatically. Because Big SQL does not store any data itself and only references files in HDFS, INSERT INTO statements on Hadoop tables are not transactional operations. Data that is processed with LOAD HADOOP USING is put into a temporary location until the processing is completed. If the LOAD HADOOP USING operation fails, the data is discarded. For more details on transactional behavior for Hadoop tables, refer to Transactional behavior of Hadoop tables.

Data monitoring

Db2 Big SQL provides optional auditing capabilities, which the customer can configure to view a variety of activity within the database, including access and changes to data in the database. There are also external products, such as IBM Guardium Data Protection for Databases, which can be used to audit database activity.

Activity monitoring

Db2 Big SQL includes monitoring tools to analyze the performance of the service such as the Db2 Big SQL query interface to analyze distributed file system (DFS) readers and writers, cluster health tools, and service monitoring metrics. For more information, refer to Monitoring.

Responding to data subject rights

The customer is responsible for meeting data subject rights through their database configuration and design, application logic, and business processes. Customers must manage and account for the deletion or modification of any personal data that is collected by Db2 Big SQL or by customer database application logic.

You can use ordinary data manipulation (DML) SQL statements to delete or modify any identified personal information that is stored in the database itself. However, remember that the original data might still exist in archived transaction logs and database backup images. Personal information might also exist in extra files that are associated with the database, such as the audit files and diagnostic files.