IBM Cloud Pak for Data considerations for GDPR readiness
This document helps you with your preparations for GDPR readiness. It provides information about features of IBM Cloud Pak® for Data that you can configure, and aspects of the product's use that you should consider to help your organization with GDPR readiness.
This information is not an exhaustive list, due to the many ways that customers can choose and configure features, and the large variety of ways that the product can be used in itself and with third-party applications and systems.
Customers are responsible for ensuring their own readiness for various laws and regulations, including the European Union General Data Protection Regulation. Customers are solely responsible for obtaining advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulations that may affect the customers' business, and any actions the customers might need to take to comply with such laws and regulations.
The products, services, and other capabilities described herein are not suitable for all customer situations and might have restricted availability. IBM® does not provide legal, accounting, or auditing advice or represent or warrant that its services or products will ensure that customers are ready for any law or regulation.
Quick links
- GDPR
- Configuration to support data handling requirements
- Configuring the product to support data privacy
- Configuring the product to support data security
- Data lifecycle
- Data storage
- Data access
- Data processing
- Data deletion
- Responding to data subject rights
GDPR
The General Data Protection Regulation (GDPR) was adopted by the European Union (EU) and applies from May 25, 2018.
GDPR establishes a stronger data protection regulatory framework for processing of personal data of individuals. GDPR brings:
- New and enhanced rights for individuals
- Widened definition of personal data
- New obligations for processors
- Potential for significant financial penalties for non-compliance
- Compulsory data breach notification
Read more about GDPR
- EU GDPR Information Portal: https://gdpr.eu/
- IBM GDPR website: https://www.ibm.com/data-responsibility/gdpr/
Configuration to support data handling requirements
The GDPR legislation requires that personal data is strictly controlled and that the integrity of the data be maintained. This requires the data to be secured against loss through systems failure and also loss through unauthorized access, or via theft of computer equipment or storage media.
You can deploy and configure Cloud Pak for Data in an environment where security measures are in place to address data handling requirements that are related to the GDPR. The product is back office in nature and designed to reside in a secure environment while operating. The method that you choose to approach this requirement will vary depending on your business requirements.
It is recommended that you have a high-level understanding of the topology of the nodes and components that make up this product and of the environments where it is deployed.
You can find more detail about approaches and factors related to securing an environment for the product in Security considerations.
Note that Cloud Pak for Data is a facilitator for moving data (which could include personal data) between different data storage technologies (databases, queueing technologies, file storage). This guide covers only topics that are related to the product itself. GDPR-related configuration and security information for potential source and target components is outside the scope of this information.
Configuring the product to support data privacy
A common approach to addressing data privacy is to limit and divide access to data and processing functions among small groups or individuals on an as-required basis. You can take this approach with Cloud Pak for Data.
The overall security of the systems that are involved is covered in Security considerations, but one aspect of data privacy is to control who can access a system where the product exists. This type of control limits the overall number of individuals who can come into potential contact with data. Note that this should include not just product-specific users, but also overall system administrators.
Access to the actual product installation and configuration of specific source and/or target agents should also be limited to a small group. This is especially important because these areas provide direct access to data storage (both the source and target databases through the associated access credentials) and also product operational data.
Normally implementing privacy controls is a function that is limited to the product administrators in charge of setting up and maintaining the product. Regular operators who use Cloud Pak for Data to solve a business function should be a separately controlled group without this access.
IBM Documentation has specific information on installation and configuration of Cloud Pak for Data.
Configuring the product to support data security
Data security with Cloud Pak for Data is accomplished by deploying and operating the product in a secure environment (encrypted file systems, VPN technologies that include secure network connections, firewall security and controlling access to the system perimeter). Direct security of data within the scope of the product is managed by controlling the access to administrator operators on the system where the product is installed along with the product administration account (the account under which Cloud Pak for Data is installed).
You also need to consider security of the data in cases where support trace logging is configured. By default, the product captures this data to the same location where the product is installed (and this should be on a secure file system). The process by which you manage these captured logs, including their retention and distribution, can affect data security and you should develop a secure procedure that meets GDPR handling requirements.
Note also that the product requires access to individual databases to operate. Managing the credentials that are required for connecting to databases is important to achieving a secure environment. Cloud Pak for Data does not store passwords for individual databases. Passwords for users on the platform are stored in an encrypted database that is included with the product.
Data lifecycle
Cloud Pak for Data is data agnostic, not specifically aware of the nature of data that it is handling other than at a technical level (encoding, data type, size). As such, the product can never be aware of the presence (or lack thereof) of personal data, except for the aforementioned cases where a customer has explicitly defined handling data that might be personal in nature. It is up to customer discretion if there is the possibility that personal data is present in the data that is being moved by this product.
More information is available in IBM Documentation about the general high-level process for data handling by Cloud Pak for Data.
Because you can define specialized handling of specific data (for example data filtering and translations), in some cases you can explicitly define data that is personal in nature to be handled in a specific way. For example, using the Cloud Pak for Data catalog you can classify certain metadata columns as sensitive. Or you can run a transformation job and not replicate the transactions that are executed on the source database by user X. In these cases, the product agents would be acting on the personal data, although not in such a way that the product is aware of the nature of the data and only based on the customer-defined rules. All the data is stored in relational tables and thus is also restricted in access to authorized users with adequate data access privileges as enforced by the data governance system.
During special, customer-controlled situations such as diagnostic servicing, product trace logging could be enabled, which could result in data being captured in these logs. These logs would be persisted to disk storage where the product is installed, and typically you provide the logs to IBM to assist with servicing. Best practice is to remove these logs from the system immediately after you provide them to service, to end the lifecycle of any potential personal data that they might contain.
To avoid sharing personal data with IBM in logs that are collected for product servicing, ensure that this data is either removed or rendered no longer personal.
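As one illustration of rendering log data no longer personal before it is shared, the following minimal Python sketch masks email addresses in log text. It is not a Cloud Pak for Data feature. The names `EMAIL_RE`, `redact_emails`, and `redact_lines` are hypothetical, the pattern covers only common email address shapes, and real logs are likely to contain other kinds of personal data that need their own rules.

```python
import re

# Assumption: a simple pattern for common email address shapes.
# It is not exhaustive and does not cover other personal data (names, IDs, IPs).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text, placeholder="[REDACTED_EMAIL]"):
    """Return text with every matched email address replaced by placeholder."""
    return EMAIL_RE.sub(placeholder, text)


def redact_lines(lines, placeholder="[REDACTED_EMAIL]"):
    """Yield redacted copies of each line, leaving the input untouched."""
    for line in lines:
        yield redact_emails(line, placeholder)


if __name__ == "__main__":
    sample = ["2024-01-01 user jane.doe@example.com logged in"]
    for redacted in redact_lines(sample):
        print(redacted)
```

Writing the redacted output to a new file, rather than editing the log in place, keeps the original available until you have confirmed that the redacted copy is sufficient for servicing.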
The nature of the data, the handling, and the lifecycle has its own specifics related to personal data and the GDPR. For information on that, consult product documentation.
Personal data used for online contact with IBM
Cloud Pak for Data customers can submit online comments/feedback/requests to contact IBM about Cloud Pak for Data subjects in a variety of ways, primarily:
- Public comments area on pages in the Cloud Pak for Data community
Typically, only the customer name and email address are used to enable personal replies for the subject of the contact, and the use of personal data conforms to the IBM Online Privacy Statement.
Data storage
Transparency and data minimization
Because Cloud Pak for Data is data agnostic and does not collect data with the intention of storing it, no personal data is directly available for transparency review, and the product therefore meets the principle of data minimization.
Protection
The product does handle data that could include personal data and in some cases this data could reside on disk storage at some points in the data lifecycle. Access controls are built on top of system and component mechanisms (for example operating system and database access controls) to control and limit product access.
To protect data, you must provide a secure environment in which to run the product. For data that might reside in temporary disk storage caches or trace log files, a supplementary full disk volume encryption solution is required. In the case of the data transmission between nodes, it is recommended that a VPN solution be employed (either software based or a physical hardware-based technology).
Additionally, standard system and IT security approaches (such as firewalls and network architectures) should be employed to protect all nodes that are involved with the movement and storage of data from risk of outside attack.
More details on approaches to this and considerations are covered in the Cloud Pak for Data documentation in IBM Documentation.
Additional considerations
- Account data: Most user account data that the product uses is contained and controlled by the system user account management facilities, facilities in related components such as database access controls, and in some cases external controls such as LDAP directories.
- Backups: While there is not a set product backup procedure that would cover the potential areas where GDPR-controlled data could reside at any given time, you could initiate a backup action for a system where the product exists that could inadvertently capture personal data. You must ensure that any actions that you perform on a system that could potentially contain personal data comply with GDPR handling policies.
Data access
Data access in the product is limited to a small number of roles, typically a small number of user accounts:
- The account under which the product is installed and base configuration is performed.
- User account credentials that the configured product uses to access actual data for reading and writing (for source and target agents).
- User accounts that are used for operational configuration and monitoring.
Of these, the last group usually encompasses the largest potential group of users because separate account roles are available for system administration, operations, and monitoring.
Because the product is an intermediary processor of data to and from various sources and targets, it needs to access these stores through a variety of APIs and other means. Each of these is controlled through different user account credentials, depending on the specifics of the databases or queuing technologies. Be aware that these are the user accounts that are used to gain access to all data.
You can find additional details on account requirements and options in the Cloud Pak for Data documentation in IBM Documentation.
Controlling access to logs
All major components of the product infrastructure have activity and debug logging capabilities. The detail level of logging is configurable with minimal information logged in default mode during normal operations. These logs are visible to the user account that owns the product installation, as well as any superuser administrator accounts. Access to these accounts is controlled through the specific underlying operating system mechanisms for user access control on the specific node where the product is running.
In cases of detailed trace logging being enabled (such as during servicing), or in the case of a product error being encountered, detailed data that is being processed by the product could be captured in the log files. As such any user account with access to the log files could access potential personal data.
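Because any account that can read the log files could see personal data, it can help to verify periodically that log files are readable only by their owner. The following Python sketch shows one way to do that on a POSIX system, using operating system permission bits. The function names `is_owner_only` and `find_exposed_logs` are hypothetical, and the check does not account for ACLs or superuser access.

```python
import os
import stat


def is_owner_only(path):
    """Return True if neither group nor others have any permission bits on path."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def find_exposed_logs(paths):
    """Return the subset of paths that group or other users can access."""
    return [p for p in paths if not is_owner_only(p)]
```

A file with mode `600` passes this check, while a file with mode `644` is reported as exposed.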
IBM support engineers might also need to be given access to logs or data during customer requested product servicing.
Additional considerations
In addition to the separation of duties around overall system administration, product installation and ownership, and data access controls, there is also the separation of accounts and duties for the product configuration, management, and operational control. While these are not related to direct data access, product management users can have privileges that allow them to control the product configuration around which data is moved, along with control of the movement of data. Be sure to consider which users or groups of users are authorized for this capability.
Data processing
You can control the way data (and potentially personal data) is processed by the product through the product configuration and control interface.
Encryption
The product does not handle the security of the data directly and relies on outside mechanisms to provide a secure environment. The encryption of the environment should be handled through system-level encryption of the file systems and network connections across which the product communicates.
Security profiles and data processing
Such a model of an overall secure system with individual application access controls allows for the securing of the entire system to be handled by one group, such as a security administration team, while allowing only a specific product user team to be able to access and control the product data processing activities. The access to the potential underlying data and the users who are allowed to control the movement of data can be separated as well by separating the product setup access and the user control access accounts.
Specific details on access to data and data processing controls can be found in the Cloud Pak for Data documentation in IBM Documentation.
The details of the actual processing of the data are described in Data lifecycle.
Data deletion
Article 17 of the GDPR states that data subjects have the right to have their personal data removed from the systems of controllers and processors without undue delay.
Cloud Pak for Data is not a forward-facing application for customers and thus does not provide any mechanisms for data subjects to request or control data deletion. All data deletion related to the product can only be accomplished by authorized users.
Also, because the product does not permanently store any data and data that it does come in contact with is purged on a regular basis as part of continuous operation, there is normally no active requirement to delete any data related to the product.
Special cases related to this include:
- Trace/error logging and other data for servicing
- System level backups that capture configuration and operational data on file systems where the product is installed
Best practices should be followed to avoid the possibility of having personal data spread in scenarios like these, such as not backing up operational data for the product as well as a rigorous policy around the collection and management of trace logging data.
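A policy for managing trace logs usually includes a retention limit. As a minimal sketch, the following Python function lists log files that are older than a chosen number of days so that they can be reviewed and removed. The function name `find_expired_logs`, the `*.log` file pattern, and the retention period are assumptions for illustration only. The actual log locations and file names depend on your installation.

```python
import time
from pathlib import Path


def find_expired_logs(log_dir, max_age_days, pattern="*.log", now=None):
    """Return files under log_dir matching pattern that were last modified
    more than max_age_days ago, sorted by path."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in Path(log_dir).rglob(pattern)
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

The function only reports candidates. Keeping deletion as a separate, reviewed step avoids removing logs that are still needed for an open service request.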
Data monitoring
Because Cloud Pak for Data must be installed in a secure environment to achieve the data protection requirements of GDPR, use the security mechanisms of that environment to regularly monitor the security state of the product and environment. Consult the product information related to those solutions for details on how to monitor the regular state of system security.
An effective security monitoring and management protocol needs to cover many areas including:
- Overall system security and access
- Product configuration
- Product monitoring
- Monitoring of log and trace data produced by the product
Individual customer needs will vary. Use the tools and functions that are mentioned above as part of developing this overall security management solution for your specific needs.
Responding to data subject rights
Because personal data can potentially flow through the product, a request to delete all personal data could mean data going through the product would fall under that request.
Because the normal product operation is to move data from a source to a target on a continual basis, and never to retain any data for longer than is necessary to perform this function, any data will naturally age out on the end-to-end data flow. This happens as fast as the product is able to process the data.
Special scenarios where this might not be the case could include:
- Service trace/error logs
- System backups covering the product installation and operational file systems of the product
Follow best practices of not backing up operational data for the product as well as a rigorous policy around the collection and management of trace logging to limit the potential spread of personal data.