Integrating with IBM Knowledge Catalog for federated data protection
You can enable federated data protection to integrate your IBM® Knowledge Catalog service with Guardium® Data Protection. This helps ensure that the data protection rules defined in IBM Knowledge Catalog are enforced consistently at the data-source level across in-scope data sources. To enable federated data protection, use Guardium query rewrite capabilities to create and run transformations (such as data masking) that are based on IBM Knowledge Catalog rules.
Before you begin
The integration between Guardium and IBM Knowledge Catalog is available only for data sources that have data source-specific user-defined functions (UDFs). To use IBM Knowledge Catalog for federated data protection, you must install the UDFs for your data source.
- IBM Cloud Pak® for Data 4.6 or later with the IBM Knowledge Catalog service. Note: IBM Knowledge Catalog works with the Cloud Pak for Data Data Privacy service to provide masking and transformation. Before you can use this integration, you must enable both the IBM Knowledge Catalog service and the Data Privacy service. Check the Known issues for Data Privacy (Masking flow) in the IBM Docs for Cloud Pak for Data for any issues that might affect you.
- The IBM Knowledge Catalog - Guardium integration must be running.
- One or more users with privileges to run Cloud Pak for Data data protection rules. The users do not need to be admins.
- Guardium Data Protection 11.5 or later.
- A supported data source for which user-defined functions are available, with those user-defined functions installed. User-defined functions are precompiled into libraries that are suitable for each data source. For more information about supported data sources and UDFs, see Adding User-Defined Functions (UDFs) for IBM Knowledge Catalog - Guardium integration.
Depending on your data source, you can incorporate query rewrite policies from Guardium into your IBM Knowledge Catalog data protection scenario. For more information, see Setting up a transformation integration.
Architecture notes
- How the IBM Knowledge Catalog - Guardium integration works (IBM Knowledge Catalog view)
- The integration adds the policy enforcement point (PEP) from the IBM Knowledge Catalog XACML model. The PEP allows Guardium to use IBM Knowledge Catalog rules for enhanced data protection.
- How transformation integration works (Guardium view)
- From a Guardium perspective, the transformation integration takes the following steps:
- A customer defines data protection rules in IBM Knowledge Catalog.
- Guardium sends session and request details to IBM Knowledge Catalog for evaluation in the form of a resource key.
- The verdict from IBM Knowledge Catalog is returned and can include a transformation specification that provides details about how to transform the query.
- Guardium uses its query rewrite capabilities to rewrite the query in accordance with the transformation specification.
- The altered query is forwarded to the database server by an S-TAP (or External S-TAP) and the Guardium sniffer.
- The transformed (such as pseudonymized, redacted, or anonymized) data is returned to the database client.
- How the Guardium default user works
- By default, when Guardium requests a decision, it passes IBM Knowledge Catalog the database user (dbuser), that is, the user that connects to the database. You can configure Guardium to send the application user (appuser) to IBM Knowledge Catalog instead.
- Guardium local rule enforcement
- Although the integration expects the data protection rules to be defined in IBM Knowledge Catalog, Guardium local rules are also enforced. For example, if a request is blocked in Guardium, that request is blocked regardless of the IBM Knowledge Catalog rules. Guardium does not even ask IBM Knowledge Catalog for a decision if local Guardium rules already block access.
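The decision flow in these architecture notes can be modeled in a few lines of Python. This is a minimal, hypothetical sketch, not Guardium or IBM Knowledge Catalog code: the names `Session`, `Verdict`, `build_resource_key`, and `decide`, the verdict values, the UDF name `mask_ssn`, and the `host:port/database/schema/table` resource key format are all assumptions for illustration. It shows three behaviors described in the notes: local Guardium rules are checked first and short-circuit the request, the user sent for evaluation is dbuser unless Guardium is configured to send appuser, and the verdict's transformation specification drives the query rewrite. It also shows why the port must be known, because no resource key can be built without it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Session:
    """Hypothetical view of what Guardium knows about an intercepted request."""
    db_user: str
    app_user: Optional[str]
    host: str
    port: Optional[int]  # None models a dynamic port that Guardium was not told about
    database: str
    schema: str
    table: str


@dataclass
class Verdict:
    """Hypothetical catalog decision: "allow", "block", or "transform"."""
    action: str
    transform_spec: Optional[dict] = None


def build_resource_key(s: Session) -> str:
    # Hypothetical key format; the integration defines the real one.
    if s.port is None:
        raise ValueError("unknown port: specify it so a resource key can be generated")
    return f"{s.host}:{s.port}/{s.database}/{s.schema}/{s.table}"


def decide(
    s: Session,
    sql: str,
    local_rules_block: Callable[[Session], bool],
    ask_catalog: Callable[[str, str], Verdict],
    rewrite: Callable[[str, dict], str],
    send_app_user: bool = False,
) -> Optional[str]:
    """Return the SQL to forward to the database server, or None if blocked."""
    # Local Guardium rules are evaluated first; a local block skips the catalog call.
    if local_rules_block(s):
        return None
    # dbuser is sent by default; appuser only when configured and available.
    user = s.app_user if (send_app_user and s.app_user) else s.db_user
    verdict = ask_catalog(user, build_resource_key(s))
    if verdict.action == "block":
        return None
    if verdict.action == "transform" and verdict.transform_spec:
        # Query rewrite applies the transformation specification.
        return rewrite(sql, verdict.transform_spec)
    return sql


# Usage with stub callbacks (all names and values are illustrative).
calls = []


def fake_catalog(user: str, key: str) -> Verdict:
    calls.append((user, key))
    return Verdict("transform", {"column": "ssn", "udf": "mask_ssn"})


def fake_rewrite(sql: str, spec: dict) -> str:
    return sql.replace(spec["column"], f"{spec['udf']}({spec['column']})")


session = Session("dbuser1", "alice", "db.example.com", 50000, "SAMPLE", "HR", "EMP")
print(decide(session, "SELECT ssn FROM HR.EMP", lambda s: False, fake_catalog, fake_rewrite))
```

The callbacks are injected so that the ordering (local rules, then user selection, then catalog decision, then rewrite) is visible on its own; in the real product, the S-TAP and sniffer handle interception and forwarding.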
Prerequisites and known issues
- Using the same user domain (prerequisite)
- The databases that are protected by Guardium and IBM Knowledge Catalog must be configured to use the same user domain. That is, the logged-in Guardium user must match the user that is defined in IBM Knowledge Catalog. For example, if the database protected by Guardium sees a user ABC, then that user ABC must have the same meaning in IBM Knowledge Catalog. Sharing the user domain is a key prerequisite for the integration.
- IBM Knowledge Catalog required user permissions
- For Guardium to connect to IBM Knowledge Catalog, a user with the manage data protection rules permission is required. Guardium recommends that you dedicate this IBM Knowledge Catalog user to the Guardium - IBM Knowledge Catalog integration. To create a dedicated user, take the following steps:
- From Cloud Pak for Data access control, create a custom role and assign the manage data protection rules permission to the role.
- From Cloud Pak for Data access control, create a dedicated user and assign the new role to that user.
- Use this user to configure Guardium to connect to IBM Knowledge Catalog.
- Specifying a port to connect to a server
- Some database clients, including MySQL, can use dynamic ports to connect to the database server. For the IBM Knowledge Catalog integration, Guardium needs to know the port so that it can generate the resource key to send to the IBM Knowledge Catalog server. Therefore, specify the port that the client uses to connect.
- IBM Knowledge Catalog asset owner
- To avoid double enforcement, the person who runs the Discovery process and the profiling in the IBM Knowledge Catalog project must be the same person who owns the IBM Knowledge Catalog assets.
- Case sensitivity and white space in tables
- In some databases, two tables can be named ABC and abc. If only one of these tables exists (either ABC or abc), IBM Knowledge Catalog can find the table and return a decision when Guardium requests one. However, if both tables exist in the catalog, IBM Knowledge Catalog returns only the decision for ABC, so Guardium can never enforce IBM Knowledge Catalog rules on abc. In addition, white space in object names can cause unpredictable results.
- Changing configuration parameters
- When you change the IBM Knowledge Catalog configuration, either from the wkc_configuration CLI or from the GUI, the existing instance of the IBM Knowledge Catalog PEP SDK connection is deleted. Guardium establishes a new connection to a new IBM Knowledge Catalog PEP SDK instance. However, before the new connection is established, the system cannot get a decision from the IBM Knowledge Catalog server. During that time, you might see errors from IBM Knowledge Catalog.
- Query rewrite: Rewritten SQL statement cannot exceed SQL query length limit
- When you use Guardium query rewrite, the rewritten SQL statement must be shorter than the SQL query length limit for your database type.
- Hive-specific prerequisite
- To use Hive with the IBM Knowledge Catalog integration, you must disable both Kerberos and TLS-encryption (SSL). For authentication, use a username and password validation method, such as LDAP.
- Upgrading IBM Knowledge Catalog assets from IBM Cloud Pak for Data 4.0.x or earlier
- If your IBM Knowledge Catalog contains assets that were cataloged earlier than Cloud Pak for Data 4.5.0 and are protected by Guardium through IBM Knowledge Catalog data protection rules, then you need to run the Cloud Pak for Data resource_key job script to regenerate the keys for the assets.
- Specifying a column alias
- You can specify a column alias with the following caveats:
- The Use shorten transformed function signature option is not available. This option will be available in a future release.
- You must configure the column alias in the Guardium UI rather than with the store wkc_configuration CLI.
- Maximum of 2 conditions for row-level filtering
- For row-level filtering, you can include up to two conditions in your IBM Knowledge Catalog query. For more information, see Setting up a transformation integration.
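Several of these known issues can be turned into checks to run before you rely on the integration. The following Python sketch is hypothetical: `preflight_issues`, `case_collisions`, and `MAX_ROW_FILTER_CONDITIONS` are illustrative names, not part of Guardium or IBM Knowledge Catalog. The query length limit is passed in as a parameter because it depends on the database type, and the length is measured in characters here, which is an assumption (your database might count bytes instead). The checks mirror the notes on rewritten query length, the two-condition cap for row-level filtering, table names that differ only by case, and white space in object names.

```python
from collections import defaultdict
from typing import Iterable, List

MAX_ROW_FILTER_CONDITIONS = 2  # from the row-level filtering note


def case_collisions(table_names: Iterable[str]) -> List[List[str]]:
    """Group names that differ only by letter case, such as ABC and abc."""
    groups = defaultdict(set)
    for name in table_names:
        groups[name.lower()].add(name)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def preflight_issues(
    rewritten_sql: str,
    max_query_length: int,
    row_filter_conditions: List[str],
    table_names: Iterable[str],
) -> List[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    issues = []
    # The rewritten statement must be shorter than the database's limit
    # (measured in characters here; some databases count bytes).
    if len(rewritten_sql) >= max_query_length:
        issues.append(
            f"rewritten SQL is {len(rewritten_sql)} characters; limit is {max_query_length}"
        )
    if len(row_filter_conditions) > MAX_ROW_FILTER_CONDITIONS:
        issues.append(
            f"{len(row_filter_conditions)} row-filter conditions; "
            f"at most {MAX_ROW_FILTER_CONDITIONS} are supported"
        )
    names = list(table_names)
    for group in case_collisions(names):
        issues.append(f"tables differ only by case: {', '.join(group)}")
    for name in names:
        if any(ch.isspace() for ch in name):
            issues.append(f"table name contains white space: {name!r}")
    return issues


for issue in preflight_issues(
    "SELECT mask(ssn) FROM EMP",
    100,
    ["dept = 'HR'", "region = 'EU'", "level > 3"],
    ["ABC", "abc", "MY TABLE"],
):
    print(issue)
```

The function only reports problems and never modifies the query, so it can run alongside whatever review process you already use for IBM Knowledge Catalog rules.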