Integrating with IBM Knowledge Catalog for federated data protection

You can enable federated data protection to integrate your IBM® Knowledge Catalog service with Guardium® Data Protection. The integration helps ensure that data protection rules that are defined in IBM Knowledge Catalog are enforced consistently at the data source level across in-scope data sources. To enable federated data protection, use Guardium query rewrite capabilities to create and run transformations (such as data masking) that are based on IBM Knowledge Catalog rules.

Before you begin

The integration between Guardium and IBM Knowledge Catalog is available for data sources that have data source-specific user-defined functions (UDFs) available. To use IBM Knowledge Catalog for federated data protection, you must install the UDF for your data source.

To integrate the Guardium query rewrite with IBM Knowledge Catalog transformation, you need:
  • IBM Cloud Pak® for Data 4.6 or later with IBM Knowledge Catalog service.
    Note: IBM Knowledge Catalog works with the Cloud Pak for Data Data Privacy service to provide masking and transformation. Before you can use this integration, you need to enable both the IBM Knowledge Catalog and the Data Privacy service. Be sure to check the Known issues for Data Privacy (Masking flow) in the IBM Docs for Cloud Pak for Data for any issues that you might need to know about.
  • A running IBM Knowledge Catalog - Guardium integration.
  • One or more users with privileges to run Cloud Pak for Data data protection rules. The users do not need to be admins.
  • Guardium Data Protection 11.5 or later.
  • A supported data source (one for which a set of user-defined functions is available), along with the user-defined functions for that data source. User-defined functions are precompiled into libraries that are suitable for each data source. For more information about supported data sources and UDFs, see Adding User-Defined Functions (UDFs) for IBM Knowledge Catalog - Guardium integration.

Architecture notes

How the IBM Knowledge Catalog - Guardium integration works (IBM Knowledge Catalog view)
The integration adds the policy enforcement point (PEP) from the IBM Knowledge Catalog XACML model. The PEP allows Guardium to use IBM Knowledge Catalog rules for enhanced data protection.
In IBM Knowledge Catalog, a PEP cache stores evaluation responses (decisions) that are received from IBM Knowledge Catalog data protection rules. Each decision is an instance of a computed outcome that is based on a combination of the current policy space and user context.
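The caching idea can be pictured with a minimal Python sketch. It is not the PEP SDK: the class name, the policy_version counter, and the key layout are assumptions made for illustration only. It shows that a cached decision belongs to one combination of policy state and user context, and that a policy change makes earlier decisions stale.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (policy version, user, resource key).
CacheKey = Tuple[int, str, str]


class DecisionCache:
    """Illustrative stand-in for a PEP decision cache (not the real SDK)."""

    def __init__(self, evaluate: Callable[[str, str], str]) -> None:
        self._evaluate = evaluate      # stand-in for a rule evaluation call
        self._entries: Dict[CacheKey, str] = {}
        self.policy_version = 0        # assumed marker for the policy space

    def policies_changed(self) -> None:
        # A new policy space makes earlier decisions stale.
        self.policy_version += 1
        self._entries.clear()

    def decide(self, user: str, resource_key: str) -> str:
        key = (self.policy_version, user, resource_key)
        if key not in self._entries:
            # Cache miss: evaluate once and remember the outcome.
            self._entries[key] = self._evaluate(user, resource_key)
        return self._entries[key]
```

In this sketch, repeated requests from the same user for the same resource reuse the stored decision until policies_changed() is called.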
How transformation integration works (Guardium view)
From a Guardium perspective, the transformation integration takes the following steps:
  1. A customer defines data protection rules in IBM Knowledge Catalog.
  2. Guardium sends session and request details to IBM Knowledge Catalog for evaluation in the form of a resource key.
  3. The verdict from IBM Knowledge Catalog is returned and can include a transformation specification that provides details about how to transform the query.
  4. Guardium uses its query rewrite capabilities to rewrite the query in accordance with the transformation specification.
  5. The altered query is forwarded to the database server by an S-TAP (or External S-TAP) and the Guardium sniffer.
  6. The transformed (such as pseudonymized, redacted, or anonymized) data is returned to the database client.
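The rewrite step in this flow can be illustrated with a small Python sketch. Everything in it is hypothetical: the Decision class is an assumed shape for a verdict, MASK_REDACT is a made-up UDF name, and the real transformation specification and Guardium rewrite logic are more involved. The sketch only shows how a per-column specification can turn a plain select list into a masked one.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Decision:
    """Hypothetical shape of a verdict; the real response format differs."""
    allowed: bool
    # Assumed transformation specification: column name -> masking UDF name.
    transformations: Dict[str, str] = field(default_factory=dict)


def rewrite_select(columns: List[str], table: str, decision: Decision) -> str:
    """Wrap each protected column in its masking function, keep the rest."""
    if not decision.allowed:
        raise PermissionError("request denied by data protection rules")
    select_list = []
    for column in columns:
        udf = decision.transformations.get(column)
        if udf is None:
            select_list.append(column)
        else:
            # Keep the original column name so the client sees the same schema.
            select_list.append(f"{udf}({column}) AS {column}")
    return f"SELECT {', '.join(select_list)} FROM {table}"


decision = Decision(allowed=True, transformations={"ssn": "MASK_REDACT"})
print(rewrite_select(["name", "ssn"], "customers", decision))
# SELECT name, MASK_REDACT(ssn) AS ssn FROM customers
```

The database client still issues its original query; only the statement that reaches the database server changes.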
How the Guardium default user works
By default, Guardium always passes the user dbuser to IBM Knowledge Catalog when it requests a decision, and dbuser connects to the database. You can configure Guardium to send appuser to IBM Knowledge Catalog instead.
In a 3-tier application environment, appuser is the application’s user. However, IBM Knowledge Catalog has no notion of either dbuser or appuser. IBM Knowledge Catalog expects the data protection rules to apply to whichever user receives the data.
Guardium local rule enforcement
While the integration expects that the data protection rules are defined in IBM Knowledge Catalog, Guardium local rules are also enforced. For example, if a request is blocked in Guardium, that request is blocked regardless of the IBM Knowledge Catalog rules. In fact, Guardium does not even ask IBM Knowledge Catalog for a decision if the local Guardium rules already block access.
However, if the local Guardium rules allow access, then Guardium asks IBM Knowledge Catalog for a decision. The decision that is returned by IBM Knowledge Catalog is then enforced by Guardium.
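The order of evaluation described here can be sketched in a few lines of Python. The function and parameter names are hypothetical and the two callables are stand-ins for the local rule check and the request to IBM Knowledge Catalog; this is an illustration of the ordering, not Guardium code.

```python
from typing import Callable


def enforce(request: str,
            local_rules_allow: Callable[[str], bool],
            ask_catalog: Callable[[str], str]) -> str:
    """Return the action to enforce for one request."""
    if not local_rules_allow(request):
        # Blocked locally: the catalog is never consulted.
        return "block"
    # Allowed locally: the catalog decision is what gets enforced.
    return ask_catalog(request)
```

Because the local check runs first, a local block always wins, and a catalog decision is requested only for requests that the local rules allow.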

Prerequisites and known issues

Before you configure the integration, make sure that both the Guardium Data Protection and IBM Knowledge Catalog integration points are set up correctly. You need to understand (and possibly address) the following issues:
Using the same user domain (prerequisite)
The databases that are protected by Guardium and IBM Knowledge Catalog must be configured to use the same user domain. That is, the logged-in Guardium user must match the user that is defined in IBM Knowledge Catalog.

For example, if the database protected by Guardium sees a user ABC, then that user ABC must have the same meaning in IBM Knowledge Catalog. Sharing the user domain is a key prerequisite for the integration.

IBM Knowledge Catalog required user permissions
For Guardium to connect to IBM Knowledge Catalog, a user with the manage data protection rules permission is required. Guardium suggests that this IBM Knowledge Catalog user is dedicated to the Guardium - IBM Knowledge Catalog integration. To create a dedicated user from IBM Knowledge Catalog, take the following steps:
  1. From Cloud Pak for Data access control, create a custom role. Assign manage data protection rules permission to the role.
  2. From Cloud Pak for Data access control, create a dedicated user and assign the new role to that user.
  3. Use this user to configure Guardium to connect to IBM Knowledge Catalog.
For more information, see Managing Cloud Pak for Data users in the IBM Cloud Pak for Data documentation.
Specifying a port to connect to a server
Some database clients, including MySQL, can use dynamic ports to connect to the database server. For the IBM Knowledge Catalog integration, Guardium needs to know the port so that Guardium can generate the resource key to send to the IBM Knowledge Catalog server.
To ensure that Guardium can always find the correct port, use the IBM Knowledge Catalog replace API to specify a port. For example, the following request sets the port to 194:
curl -X PATCH --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'Authorization: Bearer eyJhbT
...
AjiDAgutg' -d '[ {"op":"replace","path":"/metadata/resource_key","value":"0000:0000:0000:0000:0000:FFFF:092E:2222|194|GUARD_TEST:/guard_test/test_mysql"} ]' 'https://cpd-wkc.apps.guardium-wkc450.my.company.com:443/v2/assets/4b302249-7fe3-4e60-a34b-1ce09a364d10?catalog_id=2aa86b2f-7a1d-4c97-aa65-085ad312345&hide_deprecated_response_fields=false'
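If you generate this request from a script, the request body can be assembled programmatically. The following Python sketch builds the same JSON Patch body as the example above. The helper name is hypothetical, and the address|port|path layout of the resource key is an assumption read off that single example, not a documented format; verify the key of your own asset before you replace it.

```python
import json


def resource_key_patch(address: str, port: int, asset_path: str) -> str:
    """Build a JSON Patch body that replaces an asset's resource key."""
    # Assumed layout, inferred from the example request: address|port|path.
    resource_key = f"{address}|{port}|{asset_path}"
    patch = [{"op": "replace",
              "path": "/metadata/resource_key",
              "value": resource_key}]
    return json.dumps(patch)


body = resource_key_patch(
    "0000:0000:0000:0000:0000:FFFF:092E:2222",
    194,
    "GUARD_TEST:/guard_test/test_mysql",
)
print(body)
```

The resulting string can be passed as the -d argument of the curl command, with the asset ID, catalog ID, and bearer token for your environment.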
IBM Knowledge Catalog asset owner
To avoid double enforcement, the person who runs the Discovery process and the profiling in the IBM Knowledge Catalog project must be the same person who owns the IBM Knowledge Catalog assets.
Case sensitivity and white space in tables
In some databases, two tables can be named ABC and abc. If only one of these tables exists (either ABC or abc), when Guardium requests a decision from IBM Knowledge Catalog, IBM Knowledge Catalog can find the table and return the decision. However, if both tables exist in the catalog, IBM Knowledge Catalog can return only the decision for ABC. In this case, Guardium can never enforce IBM Knowledge Catalog rules on abc. In addition, white space in object names can cause unpredictable results.
Guardium suggests that for databases involved in the IBM Knowledge Catalog integration, you do not use case-sensitive table names or white space in object names.
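Before you bring a database into the integration, you can scan its table names for both problems. The following Python sketch is one possible check; the function name is hypothetical, and the list of names is assumed to come from your own catalog or database metadata query.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def find_risky_names(names: List[str]) -> Tuple[List[List[str]], List[str]]:
    """Return (groups that differ only by case, names containing white space)."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        # Names that fold to the same string differ only by letter case.
        groups[name.casefold()].add(name)
    case_collisions = [sorted(g) for g in groups.values() if len(g) > 1]
    with_space = [n for n in names if any(ch.isspace() for ch in n)]
    return case_collisions, with_space


print(find_risky_names(["ABC", "abc", "orders", "order items"]))
# ([['ABC', 'abc']], ['order items'])
```

Any name that appears in either list is a candidate for renaming before rules are enforced on it.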
Changing configuration parameters
When you change the IBM Knowledge Catalog configuration, either from the wkc_configuration CLI or from the GUI, the existing instance of the IBM Knowledge Catalog PEP SDK connection is deleted. Guardium establishes a new connection to a new IBM Knowledge Catalog PEP SDK instance. However, before the new connection is established, the system cannot get a decision from the IBM Knowledge Catalog server. During that time, you might see errors from IBM Knowledge Catalog.
Query rewrite: Rewritten SQL statement cannot exceed SQL query length limit
When using Guardium query rewrite, the length of the rewritten SQL statement must be shorter than the SQL query length limit for the database type.
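A rewritten statement is usually longer than the original, because each protected column is wrapped in a function call. The following Python sketch shows a simple pre-check. The function name is hypothetical, MASK_REDACT is a made-up UDF name, and the limit must come from the documentation of your database; the value 20 is a toy number, not a real limit.

```python
def fits_length_limit(rewritten_sql: str, limit: int) -> bool:
    """True if the rewritten statement is shorter than the given limit."""
    # Assumption: the limit is counted in UTF-8 bytes. Some databases
    # count characters instead, so adjust the measure for your database.
    return len(rewritten_sql.encode("utf-8")) < limit


original = "SELECT ssn FROM t"
rewritten = "SELECT MASK_REDACT(ssn) AS ssn FROM t"
print(fits_length_limit(original, 20), fits_length_limit(rewritten, 20))
# True False
```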
Hive-specific prerequisite
To use Hive with the IBM Knowledge Catalog integration, you must disable both Kerberos and TLS encryption (SSL). For authentication, use a username and password validation method, such as LDAP.
Upgrading IBM Knowledge Catalog assets from IBM Cloud Pak for Data 4.0.x or earlier
If your IBM Knowledge Catalog contains assets that were cataloged earlier than Cloud Pak for Data 4.5.0 and are protected by Guardium through IBM Knowledge Catalog data protection rules, then you need to run the Cloud Pak for Data resource_key job script to regenerate the keys for the assets.
Information about installing and running the resource_key job is available from the Cloud Pak for Data 4.5.1 release as an IBM Knowledge Catalog hotfix. For more information, see the Cloud Pak for Data 4.5.1 release notes.
Specifying a column alias
You can specify a column alias with the following caveats:
  • The Use shorten transformed function signature option is not available. This feature will be available in a future release.
  • You must configure the column alias using the Guardium UI, rather than using the store wkc_configuration CLI.
For more information, see Column alias parameter.
Maximum of 2 conditions for row-level filtering
For row-level filtering, you can include up to two conditions in your IBM Knowledge Catalog query. For more information, see Setting up a transformation integration.