Known issues (Watson Knowledge Catalog)

These known issues apply to Watson Knowledge Catalog.

General issues

You might encounter these known issues and restrictions when you work with the Watson Knowledge Catalog service.

Categories might not show the latest updates

Categories and their contents are processed in different areas of Watson Knowledge Catalog. As a result, the contents of the category that you are currently viewing might not show the latest updates.

To ensure that you are viewing the latest updates, manually refresh the view.

Snowflake connection problems

When you use the Snowflake connection type, you might encounter the following problems:

  • Snowflake connections are available in projects, but not in global connections. You have to upload the Snowflake JDBC driver to create connections in the global connections area.
  • Snowflake connections might fail for some tables when discovering assets.
  • The synchronization process between information assets and default catalog assets does not support Snowflake connections for data preview and profiling. For details refer to the Information assets view.

Watson Knowledge Catalog options or buttons are still available in the navigation menu or panels after uninstall

You uninstalled Watson Knowledge Catalog from Cloud Pak for Data, but the navigation menu or panels still contain Watson Knowledge Catalog options or buttons. If you select these options or buttons, error message 502 or 404 is returned. For more information, see Uninstalling Watson Knowledge Catalog.

Catalog issues

You might encounter these known issues and restrictions when you use catalogs.

Missing previews

You might not see previews of assets in these circumstances:

  • In a catalog or project, you might not see previews or profiles of connected data assets that are associated with connections that require personal credentials. You are prompted to enter your personal credentials to start the preview or profiling of the connection asset.
  • In a catalog, you might not see previews of JSON, text, or image files that were published from a project.
  • In a catalog, the previews of JSON and text files that are accessed through a connection might not be formatted correctly.
  • In a project, you cannot view the preview of image files that are accessed through a connection.

Add collaborators with lowercase email addresses

When you add collaborators to the catalog, enter email addresses with all lowercase letters. Mixed-case email addresses are not supported.

Multiple concurrent connection operations might fail

You might encounter an error when multiple users run connection operations concurrently. The error message can vary.

After upgrade you can’t add or test connections during metadata import or discovery

You have just upgraded to Watson Knowledge Catalog. When you try to add new connections or test existing ones during metadata import or discovery, the operation might hang while it waits for a connection.

Workaround: Restart the agent in the is-en-conductor-0 pod. If the agent still does not process any requests, delete the conductor pod. This creates a new instance of the pod, and you can add or test connections again.
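
A minimal sketch of the pod restart, assuming you are logged in to the OpenShift project (namespace) where Watson Knowledge Catalog is installed and that the pod names match those referenced in this section:

# Check the current state of the conductor pod
oc get pod is-en-conductor-0

# Deleting the pod causes it to be re-created
oc delete pod is-en-conductor-0

# Wait until the re-created pod reports Running before you retry the connection
oc get pod is-en-conductor-0 -w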

Can’t enable enforcing data protection rules after catalog creation

You cannot enable the enforcement of data protection rules after you create a catalog. To apply data protection rules to the assets in a catalog, you must enable enforcement during catalog creation.

The first redacted column becomes the last column

When you view a data asset in a catalog, the first redacted column in the original data asset is shifted to be the last column in the transformed data asset, regardless of its previous position.

Delay for preview with multiple masked columns

When you view a data asset that has a masked column, the preview of the data takes a few moments to appear because of the masking process. If a data asset has multiple masked columns, the delay is correspondingly longer.

Assets are blocked if evaluation fails

The following restrictions apply to data assets in a catalog with policies enforced: File-based data assets that have a header can’t have duplicate column names, or a period (.) or single quotation mark (‘) in a column name.

If evaluation fails, the asset is blocked to all users except the asset owner. All other users see an error message that the data asset cannot be viewed because evaluation failed and the asset is blocked.

Asset owner might be displayed as “Unavailable”

When data assets are created in advanced data curation tools, the owner name for those assets in the default catalog is displayed as “Unavailable”.

InfoSphere Information Governance Catalog home page displayed

Sometimes, instead of a feature page, the InfoSphere Information Governance Catalog home page is displayed. For example, it might happen when you want to open a data lineage viewer for an information asset. Reload the browser to get to the feature page.

Synchronizing information assets

The following types of information assets are currently synchronized:

  • Tables and their associated columns
  • Files and their associated columns
  • Connections

Teradata connections and connected data assets are synchronized from the Information assets view to the default catalog.

You can’t edit information assets in the default catalog. To edit information assets, choose Organize > Information assets from the navigation menu.

SSL connection is not automatically set for synchronized assets

You create an SSL connection in the global connections area and then, for example, run automated discovery and publish assets. If you then go to the default catalog and open the connection properties of the asset, the port is not configured to accept SSL connections.

Workaround: Open the Edit connection page for the synced asset in the default catalog and select the check box to indicate that the port is configured to accept SSL connections.

Don’t delete the entire database from an information asset

The default catalog is not updated if the entire database is deleted from the information asset.

Workaround: To synchronize the database deletion in the information asset with the default catalog, do not delete the entire database. Instead, manually delete each table that is contained in the database that you want to delete.

Note: If you try to delete the database before you apply the workaround, the workaround can’t be used anymore.

You can’t remove information assets from the default catalog or a project

You can’t remove information assets from data quality projects or the default catalog. These assets are still available in the default catalog.

Workaround: To remove an information asset from the default catalog or from data quality projects, you first have to remove it from the Information assets view. The synchronization process propagates the deletion from the Information assets view to the default catalog. However, you can remove assets from the default catalog or projects if they are not synchronized.

Log-in prompt is displayed in Organize section

When you’re working in the Organize section, a log-in prompt might be displayed, even though you’re active.

Workaround: Provide the same credentials that you used to log in to Cloud Pak for Data.

Missing default catalog and predefined data classes

The automatic creation of the default catalog after installation of the Watson Knowledge Catalog service can fail. If it does, the predefined data classes are not automatically loaded and published as governance artifacts.

Workaround: Ask someone with the Administrator role to follow the instructions for creating the default catalog manually.

Term assignment is not synced to the information asset

When you add an asset from a connection to the default catalog and assign a business term to this asset in the process, this term is not synced to the information assets view.

Workaround: Add the asset without term assignment and assign the term after the asset is initially synced.

Quick scan jobs remain in status QUEUED for a long time after restart of the quick scan pod

When you run large automated discovery jobs, or if large jobs have run recently, and you then restart the quick scan pod, quick scan jobs might remain in status QUEUED for a long time before they are processed because of the number of messages that must be skipped during pod startup.

To reduce the amount of time until quick scan jobs are started, complete the following steps. An alternative one-command sketch follows the list.

  1. Pause the quick scan job(s) in status QUEUED.
  2. Edit the deployment of the quick scan pod:
    oc edit deploy odf-fast-analyzer
    
  3. Locate the lines that contain:
    name: ODF_PROPERTIES
    value: -Dcom.ibm.iis.odf.kafka.skipmessages.older.than.secs=43200
    
  4. Replace 43200 with a smaller value such as 3600 to limit the number of messages that need to be skipped.
  5. Saving the update triggers pod recreation. Wait until the quick scan pod is in status RUNNING.
  6. Resume the quick scan jobs that you paused in step 1. Their status changes to RUNNING within a short period of time.
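
Alternatively, if the ODF_PROPERTIES variable contains only this one property, you might set it in a single step instead of editing the deployment interactively. This is a sketch, not a verified procedure; check the current value of ODF_PROPERTIES first so that you do not drop other options, and adjust the value to your needs:

oc set env deployment/odf-fast-analyzer ODF_PROPERTIES="-Dcom.ibm.iis.odf.kafka.skipmessages.older.than.secs=3600"

Changing the environment variable also triggers pod recreation, so wait until the quick scan pod is in status RUNNING before you resume the paused jobs.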

An error occurs when importing metadata

When you import metadata, you might encounter an error message. Wait a moment, then try to import the metadata again.

Make sure that you have the required permissions. For details, see Metadata import.

Governance artifacts

You might encounter these known issues and restrictions when you use governance artifacts.

Business term relationships to reference data values are not shown

You can add a related business term to a value in a reference data artifact. However, the related content for a business term does not show the related reference data value.

Limitations to automatic processing in data protection rules

In the following cases, data protection rules do not automatically process artifacts that are closely related to the artifacts that are explicitly specified:

  • Synonyms to business terms in conditions are not automatically processed. Only terms that are explicitly specified in conditions are considered.
  • Dependent data classes of data classes in conditions or actions are not automatically considered. For example, if you specify the data class “Drivers License” in a condition, the dependent data classes, such as New York State Driver’s License, are not processed by the rule.

Can’t reactivate inactive artifacts

When an effective end date for a governance artifact passes, the artifact becomes inactive. You can’t reset an effective end date that’s in the past. You can’t reactivate an inactive artifact. Instead, you must re-create the artifact.

Can’t export top-level categories

You can’t export top-level categories. You can export only lower-level categories.

Data classes have multiple primary categories displayed

When you assign a subcategory to a data class to be the primary category, all higher-level categories of the selected subcategory are also displayed in the details of this data class as primary categories. However, they are not assigned.

Can’t delete categories

You can’t delete a category unless it never contained governance artifacts.

Workaround: Rename the category and then delete it.

Re-import might fail to publish when previous import request has not yet finished

Re-import might fail to publish governance artifacts such as business terms when called immediately after a previous import request.

Importing and publishing a large number of governance artifacts is done in the background and might take some time. If you re-publish artifacts when the initial publishing process of the same artifacts hasn’t finished yet, the second publishing request fails and the status of the governance artifact drafts shows Publish failed.

Make sure that the publishing process is finished before you try to import and publish the same artifacts again.

After importing and publishing a large number of data classes, the Profile page is not updated and refreshed

If you create and publish a data class, the Profile page of an asset is updated and refreshed. However, if you import and publish a large number of data classes (for example, more than 50 data classes) using a CSV file, the Profile page is not updated and refreshed for these imported data classes.

Workaround: If you have to import and publish a large number of data classes and you notice that the Profile page is not updated and refreshed, wait a moment, then edit just one data class, for example by adding a blank to it, and publish it. As a result, the Profile page is updated and refreshed to show all data classes that you published, including the large number of imported data classes.

Extracting terms and rules does not work in managed OpenShift environments

On a managed OpenShift environment on IBM® Cloud, you cannot extract business terms and governance rules from PDF files because you cannot upload the files to extract from.

You can’t publish data classes with the same name in different categories

The names of data classes must be unique. Don’t create data classes with the same name in different categories.

Note: Use globally unique names for data classes if you want to process data quality or data discovery assets.

Analytics projects

You might encounter these known issues and restrictions when you use analytics projects.

Data is not masked in some analytics project tools

When you add a connected data asset that has masked columns from a catalog to a project, the columns remain masked when you view the data and when you refine the data in the Data Refinery tool. However, other tools in projects do not preserve masking when accessing data through a connection. For example, when you load connected data in a notebook, you access the data through a direct connection and bypass masking.

Workaround: To retain masking of connected data, create a new asset with Data Refinery:

  1. Open the connected data asset and click Refine. Data Refinery automatically includes the data masking steps in the Data Refinery flow that transforms the full data set into the new target asset.
  2. If necessary, adjust the target name and location.
  3. Click the Run button, and then click Save and Run. The new connected data asset is ready to use.
  4. Remove the original connected data asset from your project.

Discovery fails for connections with personal credentials

You can’t discover data assets from a connection in an analytics project or in a catalog if that connection requires personal credentials.

Data in the relationships analysis table is not filtered as expected

When you select a link in the relationships analysis chart, the data in the relationship analysis table is not filtered based on this selection.

Analysis on an asset added from catalog can fail if the asset wasn’t properly synced

Column analysis or data quality analysis on a database table that you added to the project manually can fail if the data asset wasn’t properly synced from the default catalog to the Information assets view. The data quality score is 0%, the analysis status is set to invalid, and the database column type is not properly set.

Workaround: Run automated discovery with the appropriate analysis configuration on the respective database table:

  1. Go to Organize > Curation > Data discovery > New discovery job > Automated discovery.
  2. Select the synced connection.
  3. In the Discovery root field, select the database table asset for which analysis failed.
  4. Select to run column analysis and data quality analysis on the asset.
  5. Select to publish the results to the catalog.
  6. Select the project to which you want to add the data.
  7. Start discovery.

Governance workflows

You might encounter these known issues and restrictions when you use governance workflows.

Adding artifact types to an inactive default workflow

If the default workflow is inactive, you can’t move artifact types to it by deleting another workflow. Instead, you must deactivate the other workflow and then manually activate the default workflow.

To move artifact types to the default workflow:

  1. Click Organize > Data and AI governance > Management > Governance workflows.
  2. Open an active workflow by clicking its name.
  3. Click Deactivate and then confirm by clicking Deactivate. The artifact types for the workflow are moved to the default workflow automatically.
  4. Open the “default workflow configuration” workflow.
  5. Click Activate.

Limitation for draft workflows on Firefox

You can’t select any artifact types when you view workflow drafts in the Firefox web browser version 60. Use a different browser.

Incorrect notifications for completed workflow tasks

If a workflow task has multiple assignees and one person completes the task, the other assignees see an incorrect notification. The notification states that the task couldn’t be loaded instead of stating that the task was already completed by another assignee.

Task details are displayed even after the task is completed

When you complete a workflow task, its details are still displayed. The issue occurs when there are fewer than 10 tasks in the list.

Workaround: Select the task from the list to refresh the details.

Workflow details unavailable after upgrade

After an upgrade from Cloud Pak for Data 2.5 to 3.0, the details of workflow configuration show “Unavailable” for the user who created or modified the workflow.

If you enable notifications in your workflow configurations, you must also add at least one collaborator

When configuring a workflow, you can select tasks and enable notifications. If you enable notifications for a task, you also have to add at least one collaborator, who can be the same as one of the assignees. Otherwise, the check box of the task you selected is cleared with the next refresh.

To enable notifications:

  1. Add the assignees in the Details section.
  2. Scroll down to the Notifications section. Then select the required action and add at least one collaborator or assignee to be notified.

Data curation

You might encounter these known issues and restrictions when you use advanced data curation tools.

Incorrect connections associated with connected data assets after automated discovery

When you add connected data assets through automated discovery, the associated connection assets might be incorrect. Connections that have the same database and host names are indistinguishable to automated discovery, despite different credentials and table names. For example, many Db2 databases on IBM Cloud have the same database and host names. An incorrect connection with different credentials might be assigned and then the data asset can’t be previewed or accessed.

Data discovery fails when started by a Data Steward

Users with the Data Steward role can start a data discovery, even though they don’t have sufficient permissions to run the discovery. As a result, the discovery fails. You must have the Data Quality Analyst role to run discovery.

Data Stewards can’t create automation rules

Users with the Data Steward role can start creating an automation rule, even though they don’t have sufficient permissions to manage automation rules. As a result, the automation rule is not saved and an error is displayed. You must have the Data Quality Analyst role to create automation rules.

Discovery on a Teradata database fails

When you run a data discovery on a Teradata database by using the JDBC connector and the CHARSET is set to UTF8, the analysis fails with an error.

Example error content: The connector detected character data truncation for the link column C3. The length of the value is 12 and the length of the column is 6.

Workaround: When a database has Unicode characters in the schemas or tables, set the CHARSET attribute to UTF16 when you create a data connection.
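
As an illustration only, a Teradata JDBC URL that sets the character set might look like the following sketch. The host and database names are placeholders, and the exact connection properties depend on how you define the connection in your environment:

jdbc:teradata://<host>/DATABASE=<database>,CHARSET=UTF16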

Changes to global connections aren’t propagated for discovery

After you add a global connection to the data discovery area, any subsequent edit to or deletion of the global connection is not propagated to the connection information in the data discovery area and is not effective.

Workaround: Delete the discovery connection manually. You must have the Access advanced governance capabilities permission to be able to complete the required steps:

  1. Go to Organize > Curation > Metadata import
  2. Go to the Repository Management tab.
  3. In the Navigation pane, select Browse assets > Data connections.
  4. Select the connection that you want to remove and click Delete.

Re-add updated global connections to the data discovery area as appropriate.

Approving tables in quick scan results fails

When a table name contains a special character, its results cannot be loaded to a project. When you click Approve assets, an error occurs.

Also, when you select more than one table to approve, and one of them fails to be loaded, the rest of the tables fail as well. The only way to approve the assets is to rediscover the quick scan job.

Virtual tables are not supported for BigQuery connections

You cannot create SQL virtual tables for data assets added from Google BigQuery connections.

Column analysis fails if system resources or the Java heap size are not sufficient

Column analysis might fail due to insufficient system resources or insufficient Java heap size. In this case, modify your workload management system policies as follows:

  1. Open the Information Server operations console by entering its URL in your browser: https://<server>/ibm/iis/ds/console/

  2. Go to Workload Management > System Policies. Check the following settings and adjust them if necessary:

    Job Count setting: If the Java Heap size is not sufficient, reduce the number to 5. The default setting is 20.

    Job Start setting: Reduce the maximum number of jobs that can start within the specified timeframe from 100 in 10 seconds (which is the default) to 1 in 5 seconds.

Quick scan reports successful completion but returns no results for some SSL connections

When you add a global SSL connection as a discovery connection, the path to the trust store is internally changed so that the certificates cannot be found. As a result, quick scan cannot access the certificates and fails. However, the job reports successful completion, but no results are returned.

Workaround: A user with the appropriate permissions must add the Engine volume mount to the odf-fast-analyzer pod by running the following command:

oc patch deployment odf-fast-analyzer --patch '{"spec": {"template": {"spec": {"volumes": [{"name": "engine-dedicated-volume", "persistentVolumeClaim": {"claimName": "0072-iis-en-dedicated-pvc"}}], "containers": [{"name": "odf-fast-analyzer","volumeMounts": [{"name": "engine-dedicated-volume","mountPath": "/mnt/dedicated_vol/Engine"}]}]}}}}'

In addition, make sure that any custom trust store that you want to use is located in the /user-home/_global_/security/customer-truststores folder so that quick scan can access the certificates. Otherwise, the same error occurs even after you apply the workaround.
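
To verify that the patch was applied, you can check whether the Engine volume mount now appears in the deployment. This is a sketch that assumes the deployment name used above:

oc get deployment odf-fast-analyzer -o jsonpath='{.spec.template.spec.containers[0].volumeMounts}'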

Quick scan hangs when analyzing a Hive table that has been defined incorrectly

When analyzing a schema that contains an incorrectly defined Hive table, quick scan starts looping when trying to access the table. Make sure that the table definition for all Hive tables is correct.

Not all tables approved in quick scan results are published

When you approve more than 2,000 tables, not all of them are published to the catalog.

Workaround: Check how many tables weren’t published to the catalog. Depending on the number, complete one of these steps:

  • For a single table, look for the table in the information assets view and edit its short or long description. When you save your changes, the asset is synced to the catalog.
  • For more than one table, the data must be manually synchronized. An administrator of the project (namespace) where the Watson Knowledge Catalog service is installed must run the istool graph batchload command with the -g db option or the -t DataCollection option (preferred). The command must be run on the conductor pod (see the sketch after this list). For the exact command syntax, see Synchronizing assets manually in the IBM InfoSphere Information Server documentation.
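
A minimal sketch of how an administrator might open a shell on the conductor pod to run the command, assuming the pod name used elsewhere in this section; the istool graph batchload syntax itself is documented in the linked IBM InfoSphere Information Server topic and is not repeated here:

oc exec -it is-en-conductor-0 -- bash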

Automated discovery might fail when the data source contains a large amount of data

When the data source contains a very large amount of data, automated discovery can fail. The error message indicates that the buffer file systems ran out of file space.

Workaround: To have the automated discovery complete successfully, use one of these workarounds:

  • Use data sampling to reduce the number of records being analyzed. For example, set the sample size to 10% of the total number of records.
  • Have an administrator increase the amount of scratch space for the engine that runs the analysis process. The administrator must use the Red Hat OpenShift cluster tools to increase the size of the volume where the scratch space is located, typically /mnt/dedicated_vol/Engine in the is-en-conductor pod. Depending on the storage class that is used, the scratch space might be on a different volume.

    The size requirements for scratch space depend on the workload. As a rule of thumb, make sure to have enough scratch space to fit the largest data set that will be processed. Then, multiply this amount by the number of similar analyses that you want to run concurrently. For more information about expanding volumes, see the instructions in the OpenShift Container Platform documentation and the sketch after this list.
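
A hedged sketch of expanding the engine volume with oc, assuming the persistent volume claim is the one named in the quick scan workaround earlier in this section (0072-iis-en-dedicated-pvc), that your storage class allows volume expansion, and that 200Gi is only an example size:

# Identify the claim that backs the engine scratch space
oc get pvc | grep dedicated

# Request a larger size; the storage class must support expansion
oc patch pvc 0072-iis-en-dedicated-pvc -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'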

Discovery jobs fail due to an issue with connecting to the Kafka service

Automated discovery and quick scan jobs fail if no connection to the Kafka service can be established. The iis-services and odf-fast-analyzer deployment logs show error messages similar to the following:

org.apache.kafka.common.KafkaException: Failed create new KafkaAdminClient
at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:338)
at org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:52)
at com.ibm.iis.odf.core.messaging.kafka.KafkaQueueConsumer.createTopicIfNotExistsNew(KafkaQueueConsumer.java:184)
at com.ibm.iis.odf.core.messaging.kafka.KafkaQueueConsumer.createTopicIfNotExists(KafkaQueueConsumer.java:248)
at com.ibm.iis.odf.core.messaging.kafka.KafkaQueueConsumer.startConsumption(KafkaQueueConsumer.java:327)
at com.ibm.iis.odf.core.messaging.kafka.KafkaQueueConsumer.run(KafkaQueueConsumer.java:260)
at java.lang.Thread.run(Thread.java:811)

To resolve the issue, an administrator should restart Kafka manually by running the following command:

oc delete pod kafka-0
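
Deleting the pod causes it to be re-created. A quick way to confirm that Kafka came back is to watch the pod status until it reports Running again; this assumes the pod name shown above:

oc get pod kafka-0 -w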

Settings for discovery or analysis might be lost after a pod restart or upgrade

After a pod restart or an upgrade, settings might be lost or reverted to their default values, for example, RHEL-level properties on the pod such as the nproc limit, or MaximumHeapSize in ASBNode/conf/proxy.xml. For more information about these settings, see Analysis or discovery jobs fail with an out-of-memory error.

Workaround: Check your settings before you start upgrading. Most settings are retained, but some might be reverted to their default values. Check /etc/security/limits.conf on every compute node in the cluster and add or edit the required settings as follows (a spot-check sketch follows the list):

  • The parameters from is-en-conductor-0 pod:

    /opt/IBM/InformationServer/Server/DSEngine/bin/dsadmin -listenv ANALYZERPROJECT | grep DEFAULT_TRANSPORT_BLOCK
    APT_DEFAULT_TRANSPORT_BLOCK_SIZE=3073896
    
    com.ibm.iis.odf.datastage.max.concurrent.requests=4 contained in odf.properties
    
  • The parameters from iis-services pod:

    /opt/IBM/InformationServer/ASBServer/bin/iisAdmin.sh -d -k com.ibm.iis.ia.max.columns.inDQAOutputTable
    com.ibm.iis.ia.max.columns.inDQAOutputTable=500
    
    /opt/IBM/InformationServer/ASBServer/bin/iisAdmin.sh -d -k com.ibm.iis.ia.server.jobs.postprocessing.timeout
    com.ibm.iis.ia.server.jobs.postprocessing.timeout=84600000
    
    /opt/IBM/InformationServer/ASBServer/bin/iisAdmin.sh -d -k com.ibm.iis.events.kafkaEventConsumer.timeout
    com.ibm.iis.events.kafkaEventConsumer.timeout=10000
    
    /opt/IBM/InformationServer/ASBServer/bin/iisAdmin.sh -d -key com.ibm.iis.ia.jdbc.connector.heapSize
    com.ibm.iis.ia.jdbc.connector.heapSize=2048
    
    /opt/IBM/InformationServer/ASBServer/bin/iisAdmin.sh -d -key com.ibm.iis.ia.engine.javaStage.heapSize
    com.ibm.iis.ia.engine.javaStage.heapSize=1024
    
    /opt/IBM/InformationServer/ASBServer/bin/iisAdmin.sh -d -k com.ibm.iis.ia.server.useSingleFDTable
    com.ibm.iis.ia.server.useSingleFDTable=true
    
  • The limits (in limits.conf) are defined here:

    root soft nofile 65000
    root hard nofile 500000
    * soft nproc 65000
    * soft nofile 65000
    * hard nofile 500000
    dsadm soft nproc 65000
    dsadm soft nofile 65000
    hdfs soft nproc 65000
    root soft nproc 65000
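
After a pod restart or an upgrade, you can spot-check some of these values from outside the pods with oc exec. This is a sketch only; the proxy.xml path is assumed to be under /opt/IBM/InformationServer on the conductor pod, which might differ in your installation:

# Effective nproc and nofile limits inside the conductor pod
oc exec is-en-conductor-0 -- bash -c "ulimit -u; ulimit -n"

# MaximumHeapSize setting, assuming this location for proxy.xml
oc exec is-en-conductor-0 -- grep MaximumHeapSize /opt/IBM/InformationServer/ASBNode/conf/proxy.xml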
    

Discovery jobs are not resumed on restore

When you restore the system from a snapshot, discovery jobs that were in progress at the time the snapshot was taken are not automatically resumed. You must explicitly stop and restart them.

Global search

You might encounter these known issues and restrictions when you use global search.

Governance artifacts and information assets don’t have path details displayed in global search results.