Preparing to install IBM Knowledge Catalog

Plan and prepare to install IBM Knowledge Catalog.

Before you begin

Complete these tasks before you install one of the IBM Knowledge Catalog editions.

Determining the optional features to enable

For each of the IBM Knowledge Catalog editions, you can enable several optional features during installation, upgrade, or at any time later.

The default installation settings are as follows:
IBM Knowledge Catalog
enableDataQuality: false 
enableKnowledgeGraph: false
useFDB: false
IBM Knowledge Catalog Premium
enableDataQuality: false
enableKnowledgeGraph: false
enableSemanticAutomation: false
enableSemanticEmbedding: false
enableSemanticEnrichment: true
enableAISearch: false
enableTextToSql: false
enableModelsOn: cpu
useFDB: false
IBM Knowledge Catalog Standard
enableKnowledgeGraph: false
enableSemanticAutomation: false
enableSemanticEnrichment: true
enableAISearch: false
enableModelsOn: cpu
useFDB: false
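These defaults can be overridden when you install or upgrade the service. As an illustrative sketch only (the `custom_spec` layout and the `wkc` component key follow the IBM Software Hub install-options convention; verify the exact keys against the installation documentation for your release), an install options file that enables data quality and the knowledge graph might look like this:

```yaml
# Hypothetical install-options fragment (verify keys for your release).
# Overrides for the IBM Knowledge Catalog (wkc) component:
custom_spec:
  wkc:
    enableDataQuality: true      # turn on the data quality features
    enableKnowledgeGraph: true   # needed for lineage and relationship explorer
```

Settings that you do not list keep the defaults shown above.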
Use the following table to see which optional features you can enable and which optional component each feature requires.
Table 1. Optional features and the components needed to enable them
For each feature, the table lists its availability, the component that is needed, and the corresponding entry in the custom resource.
Feature: Lineage
Availability:
  • IBM Knowledge Catalog
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard
You can install one of these lineage services:
  • IBM® Manta Data Lineage, which is the default lineage service and uses the default Neo4j graph database.
  • MANTA Automated Data Lineage, which requires the FoundationDB graph database. Therefore, you must also set useFDB: true.
Component needed: Knowledge graph
Custom resource entry: enableKnowledgeGraph: true
Feature: Relationship explorer
Availability:
  • IBM Knowledge Catalog
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard
Component needed: Knowledge graph
Custom resource entry: enableKnowledgeGraph: true
Feature: Set FoundationDB as the graph database that stores the data generated by the knowledge graph
Availability:
  • IBM Knowledge Catalog
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard

By default, the Neo4j graph database is installed during installation or upgrade. The Neo4j graph database is required if you want to use IBM Manta Data Lineage as your lineage service.

useFDB: true is required only for new installations with MANTA Automated Data Lineage.

Component needed: FoundationDB graph database
Custom resource entry: useFDB: true
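For a new installation that uses MANTA Automated Data Lineage, the knowledge graph and FoundationDB settings go together. A hedged sketch, assuming the same `custom_spec.wkc` install-options convention as for the other settings (confirm the surrounding keys for your release):

```yaml
# Hypothetical fragment: MANTA Automated Data Lineage on a new installation.
custom_spec:
  wkc:
    enableKnowledgeGraph: true   # knowledge graph component
    useFDB: true                 # store graph data in FoundationDB instead of Neo4j
```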
Feature: Business-term relationship search
Availability:
  • IBM Knowledge Catalog
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard

Business-term relationship search is available with both types of graph database.

Component needed: Knowledge graph
Custom resource entry: enableKnowledgeGraph: true
Feature: Data quality features
Availability:
  • IBM Knowledge Catalog
  • IBM Knowledge Catalog Premium
Component needed: Data quality
Custom resource entry: enableDataQuality: true
Feature: Gen AI based capabilities in metadata enrichment
Availability:
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard

You must set both custom resource parameters.

If you enable this feature, you can work with the default enableModelsOn setting or change it. With enableModelsOn: gpu or enableModelsOn: remote, you can also work with a custom model instead of the default model:

customModelSemanticEnrichment
Specify a custom model to use with the gen AI based enrichment features. The model must be available in watsonx_ai_ifm.

If you use a custom model for gen AI based enrichment, make sure that you have the number of GPUs that is required for the selected model.

Important: For custom models, the accuracy of results might vary.

See Deployment mode.

Component needed: Gen AI based enrichment
Custom resource entries:
enableSemanticAutomation: true
enableSemanticEnrichment: true

Feature: Deployment mode for the models that are used for the gen AI capabilities

Availability:
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard

This option determines where the foundation models that are used with the gen AI capabilities in metadata enrichment run.

cpu
This is the default setting.

A Granite model (granite3-moe:3b-instruct-q4_K_M) is started on the internal containers with CPUs. This model is only used for expanding metadata and term assignment in metadata enrichment.

Enrichment with the model on CPU can be slower depending on the size of the data asset list.

To improve the performance of models on CPU, a cluster administrator can assign a dedicated cluster node (worker) to the pods on which models with runtime are deployed. For more information, see Setting up a dedicated node for models running on CPU.

Important: If you are upgrading and want to continue to run the models on GPU, you must set enableModelsOn: 'gpu'. Otherwise, manually remove the previously used ibm/granite-8b-code-instruct foundation model and the ibm/granite-3-8b-instruct model that is deployed with the upgrade as described in Removing foundation models from IBM watsonx.ai™.

If you initially were working with models on GPU and switch to models on CPU later, you might also need to remove the ibm/granite-3-8b-instruct model. However, check that the model isn't used by any other service before you do so.

gpu
LLMs run in a local inference foundation models component (watsonx_ai_ifm). All generative AI capabilities are available in this mode.
Important: With this mode, the inference foundation models component (watsonx_ai_ifm) is automatically installed. This option requires at least one GPU to support LLM-based enrichment and LLM-based glossary generation. The default model is granite 3.1-8b-instruct.

For information about supported GPUs, see the Hardware requirements.

remote

LLMs run in a watsonx.ai instance on another on-premises instance of IBM Software Hub or on IBM watsonx™ as a Service. A connection to the remote watsonx.ai instance must be configured. See Connecting to a remote watsonx.ai instance.

Important: The required models must be available and running in the remote watsonx.ai instance. Setting up a connection to the remote instance does not start the models. If a required model is not available in the remote watsonx.ai instance, no connection can be established and any tasks that rely on those models will fail.

You must also set enableSemanticAutomation: true and enableSemanticEnrichment: true.

Component needed: Model deployment mode
Custom resource entry: enableModelsOn: 'cpu'|'gpu'|'remote'
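The three deployment modes map to this single custom resource setting. For example, to run the gen AI enrichment models on GPU, the settings might be combined as follows (a sketch assuming the `custom_spec.wkc` install-options layout; the flag names are taken from the table above):

```yaml
# Hypothetical fragment: gen AI enrichment with models on GPU.
custom_spec:
  wkc:
    enableSemanticAutomation: true   # required for the gen AI capabilities
    enableSemanticEnrichment: true   # required for gen AI enrichment
    enableModelsOn: gpu              # installs watsonx_ai_ifm; needs at least one GPU
```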
Feature: Generate embeddings
Availability:
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard

Enable computation of vectors for the metadata and the data during metadata enrichment. These vectors are used for generating SQL queries from natural-language text (Text-to-SQL feature).

This setting is also required if you want to enable the Text-to-SQL feature.

If you enable this feature, you can work with the default enableModelsOn setting or change it.

You must also set enableSemanticAutomation: true.

Component needed: Embeddings generation
Custom resource entries:
enableSemanticAutomation: true
enableSemanticEmbedding: true
Feature: Text-to-SQL capabilities
Availability:
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard

Enable Text-to-SQL capabilities for generating SQL queries from natural language input. These capabilities can be used for creating query-based data assets, for example, for data products or in searches.

If you enable this feature, you must also set enableSemanticEmbedding: true, and enableModelsOn: gpu or enableModelsOn: remote.

With enableModelsOn: gpu, the text2sql model is started in watsonx_ai_ifm. The default model is granite-3-8b-instruct.

For more accurate results when plain text queries are converted into SQL queries (Text-to-SQL), set the llama-3-3-70b-instruct model as a custom model:
  • With enableModelsOn: gpu, set the custom model for the Text-to-SQL feature with the customModelTextToSql parameter.
  • With enableModelsOn: remote, set the custom model for the Text-to-SQL feature in Administration > Configurations and settings > Generative AI setup. For more information, see Connecting to a remote watsonx.ai instance.
The accuracy of the results also improves if the assets that serve as context are enriched by using LLMs. For this purpose, you can enable the gen AI based metadata enrichment features by setting enableSemanticEnrichment: true.

customModelTextToSql
Specify the ID of a custom model to use with the Text-to-SQL capabilities. The model must be available in watsonx_ai_ifm.

If you work with a custom model for Text-to-SQL, make sure that you have the number of GPUs that is required for the selected model.

Important: For custom models other than the llama-3-3-70b-instruct model, the accuracy of results might vary.

Component needed: Text-to-SQL capabilities
Custom resource entry: enableTextToSql: true
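Because Text-to-SQL depends on embeddings and on a GPU or remote deployment mode, several settings must be enabled together. A sketch under the same `custom_spec.wkc` assumption as the earlier fragments:

```yaml
# Hypothetical fragment: Text-to-SQL with a remote watsonx.ai instance.
custom_spec:
  wkc:
    enableSemanticAutomation: true   # prerequisite for embeddings generation
    enableSemanticEmbedding: true    # embeddings are required for Text-to-SQL
    enableTextToSql: true            # Text-to-SQL capabilities
    enableModelsOn: remote           # models run in a remote watsonx.ai instance
```

With enableModelsOn: remote, a connection to the remote watsonx.ai instance must also be configured separately, and the required models must already be running there.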
Feature: AI search
Availability:
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard

Enable LLM-based search for assets and artifacts across all workspaces.

Component needed: AI search
Custom resource entry: enableAISearch: true
Decide when you want to install these features and follow the applicable instructions: