Preparing to install IBM Knowledge Catalog

Plan and prepare to install IBM Knowledge Catalog.

Before you begin

Complete these tasks before you install one of the IBM Knowledge Catalog editions.

Determining the optional features to enable

For each of the IBM Knowledge Catalog editions, you can enable several optional features during installation, upgrade, or at any time later.

The default installation settings are as follows:

IBM Knowledge Catalog:
  • enableDataQuality: false
  • enableKnowledgeGraph: false
  • useFDB: false

IBM Knowledge Catalog Premium:
  • enableDataQuality: false
  • enableKnowledgeGraph: false
  • enableSemanticAutomation: false
  • enableSemanticEmbedding: false
  • enableSemanticEnrichment: true
  • enableAISearch: false
  • enableTextToSql: false
  • enableModelsOn: cpu
  • useFDB: false

IBM Knowledge Catalog Standard:
  • enableKnowledgeGraph: false
  • enableSemanticAutomation: false
  • enableSemanticEnrichment: true
  • enableAISearch: false
  • enableModelsOn: cpu
  • useFDB: false
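If you change any of these defaults, the options become entries in the service's installation options or custom resource. The following is a minimal sketch only, assuming the options are passed under a custom_spec section with wkc as the component key (verify both against the installation instructions for your release); the option names and values are the ones documented in this section:

```yaml
# Sketch: overriding the IBM Knowledge Catalog Premium defaults.
# The custom_spec/wkc wrapper is an assumption; check your release's
# installation documentation for the exact structure.
custom_spec:
  wkc:
    enableDataQuality: true         # data quality features
    enableKnowledgeGraph: true      # lineage, relationship explorer, term search
    enableSemanticAutomation: true  # prerequisite for the gen AI features
    enableSemanticEnrichment: true  # gen AI based metadata enrichment
    enableModelsOn: gpu             # cpu | gpu | remote
    useFDB: false                   # keep the default Neo4j graph database
```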
Use the following table to see which optional features you can enable and which optional component each feature requires.
Table 1. Optional features and the components needed to enable them
Feature | Component needed | Entry in the custom resource
Lineage
Availability:
  • IBM Knowledge Catalog
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard
You can install one of these lineage services:
  • IBM® Manta Data Lineage, which is the default lineage service and uses the default Neo4j graph database.
  • MANTA Automated Data Lineage, which requires the FoundationDB graph database. So you must also set useFDB: true.
Component needed: Knowledge graph
Custom resource entry: enableKnowledgeGraph: true
Relationship explorer
Availability:
  • IBM Knowledge Catalog
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard
Component needed: Knowledge graph
Custom resource entry: enableKnowledgeGraph: true
Set FoundationDB as the database for storing the data that is generated by the knowledge graph.

By default, the Neo4j graph database is installed during installation or upgrade. This graph database is required if you want to use IBM Manta Data Lineage as your lineage service.

useFDB: true is required only for new installations with MANTA Automated Data Lineage.

Availability:
  • IBM Knowledge Catalog
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard
Component needed: FoundationDB graph database
Custom resource entry: useFDB: true
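The lineage and graph-database rows combine for new installations with MANTA Automated Data Lineage. A minimal sketch, assuming the same custom resource wrapper as elsewhere (the custom_spec/wkc keys are assumptions; the option entries are the documented ones):

```yaml
# Sketch: new installation that uses MANTA Automated Data Lineage,
# which requires the FoundationDB graph database instead of Neo4j.
custom_spec:
  wkc:
    enableKnowledgeGraph: true  # knowledge graph component
    useFDB: true                # required only for new installations
                                # with MANTA Automated Data Lineage
```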
Business-term relationship search
Availability:
  • IBM Knowledge Catalog
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard

Business-term relationship search is available with both types of graph database.

Component needed: Knowledge graph
Custom resource entry: enableKnowledgeGraph: true
Data quality features
Availability:
  • IBM Knowledge Catalog
  • IBM Knowledge Catalog Premium
Component needed: Data quality
Custom resource entry: enableDataQuality: true
Gen AI based capabilities in metadata enrichment

You must set both enableSemanticAutomation: true and enableSemanticEnrichment: true.

Availability:
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard

If you enable this feature, you can work with the default enableModelsOn setting or change the setting. With enableModelsOn: gpu or enableModelsOn: remote, you can also work with a custom model instead of the default model:

customModelSemanticEnrichment
Specify a custom model to use with the gen AI based enrichment features. That model must be available in watsonx_ai_ifm.

If you use a custom model for gen AI based enrichment, make sure that you have the number of GPUs that is required for the selected model.

Important: For custom models, the accuracy of results might vary.

See Deployment mode.

Component needed: Gen AI based enrichment
Custom resource entries:
enableSemanticAutomation: true
enableSemanticEnrichment: true
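The parameters for this row can be sketched together with a custom model. In the following minimal sketch, the custom_spec/wkc wrapper is an assumption and the model ID is a placeholder; the model you specify must be available in watsonx_ai_ifm:

```yaml
# Sketch: gen AI based enrichment with a custom model.
custom_spec:
  wkc:
    enableSemanticAutomation: true
    enableSemanticEnrichment: true
    enableModelsOn: gpu  # custom models require gpu or remote
    # Placeholder ID; the model must be available in watsonx_ai_ifm
    # and you must have the number of GPUs the model requires.
    customModelSemanticEnrichment: ibm/granite-3-3-8b-instruct
```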

Deployment mode for the models that are used for the gen AI capabilities

Availability:
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard

This option determines where the foundation models run that are used with the gen AI capabilities in metadata enrichment.

cpu
This is the default setting.

A Granite model (granite3-moe:3b-instruct-q4_K_M) is started on the internal containers with CPUs. This model is used only for expanding metadata and for generating and assigning terms in metadata enrichment.

Running models on CPU can result in significantly lower performance and reduced output quality, particularly for tasks such as name generation. CPU-based inference should be used with caution and is generally better suited for development, testing, or cost-constrained scenarios. For production use, GPU-based inference is strongly recommended.

If you use CPU models, results can be improved by making use of glossary and abbreviation support, and by combining LLM-based generation with approaches such as fuzzy matching.
To improve the performance of models on CPU, a cluster administrator has these options:
  • Increase the resources by providing more CPUs and memory to the semantic-text-generation pods, for example:
    oc set resources deployment semantic-text-generation -n ${PROJECT_CPD_INST_OPERANDS} --limits=cpu=10
    oc set resources deployment semantic-text-generation -n ${PROJECT_CPD_INST_OPERANDS} --limits=memory=4Gi
  • Assign a dedicated cluster node (worker) to the pods on which models with runtime are deployed. For more information, see Setting up a dedicated node for models running on CPU.
Important: If you are upgrading and want to continue to run the models on GPU, you must set enableModelsOn: 'gpu'. Otherwise, manually remove the previously used ibm/granite-8b-code-instruct foundation model and the ibm/granite-3-8b-instruct model that is deployed with the upgrade as described in Removing foundation models from IBM watsonx.ai™.

If you initially were working with models on GPU and switch to models on CPU later, you might also need to remove the ibm/granite-3-8b-instruct model. However, check that the model isn't used by any other service before you do so.

gpu
LLMs run in a local inference foundation models component (watsonx_ai_ifm). All generative AI capabilities are available in this mode.
Important: With this mode, the inference foundation models component (watsonx_ai_ifm) is automatically installed. This option requires at least one GPU to support LLM-based enrichment and LLM-based glossary generation. The default model is granite-3.1-8b-instruct.

For information about supported GPUs, see the Hardware requirements.

remote

LLMs run in a watsonx.ai instance on another on-premises instance of IBM Software Hub or on IBM watsonx™ as a Service. A connection to the remote watsonx.ai instance must be configured. See Connecting to a remote watsonx.ai instance.

Important: The required models must be available and running in the remote watsonx.ai instance. Setting up a connection to the remote instance does not start the models. If a required model is not available in the remote watsonx.ai instance, no connection can be established and any tasks that rely on those models will fail.

You must also set enableSemanticAutomation: true and enableSemanticEnrichment: true.

If you switch from using models on CPU to a different deployment mode later, make sure to stop the semantic-text-generation pod where the CPU models run to avoid conflicts.
oc scale deployment semantic-text-generation -n ${PROJECT_CPD_INST_OPERANDS} --replicas=0
Component needed: Model deployment mode
Custom resource entry: enableModelsOn: 'cpu'|'gpu'|'remote'
Generate embeddings
Availability:
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard

Enable computation of vectors for the metadata and the data during metadata enrichment. These vectors are used, for example, for generating SQL queries from natural-language text (Text-to-SQL feature).

This setting is also required if you want to enable the Text-to-SQL feature.

You must also set enableSemanticAutomation: true.

Component needed: Embeddings generation
Custom resource entries:
enableSemanticAutomation: true
enableSemanticEmbedding: true
Text-to-SQL capabilities
Availability:
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard

Enable Text-to-SQL capabilities for generating SQL queries from natural language input. These capabilities can be used for creating query-based data assets, for example, for data products, SQL-based data quality rules, or in searches.

If you enable this feature, you must also set enableSemanticEmbedding: true, and enableModelsOn: gpu or enableModelsOn: remote.

With enableModelsOn: gpu, the Text-to-SQL model is started in watsonx_ai_ifm. The default model is granite-3-8b-instruct.

You can set a custom model instead:
  • With enableModelsOn: gpu, set the custom model for the Text-to-SQL feature with the customModelTextToSql parameter.
  • With enableModelsOn: remote, set the custom model for the Text-to-SQL feature in Administration > Configurations and settings > Generative AI setup. For more information, see Connecting to a remote watsonx.ai instance.
The accuracy of the results also improves if the assets that serve as context are enriched by using LLMs. For this purpose, you can enable the gen AI based metadata enrichment features by setting enableSemanticEnrichment: true.
customModelTextToSql
Specify the ID of a custom model to use with the Text-to-SQL capabilities. That model must be available in watsonx_ai_ifm.

If you work with a custom model for Text-to-SQL, make sure that you have the number of GPUs that is required for the selected model.

Component needed: Text-to-SQL capabilities
Custom resource entry: enableTextToSql: true
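The prerequisites for Text-to-SQL described above can be sketched as one combined fragment. As before, the custom_spec/wkc wrapper is an assumption; the option names are the documented ones:

```yaml
# Sketch: enabling Text-to-SQL and its prerequisites.
custom_spec:
  wkc:
    enableSemanticAutomation: true
    enableSemanticEmbedding: true  # embeddings are required for Text-to-SQL
    enableTextToSql: true
    enableModelsOn: gpu            # gpu or remote; not supported on cpu
    # customModelTextToSql: <model ID available in watsonx_ai_ifm>  # optional
```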
AI search

Enable LLM-based search for assets and artifacts across all workspaces.

Availability:
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard
Component needed: AI search
Custom resource entry: enableAISearch: true
Decide when you want to install these features and follow the applicable instructions:

Generative AI capabilities per model deployment mode

Depending on your requirements, you must specify different combinations of installation parameters for your generative AI setup in addition to enableSemanticAutomation: true. Some further service-specific configuration might be required after the installation is complete. Check the following table for possible configuration options:

Table 2. Configuration options for gen AI capabilities
Model deployment mode | Capabilities | Additional settings in IBM Knowledge Catalog
enableModelsOn: cpu

For details, see enableModelsOn: cpu.

Capabilities: Generation of names and descriptions for tables and columns, and generation and assignment of business terms in metadata enrichment (Designing metadata enrichment).
Additional settings: Enable gen AI based capabilities in the metadata enrichment project settings.
enableModelsOn: gpu

For details, see enableModelsOn: gpu.

Additional settings:
  • Enable gen AI based capabilities in the metadata enrichment project settings.
  • Enable natural language queries in the data intelligence project settings.
  • Enable text to SQL for data products: go to Data products > Configurations and settings > Generative AI.
  • Enable Explain data quality rules with AI in the data quality project settings.
enableModelsOn: remote

For details, see enableModelsOn:remote.

A connection to the remote watsonx.ai instance must be configured in Administration > Configurations and settings > Generative AI setup. See Connecting to a remote watsonx.ai instance.

Custom models must be set in Administration > Configurations and settings > Generative AI setup.

Additional settings:
  • Enable gen AI based capabilities in the metadata enrichment project settings.
  • Enable natural language queries in the data intelligence project settings.
  • Enable text to SQL for data products: go to Data products > Configurations and settings > Generative AI.
  • Enable Explain data quality rules with AI in the data quality project settings.

Certified foundation models for metadata enrichment and Text-to-SQL

The following models are certified for use with the generative AI capabilities in IBM Knowledge Catalog Premium and IBM Knowledge Catalog Standard.
Metadata enrichment
  • meta-llama/llama-3-3-70b-instruct (5.3.1 and later)
  • openai/gpt-oss-120b (5.3.1 and later)
  • ibm/granite-4-h-small
  • ibm/granite-3-3-8b-instruct
  • ibm/granite-8b-code-instruct
Text-to-SQL
  • meta-llama/llama-3-3-70b-instruct (5.3.1 and later)
  • openai/gpt-oss-120b (5.3.1 and later)
  • meta-llama/llama-4-maverick-17b-128e-instruct-fp8

Models that are identified as certified models have undergone evaluation with the gen AI capabilities in IBM Knowledge Catalog. Models that are not certified are not guaranteed to work as expected, and their accuracy and performance can vary.

For more information about the models, see: