Preparing to install watsonx.data intelligence

Plan and prepare to install watsonx.data intelligence.

Before you begin

Complete these tasks before you install watsonx.data intelligence.

Planning your configuration

watsonx.data intelligence has two kinds of installation options: primary options, of which you must select at least one, and additional options that you can combine with the primary options.

enableDataGovernanceCatalog

Enable the data governance and catalog features in general.

For installations with enableDataGovernanceCatalog: true, you can enable additional features with these properties:

enableDataQuality

Enable data quality features in projects so that you can measure, monitor, and maintain the quality of your data to ensure the data meets your expectations and standards for specific use cases.

Important: If you enable this feature, DataStage, specifically DataStage Enterprise, is automatically installed.

If you did not purchase a DataStage license, use of DataStage Enterprise is limited to creating, managing, and running data quality rules. For examples of accepted use, see Enabling optional features after installation or upgrade.

enableKnowledgeGraph

Specify whether to enable the knowledge graph feature, which provides the following capabilities:

Relationship explorer
Business term relationship search

enableAISearch

Enable LLM-based semantic search for assets and artifacts across all workspaces.
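As a sketch of how these options combine, the following fragment enables the governance catalog together with all three additional features. It assumes that the options are supplied as flat key: value pairs in the notation used on this page. The file or resource in which you set them is not covered here and depends on your installation procedure.

```yaml
# Illustrative fragment only; where these pairs are set is an assumption.
enableDataGovernanceCatalog: true   # primary option; required for the three options below
enableDataQuality: true             # also installs DataStage Enterprise automatically
enableKnowledgeGraph: true          # relationship explorer, business term relationship search
enableAISearch: true                # LLM-based semantic search across all workspaces
```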

enableDataLineage

Enable data lineage features. Data lineage is the process of tracking data as it is moved and used by different software tools. Lineage tracks where data came from, how it was transformed, and where the data was moved to.

enableDataProductHub

Enable data sharing features. When you enable data sharing, data producers can package data and data-related assets into data products so that data consumers have access to secure, high quality data.

For installations with enableDataProductHub: true, you can enable additional features with these properties:

enableAISearch: Enable LLM-based semantic search for assets and artifacts across all workspaces.

enableGenerativeAICapabilities

Enable generative AI capabilities and install the hosted MCP server. Combine this option with enableDataGovernanceCatalog: true. Enable the gen AI capabilities if you plan to use the following features:

enableSemanticEmbedding

Enable computation of vectors for the metadata and the data during metadata enrichment. These vectors will be used, for example, for generating SQL queries from natural-language text (Text-to-SQL feature).

This setting is also required if you want to enable the Text-to-SQL feature.

enableSemanticEnrichment

Enable gen AI based enrichment features:

Table name generation
Column name generation
Description generation
Glossary generation
Term assignment

If you enable this feature, you can work with the default enableModelsOn setting or change the setting. With enableModelsOn: gpu or enableModelsOn: remote, you can also work with a custom model instead of the default model:

With enableModelsOn: gpu, set the custom model for the enrichment features with the customModelSemanticEnrichment parameter.
With enableModelsOn: remote, set the custom model for the enrichment features in Administration > Configurations and settings > Generative AI setup. For more information, see Connecting to a remote watsonx.ai™ instance.

customModelSemanticEnrichment

Specify a custom model to use with the gen AI based enrichment features when models run on GPU. That model must be available in watsonx_ai_ifm.

If you use a custom model for gen AI based enrichment, make sure that you have the number of GPUs that is required for the selected model.

Important: For custom models, the accuracy of results might vary.

This installation parameter is deprecated and will be removed in a future release. Instead of using this parameter, set the custom model for the gen AI based enrichment features in Administration > Configurations and settings > Generative AI setup.

enableTextToSql

Enable Text-to-SQL capabilities for generating SQL queries from natural language input. These capabilities can be used for creating query-based data assets, for example, for data products or SQL-based data quality rules, or in searches.

If you enable this feature, you must also set enableSemanticEmbedding: true and either enableModelsOn: gpu or enableModelsOn: remote.

With enableModelsOn: gpu, the text2sql model is started in watsonx_ai_ifm. The default model is granite-3-8b-instruct.

You can set a custom model instead:

With enableModelsOn: gpu, set the custom model for the Text-To-SQL feature with the customModelTextToSql parameter.
With enableModelsOn: remote, set the custom model for the Text-To-SQL feature in Administration > Configurations and settings > Generative AI setup. For more information, see Connecting to a remote watsonx.ai instance.

The accuracy of the results also improves if the assets that serve as context are enriched by using LLMs. For this purpose, you can enable the gen AI based metadata enrichment features by setting enableSemanticEnrichment: true.
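The Text-to-SQL prerequisites described above could be expressed as the following fragment for GPU mode. This is a minimal sketch that assumes the options are supplied as flat key: value pairs in the notation used on this page; where the pairs are set depends on your installation procedure and is not shown.

```yaml
# Illustrative fragment only; where these pairs are set is an assumption.
enableDataGovernanceCatalog: true      # combined with the gen AI option
enableGenerativeAICapabilities: true
enableModelsOn: gpu                    # Text-to-SQL needs gpu or remote, not the cpu default
enableSemanticEmbedding: true          # required for Text-to-SQL; spelled as in the property heading
enableTextToSql: true
enableSemanticEnrichment: true         # optional; enriched assets improve Text-to-SQL accuracy
```

For remote mode, the same fragment would use enableModelsOn: remote, with the connection to the remote watsonx.ai instance configured separately.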

customModelTextToSql

Specify the ID of a custom model to use with the Text-to-SQL capabilities. That model must be available in watsonx_ai_ifm.

If you work with a custom model for Text-to-SQL, make sure that you have the number of GPUs that is required for the selected model.

This installation parameter is deprecated and will be removed in a future release. Instead of using this parameter, set the custom model for the Text-to-SQL capabilities in Administration > Configurations and settings > Generative AI setup.

enableModelsOn

Set the deployment mode for the models that are used for the gen AI capabilities. This option determines where the foundation models for the gen AI capabilities run:

cpu

This is the default setting.

A Granite model (granite3-moe:3b-instruct-q4_K_M) is started on the internal containers with CPUs. This model is used only for expanding metadata and for term generation and assignment in metadata enrichment.

Running models on CPU can result in significantly lower performance and reduced output quality, particularly for tasks such as name generation. Use CPU-based inference with caution; it is generally better suited for development, testing, or cost-constrained scenarios. For production use, GPU-based inference is strongly recommended.

If you use CPU models, results can be improved by making use of glossary and abbreviation support, and by combining LLM-based generation with approaches such as fuzzy matching.

An instance administrator can upload abbreviation files to the semantic automation pods for global use in all projects. See Uploading custom abbreviation files for use in metadata enrichment.
Project administrators can upload additional abbreviation files to projects and configure the metadata enrichment settings for metadata expansion. See Custom abbreviation files for name generation and Default enrichment settings.

To improve the performance of models on CPU, a cluster administrator has these options:

Increase the resources by providing more CPUs and memory to the semantic-text-generation pods, for example:

oc set resources deployment semantic-text-generation -n ${PROJECT_CPD_INST_OPERANDS} --limits=cpu=10
oc set resources deployment semantic-text-generation -n ${PROJECT_CPD_INST_OPERANDS} --limits=memory=4Gi

Assign a dedicated cluster node (worker) to the pods on which models with runtime are deployed. For more information, see Setting up a dedicated node for models running on CPU.

gpu

LLMs run in a local inference foundation models component (watsonx_ai_ifm). All generative AI capabilities are available in this mode.

Important: With this mode, the inference foundation models component (watsonx_ai_ifm) is automatically installed. This option requires at least one GPU to support LLM-based enrichment and LLM-based glossary generation. The default model is granite-4-h-small.

For information about supported GPUs, see the Hardware requirements.

remote

LLMs run in a watsonx.ai instance on another on-premises instance of IBM® Software Hub or on IBM watsonx™ as a Service. None of the local models are started. A connection to the remote watsonx.ai instance must be configured. See Connecting to a remote watsonx.ai instance.

Important: The required models must be available and running in the remote watsonx.ai instance. Setting up a connection to the remote instance does not start the models. If a required model is not available in the remote watsonx.ai instance, no connection can be established and any tasks that rely on those models will fail.

The option enableGenerativeAICapabilities and at least one of the options enableSemanticEnrichment, enableSemanticEmbedding, or enableTextToSql must be set to true.
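A minimal sketch of a remote-mode configuration that satisfies this requirement follows. It assumes flat key: value pairs in the notation used on this page; the location where the pairs are set is not specified here.

```yaml
# Illustrative fragment only; where these pairs are set is an assumption.
enableDataGovernanceCatalog: true
enableGenerativeAICapabilities: true
enableModelsOn: remote            # no local models are started in this mode
enableSemanticEnrichment: true    # at least one of the gen AI feature options must be true
```

These settings do not create the connection. The connection to the remote watsonx.ai instance is configured separately, and the required models must already be available and running in that instance.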

If you switch from using models on CPU to a different deployment mode later, make sure to stop the semantic-text-generation pod where the CPU models run to avoid conflicts.

oc scale deployment semantic-text-generation -n ${PROJECT_CPD_INST_OPERANDS} --replicas=0

enableUnstructuredDataIntegration

Enable unstructured data processing. Install Unstructured Data Integration to ingest, transform, and enrich unstructured data from diverse sources.

Prerequisites:

Red Hat® OpenShift® AI must be installed.
IBM watsonx.data™ Spark must be provisioned.
GPU requirements (number of GPUs):
- Metadata extraction: 1
- Text-to-SQL: 2
- Entity extraction (optional): 2

Combine this option with enableDataGovernanceCatalog: true to make governance and cataloging features available in unstructured data processing.
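The combination described above might look like the following fragment. It is a sketch that assumes flat key: value pairs in the notation used on this page, and it does not cover the prerequisites, which must be in place separately.

```yaml
# Illustrative fragment only; where these pairs are set is an assumption.
enableUnstructuredDataIntegration: true
enableDataGovernanceCatalog: true   # makes governance and cataloging available in unstructured data processing
```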

If you install watsonx.data intelligence with the default configuration, the following settings are applied:

enableAISearch: false
enableDataGovernanceCatalog: true
enableDataLineage: true
enableDataProductHub: true
enableDataQuality: false
enableGenerativeAICapabilities: true
enableKnowledgeGraph: true
enableModelsOn: cpu
enableSemanticEmbedding: false
enableSemanticEnrichment: true
enableTextToSql: false

Generative AI capabilities per model deployment mode

Depending on your requirements, you must specify different combinations of installation parameters for your generative AI setup in addition to enableGenerativeAICapabilities: true. Some further service-specific configuration might be required after the installation is complete. Check the following table for possible configuration options:

Table 1. Configuration options for gen AI capabilities (draft)
Model deployment mode	Capabilities	Additional settings in watsonx.data intelligence
`enableModelsOn: cpu` For details, see `enableModelsOn: cpu`.	Generation of names and descriptions for tables and columns, and generation and assignment of business terms in metadata enrichment (Designing metadata enrichment)	Enable gen AI based capabilities in the metadata enrichment project settings.
`enableModelsOn: gpu` For details, see `enableModelsOn: gpu`.	Generation of names and descriptions for tables and columns, and generation and assignment of business terms in metadata enrichment (Designing metadata enrichment): `enableSemanticEnrichment: true`. Text to SQL for data products (Creating a data product from SQL), for query-based data assets (Adding a query-based asset), and for data quality rules (Managing data quality rules): `enableTextToSql: true` and `enableSemanticEmbedding: true`. Generation of data quality rule descriptions (Managing data quality rules). The installation parameters `customModelSemanticEnrichment: <model_id>` and `customModelTextToSql: <model_id>` for setting custom models are deprecated and will be removed in a future release. Instead, set custom models in Administration > Configurations and settings > Generative AI setup. See Connecting to a remote watsonx.ai instance.	Enable gen AI based capabilities in the metadata enrichment project settings. Enable natural language queries in the data intelligence project settings. Enable text to SQL for data products: go to Data products > Configurations and settings > Generative AI. Enable Explain data quality rules with AI in the data quality project settings.
`enableModelsOn: remote` For details, see `enableModelsOn: remote`. A connection to the remote watsonx.ai instance must be configured in Administration > Configurations and settings > Generative AI setup. See Connecting to a remote watsonx.ai instance.	Generation of names and descriptions for tables and columns, and generation and assignment of business terms in metadata enrichment (Designing metadata enrichment): `enableSemanticEnrichment: true`. Text to SQL for data products (Creating a data product from SQL), for query-based data assets (Adding a query-based asset), and for data quality rules (Managing data quality rules): `enableTextToSql: true` and `enableSemanticEmbedding: true`. Generation of data quality rule descriptions (Managing data quality rules). Custom models must be set in Administration > Configurations and settings > Generative AI setup.	Enable gen AI based capabilities in the metadata enrichment project settings. Enable natural language queries in the data intelligence project settings. Enable text to SQL for data products: go to Data products > Configurations and settings > Generative AI. Enable Explain data quality rules with AI in the data quality project settings.

Certified foundation models for metadata enrichment and Text-to-SQL

The following models are certified for use with the generative AI capabilities in watsonx.data intelligence.

Metadata enrichment

meta-llama/llama-3-3-70b-instruct
openai/gpt-oss-120b
ibm/granite-4-h-small
ibm/granite-3-3-8b-instruct
ibm/granite-8b-code-instruct

Text-to-SQL

meta-llama/llama-3-3-70b-instruct
openai/gpt-oss-120b
meta-llama/llama-4-maverick-17b-128e-instruct-fp8

Models that are identified as certified models have undergone evaluation with the gen AI capabilities in watsonx.data intelligence. Models that are not certified are not guaranteed to work as expected, and their accuracy and performance can vary.

For more information about the models, see:

Supported foundation models in watsonx.ai on IBM Software Hub
Supported foundation models in watsonx.ai as a Service
Billing details for generative AI assets in watsonx.ai Runtime for models in watsonx.ai as a Service