Preparing to install watsonx.data intelligence
Plan and prepare to install watsonx.data intelligence.
Before you begin
Complete these tasks before you install watsonx.data intelligence.
Planning your configuration
With watsonx.data intelligence, you have several installation options. You must select at least one of the primary installation options, and you can combine the primary options with several additional installation options.
- enableDataGovernanceCatalog
- Enable the data governance and catalog features in general.
For installations with enableDataGovernanceCatalog: true, you can enable additional features with these properties:
- enableDataQuality
- Enable data quality features in projects so that you can measure, monitor, and maintain the quality of your data to ensure the data meets your expectations and standards for specific use cases.
Important: If you enable this feature, DataStage, specifically DataStage Enterprise, is automatically installed. If you did not purchase a DataStage license, use of DataStage Enterprise is limited to creating, managing, and running data quality rules. For examples of accepted use, see Enabling optional features after installation or upgrade.
- enableKnowledgeGraph
- Specify whether to enable the knowledge graph feature, which provides the following
capabilities:
- Relationship explorer
- Business term relationship search
- enableAISearch
- Enable LLM-based semantic search for assets and artifacts across all workspaces.
- enableDataLineage
- Enable data lineage features. Data lineage is the process of tracking data as it is moved and used by different software tools. Lineage tracks where data came from, how it was transformed, and where the data was moved to.
- enableDataProductHub
- Enable data sharing features. When you enable data sharing, data producers can package data and data-related assets into data products so that data consumers have access to secure, high-quality data.
For installations with enableDataProductHub: true, you can enable additional features with these properties:
- enableAISearch
- Enable LLM-based semantic search for assets and artifacts across all workspaces.
- enableGenerativeAICapabilities
- Enable generative AI capabilities and install the hosted MCP server. Combine this option with enableDataGovernanceCatalog: true. Enable the gen AI capabilities if you plan to use the following features:
- enableSemanticEmbedding
- Enable computation of vectors for the metadata and the data during metadata enrichment. These vectors are used, for example, to generate SQL queries from natural-language text (Text-to-SQL feature).
This setting is also required if you want to enable the Text-to-SQL feature.
- enableSemanticEnrichment
- Enable gen AI based enrichment features:
- Table name generation
- Column name generation
- Description generation
- Glossary generation
- Term assignment
If you enable this feature, you can work with the default enableModelsOn setting or change the setting. With enableModelsOn: gpu or enableModelsOn: remote, you can also work with a custom model instead of the default model:
- With enableModelsOn: gpu, set the custom model for the enrichment features with the customModelSemanticEnrichment parameter.
- With enableModelsOn: remote, set the custom model for the enrichment features in . For more information, see Connecting to a remote watsonx.ai™ instance.
- customModelSemanticEnrichment
- Specify a custom model to use with the gen AI based enrichment features when models run on GPU. That model must be available in watsonx_ai_ifm.
If you use a custom model for gen AI based enrichment, make sure that you have the number of GPUs that is required for the selected model.
Important: For custom models, the accuracy of results might vary.
This installation parameter is deprecated and will be removed in a future release. Instead of using this parameter, set the custom model for the gen AI based enrichment features in .
- enableTextToSql
- Enable Text-to-SQL capabilities for generating SQL queries from natural language input. These capabilities can be used for creating query-based data assets, for example, for data products or SQL-based data quality rules, or in searches.
If you enable this feature, you must also set enableSemanticEmbedding: true, and enableModelsOn: gpu or enableModelsOn: remote.
With enableModelsOn: gpu, the text2sql model is started in watsonx_ai_ifm. The default model is granite-3-8b-instruct.
You can set a custom model instead:
- With enableModelsOn: gpu, set the custom model for the Text-to-SQL feature with the customModelTextToSql parameter.
- With enableModelsOn: remote, set the custom model for the Text-to-SQL feature in . For more information, see Connecting to a remote watsonx.ai instance.
enableSemanticEnrichment: true.
- customModelTextToSql
- Specify the ID of a custom model to use with the Text-to-SQL capabilities. That model must be available in watsonx_ai_ifm.
If you work with a custom model for Text-to-SQL, make sure that you have the number of GPUs that is required for the selected model.
This installation parameter is deprecated and will be removed in a future release. Instead of using this parameter, set the custom model for the Text-to-SQL capabilities in .
- enableModelsOn
- Set the deployment mode for the models that are used for the gen AI capabilities. This option determines where the foundation models that are used with the gen AI capabilities run:
cpu
- This is the default setting.
A Granite model (granite3-moe:3b-instruct-q4_K_M) is started on the internal containers with CPUs. This model is used only for expanding metadata and for term generation and assignment in metadata enrichment.
Running models on CPU can result in significantly lower performance and reduced output quality, particularly for tasks such as name generation. Use CPU-based inference with caution. It is generally better suited for development, testing, or cost-constrained scenarios. For production use, GPU-based inference is strongly recommended.
If you use CPU models, results can be improved by making use of glossary and abbreviation support, and by combining LLM-based generation with approaches such as fuzzy matching.
- An instance administrator can upload abbreviation files to the semantic automation pods for global use in all projects. See Uploading custom abbreviation files for use in metadata enrichment.
- Project administrators can upload additional abbreviation files to projects and configure the metadata enrichment settings for metadata expansion. See Custom abbreviation files for name generation and Default enrichment settings.
To improve the performance of models on CPU, a cluster administrator has these options:
- Increase the resources by providing more CPUs and memory to the semantic-text-generation pods, for example:
oc set resources deployment semantic-text-generation -n ${PROJECT_CPD_INST_OPERANDS} --limits=cpu=10
oc set resources deployment semantic-text-generation -n ${PROJECT_CPD_INST_OPERANDS} --limits=memory=4Gi
- Assign a dedicated cluster node (worker) to the pods on which models with runtime are deployed. For more information, see Setting up a dedicated node for models running on CPU.
gpu
- LLMs run in a local inference foundation models component (watsonx_ai_ifm). All generative AI capabilities are available in this mode.
Important: With this mode, the inference foundation models component (watsonx_ai_ifm) is automatically installed. This option requires at least one GPU to support LLM-based enrichment and LLM-based glossary generation. The default model is granite-4-h-small.
For information about supported GPUs, see the Hardware requirements.
remote
- LLMs run in a watsonx.ai instance on another on-premises instance of IBM® Software Hub or on IBM watsonx™ as a Service. None of the local models are started. A connection to the remote watsonx.ai instance must be configured. See Connecting to a remote watsonx.ai instance.
Important: The required models must be available and running in the remote watsonx.ai instance. Setting up a connection to the remote instance does not start the models. If a required model is not available in the remote watsonx.ai instance, no connection can be established and any tasks that rely on those models will fail.
The enableGenerativeAICapabilities option and at least one of enableSemanticEnrichment, enableSemanticEmbedding, and enableTextToSql must be set to true.
If you switch from using models on CPU to a different deployment mode later, make sure to stop the semantic-text-generation pod where the CPU models run to avoid conflicts.
oc scale deployment semantic-text-generation --replicas=0
- enableUnstructuredDataIntegration
- Enable unstructured data processing. Install Unstructured Data Integration to ingest, transform, and enrich unstructured data from diverse sources.
- Prerequisites:
  - Red Hat® OpenShift® AI must be installed.
  - IBM watsonx.data™ Spark must be provisioned.
- GPU requirements:
  - Metadata extraction: 1
  - Text-to-SQL: 2
  - Entity extraction (optional): 2
Combine this option with enableDataGovernanceCatalog: true to make governance and cataloging features available in unstructured data processing.
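As an illustration of how the options above combine, the following fragment sketches one possible selection for running Text-to-SQL with models on a local GPU. It assumes the same flat key: value layout as the installation options in this topic. Where these keys go in your installation options file is not shown here. The combination is an example derived from the dependencies described in the list, not a recommended or complete configuration.

```yaml
# Illustrative sketch only; adjust to your environment and licenses.
enableDataGovernanceCatalog: true      # combined with the gen AI capabilities
enableGenerativeAICapabilities: true   # parent switch for the gen AI features
enableSemanticEmbedding: true          # described above as required for Text-to-SQL
enableTextToSql: true
enableModelsOn: gpu                    # Text-to-SQL needs gpu or remote, not cpu
```

With enableModelsOn: gpu, the inference foundation models component (watsonx_ai_ifm) is installed automatically, so the GPU requirements for the selected model apply.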
Example combination of installation options:
enableAISearch: false
enableDataGovernanceCatalog: true
enableDataLineage: true
enableDataProduct: true
enableDataQuality: false
enableGenerativeAICapabilities: true
enableKnowledgeGraph: true
enableModelsOn: cpu
enableSemanticEmbedding: false
enableSemanticEnrichment: true
enableTextToSql: false
Generative AI capabilities per model deployment mode
Depending on your requirements, you must specify different combinations of installation parameters for your generative AI setup in addition to enableGenerativeAICapabilities: true.
Some further service-specific configuration might be required after the installation is complete.
Check the following table for possible configuration options:
| Model deployment mode | Capabilities | Additional settings in watsonx.data intelligence |
|---|---|---|
| enableModelsOn: cpu. For details, see | Generation of names and descriptions for tables and columns, and generation and assignment of business terms in metadata enrichment (Designing metadata enrichment) | Enable gen AI based capabilities in the metadata enrichment project settings. |
| enableModelsOn: gpu. For details, see | The installation parameters and for setting custom models are deprecated and will be removed in a future release. Instead, set custom models in . See Connecting to a remote watsonx.ai instance. | |
| enableModelsOn: remote. For details, see A connection to the remote watsonx.ai instance must be configured in . See Connecting to a remote watsonx.ai instance. | Custom models must be set in . | |
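To make the remote deployment mode more concrete, the following fragment sketches a possible selection for running gen AI based enrichment against a remote watsonx.ai instance. It assumes the flat key: value layout of the installation options in this topic and is an illustration, not a complete or recommended configuration.

```yaml
# Illustrative sketch only.
enableDataGovernanceCatalog: true
enableGenerativeAICapabilities: true
enableSemanticEnrichment: true   # at least one gen AI feature must be enabled
enableModelsOn: remote           # no local models are started in this mode
```

These settings alone do not establish the connection. The connection to the remote watsonx.ai instance is configured separately, and the required models must already be available and running in that instance.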
Certified foundation models for metadata enrichment and Text-to-SQL
- Metadata enrichment
  - meta-llama/llama-3-3-70b-instruct
  - openai/gpt-oss-120b
  - ibm/granite-4-h-small
  - ibm/granite-3-3-8b-instruct
  - ibm/granite-8b-code-instruct
- Text-to-SQL
  - meta-llama/llama-3-3-70b-instruct
  - openai/gpt-oss-120b
  - meta-llama/llama-4-maverick-17b-128e-instruct-fp8
Models that are identified as certified models have undergone evaluation with the gen AI capabilities in watsonx.data intelligence. Models that are not certified are not guaranteed to work as expected, and their accuracy and performance can vary.
- Supported foundation models in watsonx.ai on IBM Software Hub
- Supported foundation models in watsonx.ai as a Service
- Billing details for generative AI assets in watsonx.ai Runtime for models in watsonx.ai as a Service