Updating projects and metadata with a new embedding model after upgrading

After you have successfully upgraded watsonx.data intelligence to Version 2.4, update all projects that are enabled for natural language queries and all assets that were created through unstructured data processing so that they use the new multilingual embedding model for Text-to-SQL functionality.

Important: These post-upgrade steps apply only to deployments where generative AI capabilities are enabled.

In this product version, the embedding model that was used in earlier product versions is replaced with a new multilingual embedding model. To ensure continued functionality, existing data must be reprocessed with the new model.

Migrating projects that are enabled for natural language queries
Updating assets that stem from unstructured data processing

Migrating projects that are enabled for natural language queries

In projects that are enabled for natural language queries, you must repeat the onboarding of data assets. After you identify the projects that are enabled for natural language queries, complete the following steps for each of these projects:

Open the project.
Go to the Manage tab and open the Data intelligence settings.
Go to the Natural language queries section and disable natural language queries by clicking Disable.
In the confirmation window, click Disable to stop vectorization of metadata in this project.
In the Natural language queries section, re-enable natural language queries by clicking Enable. You might need to refresh the page before you can do that.
In the confirmation window, click Enable to start a new onboarding job and resume vectorization with the new model.
Wait for the onboarding job to complete successfully before you proceed to rerun any metadata enrichment or unstructured data curation flows. To monitor the job:
1. Go to the Jobs tab in the project and look for the new onboarding job.
2. To view the job details, click the job name.
3. Wait for the job run to show the status Completed.
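If you prefer to script the wait instead of watching the Jobs tab, the monitoring steps above can be sketched as a small polling loop. This is only an illustration: get_status is a hypothetical callable that you would supply yourself to return the current status of the onboarding job run, and the failure status names "Failed" and "Canceled" are assumptions. Only the status Completed comes from this procedure.

```python
import time


def wait_for_status(get_status, target="Completed",
                    failed=("Failed", "Canceled"),
                    interval_s=30.0, timeout_s=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns `target`.

    get_status is a hypothetical, caller-supplied function that returns the
    current job run status as a string. The names in `failed` are assumed
    and might differ in your deployment.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"Job run ended with status {status}")
        if waited >= timeout_s:
            raise TimeoutError(f"Job run did not reach {target} in {timeout_s} s")
        # Pause before the next check and track the accumulated waiting time.
        sleep(interval_s)
        waited += interval_s
```

Proceed with metadata enrichment or unstructured data curation flows only after the function returns; an exception means that the onboarding job needs attention first.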

Updating assets that stem from unstructured data processing

Reprocess all entity tables that were created through unstructured data curation or Unstructured Data Integration flows as derivative data assets to trigger computation of embeddings with the new model. Complete the following steps for each project that contains entity assets:

Open the project and go to the Assets page.
To identify the derivative data assets to update, you have several options:
- Depending on the overall content of your project and the naming of derivative assets, you might be able to identify such assets in the data asset list. To view the list, select Data > Data assets.
- Identify derivative data assets from document sets. Go to Data > Document sets, open each document set, and check the list of derivatives in the Output section.

Create a metadata enrichment asset with the profiling, data search, and metadata expansion options.

Submit a POST request to /v2/metadata_enrichment/metadata_enrichment_area?project_id=$PROJECT_ID&enrichImmediate=false with the following request body:

{
  "name": "$NAME",
  "job": {
    "name": "$JOB_NAME"
  },
  "objective": {
    "enrichment_options": {
      "structured": {
        "semantic_expansion": true,
        "analyze_quality": false,
        "assign_terms": false,
        "data_search": true,
        "profile": true,
        "analyze_relationships": false
      }
    },
    "governance_scope": [
      {
        "type": "category",
        "id": "$ANY_CATEGORY_ID"
      }
    ],
    "sampling": {
      "structured": {
        "method": "top",
        "analysis_method": "fixed",
        "sample_size": {
          "name": "BASIC",
          "options": {
            "row_number": 1000,
            "classify_value_number": 100
          }
        }
      }
    },
    "datascope_of_reruns": "all"
  }
}

The category ID must be a valid UUID.
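The request above can also be assembled programmatically. The following Python sketch uses only the standard library to build the same request body and URL. The function names, the base_url and token parameters, and the Bearer authorization header are assumptions for illustration; check how your deployment authenticates API calls. The sketch builds the request without sending it.

```python
import json
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def build_enrichment_payload(name, job_name, category_id):
    """Return the request body from this procedure as a Python dict."""
    # Raises ValueError if category_id cannot be parsed as a UUID.
    uuid.UUID(category_id)
    return {
        "name": name,
        "job": {"name": job_name},
        "objective": {
            "enrichment_options": {
                "structured": {
                    "semantic_expansion": True,
                    "analyze_quality": False,
                    "assign_terms": False,
                    "data_search": True,
                    "profile": True,
                    "analyze_relationships": False,
                }
            },
            "governance_scope": [{"type": "category", "id": category_id}],
            "sampling": {
                "structured": {
                    "method": "top",
                    "analysis_method": "fixed",
                    "sample_size": {
                        "name": "BASIC",
                        "options": {
                            "row_number": 1000,
                            "classify_value_number": 100,
                        },
                    },
                }
            },
            "datascope_of_reruns": "all",
        },
    }


def build_enrichment_request(base_url, token, project_id, payload):
    """Build (but do not send) the POST request.

    base_url and token are hypothetical inputs; the Bearer scheme is assumed.
    """
    query = urlencode({"project_id": project_id, "enrichImmediate": "false"})
    url = (f"{base_url.rstrip('/')}"
           f"/v2/metadata_enrichment/metadata_enrichment_area?{query}")
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

To submit the request, pass the returned object to urllib.request.urlopen. Validating the category ID locally catches a malformed value before the API call is made.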

After the metadata enrichment asset is created, open it.
Edit the metadata enrichment:
1. Click Edit enrichment, go to the Select scope step, and add all assets that you identified as derivative data assets.
2. Click the Review step, then click Save.
Click Start full enrichment and wait for the job to finish.

Learn more

IBM Knowledge Catalog API: Create a metadata enrichment area asset