IBM Data Cataloging MCP server

IBM Data Cataloging Service now includes a Model Context Protocol (MCP) server. The MCP server enables seamless interoperability between multiple large language models (LLMs) and a unified suite of cataloging and data management tools. Through a centralized control layer, it allows LLMs to securely connect, interact, and run operations by using tools that are registered within the MCP environment.

Integrated with IBM Data Cataloging, the MCP server enhances how AI-driven systems access, analyze, and organize metadata from diverse data sources. It provides a consistent interface for tool discovery, execution, and lifecycle management, reducing the need for direct integrations between individual LLMs and the underlying cataloging infrastructure.

Key capabilities

Natural language queries on Data Cataloging Service (DCS)
It enables the user to explore and query cataloged information by using natural language, without requiring knowledge of query syntax. This capability improves data accessibility and empowers business users to retrieve insights more intuitively.
LLM-based tag and label catalog suggestions
It assists the user in creating, managing, and discovering tags more efficiently. The LLM can use the MCP tools to suggest relevant tags based on dataset content and highlight existing tags across the catalog. This capability helps maintain metadata consistency, reduces manual effort, and enhances the overall accuracy of data classification.
Policy-driven auto-tagging and management
It allows the user to view existing catalog policies and create or run auto-tagging policies that are powered by AI. These policies automatically classify and label datasets based on predefined rules and model insights. This helps enforce governance standards, improves compliance, and ensures that metadata remains accurate and up to date over time.

Before you begin

Before you enable and use the MCP server within the IBM Data Cataloging environment, ensure that your system meets the following requirements. These prerequisites are necessary to authenticate with your cluster, enable the MCP components, and verify the available service endpoints.

Make sure that the following tools are installed and accessible from your local environment:
  • Red Hat OpenShift® CLI (oc)
  • curl
  • jq (used later to parse the MCP route output)

After verifying the prerequisites, log in to your OpenShift cluster by using the oc CLI.
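If you are not already logged in, a typical login uses a token from the OpenShift web console. The server URL and token below are placeholders, not values from this document:

```shell
# Placeholder values: take the actual API server URL and token from your
# cluster, for example from the console's "Copy login command" option.
oc login https://api.cluster.example.com:6443 --token=<token>
```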

Setting up the MCP server

To enable the MCP server capabilities, enable the required features on the Data Cataloging instance.

Note: The DCS_NS variable specifies the namespace where the DCS instance is deployed. The default is ibm-data-cataloging; adjust the value to match the namespace where your instance is installed.
Run the following command:
DCS_NS=ibm-data-cataloging

oc -n ${DCS_NS} patch SpectrumDiscover $(oc -n ${DCS_NS} get spectrumdiscover -o jsonpath='{.items[*].metadata.name}') \
  --type=merge -p '{
    "spec": {
      "enabled_features": {
        "natural-language-queries": {
          "enabled": true
        }
      }
    }
  }'

After the patch is applied, the SpectrumDiscover instance loads the MCP server with the new capabilities.
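To confirm that the flag is now set, you can read it back from the custom resource. This is a sketch: it assumes the spec layout shown in the patch above and uses JSONPath bracket notation for the hyphenated key:

```shell
# Sketch: read back the feature flag set by the patch above.
# Bracket notation handles the hyphenated key name in JSONPath.
oc -n ${DCS_NS} get spectrumdiscover \
  -o jsonpath="{.items[*].spec.enabled_features['natural-language-queries'].enabled}"
```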

Verifying that the MCP server is running

After enabling the features, verify that the MCP server pods are up and running.

Run the following command:
oc -n ${DCS_NS} get pod -l 'app=isd,role=dcs-ai-mcp-server'
Example output:
isd-dcs-ai-mcp-server 1/1 Running 0 3h

If you see pods in the Running state, the MCP server is successfully up and operational.

Connecting to the MCP server

After the MCP server is deployed and the prerequisites are met, you can retrieve the service endpoints that LLMs or other clients use to connect to it. The MCP server exposes both SSE and HTTP endpoints for integration.

Retrieve MCP endpoints on Linux/macOS
oc get routes -n ${DCS_NS} -o json \
  | jq -r '.items[] 
  | select(.metadata.name | test("mcp")) 
  | "Route: \(.metadata.name)\nHost: \(.spec.host)\nSSE Endpoint: https://\(.spec.host)/mcp\nHTTP Endpoint: https://\(.spec.host)/mcp/http\n"'
Example output:
Route: dcs-mcp-route
Host: dcs-mcp-route-ibm-data-cataloging.apps.example.com
SSE Endpoint: https://dcs-mcp-route-ibm-data-cataloging.apps.example.com/mcp
HTTP Endpoint: https://dcs-mcp-route-ibm-data-cataloging.apps.example.com/mcp/http
The command output includes the following fields:
  • Route: The OpenShift route for the MCP server.
  • Host: The host URL assigned by OpenShift.
  • SSE Endpoint: The URL to connect through server-sent events.
  • HTTP Endpoint: The URL to connect through standard HTTP requests.

These endpoints are what you use to integrate LLMs or other clients with the MCP server.

Initialization

Example of how to initialize the MCP server:

curl -i -X POST \
  <MCP http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {}
  }' --insecure

Replace all placeholder values before sending the request.

Example output:

HTTP/1.1 200 OK
access-control-allow-headers: Content-Type
access-control-allow-methods: GET, POST, OPTIONS
access-control-allow-origin: *
cache-control: no-cache, no-transform
content-type: text/event-stream
mcp-session-id: xxxxxxxxxxxxxxxxxxxx
date: Tue, 16 Dec 2025 19:17:50 GMT
transfer-encoding: chunked
set-cookie: xxxxxxxxxxxxxxxxxxxxxxxx; path=/; HttpOnly; Secure; SameSite=None

Save the mcp-session-id value; it is required for all subsequent requests.
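In scripts, the header can be captured with standard text tools. A minimal sketch, run against a saved copy of the response headers; the file name and session value are illustrative:

```shell
# Illustrative response headers saved to a file (headers.txt is hypothetical).
cat > headers.txt <<'EOF'
HTTP/1.1 200 OK
content-type: text/event-stream
mcp-session-id: abc123session
EOF

# Extract the mcp-session-id header value, stripping any trailing CR.
MCP_SESSION_ID=$(awk -F': ' 'tolower($1) == "mcp-session-id" {print $2}' headers.txt | tr -d '\r')
echo "${MCP_SESSION_ID}"   # prints abc123session
```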

List Tools

Use the tools/list method to list the tools that are available on the MCP server:
curl -X POST \
  <MCP http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
    "params": {}
  }' \
  --insecure

Replace all placeholder values before sending the request.
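When scripting several calls, generating the request body with jq (the same tool used to parse the routes earlier) avoids hand-editing inline JSON. A sketch for the tools/list request:

```shell
# Sketch: build the JSON-RPC 2.0 body for tools/list with jq instead of
# writing the JSON inline.
PAYLOAD=$(jq -n '{jsonrpc: "2.0", id: 1, method: "tools/list", params: {}}')
echo "${PAYLOAD}" | jq -r '.method'   # prints tools/list
```

The resulting PAYLOAD string can be passed to curl with `-d "${PAYLOAD}"`.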

MCP server - Tools

The following section describes the new tools that are introduced in the MCP server. These tools are designed to improve interaction with data sources and to manage credentials, policies, and tagging. Each tool serves a specific purpose and can be used independently or together with others to optimize data management and automation workflows.

dcs_set_credentials

The Set Credentials tool allows the user to configure the credentials that are required to connect the MCP server to the Data Cataloging Service.

Tool name: dcs_set_credentials

Arguments: DCS URL, Username, Password
Request:
curl -X POST \
  <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_set_credentials", "arguments": { "cluster_url": "<DCS URL>", "username": "user", "password": "password" } } }' \
  --insecure

Arguments: DCS URL, Authentication token
Request:
curl -X POST \
  <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_set_credentials", "arguments": { "cluster_url": "<DCS URL>", "token": "<token>" } } }' \
  --insecure

Usage: Use this tool when you start a new session or when you need to update your connection credentials.

Replace all placeholder values before sending the request.

Obtaining a username and password:

You can retrieve the username and password by using the following commands:

macOS
#User
oc get secret keystone -n ${DCS_NS} -o jsonpath="{.data.user}" | base64 -D; echo

#Password
oc get secret keystone -n ${DCS_NS} -o jsonpath="{.data.password}" | base64 -D; echo
Linux
#User
oc get secret keystone -n ${DCS_NS} -o jsonpath="{.data.user}" | base64 --decode; echo

#Password
oc get secret keystone -n ${DCS_NS} -o jsonpath="{.data.password}" | base64 --decode; echo
Obtaining DCS URL:
Retrieve the DCS URL by using the following command:
DCS_CONSOLE=$(oc -n ${DCS_NS} get route console -o jsonpath='{"https://"}{.spec.host}')
echo $DCS_CONSOLE
Obtaining token:
Retrieve the token by using the following command:
curl -k -I -u <username>:<password> ${DCS_CONSOLE}/auth/v1/token | grep x-auth-token
Response:
x-auth-token: <token>
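To reuse the token in later tool calls, you can capture the header value into a shell variable. A sketch; replace the username and password placeholders as above:

```shell
# Sketch: capture the x-auth-token header into a variable for later
# dcs_set_credentials calls. <username> and <password> are placeholders.
TOKEN=$(curl -k -s -I -u <username>:<password> ${DCS_CONSOLE}/auth/v1/token \
  | awk -F': ' 'tolower($1) == "x-auth-token" {print $2}' | tr -d '\r')
echo "${TOKEN}"
```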

dcs_file_search

The File Search tool enables you to search across cataloged data sources by using natural language queries. It automatically converts user input into structured queries that retrieve relevant files, datasets, or metadata entries.

This capability allows you to find information quickly without needing to understand query syntax or underlying database languages. By bridging natural language with structured search, File Search improves accessibility for both technical and non-technical users, and accelerates data discovery.

Tool name: dcs_file_search

Argument: Query
Request:
curl -X POST \
  <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_file_search", "arguments": { "query": "SELECT * FROM METAOCEAN LIMIT 1" } } }' \
  --insecure

Replace all placeholder values before sending the request. Replace the query as needed.

Example prompt:
Show all CSV files containing customer transactions from 2023
The tool retrieves matching entries from the registered data sources.

dcs_get_registered_tags

The Get Registered Tags tool retrieves a complete list of all tags that are registered in the cataloging system. It helps users and AI agents review existing classifications before creating or suggesting new ones, ensuring that tag definitions remain consistent and non-duplicative.

This tool is useful for maintaining a clean and standardized tagging structure across multiple datasets and departments.

Tool name: dcs_get_registered_tags

Request:
curl -X POST \
  <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_get_registered_tags", "arguments": { } } }' \
  --insecure

Replace all placeholder values before sending the request.

Example use case:

Before adding a new Financial tag, you can check whether a similar tag exists in the catalog.

dcs_get_recommend_tags

The Recommend Tags tool uses LLM-powered analysis to suggest relevant tags for cataloged files and datasets. By examining metadata and contextual information, the tool recommends tags that help classify data more accurately and consistently.

This feature significantly reduces the time that is spent on manual tagging and enhances catalog quality by ensuring that all datasets are properly labeled.

Tool name: dcs_get_recommend_tags

Argument: Query
Request:
curl -X POST \
  <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_get_recommend_tags", "arguments": { "query": "SELECT * FROM METAOCEAN LIMIT 10" } } }' \
  --insecure

Replace all placeholder values before sending the request. Replace the query as needed.

Example prompt:
Recommend me some tags to classify my CSV files
The tool returns tag recommendations based on the metadata.

dcs_create_tag

The Create Tag tool enables you to define and create new tags.
Tool name: dcs_create_tag

Arguments: tagName, type
Request:
curl -X POST \
  <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_create_tag", "arguments": { "tagName": "TEST", "type": "VARCHAR(256)" } } }' \
  --insecure

Replace all placeholder values before sending the request.

Example prompt:
Create the tags that you recommended.
The tool creates the tags based on the recommendations.

dcs_get_policies

The Get Policies tool retrieves and lists all existing cataloging policies within the system. Policies define automated rules or governance conditions that are applied to data, metadata, or tags.

This tool provides visibility into the active policies that govern cataloged assets. It also helps users understand which rules are being enforced and how they affect data classification or compliance workflows.

Tool name: dcs_get_policies

Request:
curl -X POST \
  <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_get_policies", "arguments": { } } }' \
  --insecure

Replace all placeholder values before sending the request.

Example use case:

A data steward can use this tool to view all current auto-tagging and compliance policies before creating new ones.

dcs_create_policy

The Create AutoTag Policy tool enables you to define and deploy new auto-tagging policies that automatically assign tags to cataloged assets based on predefined rules and AI insights. It streamlines governance by automating repetitive classification tasks and ensuring that policies remain consistent across data sources.

This tool currently supports auto-tag policy types, allowing AI models to dynamically apply tags based on metadata patterns or user-defined contextual rules.

Tool name: dcs_create_policy

Arguments: polFilter, policyName, schedule, tags
Request:
curl -X POST \
  <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_create_policy", "arguments": { "polFilter": "filetype='\''pdf'\''", "policyName": "TEST", "schedule": "NOW", "tags": {"tagtest": "valtest"} } } }' \
  --insecure

Replace all placeholder values before sending the request.

Example prompt:
Explore the catalog for security-related flaws and recommend 5 different tags with proposed values that can be used to identify and classify the data in the catalog. Make sure they are not duplicates.