IBM Data Cataloging MCP server
IBM Data Cataloging Service now offers a Model Context Protocol (MCP) server. It is designed to enable seamless interoperability between multiple large language models (LLMs) and a unified suite of cataloging and data management tools. Through a centralized control layer, the MCP server allows LLMs to securely connect, interact, and run operations by using the tools that are registered within the MCP environment.
Integrated with IBM Data Cataloging, the MCP server enhances how AI-driven systems access, analyze, and organize metadata from diverse data sources. It provides a consistent interface for tool discovery, execution, and lifecycle management, reducing the need for direct integrations between individual LLMs and the underlying cataloging infrastructure.
Key capabilities
- Natural language queries on Data Cataloging Services (DCS)
- It enables the user to explore and query cataloged information by using natural language, without requiring knowledge of query syntax. This capability improves data accessibility and empowers business users to retrieve insights more intuitively.
- LLM-based tag and label suggestions for the catalog
- It assists the user in creating, managing, and discovering tags more efficiently. The LLM can use the MCP tools to suggest relevant tags based on dataset content and highlight existing tags across the catalog. This capability helps maintain metadata consistency, reduces manual effort, and enhances the overall accuracy of data classification.
- Policy-driven auto-tagging and management
- It allows the user to view existing catalog policies and create or run auto-tagging policies that are powered by AI. These policies automatically classify and label datasets based on predefined rules and model insights. This helps enforce governance standards, improves compliance, and ensures that metadata remains accurate and up to date over time.
Before you begin
Before you enable and use the MCP server within the IBM Data Cataloging environment, ensure that your system meets the following requirements. These prerequisites are necessary to authenticate with your cluster, enable the MCP components, and verify the available service endpoints.
- Red Hat OpenShift® CLI (oc)
- curl
After verifying the prerequisites, log in to your OpenShift cluster by using the oc CLI.
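For example, you can log in with an API token or with a user name and password; the server URL and credentials below are placeholders for your environment.
# Log in with an API token (placeholder values)
oc login --token=<token> --server=https://<openshift-api-url>:6443
# Alternatively, log in with a user name and password
oc login -u <username> -p <password> --server=https://<openshift-api-url>:6443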
Setting up the MCP server
To enable the MCP server capabilities, enable the required features on the Data Cataloging instance.
The DCS_NS variable specifies the namespace where the DCS instance is deployed. By default, it is ibm-data-cataloging; adjust it to the namespace where your instance is installed.
DCS_NS=ibm-data-cataloging
oc -n ${DCS_NS} patch SpectrumDiscover $(oc -n ${DCS_NS} get spectrumdiscover -o jsonpath='{.items[*].metadata.name}') \
  --type=merge -p '{
  "spec": {
    "enabled_features": {
      "natural-language-queries": {
        "enabled": true
      }
    }
  }
}'
After the patch is successfully applied, the SpectrumDiscover instance loads the MCP server with the new capabilities.
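To confirm that the feature flag was applied, you can inspect the SpectrumDiscover resource. This is a quick-check sketch that simply searches the resource YAML for the feature name.
# Sketch: confirm that the natural-language-queries feature is enabled
oc -n ${DCS_NS} get spectrumdiscover -o yaml | grep -A 2 "natural-language-queries"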
- Verifying that the MCP server is running
- After you enable the features, verify that the MCP server pods are up and running. Run the following command:
oc -n ${DCS_NS} get pod -l 'app=isd,role=dcs-ai-mcp-server'
Example output:
isd-dcs-ai-mcp-server   1/1   Running   0   3h
If the pods are in the Running state, the MCP server is successfully up and operational.
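If the pods are not in the Running state, you can inspect the MCP server logs by reusing the same label selector; this is a troubleshooting sketch.
# Sketch: show recent MCP server logs
oc -n ${DCS_NS} logs -l 'app=isd,role=dcs-ai-mcp-server' --tail=50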
Connecting to the MCP server
After the MCP server is deployed and the prerequisites are met, you can retrieve the service endpoints so that LLMs can connect to the server and interact with its tools. The MCP server exposes both SSE and HTTP endpoints for integration.
- Retrieve the MCP endpoints on Linux/macOS
oc get routes -n ${DCS_NS} -o json \
  | jq -r '.items[] | select(.metadata.name | test("mcp")) | "Route: \(.metadata.name)\nHost: \(.spec.host)\nSSE Endpoint: https://\(.spec.host)/mcp\nHTTP Endpoint: https://\(.spec.host)/mcp/http\n"'
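For the curl examples in the following sections, it can be convenient to store the HTTP endpoint in a shell variable. This is a minimal sketch that assumes a single route whose name contains "mcp".
# Sketch: capture the MCP HTTP endpoint for later requests
MCP_HTTP_ENDPOINT="https://$(oc get routes -n ${DCS_NS} -o json | jq -r '.items[] | select(.metadata.name | test("mcp")) | .spec.host')/mcp/http"
echo "${MCP_HTTP_ENDPOINT}"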
Initialization
Example of how to initialize the MCP server:
curl -i -X POST \
<MCP http endpoint> \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {}
}' --insecure
Replace all placeholder values before sending the request.
Example Output:
HTTP/1.1 200 OK
access-control-allow-headers: Content-Type
access-control-allow-methods: GET, POST, OPTIONS
access-control-allow-origin: *
cache-control: no-cache, no-transform
content-type: text/event-stream
mcp-session-id: xxxxxxxxxxxxxxxxxxxx
date: Tue, 16 Dec 2025 19:17:50 GMT
transfer-encoding: chunked
set-cookie: xxxxxxxxxxxxxxxxxxxxxxxx; path=/; HttpOnly; Secure; SameSite=None
Save the mcp-session-id value; it is required for all subsequent requests.
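If you script the handshake, you can capture the mcp-session-id response header directly into a shell variable. This sketch assumes the MCP_HTTP_ENDPOINT variable from the previous step and uses --insecure because the route certificate might be self-signed.
# Sketch: initialize a session and capture the mcp-session-id header
MCP_SESSION_ID=$(curl -si -X POST "${MCP_HTTP_ENDPOINT}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {} }' \
  --insecure | awk 'tolower($1) == "mcp-session-id:" {print $2}' | tr -d '\r')
echo "${MCP_SESSION_ID}"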
List Tools
curl -X POST \
<MCP http endpoint> \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "mcp-session-id: <mcp-session-id>" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/list",
"params": {}
}' \
--insecure
Replace all placeholder values before sending the request.
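Because the response is returned as a server-sent event stream, the JSON-RPC payload arrives on data: lines. As a sketch, assuming the standard MCP tools/list result shape and the shell variables captured earlier, you can extract the tool names with jq:
# Sketch: list the available tool names
curl -s -X POST "${MCP_HTTP_ENDPOINT}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: ${MCP_SESSION_ID}" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {} }' \
  --insecure | sed -n 's/^data: //p' | jq -r '.result.tools[].name'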
MCP server - Tools
The following section describes the new tools that are introduced with the MCP server. These tools are designed to improve interaction with data sources and to manage credentials, policies, and tagging. Each tool serves a specific purpose and can be used independently or together to optimize data management and automation workflows.
dcs_set_credentials
The Set Credentials tool allows the user to configure the credentials that are required to connect the MCP server to the Data Cataloging Service.
Tool name: dcs_set_credentials
Arguments: DCS URL, Username, Password
Request:
curl -X POST <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_set_credentials", "arguments": { "cluster_url": "<DCS URL>", "username": "user", "password": "password" } } }' \
  --insecure
Arguments: DCS URL, Authentication token
Request:
curl -X POST <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_set_credentials", "arguments": { "cluster_url": "<DCS URL>", "token": "<token>" } } }' \
  --insecure
Usage: It is used when starting a new session or when you need to update your connection credentials.
Replace all placeholder values before sending the request.
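For example, after capturing MCP_HTTP_ENDPOINT and MCP_SESSION_ID as shown earlier, the token-based variant of this request might look like the following sketch; the DCS URL and token values are placeholders.
# Sketch: set credentials by using the previously captured endpoint and session ID
curl -X POST "${MCP_HTTP_ENDPOINT}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: ${MCP_SESSION_ID}" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_set_credentials", "arguments": { "cluster_url": "<DCS URL>", "token": "<token>" } } }' \
  --insecure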
dcs_file_search
The File Search tool enables you to search across cataloged data sources by using natural language queries. It automatically converts user input into structured queries that retrieve relevant files, datasets, or metadata entries.
This capability allows you to find information quickly without needing to understand query syntax or underlying database languages. By bridging natural language with structured search, File Search improves accessibility for both technical and non-technical users, and accelerates data discovery.
Tool name: dcs_file_search
Argument: Query
Request:
curl -X POST <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_file_search", "arguments": { "query": "SELECT * FROM METAOCEAN LIMIT 1" } } }' \
  --insecure
Replace all placeholder values before sending the request. Replace the query as needed.
Example prompt:
Show all CSV files containing customer transactions from 2023
The tool retrieves matching entries from the registered data sources.
dcs_get_registered_tags
The Get Registered Tags tool retrieves a complete list of all tags that are registered in the cataloging system. It helps users and AI agents review existing classifications before creating or suggesting new ones, ensuring that tag definitions remain consistent and non-duplicative.
This tool is useful for maintaining a clean and standardized tagging structure across multiple datasets and departments.
Tool name: dcs_get_registered_tags
Request:
curl -X POST <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_get_registered_tags", "arguments": { } } }' \
  --insecure
Replace all placeholder values before sending the request.
Example use case:
Before adding a new Financial tag, you can check whether a similar tag exists in the catalog.
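As a sketch of that check, assuming the tag list is returned as text in the SSE data payload and the shell variables captured earlier, you can search the response for an existing tag name:
# Sketch: look for an existing tag whose name contains "financial"
curl -s -X POST "${MCP_HTTP_ENDPOINT}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: ${MCP_SESSION_ID}" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_get_registered_tags", "arguments": { } } }' \
  --insecure | sed -n 's/^data: //p' | grep -io "financial[a-z_]*" || echo "No similar tag found"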
dcs_get_recommend_tags
The Recommend Tags tool uses LLM-powered analysis to suggest relevant tags for cataloged files and datasets. By examining metadata and contextual information, the tool recommends tags that help classify data more accurately and consistently.
This feature significantly reduces the time that is spent on manual tagging and enhances catalog quality by ensuring that all datasets are properly labeled.
Tool name: dcs_get_recommend_tags
Argument: Query
Request:
curl -X POST <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_get_recommend_tags", "arguments": { "query": "SELECT * FROM METAOCEAN LIMIT 10" } } }' \
  --insecure
Replace all placeholder values before sending the request. Replace the query as needed.
Example prompt:
Recommend me some tags to classify my CSV files
The tool retrieves tag recommendations based on the metadata.
dcs_create_tag
The Create Tag tool enables you to define and create new tags.
Tool name: dcs_create_tag
Arguments: tagName, type
Request:
curl -X POST <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_create_tag", "arguments": { "tagName": "TEST", "type": "VARCHAR(256)" } } }' \
  --insecure
Replace all placeholder values before sending the request.
Example prompt:
Create the tags that you recommended
The tool creates tags based on the recommendations.
dcs_get_policies
The Get Policies tool retrieves and lists all existing cataloging policies within the system. Policies define automated rules or governance conditions that are applied to data, metadata, or tags.
This tool provides visibility into the active policies that govern cataloged assets. It also helps users understand which rules are being enforced and how they affect data classification or compliance workflows.
Tool name: dcs_get_policies
Request:
curl -X POST <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_get_policies", "arguments": { } } }' \
  --insecure
Replace all placeholder values before sending the request.
Example use case:
A data steward can use this tool to view all current auto-tagging and compliance policies before creating new ones.
dcs_create_policy
The Create AutoTag Policy tool enables you to define and deploy new auto-tagging policies that automatically assign tags to cataloged assets based on predefined rules and AI insights. It streamlines governance by automating repetitive classification tasks and ensuring that policies remain consistent across data sources.
This tool currently supports auto-tag policy types, allowing AI models to dynamically apply tags based on metadata patterns or user-defined contextual rules.
Tool name: dcs_create_policy
Arguments: polFilter, policyName, schedule, tags
Request:
curl -X POST <mcp http endpoint> \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: <mcp-session-id>" \
  -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "dcs_create_policy", "arguments": { "polFilter": "filetype='\''pdf'\''", "policyName": "TEST", "schedule": "NOW", "tags": { "tagtest": "valtest" } } } }' \
  --insecure
The single quotation marks in the polFilter value are escaped for the shell so that the filter is sent as filetype='pdf'. Replace all placeholder values before sending the request.
Example prompt:
Explore the catalog for security-related flaws and recommend 5 different tags with proposed values that can be used to identify and classify the data in the catalog. Make sure that they are not duplicates.