Integrating data with an agent
Build, run, and operationalize data flows by using natural language with Agentic Data Integration.
Tech preview: This is a technology preview and is not yet supported for use in production environments.
Use Agentic Data Integration to create production-grade data flows without data engineering expertise. Instead of manually configuring each stage of a flow, you describe your data requirements in natural language. Your AI agent converts those requirements into a data flow. After you review and approve the flow, the agent can run, monitor, and optionally schedule a job for the flow.
The agent can help you throughout the data integration lifecycle by performing the following tasks:
- Explore the data assets in your project.
- Suggest sources, transformations, optimizations, and targets.
- Explain the reasoning behind flow decisions.
- Generate DataStage or StreamSets flows based on your request.
- Run, monitor, and schedule jobs for the generated flows.
Agentic Data Integration automates many common tasks in IBM watsonx.data integration. For complex or highly customized data flows, you might need to make manual adjustments in the flow canvas in the UI.
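The review-and-approve step described above can be pictured as a small state machine. The following is a minimal sketch, not the product's implementation: the `ProposedFlow` class, the `FlowState` values, and the method names are all hypothetical and exist only to illustrate that a generated flow is not run or scheduled until you approve it.

```python
from enum import Enum, auto
from typing import Optional


class FlowState(Enum):
    """Hypothetical lifecycle states for a flow generated by the agent."""
    DRAFT = auto()      # generated from the request, waiting for review
    APPROVED = auto()   # reviewed and approved by the user
    RUNNING = auto()    # started as a job


class ProposedFlow:
    """Illustrative stand-in for a generated flow (not a real API)."""

    def __init__(self, requirement: str) -> None:
        self.requirement = requirement
        self.state = FlowState.DRAFT
        self.schedule: Optional[str] = None

    def approve(self) -> None:
        # The user reviews the generated flow and approves it.
        self.state = FlowState.APPROVED

    def run(self) -> None:
        # Running is only allowed after approval.
        if self.state is not FlowState.APPROVED:
            raise PermissionError("Review and approve the flow before running it.")
        self.state = FlowState.RUNNING

    def set_schedule(self, schedule: str) -> None:
        # Scheduling is optional, and also requires an approved flow.
        if self.state is FlowState.DRAFT:
            raise PermissionError("Review and approve the flow before scheduling it.")
        self.schedule = schedule


flow = ProposedFlow("Copy new orders from the sales database to the warehouse")
flow.approve()
flow.run()
flow.set_schedule("daily")
print(flow.state.name, flow.schedule)
```

Calling `run()` on a flow that is still in the `DRAFT` state raises an error in this sketch, which mirrors the requirement that you review the flow first.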
Requirements
The following requirements exist for Agentic Data Integration:
- Cloud platforms: Agentic Data Integration is not available in all regions on all cloud platforms. See Regional availability.
- Required service: IBM watsonx.data integration
- Required roles:
  - To explore existing project assets with an agent, you must have the Viewer role in the project.
  - To build and edit data flows and to run jobs with an agent, you must have the Editor or Admin role in the project.
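The role requirements can be read as a simple lookup from action to permitted roles. The following is a minimal sketch for illustration only: the action names and the `can_perform` function are hypothetical, and the sketch assumes that the Editor and Admin roles also include Viewer-level access, which this topic does not state explicitly.

```python
from enum import Enum


class Role(Enum):
    VIEWER = "Viewer"
    EDITOR = "Editor"
    ADMIN = "Admin"


# Hypothetical action names, used only in this sketch.
EXPLORE_ASSETS = "explore_assets"
BUILD_OR_EDIT_FLOW = "build_or_edit_flow"
RUN_JOB = "run_job"

ALLOWED_ROLES = {
    # Assumption: Editor and Admin inherit Viewer-level access.
    EXPLORE_ASSETS: {Role.VIEWER, Role.EDITOR, Role.ADMIN},
    BUILD_OR_EDIT_FLOW: {Role.EDITOR, Role.ADMIN},
    RUN_JOB: {Role.EDITOR, Role.ADMIN},
}


def can_perform(role: Role, action: str) -> bool:
    """Return True if the role is permitted to perform the action."""
    return role in ALLOWED_ROLES.get(action, set())


print(can_perform(Role.VIEWER, EXPLORE_ASSETS))  # True
print(can_perform(Role.VIEWER, RUN_JOB))         # False
print(can_perform(Role.EDITOR, RUN_JOB))         # True
```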
Data formats
Agentic Data Integration can process structured data, such as data stored in relational databases or CSV files.
Capabilities
Converse with your AI agent by using natural language to complete data integration tasks, including:
- Discovering projects that you have access to.
- Retrieving details about the project assets, such as connections, data assets, jobs, DataStage flows, StreamSets flows, and StreamSets environments and engines.
- Creating and editing DataStage flows.
- Exploring options to optimize DataStage flow performance.
- Creating StreamSets environments and flows.
- Starting DataStage and StreamSets flows as jobs and monitoring job run progress.
- Canceling job runs and deleting jobs.
- Scheduling DataStage jobs.
Ask your agent which tasks it can complete. If a task is not supported, complete the task directly in the IBM watsonx platform UI.
Skills and tools architecture
When you describe a data requirement, Agentic Data Integration interprets your request, selects the appropriate skills, and orchestrates the required tools to complete the task.
Agentic Data Integration exposes watsonx.data integration capabilities through a skills and tools architecture:
- Skills: Skills provide information and context to the agent. For example, the DataStage flow optimization skill provides best practices for improving DataStage flow performance, including partitioning, sorting, stage selection, and tuning.
- Tools: Tools complete actions, such as creating flow definitions, running jobs, and validating results. For example, the create DataStage flow tool is used to create or edit DataStage flows.
Together, skills and tools enable the agent to understand your request and take meaningful action.
Ask your agent to list all available skills and tools.
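The split between skills (context) and tools (actions) can be sketched as a small registry. This is a conceptual illustration under stated assumptions, not the product's API: the `Skill`, `Tool`, and `Agent` classes and the `create_flow` function are hypothetical, and the skill and tool are selected explicitly here, whereas the real agent selects them by interpreting your request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Skill:
    """Provides information and context; performs no action."""
    name: str
    guidance: str


@dataclass
class Tool:
    """Completes an action, such as creating a flow definition."""
    name: str
    action: Callable[[dict], dict]


@dataclass
class Agent:
    skills: Dict[str, Skill] = field(default_factory=dict)
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register_skill(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def register_tool(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def list_capabilities(self) -> Dict[str, List[str]]:
        # Analogous to asking the agent to list its skills and tools.
        return {"skills": sorted(self.skills), "tools": sorted(self.tools)}

    def handle(self, skill_name: str, tool_name: str, request: dict) -> dict:
        # The skill adds context to the request; the tool then acts on it.
        enriched = {**request, "guidance": self.skills[skill_name].guidance}
        return self.tools[tool_name].action(enriched)


def create_flow(request: dict) -> dict:
    """Hypothetical tool body: returns a description of the created flow."""
    return {
        "flow_name": request["flow_name"],
        "status": "created",
        "applied_guidance": request["guidance"],
    }


agent = Agent()
agent.register_skill(Skill(
    "datastage_flow_optimization",
    "Consider partitioning, sorting, stage selection, and tuning.",
))
agent.register_tool(Tool("create_datastage_flow", create_flow))

result = agent.handle(
    "datastage_flow_optimization",
    "create_datastage_flow",
    {"flow_name": "orders_to_warehouse"},
)
print(agent.list_capabilities())
print(result["status"])
```

Keeping guidance separate from actions means that, in this sketch, the same tool can be combined with different skills without changing the tool itself.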
Workflow
Working with Agentic Data Integration includes these basic steps:
- Set up a project.
  Add connections and data assets to your project. The agent uses these assets to determine the appropriate data sources and output schema for your request. See Setting up a project for an agent.
- Request data from the agent.
  Describe your data requirements to Data Integration Agent in natural language. See Integrating data with Data Integration Agent.