Build data pipelines as code: Introducing the IBM watsonx.data integration Python SDK

The general availability of the watsonx.data integration Python SDK represents a key milestone in IBM’s vision for an AI-ready data foundation, enabling data teams to scale pipeline development and power agents with high-quality data. 

The watsonx.data integration Python SDK introduces a code-first model that builds on existing Python skills and gives agents a consistent interface for code generation and validation. As data teams prepare for agentic AI, pipeline development needs an option that is friendly to LLM generation.

The Python SDK enables that shift by allowing teams to build, version, automate and govern batch and real-time streaming pipelines as code, reducing manual effort and enabling scalable data integration. Together with our continued investment in agentic pipeline authoring (in preview), this release reinforces IBM’s commitment to meeting clients where they are as they build AI-ready data foundations.

Meeting the demands of agentic AI requires flexible pipeline development

Every organization feels the strain of today’s data landscape: business teams need faster insights, data teams are stretched thin by brittle and fragmented systems, and compliance leaders worry about sensitive data slipping through the cracks. These pressures intensify with the rise of agentic AI, where success depends not just on powerful models, but on the strength of the data foundation beneath them.

At the core of that foundation is data integration: the pipelines that connect, transform and deliver data so it can be trusted and used. When integration falters, AI fails. According to MIT’s The GenAI Divide, 95% of generative AI pilots fail not because models fall short, but because the data foundation isn’t ready. At the same time, data teams are being asked to build and manage more pipelines across more data types and environments, even as 77% of organizations report a shortage of the required skills.

This growing gap between demand and capacity makes it clear that pipeline development must be flexible, meeting users where they are. Traditional authoring is no longer enough. Business users want to express intent through natural language. Technical practitioners want code. And many teams rely on a visual canvas for rapid design.

IBM is investing deeply in this multimodal approach so watsonx.data integration can support every user in their preferred workflow.

A code-first approach to building data pipelines

The new IBM watsonx.data integration Python SDK is a major step forward in that vision, as it gives developers and data engineers a powerful code-first way to build, automate and maintain pipelines programmatically, reducing manual effort and accelerating time to value.

Data engineers and ETL developers have long valued having a choice in how they build data pipelines, whether through visual no-code/low-code interfaces or directly in code. Regardless of authoring style, pipelines can be defined once, versioned in Git, and deployed consistently through CI/CD workflows. Each approach serves different needs and skill sets within data teams.

Now, with the Python SDK, teams can author and manage data integration pipelines using one of the most widely adopted languages in data engineering. Because data engineers are already comfortable reading, writing, and reviewing Python code, they can apply those same skills directly to IBM watsonx.data integration. Pipelines as code will also unlock new paths for code reuse. With the Python SDK available, data teams can choose the authoring option that best aligns with their skills and preferences.

With the SDK, teams can:

1. Treat pipelines as code (see the sketch after this list):

  • Define and reuse pipeline logic in Python across environments
  • Version, review and audit changes through Git and pull requests
  • Create connections and design, manage and execute pipelines entirely in code
  • Automate testing, promotions and deployments with CI/CD
  • Enforce consistent governance and access controls programmatically

2. Access a unified data integration experience with one SDK:

  • Use a single SDK for both batch (ETL/ELT/TETL) and real-time streaming pipelines
  • Eliminate custom scripts and tool-specific packages with one consistent programming model
  • Adopt a programming model built to extend across additional integration styles, including unstructured data, replication and more
  • Streamline platform administration with programmatic control over users, projects and security settings

3. Move between visual design and code with a two-way bridge:

  • Prototype pipelines in the visual canvas or author them directly in Python
  • Move seamlessly between UI and code with instant export and import through our Python SDK code generator
  • Accelerate onboarding while enabling automation and CI/CD at scale
  • Keep visual and programmatic workflows tightly connected
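
To make that concrete, here is a minimal sketch of what pipelines as code could look like with the SDK. The import path and every name in it (wxdi_sdk, Platform, create_connection, create_flow and so on) are assumptions made for illustration, not the SDK's actual interface; the sample scripts and documentation linked at the end of this post show the real API.

```python
# Minimal sketch only: the import path and every class/method name here
# (wxdi_sdk, Platform, create_connection, create_flow, publish, run) are
# illustrative assumptions, not the SDK's documented API.
from wxdi_sdk import Platform  # hypothetical import path

# Authenticate and scope the client to a project.
platform = Platform(api_key="YOUR_API_KEY", project="sales-analytics")

# Create connections in code rather than through UI forms.
source = platform.create_connection(
    name="orders_db",
    type="postgresql",
    properties={"host": "db.example.com", "database": "orders"},
)
target = platform.create_connection(
    name="lakehouse",
    type="s3",
    properties={"bucket": "analytics-bucket"},
)

# Define a simple batch flow: read, filter, write.
flow = platform.create_flow(name="daily_orders")
flow.read(source, table="orders") \
    .filter("status = 'COMPLETED'") \
    .write(target, path="curated/orders/")

# Publish and run; this script itself is what gets versioned in Git
# and promoted through CI/CD.
flow.publish()
run = flow.run()
print(run.status)
```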

Together, these capabilities lay the groundwork for the next era of data integration, where pipelines behave like software, automation is the default and future AI agents can reason about, optimize and even maintain data flows at scale.

Real-world patterns: How teams use the Python SDK to scale integration work

While the SDK introduces a programmatic approach to pipeline development, its impact is most visible in how teams apply it day to day. Early adopters are converging on a set of common patterns that help them scale faster, reduce duplication, and operate with greater consistency.

Use case 1: Turning a single pipeline into a reusable template

A common starting point is a simple UI-built pipeline: for example, one that ingests a CSV, applies a transformation and writes results to cloud storage. As demand grows, other teams want the same logic with different inputs.

With the Python SDK, that original pipeline can be exported into Python using our new Python code generation feature and turned into a reusable, parameterized template. The new Parameter Sets and Value Sets SDK features let you move these configurations out of the UI and into version control. Instead of manually typing values into forms, you can programmatically define and inject configurations for Dev, Test and Prod environments in one go, as the sketch below shows. Variations are created by adjusting a few lines of code rather than redesigning the pipeline from scratch, resulting in faster delivery, fewer errors and a scalable pattern teams can standardize on.
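
The sketch below shows what that template pattern might look like in practice. The names used (Platform, ParameterSet, ValueSet, export_python) are again illustrative assumptions rather than the documented interface.

```python
# Illustrative sketch: names (Platform, ParameterSet, ValueSet,
# export_python, run) are assumptions, not the documented API.
from wxdi_sdk import Platform, ParameterSet, ValueSet  # hypothetical import path

platform = Platform(api_key="YOUR_API_KEY", project="ingestion")

# Export the UI-built pipeline to Python source with the code generator,
# so the template can live in version control.
with open("csv_ingest_flow.py", "w") as f:
    f.write(platform.get_flow("csv_ingest").export_python())  # hypothetical call

# Declare the values the template exposes instead of hard-coding them.
params = ParameterSet(
    name="csv_ingest_params",
    parameters=["input_path", "output_bucket", "delimiter"],
)

# One value set per environment, defined in code rather than typed into forms.
environments = {
    "dev": ValueSet(name="dev", values={
        "input_path": "dev/in.csv", "output_bucket": "dev-bucket", "delimiter": ","}),
    "test": ValueSet(name="test", values={
        "input_path": "test/in.csv", "output_bucket": "test-bucket", "delimiter": ","}),
    "prod": ValueSet(name="prod", values={
        "input_path": "prod/in.csv", "output_bucket": "prod-bucket", "delimiter": ","}),
}

# Run the same template against each environment's values in one go.
flow = platform.get_flow("csv_ingest")
for env, values in environments.items():
    flow.run(parameter_set=params, value_set=values)
    print(f"submitted {env} run")
```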

Use case 2: Modifying pipelines at scale for infrastructure migration

Another common challenge arises when pipelines must be updated across many affected data sources or environments, for example during a database or data store migration. Instead of updating each pipeline in the UI, teams can use the SDK to duplicate flows programmatically, update connectors and connection configurations, adjust parameters and publish updates in seconds. This is especially valuable in environments where pipelines must evolve quickly as data sources change.

The SDK can securely connect to your hybrid environment, whether on public cloud/SaaS or in self-managed software environments. Rather than dozens of manual edits, one change in code can be applied consistently everywhere.
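
The following sketch illustrates that bulk-update pattern under the same caveat as the earlier examples: list_flows, uses_connection and replace_connection are hypothetical stand-ins for whatever the SDK actually exposes.

```python
# Illustrative sketch, same caveat as above: list_flows, uses_connection
# and replace_connection are hypothetical stand-ins for the real API.
from wxdi_sdk import Platform  # hypothetical import path

platform = Platform(api_key="YOUR_API_KEY", project="warehouse")

# The destination of the migration: one new connection, defined once.
new_conn = platform.create_connection(
    name="orders_db_v2",
    type="postgresql",
    properties={"host": "new-db.example.com", "database": "orders"},
)

# Repoint every flow that still reads from the legacy database, then
# republish: one change in code, applied consistently everywhere.
for flow in platform.list_flows():
    if flow.uses_connection("orders_db_legacy"):
        flow.replace_connection("orders_db_legacy", new_conn)
        flow.publish()
        print(f"migrated {flow.name}")
```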

These patterns point to a broader shift: from manual configuration to repeatable, software-driven development. By treating pipelines as code, organizations can scale data integration more reliably and build the robust data foundation required for agentic AI.

Bringing it all together

The watsonx.data integration Python SDK is a key milestone in IBM’s vision for an AI-ready data foundation. By bringing programmatic automation to watsonx.data integration, teams can build and maintain pipelines with the same rigor and scalability as software development, while still meeting users in their preferred modality to help close the data engineering skills gap.

As part of the broader watsonx.data portfolio, watsonx.data integration works seamlessly with watsonx.data intelligence to deliver a trusted, end-to-end data foundation. Together, these offerings enable organizations to move, understand, govern, and activate data across hybrid environments, powering AI and agentic workflows at scale.

Build pipelines faster with IBM watsonx.data integration

Get started using these sample scripts

Explore the documentation

Caroline Garay, Product Marketing Manager, IBM Data Integration

John Wen, Product Manager, IBM Data Integration

Jason Britto, Senior Software Engineer, IBM Data Integration

Mitch Barnett, Software Development Manager, IBM Data Integration