July 1, 2024 By Scott Brokaw 3 min read

We are thrilled to announce that IBM has acquired StreamSets, a real-time data integration company specializing in streaming structured, unstructured and semistructured data across hybrid multicloud environments.

StreamSets was acquired from Software AG along with webMethods. This strategic acquisition expands IBM’s already robust data integration capabilities, helping to solidify our position as a leader in the data integration market and enhancing IBM Data Fabric’s delivery of secure, high-quality data for artificial intelligence (AI).

According to a Forrester study conducted on behalf of IBM, 87% of organizations require data to be ingested and analyzed within one day or less. As data variety, volume and velocity continue to rise, implementing a real-time data integration tool, such as StreamSets, helps decrease the staleness of data produced by traditional data pipelines, allowing for real-time insights and decision-making.

Unlocking the power of real-time data integration

The acquisition of StreamSets extends the breadth and depth of IBM Data Fabric’s data integration capabilities by enabling the design of real-time data pipelines. This helps users ingest, enrich and harness the potential of streaming data through features such as offset handling and delivery guarantees.

The goal is to enable continuous, real-time processing, integration and transfer of data as soon as it becomes available, reducing latency and data staleness. StreamSets is available today as a SaaS offering across major hyperscalers.
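To make "offset handling and delivery guarantees" concrete, here is a minimal sketch in plain Python. It does not use any StreamSets API; the function name `process_from_offset` and the `deliver` callback are hypothetical, chosen only for illustration. The idea shown is at-least-once delivery: the pipeline remembers the offset of the last record it delivered successfully and resumes from there after a failure.

```python
from typing import Callable, List


def process_from_offset(
    records: List[str],
    committed_offset: int,
    deliver: Callable[[str], None],
) -> int:
    """Deliver records starting at committed_offset and return the new offset.

    The offset advances only after deliver() returns, so a failure leaves it
    pointing at the record that failed (at-least-once delivery).
    """
    offset = committed_offset
    while offset < len(records):
        try:
            deliver(records[offset])
        except Exception:
            break  # stop here; the failing record is retried on the next run
        offset += 1  # commit only after a successful delivery
    return offset
```

Calling the function again with the returned offset resumes the stream without skipping the failed record. A record may be delivered more than once if a failure occurs after the destination has already accepted it, which is the trade-off that at-least-once delivery accepts.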

Enabling generative AI use cases with enhanced data integration capabilities for IBM Data Fabric

The emergence of generative AI has heightened the significance of data. According to IDC, stored data is set to increase by 250% by 2025, rapidly propagating on-premises and across clouds, applications and locations with compromised quality.

With growth comes complexity. Data is becoming more diverse, distributed and dynamic, stored across multiple applications, formats, systems and repositories in hybrid and multicloud environments. The result is data silos that make it harder for organizations to use all their data for AI and to curate high-quality data assets for analytics and AI use cases.

Effectively managing data quality within a distributed data landscape stands as a major obstacle for organizations striving to become more data-driven and embrace cutting-edge generative AI technologies. By prioritizing seamless integration between products and emphasizing strategic architectural decisions, organizations can gain a significant competitive edge with their AI implementations.

The real-time and streaming capabilities provided by StreamSets, combined with IBM Data Fabric’s data integration services through IBM® DataStage® for bulk processing and data observability through IBM® Databand®, enable us to comprehensively address modern data pipeline workloads.

With StreamSets’ innovative visual-oriented approach to building real-time data pipelines, we can now offer our clients the ability to capture and stream data in real time, regardless of its structure or complexity. This means that our clients can respond faster to changing business conditions, make more informed decisions and drive greater innovation.

Other features of StreamSets include:

  • Change data capture (CDC) support: Generate a feed of events by using transaction-based capture.
  • Hybrid cloud support: Integrate data across multiple cloud platforms and on-premises systems with StreamSets hybrid control and data planes, enabling workloads to run in the same physical location where data resides.
  • Reduce data drift with inflight transformation: Apply filtering and quality checks during ingestion. Automatically detect and alert on changes in data structures and schemas, seamlessly adapting to evolving business requirements with zero downtime.
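As a rough illustration of the last item, the sketch below shows drift detection and an in-flight quality check in plain Python. It is not StreamSets code: the names `detect_drift` and `ingest`, the expected-field set, and the "drop records without an id" rule are all assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Record = Dict[str, object]


def detect_drift(expected_fields: Set[str], record: Record) -> Tuple[Set[str], Set[str]]:
    """Return (added, missing) field names relative to the expected schema."""
    actual = set(record)
    return actual - expected_fields, expected_fields - actual


def ingest(
    records: Iterable[Record],
    expected_fields: Set[str],
    alerts: List[str],
) -> Iterator[Record]:
    """Yield records that pass a quality check, recording drift alerts on the way."""
    for record in records:
        added, missing = detect_drift(expected_fields, record)
        if added or missing:
            # Alert on the change but keep the pipeline running.
            alerts.append(
                f"schema drift: added={sorted(added)} missing={sorted(missing)}"
            )
        if record.get("id") is None:
            continue  # hypothetical quality check: drop records without an id
        yield record
```

Because the check runs per record during ingestion, a new or missing field produces an alert without stopping the flow of data.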

The future of IBM Data Integration

IBM data integration solutions play an essential role in organizations’ data architectures, enabling the connectivity, transformation and enrichment of data across various locations for productive and trusted use. It is crucial to choose the right integration style to fit the organization’s use case, whether it involves batch extract, transform and load (ETL) or extract, load and transform (ELT), data virtualization, change data capture or real-time streaming.

With the acquisition of StreamSets, our aim is to simplify how organizations approach streaming data use cases. We firmly believe that data should be the driving force behind innovation and growth, and we are dedicated to providing our customers with the necessary tools for success.

Existing StreamSets clients continue to receive the same high level of support and service they have come to expect. Moreover, this acquisition brings increased investment to extend connectivity and CDC support, integrate with data lineage and data observability, and focus on end-to-end data pipeline orchestration.

For existing IBM customers, this acquisition expands and complements our data integration capabilities. It provides data engineers with an integrated suite of tools that cater to multiple data integration patterns, such as batch, CDC and real-time patterns, all infused with data observability capabilities.

At IBM, we are committed to innovation and providing our customers with the hybrid multicloud, AI and data tools they need to succeed. The acquisition of StreamSets is a testament to this commitment, and we are excited to bring their innovative technology to our clients.

Unlock innovation and growth with IBM and StreamSets. Sign up for updates.
