November 22, 2019 By Bharath Chari 3 min read

Haruto Sakamoto, the Chief Information Officer at a Japanese multinational imaging company, had a few challenges to contend with. His business units had a presence in 180 countries, with geographically dispersed data warehouses and business intelligence applications in various locations. The data stored in those warehouses and applications was in different formats, and these unorganized data sources were causing synchronization issues when business users ran time-sensitive queries and reports.

On the other side of the world, Chris Roberts, the data management architecture lead at a bank in North America, had a different problem. The bank wanted to improve its customer experience with real-time notifications of new offers so that it could compete with emerging fintech companies, which had the potential to drive online traffic and customer interaction away from traditional banks. One of the main issues was that transactional data was located in many systems, including IBM Db2 and Oracle databases, whereas the notification applications were located on a data lake.

Though at first glance it would appear that the issues Haruto and Chris were grappling with were unique to their organizations, they actually struggled with a similar problem.

Both of them needed access to real-time data for reporting and trend analysis in order to make informed business decisions, increase revenue opportunities and provide improved customer experiences.

Traditionally, businesses like those of Haruto and Chris have relied on these options to address such problems:

  • Classical data integration (extract, transform, load, or ETL) allows batch, near-real-time or event- and service-driven bulk transformation of high volumes of complex data to be fed to data warehouses.
  • Replication using change data capture (CDC) technology provides bidirectional synchronization and simple transformations for event-driven or real-time integration and for disaster recovery; it also enables real-time customer notifications from the cloud.
  • Virtualization creates virtual views of data from multiple databases, which helps build a single view of data spread across geographical locations, based on simple transformations.

But what if Haruto and Chris had to decide on only one solution to address their challenges?

Enter data integration with real-time capture.

IBM InfoSphere DataStage, with fully built-in CDC technology for real-time capture and deployed as containers, can give Haruto and Chris the best of both the data integration and data replication worlds. DataStage handles complex transformations on large data sets, while CDC captures log-based changes as they occur. Those changes are then put through complex transformations and delivered to target databases on the cloud and to data lakes using Kafka-based message queues.

Here’s how it works: DataStage real-time capture receives updates from enabled sources. The capture engine puts the updates as messages onto an internal Kafka topic. The DataStage real-time connector then consumes these messages and functions as the source side of any job that receives the updates. You can use the DataStage real-time connector like any other object on the canvas of a job in InfoSphere DataStage, the industry leader in data integration.
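The flow described above can be sketched in a few lines of Python. This is only an illustration of the capture, topic and connector pattern, not DataStage code: an in-process `queue.Queue` stands in for the internal Kafka topic, and the names `capture`, `connector`, `run_job` and the sample change records are all hypothetical.

```python
import json
import queue

# Stand-in for the internal Kafka topic (assumption: a real deployment
# uses Kafka, not an in-process queue).
topic = queue.Queue()


def capture(change_log):
    """Hypothetical capture engine: publish each source change as a message."""
    for change in change_log:
        topic.put(json.dumps(change))


def connector():
    """Hypothetical real-time connector: consume messages as the source of a job."""
    while not topic.empty():
        yield json.loads(topic.get())


def run_job(rows):
    """Hypothetical downstream stage: apply a simple transformation to each row."""
    return [{**row, "amount_cents": round(row["amount"] * 100)} for row in rows]


# Invented sample changes, as they might be read from a source database log.
change_log = [
    {"op": "INSERT", "account": "A1", "amount": 12.5},
    {"op": "UPDATE", "account": "A2", "amount": 3.0},
]

capture(change_log)
result = run_job(connector())
```

The point of the pattern is that the job only ever sees the connector: it does not need to know how changes were captured or how the topic is managed.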

The three key benefits Haruto and Chris would both receive with this solution are:

  1. A single tool with a common user experience (using the DataStage Flow Designer UI), with no need to manage two separate tools or make complex configurations.
  2. Faster time-to-value, because there is no need for CDC expertise or for managing agent technology.
  3. Support for cloud data sources, because remote capture does not depend on data source agent technology, which helps future-proof your solution.

With InfoSphere DataStage, Haruto and his team could deliver up-to-date, highly available data to end users, improve the availability of data warehouses for real-time reporting and ensure peak performance of production systems by eliminating batch windows. According to Haruto, “IBM InfoSphere change data capture offers us new functionality to meet our growing business demand for real-time reporting. It has helped us address business impact challenges of timeliness, completeness and correctness.”

Chris and his business were able to improve customer service by notifying customers of changes in real time, such as when large transactions occur or when balances slip below a predetermined level. Core transactional data is also available for other use cases that are expected to be developed over time.
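As a rough sketch of the kind of rule behind such notifications, the function below checks a single captured change event against two conditions. The thresholds, field names and message texts are invented for illustration and are not taken from the bank's implementation.

```python
# Assumed thresholds; a real bank would set these per customer or product.
LARGE_TRANSACTION = 10_000
LOW_BALANCE = 100


def notifications(event):
    """Return the notification messages triggered by one change event.

    `event` is a hypothetical record with `amount` and `balance_after` fields.
    """
    messages = []
    if abs(event["amount"]) >= LARGE_TRANSACTION:
        messages.append("Large transaction detected")
    if event["balance_after"] < LOW_BALANCE:
        messages.append("Balance below minimum level")
    return messages


# Invented sample events.
quiet = notifications({"amount": -20, "balance_after": 500})
large = notifications({"amount": -15_000, "balance_after": 2_000})
both = notifications({"amount": -12_000, "balance_after": 50})
```

Because each event is evaluated as it arrives, the customer can be notified as soon as the change is captured rather than after the next batch run.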

To find out more about how IBM can perform bulk data movement, read this blog post on DataStage multi-cloud capabilities. For cases where simple replication (without complex transformations) will suffice, learn about IBM Data Replication.
