April 15, 2020 By Virginie Grandhaye
Bharath Chari
3 min read

When people dream about becoming a baker or a pastry chef, they often think about the delicious pastries they’ll create, delighting their patrons with towering cakes wrapped in impossibly smooth fondant. But very rarely does anyone start off by thinking about the preparation involved in baking… Without being able to use freshly milled flour for baking, for example, you would actually never be able to eat a good piece of cake or a crusty loaf of bread. To produce those delicious pastries, a lot of preparation must happen before the actual baking process begins.

The same parallel can be made between AI and Data Integration. Let me explain:

The business challenge:

As an example, let’s examine a regional U.S. retailer that recently decided to modernize its supply chain management, including supply chain availability, fulfillment, and online cart. To accomplish this objective, the retailer decided to implement the Onera Decision Engine in Google Cloud Platform (GCP). The Onera Decision Engine is a cognitive operating system that harnesses AI to power the modern commerce supply chain, using cloud computing and machine learning technology to predict real-world behavior and generate nuanced insights and decisions. However, SaaS analytical systems won’t deliver maximum value if you can’t move, prepare, and deliver the right data in real time with high levels of throughput and performance. The retailer needed to publish 9 million messages per hour to the Google Cloud Pub/Sub messaging service during normal periods and 21 million messages per hour during peak periods.
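To put those hourly targets in perspective, the short sketch below converts them to average per-second rates. It is plain arithmetic on the figures quoted above; the function name `per_second` is only illustrative, and real traffic would be burstier than a flat average suggests.

```python
SECONDS_PER_HOUR = 3600


def per_second(messages_per_hour: int) -> float:
    """Convert an hourly message volume to an average per-second rate."""
    return messages_per_hour / SECONDS_PER_HOUR


# Figures taken from the retailer's stated requirements.
normal_rate = per_second(9_000_000)   # 2500.0 messages per second
peak_rate = per_second(21_000_000)    # roughly 5833.3 messages per second

print(f"normal: {normal_rate:.1f} msg/s, peak: {peak_rate:.1f} msg/s")
```

In other words, the integration layer has to sustain about 2,500 messages every second on an ordinary day and more than twice that at peak.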

That brings me to the point of accessing the data. Without data, there is no AI.

Data everywhere…so where do you begin?

Data is typically spread across many systems:

  • Open data, on the Cloud.
  • Collected data (from social networks, or connected devices). This data can be on the public cloud, or local to your infrastructure.
  • Internal historical data of your company (customer data, historical orders…). This data is usually stored on a private cloud, or on traditional storage systems, behind your firewall.

On top of this complexity, data can take many forms. It can be:

  • On traditional storage (relational databases).
  • Streamed (for real-time use cases).
  • Inside data lakes or data warehouses.
  • Inside corporate applications (like SAP).

This is a real issue: “collecting and organizing the data” is the most time-consuming task when driving an analytics project, accounting for 80 percent of the time.

Data Integration: Scalable data architecture for AI in the age of COVID-19

DataOps is one answer for shortening the cycle of making data available to the data scientist. Within it, data integration capabilities (data transformation or extract, transform and load (ETL), data replication, and data virtualization) play a vital role in providing access to high-quality data.

New SaaS analytical systems will not succeed without a robust and scalable data integration infrastructure for data movement, data integration, data quality, and data governance that works across on-premises, public cloud, and private cloud environments for AI. In the same way, you will never bake a good piece of cake if you don’t have the right ingredients ahead of time.

With the COVID-19 crisis, data integration is more critical than ever. In the defining moment we are all going through, companies need to think even more about their digital transformation: taking into account consumers’ behavioral changes, transforming their business models, and operationalizing AI and cloud-first applications by transforming their infrastructure. Taking these behavioral changes into account is a must for companies that wish to sustain themselves and grow.

IBM DataStage: Deliver real time data for AI at scale and at high throughput

Going back to the example discussed: the U.S. retailer attempted to implement its analytics and AI system using a vendor’s data integration product that was touted as “built on cloud” but was not able to exceed 1.2 million messages published to GCP per hour. In comparison, IBM’s multi-cloud data integration solution, IBM DataStage, which is built on a massively parallel processing architecture, was able to meet and exceed the client’s requirements. Moreover, IBM also demonstrated that the degree of parallel execution can be changed easily without making any changes to the job, so higher rates of throughput can be achieved simply by adding more hardware. The same job can publish 100 million records per hour or more just by running on additional cloud computing infrastructure. This DataStage example can be applied to many classes of SaaS analytical applications that require feeding decision engines in real time with very high levels of throughput and performance.
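The idea of changing the degree of parallelism without touching the job itself can be illustrated with a small, generic sketch. This is not DataStage code or its API: the names `partition`, `publish_partition`, and `run_job` are hypothetical, the "publish" step is a stand-in that only counts records, and Python threads are used purely to show the structure rather than to demonstrate real throughput gains.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def partition(records: Sequence[dict], degree: int) -> List[List[dict]]:
    """Split records round-robin into `degree` partitions."""
    return [list(records[i::degree]) for i in range(degree)]


def publish_partition(records: List[dict]) -> int:
    """Hypothetical job logic: 'publish' each record and return the count."""
    published = 0
    for _record in records:
        # A real job would send the record to a messaging service here.
        published += 1
    return published


def run_job(
    records: Sequence[dict],
    job: Callable[[List[dict]], int],
    degree: int,
) -> int:
    """Run the same job over `degree` partitions in parallel and sum the results."""
    parts = partition(records, degree)
    with ThreadPoolExecutor(max_workers=degree) as pool:
        return sum(pool.map(job, parts))


records = [{"id": i} for i in range(1000)]

# The job function is identical in both runs; only the degree changes.
print(run_job(records, publish_partition, degree=1))
print(run_job(records, publish_partition, degree=8))
```

The point of the design is that the job logic and the parallel configuration are kept separate, so scaling up is a matter of changing one setting and supplying more compute, not rewriting the job.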

If you are considering using AI in your business for other use cases such as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), or Human Resource Management (HRM) in retail, distribution, manufacturing, and financial services, you should consider data integration a mandatory step to extract, load, transform, and deliver trusted data in real time for AI.

Read this Gartner report to find out how IBM is addressing these needs and has been positioned as a Leader in the Magic Quadrant for Data Integration tools for more than a decade.

Learn more about InfoSphere DataStage here.

Accelerate your journey to AI.
