The relationship between transactional and analytical data has always been a bit tense. Each kept a respectful distance from the other, usually communicating through a common friend: the data pipeline. That arrangement held steady for at least two decades, but the unease is now growing. Why? First, data pipelines can be slow. By the time transactional data arrives at a data platform such as a data warehouse or data lake, it is already stale, so the insight generated from it may no longer be relevant. In today's era of agile business, real-time insight matters. Second, the pipeline is a one-way route. Value travels from the transactional to the analytical world, but not vice versa. A feedback loop is needed so that the generated insight can flow back into the transactional world directly.
None of the traditional data platform architectures, such as the data warehouse or the data lake, tried to bring these two worlds closer; they were built for the analytical world alone. Even the newer data lakehouse architecture has the same lacuna. But what about data fabric?
Gartner (link resides outside ibm.com) clearly states that “Data fabric covers both transactional and analytical systems”. Our definition of data fabric aligns with this completely: “A loosely coupled collection of distributed services, which enables the right data to be made available in the right shape, at the right time and place, from heterogeneous sources of transactional and analytical natures, across any cloud and on-premises platforms, usually via self-service, while meeting non-functional requirements including cost effectiveness, performance, governance, security and compliance.”
Traditionally, however, data platforms never did this; they were always exclusive to the analytical hemisphere of the data world. How do we address this? How do we bring transactional systems under the new data platform architecture called data fabric?
Two major technological trends are impacting enterprises across the world. The first is cloud transformation. Many enterprises have moved their infrastructure and non-business-critical applications to the cloud, reaping immense financial benefit. But that was just the beginning: business-critical applications are now being modernized and readied for the cloud. Throughout this trend, though, the focus has been on applications and infrastructure. Data has often come as an afterthought, or at times been ignored altogether.
The second trend is data centricity. Here, enterprises are trying to transform themselves so that their operations and processes are run on data. The raw material of enterprise data is refined into information and then into insights. The idea is that these insights will drive business decisions, producing benefits that positively impact both the top line and the bottom line.
But these two trends reside in parallel universes. The first is typically a CIO agenda, whereas the second is a business agenda led by multiple CXOs. As a result, they rarely meet, and a huge opportunity is lost.
The question is: can we weave them together under a holistic organizational goal? The answer is yes, and data fabric is expected to play a key role there. Let us see how.
In the current stage of cloud transformation, as business-critical applications are modernized, a new opportunity arises. Almost every application is associated with one or more databases. Like the applications, these databases are old and in severe need of modernization; unless they are modernized too, the full benefit cannot be reaped. Yet they are often ignored for fear of opening a new can of worms. It is important to address them along with the applications.
In modernizing these databases, multiple actions can be taken. The reformed database can then expose its data through APIs, virtualization, messaging and other mechanisms, and each of these exposures can be published as a ‘data asset’ or ‘data product’ in the marketplace of the data fabric. Discovery and consumption follow naturally; a sketch of such a publication appears below.
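To make this concrete, here is a minimal sketch of what publishing such a data product to a marketplace catalog could look like, in Python with the requests library. The catalog endpoint, descriptor fields and product metadata are all illustrative assumptions; every marketplace implementation will define its own registration interface.

import requests

# Hypothetical marketplace catalog endpoint; a real data fabric
# implementation exposes its own registration interface.
CATALOG_URL = "https://marketplace.example.com/api/v1/data-products"

product = {
    "name": "customer-master",
    "description": "Golden customer records exposed by the modernized database",
    "exposure": "api",  # could also be "virtualization", "messaging" or "file"
    "endpoint": "https://api.example.com/customers/v1",
    "owner": "customer-domain-team",
    "sla": {"freshness": "real-time", "availability": "99.9%"},
}

response = requests.post(CATALOG_URL, json=product, timeout=10)
response.raise_for_status()
print("Published data product:", response.json().get("id"))

Once registered this way, the product descriptor is what makes the exposure discoverable in the marketplace, independent of the underlying database technology.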
Let us look at an example of how both transactional and analytical systems can participate in a data fabric.
Let us consider a retail organization. For the initial scope, it considered three business-critical transactional systems: T1, a legacy monolithic sales application; T2, an ERP; and T3, a Salesforce CRM with its Einstein analytics component.
Similarly, on the analytics side, apart from the Einstein component mentioned above, let us include a few more data and AI platforms in this discussion: A1, A2 and A3, whose roles are described below.
While these systems served their purpose well, over the last few years the enterprise observed that it lacked differentiation in the market; its innovation was not best in class. A primary reason was a lack of discoverable, trustworthy and consumable data. Most of its data integrations were point-to-point, and since discoverability was an issue, there was much duplicated effort to integrate and process the same data.
While the CIO had recently started an application modernization program, the data area was not covered by it. To resolve this situation, the CIO and CDO jointly sponsored a data fabric program. An enterprise data marketplace was developed, in which all participating transactional and analytical applications were to publish their ‘data products’. Initially, the six systems mentioned above were earmarked for data fabric participation. Let us see how these systems would be transformed to participate.
T1, an old and monolithic application, was among the first applications to be modernized. A microservices-based architecture was adopted, and the large Sybase database was broken into multiple databases: master and reference data are mostly stored in Azure Cosmos DB (link resides outside ibm.com), while transactional data is stored in SQL Server. The microservices were exposed as APIs for consumption by different channels, and the same APIs (e.g., ‘A’ and ‘C’ in the diagram above) were also published in the marketplace. T1 also published its raw sales data as file product ‘D’ in the marketplace. A sketch of one such microservice appears below.
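The following is an illustrative sketch of one T1 microservice, using FastAPI. The route, data model and in-memory store are invented for this example; in T1 the service would read from the SQL Server transactional store, and its endpoint would be what gets published as an API product such as ‘A’.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="T1 sales microservice")  # run with: uvicorn t1_sales:app

class Sale(BaseModel):
    sale_id: str
    store_id: str
    amount: float

# An in-memory stand-in for T1's SQL Server transactional store,
# to keep the sketch self-contained.
SALES = {"s-001": Sale(sale_id="s-001", store_id="st-42", amount=129.99)}

@app.get("/sales/{sale_id}", response_model=Sale)
def get_sale(sale_id: str) -> Sale:
    sale = SALES.get(sale_id)
    if sale is None:
        raise HTTPException(status_code=404, detail="sale not found")
    return sale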
T2, being an ERP, remained as-is. However, it started to publish its periodic accounts data as a file (product ‘E’ in the diagram) through the data marketplace, and A2 ingested those files from the marketplace; a sketch of such an ingestion follows.
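A minimal sketch of A2's ingestion of file product ‘E’ might look like the following, using pandas. The marketplace download URL and the column names are assumptions made for the example.

import pandas as pd

# Hypothetical download URL behind file product 'E' in the marketplace.
ACCOUNTS_FILE_URL = "https://marketplace.example.com/products/E/latest.csv"

accounts = pd.read_csv(ACCOUNTS_FILE_URL)

# Basic hygiene before reconciliation: drop exact duplicates and
# normalize the (assumed) period column to proper dates.
accounts = accounts.drop_duplicates()
accounts["period"] = pd.to_datetime(accounts["period"])
print(accounts.head())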
T3 started to publish real-time customer data changes through data streaming. Those events are published in the marketplace as product ‘F’, and T1 subscribed to them to reflect the latest customer data in real time (see the sketch below). At the same time, file extracts of the Salesforce CRM data are published from the Einstein repository as product ‘G’.
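Below is a sketch of how T1's subscription to product ‘F’ could be implemented, using the kafka-python client. Kafka stands in here for whatever streaming platform the enterprise actually uses, and the topic name, broker address and payload shape are assumptions.

import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "customer-changes",  # hypothetical topic behind product 'F'
    bootstrap_servers=["broker.example.com:9092"],
    group_id="t1-customer-sync",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for event in consumer:
    change = event.value
    # T1 would upsert this change into its Cosmos DB master data store;
    # printing stands in for that side effect here.
    print(f"Apply customer change: {change['customer_id']} -> {change['fields']}")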
A1 consumed the raw sales data (‘D’) and raw customer data (‘G’). It produced conformed customer and sales data and published them in the marketplace as virtualized objects ‘H’ and ‘I’ respectively; a sketch of this conformation step follows.
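Here is a sketch of the conformation step, using DuckDB as a stand-in virtualization layer: a SQL view over the raw sales file plays the role of virtualized object ‘I’. The file layout and column names are invented for the example.

import duckdb

# A tiny stand-in for the ingested raw sales file product 'D',
# so the sketch runs end to end.
with open("raw_sales.csv", "w") as f:
    f.write("sale_id,store_id,amount,sold_at\n")
    f.write("s-001,st-42,129.99,2024-05-01 10:15:00\n")

con = duckdb.connect()
con.sql("""
    CREATE VIEW conformed_sales AS          -- published as virtualized object 'I'
    SELECT
        sale_id,
        store_id,
        CAST(amount AS DECIMAL(12, 2)) AS amount,
        CAST(sold_at AS TIMESTAMP) AS sold_at
    FROM read_csv_auto('raw_sales.csv')
""")
print(con.sql("SELECT * FROM conformed_sales"))

The point of virtualization is that consumers query the conformed view without copying the underlying data, which is what lets ‘H’ and ‘I’ be consumed in place from the marketplace.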
A2 ingested ‘I’ and ‘E’ and produced reconciled accounts as file product ‘J’.
In A3, a new AI model for personalized product recommendation was developed. It consumed the conformed customer and sales data, the reconciled accounts, and the real-time customer updates. The trained inference model was deployed as API ‘K’, which T1 consumed to provide better personalized recommendations to customers at the storefront, driving more sales.
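Finally, a sketch of T1 calling the deployed recommendation API ‘K’ at the storefront. The endpoint and the request and response shapes are illustrative assumptions.

import requests

# Hypothetical endpoint behind the deployed inference API product 'K'.
RECS_URL = "https://api.example.com/recommendations/v1"

payload = {"customer_id": "c-1017", "context": {"store_id": "st-42"}}
response = requests.post(RECS_URL, json=payload, timeout=5)
response.raise_for_status()

for item in response.json().get("recommendations", []):
    print(f"Recommend {item['product_id']} (score {item['score']:.2f})")

This closes the feedback loop described at the start: insight generated in the analytical world flows back, via the marketplace, into the transactional system that serves customers.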
As discussed above, data fabric opens up a new possibility for enterprises to bring their transactional and analytical data closer together. However, this is not just a technological transformation; it also requires organizational and cultural shifts. Application owners and data owners need to work together on a new operating model, and data needs to be thought of as a product rather than a piece of complex technology. If such changes are brought in, enterprises can reap significant business benefits.