Data fabric is an architecture that facilitates the end-to-end integration of various data pipelines and cloud environments through the use of intelligent and automated systems.
Over the last decade, developments in hybrid cloud, artificial intelligence, the internet of things (IoT), and edge computing have driven exponential growth in big data, creating even more complexity for enterprises to manage. That growth has produced significant challenges, such as data silos, security risks, and general bottlenecks to decision-making, which has made unifying and governing data environments an increasing priority.
Data management teams are addressing these challenges head on with data fabric solutions. They are leveraging them to unify their disparate data systems, embed governance, strengthen security and privacy measures, and provide more data accessibility to workers, particularly their business users.
These data integration efforts via data fabrics allow for more holistic, data-centric decision-making. Historically, an enterprise may have had different data platforms aligned to specific lines of business. For example, you might have an HR data platform, a supply chain data platform, and a customer data platform, which house data in separate environments despite potential overlaps. A data fabric, however, lets decision-makers view this data more cohesively to better understand the customer lifecycle, making connections between data that were not previously possible.
By closing these gaps in understanding of customers, products and processes, data fabrics are accelerating digital transformation and automation initiatives across businesses.
Data virtualization is one of the technologies that enables a data fabric approach. Rather than physically moving the data from various on-premises and cloud sources using standard ETL (extract, transform, load) processes, a data virtualization tool connects to the different sources, integrates only the metadata required, and creates a virtual data layer. This allows users to work with the source data in real time.
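The idea of a virtual data layer can be illustrated with a minimal sketch. It is not how any particular data virtualization product works: the `VirtualLayer` class, the logical names, and the sample tables are all hypothetical, and in-memory SQLite databases stand in for separate on-premises and cloud sources. The layer stores only metadata (where a table lives and which columns it has) and reads rows from the source at request time.

```python
import sqlite3

def make_source(ddl, insert_sql, rows):
    # Stand-in for an independent source system (assumption: in-memory SQLite
    # is used only so the sketch is runnable).
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn

hr_db = make_source(
    "CREATE TABLE employees (id INTEGER, name TEXT)",
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
crm_db = make_source(
    "CREATE TABLE customers (id INTEGER, owner_id INTEGER, company TEXT)",
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(10, 1, "Acme"), (11, 2, "Globex")],
)

class VirtualLayer:
    """Hypothetical virtual data layer: keeps metadata, not rows."""

    def __init__(self):
        self.catalog = {}  # logical name -> (connection, table, columns)

    def register(self, name, conn, table):
        # Integrate only the metadata: read column names from the source.
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        self.catalog[name] = (conn, table, columns)

    def read(self, name):
        # Query the source when asked, so results reflect its current state.
        conn, table, columns = self.catalog[name]
        cursor = conn.execute(f"SELECT {', '.join(columns)} FROM {table}")
        return [dict(zip(columns, row)) for row in cursor]

layer = VirtualLayer()
layer.register("hr.employees", hr_db, "employees")
layer.register("crm.customers", crm_db, "customers")

def customers_with_owner(layer):
    # Combine two sources through the layer without persisting a copy of either.
    owners = {e["id"]: e["name"] for e in layer.read("hr.employees")}
    return [
        {"company": c["company"], "owner": owners.get(c["owner_id"])}
        for c in layer.read("crm.customers")
    ]
```

Because nothing is copied into the layer, a row inserted into `crm_db` after registration appears in the next call to `customers_with_owner(layer)` with no reload step, which is the property the ETL approach does not give you by default.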
By leveraging data services and APIs, data fabrics pull together data from legacy systems, data lakes, data warehouses, SQL databases, and apps, providing a holistic view into business performance. In contrast to these individual data storage systems, a data fabric aims to create more fluidity across data environments, attempting to counteract the problem of data gravity, that is, the idea that data becomes more difficult to move as it grows in size. A data fabric abstracts away the technological complexities involved in data movement, transformation, and integration, making all data available across the enterprise.
Data fabric architectures operate around the idea of loosely coupling the data in platforms with the applications that need it. One example of data fabric architecture in a multi-cloud environment might work as follows: one cloud, such as AWS, manages data ingestion, and another platform, such as Azure, oversees data transformation and consumption. Then, you might have a third vendor, like IBM Cloud Pak® for Data, providing analytical services. The data fabric architecture stitches these environments together to create a unified view of data.
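Loose coupling of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not a description of any vendor's product: the `Fabric` registry, the dataset names, and the three stage functions are hypothetical stand-ins for an ingestion platform, a transformation platform, and an analytics service. Each stage asks the fabric for a dataset by logical name and never references the platform that produced it.

```python
from typing import Callable, Dict, List

Record = Dict[str, int]

class Fabric:
    """Hypothetical registry mapping logical dataset names to providers."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[], List[Record]]] = {}

    def publish(self, name: str, provider: Callable[[], List[Record]]) -> None:
        # A platform exposes a dataset under a logical name.
        self._providers[name] = provider

    def get(self, name: str) -> List[Record]:
        # Consumers resolve data by name, not by platform.
        return self._providers[name]()

fabric = Fabric()

def ingest() -> List[Record]:
    # Stand-in for the ingestion platform (amounts in cents).
    return [
        {"order_id": 1, "amount_cents": 2000},
        {"order_id": 2, "amount_cents": 3000},
    ]

def transform() -> List[Record]:
    # Stand-in for the transformation platform: adds an assumed 10% tax.
    return [
        {**r, "total_cents": r["amount_cents"] * 110 // 100}
        for r in fabric.get("orders.raw")
    ]

def total_revenue() -> int:
    # Stand-in for the analytics service.
    return sum(r["total_cents"] for r in fabric.get("orders.clean"))

fabric.publish("orders.raw", ingest)
fabric.publish("orders.clean", transform)
```

With the sample data, `total_revenue()` returns 5500. Publishing a different provider under `"orders.raw"` changes what the downstream stages see without any change to `transform` or `total_revenue`, which is the practical meaning of loose coupling here.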
That said, this is just one example. There isn’t one single data architecture for a data fabric, as different businesses have different needs. The variety of cloud providers and data infrastructure implementations ensures variation across businesses. However, businesses utilizing this type of data framework exhibit commonalities across their architectures, which are unique to a data fabric. More specifically, they have six fundamental components, which Forrester describes in the “Enterprise Data Fabric Enables DataOps” report. These six layers include the following:
As data fabric providers gain more adoption from businesses in the market, Gartner has noted specific improvements in efficiency, reporting that a data fabric can reduce “time for integration design by 30%, deployment by 30%, and maintenance by 70%.” While it’s clear that data fabrics can improve overall productivity, the following benefits have also demonstrated business value for adopters:
Data fabrics are still in their infancy in terms of adoption, but their data integration capabilities aid businesses in data discovery, allowing them to take on a variety of use cases. While the use cases that a data fabric can handle may not be extremely different from those of other data products, a data fabric differentiates itself by the scope and scale it can handle, because it eliminates data silos. By integrating across various data sources, companies and their data scientists can create a holistic view of their customers, which has been particularly helpful with banking clients. Data fabrics have been more specifically used for:
Data fabric can help businesses investing in AI, machine learning, Internet of Things, and edge computing get more value from their data.