What are the core capabilities of a data fabric architecture?
The 6 core capabilities of a data fabric
To simplify data access and empower users to leverage trusted information, organizations need an approach that delivers insights and business outcomes faster, without sacrificing data access controls. There are many different approaches, but you’ll want an architecture that can be used regardless of your data estate. A data fabric is an architectural approach that enables organizations to simplify data access and data governance across a hybrid multicloud landscape for better 360-degree views of the customer and enhanced MLOps and trustworthy AI. In other words, the obstacles of data access, data integration and data protection are minimized, giving end users maximum flexibility.
With this approach, organizations don’t have to move all their data to a single location or data store, nor do they have to take a completely decentralized approach. Instead, a data fabric architecture implies a balance between what needs to be logically or physically decentralized and what needs to be centralized.
Thanks to that balance, there is no limitation to the number of purpose-fit data stores that can participate in the data fabric ecosystem. This means you get a global data catalog that serves as an abstraction layer, single source of truth and single point of data access with infused governance.
Six core capabilities are essential for a data fabric architecture:
- A knowledge catalog: This abstraction layer provides a common business understanding of the data for 360-degree customer views, which allows for transparency and collaboration. The knowledge catalog serves as a library with insights about your data. To help you understand your data, the catalog contains a business glossary, taxonomies and data assets (data products) with relevant information such as quality scores, the business terms associated with each data element, data owners, activity information, related assets and more.
- Automated data enrichment: To create the knowledge catalog, you need automated data stewardship services. These services include the ability to auto-discover and classify data, to detect sensitive information, to analyze data quality, to link business terms to technical metadata and to publish data to the knowledge catalog. To deal with such a large volume of data within the enterprise, automated data enrichment requires intelligent services driven by machine learning.
- Self-service governed data access: These services enable users to easily find, understand, manipulate and use the data with key governance capabilities like data profiling, data preview, tagging and annotating datasets, collaborating in projects and accessing data anywhere using SQL interfaces or APIs.
- Smart integration: Data integration capabilities are crucial to extract, ingest, stream, virtualize and transform data regardless of where it’s located. Using data policies designed to simultaneously maximize performance and minimize storage and egress costs, smart integration helps ensure that data privacy protection is applied on each data pipeline.
- Data governance, security, and compliance: With a data fabric, there’s a unified and centralized way to create policies and rules. These policies and rules can be linked automatically to the various data assets through metadata, such as data classifications, business terms, user groups, roles and more. The policies and rules, which include data access controls, data privacy, data protection and data quality, can then be applied and enforced at scale across all the data during data access or data movement.
- Unified lifecycle: An end-to-end lifecycle to compose, build, test, deploy, orchestrate, observe and manage the various aspects of the data fabric, such as a data pipeline, in a unified experience using MLOps and AI.
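To make two of these capabilities more concrete, here is a minimal Python sketch of how automated data enrichment can feed governed data access. It is an illustration under stated assumptions, not the API of any data fabric product: `CatalogEntry`, `enrich`, `read_governed`, the regex classifiers, the role names and the sample rows are all hypothetical, and a real enrichment service would rely on far richer, often machine-learning-driven, classifiers.

```python
import re
from dataclasses import dataclass, field

# Hypothetical classifiers: a column gets a tag when every sampled value matches.
CLASSIFIERS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

# Hypothetical policies: classification tag -> roles allowed to see raw values.
POLICIES = {
    "email": {"steward", "marketing"},
    "us_ssn": {"steward"},
}


@dataclass
class CatalogEntry:
    """One data asset in an illustrative knowledge catalog."""
    name: str
    # Column name -> set of classification tags found during enrichment.
    classifications: dict = field(default_factory=dict)


def enrich(name, rows):
    """Auto-classify each column from its values and return a catalog entry."""
    entry = CatalogEntry(name)
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [str(r[col]) for r in rows if r[col] not in (None, "")]
        entry.classifications[col] = {
            tag for tag, pattern in CLASSIFIERS.items()
            if values and all(pattern.match(v) for v in values)
        }
    return entry


def read_governed(entry, rows, role):
    """Return rows, masking columns whose tags the role is not cleared for."""
    result = []
    for row in rows:
        governed = {}
        for col, value in row.items():
            tags = entry.classifications.get(col, set())
            # A tag with no policy denies by default; untagged columns pass.
            allowed = all(role in POLICIES.get(tag, set()) for tag in tags)
            governed[col] = value if allowed else "***"
        result.append(governed)
    return result


# Made-up sample data for illustration only.
rows = [
    {"id": 1, "contact": "ana@example.com", "ssn": "123-45-6789"},
    {"id": 2, "contact": "bo@example.org", "ssn": "987-65-4321"},
]
entry = enrich("customers", rows)
marketing_view = read_governed(entry, rows, role="marketing")
```

The design point is that policies are attached to classifications rather than to individual tables or columns. Once enrichment has tagged a column, the same rule applies wherever that data is accessed, which is the metadata-driven linking that the governance capability describes.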
These six crucial capabilities of a data fabric architecture enable data citizens to use data with greater trust and confidence. Irrespective of what that data is, or where it resides — whether in a traditional datacenter or a hybrid cloud environment, in a conventional database or Hadoop, object store or elsewhere — the data fabric architecture provides a simple and integrated approach for data access and use, empowering users with self-service and enabling enterprises to use data to maximize their value chain.