A data architecture describes how data is managed, from collection through to transformation, distribution and consumption. It sets the blueprint for data and the way that it flows through data storage systems. It is foundational to data processing operations and artificial intelligence (AI) applications.
The design of a data architecture should be driven by business requirements, which data architects and data engineers use to define the data model and the underlying data structures that support it. These designs typically serve a business need, such as a reporting or data science initiative.
As new data sources emerge through technologies such as the Internet of Things (IoT), a good data architecture helps ensure that data is manageable and useful, supporting data lifecycle management. More specifically, it can avoid redundant data storage, improve data quality through cleansing and deduplication, and enable new applications.
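As a rough illustration of the cleansing and deduplication mentioned above, the following Python sketch normalizes customer records and keeps one record per email address. The field names (`email`, `name`) and the choice of email as the matching key are assumptions made for this example; real pipelines usually apply richer matching rules.

```python
from typing import Iterable


def cleanse(record: dict) -> dict:
    """Normalize a raw record: trim and lowercase the email, collapse spaces in the name."""
    return {
        "email": record["email"].strip().lower(),
        "name": " ".join(record["name"].split()),
    }


def deduplicate(records: Iterable[dict]) -> list[dict]:
    """Keep the first cleansed record seen for each email address (assumed key)."""
    seen: dict[str, dict] = {}
    for raw in records:
        clean = cleanse(raw)
        seen.setdefault(clean["email"], clean)
    return list(seen.values())


# Hypothetical input: the first two rows describe the same customer.
raw_records = [
    {"email": " Ana@Example.com ", "name": "Ana  Silva"},
    {"email": "ana@example.com", "name": "Ana Silva"},
    {"email": "li@example.com", "name": "Li Wei"},
]
customers = deduplicate(raw_records)
```

Cleansing before comparing is what lets the two differently formatted rows for the same customer collapse into a single stored record.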
Modern data architectures also provide mechanisms to integrate data across domains, such as between departments or geographies, breaking down data silos without the huge complexity that comes with storing everything in one place.
Modern data architectures often use cloud platforms to manage and process data. While the cloud can be more costly, its compute scalability enables important data processing tasks to be completed rapidly. Its storage scalability also helps organizations cope with rising data volumes and helps ensure that all relevant data is available to improve the quality of training AI applications.
The data architecture documentation includes 3 types of data model: conceptual, logical and physical.
A data architecture can draw from popular enterprise architecture frameworks, including TOGAF, DAMA-DMBOK 2 and the Zachman Framework for Enterprise Architecture.
The Open Group Architecture Framework (TOGAF)
This enterprise architecture methodology was developed in 1995 by The Open Group, of which IBM is a Platinum Member.
There are 4 pillars to the architecture: business architecture, data architecture, application architecture and technology architecture.
As such, TOGAF provides a complete framework for designing and implementing an enterprise’s IT architecture, including its data architecture.
DAMA-DMBOK 2
DAMA International, originally founded as the Data Management Association International, is a not-for-profit organization dedicated to advancing data and information management. Its Data Management Body of Knowledge, DAMA-DMBOK 2, covers data architecture, governance and ethics, data modelling and design, storage, security and integration.
Zachman Framework for Enterprise Architecture
Originally developed by John Zachman at IBM in 1987, this framework uses a matrix of 6 layers from contextual to detailed, mapped against 6 questions such as why, how and what. It provides a formal way to organize and analyze data but does not include methods for doing so.
A data architecture gives a high-level view of how different data management systems work together. These include various data storage repositories, such as data lakes, data warehouses, data marts and databases. Together, these can form data architectures, such as data fabrics and data meshes, which are growing in popularity. These architectures place more focus on data as products, creating more standardization around metadata and more democratization of data across organizations through APIs.
The next section delves deeper into each of these storage components and data architecture types:
Types of data management systems
Types of data architectures
Data fabrics: A data fabric is an architecture that focuses on the automation of data integration, data engineering and governance in a data value chain between data providers and data consumers. A data fabric is based on the notion of “active metadata”, which uses knowledge graph, semantics, data mining and machine learning (ML) technology to discover patterns in various types of metadata (for example, system logs and social data). It then applies this insight to automate and orchestrate the data value chain. For example, it can enable a data consumer to find a data product and then have that data product provisioned to them automatically. The increased data access between data products and data consumers leads to a reduction in data silos and provides a more complete picture of the organization’s data. Data fabrics are an emerging technology with enormous potential, and they can be used to enhance customer profiling, fraud detection and preventive maintenance. According to Gartner, data fabrics reduce integration design time by 30%, deployment time by 30% and maintenance by 70%.
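The find-then-provision flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Catalog`, `CatalogEntry`, `find` and `provision` are hypothetical names, and a plain keyword match on tags stands in for the knowledge graph and ML techniques that a real data fabric would apply to its metadata.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one data product (hypothetical structure)."""
    name: str
    tags: set[str]
    consumers: set[str] = field(default_factory=set)


class Catalog:
    """Toy metadata catalog; keyword matching replaces real metadata analysis."""

    def __init__(self, entries: list[CatalogEntry]) -> None:
        self.entries = {entry.name: entry for entry in entries}

    def find(self, keyword: str) -> list[str]:
        """Return names of data products whose name or tags match the keyword."""
        kw = keyword.lower()
        return sorted(
            name for name, entry in self.entries.items()
            if kw in entry.tags or kw in name.lower()
        )

    def provision(self, name: str, consumer: str) -> bool:
        """Record that the consumer has been given access to the product."""
        self.entries[name].consumers.add(consumer)
        return consumer in self.entries[name].consumers


catalog = Catalog([
    CatalogEntry("card_transactions", {"payments", "fraud"}),
    CatalogEntry("sensor_readings", {"iot", "maintenance"}),
])
matches = catalog.find("fraud")
granted = catalog.provision(matches[0], "fraud-analytics-team")
```

In this sketch the consumer never needs to know where `card_transactions` is stored; the catalog metadata is what connects the request to the data product.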
Data meshes: A data mesh is a decentralized data architecture that organizes data by business domain. To use a data mesh, the organization needs to stop thinking of data as a by-product of a process and start thinking of it as a product in its own right. Data producers act as data product owners. As subject matter experts, data producers can use their understanding of the data’s primary consumers to design APIs for them. These APIs can also be accessed from other parts of the organization, providing broader access to managed data.
More traditional storage systems such as data lakes and data warehouses can be used as multiple decentralized data repositories to realize a data mesh. A data mesh can also work with a data fabric, with the data fabric’s automation enabling new data products to be created more quickly or enforcing global governance.
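To make the idea of domain-owned data products more concrete, here is a minimal Python sketch. The `DataProduct` and `Mesh` classes, the domain names and the sample rows are all hypothetical; each product exposes a `read` callable that stands in for the API its owning team would design, while the underlying storage remains with the domain.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataProduct:
    """A dataset published by the domain team that owns it."""
    name: str
    domain: str
    owner: str
    read: Callable[[], list[dict]]  # stand-in for the product's API


@dataclass
class Mesh:
    """Hypothetical registry of products; it holds no data itself."""
    products: dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        """Register a product under a '<domain>.<name>' key."""
        self.products[f"{product.domain}.{product.name}"] = product

    def by_domain(self, domain: str) -> list[str]:
        """List the product keys owned by one business domain."""
        return sorted(k for k, p in self.products.items() if p.domain == domain)


mesh = Mesh()
mesh.publish(DataProduct(
    "orders", "sales", "sales-team",
    lambda: [{"order_id": 1, "total": 40.0}],
))
mesh.publish(DataProduct(
    "shipments", "logistics", "logistics-team",
    lambda: [{"order_id": 1, "status": "shipped"}],
))

# A consumer in another department reads the sales product through its API.
orders = mesh.products["sales.orders"].read()
```

Each `read` callable could be backed by a data lake, a data warehouse or a database, which mirrors the point that traditional storage systems can serve as the decentralized repositories behind a mesh.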
A well-constructed data architecture can offer businesses several key benefits, which include:
As organizations build their roadmap for tomorrow’s applications, including AI, blockchain and Internet of Things (IoT) workloads, they need a modern data architecture that can support the data requirements.
The top 7 characteristics of a modern data architecture are: