Data management is the practice of ingesting, processing, securing, and storing an organization’s data so that it can be used for strategic decision-making to improve business outcomes. Over the last decade, developments within hybrid cloud, artificial intelligence, the Internet of Things (IoT), and edge computing have led to the exponential growth of big data, creating even more complexity for enterprises to manage. As a result, a data management discipline has become an increasing priority within organizations, since this growth creates significant challenges, such as data silos, security risks, and general bottlenecks to decision-making. Teams address these challenges head on with a number of data management solutions, which aim to clean, unify, and secure data. This, in turn, allows leaders to glean insights through dashboards and other data visualization tools, enabling informed business decisions. It also empowers data science teams to investigate more complex questions, leveraging more advanced analytical capabilities, such as machine learning, for proof-of-concept projects. If those projects succeed in improving business outcomes, teams can partner across the organization to scale those learnings through automation practices.
Data management vs. master data management
While data management refers to a whole discipline, master data management is narrower in scope: it focuses on an organization’s master data, the core records for entities such as customers, sellers, and products that recur across transactional data like sales records. This type of data enables businesses to determine their most successful products and markets and their highest-valued customers. Since master data often includes personally identifiable information (PII), it is also subject to stricter regulations, such as GDPR.
The scope of a data management discipline is quite broad, and a strong data management strategy typically implements the following components to streamline their strategy and operations throughout an organization:
Data processing: Within this stage of the data management lifecycle, raw data is ingested from a range of data sources, such as web APIs, mobile apps, Internet of Things (IoT) devices, forms, surveys, and more. It is then processed and loaded via data integration techniques, such as extract, transform, load (ETL) or extract, load, transform (ELT). While ETL has historically been the standard method to integrate and organize data across different datasets, ELT has been growing in popularity with the emergence of cloud data platforms and the increasing demand for real-time data. Regardless of the data integration technique used, the data is usually filtered, merged, or aggregated during the data processing stage to meet the requirements of its intended purpose, which can range from a business intelligence dashboard to a predictive machine learning algorithm.
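The extract, filter, and aggregate steps described above can be sketched in a few lines of plain Python. This is a minimal ETL-style illustration, not a real pipeline: the record fields ("region", "score") and the in-memory "warehouse" target are hypothetical stand-ins for a survey source and a storage system.

```python
# Minimal ETL sketch: extract raw records, filter and aggregate them
# (transform), then load the result into a target store.

def extract():
    """Simulate ingesting raw records from a source such as a web API."""
    return [
        {"region": "EU", "score": 4},
        {"region": "EU", "score": None},   # incomplete record
        {"region": "US", "score": 5},
        {"region": "US", "score": 3},
    ]

def transform(records):
    """Filter out incomplete rows, then average scores by region."""
    clean = [r for r in records if r["score"] is not None]
    totals = {}
    for r in clean:
        count, total = totals.get(r["region"], (0, 0))
        totals[r["region"]] = (count + 1, total + r["score"])
    return {region: total / count for region, (count, total) in totals.items()}

def load(aggregates, target):
    """Write the processed output to a target store (here, a dict)."""
    target.update(aggregates)

warehouse = {}
load(transform(extract()), warehouse)
print(warehouse)  # {'EU': 4.0, 'US': 4.0}
```

In an ELT variant, the raw records from extract() would be loaded into the target first and the transform step would run inside the storage platform itself.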
Data storage: While data can be stored before or after data processing, the type of data and its intended purpose will usually dictate the storage repository that is used. For example, data warehousing requires a defined schema to meet specific analytics requirements for data outputs, such as dashboards, data visualizations, and other business intelligence tasks. These requirements are usually specified and documented by business users in partnership with data engineers, who ultimately execute against the defined data model. The underlying structure of a data warehouse is typically organized as a relational system (i.e. in a structured data format), sourcing data from transactional databases. Other storage systems, such as data lakes, incorporate data from both relational and non-relational systems, becoming a sandbox for innovative data projects. Data lakes benefit data scientists in particular, as they allow them to incorporate both structured and unstructured data into their projects.
Data governance: Data governance is a set of standards and business processes which ensure that data assets are leveraged effectively within an organization. This generally includes processes around data quality, data access, usability, and data security. For instance, data governance councils tend to align on taxonomies to ensure that metadata is added consistently across various data sources. This taxonomy should also be documented in a data catalog to make data more accessible to users, facilitating data democratization across organizations. Data governance teams also help to define roles and responsibilities to ensure that data access is provided appropriately; this is particularly important for maintaining data privacy.
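The idea of an agreed taxonomy enforced at registration time can be sketched as a toy data catalog. Everything here is hypothetical: the taxonomy terms, the dataset names, and the dict-based catalog stand in for a real governance tool.

```python
# Toy data-catalog sketch: datasets can only be registered with
# metadata tags drawn from the governance council's agreed taxonomy,
# keeping metadata consistent across sources.

TAXONOMY = {
    "domain": {"sales", "marketing", "hr"},
    "sensitivity": {"public", "internal", "pii"},
}

catalog = {}

def register(name, **metadata):
    """Add a dataset entry, rejecting tags outside the taxonomy."""
    for key, value in metadata.items():
        allowed = TAXONOMY.get(key)
        if allowed is None or value not in allowed:
            raise ValueError(f"{key}={value!r} is not in the taxonomy")
    catalog[name] = metadata

register("q3_orders", domain="sales", sensitivity="pii")

try:
    register("web_logs", domain="engineering", sensitivity="internal")
except ValueError as err:
    print("rejected:", err)  # "engineering" is not an approved domain
```

Tagging sensitivity levels this way is also what lets access controls be applied consistently, tying governance to the privacy concerns mentioned above.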
Data security: Data security sets guardrails in place to protect digital information from unauthorized access, corruption, or theft. As digital technology becomes an increasing part of our lives, more scrutiny is placed upon the security practices of modern businesses to ensure that customer data is protected from cybercriminals and recoverable after a disaster. While data loss can be devastating to any business, data breaches, in particular, can carry costly consequences from both a financial and brand standpoint. Data security teams can better secure their data by incorporating encryption and data masking into their data security strategy.
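Data masking, one of the techniques mentioned above, can be sketched in a few lines. This is a simplified illustration, not a complete security strategy: the field names, the masking rules, and the salt value are all placeholders.

```python
import hashlib

# Data-masking sketch: PII fields are replaced before records leave a
# secured zone, while non-sensitive fields pass through for analytics.

def mask_email(email):
    """Keep the domain (useful for analytics); hide the local part."""
    local, _, domain = email.partition("@")
    return "***@" + domain

def pseudonymize(value, salt="example-salt"):  # placeholder salt
    """One-way hash so the same customer maps to the same token."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

record = {"customer_id": "C-1001",
          "email": "jane.doe@example.com",
          "amount": 250.0}

masked = {
    "customer_id": pseudonymize(record["customer_id"]),
    "email": mask_email(record["email"]),
    "amount": record["amount"],  # non-PII field passes through
}
print(masked["email"])  # ***@example.com
```

Because the pseudonym is deterministic, analysts can still join records belonging to the same customer without ever seeing the raw identifier.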
While data processing, data storage, data governance, and data security are all part of data management, the success of each of these components hinges on a company’s data architecture, or technology stack. A company’s data infrastructure creates a pipeline for data to be acquired, processed, stored, and accessed, which is accomplished by integrating these systems. Data services and APIs pull together data from legacy systems, data lakes, data warehouses, SQL databases, and apps, providing a holistic view into business performance.
Each of these components in the data management space is undergoing a vast amount of change right now. For example, the shift from on-premises systems to cloud platforms is one of the most disruptive trends in the space. Unlike on-premises deployments, cloud providers allow users to spin up large clusters as needed, paying only for the storage and compute they use. This means that if you need additional compute power to run a job in a few hours instead of a few days, you can do so on a cloud platform by purchasing additional compute nodes.
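The elasticity trade-off described above amounts to simple arithmetic: for a job that parallelizes well, adding nodes shortens the wall-clock time while the total node-hours billed stay roughly flat. The numbers below are illustrative, not any provider's actual pricing, and the linear-scaling assumption is an idealization.

```python
# Back-of-envelope sketch of cloud elasticity: a fixed amount of work
# (in node-hours) can be finished faster by renting more nodes, at
# roughly the same total cost under idealized linear scaling.

def job_profile(total_node_hours, nodes, price_per_node_hour):
    hours = total_node_hours / nodes              # wall-clock time
    cost = total_node_hours * price_per_node_hour  # total spend
    return hours, cost

for nodes in (4, 48):
    hours, cost = job_profile(total_node_hours=96, nodes=nodes,
                              price_per_node_hour=0.50)
    print(f"{nodes:>2} nodes: {hours:.0f} h wall-clock, ${cost:.2f} total")
# Same $48 job either way: 24 hours on 4 nodes, or 2 hours on 48.
```

In practice, coordination overhead means scaling is rarely perfectly linear, but the pay-for-what-you-use model is what makes this trade-off available at all.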
This shift to cloud data platforms is also facilitating the adoption of streaming data processing. Tools like Apache Kafka allow for more real-time data processing, enabling consumers to subscribe to topics and receive data in a matter of seconds. However, batch processing still has its advantages, as it is more efficient at processing large volumes of data. Because batch processing runs on a set schedule, such as daily, weekly, or monthly, it is ideal for business performance dashboards, which typically do not require real-time data.
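The contrast between the two modes can be sketched in plain Python. This is a toy illustration, not the Apache Kafka API: the event tuples and the "nightly" batch function are hypothetical.

```python
# Streaming vs. batch, in miniature. Streaming acts on each event as
# it arrives; batch accumulates a window and processes it on a schedule.

events = [("page_view", 1), ("purchase", 40),
          ("page_view", 1), ("purchase", 25)]

# Streaming style: update a running total the moment each event lands.
running_revenue = 0
for kind, value in events:
    if kind == "purchase":
        running_revenue += value  # available seconds after the event

# Batch style: process the whole accumulated window once, on a schedule.
def nightly_batch(window):
    return sum(value for kind, value in window if kind == "purchase")

print(running_revenue)        # 65, known in near real time
print(nightly_batch(events))  # 65, the same answer, hours later
```

Both styles arrive at the same number; the difference is latency versus throughput, which is why dashboards on daily schedules tolerate batch while alerting systems favor streaming.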
Change only continues to accelerate in this space. More recently, data fabrics have emerged to assist with the complexity of managing these data systems. Data fabrics leverage intelligent and automated systems to facilitate end-to-end integration of various data pipelines and cloud environments. As technologies like this mature, business leaders can expect a more holistic view of business performance, since data will be integrated across functions. The unification of data across human resources, marketing, sales, supply chain, et cetera can only give leaders a better understanding of their customers.
Organizations experience a number of benefits when launching and maintaining data management initiatives:
Reduced data silos: Most, if not all, companies experience data silos within their organization. Different data management tools and frameworks, such as data fabrics and data lakes, help to eliminate data silos and dependencies on data owners. For instance, data fabrics assist in revealing potential integrations across disparate datasets from different functions, such as human resources, marketing, and sales. Data lakes, on the other hand, ingest raw data from those same functions, removing dependencies and eliminating single points of ownership over a given dataset.
Improved compliance and security: Governance councils assist in placing guardrails to protect businesses from the fines and negative publicity that can result from noncompliance with government regulations and policies. Missteps here can be costly from both a brand and financial perspective.
Enhanced customer experience: While this benefit will not be seen immediately, successful proofs of concept can improve the overall user experience, enabling teams to better understand and personalize the customer journey through more holistic analyses.
Scalability: Data management can help businesses scale, but this largely depends on the technology and processes in place. For example, cloud platforms allow for more flexibility, enabling data owners to scale compute power up or down as needed. Additionally, governance councils can help to ensure that defined taxonomies continue to be adopted as a company grows.
Learn more about the IBM® Db2® family of products that span operational and warehousing solutions.
Discover the value of deploying Db2 on the cloud-native IBM Cloud Pak® for Data platform.
Explore IBM’s open source partnerships with MongoDB, EDB Postgres, DataStax and Cloudera.
Read the free 451 Research report to learn how data management on a unified platform for data, analytics and AI can accelerate time to insights.
Learn the best practices to ensure data quality, accessibility, and security as a foundation to an AI-centric data architecture.
IBM research is regularly integrated into new features for IBM Cloud Pak for Data