What is data governance?

20 September 2024

What is data governance?

Data governance is the data management discipline that focuses on the quality, security and availability of an organization’s data. Data governance helps ensure data integrity and data security by defining and implementing policies, standards and procedures for data collection, ownership, storage, processing and use.

The goal of data governance is to maintain safe, high-quality data that is easily accessible for data discovery and business intelligence initiatives. Acting rather like an air traffic control hub, the data governance function helps ensure that verified data flows through secured pipelines to trusted endpoints and users.

Artificial intelligence (AI), big data and digital transformation efforts are the primary drivers of data governance programs. As the volume of data increases from new data sources, such as Internet of Things (IoT) technologies, organizations need to reconsider their data management practices to scale their business intelligence (BI) efforts.

Data governance programs can help organizations protect and manage large amounts of data by improving data quality, reducing data silos, enforcing compliance and security policies and distributing data access appropriately.

Data governance vs. data management

Data governance is a subset of data management, which is the overarching practice of collecting, processing and using data securely and efficiently to support strategic decision-making and improve business outcomes.

While data management includes data governance, it also includes other areas of the data lifecycle, such as data processing, data storage and data security. Moreover, the various aspects of the data management process all influence one another.

Because these other areas of data management can impact data governance, various teams must work together to design and follow a data governance strategy.

For example, a data governance team might identify commonalities across disparate data sets. If they want to integrate that data, they’ll usually work with a data management team to define the data model and data architecture to facilitate those linkages.

Another example is data access, where a data governance team might set the policies concerning access to specific types of data, such as personally identifiable information (PII). Then, a data management team will provide that access directly or create the mechanism to provide that access, often through role-based access control (RBAC).
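
The division of labor described above can be sketched in a few lines of Python. This is a minimal illustration, not a production access-control system: the role names, data classifications and the `can_read` helper are all hypothetical. The policy table stands in for what a governance team might define, and the check function stands in for the mechanism a data management team might build.

```python
from dataclasses import dataclass

# Hypothetical policy set by a governance team: which roles may read
# which data classifications. Role and classification names are illustrative.
ROLE_PERMISSIONS = {
    "analyst": {"public", "internal"},
    "privacy_officer": {"public", "internal", "pii"},
}


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset


def can_read(user: User, classification: str) -> bool:
    """Return True if any of the user's roles grants the classification."""
    return any(
        classification in ROLE_PERMISSIONS.get(role, set())
        for role in user.roles
    )


alice = User("alice", frozenset({"analyst"}))
bob = User("bob", frozenset({"privacy_officer"}))
```

Here `can_read(bob, "pii")` is allowed while `can_read(alice, "pii")` is denied. Keeping the policy in a separate table means the rules can change without touching the enforcement code.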

Data governance frameworks

A data governance framework details an organization’s structures and processes for managing critical data assets. It defines data ownership and responsibilities and specifies how data should be handled to maintain data quality, security and compliance.

There is no one-size-fits-all framework, as frameworks are typically tailored roadmaps for a particular organization’s unique data systems, data sources, industry protocols and government regulations.

Data governance frameworks commonly address items such as:

Program goals, roles and duties
Data standards, policies and processes
Auditing procedures
Data governance tools

Program goals, roles and duties

Data governance programs typically define a specific goal or set of goals, such as enhancing data quality, supporting compliance or enabling data-driven decision-making. They also select metrics to measure progress toward these goals. Key governance metrics might include:

Reductions in data errors and redundancy.
Cost reductions from greater efficiency and faster time-to-market.
Data consistency and completeness.
The level of data literacy and process compliance of employees.
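
Two of these metrics, completeness and redundancy, can be computed directly from the data. The sketch below assumes records are plain Python dictionaries and treats `None` and empty strings as missing. The function names and the sample records are hypothetical, and real programs would likely define these metrics in more detail.

```python
def completeness(rows, columns):
    """Share of cells in the given columns that hold a non-missing value."""
    total = len(rows) * len(columns)
    if total == 0:
        return 1.0
    filled = sum(
        1 for row in rows for col in columns
        if row.get(col) not in (None, "")
    )
    return filled / total


def duplicate_rate(rows, key_columns):
    """Share of rows that repeat an earlier row on the key columns."""
    if not rows:
        return 0.0
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(col) for col in key_columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(rows)


# Illustrative sample records (not real data).
customers = [
    {"id": 1, "email": "a@example.com", "country": "DE"},
    {"id": 2, "email": None, "country": "US"},
    {"id": 3, "email": "a@example.com", "country": "DE"},
    {"id": 4, "email": "d@example.com", "country": ""},
]

print(completeness(customers, ["email", "country"]))    # 0.75
print(duplicate_rate(customers, ["email", "country"]))  # 0.25
```

Tracking these numbers over time gives a simple way to see whether error and redundancy rates are moving in the right direction.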

Governance programs also define the roles and responsibilities of all involved: steering committee, data owners, data stewards and stakeholders.

Members of the steering committee—or governance council—oversee the data governance strategy and the overall direction of the framework. This committee often includes high-level executives, such as C-level management or VPs, who oversee various departments.
Data owners oversee the specific data domains across business units. These individuals are responsible for maintaining data accuracy, quality and consistency, and they offer input on data governance solutions, data policies and regulatory requirements.
Data stewards handle the daily management of specific data domains.
Stakeholders and business teams are the consumers of enterprise data.

Data standards, policies and processes

Governance frameworks set parameters around the data to be governed and the desired outcomes. This includes setting guidelines for data formats, data models, master data management (MDM), metadata, naming conventions and more.

Governance frameworks often map data flows and define how data will be collected, stored, moved and archived. They might also identify the hardware, software and services that will support governance efforts and the organization's broader data architecture.

Some governance frameworks might define data scopes, which are access parameters for specific data assets, such as master data, metadata and historical data. A data scope can help ensure that users and apps only have access to the data they need and no one has access to data they shouldn't.

Auditing procedures

Governance frameworks outline testing, auditing and record-keeping procedures to maintain the governance program's transparency and explainability.

Regular audits can help verify that users are complying with the data governance framework. Audits can also help identify ways that the governance program must evolve to account for new data, processes or technologies.

Finally, audits can also help organizations achieve—and prove—regulatory compliance.

Data governance tools

Technology plays an important role in effective data governance. Enterprise data governance tools can vary from comprehensive platforms to specialized point solutions. Organizations choose different tools depending on their unique data architectures and governance frameworks.

Common capabilities of data governance solutions include:

Automated data discovery and classification.
Enforcement of data protection rules and role-based access controls.
Support for privacy and compliance requirements.
Automation of metadata management, data cataloging and data lineage tracking.
Support for a business glossary.

Data governance solutions can handle various data formats. Some offer visualization capabilities to enhance the understanding of complex data sets and relationships, making it easier to identify trends, outliers and areas that require attention.

Benefits of data governance

Implementing a strong data governance framework can help organizations realize a wide variety of benefits:

Get more value from enterprise data
Promote innovation and efficiency
Provide a single source of truth (SSOT)
Help ensure data privacy, security and compliance
Securely use data for AI initiatives
Enable more accurate data analytics

Get more value from enterprise data

Organizations cannot make effective business decisions if those decisions are based on flawed data. Data governance can help ensure data integrity, accuracy, completeness and consistency by creating a framework that supports robust data stewardship and a strong end-to-end data management process.

Trustworthy data helps organizations discover new opportunities, better understand their customers and workflows and optimize overall business performance.

A lack of data governance might lead to errors in performance metrics that steer an organization in the wrong direction, while data governance tools can help address inaccuracies before they influence business strategy.

For example, data lineage tools can help data owners trace data throughout its lifecycle, including any transformations the data undergoes during extract, transform, load (ETL) or extract, load, transform (ELT) processes. This enables organizations to identify and remedy the root causes of data errors.
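
To make the idea of tracing lineage concrete, here is a minimal sketch that walks a lineage graph upstream from a report to its original sources. The dataset names, the transformation descriptions and the `LINEAGE` structure are hypothetical. Dedicated lineage tools typically capture this information automatically rather than from a hand-written dictionary.

```python
# Hypothetical lineage records: each dataset maps to the upstream datasets
# it was derived from, paired with the transformation step applied.
LINEAGE = {
    "sales_report": [("sales_clean", "aggregate by region")],
    "sales_clean": [
        ("sales_raw", "drop null order ids"),
        ("fx_rates", "convert to USD"),
    ],
    "sales_raw": [],
    "fx_rates": [],
}


def trace_upstream(dataset, lineage):
    """Return every (downstream, upstream, step) edge reachable from dataset."""
    edges = []
    visited = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # guards against cycles and repeated work
        visited.add(current)
        for upstream, step in lineage.get(current, []):
            edges.append((current, upstream, step))
            stack.append(upstream)
    return edges


def root_sources(dataset, lineage):
    """Datasets with no recorded upstream, meaning the original sources."""
    edges = trace_upstream(dataset, lineage)
    names = {dataset} | {upstream for _, upstream, _ in edges}
    return sorted(name for name in names if not lineage.get(name))


print(root_sources("sales_report", LINEAGE))  # ['fx_rates', 'sales_raw']
```

If a figure in `sales_report` looks wrong, the edges returned by `trace_upstream` list each transformation that could have introduced the error.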

Promote innovation and efficiency

When data access is restricted across an organization, it can limit innovation, create dependencies on subject matter experts (SMEs) and slow business processes.

Data governance programs distribute data access appropriately, giving each department or individual access only to the data they need. This enables cross-functional teams to work together more closely and efficiently while keeping data safe.

Provide a single source of truth (SSOT)

A properly governed data system can provide a single source of truth across an entire organization. Decision-making can be improved when all parties are working with the same data sets.

Centralizing data definitions and metadata in a single data catalog can help reduce confusion and inefficiencies. This documentation, in turn, becomes the foundation for self-service solutions that enable consistent data and data access across the organization.

Help ensure data privacy, security and compliance

Data governance policies often include measures that make it easier to comply with government regulations regarding sensitive data and privacy, such as the EU’s General Data Protection Regulation (GDPR) and the US Health Insurance Portability and Accountability Act (HIPAA), as well as industry requirements such as the Payment Card Industry Data Security Standard (PCI DSS). Violations of these regulatory requirements might result in costly government fines and public backlash.

Data governance tools help organizations set guardrails that can prevent data breaches, leaks and misuse. Governance frameworks help build data systems that are clear, explainable, fair and inclusive. In turn, these data systems safeguard privacy and security and maintain customer loyalty and trust.

Securely use data for AI initiatives

In an IDC survey, only 45.3% of respondents said they had "rules, policies and processes to enforce their responsible AI principles" to protect against security breaches, liability concerns, exposed customer data and regulatory risk.¹

Data governance involves understanding the origin, sensitivity and lifecycle of all the data that an organization uses. This is the foundation for any AI governance practice and is crucial in mitigating various enterprise risks.

Data governance helps organizations bring high-quality data to AI and ML initiatives while protecting that data and complying with relevant rules and regulations. For example, governance tools can help ensure that sensitive personal data is not fed to an AI when it shouldn't be.

Enable more accurate data analytics

Having the right data is the foundation for advanced data analytics and data science initiatives. Carefully governed data enables valuable initiatives such as business intelligence reporting or more complex predictive machine learning (ML) projects.

For example, properly profiling data—reviewing and cleansing data to better understand how it is structured—can help make better sense of the relationship between different data sets and sources.
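
A basic profiling pass might look like the following sketch, which summarizes each column of a small record set. The `profile` function and the sample orders are hypothetical, and the summary fields shown are one reasonable choice among many.

```python
from collections import Counter


def profile(rows):
    """Summarize each column: row count, missing values, distinct values, value types."""
    columns = sorted({col for row in rows for col in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return summary


# Illustrative sample records (not real data).
orders = [
    {"order_id": 1, "amount": 19.99, "country": "DE"},
    {"order_id": 2, "amount": "n/a", "country": "DE"},
    {"order_id": 3, "amount": None, "country": "US"},
]

report = profile(orders)
print(report["amount"])
```

In this sample, the `amount` column mixes floats and strings and has one missing value, which is the kind of structural issue profiling is meant to surface before the data feeds a report or model.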

Challenges of data governance

Data governance initiatives can face many hurdles in implementation. Some of these challenges include:

Lack of appropriate sponsorship
Inconsistent data architecture
Data visibility and control
Increased demand for access
AI data requirements

Lack of appropriate sponsorship

Effective data governance programs generally require sponsorship at two levels: executives and individual contributors. Chief data officers (CDOs) and data stewards are critical in the communication and prioritization of data governance within an organization.

CDOs can provide oversight and enforce accountability across data teams to help ensure that data governance policies are adopted. Data stewards can help promote awareness of these policies to data producers and consumers to encourage compliance across the organization.

Without appropriate sponsorship, data users might be unaware of, or unconcerned with, governance policies. This can result in non-compliance, poor data integrity and compromised data security.

Inconsistent data architecture

Without the correct tools and data architecture, organizations might struggle to deploy an effective data governance program.

For example, teams might discover redundant data across different functions. To enable effective governance, data architects need to develop appropriate data models and data architectures to merge and integrate data across storage systems.

Teams might also need to adopt a data catalog to create an inventory of data assets across an organization. Or if they already have one, they might need to create a process for metadata management, which helps ensure that the underlying data is relevant and up-to-date.

Data visibility and control

Data governance, especially in hybrid and multicloud environments, often involves data stored in multiple formats across multiple providers and locations. Moreover, data might reside in different types of data stores, such as data lakes, data lakehouses and data warehouses.

Shadow IT can throw an additional wrench into the process. In a TechTarget study, the second-most common data security challenge reported was that employees were signing up for cloud applications and services without IT approval.²

This distribution of data can make it difficult to track and monitor data flows and data usage. Data governance requires a clear understanding of data sources, destinations, transformations, dependencies, ownership, access rights and responsibilities.

Enforcing data governance policies across multiple environments might require coordination among different stakeholders, such as data owners, data stewards, data consumers and data regulators.

Increased demand for access

The rise of self-service analytics and business intelligence presents data governance with new challenges.

Access requests from more users are coming in faster than before, but governance teams need to balance speed and accessibility with privacy and security concerns. Furthermore, streaming data systems and procedures must be finely tuned to avoid data leakage.

AI data requirements

When providing the data that powers AI training and operations, many data storage and governance tools fall short.

After all, AI is inherently more complex than standard IT-driven processes and capabilities—raising the importance of active and informed data governance. A KPMG report highlights the AI governance gap as one of the top risks currently threatening businesses.³ For example, without appropriate guardrails in place, AI might inadvertently expose sensitive PII or corporate secrets.

To reduce AI risks and complexities, organizations can combine AI-optimized data storage capabilities with data governance programs devised with AI in mind.

Data governance best practices

Planning and creating a data governance framework takes time and effort across multiple stakeholders and teams. Common practices that organizations use when implementing governance programs include:

Automate for greater efficiency
Balance convenience and data safety
Build a data catalog
Use maturity models
Continually monitor and improve

Automate for greater efficiency

Automating certain parts of the data governance process can help improve efficiency and reduce errors. Data governance and management tools can help automate routine tasks such as:

Constructing data lineages to help visualize the flow of data through systems without the need for complex hand-coded solutions.
Propagating policies to assign metadata tags to specific data elements—such as columns or tables—to identify them as sensitive information.
Generating audit logs to record data interactions to understand how employees use data.
Classifying data based on predefined categories, such as PII, financial data, intellectual property or confidential information.
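
As a rough illustration of automated classification and tagging, the sketch below assigns a category to each column based on its name. The rule patterns, the category names and the helper functions are hypothetical. Real governance tools usually also inspect the data values themselves, which this example does not attempt.

```python
import re

# Hypothetical classification rules: category -> pattern matched against column names.
RULES = {
    "pii": re.compile(r"(email|phone|ssn|birth)", re.IGNORECASE),
    "financial": re.compile(r"(iban|salary|card_number)", re.IGNORECASE),
}


def classify_columns(columns):
    """Tag each column with the first matching category, or 'unclassified'."""
    tags = {}
    for column in columns:
        tags[column] = next(
            (category for category, pattern in RULES.items()
             if pattern.search(column)),
            "unclassified",
        )
    return tags


def sensitive_columns(tags, sensitive=("pii", "financial")):
    """List the columns whose tag marks them as sensitive."""
    return sorted(col for col, tag in tags.items() if tag in sensitive)


tags = classify_columns(["customer_email", "Salary", "signup_date"])
print(tags)
print(sensitive_columns(tags))
```

The resulting tags could then drive other automated steps, such as masking sensitive columns or restricting access to them.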

Balance convenience and data safety

Strong data security and access controls are fundamental to any data governance framework. At the same time, organizations want data access to be as frictionless as possible for those with the authorization to see and use specific data sets. Without this easy access to self-service information, collaboration and new insights are hampered.

Build a data catalog

Many organizations struggle to manage their data due to a lack of visibility. A central data catalog can operate as the single source of truth, enabling data integration and governance initiatives.

According to a Gartner report, demand for data catalogs is rising as organizations struggle with finding, inventorying and analyzing distributed and diverse data assets. With a robust data catalog, organizations can more easily locate and classify information at scale, allowing for better enforcement of data governance policies.
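
At its simplest, a data catalog is a searchable inventory of data assets and their metadata. The following sketch shows that idea with an in-memory structure. The `CatalogEntry` fields, the `DataCatalog` class and the sample entries are all hypothetical, and commercial catalogs offer far more, such as automated discovery and lineage integration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    owner: str
    description: str
    tags: set = field(default_factory=set)


class DataCatalog:
    """A minimal in-memory inventory of data assets."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Reject duplicates so each asset has exactly one definition.
        if entry.name in self._entries:
            raise ValueError(f"{entry.name} is already registered")
        self._entries[entry.name] = entry

    def search(self, term):
        """Find assets whose name or description contains the term."""
        term = term.lower()
        return sorted(
            e.name for e in self._entries.values()
            if term in e.name.lower() or term in e.description.lower()
        )

    def with_tag(self, tag):
        """Find assets carrying a given tag, such as a sensitivity label."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


catalog = DataCatalog()
catalog.register(CatalogEntry("crm.customers", "sales-ops", "Customer master records", {"pii"}))
catalog.register(CatalogEntry("fin.invoices", "finance", "Issued customer invoices", {"financial"}))

print(catalog.search("customer"))  # ['crm.customers', 'fin.invoices']
print(catalog.with_tag("pii"))     # ['crm.customers']
```

Rejecting duplicate registrations is one small way a catalog can support the single-source-of-truth goal: every asset name resolves to one owner and one description.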

Use maturity models

Many organizations find it helpful to create a clear governance roadmap. Maturity models can provide this roadmap.

A data governance maturity model is a tool that helps organizations assess the current state of their data governance program, set goals and track progress over time.

Continually monitor and improve

Organizations can establish regular assessment and reporting mechanisms to monitor data and governance metrics over time. These assessments can help the organization identify issues and make improvements to governance processes.

Regularly reviewing the framework and adjusting it based on feedback, new regulations or changes in business strategy helps the framework stay relevant and effective.

Additionally, assessments can foster a culture that values data as a strategic asset, supporting effective business intelligence and data use across the organization.

Footnotes

¹ IDC MarketScape: Worldwide AI Governance Platforms 2023 Vendor Assessment, IDC, 2023.

² The Need for Data Compliance in Today’s Cloud Era, Enterprise Strategy Group by TechTarget, April 2023. (PDF, 867 KB).

³ Top risk forecast, KPMG, 2024.

Authors

Jim Holdsworth

Matthew Kosinski
