What is data stewardship?
25 November 2024
Authors
Alice Gomstyn IBM Content Contributor
Alexandra Jonker Editorial Content Lead
What is data stewardship?

Data stewardship is a collection of data management practices designed to help ensure high data quality and accessibility. Data stewardship programs typically operate in alignment with an organization’s data governance policies.
 

Data stewards are charged with running data stewardship programs. Specific data steward responsibilities include defining data quality metrics, managing metadata and reference data, tracing data lineage and classifying sensitive data.

Different technologies and tools can support data steward workflows, including artificial intelligence (AI), data catalogs, relational databases, data quality platforms and data governance software.

Why is data stewardship important?

Companies today are collecting and analyzing more data than ever in hopes of unlocking valuable insights. However, data collection and analytics on their own aren’t enough to secure successful outcomes. Data stewardship and data stewards can support and guide effective data use within a data-driven culture.

In recent years, with the increasing adoption of AI, data stewardship has taken on additional significance. AI systems consume and produce massive quantities of data. Data stewardship helps ensure the quality and integrity of that data so that AI-powered business processes are effective, compliant with government regulations and aligned with governance and ethical AI standards.

Good data stewardship programs enable successful data curation by improving data quality, accessibility, usability and security. Data stewards help ensure that employees can access useful and accurate business data to empower data-driven decision-making and AI-driven productivity gains. Additional benefits of data stewardship include more consistent data interpretation and improved audit readiness.

Data stewards often collaborate with a host of stakeholders—including data owners, data analysts, data science experts and general business users—to achieve these benefits.

Employees who are not formally recognized as “data stewards” might nonetheless have data stewardship responsibilities and devote significant time to meeting their organizations’ data needs, such as inventorying data and evaluating data quality. However, some data management experts say that formalizing data stewardship roles is important because it signals that a company is serious about data quality management.[1]

What is the difference between data stewardship and data governance?

Data governance and data stewardship are separate but related concepts. Companies’ data governance programs help ensure data integrity and data security through policies, standards and procedures for data collection, ownership, storage, processing and use. Many data stewardship responsibilities entail implementing rules outlined in data governance frameworks. As such, data stewardship can be considered “the operational aspect” of data governance.[2]

What are the different types of data stewards?

Companies with more mature data stewardship programs might have different types of data steward roles, including:

  • Business data steward: Business data stewards specialize in managing data within specific business functions, such as marketing or customer service.

  • Technical data steward: As their title suggests, technical data stewards possess technical expertise in data processes and systems, including extract, transform and load (ETL) processes and data warehouses.

  • Enterprise data steward: Enterprise data stewards lead communities of data stewards within organizations and serve as liaisons to other business leads.[3]
What are use cases for data stewardship?

Use cases for data stewardship include:

  • Master data management
  • Data quality improvement
  • Metadata management
  • Reference data management
  • Identity resolution
  • Information security and data privacy protection
  • Data lineage tracing
  • Business process risk management
Master data management

Data stewardship is often key to master data management (MDM), which is an approach to managing an organization's critical data through technology, tools and processes. Organizations use MDM to create a single source of truth that integrates data from various sources so that all data users work with the same information.

Companies and data stewards often begin an MDM initiative in a single data domain (a logical grouping of similar data, such as customer data or employee data) before scaling the work across the organization’s data assets.[4]
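
As a rough illustration of the single-source-of-truth idea, the sketch below consolidates two records for the same customer into one. The record layout, the field names and the "first non-empty value wins" rule are all assumptions made for this example; real MDM tools apply richer survivorship rules.

```python
# Hypothetical records for one customer held in two source systems.
crm_record = {"customer_id": "C-100", "email": "pat@example.com", "phone": None}
billing_record = {"customer_id": "C-100", "email": None, "phone": "555-0100"}


def merge_records(records):
    """Build one consolidated record, keeping the first non-empty value per field."""
    merged = {}
    for record in records:
        for field, value in record.items():
            # Only fill a field that is still missing or empty.
            if merged.get(field) is None:
                merged[field] = value
    return merged


golden_record = merge_records([crm_record, billing_record])
print(golden_record)
```

Listing the sources in priority order is one simple way to express which system is trusted most for a given domain.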

Data quality improvement

Data stewards can improve data quality by reviewing the contents of a database, which is known as data profiling. They also work with data stakeholders to create data definitions, design data quality metrics and establish business rules for data, such as what values are considered valid or invalid.

For example, as explained in the book “Data Stewardship,” when the data collected is a customer’s marital status, a rule might state that “single,” “married,” “widowed” or “divorced” would be valid values, while a blank response would be considered invalid.[5] Data stewards can also provide input on addressing data quality issues when they arise.
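
The marital status rule can be sketched as a small validation check. The record layout and function name are hypothetical, and treating values as case-insensitive is an assumption that the rule itself does not state.

```python
# Valid values taken from the marital status rule described above.
VALID_MARITAL_STATUSES = {"single", "married", "widowed", "divorced"}


def is_valid_marital_status(value):
    """Return True only for a non-blank value in the allowed set."""
    if value is None:
        return False
    # Assumption: surrounding whitespace and letter case are ignored.
    return value.strip().lower() in VALID_MARITAL_STATUSES


# Hypothetical customer records.
records = [
    {"id": 1, "marital_status": "married"},
    {"id": 2, "marital_status": ""},
    {"id": 3, "marital_status": "Widowed"},
]

invalid_ids = [r["id"] for r in records if not is_valid_marital_status(r["marital_status"])]
print(invalid_ids)
```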

Metadata management

Metadata is information that describes a data point or dataset, such as the data’s creation date or authorship details. Data stewards can be responsible for creating high-quality metadata and evaluating the quality of existing metadata. As with general data quality, data stewards are tasked with addressing metadata quality issues.

Reference data management

Data stewards often maintain reference data, which is data that categorizes other data within the enterprise. Examples of reference data include country codes, currency information and product codes. Through data documentation, data stewards can record valid values for reference data, evaluate whether new valid values are necessary and reconcile reference data values across different systems.

In the last case, returning to the marital status example, a data steward might need to decide what to do when one system allows “widowed” and “divorced” as marital status values while another accepts only “married” and “single.”[6]
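
A steward's first step in this kind of reconciliation is often simply finding the mismatch. The sketch below compares the valid values of two hypothetical systems and lists the records that the stricter system would reject. It does not decide how to map the values, because that remains a business decision.

```python
# Valid marital status values in two hypothetical systems.
system_a_values = {"single", "married", "widowed", "divorced"}
system_b_values = {"single", "married"}

# Values that system A allows but system B does not accept.
unsupported_in_b = sorted(system_a_values - system_b_values)


def find_unsupported(records, accepted_values):
    """Return the records whose marital status the target system would reject."""
    return [r for r in records if r["marital_status"] not in accepted_values]


# Hypothetical records exported from system A.
system_a_records = [
    {"id": 1, "marital_status": "married"},
    {"id": 2, "marital_status": "widowed"},
    {"id": 3, "marital_status": "divorced"},
]

needs_review = find_unsupported(system_a_records, system_b_values)
print(unsupported_in_b)
print([r["id"] for r in needs_review])
```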

Identity resolution

Often, multiple instances of data represent the same entity. Consider, for instance, a single customer who appears multiple times in a pharmacy chain’s database because they’ve had different prescriptions that were filled at different stores.

Through a process known as identity resolution, data stewards determine when different data instances refer to the same entity. In the case of the pharmacy customer, for example, identity resolution can help ensure that potentially dangerous drug interactions are detected when filling the customer’s prescriptions.[7]
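
The pharmacy scenario can be sketched with a deliberately simple matching rule: records are treated as the same customer when their normalized name and birth date agree. This exact-match key, the field names and the sample data are assumptions for illustration. Production identity resolution typically relies on fuzzy or probabilistic matching across more attributes.

```python
from collections import defaultdict

# Hypothetical prescription records from different stores.
prescriptions = [
    {"store": "Store 12", "name": "Jordan Lee ", "birth_date": "1980-04-02", "drug": "drug_a"},
    {"store": "Store 47", "name": "jordan lee", "birth_date": "1980-04-02", "drug": "drug_b"},
    {"store": "Store 12", "name": "Sam Rivera", "birth_date": "1975-11-30", "drug": "drug_a"},
]


def match_key(record):
    """Build a comparison key from the lowercased, whitespace-normalized name and birth date."""
    normalized_name = " ".join(record["name"].lower().split())
    return (normalized_name, record["birth_date"])


def resolve_identities(records):
    """Group records that share the same match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return dict(groups)


entities = resolve_identities(prescriptions)
for key, matched in entities.items():
    # With records grouped per person, all of a customer's drugs can be checked together.
    print(key, sorted({r["drug"] for r in matched}))
```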

Information security and data privacy protection

Information security is the protection of important information against unauthorized access, disclosure, use, alteration or disruption. Under data privacy regulations, companies are required to implement enhanced protections for sensitive information such as healthcare data. They’re also required to comply with rules that govern data sharing, limit data collection and more. Data stewards can play a role in data protection and regulatory compliance by establishing security classifications for different types of data.
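
One way to picture security classifications is as a lookup from data fields to sensitivity levels. The level names, field names and the conservative default below are hypothetical choices for this sketch, not a standard scheme.

```python
# Hypothetical classification levels assigned to hypothetical fields.
FIELD_CLASSIFICATIONS = {
    "diagnosis_code": "restricted",
    "email": "confidential",
    "store_city": "public",
}

# Assumption: fields nobody has classified yet are treated conservatively.
DEFAULT_CLASSIFICATION = "confidential"


def classify(field):
    """Return the security classification recorded for a field, or the default."""
    return FIELD_CLASSIFICATIONS.get(field, DEFAULT_CLASSIFICATION)


def fields_at_level(fields, level):
    """List the fields in a dataset that carry the given classification."""
    return [f for f in fields if classify(f) == level]


dataset_fields = ["diagnosis_code", "email", "store_city", "loyalty_tier"]
print(fields_at_level(dataset_fields, "restricted"))
print(classify("loyalty_tier"))
```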

Data lineage tracing

Data lineage is the process of tracking data lifecycles, providing a clear understanding of where data originated, how it has changed and its ultimate destination. Data stewards can trace lineage, which helps an organization affirm data integrity for regulatory reporting purposes.
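
Lineage can be modeled as a graph in which each dataset points to the datasets it was derived from. The sketch below walks such a graph to list everything that feeds a report. The dataset names and the dictionary layout are invented for this example.

```python
# Hypothetical lineage graph: each dataset maps to its direct upstream sources.
UPSTREAM = {
    "regulatory_report": ["monthly_summary"],
    "monthly_summary": ["cleaned_transactions"],
    "cleaned_transactions": ["raw_transactions", "currency_codes"],
    "raw_transactions": [],
    "currency_codes": [],
}


def trace_upstream(dataset, upstream=UPSTREAM):
    """Return every dataset that feeds into `dataset`, nearest sources first."""
    seen = []
    queue = list(upstream.get(dataset, []))
    while queue:
        current = queue.pop(0)  # breadth-first traversal
        if current in seen:
            continue
        seen.append(current)
        queue.extend(upstream.get(current, []))
    return seen


print(trace_upstream("regulatory_report"))
```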

Business process risk management

Poor data quality can put business processes at risk. Data stewards can work with business process leaders to determine how data is used in a process and how vulnerable the process is to failure when data quality is poor.[8]

What technologies and tools support data stewardship?

Organizations can implement various solutions and tools to support data stewardship activities, including:

  • Artificial intelligence (AI)
  • Data catalogs
  • Data profiling and analysis tools
  • Relational database management systems (RDBMS)
  • Data governance software
Artificial intelligence

AI and data stewardship have what some might consider to be a symbiotic relationship. While data stewardship helps ensure AI systems work with high-quality data, AI-based tools can optimize data stewardship tasks. For example, AI-enabled data preparation tools can perform validation checks and flag errors such as improper formatting, while AI-driven data loss prevention tools can detect sensitive information and apply security controls as necessary.

Data catalogs

A data catalog is an inventory of all data assets in an organization. It’s designed to help data stewards and other data professionals find information easily and quickly. The metadata associated with each data asset enables the catalog’s searchability.
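
To show how metadata makes a catalog searchable, here is a minimal sketch of a catalog held as a list of entries and searched by name or tag. The entries, the metadata fields and the substring matching are assumptions. Commercial catalogs index far more metadata than this.

```python
# Hypothetical catalog entries, each described by a little metadata.
catalog = [
    {"name": "customer_master", "owner": "marketing", "tags": ["customer", "master data"]},
    {"name": "store_sales_2024", "owner": "finance", "tags": ["sales", "transactions"]},
    {"name": "customer_support_tickets", "owner": "customer service", "tags": ["customer", "support"]},
]


def search_catalog(entries, term):
    """Return names of data assets whose name or tags contain the search term."""
    term = term.lower()
    return [
        entry["name"]
        for entry in entries
        if term in entry["name"].lower()
        or any(term in tag.lower() for tag in entry["tags"])
    ]


print(search_catalog(catalog, "customer"))
```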

Data profiling and analysis tools

Data profiling and analysis tools can assess data for consistency and quality. Features of such tools might include capabilities for identifying anomalies, validating data sources and summarizing analysis results through custom reports.
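
A very small profiling routine gives a feel for what such tools report. This sketch summarizes one column of a hypothetical dataset. The chosen statistics and the rule that an empty string counts as blank are assumptions for the example.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, blanks, distinct values and the top value."""
    values = [row.get(column) for row in rows]
    # Assumption: None and the empty string both count as blank.
    non_blank = [v for v in values if v not in (None, "")]
    return {
        "row_count": len(values),
        "blank_count": len(values) - len(non_blank),
        "distinct_count": len(set(non_blank)),
        "most_common": Counter(non_blank).most_common(1),
    }


# Hypothetical rows from a customer table.
rows = [
    {"marital_status": "married"},
    {"marital_status": ""},
    {"marital_status": "single"},
    {"marital_status": "married"},
]

profile = profile_column(rows, "marital_status")
print(profile)
```

A high blank count or an unexpected distinct value in a summary like this is the kind of anomaly a steward would investigate further.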

Relational database management systems

One way that data stewards organize data is through relational databases. A relational database (RDB) is a type of database in which data is organized into tables of rows and columns. These tables can be linked to one another through shared keys to represent relationships between data points. Relational database management systems (RDBMS) are software solutions that data stewards and others can use to maintain and update RDBs.
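
The following sketch uses SQLite, through Python's standard `sqlite3` module, to show two tables linked by a shared column. The table names, columns and sample rows are hypothetical, and SQLite simply stands in for whichever RDBMS an organization actually runs.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Two tables: each prescription row points to a customer row.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE prescriptions ("
    "prescription_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(customer_id), "
    "drug TEXT NOT NULL)"
)

conn.execute("INSERT INTO customers VALUES (1, 'Jordan Lee')")
conn.executemany(
    "INSERT INTO prescriptions VALUES (?, ?, ?)",
    [(10, 1, "drug_a"), (11, 1, "drug_b")],
)

# The join follows the link between the tables to list each customer's drugs.
rows = conn.execute(
    "SELECT c.name, p.drug FROM customers c "
    "JOIN prescriptions p ON p.customer_id = c.customer_id "
    "ORDER BY p.prescription_id"
).fetchall()
print(rows)
conn.close()
```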

Data governance software

Data governance software programs often incorporate data profiling and analysis tools as well as AI-driven capabilities. Features might include AI-powered metadata enrichment, data catalog creation, data lineage tracing and the establishment of role-based data access control.

Footnotes

1, 4 Allen et al. “Multi-Domain Master Data Management.” Morgan Kaufmann. 10 April 2015.

2, 3, 5, 6, 7, 8 Plotkin. “Data Stewardship, Second Edition.” Academic Press. 20 November 2020.
