Data stewards are charged with running data stewardship programs. Specific data steward responsibilities include defining data quality metrics, managing metadata and reference data, tracing data lineage and classifying sensitive data.
Different technologies and tools can support data steward workflows, including artificial intelligence (AI), data catalogs, relational databases, data quality platforms and data governance software.
Companies today are collecting and analyzing more data than ever in hopes of unlocking valuable insights. However, data collection and analytics on their own aren’t enough to secure successful outcomes. Data stewardship and data stewards can support and guide effective data use within a data-driven culture.
In recent years, with the increasing adoption of AI, data stewardship has taken on additional significance. AI systems consume and produce massive quantities of data. Data stewardship helps ensure the quality and integrity of that data so that AI-powered business processes are effective, compliant with government regulations and aligned with governance and ethical AI standards.
Good data stewardship programs enable successful data curation by improving data quality, accessibility, usability and security. Data stewards help ensure that employees can access useful and accurate business data to empower data-driven decision-making and AI-driven productivity gains. Additional benefits of data stewardship include more consistent data interpretation and improved audit readiness.
Data stewards often collaborate with a host of stakeholders—including data owners, data analysts, data science experts and general business users—to achieve these benefits.
Employees who are not formally recognized as “data stewards” might nonetheless have data stewardship responsibilities and devote significant time to meeting their organizations’ data needs, such as inventorying data and evaluating data quality. However, some data management experts say that formalizing data stewardship roles is important because it signals that a company is serious about data quality management [1].
Data governance and data stewardship are separate but related concepts. Companies’ data governance programs help ensure data integrity and data security through policies, standards and procedures for data collection, ownership, storage, processing and use. Many data stewardship responsibilities entail implementing rules outlined in data governance frameworks. As such, data stewardship can be considered “the operational aspect” of data governance [2].
Companies with more mature data stewardship programs might have different types of data steward roles, including:
Use cases for data stewardship include:
Data stewardship is often key to master data management (MDM), which is an approach to managing an organization's critical data through technology, tools and processes. Organizations use MDM to create a single source of truth that integrates data from various sources so that all data users work with the same information.
Companies and data stewards often begin implementing an MDM initiative in a single data domain (logical groupings of similar data, such as customer data or employee data) before scaling such work across the organization’s data assets [4].
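To make the "single source of truth" idea concrete, here is a minimal sketch of merging several source records for one customer into a single master record. The field names, the sample systems and the survivorship rule (the newest non-empty value wins) are all assumptions chosen for illustration; real MDM platforms support many other rules.

```python
def build_golden_record(records):
    """Merge source records for one entity.

    Assumed survivorship rule: for each field, keep the most recently
    updated non-empty value. Dates are ISO strings, so they sort correctly.
    """
    golden = {}
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record["data"].items():
            if value not in (None, ""):
                golden[field] = value  # later (newer) records overwrite older ones
    return golden


# Hypothetical records for the same customer held in two systems
sources = [
    {"system": "crm", "updated": "2024-01-10",
     "data": {"name": "Dana Lee", "email": "dana@example.com", "phone": ""}},
    {"system": "billing", "updated": "2024-03-02",
     "data": {"name": "Dana Lee", "email": "", "phone": "555-0100"}},
]

golden = build_golden_record(sources)
print(golden)
```

Starting with one domain, as described above, keeps the set of fields and rules small enough to agree on before the approach is extended to other data.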
Data stewards can improve data quality by reviewing the contents of a database, which is known as data profiling. They also work with data stakeholders to create data definitions, design data quality metrics and establish business rules for data, such as what values are considered valid or invalid.
For example, as explained in the book “Data Stewardship,” when the data collected is a customer’s marital status, a rule might state that “single,” “married,” “widowed” or “divorced” would be valid values, while a blank response would be considered invalid [5]. Data stewards can also provide input on addressing data quality issues when they arise.
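The marital status rule can be expressed as a small validation check. This is a sketch only: the record layout and function names are hypothetical, and trimming whitespace and ignoring case are assumptions that a steward would need to confirm with stakeholders.

```python
# Valid values taken from the rule described above
VALID_MARITAL_STATUSES = {"single", "married", "widowed", "divorced"}


def is_valid_marital_status(value):
    """Return True only for a non-blank value in the agreed list."""
    if value is None:
        return False
    # Assumption: whitespace and letter case are not significant
    return value.strip().lower() in VALID_MARITAL_STATUSES


def invalid_rows(records):
    """Return the records whose marital_status breaks the rule."""
    return [r for r in records
            if not is_valid_marital_status(r.get("marital_status"))]


customers = [
    {"id": 1, "marital_status": "married"},
    {"id": 2, "marital_status": ""},        # blank: invalid
    {"id": 3, "marital_status": "Widowed"},
    {"id": 4, "marital_status": None},      # missing: invalid
]

flagged = invalid_rows(customers)
print([r["id"] for r in flagged])
```

Flagged records can then be routed to whoever is responsible for correcting them.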
Metadata is information that describes a data point or dataset, such as the data’s creation date or authorship details. Data stewards can be responsible for creating high-quality metadata and evaluating the quality of existing metadata. As with general data quality, data stewards are tasked with addressing metadata quality issues.
Data stewards often maintain reference data, which is data that categorizes other data within the enterprise. Examples of reference data include country codes, currency information and product codes. Through data documentation, data stewards can record valid values for reference data, evaluate whether new valid values are necessary and reconcile reference data values across different systems.
In the last of these tasks, using the marital status example, a data steward might be charged with determining what actions to take when one system allows “widowed” and “divorced” as marital status data while another only accepts “married” and “single” [6].
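A first step in reconciling reference data is simply finding the values that one system accepts and another does not. The sketch below assumes two hypothetical systems and an optional crosswalk (a mapping from source values to target values). It only reports the gaps; deciding how to resolve them remains a business decision.

```python
SYSTEM_A_VALUES = {"single", "married", "widowed", "divorced"}
SYSTEM_B_VALUES = {"single", "married"}


def unmapped_values(source_values, target_values, crosswalk):
    """Return source values with no valid equivalent in the target system."""
    missing = set()
    for value in source_values:
        # Use the crosswalk entry if one exists, otherwise the value itself
        mapped = crosswalk.get(value, value)
        if mapped not in target_values:
            missing.add(value)
    return sorted(missing)


# No crosswalk has been agreed yet, so pass an empty mapping
gaps = unmapped_values(SYSTEM_A_VALUES, SYSTEM_B_VALUES, crosswalk={})
print(gaps)
```

The resulting list gives the steward a concrete agenda: extend the second system's valid values, agree on a mapping, or handle those records another way.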
Often, multiple instances of data represent the same entity. Consider, for instance, a single customer who appears multiple times in a pharmacy chain’s database because they’ve had different prescriptions that were filled at different stores.
Through a process known as identity resolution, data stewards determine when different data instances refer to the same entity. In the case of the pharmacy customer, for example, identity resolution can help ensure that potentially dangerous drug interactions are detected when filling the customer’s prescriptions [7].
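As a rough illustration, the pharmacy scenario can be sketched with a simple deterministic match. The match key used here (normalized name plus date of birth) and the placeholder records are assumptions; production identity resolution typically relies on more attributes and often on fuzzy or probabilistic matching.

```python
from collections import defaultdict


def match_key(record):
    """Build a normalized key; records sharing it are treated as one person."""
    name = " ".join(record["name"].lower().split())  # collapse case and spacing
    return (name, record["date_of_birth"])


def resolve_identities(records):
    """Group records by match key and return the groups in first-seen order."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return list(groups.values())


# Hypothetical prescription records from different stores
prescriptions = [
    {"store": "A", "name": "Dana  Lee", "date_of_birth": "1980-02-14", "drug": "drug_x"},
    {"store": "B", "name": "dana lee", "date_of_birth": "1980-02-14", "drug": "drug_y"},
    {"store": "B", "name": "Sam Roy", "date_of_birth": "1975-07-01", "drug": "drug_z"},
]

for group in resolve_identities(prescriptions):
    print(group[0]["name"], sorted(r["drug"] for r in group))
```

Once the two "Dana Lee" records are grouped, the customer's full list of prescriptions is available in one place for an interaction check.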
Information security is the protection of important information against unauthorized access, disclosure, use, alteration or disruption. Under data privacy regulations, companies are required to implement enhanced protections for sensitive information such as healthcare data. They’re also required to comply with rules that govern data sharing, limit data collection and more. Data stewards can play a role in data protection and regulatory compliance by defining security classifications and assigning them to different types of data.
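One simple way to picture security classification is as a lookup from data elements to sensitivity labels. The labels, column names and the rule that a dataset inherits its strictest column label are all hypothetical here; actual schemes come from an organization's governance policies and applicable regulations.

```python
# Hypothetical classification scheme: column name -> sensitivity label
CLASSIFICATION_RULES = {
    "diagnosis": "restricted",
    "prescription": "restricted",
    "email": "confidential",
    "date_of_birth": "confidential",
}
DEFAULT_CLASSIFICATION = "internal"
RANK = {"internal": 0, "confidential": 1, "restricted": 2}


def classify_columns(columns):
    """Assign a label to each column, falling back to the default."""
    return {c: CLASSIFICATION_RULES.get(c, DEFAULT_CLASSIFICATION) for c in columns}


def dataset_classification(columns):
    """Assumed rule: a dataset inherits the strictest label among its columns."""
    labels = classify_columns(columns).values()
    return max(labels, key=RANK.__getitem__, default=DEFAULT_CLASSIFICATION)


print(classify_columns(["email", "store_id"]))
print(dataset_classification(["diagnosis", "email", "store_id"]))
```

Labels like these can then drive downstream controls, such as who may access a dataset or whether it may be shared.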
Data lineage is the process of tracking data lifecycles, providing a clear understanding of where data originated, how it has changed and its ultimate destination. Data stewards can trace lineage, which helps an organization affirm data integrity for regulatory reporting purposes.
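Lineage is often modeled as a graph in which each dataset points to the datasets it was derived from. The sketch below uses a hypothetical set of dataset names and walks that graph to list everything that feeds a regulatory report, along with its original sources.

```python
# Hypothetical lineage: each dataset maps to the datasets it is derived from
LINEAGE = {
    "regulatory_report": ["monthly_summary"],
    "monthly_summary": ["cleaned_transactions"],
    "cleaned_transactions": ["raw_transactions", "currency_codes"],
    "raw_transactions": [],
    "currency_codes": [],
}


def upstream(dataset, lineage):
    """Return every dataset that feeds into `dataset`, nearest first."""
    seen = []
    queue = list(lineage.get(dataset, []))
    while queue:
        current = queue.pop(0)  # breadth-first traversal
        if current in seen:
            continue
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


def origins(dataset, lineage):
    """Return the upstream datasets that have no sources of their own."""
    return [d for d in upstream(dataset, lineage) if not lineage.get(d)]


print(upstream("regulatory_report", LINEAGE))
print(origins("regulatory_report", LINEAGE))
```

In practice, lineage metadata is usually captured by tooling rather than written by hand, but the traversal idea is the same.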
Poor data quality can put business processes at risk. Data stewards can work with business process leaders to determine the use of data in a process and how vulnerable the process is to failure in the case of poor data quality [8].
Organizations can implement various solutions and tools to support data stewardship activities, including:
AI and data stewardship have what some might consider to be a symbiotic relationship. While data stewardship helps ensure AI systems work with high-quality data, AI-based tools can optimize data stewardship tasks. For example, AI-enabled data preparation tools can perform validation checks and flag errors such as improper formatting, while AI-driven data loss prevention tools can detect sensitive information and apply security controls as necessary.
A data catalog is an inventory of all data assets in an organization. It’s designed to help data stewards and other data professionals find information easily and quickly. The metadata associated with each data asset enables the catalog’s searchability.
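The role metadata plays in catalog search can be shown with a toy example. The catalog entries and field names below are invented for illustration, and the search is a plain case-insensitive substring match rather than what a commercial catalog would use.

```python
# Hypothetical catalog entries: each asset is described by its metadata
CATALOG = [
    {"name": "customer_master", "owner": "sales",
     "tags": ["customer", "mdm"], "description": "Golden customer records"},
    {"name": "currency_codes", "owner": "finance",
     "tags": ["reference"], "description": "ISO currency codes"},
    {"name": "store_visits", "owner": "retail",
     "tags": ["customer", "events"], "description": "Customer visits per store"},
]


def search_catalog(catalog, term):
    """Return names of assets whose name, description or tags contain the term."""
    term = term.lower()
    hits = []
    for asset in catalog:
        haystack = [asset["name"], asset["description"], *asset["tags"]]
        if any(term in text.lower() for text in haystack):
            hits.append(asset["name"])
    return hits


print(search_catalog(CATALOG, "customer"))
print(search_catalog(CATALOG, "reference"))
```

An asset with sparse or inaccurate metadata would be hard to find this way, which is one reason metadata quality matters to stewards.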
Data profiling and analysis tools can assess data for consistency and quality. Features of such tools might include capabilities for identifying anomalies, validating data sources and summarizing analysis results through custom reports.
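At its simplest, profiling a column means counting what is there. The following sketch, which uses made-up rows and treats both None and empty strings as missing (an assumption), summarizes one column in the way a profiling tool's report might.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values, mode."""
    values = [row.get(column) for row in rows]
    # Assumption: None and the empty string both count as missing
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


sample_rows = [
    {"country": "US"},
    {"country": "US"},
    {"country": "DE"},
    {"country": ""},
    {"country": None},
]

profile = profile_column(sample_rows, "country")
print(profile)
```

A high share of missing values or an unexpected number of distinct values is often the first sign that a column needs attention.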
One way that data stewards organize data is through relational databases. A relational database (RDB) is a type of database in which data is organized into tables made up of rows and columns. Tables can be linked through shared key values to represent relationships between data points. Relational database management systems (RDBMS) are software solutions that data stewards and others can use to maintain and update RDBs.
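The idea of linked tables can be illustrated with Python's built-in sqlite3 module and an in-memory database. The table and column names are hypothetical; the point is only that a key column in one table refers to rows in another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the example

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "  # link to customers
    "product_code TEXT NOT NULL)"
)

conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Dana Lee"), (2, "Sam Roy")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, product_code) VALUES (?, ?)",
    [(1, "P-100"), (1, "P-200"), (2, "P-100")],
)

# Join the two tables through the shared key
joined = conn.execute(
    "SELECT c.name, o.product_code "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "ORDER BY c.name, o.product_code"
).fetchall()
print(joined)
conn.close()
```

Because each order stores only a customer ID, a correction to a customer's name is made once in the customers table and is reflected everywhere that customer appears.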
Data governance software programs often incorporate data profiling and analysis tools as well as AI-driven capabilities. Features might include AI-powered metadata enrichment, data catalog creation, data lineage tracing and the establishment of role-based data access control.
1, 4 Allen et al. “Multi-Domain Master Data Management.” Morgan Kaufmann. 10 April 2015.
2, 3, 5, 6, 7, 8 Plotkin. “Data Stewardship, Second Edition.” Academic Press. 20 November 2020.