What is a data management plan (DMP)?

What is a DMP?

A data management plan (DMP) is a document that defines how data will be handled throughout the lifecycle of a project, from acquisition to archival.

While these documents are typically used for research projects to meet funder requirements, they can be leveraged within a corporate environment as well to create structure and alignment between stakeholders.

Because DMPs highlight the types of data that will be used within a project and address how that data is managed throughout its lifecycle, stakeholders such as governance teams can provide clear feedback on the storage and dissemination of sensitive data, such as personally identifiable information (PII), at the outset of a project. These documents help teams avoid compliance and regulatory pitfalls, and they can serve as templates for how to approach and manage data in future projects.

Components of a data management plan

A data management plan typically has five components:

1. A statement of purpose
2. Data definitions
3. Data collection and access
4. Frequently asked questions (FAQs)
5. Research data limitations

Each of these focus areas enables research agencies and research funders (or perhaps your data management team) to assess the amount of risk associated with a given project. The data management plan also addresses how to manage that risk. For example, if sensitive data is used within a project, is it appropriate to re-use that data for future projects? Depending on the sensitivity of that data, it may not be appropriate, or it may require additional user consent.

Each component of a data management plan focuses on a particular piece of information. The sections below describe each one in more detail.

1. Statement of purpose: This explains why the team needs to acquire specific types of data over the course of the project. It should clearly outline the question that the team is attempting to answer with this dataset.

2. Data definitions: Data descriptions help end users and their audiences understand naming conventions and their correspondence with specific datasets. Some of this information may also be held within the metadata, typically labeling data by its data sources and file formats. Creating and abiding by pre-defined metadata standards throughout the data acquisition process will also ensure a more consistent collection and smoother integration process.

3. Data collection and access: This section of a DMP describes how data will be collected, stored, and accessed from a data repository. It will likely address the source of any existing data or the approach that will be taken to create new data, such as an experiment. It should also contain information about the timing of the data, that is, how often it will be updated and over what period of time. The type and timing of the data will generally inform how it is stored and how third parties access it. For example, unstructured data is generally better suited to a non-relational system than a relational one, and larger datasets will generally require more compute power than smaller ones. There may also be restrictions on data sharing due to privacy or intellectual property rights. Since project stakeholders will expect sensitive data, such as personally identifiable information (PII), to be treated with the utmost care and security, it's important for data owners to be clear about their data management practices, particularly in this area. This includes answering questions about the data's long-term preservation, such as data archiving or data reuse. For data that is not sensitive in nature, there will be an expectation to provide a pathway for third parties to access raw data and research results.

4. Frequently asked questions (FAQs): This section can be considered a "catch-all" for other common questions within data management projects, such as sharing plans, citation preferences, and data backup methods. Researchers or data owners may want to highlight any digital object identifiers (DOIs) for owners of adjacent or related projects. Additionally, if project owners are archiving data, they'll also need to address how long the archive will exist. Will it live for one year, five years, or perhaps indefinitely?

5. Research data limitations: This section addresses known limitations of the dataset up front, particularly those that restrict how well findings generalize to broader populations. For example, data may be focused on a specific demographic, such as a particular geography, gender, race, or age group.
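To make the five components concrete, the sketch below models a DMP as a simple Python record and checks it for gaps before review. It is only an illustration under stated assumptions: the class name `DataManagementPlan`, its field names, and the `contains_pii` and `archive_years` flags are hypothetical and do not come from any funder's template or standard DMP tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataManagementPlan:
    """Hypothetical record mirroring the five DMP components described above."""
    statement_of_purpose: str = ""
    data_definitions: Dict[str, str] = field(default_factory=dict)  # dataset name -> description
    collection_and_access: str = ""
    faqs: Dict[str, str] = field(default_factory=dict)  # question -> answer
    limitations: List[str] = field(default_factory=list)
    # Assumed extra flags, added for illustration only.
    contains_pii: bool = False
    archive_years: Optional[int] = None  # None means retention is undecided

    def missing_sections(self) -> List[str]:
        """Return the names of the five components that are still empty."""
        sections = {
            "statement_of_purpose": self.statement_of_purpose,
            "data_definitions": self.data_definitions,
            "collection_and_access": self.collection_and_access,
            "faqs": self.faqs,
            "limitations": self.limitations,
        }
        return [name for name, value in sections.items() if not value]

    def review_flags(self) -> List[str]:
        """Return items a governance reviewer may want to raise early."""
        flags = []
        if self.contains_pii:
            flags.append("PII present: confirm consent before any reuse")
        if self.archive_years is None:
            flags.append("Archive retention period not specified")
        return flags


# Example: a draft plan with only the purpose filled in.
plan = DataManagementPlan(
    statement_of_purpose="Estimate regional demand for a new service.",
    contains_pii=True,
)
print(plan.missing_sections())
print(plan.review_flags())
```

A structure like this could let a governance team reject incomplete submissions automatically and route plans that involve sensitive data to additional review, though the actual checks would depend on the organization's or funder's requirements.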

Who uses data management plans?

Data management plans are predominantly used in academic settings, particularly for programs funded by federal agencies such as the National Institutes of Health (NIH) and the National Science Foundation (NSF), but corporations can also use them in their research or data governance functions. Because academics and researchers need to comply with funder requirements in grant applications, many research institutions provide a DMP tool that gives participants the relevant template for their research project. Data governance teams within organizations can set up similar protocols to take in data requests from stakeholders advocating for new data initiatives.

Data management use cases

Grant applications

Researchers in both the private and public sectors look to funding agencies to sponsor research and innovation initiatives. DMPs mitigate risk for both parties by ensuring that data owners have assessed the value of the data as well as their own responsibilities for research data management, such as security and disaster recovery measures.

Data governance initiatives

Data management plans are also helpful for new data initiatives in business settings, where they help all stakeholders understand the importance of new data sources and how those sources tie to business outcomes. As developments in hybrid cloud, artificial intelligence, the internet of things (IoT), and edge computing continue to spur the growth of big data, enterprises will need to find ways to manage that complexity within their data systems.
