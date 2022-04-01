A data management plan typically has five components:

1. A statement of purpose

2. Data definitions

3. Data collection and access

4. Frequently asked questions (FAQs)

5. Research data limitations

Each of these focus areas enables research agencies and research funders (or perhaps your data management team) to assess the amount of risk associated with a given project. The data management plan also addresses how to manage that risk. For example, if sensitive data is used within a project, is it appropriate to re-use that data for future projects? Depending on the sensitivity of that data, it may not be appropriate, or it may require additional user consent.

Each component of a data management plan focuses on a particular piece of information. Let's look at each one in more detail.

1. Statement of purpose: This explains why the team needs to acquire specific types of data over the course of the project. It should clearly outline the question that the team is attempting to answer with this dataset.

2. Data definitions: Data definitions help end users and their audiences understand the project's naming conventions and how each name corresponds to a specific dataset. Some of this information may also be stored in the metadata, which typically labels data by its source and file format. Creating pre-defined metadata standards and following them throughout the data acquisition process makes collection more consistent and integration smoother.

3. Data collection and access: This section of a DMP describes how data will be collected, stored, and accessed from a data repository. It will likely identify the source of any existing data, or the approach that will be taken to create new data, such as an experiment. It should also cover the timing of the data, i.e. how often it will be updated and over what period. The type and timing of the data generally determine how it is stored and how third parties can access it. For example, unstructured data often calls for a non-relational system rather than a relational one, and larger datasets require more compute power than smaller ones. There may also be restrictions on data sharing due to privacy or intellectual property rights. Because project stakeholders expect sensitive data, such as personally identifiable information (PII), to be treated with the utmost care and security, data owners need to be clear about their data management practices, particularly in this area. This includes answering questions about the data's long-term preservation, such as archiving and re-use. For data that is not sensitive, there will be an expectation to provide a pathway for third parties to access the raw data and research results.

4. Frequently asked questions: This section is a “catch-all” for other common questions in data management projects, such as data sharing plans, citation preferences, and data backup methods. Researchers or data owners may also want to highlight any digital object identifiers (DOIs) for the benefit of owners of adjacent or related projects. Additionally, if project owners are archiving data, they need to state how long the archive will be retained. Will it live for one year, five years, or indefinitely?

5. Research data limitations: This section states known limitations of the dataset up front, particularly those that restrict how well its findings generalize to broader populations. For example, data may be focused on a specific demographic, such as a particular geography, gender, race, or age group.
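
The metadata standards described under data definitions, along with the storage and retention questions in the collection and FAQ sections, can be made checkable rather than left as prose. The following is a minimal sketch, assuming the team keeps its data dictionary as a list of Python dicts. The field names (`source`, `file_format`, `update_frequency`, `contains_pii`, `retention_years`), the allowed values, and the rule that PII datasets must state a retention period are all hypothetical choices for illustration, not part of any formal DMP template.

```python
from dataclasses import dataclass

# Hypothetical metadata standard: every dataset entry in the plan must record
# these fields. Field names and allowed values are illustrative assumptions.
REQUIRED_FIELDS = {"name", "source", "file_format", "update_frequency", "contains_pii"}
ALLOWED_FORMATS = {"csv", "parquet", "json"}
ALLOWED_FREQUENCIES = {"once", "daily", "weekly", "monthly"}


@dataclass
class ValidationResult:
    dataset: str
    problems: list


def validate_entry(entry: dict) -> ValidationResult:
    """Check one data-dictionary entry against the (assumed) metadata standard."""
    problems = []
    missing = REQUIRED_FIELDS - set(entry)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    fmt = entry.get("file_format")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported file format: {fmt}")
    freq = entry.get("update_frequency")
    if freq is not None and freq not in ALLOWED_FREQUENCIES:
        problems.append(f"unknown update frequency: {freq}")
    # Assumed rule for this sketch: sensitive data must state how long it is
    # kept before archiving or deletion.
    if entry.get("contains_pii") and "retention_years" not in entry:
        problems.append("PII dataset has no retention_years")
    return ValidationResult(entry.get("name", "<unnamed>"), problems)


# Example data dictionary with made-up entries.
data_dictionary = [
    {"name": "survey_responses", "source": "online survey", "file_format": "csv",
     "update_frequency": "weekly", "contains_pii": True, "retention_years": 5},
    {"name": "sensor_readings", "source": "lab instrument", "file_format": "xlsx",
     "update_frequency": "daily", "contains_pii": False},
    {"name": "interview_notes", "source": "field interviews", "file_format": "json",
     "update_frequency": "once", "contains_pii": True},
]

for result in map(validate_entry, data_dictionary):
    status = "OK" if not result.problems else "; ".join(result.problems)
    print(f"{result.dataset}: {status}")
```

Running this prints `OK` for `survey_responses`, flags the `xlsx` format for `sensor_readings`, and flags the missing retention period for `interview_notes`. Keeping the standard in code means the same check can run every time a new dataset is added to the plan, rather than relying on a manual review.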