Understand the New NIH Data Management and Sharing Policy: January 25, 2023
A new National Institutes of Health (NIH) policy (the “NIH data management and sharing policy” or “NIH DMSP”), effective Jan. 25, 2023, will require all researchers seeking NIH funding to submit a data management plan and share their research data at the same time that findings are published, or sooner. Read on for key things to know about the NIH DMSP, FAQs on NIH DMSP elements and a case scenario.
Key things to know about the NIH DMSP
- What: All NIH grant applications for projects that generate scientific data must include a data management and sharing plan, and funded projects must comply with it.
- Why: To bolster research rigor and reproducibility, accelerate discovery, provide access to high-value datasets, promote data reuse, and expedite translational science.
- Where to learn more: In addition to the FAQs below, NIH offers many resources on scientific data sharing, including information about the new DMSP.
- Where to get additional help: Contact the UC Davis Library and DataLab for expert guidance and resources to help you successfully describe, share, analyze, publish, and find a repository for your data.
FAQs on NIH DMSP elements & case scenario
A data management plan describes how the scientific data will be collected, organized, documented, stored and shared, and who is responsible for what.
There are two helpful starting points in coming up with a plan. NIH official guidance includes a template with prompts. UC Davis researchers also have free access to the DMPTool, which provides sample text to get you started in addition to the prompts. You can also reach out to the library’s data experts if you would like someone to review your plan before submitting it.
For the first two elements of the plan, data types and tools, you will in many cases be describing practices already in place for your research team. If your project adopts a novel approach, documenting and standardizing practices beforehand will make it easier to coordinate work among graduate students, postdocs, staff scientists, and others.
For these elements, you will name the components you will use and create (e.g., data files and protocols) and, if relevant, code to reproduce data processing and analysis.
We recommend providing the data in open or commonly used formats to improve data sharing. For instance, a TIFF image can be viewed in multiple tools, while some proprietary image formats can only be opened with specific licensed software.
For this part of the plan, indicate whether your data and materials will be structured and described using formal standards (see NIH’s Implementation Resources). As described in the case scenario below, it is often easier to address this portion if you have a data-sharing repository in mind. In addition to the repository standards, it may be helpful to address protocols or codebooks documenting data collection and describing the data. The NIH recommends using Common Data Elements.
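As an illustration of how a codebook can document a dataset, here is a minimal Python sketch. It assumes the data are shared as a CSV file and that the codebook is kept as a simple list of variable definitions. The variable names, descriptions, and the `undocumented_columns` helper are hypothetical and are not part of any NIH or repository standard.

```python
import csv
import io

# Hypothetical codebook: one entry per column in the shared data file.
CODEBOOK = [
    {"variable": "sample_id", "description": "Unique sample identifier", "units": ""},
    {"variable": "collection_date", "description": "Date the sample was collected (ISO 8601)", "units": ""},
    {"variable": "dna_yield", "description": "DNA recovered from extraction", "units": "ng"},
]


def undocumented_columns(data_text, codebook):
    """Return data columns that have no codebook entry, in file order."""
    # Only the header row is needed to compare column names.
    header = next(csv.reader(io.StringIO(data_text)))
    documented = {entry["variable"] for entry in codebook}
    return [name for name in header if name not in documented]
```

Running a check like this before deposit is one way to confirm that every column in a shared file is described somewhere; the repository's own standards still determine what must be documented.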
Any considerations about data access and release during and after the project should be specified and cleared through an IRB review or appropriate practices in the specific research community. If you are working with human subjects, refer to “Additional Considerations for Human Data” and the supplemental information.
Note: The NIH Genomic Data Sharing policy is folded into the new NIH DMSP, and the expectations for sharing remain the same.
The NIH provides a list of repositories that may be appropriate for your data. A subject-specific repository is usually the best place to share your data since it will be an intuitive location for other scientists to look for datasets in a particular field.
If the scope of your data does not align with the listed repositories, there are general repositories that take in a variety of subjects and interdisciplinary datasets. UC Davis has a membership to the Dryad repository, and UC Davis affiliates can deposit data there for free (as long as it is open and unrestricted, and contains no personally identifiable human subject information – see their FAQ for details).
Tip: When submitting data to a general repository, include data collection protocols, instruments, and other relevant documentation to ensure ease of data reuse. This will significantly enhance how FAIR (findable, accessible, interoperable and reusable) the data are.
Note: Specific centers or calls for grant proposals may have additional data management requirements. Check with the institute to which you submit your grant proposal to determine if they have an additional list of data management requirements.
The UC Davis Library and DataLab pages Data Analysis and Management and Data Management Toolkit include additional guidance and contact information.
Program staff at the proposed NIH Institute or Center will assess data management and sharing plans to ensure the elements have been adequately addressed and to assess the reasonableness of the responses. Successful applications will only be funded if the data management and sharing plan is complete and acceptable.
If funding is awarded, recipients will be required to comply with the version of the plan that the funding Institute or Center approved. Plans may be updated during regular reporting intervals, where progress on data sharing will also need to be reported.
Imagine you are working on a project involving high-throughput sequencing. When you are ready to publish your work, the project data will also need to be made available, and the Sequence Read Archive (SRA) at NCBI is a good place to do so. You have probably used some of the many databases curated and supported by NCBI, such as GEO, Protein, and PubChem, which are useful because of the structured descriptions accompanying the data. Importantly, these data may have initially been available in a figure or a table as part of a publication, but what makes the repositories particularly useful is that one can find all the data that fit a particular criterion, formatted the same way. Instead of scientists struggling to extract a sequence or a structure from a PDF or an image, the data are ready for further analysis.
So you try to do the right thing and go to the website to deposit your sequence files. The instructions require you to create a BioProject and BioSample, which in turn ask about the cell type, the collection date, the pooling of DNA extracts, and the purpose of sequencing. Collecting this information will allow future reuse of the data, saving the time and funds that would otherwise be spent re-collecting data that already exist. All of the sample information is available in the project notes, but it takes time to assemble, enter, and format it so that it is ready for submission with the sequenced sample data. The process would be much easier if the decision to use SRA had been made at the beginning of the project.
Ideally, that is what writing the data management plan will help you establish early on. When you write your plan, you will need to note how the data can be accessed and preserved. In this example, you selected SRA, so you know up front the standards needed to describe your samples. Those standards are another element of the data management plan, and they give you a consistent way to record sample information, so that even when work is completed by different members of your team, the information is entered the same way.
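One way a team might record sample information consistently from the start is a small shared record type. The Python sketch below is illustrative only: the field names mirror the questions mentioned in this scenario (cell type, collection date, pooling of DNA extracts, purpose of sequencing) but are hypothetical, and they are not the actual BioSample attribute names. Check the repository's own submission template for the required attributes and formats.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import date


@dataclass
class SampleRecord:
    # Hypothetical field names for illustration; the repository's own
    # template defines the attributes it actually requires.
    sample_id: str
    cell_type: str
    collection_date: date
    pooled_extracts: bool
    sequencing_purpose: str

    def __post_init__(self):
        # Reject blank text fields so gaps are caught at entry time.
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, str) and not value.strip():
                raise ValueError(f"{f.name} must not be empty")


def to_tsv(records):
    """Serialize records to tab-separated text with one header row."""
    buffer = io.StringIO()
    names = [f.name for f in fields(SampleRecord)]
    writer = csv.DictWriter(
        buffer, fieldnames=names, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        row = asdict(record)
        # Write dates and booleans in one fixed textual form.
        row["collection_date"] = record.collection_date.isoformat()
        row["pooled_extracts"] = "yes" if record.pooled_extracts else "no"
        writer.writerow(row)
    return buffer.getvalue()
```

Because every team member fills in the same fields in the same format, assembling the submission at publication time becomes an export step rather than a search through project notes.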
Questions on this case or your own scenario?
Experts at the UC Davis Library and DataLab are available for additional guidance. What is your scenario?