Publish, Share and Preserve Your Data
Sharing Must Be FAIR
Shareable data are guided by the principles embodied in FAIR:
- Findable: Data are identified through unique identifiers and clearly cited.
- Accessible: Data are publicly available, for example, in an Open Access repository.
- Interoperable: Data are made actionable by being available in non-proprietary formats, for instance CSV rather than PDF or Excel files.
- Reusable: Data are properly documented through readme files, file naming protocols, and codebooks.
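As one way to act on the Interoperable and Reusable principles, the sketch below writes tabular data as CSV and produces a plain-text codebook describing each column. It is a minimal illustration only: the function names, column names, and sample row are hypothetical, and a real codebook would usually record units, allowed values, and missing-data codes as well.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    # lineterminator is set explicitly so the output uses plain "\n" newlines.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def codebook_text(columns):
    """Build a simple plain-text codebook from a {column: description} mapping."""
    lines = ["Codebook", ""]
    for name, description in columns.items():
        lines.append(f"{name}: {description}")
    return "\n".join(lines) + "\n"


# Hypothetical sample data, used only for illustration.
sample_rows = [{"plot_id": 1, "species": "Quercus lobata"}]
sample_columns = {
    "plot_id": "Integer identifier of the survey plot",
    "species": "Scientific name of the observed species",
}

csv_text = rows_to_csv(sample_rows, ["plot_id", "species"])
readme_text = codebook_text(sample_columns)
```

The two strings can then be saved next to each other, for example as `data.csv` and `README.txt`, so the documentation travels with the data.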
Unique Identifiers: DOI and ORCID
DOI to Identify Your Data
To share and preserve your data, you need to be able to refer to your files unambiguously. To make your data findable, you’ll first need a Digital Object Identifier (DOI), a persistent identifier that resolves to a permanent URL.
One option is to choose a repository that will provide your dataset with a DOI. A dataset with a DOI can be discovered by following the link or searching DataCite. We can also assist you with minting or assigning DOIs for your project outputs.
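To illustrate how a DOI becomes a link, the sketch below turns a DOI written in several common forms into a resolvable URL using the public `https://doi.org/` resolver. The helper names are hypothetical, and the validity check is deliberately loose (it only confirms the `10.` prefix and a slash), so treat it as a convenience rather than a full DOI validator.

```python
# Prefixes that people commonly put in front of a bare DOI.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Strip a resolver or 'doi:' prefix and return the bare DOI."""
    text = raw.strip()
    lowered = text.lower()
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            text = text[len(prefix):]
            break
    text = text.strip()
    # Loose sanity check: DOIs start with "10." and contain a slash.
    if not text.startswith("10.") or "/" not in text:
        raise ValueError(f"Not a recognizable DOI: {raw!r}")
    return text


def doi_to_url(raw):
    """Return the resolver URL for a DOI given in any of the accepted forms."""
    return "https://doi.org/" + normalize_doi(raw)


# The DOI from the citation example in this guide, in two different forms.
url_a = doi_to_url("doi:10.6078/D1B885")
url_b = doi_to_url("https://dx.doi.org/10.6078/D1B885")
```

Both calls yield the same URL, which is the point of normalizing: one dataset, one unambiguous reference.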
ORCID to Identify Yourself
In addition to identifying your data, it is good practice to identify yourself with an ORCID iD, a unique identifier provided by ORCID. This identifier allows you to distinguish yourself from researchers with the same or similar name. In addition, ORCID iDs are now required by a number of funders and publishers.
One of the main benefits of using your ORCID iD is that you can configure your ORCID profile to update automatically when you publish a paper or a dataset. Visit ORCID to apply. It’s free.
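If you record ORCID iDs in your metadata or readme files, a quick format check can catch typing errors. An ORCID iD is 16 characters in four hyphen-separated groups, and its last character is a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below is a minimal, offline check under that assumption; the function names are hypothetical, and passing the check does not prove that the iD is registered to anyone.

```python
import re

# Four groups of four characters; the final character may be a digit or "X".
_ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the string has the ORCID iD layout and a matching check character."""
    candidate = orcid.strip().upper()
    if not _ORCID_PATTERN.match(candidate):
        return False
    compact = candidate.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
sample_ok = is_valid_orcid("0000-0002-1825-0097")
sample_typo = is_valid_orcid("0000-0002-1825-0098")
```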
Which Repository Should I Use?
The best repository for sharing your data depends on your discipline. If there is a national or subject-level repository in your discipline, that would be your first choice. To determine if such repositories exist, you can search the registry of repositories re3data, or check the Open Access Directory of Data Repositories at Simmons University.
It is also possible that your funding agency or your state has a repository you can use. In the absence of these, there are a number of general subject repositories which can take your data. Below is a table with some choices available to you.
General Subject Repositories
|Repository|Description|
|---|---|
|Dryad|Dryad is an open-source, research data curation and publication platform. UC Davis is a proud partner of Dryad and offers Dryad as a free service for all UC Davis researchers to publish and archive their data. Datasets published in Dryad receive a citation and can be versioned at any time. Dryad is integrated with hundreds of journals and is an easy way to both publish data and comply with funder and publisher mandates.|
|Interuniversity Consortium for Political and Social Research (ICPSR)|Social and Behavioral Science Data, including political and social surveys, public health studies; allows variable-level searches. See ICPSR’s Guide to Social Science Data Preparation and Archiving.|
|The Knowledge Network for Biocomplexity|Environmental and ecological datasets|
|QDR|A qualitative data repository|
|National Center for Biotechnology Information|Genetics, gene expression, genomics, proteomics, assays|
|Harvard Dataverse Network|General subject repository hosted at Harvard|
|Zenodo|General subject repository provided by CERN|
|figshare|General subject repository|
Preserving and Citing Data
Repositories can offer two types of preservation, bit-level preservation and long-term preservation for re-use:
- Bit-level preservation means that the repository will provide access to the file as it was deposited.
- Long-term preservation for re-use usually includes additional curation, such as assessment of metadata quality when data are deposited and file format migrations so that the data can be accessed by current software and media.
Even if the repository you use provides only bit-level preservation, you can improve the re-usability of your data by including thorough documentation of methods and analysis along with your data, and by using non-proprietary formats for your data files.
In addition, both your data and you, the author, should have unique identifiers to ensure provenance and to receive credit.
We recommend licensing your data under a Creative Commons Zero (CC0) license to encourage data re-use, and requesting a citation when the dataset is re-used. We recommend using the DataCite format for data citations in publications:
Creator (PublicationYear). Title. Version. Publisher. ResourceType. Identifier
Baldwin, Bruce G. et al. (2017), A subset of Californian vascular plant species recognized by Baldwin et al. (2017), corresponding to the “large and intermediate-sized genera” studied by Stebbins & Major (1965), v3, Dataset, https://dx.doi.org/10.6078/D1B885
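The citation pattern above can be assembled programmatically, which helps when you cite many datasets consistently. The sketch below is a minimal illustration, not an official DataCite tool: the function name is hypothetical, optional elements are simply skipped when missing, and the sample values (including the `10.1234` DOI prefix) are made up for the example.

```python
def format_datacite_citation(creator, year, title, version=None,
                             publisher=None, resource_type=None,
                             identifier=None):
    """Assemble 'Creator (PublicationYear). Title. Version. Publisher. ResourceType. Identifier'.

    Elements that are None or empty are left out of the result.
    """
    parts = [f"{creator} ({year})", title, version, publisher,
             resource_type, identifier]
    # Keep only the elements that were supplied, preserving their order.
    present = [str(part).strip() for part in parts if part]
    return ". ".join(present)


# Hypothetical values, used only to show the resulting layout.
citation = format_datacite_citation(
    creator="Doe, Jane",
    year=2020,
    title="Example survey data",
    version="v2",
    publisher="Example Repository",
    resource_type="Dataset",
    identifier="https://doi.org/10.1234/example",
)
```

Leaving out an optional element, such as the version, shortens the citation without changing the order of the remaining parts.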
UC Davis Storage and Collaboration Tools
UC Davis offers several “cloud services” to meet researchers’ needs to store and share data while working on them. Learn more about the tools and services available for cloud storage and collaboration, storage and hosting, backup service and more.