3.1 Introduction to data management plans
Now that we’ve covered some of the policies, ethical considerations, and best practices in our previous course, we’ll introduce a tool you can use to bring all of these data management considerations together in one document: a data management plan.
As we mentioned above, data management plans (frequently abbreviated DMPs) are often a formal part of a grant proposal. To meet the data management requirements of many funding agencies, you need to provide a data management plan, typically a two-page document describing your data management practices. Different programs often have different DMP requirements, and we’re starting to see repercussions for poorly written DMPs or DMP non-compliance.
While it is good to know how to write a DMP so you aren’t caught off guard when preparing a proposal, DMPs can also be a really useful tool for managing your own research project. Writing a DMP at the outset of your project helps you make important choices about key events throughout the lifecycle of your data, from collection and acquisition to description and sharing. A DMP also gives your work an organizational structure from the start and can help you stick to your best practices throughout your project.
In collaborative research projects, the DMP also gives the entire research team an opportunity to participate in deciding what happens to your data, establishing buy-in and shared expectations around working with and stewarding the data.
While we’ll introduce what most funding agencies require in a DMP below, here are some of the key questions to ask when planning your research in order to make the best choices for your data and write an effective DMP:
- What files are you producing? What file formats will they be in?
- How are data and project documents organized, named, described?
- Where is your data stored and backed up, and what security does it need? Who should have access to the data, and how is that access provided?
- Should you share the data at the end of the project? If so, when and how?
What do funding agencies require?
Funding agencies are interested in ensuring that research funded with taxpayer dollars is made open and accessible (as appropriate) as a public good; this is often called public access. For the same reason, they want to ensure that data is well managed to maximize its value and reusability when shared. So, funding agencies ask researchers to address, within a formal data management plan, both how they will responsibly manage their research data and how they will make data available to the public. Below are the main questions that most funding agencies will want answered in a DMP.
Funders are often interested in the following topics:
- Can the data be shared?
  - Is the output considered intellectual property? Do you have sensitive data that should not be made public?
- What types of data and research outputs are being produced?
  - Agencies are interested in a specific definition of research data: data that is “necessary to validate research findings” (OMB Circular A-110).
  - In practice, this means any output you produce that supports your findings: figures, charts, images, data, code, statistics, etc.
  - They want you to detail what activities you will carry out in the course of your research, what types and sizes of files those activities produce, and which of those files will be retained to be shared.
- How are you storing and organizing your data?
  - This includes details about security and access. This is a critical section if you have sensitive or restricted data.
  - It should also cover backup and storage platforms, organization of the data, and roles and responsibilities related to the management and sharing of the data.
- What standards are you using for file formats and metadata?
  - While you have already detailed what files you are producing, agencies also want to know that you are choosing sustainable file formats for sharing and providing description for your data.
  - How much contextual information accompanies your data? Are you using a disciplinary metadata standard? Are you including a README file?
  - Can you understand your data six months from now? Can a stranger understand it six years from now?
- How will you be providing public access? How will you preserve your data?
  - We cover sharing in more detail in the next section, but agencies want you to identify in your DMP where you will share your data.
  - Agencies also want to know whether your data will be available into the future: Where will it be archived? Will any restrictions or licenses apply to its reuse?
If you need to write a data management plan, there are campus resources to help you. Research Data Services can review your plan and help ensure you’re meeting your funder’s requirements. Campus also provides access to the DMPTool, which offers templates and can help you structure your plan. You can also request feedback on your DMP from Research Data Services through the DMPTool.
Input from experts
“In my work leading the Media History Digital Library, I have generated large amounts of data and had to work across several development environments. Preparing a Data Management Plan forced me to think more systemically about sharing and preserving my data. It also made my projects far more competitive for external grants. I would encourage other researchers, PIs, and digital humanists to begin the process of preparing a DMP.”
Eric Hoyt, Associate Professor of Media and Cultural Studies
Department of Communication Arts
University of Wisconsin-Madison
"The data management plan is meant to bring a reality check to your project planning. Even if the data themselves are meant to be cost-free, storing and moving and serving those data will cost time and money at many points in the project. My own affinity for the DMP is that when you propose the project, you want to have those aspects covered so that you're not left at loose ends for how to pay for and manage your project data later in your work. Eventually, it will be time to publish your methods and findings and, hopefully, you'll pass your work on to another researcher to reproduce and carry on your analyses. The point of the data management plan is to get you thinking right up front about when and how you'll do those things and how much they will cost in terms of project resources."
Matt Garcia, Ph.D.
Postdoctoral Research Associate
Dept. of Forest & Wildlife Ecology
University of Wisconsin – Madison
3.2 Data sharing
As noted above, part of a data management plan is describing where you will share your data. However, meeting funder requirements is only one of several good reasons to share data:
- To fulfill funder and journal requirements. As you’ve learned in this course, many grant funders require you to share data, and in some disciplines journals may require it as well.
- To get credit for your data and raise interest in publications. One study found a 69% increase in citations for articles whose associated data were available online.
- To establish priority. Sharing your data online can help provide provenance (or origin) information. A dataset time-stamped with the date it was produced can help guard against research “scooping.”
- To speed research. Data sharing can accelerate the pace of discovery. One recent example is research on the Zika virus: here at UW-Madison, researcher David O’Connor and his team are sharing their data so that solutions to this public health issue can be found more efficiently.
- To prevent time-consuming, expensive, or sometimes cruel redundancies in research. Sharing data collected on expensive instruments, for example, gives other researchers access to data they may not have the resources to collect themselves. Sharing data collected under animal protocols can reduce the number of experiments performed on lab animals.
Where can I share my data?
If you’d like to share your data, there are many ways to do so!
- A disciplinary data repository, if one exists in your field. Examples include ICPSR, GenBank, and NOAA National Centers for Environmental Information.
- General or discipline-agnostic repositories. Examples include Open Science Framework, Zenodo, and figshare.
- An institutional repository with a commitment to storing digital materials for the long term. In this instance, institutional usually refers to a university or higher education institution. Examples include UW’s MINDS@UW and the Data and Information Services Center’s Online Data Archive.
- Online, via a personal or department-hosted website or data portal. This option will not archive and preserve your data, so you should have a secondary plan for that if your data should be made available for a longer period of time than the lifespan of a website.
- As supplementary materials or via a ‘data paper’ in an appropriate journal. Check each journal’s data policies; depending on the journal, this option may not make the data openly available to the public.
Input from experts
"At some point in your project, someone likely shared their data with you for your own analyses, and hopefully you will do the same for another researcher. Sharing data facilitates both the wider effort into different ways to see and analyze a problem and our efforts at reproducibility and transparency in the research community. Sometimes, the dataset itself is your product, and it may take considerable time and project resources to serve that to other interested researchers. Many academic publishers now require an explicit statement on data sharing and analysis methods in your journal submissions. The goal is an open scientific community where advances are not stifled for lack of access to datasets and established methods, or even an author's slow response to your request for those after their own paper is finally published. An open approach to data sharing is a way to support each other at pushing our understanding and our science farther than any single researcher could possibly accomplish on their own."
Matt Garcia, Ph.D.
Postdoctoral Research Associate
Dept. of Forest & Wildlife Ecology
University of Wisconsin – Madison
What are repositories?
Above, we introduced different types of repositories - disciplinary, general, institutional - as a solution for sharing your data. So, what is a repository?
A data repository is a centralized place to store digital data, usually supported and maintained by an organization or institution, that will preserve your data while also making it openly accessible to the public or a subset of users, such as other researchers. Repositories are a great solution for those who are interested in both the long-term preservation of their data and sharing their data.
Archiving, preservation, and backup are sometimes used to mean the same thing, but they are all a bit different.
Archiving is the long-term storage of data and files.
Digital preservation is the practice of archiving files while ensuring they remain accessible and usable over time. Ensuring continual access can involve extra work, such as migrating files to current formats and adding metadata.
Backup is the process of creating copies of your data to prevent loss. Backup storage alone is typically not enough to preserve your data, but it may serve to archive your data.
Repositories collect data supplied by researchers and make it available for others to view, download, or reuse. Some repositories also provide other services, such as helping enhance your metadata, supplying a DOI, or making your data more discoverable to others.
Things to consider before sharing your data:
- Does your data contain confidential or private personal information? If you anonymize your data, how easily can individuals in the dataset be reidentified? Should access be restricted?
- How long do you want your data to remain available?
- How will the repository back up and store your data?
- What curation (assistance in description and deposit) do they provide and what metadata do they require?
- Are your datasets understandable to those who wish to use them? Have you included all the metadata, methodology descriptions, codebooks, data dictionaries, and other descriptive material that someone looking at the dataset for the first time would need?
- What reuse policies do you need for your data?
Choosing where to share your data:
Choosing the best place to share your data will depend on your answers to the above considerations. Sensitive data may not be able to be shared at all or may require deposit in a repository that can supply access and download restrictions.
If there is a commonly used data repository in your discipline, we generally suggest it as your first choice for data sharing, since your colleagues are already actively using and looking for data there. If you’re unsure what disciplinary repositories are available to you, re3data is a great tool for discovering one.
If you don’t have a good disciplinary repository to default to, make sure to read the terms of service of any repository you are considering to see how it will care for your data. Do they have a preservation fund in case something happens to the repository? Do they back up your data responsibly? Do they require good data description, or provide other services such as DOIs for deposited datasets or statistics on how often your deposited data is searched for, downloaded, or cited? There are many considerations when choosing a repository, so if you need help identifying a suitable option for data sharing, contact your subject librarian or Research Data Services.