Broadly speaking, research data is the information needed to produce and support your research findings. Research data takes many forms, including both physical and digital formats. There is no single shared definition. What falls within the scope of ‘research data’ may change depending on your discipline, whether or not your job is positioned in academia, and whether or not you’re subject to funding agency guidelines or university policy.
This module focuses on best practices for managing digital research data. However, physical data, such as paper lab notebooks or physical samples, and non-research data files are also important to manage well.
Practices for managing physical data should follow the standards of your discipline or research group where they exist, and questions can be directed to local resources. Here at UW-Madison, Research Data Services can answer your questions.
Questions about managing non-research data files can usually be directed to your local records manager. Here at UW-Madison, our University Records Officer can assist you.
While definitions of research data may be broad, there are ways to categorize your data based on common forms, general types, and the stages it moves through during the research process. Understanding your research data at this level will help you make more informed decisions as you begin to manage it. Certain data types, or data at certain stages of the research cycle, may be harder to recreate or recollect once lost, so you may choose different strategies that provide extra protection for data at greater risk.
Examples of commonly used research data forms across general domains:
|Arts and Humanities||
Research data can fall into a few general categories based on the method of collection, and these categories are useful when talking about types of research data.
As we noted above, research data will also move through the following stages during your project:
These stages of research data are often represented in a more formal model called a data life cycle. In reality, research rarely moves in such an orderly fashion, and many of these steps often happen simultaneously.
However, the data life cycle can be a helpful mental model because each stage has its own best practices for managing data. Visualizing how your data moves through your project can remind you of key practices to incorporate. While this module won’t cover every stage in the life cycle, it provides a primer on some essential practices for getting started with managing your data.
While data management can sound like a lot of work for little payoff, managing your research data well provides many personal and practical benefits. Well-managed and well-described data is easier to sort through, access, and understand, which makes your research project more efficient. A good system also spares you the frustration of data loss after hardware failure or other accidents, since you will spend less time trying to recover lost data or redoing your work. Well-managed data can also help prevent publication retractions. Retractions can be an unintended consequence of poor data management when it leads to errors in the data or to the loss of data that supports published material.
Larger changes in the research community have also led to increased attention to research data management. First, research is increasingly computational, data-driven, and collaborative. As methods, instruments, and processes continue to advance, so does the amount of data we are able to create and capture. The growing size of data, and of the infrastructure needed to store and compute on it, requires us to be more responsible and proactive data managers.
Second, funding agencies, especially federal agencies that provide funding through tax dollars, are increasingly interested in ensuring that publications and data from funded research are openly available to the public. They have put policies in place that require data to be managed and shared, something we’ll talk about in another course.
Third, increasing emphasis is being placed on the reproducibility and reusability of research. Reproducibility means that a researcher understands another researcher’s methods well enough to start from the same raw data or starting point and reproduce the results of the work. Reusability means that data is described well enough for others to use it in new work.
Another important reason to manage data is that data is often a valuable asset as well as a very delicate one. Depending on the type of work, data can be expensive, both in the money spent on the instruments or infrastructure needed to collect it and in the time spent working with it. You can maximize the investment you make in your data by describing and sharing it so that others can reuse or build upon it. For example, if a researcher has access to a prohibitively expensive instrument, sharing the data from their project makes it available to other researchers who may not have the same resources.
Data is also more fragile than you may imagine. It is easy to think that digital data is somehow safer than the physical samples or notebooks we keep in a lab, but digital data relies on hardware that physically exists somewhere in the world. Digital data lives on computers in our offices, servers in the basement, instruments in our labs, and flash drives in our backpacks. That hardware can be damaged by natural causes or accidents, files can be corrupted, and data formats can become inaccessible as technology changes rapidly. Managing your data well can help prevent these losses.
New England Collaborative Data Management Curriculum, "Module 2: Types, Formats, and Stages of Data" by Lamar Soutter Library, University of Massachusetts Medical School, licensed under CC BY-SA 4.0 at https://library.umassmed.edu/resources/necdmc/index