Lesson 1: Data has Value

Introduction to Research Data Management

1.1 What do we mean when we say "research data"?

Broadly speaking, research data is the information needed to produce and support your research findings.

Research data takes many forms, including both physical and digital formats. There is not a single shared definition. What information falls within the scope of ‘research data’ may change depending on your discipline, whether or not your job is positioned in academia, or whether or not you’re subject to funding agency guidelines or university policy.

This module will focus on best practices for managing digital research data. However, physical data such as paper lab notebooks or physical samples, as well as non-research data files, are also important to manage well.

Practices for managing physical data should follow standards in your discipline or research group if they exist, and questions can be directed to local resources. Here at UW-Madison, Research Data Services is able to answer your questions.

Questions about managing non-data files can usually be directed to your local records manager. Here at UW-Madison, our University Records Officer can assist you.

1.2 Forms, types and stages of research data

While definitions of research data may be broad, there are ways to categorize your data based on common forms, general types, and the stages that it moves through during the research process. Understanding your research data at this level will help you make more informed decisions as you begin to manage your data. Certain data types, or data at certain stages of the research cycle, may be harder to recreate or recollect once lost, so you may choose different strategies that provide extra protection for data at greater risk.

Common Forms:

The most common data forms can vary by discipline. Below we’ve included examples of some commonly used data across a few broad domains.

Examples of commonly used research data forms across general domains:

Discipline Specific Data Types [1]
Hard Sciences
  • Measurements generated by sensors, laboratory instruments
  • Computer modeling
  • Simulations
  • Observations and/or field studies
  • Specimens
Social Sciences
  • Survey responses
  • Focus group and individual interviews
  • Economic indicators
  • Demographics
  • Opinion polling
Arts and Humanities
  • Text - including the text of novels, poems, historical letters or documents, etc.
  • Images
  • Geospatial data, historical maps
  • Video - films, recordings, etc.
  • Music or other audio recordings

Types of Data

Research data can be grouped into a few general types based on how the data is collected.

  • Observational
    • Captured in real time via observation, sensors, or instruments
    • This data cannot be reproduced or recaptured, and is sometimes called ‘unique data’
  • Experimental
    • Data from lab equipment, collected under controlled conditions; it is produced when a researcher intervenes by altering a variable to try to produce a change
    • This data is often reproducible, but reproducing it can be expensive
  • Simulation data
    • Data generated from test models that study actual or theoretical systems by imitating a real-world process or system
    • For this data, the models and metadata (information about the data) may be just as valuable as the output data, if not more so
  • Compiled or derived data
    • Results of data analysis, or data aggregated together from multiple, existing sources
    • This data can often be reproduced, but doing so is very expensive and time consuming
  • Reference or Canonical
    • Fixed or organic collection datasets, usually peer-reviewed, and often published and curated
    • This data is typically from existing widely-used data sources such as census data or gene sequence data banks

Stages of Research Data

As we noted above, research data will also move through the following stages during your project:

  • Raw Data
    • Depending on the type of data you’re using, your raw data may be very valuable and not reproducible. We always suggest keeping a raw copy separate from your other data.
  • Processed Data
    • As long as your processing and cleaning are well documented, this data is likely reproducible, though it may be time consuming to do so.
  • Finalized and/or Published Data
    • At this point you have a final dataset ready for publication or sharing. Sharing can also move your data into another stage, where it can be reused by yourself and others.
  • Reused or Combined with Existing Data
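
The advice above about keeping a raw copy separate from your other data can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed workflow: the function name preserve_raw_copy and the folder name used in the usage note are hypothetical, and your discipline or research group may already have its own conventions.

```python
import shutil
import stat
from pathlib import Path


def preserve_raw_copy(source: Path, raw_dir: Path) -> Path:
    """Copy a raw data file into raw_dir and mark the copy read-only.

    Both the function name and the idea of a dedicated raw folder are
    illustrative assumptions, not a standard.
    """
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / source.name
    if target.exists():
        # Never silently overwrite an existing raw copy.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    # Remove write permission so the raw copy is not edited by accident.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

For example, preserve_raw_copy(Path("survey_export.csv"), Path("raw")) would place a read-only copy in a folder named raw, leaving the original file free for cleaning and processing. A read-only flag only guards against accidental edits; it is not a substitute for backups.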

These different stages of research data are often represented in a more formal model that we call a data lifecycle. In reality, research doesn’t move in quite such an orderly fashion, and many of these steps often happen simultaneously.

However, the data lifecycle can be a helpful mental model to use because at each stage in the lifecycle, there are best practices for managing data. Visualizing how your data is moving through your project can help remind you of key practices to incorporate. While this module won’t cover every single stage in the lifecycle, we will provide a primer on some essential practices for getting started with managing your data.

Engage with the interactive Research Data Management Life Cycle. Selecting a stage of research data management will reveal its definition and that stage’s role in the life cycle. The Research Data Management Life Cycle can be opened in a new browser tab or window as well.

The Data Lifecycle

1.3 Why is it important to manage research data?

Personal Benefits

While data management can sound like a lot of work for little payoff, managing your research data well actually provides many personal and practical benefits. Well-managed and well-described data is easier to sort through, access, and understand, making your research project more efficient. Having a good system also reduces the frustration of data loss in the case of hardware failure or other accidents, because you will spend less time trying to recover lost data or redoing your work. Another personal benefit to researchers is that well-managed data can help prevent publication retractions. Retractions can be an unintended consequence of poor data management when it leads to errors in data or the loss of data that supports published material.

Changes in research expectations

There are larger changes happening in the research community that have led to increased attention to research data management. First, research is increasingly computational, data-driven, and collaborative. As methods, instruments, and processes continue to advance, so too does the amount of data we are able to create and capture. The increasing size of data, and the corresponding infrastructure needed for storage and computing, require us to be more responsible, proactive data managers.

Second, funding agencies, especially federal agencies that provide funding through tax dollars, are increasingly interested in ensuring that publications and data from funded research are openly available to the public. They’ve put policies in place that require data to be managed and shared, something we’ll talk about in another course.

Third, increasing emphasis is being placed on the reproducibility and reusability of research. Reproducibility refers to a researcher being able to understand another researcher’s methods well enough to move from the same raw data or beginning point and reproduce the results of the work.

Caring for a valuable and delicate asset

Another important reason to manage data is the fact that data is often a valuable asset as well as a very delicate one. Depending on the type of work, data can be expensive. This expense is both in terms of the monetary cost of the instruments or infrastructure needed to collect data and in terms of the time spent working with that data. The investment you make in your data can be maximized by describing and sharing it so that others can reuse or build upon it. For example, if a researcher has access to a prohibitively expensive instrument, sharing the data from their project makes it available to other researchers who may not have the same resources.

Data is also more fragile than you may imagine. It can be easy to think that our digital data is somehow safer than physical samples or notebooks we may keep in a lab, but the truth is that digital data relies on hardware that physically exists somewhere in our world. Digital data lives on computers in our offices, servers in the basement, instruments in our labs, and flash drives in our backpacks. That physical hardware can be damaged by natural causes or accidents, files can be corrupted, and data formats can be rendered inaccessible by constantly and rapidly changing technology. Managing your data well can help prevent these losses.
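
One common way to notice corrupted or missing files early is to record a checksum for each file and re-check it later. The Python sketch below illustrates the idea under some assumptions: the function names are hypothetical, SHA-256 is just one reasonable choice of hash, and the manifest is kept in memory for brevity, whereas in practice you would likely save it alongside your data.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file under data_dir (by relative path) to its checksum."""
    return {
        str(path.relative_to(data_dir)): sha256_of(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or no longer match the manifest."""
    problems = []
    for name, expected in manifest.items():
        path = data_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return problems
```

You might call build_manifest when a dataset is first stored, then run changed_files periodically or after moving data between storage locations. An empty list means every recorded file still matches its checksum. A checksum can only tell you that something changed, so it works best alongside backups that let you restore the affected file.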

Overall benefits

Research data management also:
  • Ensures you’re maximizing the effective use and value of your data and information assets
  • Helps continually improve the quality of the data including: data accuracy, integrity, integration, timeliness of data capture and presentation, relevance, and usefulness
  • Ensures appropriate use of your data and information
  • Facilitates data sharing
  • Ensures sustainability and accessibility in the long term for re-use in science and the advancement of new discoveries

Check your understanding

Of the following, what is considered research data? Check all that apply.

    Answers: A, B, C are correct.
    Context: According to the definitions of research data we have covered in this section, drafts for publication would not be considered research data. The satellite images, the code to visualize the data, and the audio recording would all be important for another researcher to have in order to understand, interpret, and reproduce your work. However, another researcher would likely have little use for your drafts for publication.

Challenge Activity

Choose one of the following questions and answer below.

What are common types of data in your area of study or work? List some below.

If you’re currently conducting research, list some of the data you will produce below.

[1] New England Collaborative Data Management Curriculum, "Module 2: Types, Formats, and Stages of Data" by Lamar Soutter Library, University of Massachusetts Medical School, licensed under CC BY-SA 4.0, at https://library.umassmed.edu/resources/necdmc/index