Lesson 2: Ethical Considerations for Responsible Research

Responsible Data Planning, Use, and Sharing

Introduction

Complying with the policies and regulations your data may be subject to is an important part of properly caring for your data. However, there are also other considerations, both for working with sensitive data and for properly crediting and reusing others’ data, that can help ensure you’re working with data ethically throughout the data lifecycle.

2.1 Sensitive data

When working with data, it is important to be aware that some data has further risk associated with it and could potentially cause harm to individuals, communities, nations, animals, or other entities if made publicly available. As the researcher, it is your responsibility to ensure that you assess your data for risk, avoid collecting sensitive data if it’s not necessary to the work, and properly protect the data through effective storage and security practices. [1]

Human subjects data


Data that could potentially identify an individual should be considered sensitive. It should receive extra care and handling and greater security restrictions than other data types, and it should be de-identified prior to sharing or potentially not shared at all. Some data in this category is personally identifiable information, which includes health information, name, age, address, occupation, race/ethnicity, and more.

There are two different types of identifiable information - direct identifiers and indirect identifiers.

    Direct identifiers

    These do exactly as they sound: they enable direct identification of someone, or provide enough detail to make it easy to distinguish someone from others. Direct identifiers include information like:
    • name
    • address
    • social security number
    • other unique numbers related to the individual such as driver’s license number, insurance accounts, medical records number, etc.
    • email address
    • full face photos
    • vehicle information and license plates
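
    As a rough illustration of removing direct identifiers before sharing, the sketch below drops a set of fields from each record. The field names, the helper drop_direct_identifiers, and the sample record are all hypothetical and made up for this example; your own dataset will need its own list.

```python
# Hypothetical field names; adapt this set to your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "ssn", "email", "license_plate"}


def drop_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without the listed direct-identifier fields."""
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]


# Made-up record, for illustration only.
participants = [
    {
        "name": "Example Person",
        "email": "person@example.org",
        "gender": "F",
        "profession": "nurse",
    },
]

shared = drop_direct_identifiers(participants)
print(shared)  # [{'gender': 'F', 'profession': 'nurse'}]
```

    Note that dropping these fields is only a first step: the fields that remain may still identify someone when combined.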

    Indirect identifiers

    These are data that can be used in combination to enable identification of someone. Indirect identifiers include things like:
    • birthdate
    • ethnicity, race, or indigenous status
    • gender
    • detailed geographic information (e.g., state, county, province, or census tract of residence)
    • profession, detailed title, or organizations the person belongs to
    • rare diseases or health information

    For example, here at UW-Madison, knowing someone’s ethnicity, gender, and area of study may enable you to identify who that person is.
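
    One simple way to look for this kind of risk is to count how many records share each combination of indirect identifiers; a combination that appears only once points to a single person. The sketch below is a minimal example of that idea, not a complete risk assessment. The function name rare_combinations, the field names, and the sample records are hypothetical.

```python
from collections import Counter


def rare_combinations(records, fields, threshold=2):
    """Return combinations of field values shared by fewer than `threshold` records."""
    counts = Counter(tuple(record[field] for field in fields) for record in records)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Made-up records, for illustration only.
records = [
    {"gender": "F", "area_of_study": "Physics"},
    {"gender": "F", "area_of_study": "Physics"},
    {"gender": "M", "area_of_study": "Physics"},
]

print(rare_combinations(records, ["gender", "area_of_study"]))
# {('M', 'Physics'): 1}
```

    Here the third record is the only one with its combination of values, so it would deserve a closer look (for example, generalizing or withholding a field) before the data is shared.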

There are also some other human subject data types that are legally regulated such as in the case of HIPAA and FERPA data which require even more restriction. We’ll go into more details about HIPAA and FERPA in the next section, where we discuss legal implications for data.

Any research you might do here at UW-Madison that may involve human subject data should go through the Institutional Review Board for either Education, Behavioral, and Social Sciences or for Medical Sciences prior to data collection, but ideally during your planning phase.


Check Your Understanding


Sort each of the following into the Direct or Indirect identifier category:
  • License plate
  • Social Security number
  • Phone number
  • Birthdate
  • Gender
  • Profession

Remember: Direct identifiers are pieces of information unique enough to let you easily identify someone. Indirect identifiers are pieces of information that may not identify anyone on their own but may enable identification when used together.

Other considerations for human subjects data

If your data falls in this category, you will at some point need to complete training on the responsible conduct of research, which may cover topics like human subjects, data management, and research misconduct (such as plagiarism and falsification).

While human subject training might include information on the rights and autonomy that human subjects should have in studies, another consideration is understanding the impact that data collection and data sharing may have on the communities you’re working with.

For Example:
In the past, research teams have repeatedly caused harm to indigenous communities across North America by publishing, filming and recording, or otherwise giving access to information that was not supposed to be shared broadly. This has happened in different ways: broadcasting restricted community knowledge that had been shared specially with a researcher, using misleading consent forms or misrepresenting research purposes, and conducting invasive research that does not benefit the community. This has also likely happened to many other communities across the globe that have worked with researchers.

It is incredibly important to question the methods and purposes of your research. Who is benefitting from the research? What have you promised not to share, and what have you explicitly discussed with the community that you can share? Are you letting the culture or community inform your data rather than imposing your own ideas?

Respect different knowledge systems and let the language and definitions used by a community inform your work. When conducting research with and about underrepresented communities as someone not from that community, consider the way your collection tools like forms and surveys or your research variables ask people to define themselves or their needs. Do the tools provide options for the way the community would define those things themselves? If not, will the data really be able to answer your question or help that community?

There are resources available written by different communities about working responsibly with data. Contact your librarian or Research Data Services for assistance locating appropriate resources.


Other sensitive data

Data outside of the human subjects category can also be considered sensitive. You may often know outright if you’re working with sensitive data as it will be subject to laws, contracts, or policies that you have to comply with. However, it is always good to think through your data and any other products of your research to understand the impact they could have if shared.

Examples include:

2.2 Data as intellectual property

Much like scholarly publications, research data is a scholarly output of a researcher’s work. Due to this, it is important to understand when research data is considered intellectual property as well as how to cite it correctly so that it can contribute to the scholarly discourse.

In this section we’ll provide a brief introduction to copyright and licensing, data citation, and Digital Object Identifiers (DOIs).

Copyright and Licensing

In the United States, research data that is considered factual cannot be copyrighted. However, the associated metadata, databases, figures, software, or other work that could be considered a ‘creative’ output can sometimes be treated as an asset whose reuse or redistribution you control by applying an appropriate license. Licenses define how others may interact with, reuse, modify, or redistribute your work. Choosing a license for your data helps ensure that other researchers use it appropriately.

For scientific or factual data, many researchers choose to apply a Creative Commons Zero (CC0) license, so that the data can be distributed and reused freely.

For creative works, there are varying levels of Creative Commons licenses that you can choose to apply. The license levels build on one another and range from unrestricted to fairly restrictive terms. The Creative Commons website has a good guide that can help you decide what restrictions you may want to apply including request for attribution or non-commercial use.

To learn more about Creative Commons licenses and how they differ from copyright, view Lesson 2 of the Copyright and Fair Use micro-course.



For software or code, there are multiple choices. You will want to select a license you are comfortable with, such as one of the GNU licenses, the MIT license, or the Apache licenses.

Citation Standards

You should cite datasets for the same reasons you cite books and journal articles: for dataset creators to receive appropriate credit for their work, and to make clear the antecedents to your research.

Data citation standards may vary between disciplines, and some professional organizations, academic journals, and repositories may also have guidance on preferred data citation formats. In general, however, the information you capture in a data citation is similar to the information included in a citation for any other work.

The Inter-university Consortium for Political and Social Research (ICPSR) suggests the minimum elements of data citation as:
  • Title
  • Author
  • Date
  • Version
  • Persistent identifier (such as the Digital Object Identifier, Uniform Resource Name (URN), or Handle System)
    • Note: Persistent identifiers are preferred, but if not available a URL will suffice.
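
To make these elements concrete, the sketch below assembles them into a single citation string. The ordering and punctuation are an assumption chosen for illustration, not a format prescribed by ICPSR, and the function name format_data_citation and every value in the example (including the DOI) are hypothetical placeholders. Check your discipline, journal, or repository for its preferred format.

```python
def format_data_citation(author, date, title, version, identifier):
    """Assemble a citation string from the five minimum elements.

    The layout used here is an illustrative assumption, not an official style.
    """
    return f"{author} ({date}). {title} (Version {version}). {identifier}"


# Placeholder values only; this is not a real dataset or a real DOI.
citation = format_data_citation(
    author="Researcher, A.",
    date="2020",
    title="Example Survey Data",
    version="1.0",
    identifier="https://doi.org/10.1234/example",
)
print(citation)
# Researcher, A. (2020). Example Survey Data (Version 1.0). https://doi.org/10.1234/example
```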

Brief Introduction to DOIs

As mentioned above, a persistent identifier is a useful piece of information for data citation. A commonly used persistent identifier for research data is a DOI, or a digital object identifier.

A DOI is a series of alphanumeric characters that serves as a unique identifier for a specific publication, dataset, or other digital object. The DOI for that object won’t change over time, as the URL might if the web page is moved or the website is changed. Instead, the URL is information attached to the DOI, and it can be updated over time.

This allows researchers to make their datasets easier to locate, access, and cite for other researchers. Reliable location, identification, and citation by others is a critical component for researchers to enable reuse of their data, replication of their work, and to track the impact of their research.
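
As a small illustration of how a DOI turns into a stable link, the sketch below builds a link through the https://doi.org/ resolver, which redirects to whatever URL is currently registered for the DOI. The function name doi_to_url and the example DOI are hypothetical; the check that a DOI starts with "10." is only a light sanity check, not full validation.

```python
def doi_to_url(doi):
    """Turn a DOI string into a link that goes through the doi.org resolver."""
    doi = doi.strip()
    # Accept DOIs written as a full link or with a "doi:" label in front.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # DOI names begin with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"https://doi.org/{doi}"


# Placeholder DOI, for illustration only.
print(doi_to_url("doi:10.1234/example"))  # https://doi.org/10.1234/example
```

Citing the resolver link rather than the landing-page URL means the citation keeps working even if the dataset's web page moves.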

A DOI has to be provided by a DOI Registration Agency. However, many publishers, repositories, and institutions work with such agencies in order to provide DOIs for their communities. When sharing your data, we recommend checking with your publisher or repository to see if a DOI is provided for you.


[1] Briney, K. (2015). Data Management for Researchers: Organize, Maintain and Share Your Data for Research Success. Exeter, UK: Pelagic Publishing.