Lesson 4: Storage and Backup

Introduction to Research Data Management

Storing and Backing up Your Data

In a brief sentence, describe how you currently store and back up your important files.

Hint: It's okay to not have a system in place! In this section, we’ll talk about ways to improve your storage and backup.



4.1 Data Management Storage and Backup

Let’s define storage and backup, both essential practices for ensuring the care and keeping of your data.

Storage is the act of keeping your data in a secure location that you can access readily. Files in storage should be the working copies of your files that you access and change regularly.

Backup is the practice of keeping additional copies of your data in physical or cloud locations separate from your files in storage. Backup copies are the copies you would turn to after data loss or when you need an earlier version of your work.

Storage systems often provide mirroring, in which data is written simultaneously to two drives. This is not the same thing as backup since alterations in the primary files will be mirrored in the second copy.

Good storage and backup practices help protect your data and research from losses due to hardware failure, natural disaster, or file corruption. You spend a lot of time collecting your data, so a good backup system will save you from having to spend time trying to recover your files, re-collect data, or redo any cleaning or analysis.

Other benefits:
  • A granting agency may require that you retain data for a given period and may ask you to explain in a data plan how you will store and back it up.
  • Storing and backing up your data ensures that it will be there when you need to use it for publications, theses, or grant proposals.
  • Good preservation practices help make your data available to researchers in your lab/research group, department, or discipline in the future.

How many copies of my data should I have?

A good rule of thumb to remember is LOCKSS, or Lots of Copies Keep Stuff Safe. However, you don’t need to go overboard with the number of copies you have. Typically, the rule to follow is the rule of three.

Three copies, in at least two physically separate locations, on more than one type of storage hardware.

This might look like:
  • A copy in active storage, meaning a copy you regularly access and work on during your research. This will likely be on your computer or a shared network drive in a lab.
  • A second copy on a different device, on-site or off-site, such as an external hard drive in your office or a backup server provided by your IT department.
  • A third copy, preferably off-site. This might be on a cloud application like Box, Google Drive, or other appropriate cloud solution.

The goal here is to get your backups and storage as physically far apart as possible to prevent loss due to a disaster, such as a fire or flood in the lab where you’re doing research. If your storage and backups are all housed together, a single event could ruin both the primary copy of your data and any backups that you keep in the same building. Having at least one off-site backup increases the chances that you can restore your data if such a disaster happens.
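As a rough illustration, the rule of three described above can be written as a quick check on a backup plan. This is only a sketch: the `Copy` class, the `meets_rule_of_three` function, and the location and medium labels are hypothetical names chosen for this example, not part of any UW-Madison tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (all field values here are illustrative)."""
    name: str
    location: str  # physical site, e.g. "lab", "office", "cloud"
    medium: str    # type of storage hardware, e.g. "external-hdd"


def meets_rule_of_three(copies):
    """Return True for >= 3 copies, >= 2 locations, and > 1 type of medium."""
    locations = {c.location for c in copies}
    media = {c.medium for c in copies}
    return len(copies) >= 3 and len(locations) >= 2 and len(media) >= 2


plan = [
    Copy("working copy", location="lab", medium="laptop-ssd"),
    Copy("local backup", location="office", medium="external-hdd"),
    Copy("off-site backup", location="cloud", medium="cloud-storage"),
]
print(meets_rule_of_three(plan))  # prints True
```

A plan with three copies that all sit in the same lab would fail this check, which is the situation the off-site copy is meant to prevent.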


Figure: Backing up data in a secure location (photo of a person evaluating data on a laptop and taking notes).

Setting a Schedule

Backing up your data can be done automatically or manually, depending on your level of comfort with those types of systems.

If you back up your data manually, decide how often to do so by weighing the benefit of having up-to-date backup copies against the work that frequent backups involve. Once you’ve determined how often you should back up your data, set a schedule for regularly doing so.

It’s important to remember that backing up your data doesn’t require backing up every bit of data every time. You can also choose to back up only the files that have been changed or added since the last backup. This is called an incremental backup, which requires less time and storage space than a full backup.
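To make the idea of an incremental backup concrete, here is a minimal Python sketch that copies only files modified since a given time. It rests on several assumptions: the function name `incremental_backup` is hypothetical, file modification times are used to detect changes, and the destination folder sits outside the source folder. It does not handle deleted files, so treat it as an illustration rather than a replacement for a managed backup tool.

```python
import shutil
from pathlib import Path


def incremental_backup(source, dest, last_backup_time):
    """Copy files under `source` modified after `last_backup_time` into `dest`.

    `last_backup_time` is seconds since the epoch, as returned by time.time().
    Returns the relative paths of the files that were copied.
    """
    source, dest = Path(source), Path(dest)
    copied = []
    for path in sorted(source.rglob("*")):
        # Skip folders and any file unchanged since the last backup.
        if path.is_file() and path.stat().st_mtime > last_backup_time:
            relative = path.relative_to(source)
            target = dest / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(relative)
    return copied
```

You would record the time of each run and pass it in on the next run, so that each backup picks up where the previous one left off.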

There are also a number of automatic options, depending on the hardware or cloud systems you are using. Some cloud tools, like UW Box, have a sync option that will automatically sync certain files and folders according to the settings you provide. The IT contacts in your department may also have automatic solutions for you.

Other Important Notes

Terms of Use: When you are deciding on cloud applications to use for your data, always read the terms of service so you know what permissions you are granting the company that supports the application and how your data might be shared. Part of protecting your data is understanding the risks to it, including the risks that could come through your storage and backup tools.

For those going to school or working at UW-Madison, we recommend that you always use your institutionally provided Box and Google Drive accounts over your personal accounts. UW-Madison has an agreement with Box and Google to provide more intellectual property protections than your personal accounts would provide.

For those not part of the UW-Madison community, be sure to always read the terms of service and understand what you’re agreeing to. It’s unlikely that those terms of service would ever be exploited in a way that harms your data, but reading them may help you decide which tool is right for you and your data.

Sensitive Data: If you have sensitive data, be sure that any applications you choose are approved for that type of data. For example, here at UW-Madison, sensitive data should be kept in Box or on a network drive, and if you have protected health information, there is a more secure version of Box you can apply for.

USBs: Be cautious about using USB flash drives to back up your data. They have some advantages that can make them an appealing option: they’re affordable, they’re convenient, and you probably own at least one already. However, their portability makes flash drives easy to misplace, steal, or accidentally break.

Limiting access: Regardless of the storage and backup solutions you choose, limiting access to your data is an easy way to provide an extra layer of security.

Ways that you can do this include:
  • Limit physical access to data and storage solutions by keeping offices locked or restricted as appropriate.
  • Remove old collaborators who no longer need access from shared solutions.
  • Don’t travel with your data on a physical device if you can avoid it.


4.2 Where to store your data at UW-Madison

Each option below is listed with its description, capacity, security, and best use.

Campus Computing Infrastructure (CCI)
  • Description: Shared, scalable, secure options for a variety of needs, from home/group directories to long-term archiving.
  • Capacity: Varies. Scale-Out Storage costs $0.20/GB/yr, based ...
  • Security: NetID restricted. Can add permissions for campus and external users via Manifest, a tool that allows departments to authorize users to log in to their resources using groups of NetIDs, and allows for the creation of new NetIDs for UW affiliates and collaborators.
  • Best Use: Shared Storage can scale up to hundreds of TBs. Contact CCI to schedule a meeting with the CCI Engagement team to discuss your needs.

UW-Madison Box
  • Description: A cloud solution for storing, managing, and sharing files provided by UW-Madison.
  • Capacity: Unlimited storage, free to UW-Madison faculty, staff, and students.
  • Security: NetID restricted; can add permissions for campus or external users. For protected health information, the privacy and security coordinators should be contacted to see if you should be using secure Box folders.
  • Best Use: Versions your files automatically. Stores and transfers files securely. See the information about security of Box apps. Box provides useful fine-grained controls for sharing files and folders outside of Box. The UW-Madison enterprise agreement protects the intellectual property rights of UW-Madison faculty, staff, and students (unless shared with others outside the university).

UW G-Suite
  • Description: Google Drive provides a cloud-based solution for storage and collaboration.
  • Capacity: Unlimited storage, free to UW-Madison faculty, staff, and students.
  • Security: NetID restricted; can add permissions for other campus users, using their NetIDs. Not a secure environment for restricted personal data.
  • Best Use: The UW-Madison enterprise agreement protects the intellectual property rights of UW-Madison faculty, staff, and students (unless shared with others outside the university). Google Drive can be more useful than Box for real-time collaboration.

LabArchives
  • Description: Electronic lab notebook software licensed by campus for researchers, staff, and students performing research activities.
  • Capacity: Unlimited storage with an individual file size limit of 4 GB.
  • Security: Accounts must be created by the ELN team at the request of the PI. The UW-Madison LabArchives instance provides extra data security such as encryption and firewalls. May not be appropriate for sensitive data, human subjects data, or other restricted data types. Consult with the ELN team, IRB, or your local security officers.
  • Best Use: Accepts many file types, allows versioning, and securely stores files. Supports multiple user roles and permissions. The UW-Madison enterprise agreement protects the intellectual property rights of UW-Madison faculty, staff, and students. LabArchives is currently on a multi-year license, and the ELN team suggests keeping an exported archival copy of your notebook at the end of a project.

Departmental server or storage network
  • Description: Your department's IT unit may offer storage on their server or network.
  • Capacity: Varies.
  • Security: Protected by user accounts and passwords.
  • Best Use: Contact your department's IT unit for information.

External hard drives, available at DoIT Tech Store
  • Description: Flash drives, CDs, and DVDs.
  • Capacity: Varies.
  • Security: Not secure unless kept in a secure location and sensitive data are encrypted.
  • Best Use: Best for short-term storage (approx. 1-5 yrs) since media formatting can fail.

Portable media, available at DoIT Tech Store
  • Description: Flash drives, CDs, and DVDs.
  • Capacity: Varies.
  • Security: Not secure unless kept in a secure location and sensitive data are encrypted.
  • Best Use: Best for short-term storage (approx. 1-5 yrs) since media formatting can fail.

Third Party Cloud Storage
  • Description: Dropbox and others.
  • Capacity: Varies.
  • Security: Varies.
  • Best Use: UW-Madison has no negotiated terms of service with these providers. See Guidelines for use of non-UW-Madison applications for research for help evaluating your risks and rights.

4.3 Data Backup Options at UW-Madison

Each option below is listed with its description, size and cost, and notes.

Bucky Backup
  • Description: A managed data backup and recovery service that utilizes IBM’s Tivoli Storage. Three levels are available: Lite, Enterprise, and Archive.
  • Size & Cost: Varies, depending on the level of service. Bucky Backup Lite starts at approximately $0.20/GB/year.
  • Notes: Allows you to schedule automatic backups for critical data. The Archive service should not be used for backup or for files you will need to overwrite; archiving is for preserving files as they are. Compare service levels.

UW Box
  • Description: A cloud solution for storing, managing, and sharing files provided by UW-Madison.
  • Size & Cost: Unlimited storage, free to UW-Madison faculty, staff, and students.
  • Notes: Box can also be used as a backup option, rather than an active storage option. Box Drive and Box Sync are both applications that allow you to edit files on your desktop and have changes synced to Box. Versions your files automatically. See the information about security of Box apps. Box provides useful fine-grained controls for sharing files and folders outside of Box. For protected health information, the privacy and security coordinators should be contacted to see if you should be using secure Box folders. The UW-Madison enterprise agreement protects the intellectual property rights of UW-Madison faculty, staff, and students (unless shared with others outside the university).

Departmental Server
  • Description: Your department's IT staff may offer backup.
  • Size & Cost: Varies.
  • Notes: Contact your departmental IT staff for details.

External hard drives, available at DoIT Tech Store
  • Description: External hard drives.
  • Size & Cost: Varies.
  • Notes: Remember to have another backup copy available, as hardware can fail. Flash drives are also available, but remember that they are easily lost and easily corrupted.

Remember: Verifying your backups

Backing up your data doesn’t always go according to plan. It’s important to check your backups periodically so that you know you can restore important data from one of your backup copies if necessary.

Set a schedule for checking your backup data integrity. Make sure to check that the correct files were backed up, that they do not contain errors, and that they are the most up-to-date versions of the files.

One way to do this is with a checksum: an algorithm computes a short fingerprint of each file's contents, and comparing the fingerprint of a backup file with that of the original confirms that the backup is accurate.
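A minimal Python sketch of that kind of check is shown here. It assumes SHA-256 as the checksum algorithm (the lesson does not name one), and the helper names `sha256_of` and `backup_matches` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """Return True if the backup file has the same contents as the original."""
    return sha256_of(original) == sha256_of(backup)
```

Matching checksums show that the two files are byte-for-byte identical. They do not tell you whether the original has changed since the backup was made, so you still need to confirm that the backup holds the most up-to-date version.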


At the beginning of this section, we asked you to briefly describe your current backup and storage practices. Based on what we've learned in this section, which of the options below could you implement to improve your practices?

Keep backups of important data on an external storage device.

Move my external storage device into a different location than my computer or other storage.

Set a schedule for/begin an automatic system for backing up my important data.

Move my data off of a USB (thumb drive) to a more stable backup option.

Other: