Now that we’ve talked about types of data and why we should manage data, we’re ready to move on to our data management essentials! But first, one of the best things you can do before starting a project is to brainstorm all the data types and forms you’ll be producing. One way to do this is to conduct a data inventory of all the activities or instruments that will produce, collect, manipulate, or analyze data during your project.
Making a data inventory part of your planning stage can help you improve the choices you make about organization, storage, and the other best practices we’ll cover here, because you’ll have a better understanding of the scope of your data as well as any big changes that will happen to it during the project.
Having a good file naming and organization method is one of the simplest things you can do to make a huge impact on your data management! However, it can also be one of the hardest things to change in your data management practices, because it’s often something we do by hand and changing our personal habits can be difficult.
Though it can be difficult to implement, a good file naming convention and folder organization method can make quick improvements to your research process. It makes your data easier to search through, and it makes it easier to distinguish similar files or versions from one another. It also provides a built-in description of the contents of the file and can make it easier to share documents with collaborators, as they’ll be able to find and understand the file.
One of the most important things for file naming is to develop a naming convention, a template of standard information you’ll use in most file names, and to always use that convention anytime you have multiple related files in a folder. Without a set convention, you may end up recording haphazard information, or not capturing enough important information, each time you create a file: this will make it harder to remember what keywords you can use to search for the file.
There is no single recommended naming convention that will work for everyone. Every project and person is different, but below we’ve laid out some suggestions for creating a file naming convention.
You might also want to distinguish new versions of your data with significant changes from versions with only minor changes. One way to do this is to include major and minor revisions in your file-naming system.
For example, if you’re making minor changes to a file with a name that ends in “_v01,” you could name the new version “_v01-01,” meaning the first minor change to the first version of the file. The next time you make major changes to this data, the file you create is labeled “_v02,” because it’s a new version.
Be sure to use leading zeroes with these numbers. Leading zeroes allow the computer to correctly sort files whose names include numbers that increase sequentially; without them, a file numbered 10 would sort ahead of a file numbered 2. For example, if you expect to have 10 or more versions of your data, you would write the version number in your file name as 01, 02, and so on through 09, followed by 10.
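To see the effect of leading zeroes on sorting, here is a minimal Python sketch. The "rainwater" file names and the `version_suffix` helper are hypothetical, made up for illustration; they simply follow the "_v01" and "_v01-01" pattern described above.

```python
def version_suffix(major, minor=0, width=2):
    """Build a suffix such as '_v01' or '_v01-01' (hypothetical helper)."""
    suffix = f"_v{major:0{width}d}"
    if minor:
        # A minor revision is appended after a hyphen, e.g. '_v01-01'.
        suffix += f"-{minor:0{width}d}"
    return suffix

# Hypothetical file names, with and without leading zeroes.
unpadded = [f"rainwater_v{n}.csv" for n in (1, 2, 10)]
padded = [f"rainwater{version_suffix(n)}.csv" for n in (1, 2, 10)]

# Plain alphabetical sorting, character by character.
sorted_unpadded = sorted(unpadded)
sorted_padded = sorted(padded)

print(sorted_unpadded)  # version 10 lands before version 2
print(sorted_padded)    # versions appear in numeric order
```

Note that a width of two digits only keeps things in order up to version 99; if you expect more versions than that, you could pass a larger `width`.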
There are also tools out there like Git, for those who need more extensive version control. However, be aware that tools like Git work best with plain text files rather than Microsoft Office or other proprietary formats, because they track changes line by line. We’ll talk a bit more about this later in the module.
The best naming convention for your files will vary from project to project, based on the type of research you’re doing, and what the most important pieces of information to capture about your data are. Working to implement a good naming convention will make a huge difference when searching for, working with, and collaborating on files.
Another key component of an effective file management strategy is establishing a well-organized hierarchical folder structure to go along with your file naming conventions.
While there’s no easy answer as to how many folders you should have or how best to organize them, the trick is to create a structure that balances breadth and depth.
Try to limit the number of top-level folders you create, and try to limit the number of folders nested within those as well. You don’t want to create so many layers in your hierarchy that accessing the actual data files becomes difficult, but you also don’t want to have so many files within each folder that finding your data becomes difficult.
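If you like to set up folders with a script, the idea might be sketched in Python as follows. The folder names in `LAYOUT` are hypothetical examples, not a recommended standard, and the sketch builds the structure in a temporary directory so it does not touch your real files.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: three top-level folders, at most two levels deep.
LAYOUT = [
    "data/raw",
    "data/processed",
    "docs",
    "analysis/scripts",
    "analysis/outputs",
]

def create_layout(root, layout=LAYOUT):
    """Create each folder in the layout under root and return the paths."""
    root = Path(root)
    created = []
    for relative in layout:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created

def max_depth(root):
    """Return the deepest level of folder nesting below root (0 if none)."""
    root = Path(root)
    depths = [
        len(path.relative_to(root).parts)
        for path in root.rglob("*")
        if path.is_dir()
    ]
    return max(depths, default=0)

with tempfile.TemporaryDirectory() as tmp:
    create_layout(tmp)
    top_level = sorted(p.name for p in Path(tmp).iterdir())
    depth = max_depth(tmp)

print(top_level)  # the top-level folder names
print(depth)      # how many levels deep the structure goes
```

Checking the number of top-level folders and the nesting depth like this is one rough way to notice when a structure is drifting too far toward breadth or depth.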
Suppose you’re creating a file for data about chemicals that you’ve found in rainwater. You want the name of the file to contain the following information:
Select the name for this file that follows the best practices for naming a file:
This file name avoids spaces and special characters, uses the YYYYMMDD date format, and includes leading zeroes where appropriate.
Try creating a file name based on the metadata provided below that adheres to the best practices we've covered. Remember, file names should be brief, be meaningful, use standards, and avoid special characters. A meaningful file name should capture important information (metadata) about the file and make it easier for you to find and understand the file later.
Participant ID: 62814
Date of the initial interview: Friday, April 7, 2017
Information about the file: This file is the second half of an interview with the participant, because the recorder filled up and the first file had to be transferred to the interviewer’s computer mid-interview.
File type: .mp3
Interviewer: Professor Sounds
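Once you've tried it yourself, here is one possible answer, sketched in Python. This is only an illustration: the `build_file_name` helper, the order of the elements, the word "interview", and the choice to leave out the interviewer's name to keep the name brief are all our assumptions, and other names could follow the best practices just as well.

```python
from datetime import date

def build_file_name(participant_id, interview_date, part, extension, part_width=2):
    """Assemble a file name from metadata (hypothetical convention).

    Pattern: <participantID>_<YYYYMMDD>_interview_part<NN>.<extension>
    """
    date_str = interview_date.strftime("%Y%m%d")  # YYYYMMDD date format
    part_str = f"{part:0{part_width}d}"           # leading zeroes, e.g. 02
    return f"{participant_id}_{date_str}_interview_part{part_str}.{extension}"

# Metadata taken from the exercise above.
file_name = build_file_name(
    participant_id=62814,
    interview_date=date(2017, 4, 7),
    part=2,
    extension="mp3",
)

print(file_name)  # 62814_20170407_interview_part02.mp3
```

Putting the participant ID first groups all of a participant's files together when sorted; if you would rather browse by date, you could put the date first instead.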