Lesson 2: File Naming & Organization

Introduction to Research Data Management

2.1 Data inventory

Now that we’ve talked about types of data and why we should manage data, we’re ready to move on to our data management essentials! But first, one of the best things you can do before starting a project is to brainstorm all the data types and forms you’ll be producing. One way to do this is to conduct a data inventory of all the activities or instruments that will produce, collect, manipulate, or analyze data during your project.

Making a data inventory part of your planning stage can help you improve the choices you make for organization, storage, and the other best practices we’ll cover here, because you’ll have a better understanding of the scope of your data as well as any big changes that will happen to it during the project.

2.2 File naming and organization

Having a good file naming and organization method is one of the simplest things you can do to make a huge impact on your data management! However, it can also be one of the hardest things to change in your data management practices, because it’s often something we do by hand and changing our personal habits can be difficult.

Though it can be difficult to implement, a good file naming convention and folder organization method can make quick improvements to your research process. It makes your data easier to search through and makes it easier to distinguish similar files or versions from one another. It also provides a built-in description of the contents of the file and can make it easier to share documents with collaborators, as they’ll be able to find and understand the file.

File Naming Best Practices

One of the most important things for file naming is to develop a naming convention, a template of standard information you’ll use in most file names, and to always use that convention anytime you have multiple related files in a folder. Without a set convention, you may end up recording haphazard information, or not capturing enough important information, each time you create a file: this will make it harder to remember what keywords you can use to search for the file.

There is not one recommended naming convention that will work for everyone. Each project and person are different, but below we’ve laid out some suggestions for creating a file naming convention.

How to create a file naming convention:


Example filename: Mendota_Buoy6_20180711_v2

  • Be brief. Best practice is to pick 3-4 key pieces of information about the files.

    • In our example file name, we’ve chosen 4 key pieces of information - the lake and buoy the data was gathered from, the date it was gathered, and a version number indicating there was a previous version of this dataset.
    • For some computers, long file names can cause issues when trying to load or open the file. More practically though, long filenames can also make it much harder for you to search and locate your files.

  • Be meaningful. Your file name should provide a clear description of the data.

  • In our example file name, Mendota is the lake and Buoy6 is the buoy where the data was gathered, which tells us where the data was collected. Information worth capturing in a file name includes:
    • Where the data was collected.
      • This might be labels you’ve assigned to specific locations or instruments (for example, “Sensor5” or “SiteB”) or geospatial coordinates.
    • When the data was collected.
      • The date of collection is often important for analyzing your results. The time might also be significant, particularly if you are collecting data multiple times a day.
    • The method used to collect the data.
    • The name or initials of the researcher who collected the data, if multiple researchers are collecting data for your research.
    • The version of the data, if multiple versions are being created or you're working collaboratively and want to record significant changes.


  • Use standards.

    • Dates should always be formatted YYYYMMDD or YYYY-MM-DD. This is an international standard, ISO 8601. Using a standard makes sure the date will be interpreted the same way every time by yourself or others.
    • If your discipline uses certain scientific standards for categorizing things, animals, or people (also called a controlled vocabulary or ontology), be sure any relevant information in your file names uses that standard. This ensures you’re describing your data following the best practices in your field.

  • No special characters or spaces. Special characters and spaces can cause interoperability issues across hardware and software and can also prevent data from being imported to certain programs.

    • Use dashes ( - ), underscores ( _ ), or camelcase (capitalizing the first letter of each word - e.g. CamelCase) instead. These won't cause issues for hardware or software reading in the files.

  • Avoid common words like 'draft' or 'final'. If you’re creating multiple drafts, these words tend to stack up, and you end up with confusing file names like “research_final_final_draft”. Instead, use version numbers or dates to provide more information regarding the stage of your project and which draft to use.

    • An easy way to version file names is to add “_v01” to the file name, usually just before the period and file extension, and to increase the number by one for each new version.

      You might also want to distinguish new versions of your data with significant changes from versions with only minor changes. One way to do this is to include major and minor revisions in your file-naming system.

      For example, if you’re making minor changes to a file with a name that ends in “_v01,” you could name the new version “_v01-01,” meaning the first minor change to the first version of the file. The next time you make major changes to this data, the file you create is labeled “_v02,” because it’s a new version.

      Be sure to use leading zeros with these numbers. Leading zeros allow the computer to correctly sort files whose names include numbers that increase sequentially. For example, if you expect to have 10 or more versions of your data, you would write the version number in your file name as 01, 02, and so on, until you get to 10.

      There are also tools out there like Git, for those who need more extensive version control. However, be aware that tools like Git work best with plain text files. They can store Microsoft or other proprietary binary formats, but they cannot show meaningful line-by-line differences between versions of those files. We’ll talk a bit more about this later in the module.
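The effect of leading zeros on sorting can be seen in a short Python sketch. The file names and the `.csv` extension are made up for illustration, and `next_version` is a hypothetical helper (not part of any library) that only handles major versions such as “_v01”, not minor revisions such as “_v01-01”.

```python
import re

# Hypothetical file names, without and with leading zeros.
unpadded = [
    "Mendota_Buoy6_20180711_v1.csv",
    "Mendota_Buoy6_20180711_v2.csv",
    "Mendota_Buoy6_20180711_v10.csv",
]
padded = [
    "Mendota_Buoy6_20180711_v01.csv",
    "Mendota_Buoy6_20180711_v02.csv",
    "Mendota_Buoy6_20180711_v10.csv",
]

# Text sorting compares character by character, so "v10" lands before "v2".
print(sorted(unpadded))
# With leading zeros, text order and numeric order agree.
print(sorted(padded))


def next_version(filename, width=2):
    """Return filename with its trailing _vNN tag increased by one."""
    # The tag must sit at the end of the name, just before any extension.
    match = re.search(r"_v(\d+)(?=\.[^.]+$|$)", filename)
    if match is None:
        raise ValueError(f"no version tag found in {filename!r}")
    bumped = str(int(match.group(1)) + 1).zfill(width)
    return filename[:match.start(1)] + bumped + filename[match.end(1):]


print(next_version("Mendota_Buoy6_20180711_v02.csv"))
# Mendota_Buoy6_20180711_v03.csv
```

Padding to two digits is an assumption that suits projects with fewer than 100 versions; pass a larger `width` if you expect more.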

The best naming convention for your files will vary from project to project, based on the type of research you’re doing, and what the most important pieces of information to capture about your data are. Working to implement a good naming convention will make a huge difference when searching for, working with, and collaborating on files.
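If you create files with a script, the suggestions above can be applied automatically. The following is a minimal sketch, not a prescribed method: `build_filename` is a hypothetical helper, the `.csv` extension is an assumption, and the rule that each part may contain only letters, digits, and dashes is one possible reading of “no special characters or spaces”.

```python
import re
from datetime import date

# Each part may contain letters, digits, and dashes.
# Underscores are reserved as the separator between parts.
SAFE_PART = re.compile(r"[A-Za-z0-9-]+")


def build_filename(parts, collected, version, extension):
    """Join descriptive parts, a YYYYMMDD date, and a zero-padded version."""
    for part in parts:
        if SAFE_PART.fullmatch(part) is None:
            raise ValueError(f"unsafe characters in {part!r}")
    stamp = collected.strftime("%Y%m%d")  # ISO 8601 basic format, YYYYMMDD
    return "_".join([*parts, stamp, f"v{version:02d}"]) + "." + extension


print(build_filename(["Mendota", "Buoy6"], date(2018, 7, 11), 2, "csv"))
# Mendota_Buoy6_20180711_v02.csv
```

Rejecting unsafe parts, rather than silently rewriting them, keeps the person naming the file in control of what the final name says.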


2.3 Hierarchical organization or folder best practices

Another key component of an effective file management strategy is establishing a well-organized hierarchical folder structure to go along with your file naming conventions.

While there’s no easy answer as to how many folders you should have or how to best organize them, the trick is to create a structure that balances breadth versus depth.

Try to limit the number of top-level folders you create, and try to limit the number of folders nested within those as well. You don’t want to create so many layers in your hierarchy that accessing the actual data files becomes difficult, but you also don’t want too many files within each folder, which can make finding your data just as difficult.
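To see how an existing project balances breadth and depth, you can measure it. This is a rough sketch: `summarize_tree` is a hypothetical helper, and the folder and file names in the demonstration are invented so that nothing real is touched.

```python
import os
import tempfile
from pathlib import Path


def summarize_tree(root):
    """Return (deepest folder level below root, most files in any one folder)."""
    root = Path(root)
    deepest = 0
    most_files = 0
    for current, _subfolders, files in os.walk(root):
        level = len(Path(current).relative_to(root).parts)
        deepest = max(deepest, level)
        most_files = max(most_files, len(files))
    return deepest, most_files


# Build a small throwaway tree so the example does not touch real data.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "Data" / "WaterQuality"
    folder.mkdir(parents=True)
    for day in ("20141030", "20141031", "20141101"):
        (folder / f"LakeMendota_{day}.tiff").touch()
    print(summarize_tree(tmp))  # (2, 3)
```

A large first number suggests the hierarchy is too deep, and a large second number suggests some folder holds too many files. What counts as “large” is a judgment call for your project.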



Figure: Organized paper file structure (photo of file folders)

Below we provide some examples and further tips for creating an organized and useful folder structure.

Examples:

    (1)
  • MyDocuments\Research\Sample12.tiff
    vs.
  • C:\NSFGrant01234\Data\WaterQuality\LakeMendota_20141030.tiff
OR

    (2)
  • PROJECT 01
    • Administrative
    • Outputs
      • Publications
      • Presentations
    • Grant 01
    • Grant 02
    • Data
      • Raw
      • Analyzed
  • PROJECT 02
  • PAST PROJECTS
    • 2013
      • Project 03
        • Data
    • 2014


Best Practices:

  • Apply your naming conventions to your folder names. Make sure each folder has a brief but descriptive name. Avoid folder names that are ambiguous or overlap with the names of other folders. In the examples above the folder names are clear and indicate the category or subject of the files within.

  • Group by similarity, function, or topic instead of by file type.
    • If you collect a lot of data in a certain file type, like images, you can end up with a folder with too many files in it. Notice that in the examples, none of the folders are specific file formats or types.
    • Example 1 instead uses a ‘WaterQuality’ folder under a ‘Data’ folder. With this structure, we can tell right away that everything within that folder is data we’ve collected and that it relates to water quality. The file name then provides further information about the location and date.

  • Distinguish between past and active work. Create archive or yearly folders that you can put past work in. This can help clean up your folder structure quite quickly!
    • In example 2, you can see the use of a ‘PAST PROJECTS’ folder, which archives work from previous years. With this type of structure, you narrow your search scope and look among a smaller set of folders to locate data.
    • In example 2, the folder structure archives past work by the year that work was done, but you can organize your archive folder in whatever way is most effective for you.

  • Create folders to keep your raw data separate from your processed data. This will prevent you from overwriting your raw data and losing any important information.
    • This is illustrated in example 2, where the ‘Data’ folder contains separate ‘Raw’ and ‘Analyzed’ folders. A dedicated ‘Data’ folder is also a great way to ensure that you always know where to look for your data files.
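A folder skeleton like example 2 can also be created with a script, so every new project starts from the same layout. The sketch below is one possible approach, not a required one: `create_skeleton` is a hypothetical helper, and the folder names follow example 2 with spaces replaced by underscores to match the naming advice.

```python
import tempfile
from pathlib import Path

# Folder names follow example 2, with spaces replaced by underscores.
# Adjust them to your own project.
SKELETON = [
    "Project_01/Administrative",
    "Project_01/Outputs/Publications",
    "Project_01/Outputs/Presentations",
    "Project_01/Data/Raw",
    "Project_01/Data/Analyzed",
    "Past_Projects/2013",
]


def create_skeleton(root, folders=SKELETON):
    """Create each folder under root and return all folders now present."""
    root = Path(root)
    for folder in folders:
        # exist_ok=True leaves folders that already exist untouched.
        (root / folder).mkdir(parents=True, exist_ok=True)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )


# Demonstrate in a throwaway directory so nothing real is modified.
with tempfile.TemporaryDirectory() as tmp:
    for name in create_skeleton(tmp):
        print(name)
```

Because existing folders are left alone, the function can be run again later to add missing folders without disturbing files you have already saved.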

Check your understanding

Suppose you’re creating a file for data about chemicals that you’ve found in rainwater. You want the name of the file to contain the following information:

  • Description of what the data is: Concentrations of chemicals in rainwater
  • Location where the rainwater was collected: Collection Site #4
  • Date when the rainwater was collected: July 20, 2018
  • Version of the file: version 1

Select the name for this file that follows the best practices for naming a file:


Answer: C
This file name avoids spaces and special characters, uses the YYYYMMDD date format, and includes leading zeroes where appropriate.



Challenge Activity

Try creating a file name based on the metadata provided below that adheres to the best practices we've covered. Remember, file names should be brief, meaningful, use standards, and avoid special characters. A meaningful file name should capture important information (metadata) about the file and make it easier for you to find and understand the file later.

Participant ID: 62814

Date of the initial interview: Friday, April 7, 2017

Information about the file: This file is the second half of an interview with the participant. The recorder filled up mid-interview, so the first file had to be transferred to the interviewer’s computer before recording could continue.

File type: .mp3

Interviewer: Professor Sounds