My data’s data: a metadata story

Bridging Science and Community with Data

Blog by Ashley Whipple, Biologist & EDI Fellow 2020

Ashley releasing an American pika during one of her many field work experiences

Even though the work can be laborious, studying the natural world and its wildlife is enthralling. I enjoy being a biologist and doing field work, and the truth is, I love breaking out the clipboard and scribbling down information on datasheets! I have dedicated most of my early career to field work, but there is a whole other side to field work and data collection that gets little attention. Let me ask you this: have you ever wondered what happens to the data we work so hard to collect? Where does it go? Who analyzes it? Who transforms those data into information that non-biologists can use and understand?

For many volunteers and field technicians, data management is not part of the job. However, the steps between compiling the data and sharing it with the public take a lot of effort and make up a large part of a scientist’s job. This summer I learned what it takes to clean, organize, and archive data for public use. I had the pleasure of training with the Environmental Data Initiative (EDI) and working with the Pepperwood Foundation as a fellow. EDI is a National Science Foundation-funded program whose goal is to promote and facilitate the compilation and reuse of environmental data. Every summer, EDI collaborates with a handful of host sites like Pepperwood to mentor aspiring scientists and teach them data management skills. Publicly accessible information is important because it can facilitate new scientific research and promote cross-disciplinary partnerships.

So, how do we make data publicly available? The first step is to clean the data: carefully comb through the information and look for any errors or mis-entered values. Next, organize it into a logical table format. Column names should be simple and consistent, and dates should be in an unambiguous format such as YYYY-MM-DD, so that anyone can understand the table. The last step is to create metadata. Metadata is data about your data: the who, what, when, how, and why. Metadata should be rich in information so that anyone can read it and understand what your data is and how they can reuse it.
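To make these steps a little more concrete, here is a minimal Python sketch. It assumes a small, made-up point-count table typed in from paper datasheets; the rows, column names, and helper functions (clean_name, to_iso, clean_rows, undocumented_columns) are hypothetical illustrations, not Pepperwood’s or EDI’s actual workflow. It standardizes column names, rewrites dates as YYYY-MM-DD, flags a suspicious value for review instead of silently changing it, and checks that a simple who/what/when/how/why metadata record describes every column.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical raw survey rows, as they might be typed in from paper datasheets.
RAW = """Survey Date,Point ID,Species Code,Count
6/3/2019,P01,CAQU,2
6/3/2019,P01,NSWO,1
6/4/2019,P02,CAQU,-1
"""

# A bare-bones, hypothetical metadata record: the who, what, when, how, and why,
# plus a plain-language description of every column (with units where relevant).
METADATA = {
    "who": "Hypothetical field crew and data manager",
    "what": "Bird point counts (example data only)",
    "when": "June 2019",
    "how": "Observers record every bird detected at each survey point",
    "why": "To track breeding bird populations over time",
    "columns": {
        "survey_date": "Date of the survey, YYYY-MM-DD",
        "point_id": "Identifier of the survey point",
        "species_code": "Four-letter species code",
        "count": "Number of individuals detected (birds)",
    },
}


def clean_name(name: str) -> str:
    """Turn a header like 'Survey Date' into a simple, consistent 'survey_date'."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def to_iso(date_text: str, fmt: str = "%m/%d/%Y") -> str:
    """Rewrite a date in the assumed month/day/year format as YYYY-MM-DD."""
    return datetime.strptime(date_text.strip(), fmt).strftime("%Y-%m-%d")


def clean_rows(raw: str):
    """Return cleaned rows plus a list of (line number, problem, value) to review."""
    reader = csv.DictReader(io.StringIO(raw))
    rows, problems = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        rec = {clean_name(key): value.strip() for key, value in row.items()}
        rec["survey_date"] = to_iso(rec["survey_date"])
        count = int(rec["count"])
        if count < 0:
            # Flag it for a person to check against the datasheet; don't "fix" it.
            problems.append((line_no, "negative count", count))
        rec["count"] = count
        rows.append(rec)
    return rows, problems


def undocumented_columns(rows, metadata):
    """List any table columns that the metadata does not describe."""
    return sorted(set(rows[0]) - set(metadata["columns"]))


if __name__ == "__main__":
    rows, problems = clean_rows(RAW)
    for rec in rows:
        print(rec)
    print("Needs review:", problems)
    print("Undocumented columns:", undocumented_columns(rows, METADATA))
```

Flagging a questionable value, rather than deleting or correcting it automatically, leaves the decision with someone who can check the original datasheet. Real data packages on the EDI Data Portal describe their data with Ecological Metadata Language (EML), a formal standard that is far richer than this small dictionary.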

Annual Point Count Breeding Bird Survey. Photos of California Quail (left) & Northern Saw-whet Owl (right) by Gerald and Buff Corsi

To make the data public, a package (i.e., the data plus its metadata) is uploaded to the EDI Data Portal. Research data from Pepperwood, along with data from many other organizations, are available on the EDI Data Portal. Check out 13 years of Point Count Breeding Bird Survey data from Pepperwood here:

My fellowship lasted a quick two and a half months, and I learned so much. If you are interested in data management best practices or want to take part in an EDI training, find more information here:
