
Validating Metadata at the VCR/LTER

Spring 2011

Bridget Long (VCR)

Metadata is an invaluable tool for information managers. It improves the organization and sharing of data while giving information managers the peace of mind that future users will be able to use an archived dataset. A standardized metadata structure such as Ecological Metadata Language (EML) has further increased the value of metadata by improving its readability, flexibility, and utility for archival processing and for use with software applications. These characteristics have allowed EML to flourish in the LTER network and in other ecological projects.

Checking the validity of metadata is an important part of the metadata preparation process. Without validation, the future usability of datasets is at risk: without accurate “data about data”, users have difficulty knowing which attributes were measured in a dataset or interpreting the data tables documented in the metadata.

I was charged with checking in detail the functionality of EML metadata at the Virginia Coast Reserve LTER, specifically whether the data tables described in the metadata corresponded with the actual data tables. I initially felt overwhelmed by the volume of metadata that needed to be validated. However, with the help of a website managed by the Taiwan Forestry Research Institute, the task became more manageable. This website takes an EML document and generates an R statistical program from it. The EML can be supplied either by URL or by uploading an EML file, and the output can be presented either in the HTML-R graphical interface or on a command line. A third option (not used in this project, but worthy of mention) is displaying the research location from an EML document in Google Maps or Google Earth.

Using the VCR-LTER data catalog, I systematically submitted each EML document to the website, which generated an attribute table listing each variable with a short description, its type, its units, and its range. Once the table was generated, a form for the actual dataset appeared, requiring a text file or a comma-separated value file as input. After that step, either an error message indicating a difference between the data and the metadata appeared, or the data table was displayed in R.
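The kind of check described above can be illustrated with a minimal Python sketch. It is not the website's implementation (which generates R code); it only assumes a heavily simplified, hypothetical EML-style fragment, and the names `EML_FRAGMENT`, `read_attributes`, and `check_table` are invented for illustration. The sketch compares the column names of a comma-separated file with the attribute names in the metadata and tests values against any declared numeric bounds.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Simplified, hypothetical EML-style fragment (assumption: real EML
# documents carry much more detail than is shown here).
EML_FRAGMENT = """
<dataTable>
  <attributeList>
    <attribute>
      <attributeName>site</attributeName>
      <measurementScale><nominal/></measurementScale>
    </attribute>
    <attribute>
      <attributeName>temp_c</attributeName>
      <measurementScale>
        <ratio>
          <numericDomain>
            <bounds><minimum>-10</minimum><maximum>45</maximum></bounds>
          </numericDomain>
        </ratio>
      </measurementScale>
    </attribute>
  </attributeList>
</dataTable>
"""


def read_attributes(eml_text):
    """Return (name, minimum, maximum) per attribute; a bound is None if absent."""
    root = ET.fromstring(eml_text.strip())
    attributes = []
    for attr in root.iter("attribute"):
        name = attr.findtext("attributeName").strip()
        lo = attr.findtext(".//bounds/minimum")
        hi = attr.findtext(".//bounds/maximum")
        attributes.append((
            name,
            float(lo) if lo is not None else None,
            float(hi) if hi is not None else None,
        ))
    return attributes


def check_table(eml_text, csv_text):
    """Return a list of mismatches between a CSV table and its metadata."""
    attributes = read_attributes(eml_text)
    expected = [name for name, _, _ in attributes]
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    if header != expected:
        # Column names must match the metadata before values can be checked.
        return [f"columns {header} do not match metadata {expected}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(attributes):
            problems.append(f"line {line_no}: expected {len(attributes)} fields, got {len(row)}")
            continue
        for (name, lo, hi), value in zip(attributes, row):
            if lo is None and hi is None:
                continue  # only attributes with declared bounds are checked
            try:
                number = float(value)
            except ValueError:
                problems.append(f"line {line_no}: {name}={value!r} is not numeric")
                continue
            if (lo is not None and number < lo) or (hi is not None and number > hi):
                problems.append(f"line {line_no}: {name}={number} outside [{lo}, {hi}]")
    return problems
```

Calling `check_table(EML_FRAGMENT, csv_text)` returns an empty list when the table agrees with the metadata, and otherwise one message per mismatch, loosely mirroring the error-or-table outcome described above.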

On the whole, the VCR-LTER data catalog of 167 datasets was in good condition. Only nine datasets had metadata that needed to be fixed, and their errors were mainly restricted to the type of a variable and its range. Two other datasets needed additional information for one or more attributes. Datasets that did not have data tables defined (mostly GIS data) were not checked.

Apart from problems in the metadata, I was able to identify datasets that were not being displayed correctly in the online catalog. These problems were caused by issues such as broken links and corrupted database keys resulting from inadvertently added trailing spaces. The issues were resolved by using a shell program to correct the bad network links and MySQL commands to correct the keys. Without this systematic checking process, these datasets would not have been identified and their problems would not have been solved.
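As a rough illustration of the key repair, the sketch below finds and trims keys that carry trailing spaces. It is only a sketch under stated assumptions: the table name `datasets`, the column `dataset_key`, and the sample rows are hypothetical, and Python's built-in SQLite stands in for MySQL so that the example runs on its own. Keys are detected by comparing lengths rather than by string equality, because some database collations ignore trailing spaces when comparing strings.

```python
import sqlite3

# Condition shared by the query and the update: the key is longer than
# its right-trimmed form, so it ends in at least one space.
PADDED = "LENGTH(dataset_key) <> LENGTH(RTRIM(dataset_key))"


def make_demo_db():
    """Build an in-memory database with one clean and one padded key (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datasets (dataset_key TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO datasets VALUES (?, ?)",
        [("DS001", "Example dataset one"), ("DS002 ", "Example dataset two")],
    )
    conn.commit()
    return conn


def find_padded_keys(conn):
    """Return the keys that end in one or more spaces."""
    rows = conn.execute(f"SELECT dataset_key FROM datasets WHERE {PADDED}")
    return [row[0] for row in rows]


def trim_keys(conn):
    """Strip trailing spaces from affected keys; return the number of rows changed."""
    cursor = conn.execute(
        f"UPDATE datasets SET dataset_key = RTRIM(dataset_key) WHERE {PADDED}"
    )
    conn.commit()
    return cursor.rowcount
```

Running `find_padded_keys` before `trim_keys` gives a list of affected keys to review before any rows are changed.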

Bridget Long is a student assistant working on information management at the Virginia Coast Reserve LTER, supported by funds from the 2010 IM Supplement.