
Are LTER data online?

Spring 2012

Margaret O'Brien (SBC) and Don Henshaw (AND), IMC co-chairs

The perception of whether LTER data are online is formed in part by whether data are discoverable and accessible through a single network portal, in particular our Network metadata catalog. Sites have progressively added metadata content in Ecological Metadata Language (EML) to this Metacat catalog since its establishment in 2005. Over the ensuing years, the style of contributions to this catalog has varied enormously. For some sites, the understanding was that even sparse EML was adequate to lead a user back to the site’s catalog for data and more information; other sites provided EML that could directly deliver data along with metadata; and a few adopted EML as the basis of local site information systems as well as for network contributions. The bottom line, though, is that although our varied practices reflect our local traditions, they have not brought us closer to becoming a cohesive group. The goal of our network metadata catalog has not been clear, the catalog has suffered from a lack of attention, and consequently, correct or not, the perception is that LTER data are not easily available online.

Our goal now is to change that perception. LTER data distribution policies are exceptional and our systems are widely admired and copied. The most recent views of the Network Metacat catalog show that only a few percent of datasets have no link to data at all. Many scientists outside the LTER have said “you mean I can just go to your website and download your data?!” Clearly, we have made great strides toward thoughtful and pragmatic data publication practices. However, the completeness of the metadata and the ease with which data can be accessed vary widely among sites and need immediate improvement.

NSF has made increased data availability a high priority. They have emphasized that the Network catalog should provide access to site data, not just metadata, and that the volume should reflect all work at the sites, including data not intended for PASTA. These two expectations (simplified discovery for all data via the network portal, and automated use of “PASTA-ready” data) are not necessarily conflicting or insurmountable, but meeting them will take commitment to network goals from all sites.

Here are recommendations from IMExec for making data more easily discoverable, accessible and usable through the network portals:

  • Inventory your LTER-funded studies and catalog all site data sets. We still need to develop practices to clearly identify and justify different types of data, including Type II.
  • Prioritize the development of data sets for inclusion in the LTER metadata catalog and for PASTA. Priorities should be driven by scientific questions.
  • Improve metadata content to improve discoverability:
    • Improve data set titles and abstracts, and add LTER controlled vocabulary keywords
    • Improve data entity and attribute descriptions, and
    • Adopt network standards for URL construction and location
  • Improve EML documents to comply with EML best practices, particularly in the placement of URL links to data sets at the entity level, so that the network catalog can better represent site metadata and site data sets can be processed automatically.
    • Become more familiar with EML best practices
    • Consider automated approaches to generating EML, e.g., DEIMS, Metabase
  • Be prepared to simplify or remove web forms or cumbersome logins that might be obstacles to data access, per evolving Network recommendations.
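
As a rough illustration of the entity-level URL recommendation above, a site could scan its EML documents for data entities that lack their own link to data. The sketch below is only one possible approach and is not an IMExec or Network tool. It assumes the usual EML 2.x layout in which each entity (for example, dataTable) carries its URL under physical/distribution/online/url; the function name, the sample document, and the example.org URLs are all hypothetical, and the sample is a stripped-down fragment rather than schema-valid EML.

```python
import xml.etree.ElementTree as ET

# EML entity types that can carry their own physical/distribution block.
ENTITY_TAGS = ("dataTable", "spatialRaster", "spatialVector", "otherEntity")


def entities_missing_data_url(eml_text):
    """Return the entityName of each data entity with no entity-level URL.

    Hypothetical helper: it looks only for
    physical/distribution/online/url inside each entity and ignores
    any dataset-level distribution link.
    """
    root = ET.fromstring(eml_text)
    dataset = root.find("dataset")  # child elements of eml:eml are unqualified
    if dataset is None:
        return []
    missing = []
    for tag in ENTITY_TAGS:
        for entity in dataset.findall(tag):
            url = entity.find("physical/distribution/online/url")
            if url is None or not (url.text or "").strip():
                missing.append(entity.findtext("entityName", default="(unnamed)"))
    return missing


# Hypothetical, stripped-down EML fragment (not schema-valid) for illustration.
SAMPLE = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
                     packageId="example.1.1" system="example">
  <dataset>
    <title>Hypothetical example data set</title>
    <distribution><online><url>http://example.org/catalog</url></online></distribution>
    <dataTable>
      <entityName>table_with_url</entityName>
      <physical>
        <objectName>a.csv</objectName>
        <distribution><online><url>http://example.org/data/a.csv</url></online></distribution>
      </physical>
    </dataTable>
    <dataTable>
      <entityName>table_without_url</entityName>
      <physical><objectName>b.csv</objectName></physical>
    </dataTable>
  </dataset>
</eml:eml>"""

print(entities_missing_data_url(SAMPLE))
```

In this sample the dataset-level link to the site catalog does not count; only the second table is reported, because it has no URL of its own at the entity level.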

There is an obvious benefit to improving metadata and increasing the amount of data available through the network data portal. And while it may require significant resources, the Network became obligated to do so in 2003 when the LTER Coordinating Committee unanimously passed a motion to adopt “a tiered trajectory toward improved IM functionality for synthesis, and the trajectory increasingly incorporates common, structured metadata - the network adopts a general goal of improving each site's position in the trajectory”. In the spirit of this tiered trajectory, we plan to target certain site data sets to be “PASTA-ready” and to establish specific scientific workflows to build value-added data products. As LTER is faced with demonstrating the potential of the NIS and justifying the investment, improvement in the quality of site data and metadata will go a long way towards illustrating its value.