Dataset Attributes and the LTER IMC: A First Step

Issue: Fall 2010

Corinna Gries (NTL), Wade Sheldon (GCE), Karen Baker (CCE, PAL)

A virtual water cooler was held 1-2 Nov 2010 to initiate an LTER IMC process of discussing attribute labels and descriptions. Dataset attribute labels play an essential role in the discovery and integration of data as well as in preparation for mapping to ontologies. The lack of attribute standardization in LTER is being recognized as a major impediment to data discovery, to modeling efforts at sites, and to cross-site synthetic data analyses.

Two existing attribute labeling systems were discussed as examples of different approaches and degrees of standardization. USEPA’s STORET code system combines parameter, medium, method, units, etc., within a five-digit code number. This rigid approach has led to problems such as force-fitting measurements into ill-suited codes and a proliferation of extensions to the system to fit local needs. CUAHSI, on the other hand, allows for flexible combinations of some standardized concepts (attribute name, unit, sample medium) with more free-form descriptions (methods). The drawback here is that sample methods need to be organized and evaluated by users prior to data synthesis.
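The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-digit codes, the table entries, and the names FIXED_CODES, FlexibleAttribute, comparable, and group_by_method are all hypothetical and are not taken from STORET or CUAHSI.

```python
from dataclasses import dataclass

# Rigid style: one opaque five-digit code bundles parameter, medium,
# method, and units. The codes and entries below are invented for
# illustration; they are NOT real STORET codes.
FIXED_CODES = {
    "90001": ("water temperature", "surface water", "thermistor", "degC"),
    "90002": ("water temperature", "surface water", "mercury thermometer", "degC"),
}


def decode(code):
    """Look up a fixed code; an unlisted combination has no code at all."""
    return FIXED_CODES[code]  # raises KeyError until the table is extended


# Flexible style: a few standardized fields plus a free-form method.
@dataclass(frozen=True)
class FlexibleAttribute:
    name: str           # standardized attribute name
    unit: str           # standardized unit
    sample_medium: str  # standardized sample medium
    method: str         # free-form text, not standardized


def comparable(a, b):
    """Two attributes match on the standardized fields only."""
    return (a.name, a.unit, a.sample_medium) == (b.name, b.unit, b.sample_medium)


def group_by_method(attributes):
    """Group attributes by lightly normalized method text.

    This is the step left to the user before synthesis: free-form
    methods still have to be organized and evaluated.
    """
    groups = {}
    for attr in attributes:
        groups.setdefault(attr.method.strip().lower(), []).append(attr)
    return groups
```

In the rigid table, a new method means a new code or a forced fit to an existing one. In the flexible record, the standardized fields match immediately, but the free-form method text still needs review before datasets are combined.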

Hard lessons about underestimating the scope of work involved in standardization (e.g. with the Ecological Metadata Language efforts and EcoTrends) make it clear that this undertaking represents a large, concerted effort. It is an information science issue involving classification, category formation, indexing, vetting, managing, and updating over time. The goal is broader than a one-time task of standardization; it is a complex community effort involving both practices and tools for developing an ongoing process for attribute standardization.

The first steps include articulating the issues:

  • Prerequisites to starting the process
  • Thoughts on scope and coordination of design, development, deployment, and enactment efforts
  • Models of deployment and enactment (e.g. static versus dynamic vocabularies)
  • How to include sites, synthesis projects, PASTA, and end-user input
  • How to incorporate LTER synthesis efforts as part of the process
  • Resource issues (proposals, partners, multi-site LTER supplement efforts)

The next steps include:

  • Evaluate current practices and consider Best Practices
  • Review LTER projects ClimDB and Unit Registry as case examples
  • Consider additional models, e.g. SEADATANET (see References)
  • Consider intersections with LTER controlled vocabulary
  • Survey LTER site attribute models
  • Name and form a working group

An initial step has already been taken by holding an LTER IMC all-site virtual water cooler discussion and presenting the topic for consideration by the LTER Information Management Committee. It is the first step in the process of gathering information and identifying a team of individuals interested in defining the issue. The VTC-initiated discussion revealed a wide range of reactions, with some sites considering this yet another added task and other sites expressing clear visions of how such an effort would benefit site-, network-, and community-level work.

References

SEADATANET. Pan-European infrastructure for Ocean & Marine Data Management and British Oceanographic Data Centre (BODC). (http://seadatanet.maris2.nl/v_bodc_vocab/welcome.aspx)