Developing a Controlled Vocabulary for LTER Data

Issue: Fall 2009

John Porter, VCR/LTER

During the one-day 2009 LTER Information Managers Meeting, there was a working session on creating a controlled vocabulary to make keywording of LTER documents more consistent. Past analyses had indicated that LTER keywords were highly esoteric, with over half of the keywords used only a single time (http://intranet.lternet.edu/archives/documents/Newsletters/DataBits/06sp...).

The working group presented a draft science keyword list derived using the following procedure:
Inigo San Gil and Duane Costa assembled a list of LTER EML keywords from the LTER Metacat. That list was then cross-linked to the NBII Thesaurus (http://thesaurus.nbii.gov), the Global Change Master Directory (GCMD) keywords (http://globalchange.nasa.gov/Resources/valids/), and words recently used in Metacat searches. Following the ANSI/NISO Z39.19-2005 standard (Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies), words on the list were changed to preferred forms and a list of synonyms was created. Specific place names and taxonomic names were removed because they can be handled more succinctly in other parts of EML documents or in alternative keyword lists.

Keywords were then selected from the list based on two criteria. Any word that was used in the NBII and GCMD word lists was automatically included, and any word that was used by two or more different LTER sites was included. This draft list was then circulated to LTER Information Managers who suggested additions and deletions. A SurveyMonkey survey was used to get “votes” for or against specific words and changes were made to the list based on the results of the survey.
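The two selection criteria can be illustrated with a minimal Python sketch. It is not the working group's actual procedure or code: the function name, the site labels, and the sample words are all hypothetical, and the sketch takes a single set of "reference terms" so that the caller decides how the NBII and GCMD lists are combined.

```python
from collections import defaultdict


def select_keywords(site_keywords, reference_terms, min_sites=2):
    """Return keywords found in a reference list or used by >= min_sites sites.

    site_keywords: dict mapping a site label to an iterable of its keywords.
    reference_terms: set of terms from external lists (assumed to be
        prepared by the caller from the NBII and GCMD word lists).
    """
    sites_using = defaultdict(set)
    for site, words in site_keywords.items():
        for word in words:
            # Compare case-insensitively and ignore stray whitespace.
            sites_using[word.strip().lower()].add(site)

    reference = {term.strip().lower() for term in reference_terms}
    return sorted(
        word
        for word, sites in sites_using.items()
        if word in reference or len(sites) >= min_sites
    )


# Made-up sample data, for illustration only.
site_keywords = {
    "site_a": ["salt marsh", "Nitrogen", "tide gauge"],
    "site_b": ["nitrogen", "lakes"],
    "site_c": ["lakes ", "old growth"],
}
reference_terms = {"salt marsh"}

selected = select_keywords(site_keywords, reference_terms)
print(selected)
```

In this sample, "salt marsh" qualifies through the reference list, "nitrogen" and "lakes" qualify because two sites use them, and the single-site words are dropped.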

The final list has 640 keywords, with an additional 148 synonyms (including the non-preferred forms of words used in the list). Thirty-one percent (201) of the keywords were also found in the NBII Thesaurus and 21 words were found in the GCMD keyword list.

Following the report, discussions focused on several issues. The first was what steps needed to be taken to make the final list an “official” LTER list. The sense was that such a step needed to be taken by the LTER Executive Board (EB), and the working group was charged with preparing a proposal to the EB covering adoption of the list and procedures for managing it.

There was also substantial discussion of other efforts (e.g., SONET, Semtools) that might be leveraged. The group discussed the need to develop tools such as autocomplete keywording tools (similar to the one implemented by Duane Costa on the LTER Metacat to suggest search words) that could be deployed at sites to help in document creation, and semi-automated tools that could analyze document content and suggest words (i.e., semantic annotation).
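As a rough illustration of what an autocomplete keywording tool might do, the following Python sketch suggests vocabulary terms that start with what the user has typed. It is not the Metacat implementation mentioned above; the function name and the sample vocabulary are hypothetical, and the sketch assumes the keyword list is already lowercase and sorted.

```python
from bisect import bisect_left


def suggest(prefix, sorted_keywords, limit=5):
    """Return up to `limit` keywords that start with `prefix`.

    sorted_keywords must be a sorted list of lowercase strings.
    """
    p = prefix.strip().lower()
    if not p:
        return []
    # Binary search for the first keyword that is >= the prefix.
    start = bisect_left(sorted_keywords, p)
    matches = []
    for word in sorted_keywords[start:]:
        # Matching words are contiguous in sorted order, so stop at the
        # first non-match or once enough suggestions are collected.
        if not word.startswith(p) or len(matches) >= limit:
            break
        matches.append(word)
    return matches


# Made-up sample vocabulary, for illustration only.
vocabulary = sorted(
    ["nitrogen", "nitrate", "nutrients", "net primary production", "lakes"]
)
suggestions = suggest("Nit", vocabulary)
print(suggestions)
```

Because matching terms sit next to each other in a sorted list, the lookup stays fast even for a vocabulary of several hundred words.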

Also discussed was leveraging the keyword list by creating hierarchical polytaxonomies that link keywords to more general concepts, and Barbara Benson reported on preliminary steps along these lines. Additionally, the group suggested augmenting the synonym list so that a wider array of word forms could be linked to the preferred list. There was also discussion of some innovative ways the word list could be used, such as identifying “unfindable” datasets that had no keywords drawn from the preferred list.
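The “unfindable” dataset check could be sketched in Python as follows. This is an illustration under stated assumptions rather than an existing LTER tool: the function name, dataset identifiers, and sample words are hypothetical, and the sketch assumes that a keyword matching a known synonym counts as a match to its preferred form.

```python
def find_unfindable(dataset_keywords, preferred, synonyms):
    """Return IDs of datasets with no keyword drawn from the preferred list.

    dataset_keywords: dict mapping a dataset ID to an iterable of keywords.
    preferred: set of preferred-form keywords.
    synonyms: dict mapping a non-preferred form to its preferred form
        (assumption: synonym matches count as preferred-list matches).
    """
    preferred_lc = {p.strip().lower() for p in preferred}
    synonyms_lc = {s.strip().lower(): p.strip().lower() for s, p in synonyms.items()}

    unfindable = []
    for dataset_id, words in dataset_keywords.items():
        cleaned = (w.strip().lower() for w in words)
        # Replace each known synonym with its preferred form.
        resolved = {synonyms_lc.get(w, w) for w in cleaned}
        if not (resolved & preferred_lc):
            unfindable.append(dataset_id)
    return sorted(unfindable)


# Made-up sample data, for illustration only.
datasets = {
    "ds1": ["Nitrogen", "my plot 7"],
    "ds2": ["N", "grad student project"],
    "ds3": ["misc", "2009 survey"],
}
preferred = {"nitrogen", "lakes"}
synonyms = {"N": "nitrogen"}

unfindable = find_unfindable(datasets, preferred, synonyms)
print(unfindable)
```

Here only "ds3" is flagged: "ds1" uses a preferred keyword directly, and "ds2" reaches one through the synonym list.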

Presentations and notes from the working group, along with the list, can be found at: http://intranet.lternet.edu/im/node/489.