The LTER Network All-Site Bibliography
- James W Brunt, LTER Network Office (LNO)
The U.S. Long-Term Ecological Research (LTER) Network All-Site Bibliography serves to account for the scientific contributions of the LTER Program, facilitate cross-site synthesis and synthetic studies, and generate new interest in LTER sites and LTER research. The infrastructure supporting the bibliographic records and the functionality of its interfaces continue to evolve as described in Brunt and Maddux (2002). There have been some major jumps forward, each producing a functional new product. Until now, however, none of these products has been maintainable, and each quickly became obsolete. The first successful attempt used unique delivery scripts for each site, and these soon fell out of date. It also relied on indexing software that, you guessed it, quickly became obsolete. Building a distributed all-site bibliography proved intractable because sites stored and managed their bibliography data in heterogeneous ways and changed those methods frequently. More recently, we standardized on a particular end-user software package, EndNote®, to take advantage of a proprietary web-publishing solution, Reference Web Poster®. This solution succeeded because of the standardization but was extremely limited in the way information could be retrieved and used. Neither of these previous solutions had the power of an open relational database management system behind it, which was the most important requirement for providing any value-added components to this very useful data set.
The LTER All-Site Bibliography has now, we hope, completed its last platform migration. Finally implemented in a highly normalized relational database model, it should serve the LTER Network for years to come. Moving the bibliography database out of proprietary software and into an open RDBMS framework has allowed us to focus on generating standards-based outputs, easing the burden of updates for sites, providing useful searching and reporting, and providing virtual bibliographies.
The LTER Network Information System (NIS) working group, in 2002, established the following requirements for the All-Site Bibliography database and interface improvements:
- Must be able to accept EndNote export format as input
- Must be able to distinguish duplicates from new entries and provide appropriate administrative responses
- Must be able to house multiple media types and distinguish between them
- Must be able to provide site lists and individual lists based on authentication
- Must be able to produce EndNote export format
- Must be able to provide personalized lists
- Must have compatible elements to satisfy Ecological Metadata Language (EML) literature
- Must have Z39.50 connection for compatibility with bibliography clients
These requirements have now all been met with the implementation described here.
The database model is based on the EndNote generic type, with additional attributes for LTER-specific content and additional tables to manage author names and keywords at a higher level of normalization. The model was adapted from one in use at CAP LTER.
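As a rough illustration of what separate author and keyword tables can look like, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than the MySQL instance described in this article, and every table and column name is hypothetical; none is taken from the actual LTER or CAP LTER model.

```python
import sqlite3

# Hypothetical, simplified schema. Names are illustrative, not the LTER model.
SCHEMA = """
CREATE TABLE reference (
    ref_id          INTEGER PRIMARY KEY,
    ref_type        TEXT NOT NULL,   -- e.g. 'Journal Article'
    title           TEXT NOT NULL,
    year            INTEGER,
    secondary_title TEXT,            -- EndNote generic field
    site_code       TEXT             -- LTER-specific attribute
);
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE reference_author (
    ref_id    INTEGER NOT NULL REFERENCES reference(ref_id),
    author_id INTEGER NOT NULL REFERENCES author(author_id),
    position  INTEGER NOT NULL,      -- author order within the citation
    PRIMARY KEY (ref_id, position)
);
CREATE TABLE keyword (
    keyword_id INTEGER PRIMARY KEY,
    term       TEXT NOT NULL UNIQUE
);
CREATE TABLE reference_keyword (
    ref_id     INTEGER NOT NULL REFERENCES reference(ref_id),
    keyword_id INTEGER NOT NULL REFERENCES keyword(keyword_id),
    PRIMARY KEY (ref_id, keyword_id)
);
"""


def build_demo_db():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def _lookup_id(conn, table, id_column, value_column, value):
    """Insert value if it is new, then return the id of its row."""
    conn.execute(
        f"INSERT OR IGNORE INTO {table} ({value_column}) VALUES (?)", (value,)
    )
    row = conn.execute(
        f"SELECT {id_column} FROM {table} WHERE {value_column} = ?", (value,)
    ).fetchone()
    return row[0]


def add_reference(conn, ref_type, title, year, secondary_title, site_code,
                  authors, keywords):
    """Store one reference; each author and keyword is stored only once."""
    cur = conn.execute(
        "INSERT INTO reference (ref_type, title, year, secondary_title, site_code)"
        " VALUES (?, ?, ?, ?, ?)",
        (ref_type, title, year, secondary_title, site_code),
    )
    ref_id = cur.lastrowid
    for position, name in enumerate(authors, start=1):
        author_id = _lookup_id(conn, "author", "author_id", "name", name)
        conn.execute(
            "INSERT INTO reference_author VALUES (?, ?, ?)",
            (ref_id, author_id, position),
        )
    for term in keywords:
        keyword_id = _lookup_id(conn, "keyword", "keyword_id", "term", term)
        conn.execute(
            "INSERT OR IGNORE INTO reference_keyword VALUES (?, ?)",
            (ref_id, keyword_id),
        )
    return ref_id


def titles_by_author(conn, name):
    """Return the titles of all references that list the given author."""
    rows = conn.execute(
        "SELECT r.title FROM reference r"
        " JOIN reference_author ra ON ra.ref_id = r.ref_id"
        " JOIN author a ON a.author_id = ra.author_id"
        " WHERE a.name = ? ORDER BY r.year, r.title",
        (name,),
    )
    return [row[0] for row in rows]
```

Because each author is a row of their own, a query such as `titles_by_author(conn, "Brunt, James")` becomes a plain join, which is the kind of person-to-publication linking that a single free-text author string makes difficult.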
Figure 1-1. LTER All-Site Bibliography Architecture
In 2002 we mapped out an architecture for the future of the LTER All-Site Bibliography (Brunt and Maddux, 2002). Although not all components of the architecture were specified, we have followed it as a roadmap for development. Figure 1-1 shows the architecture as currently implemented. The 26,000 bibliographic data records are now housed in a MySQL database management system instance running on MS Windows 2000 Server®.
There are currently two interfaces for searching: web via PHP (http://search.lternet.edu/biblio/) and Z39.50 (see Box 1-1; note that screen shots of the interfaces are not included here because they are at present too dynamic). The PHP scripts run under an Apache 2 web server on RedHat Linux ES. They set a PHPSESSID session variable in the local browser to handle the marking and exporting of subsets of bibliographic entries. Two search modes are implemented: a simple search on the title, author, and keyword fields, and an advanced search that offers some limited Boolean searching on specific fields. In the web search and administrative interfaces, the EndNote generic field names are automatically converted for display based on the reference type attribute. For example, a journal article will display the word "Journal" instead of the generic type attribute "Secondary Title".
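The label conversion described above can be pictured as a lookup keyed on reference type. The following is a minimal sketch in Python, not the PHP used by the actual interface. Only the "Secondary Title" to "Journal" case comes from this article; the function names and the Book Section entries are assumptions added for illustration.

```python
# Hypothetical table: reference type -> {generic field name: display label}.
# Only the Journal Article / "Secondary Title" entry is taken from the article.
DISPLAY_LABELS = {
    "Journal Article": {"Secondary Title": "Journal"},
    "Book Section": {"Secondary Title": "Book Title", "Secondary Author": "Editor"},
}


def display_label(ref_type, generic_field):
    """Return the type-specific label, or the generic name if none is defined."""
    return DISPLAY_LABELS.get(ref_type, {}).get(generic_field, generic_field)


def render(record):
    """Format one record (a dict keyed by generic field names) for display."""
    ref_type = record.get("Reference Type", "Generic")
    lines = []
    for field, value in record.items():
        if field == "Reference Type":
            continue  # used only to choose labels, not displayed as a field
        lines.append(f"{display_label(ref_type, field)}: {value}")
    return lines
```

Falling back to the generic name means a reference type missing from the table still displays, just with the less friendly labels.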
The Z39.50-compliant server is implemented in Java as a Swing application and uses an XML serializer and the Xalan stylesheet processor to map between the database fields and the Z39.50 map (Figure 1-2). Z39.50 is a search and retrieval protocol widely used by the library community, and it can carry Dublin Core metadata elements. This means you can now search the bibliography with Z39.50-compatible clients. We specifically mapped this server to the EndNote generic type to make it more useful as a connection client tool in EndNote. Once the references are downloaded, their document type can be changed in EndNote for better display. Z servers are not terribly fast, so a large search and retrieval could take a while. In this prototype, we have not attempted to optimize for speed of delivery.
Figure 1-2 – LTER All-Site Bibliography Z39.50 Gateway
Most of the common fields contained in the bibliography database have been mapped to the Z server and back out to the EndNote fields again. This release improves searching by allowing searches on Site Code and Reference Type (Journal Article, Book Section, etc.). Even though these fields don't normally show up in EndNote, it is possible through the miracle of XSLT to query them anyway and return the EndNote-formatted records. You can now get all the Journal Articles in one query. You can also make them show up as EndNote Journal Articles, if you so desire, by editing the connection file (under the Edit pull-down menu) and resetting the "default reference type" to Journal Article. Yes, this is a roundabout approach, but unless it's a big search it's not that difficult to change them, and the citation formatting seems to know what to do with them even if they are generic.
The administrative interface is also PHP-based but calls Perl scripts to do the bulk upload and EML export functions. These scripts rely on the LTER Network Office (LNO) security framework variables "primary site" and "user id" to determine identity and authorization for updating and managing. Authorization can be either site or network in scope. Administration web forms allow the site designate to manage all the bibliography entries associated with their site and to add new ones.
From http://search.lternet.edu/biblio/, select the "(Manage Site Bibliography)" link. The user is prompted for an LTER network ID and password and will then see their site's bibliographic records sorted by year, descending. From here you can view individual entries, page through the entries, search for specific entries, edit individual entries, or add new entries. Edit and delete functions are provided for each bibliography entry. Entries can be added either through a single-entry form or via a bulk upload process. The bulk upload process allows for manual validation, ingestion, processing, and deletion. During the ingestion phase, duplicates are identified based on the title, author, and year fields. During the processing phase, duplicates can either be accepted as updates or deleted; this can be done individually or for the entire upload.
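The duplicate check during ingestion can be sketched as a comparison of keys built from the title, author, and year fields. The example below is an illustrative Python sketch, not the Perl code used by the upload scripts. The normalization step (lower-casing and ignoring punctuation) is an assumption, since the article states only which fields are compared, and the record layout and function names are hypothetical.

```python
import re


def normalize(text):
    """Lower-case the text and collapse punctuation and whitespace to spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def duplicate_key(record):
    """Build the comparison key from the title, author, and year fields."""
    return (
        normalize(record["title"]),
        normalize(record["author"]),
        str(record["year"]).strip(),
    )


def split_upload(existing, uploaded):
    """Separate uploaded records into new entries and likely duplicates."""
    known = {duplicate_key(record) for record in existing}
    new, duplicates = [], []
    for record in uploaded:
        key = duplicate_key(record)
        if key in known:
            duplicates.append(record)  # held for review: update or delete
        else:
            new.append(record)
            known.add(key)  # also catches repeats within the same upload
    return new, duplicates
```

Flagged duplicates are set aside, not discarded, which mirrors the processing phase described above where each one can still be accepted as an update or deleted.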
Upload an EndNote Import File - will allow you to upload an EndNote import file directly into the database and then get some feedback on the process. We set this up as a push instead of a pull because different sites are usually ready at different times. Note: The EndNote Import file format is based on the Refer / BibIX tag system, although it has undergone some evolution to accommodate new reference types. (See: http://savanna.lternet.edu/reports/endnotetags.php ).
Upload an EndNote Import Stream from a URL - will allow you to upload an EndNote export data stream directly into the database from a URL. This will also drive the harvesting function, which isn't complete yet. Box 1-2 demonstrates the URL provided by Wade Sheldon at the Georgia Coastal Ecosystem LTER (GCE) that can be tweaked to provide the whole bibliography or parts, including only new publications.
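To give a feel for the tagged format that both upload paths accept, here is a minimal Python sketch of a reader for Refer-style records. It handles only a handful of common tags (%0 reference type, %A author, %T title, %D year, %J journal, %K keywords) and assumes records are separated by blank lines; the production Perl scripts are not shown in this article and certainly handle more. The function name and the output field names are hypothetical.

```python
# Small, illustrative subset of Refer / EndNote tags; the real format has many more.
TAG_NAMES = {
    "%0": "ref_type",
    "%A": "authors",
    "%T": "title",
    "%D": "year",
    "%J": "journal",
    "%K": "keywords",
}
REPEATABLE = {"authors", "keywords"}  # tags that may occur several times per record

SAMPLE = """\
%0 Journal Article
%A Chinn, Harvey
%A Bledsoe, Caroline
%D 1997
%T Internet access to ecological information – the US LTER All-Site Bibliography Project
%J Bioscience
%V 47
%P 50-58

%0 Generic
%A Brunt, James
%A Maddux, Troy
%D 2002
%T LTER All-Site Bibliography 2002 – Update
"""


def parse_refer(text):
    """Parse blank-line-separated tagged records into a list of dicts."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:  # a blank line ends the current record
            if current:
                records.append(current)
                current = {}
            continue
        tag, _, value = line.partition(" ")
        field = TAG_NAMES.get(tag)
        if field is None:
            continue  # tags outside the small table above are skipped
        if field in REPEATABLE:
            current.setdefault(field, []).append(value.strip())
        else:
            current[field] = value.strip()
    if current:  # the last record may not be followed by a blank line
        records.append(current)
    return records
```

Calling `parse_refer(SAMPLE)` yields two records. Each `%A` line becomes one entry in the `authors` list, which is how the import format keeps authors separate when sites enter them that way.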
We continue to work on the robustness and usability of the interfaces and the data quality. At this point there are a number of data quality issues that need to be resolved with existing data.
- Designation of the publication as LTER funded. Some sites track this while others do not. We're trying to make it as easy as possible to designate.
- Creating individual author entries. EndNote now supports individual author entries, and those sites that use them are in good shape. For those sites that don't, we are trying to parse the existing author strings into individual authors. Once we're done, the site can download the EndNote file to replace their local one. This level of granularity is necessary to be compliant with EML and to allow the linking of individuals to publications in the database.
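Parsing combined author strings into individual authors, as described in the second item above, might start from something like the following Python sketch. It assumes that authors are separated by semicolons, the word "and", or an ampersand, and that commas appear only inside a name (family name, given name). Real site data is likely messier than this, and the function name is hypothetical.

```python
import re

# Assumed separators between authors: ";", the word "and", or "&".
# Commas are treated as part of a name ("Family, Given"), not as separators.
AUTHOR_SEPARATOR = re.compile(r"\s*;\s*|\s+and\s+|\s*&\s*")


def split_authors(author_string):
    """Split one combined author string into a list of individual names."""
    parts = AUTHOR_SEPARATOR.split(author_string.strip())
    # Collapse internal runs of whitespace and drop empty fragments.
    return [" ".join(part.split()) for part in parts if part.strip()]
```

Under these assumptions, `split_authors("Chinn, Harvey and Bledsoe, Caroline")` returns the two names as separate entries. Strings that use commas to separate authors would need a different rule and probably some manual review.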
In addition, we've taken on some new requirements that have been requested to make the all-site bibliography more useful:
- the implementation of a harvest function,
- the capability of providing virtual bibliographies for site web sites,
- a web services interface for updates, and
- linking individuals in the LTER personnel database to bibliography entries.
Virtual Bibliographies - At press time this feature is not yet complete, but work is in progress. It will allow sites either to download the PHP code and implement the search routines on their own web pages as if the data were local, or to have them implemented at LNO under site.lternet.edu/biblio.
The PHP code, Perl code, Java code, database model, and configuration examples will be archived on LNO CVS for this project (http://cvs.lternet.edu).
This article is based on work supported by the U.S. Long Term Ecological Research Network Office, which is funded by National Science Foundation Cooperative Agreement No. DEB-02-36154 and the University of New Mexico. Thanks to Troy Maddux and all the LTER Information Managers who contributed data and ideas to this effort, particularly Peter McCartney, Wade Sheldon, John Campbell, and Eda Meléndez-Colom.
Brunt, James, and Troy Maddux. 2002. LTER All-Site Bibliography 2002 – Update. Databits Fall 2002. (http://intranet.lternet.edu/archives/documents/Newsletters/DataBits/02fall/).
Chinn, Harvey, and Caroline Bledsoe. 1997. Internet access to ecological information – the US LTER All-Site Bibliography Project. Bioscience 47(1):50-58. (http://www.aibs.org/bioscience/bioscience-archive/vol47/jan.97.computer.html)