
The Santa Barbara Coastal (SBC) LTER's implementation of projectDB using Metabase

Issue: Fall 2011

Margaret O'Brien (SBC)

This article provides an overview of our efforts to integrate a network-wide resource for describing research projects, ProjectDB, with our local solution for project data. ProjectDB was developed in a cross-site collaboration of LTER information managers to create software tools to track and catalog research projects (Walsh and Downing, 2008; O'Brien, 2009). Metabase is an extensive relational database in production at GCE LTER, which MCR and SBC have adopted in a PostgreSQL implementation. Together, these tools make it possible to document extensively, and present to the public, the linkages between broad research projects and themes and the data that support them. The remainder of this article describes the tools and tasks involved, outlines our implementation thus far, and discusses future plans, lessons learned, and the advantages gained to this point.

The tools comprising ProjectDB are

1. an XML schema specification (lter-project-2.1.0), which is based on EML 2.1;

2. XSL stylesheets for HTML display of the XML content;

3. a Javascript file for handling a tabbed layout;

4. Cascading Style Sheets for elements which benefit from uniform presentation;

5. XQuery files to return project XML from an eXist XML database.

The system is organized to be available from the Network SVN repository, and all files may be housed in and called from an eXist XML database. The tools work together, and as-is, can be used to display project XML documents from within HTML “frames”. However, SBC chose to integrate projectDB into its information system, as GCE did in 2010 (Sheldon, 2010).

Metabase already has components for housing all information needed for LTER-projects. One of our major goals for Metabase is to export EML datasets, and we chose to first export LTER-project XML. This was advantageous for several reasons. First, the LTER-project schema is a subset of the EML schema, so limiting the scope of exports represented a lower hurdle that would quickly demonstrate the usability of both Metabase and projectDB. Second, redesigning the research section of our website was a high priority for SBC scientists. Third, "projects" represent only a fraction of the features available in Metabase, which made them a good entry point for someone new to the design. Lastly, other Metabase features and content needs can be planned based on lessons learned from these project exports.

We tackled the description of our research projects in phases, and for Phase I (conducted in 2011) limited the work to high-level research "themes" rather than including details of specific activities. The themes will provide groupings for descriptions of specific activities to be added later. We expect that in Phase II we will include specific core activities and add browse, search, and cross-link functionality. Phase III will include other non-core activities and associated collaborations.

Project requirements for Phase I were

1. allow for coordination and writing of SBC high-level research themes;

2. provide for information storage in Metabase with export as LTER-project XML;

3. display research themes on the web with a modification of the standard XSL to HTML transformation;

4. provide browsing of themes using SBC keyword groups for “Habitat” and “Core Research Area” in a manner similar to that used by the SBC data catalog, so that from a user’s point of view the two catalogs will have the same “look and feel.”

Specific tasks for Phase I were

1. define the high-level themes and collect information for personnel, abstracts, related papers, images, temporal coverage, keywords, and associated data. Themes did not include spatial coverage during Phase I; 

2. a) store the information in SBC-Metabase, and b) export as LTER-project XML;

3. adapt the browsing and menu design used for SBC’s data catalog (O’Brien, 2010) for the project catalog;

4. adapt the presentation tools developed by the projectDB workshop (i.e., XSL stylesheets, Javascript and CSS) and in use at GCE (Sheldon, 2009);

5. plan for submission of SBC’s projectDB files to the Network SVN repository.

Task 1 (gathering information) required coordination over several months among most of SBC’s scientists and was conducted by SBC project manager, Jenny Dugan, and Lead PI, Dan Reed. Database content was prepared and uploaded to Metabase by the IM assistant, Alex Guerra (Task 2a). The information manager, Margaret O'Brien, conducted tasks 2b-5. Website-related tasks (3 and 4) required a total of about one month, including iterative design with several SBC scientists. Feedback and discussion from the MCR information manager, M. Gastil-Buhl, was invaluable throughout.

Implementation

Project XML from Metabase: Export from Metabase was performed with Perl, and script development was planned so that components can be reused for exporting EML datasets. For Phase I, we did not use an eXist database to hold and query XML; instead, XML was exported to the website’s ‘lib’ directory, which was adequate for our current query needs. Storing static files also meant that output could be easily examined and checked for schema-validity with desktop tools (e.g., Oxygen), which is essential during script development.
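The export step can be illustrated with a minimal sketch. SBC's export was written in Perl against Metabase; the Python below is only an illustration of the idea. The record layout, the "researchProject" root element name, the keywordSet name value, and the function names are assumptions made for this example, not the Metabase schema or the actual script. Required elements such as personnel are omitted, so the output is not claimed to be schema-valid.

```python
import xml.etree.ElementTree as ET


def build_project_xml(theme):
    """Build an XML tree for one research theme (illustrative layout only)."""
    # "researchProject" is an assumed root name; the real root element,
    # namespaces, and required children come from the lter-project schema.
    root = ET.Element("researchProject")
    ET.SubElement(root, "title").text = theme["title"]
    abstract = ET.SubElement(root, "abstract")
    ET.SubElement(abstract, "para").text = theme["abstract"]
    for ks in theme["keyword_sets"]:
        # Each keyword set carries a "name" attribute so that one file can
        # hold keyword sets intended for different purposes.
        ks_el = ET.SubElement(root, "keywordSet", {"name": ks["name"]})
        for word in ks["keywords"]:
            ET.SubElement(ks_el, "keyword").text = word
        ET.SubElement(ks_el, "keywordThesaurus").text = ks["thesaurus"]
    return root


def export_project(theme, path):
    """Write one theme to a static XML file for the website to read."""
    tree = ET.ElementTree(build_project_xml(theme))
    tree.write(path, encoding="utf-8", xml_declaration=True)


# Hypothetical record, standing in for rows queried from the database.
EXAMPLE_THEME = {
    "title": "Example research theme",
    "abstract": "Placeholder abstract text.",
    "keyword_sets": [
        {"name": "query_habitat", "keywords": ["Kelp Forest"],
         "thesaurus": "Example Habitat Thesaurus"},
    ],
}

if __name__ == "__main__":
    print(ET.tostring(build_project_xml(EXAMPLE_THEME), encoding="unicode"))
```

Writing each theme to its own static file, as export_project does, keeps the output easy to open in a desktop validator while the script is still changing.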

Query and display projects: In Phase I, we planned to browse projects ('themes') using canned queries for "Habitat" or "Core Research Area." After making a selection, the user sees a list of projects each with a title, truncated abstract, and an image (Fig. 1). Since we did not implement XQuery for Phase I, we used XPath in the template, which was specifically directed to select certain <keywordSet> nodes (see below). This would not be efficient for a large number of projects, nor for complex queries.
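The canned-query step might look like the following minimal sketch. It is written in Python rather than the template language SBC used, and the sample XML, element layout, keywordSet name values, and function names are all hypothetical. It keeps only projects whose "query" keyword sets contain the selected term, then returns a title and a truncated abstract for each.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified project documents standing in for static XML files.
SAMPLE_PROJECTS = [
    """<researchProject>
  <title>Example theme A</title>
  <abstract><para>Placeholder abstract for theme A, long enough that it
  needs to be shortened before it is shown in a list of results.</para></abstract>
  <keywordSet name="query_cra"><keyword>Population Studies</keyword></keywordSet>
</researchProject>""",
    """<researchProject>
  <title>Example theme B</title>
  <abstract><para>Placeholder abstract for theme B.</para></abstract>
  <keywordSet name="query_cra"><keyword>Disturbance Patterns</keyword></keywordSet>
  <keywordSet name="data_cra"><keyword>Population Studies</keyword></keywordSet>
</researchProject>""",
]


def matches(project, term):
    """True if any keywordSet whose name contains 'query' lists the term."""
    for ks in project.iter("keywordSet"):
        if "query" not in ks.get("name", ""):
            continue  # sets meant for other uses are ignored here
        if any((kw.text or "").strip() == term for kw in ks.findall("keyword")):
            return True
    return False


def truncate(text, limit=80):
    """Collapse whitespace and cut at a word boundary near the limit."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    return text[:limit].rsplit(" ", 1)[0] + "..."


def list_projects(xml_docs, term):
    """Return (title, short abstract) pairs for projects matching the term."""
    results = []
    for doc in xml_docs:
        project = ET.fromstring(doc)
        if matches(project, term):
            title = project.findtext("title", default="")
            abstract = project.findtext("abstract/para", default="")
            results.append((title, truncate(abstract)))
    return results


if __name__ == "__main__":
    for title, summary in list_projects(SAMPLE_PROJECTS, "Population Studies"):
        print(title, "|", summary)
```

Every document is parsed and scanned for each request, which is the reason this approach does not scale to many projects or to complex queries.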

The second view is of the project itself (Fig. 2). The script to return a single project was very simple, since Javascript controls the tabbed views. The script has only three lines of essential code as described earlier (O’Brien, 2009); i.e., calls to the XML content and the XSL stylesheet, followed by the transformation to HTML.


Figure 1 (left). Screenshot of the output for a query for SBC research themes related to “Population Studies”. A short display of each project is provided, with links, and the menus appear again on the left.

Figure 2 (right). Default tab view used for an SBC research theme. Descriptions of detailed activities will use different or additional tabs, e.g., including coverage.

We made significant adaptations to the XSL stylesheet for project display, and SBC scientists provided considerable input on the presentation over several iterations. We also created wrapper stylesheets so that we can reuse components in later phases with more projects (e.g., "activities").

“Coverage” tab: For Phase I and research themes, we did not include the “Coverage” tab. Research themes were too broad to consistently report on sampling sites. We will add both temporal and geographic coverage to research activities in a later phase.

Images: Including images with a research description was very important to the site scientists as a way to engage interest. They felt that, alone, the project descriptions were somewhat dry, and while this was appropriate for information-packed dataset displays, descriptions of research should be more visually enticing. Images were added to the “Description” tab, and in other available blank space, e.g., on the “Personnel” tab. We were able to control the image that appeared by using the <associatedMaterial> element’s ‘category’ attribute.

Related Data: We did not add a list of datasets to the research theme’s "Products" mainly because these themes are broad and interdisciplinary, encompassing many types of data. Instead, we chose to highlight the data that could be associated with a particular theme. We added a tab for "Related Data", and duplicated the specific queries to the SBC data catalog that would display data collections of interest to someone browsing each theme. We used the <keywordSet> element for this purpose as well. Figure 3 shows the similarity between the "Related Data" tab in a project view (lower right) and the SBC Data Catalog index (upper left). In the project view, only a subset of choices is presented to the user, but otherwise the interfaces are nearly the same.


Figure 3. Comparison of SBC data catalog index and SBC Research Themes “Related Data” tab.

Project keywords were essential both for queries and for building links to related data, and for these we used features of LTER-project XML that are not available in EML. Figure 4 shows the XML keywordSets for the project in Figure 2, above; each set can be used independently. In EML, the <keywordSet> element has no attributes, but the LTER-project schema added an attribute called “name”. So for returning lists of projects (e.g., as in Fig. 1), SBC’s code selected only <keywordSet> nodes with a name attribute containing "query". To build the forms for dataset links, the code uses different <keywordSet> nodes, i.e., those with a name attribute containing "data".


Figure 4. Sample of LTER-project XML showing how keyword content can be used to drive different uses in the same project XML file. Note: at SBC, the Core Research Area Thesaurus contains additional terms and so the thesaurus name deliberately does not contain the string 'LTER'. However, since terms from the five LTER Core Research Areas are required for datasets, this is labeled as a distinct thesaurus.
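To make the “name” attribute mechanism concrete, here is a minimal sketch of how "data" keyword sets could drive links into a data catalog. The catalog URL, the query parameter names, the root element, and the keywordSet name values are hypothetical; only the idea of filtering <keywordSet> nodes on their name attribute comes from the text above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical project document with keyword sets for two different uses.
PROJECT_XML = """<researchProject>
  <title>Example research theme</title>
  <keywordSet name="query_cra">
    <keyword>Population Studies</keyword>
  </keywordSet>
  <keywordSet name="data_habitat">
    <keyword>Kelp Forest</keyword>
  </keywordSet>
</researchProject>"""

CATALOG_URL = "https://example.org/data/catalog"  # placeholder, not SBC's URL


def keyword_sets(project, role):
    """Map keywordSet name -> keywords for sets whose name contains role."""
    selected = {}
    for ks in project.iter("keywordSet"):
        name = ks.get("name", "")
        if role in name:
            selected[name] = [
                kw.text.strip() for kw in ks.findall("keyword") if kw.text
            ]
    return selected


def related_data_links(project):
    """Build one catalog query link per keyword in the 'data' sets."""
    links = []
    for name, words in keyword_sets(project, "data").items():
        for word in words:
            # "set" and "keyword" are assumed parameter names.
            query = urlencode({"set": name, "keyword": word})
            links.append(f"{CATALOG_URL}?{query}")
    return links


if __name__ == "__main__":
    project = ET.fromstring(PROJECT_XML)
    for link in related_data_links(project):
        print(link)
```

Because the same function selects sets by role, the project-list query and the related-data links can share one XML file without interfering with each other.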

Discussion

In future phases, SBC projectDB will incorporate descriptions of specific activities, such as sampling campaigns and student projects. We will also plan linkages from each activity to related research themes, and the products can be more specific, e.g., to individual datasets.

For tabbed display, Javascript is faster and more straightforward than the "stage" method (with CGI parameters) that SBC’s data catalog uses. This is not surprising, as Javascript works through the client rather than calling the server for each view. In each of its catalogs, SBC adapted existing code to save time, so we can now compare two different implementations of a similar process. When we next upgrade our data catalog, we will convert it to use the Javascript code for tabbed display to match our projectDB implementation.

In Phase I, SBC used simple XPath instead of true queries, which will not be practical once we have added activities or need to build more complex cross-links. XQuery with eXist is one option for us, as the web services are already built. However, with Metabase we also have the option of building searches and cross-links within the database itself, and/or exporting project XML directly as a web service. We look forward to exploring these options.

Building datasets from Metabase was efficient and straightforward. We look forward to further developing our XML export, for both projects and datasets, with other features of PostgreSQL. Our use of projectDB has highlighted some useful additions to the Metabase schema that we will communicate to GCE, CWT and MCR.

Visitors to the GCE and SBC websites can compare the two implementations (http://gce.lternet.edu/public/research/projects.asp, http://sbc.lternet.edu/research/catalog). SBC has chosen to display only broad research themes at this time, and will add more detailed activities later. GCE described all its activities in its initial implementation. SBC made some significant changes to the XSL stylesheets, mostly to accommodate these broad themes. However, the look and feel of the two displays is still quite similar, and SBC “activities” (when added) will use a style almost identical to the GCE view. As projectDB develops as a network resource, more feedback is to be expected, and centralized storage of the most popular templates may make maintenance of a network look-and-feel almost trivial.

Our use of keywords in projectDB brought up some interesting points. For the five Core Research Area (CRA) keywords, we found that attaching these terms to research themes was very straightforward, and all project themes ‘fit’, often into only one CRA term. However, most sites (SBC included) seem to have difficulty when trying to attach the CRA keywords to datasets. This is most likely because data are described at a very granular level, whereas the CRA keywords are broad and thematic. We found that using projectDB as a container for research descriptions was an effective way to link between the CRA term and data.

It was also very straightforward to use projectDB keywords to build queries to another catalog, but this would not have been possible without the “name” attribute for <keywordSet> nodes. The addition of this attribute has already been suggested as a useful enhancement to the EML schema as well. LTER’s work with both EML and projectDB will demonstrate the advantages clearly.

References

O’Brien, M. 2010. Using EML in Your Local Data Catalog. LTER Databits, Fall 2010.

Sheldon, W. 2010. Getting started with eXist and XQuery. LTER Databits, Spring 2009.

Walsh, J. and Downing, J. 2008. ProjectDB: Planning and Development of a Collaborative Programming Effort. LTER Databits, Fall 2008.