
PROJECTDB – Planning and Development of a Collaborative Programming Effort

Issue: Fall 2008

Jonathan Walsh (BES), Jason Downing (BNZ)

PROJECTDB is a means of tracking and cataloging the projects being undertaken by Long Term Ecological Research (LTER) Network sites. Information managers may recognize it as similar to previous “db” projects such as SITEDB, in which information from individual sites is aggregated and made available as a whole. This makes it possible to compare sites, identify similarities among them, and gather data from all sites to represent the network as a whole.

Although PROJECTDB is similar to SITEDB, STREAMDB and CLIMDB, its development differs in one notable way: PROJECTDB is being written collaboratively. It will be designed and programmed as a group effort among numerous LTER sites, which brings the benefit of diverse knowledge and expertise while accommodating differences among individual sites.

Goals and Scope

The project will create a common framework to express, track, analyze and report research projects at LTER sites.  It will use XML as the mechanism for exchanging information.

The resulting system will include web clients for each site, written in popular languages such as Java, ASP and Perl, that retrieve the project information. The information will be transformed in two ways: from XML to HTML, and through individual style sheets that produce a human-readable web display with a site-specific layout and color scheme.

As in CLIMDB and STREAMDB, a harvester will be built for sites that wish to store their project data in their own legacy database engines. These sites will produce XML files describing their projects, which the harvester will collect. The harvester will rely on the EML schema and check each file for well-formed structure on intake.
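A minimal sketch of the harvester's intake check, using only Python's standard library. It assumes each site's XML arrives as a string keyed by a site code; the `harvest` function and `HarvestReport` class are hypothetical names, not part of any PROJECTDB code. It checks well-formedness only: validating against the EML schema would need a schema-aware XML library, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HarvestReport:
    """Outcome of one harvest run (hypothetical structure)."""
    accepted: Dict[str, ET.Element] = field(default_factory=dict)  # site code -> parsed root
    rejected: Dict[str, str] = field(default_factory=dict)         # site code -> parser error


def harvest(documents: Dict[str, str]) -> HarvestReport:
    """Parse each site's XML and keep only the well-formed documents.

    Well-formedness only: validation against the EML schema is a separate
    step that needs a schema-aware library.
    """
    report = HarvestReport()
    for site, text in documents.items():
        try:
            report.accepted[site] = ET.fromstring(text)
        except ET.ParseError as err:
            # Malformed input is recorded with the parser's message, not dropped silently.
            report.rejected[site] = str(err)
    return report
```

A document with a mismatched closing tag, for example, lands in `rejected` with the parser's error message, so the site can be told exactly why its file was not harvested.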

Information managers were polled to see whether they would be interested in participating in the project and, if so, in which component or components. The project was phased as follows:

  1. Define functionality
  2. Implement and test
  3. Provide usability feedback
  4. Write code

Two workshops were planned.  The first was to develop use case scenarios, define methods, and identify necessary changes to the Ecological Metadata Language (EML) schema.  That workshop was held in November.

The second workshop will consist entirely of programming.  

Technology for this project

This project will manage a database of LTER research project information. Ideally, this will consist of structured project descriptions including, but not limited to:

  • Description
  • Datasets/Metadata
  • Products
  • Goals
  • Rich text description
  • Photographs
  • Links to Investigators’ pages

<a href="http://databits.lternet.edu/files/image001.gif"><img src="http://databits.lternet.edu/files/image001.gif" style="height: 309px; width: 250px;"></a>

eXist, an open-source native XML database written in Java, was favored for this project. Web services will provide transport from the database (or perhaps more accurately, databases) of project information to the clients. Using web services allows a great deal of flexibility and accommodates various means of data feed, thus supporting legacy systems.

Web Services: REST vs. SOAP

There are two major approaches to web services in use: Simple Object Access Protocol (SOAP) and Representational State Transfer (REST). Briefly, web services are applications hosted on a remote machine that can be invoked over the network, returning some result. The sign-in for Microsoft Hotmail, for example, is a web service. REST was judged more appropriate because of its reduced dependence on external sources and its more straightforward integration with HTTP.

Advice from Sven Bohm was to use REST.  “REST is just basically http requests,” said Sven, “SOAP maps more closely to a procedural standpoint.  REST maps more to everything being an endpoint.  In REST, there are four basic verbs – PUT, POST, DELETE, and GET.  So SOAP has few nouns and many verbs and REST has few verbs and many nouns.”
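To illustrate Sven's point about verbs and nouns, here is a toy, in-memory sketch in which each project is a noun addressed by a path such as `/projects/1`, and the only operations are the four HTTP verbs. The `ProjectResource` class, the paths, and the status codes chosen are assumptions for illustration, not PROJECTDB's actual interface; a real service would sit behind an HTTP server.

```python
from typing import Optional, Tuple


class ProjectResource:
    """Toy in-memory store: few verbs (GET, POST, PUT, DELETE), many nouns.

    Each project is a noun addressed by a path such as /projects/1.
    Paths and status codes are illustrative assumptions.
    """

    def __init__(self) -> None:
        self._projects = {}  # project id (str) -> XML document (str)
        self._next_id = 1

    def handle(self, method: str, path: str, body: Optional[str] = None) -> Tuple[int, object]:
        """Dispatch one request and return an (HTTP status, response body) pair."""
        parts = [p for p in path.split("/") if p]
        if not parts or parts[0] != "projects" or len(parts) > 2:
            return 404, None
        pid = parts[1] if len(parts) == 2 else None

        if pid is None:
            # The collection itself: list its members or create a new one.
            if method == "GET":
                return 200, sorted(self._projects)
            if method == "POST":
                while str(self._next_id) in self._projects:
                    self._next_id += 1  # skip ids already claimed via PUT
                new_id = str(self._next_id)
                self._projects[new_id] = body
                return 201, f"/projects/{new_id}"
            return 405, None

        if method == "PUT":
            # PUT stores the representation at a URI the client already knows.
            created = pid not in self._projects
            self._projects[pid] = body
            return (201 if created else 200), None
        if pid not in self._projects:
            return 404, None
        if method == "GET":
            return 200, self._projects[pid]
        if method == "DELETE":
            del self._projects[pid]
            return 204, None
        return 405, None
```

Note that no method is named after a specific operation such as "addProject"; the meaning comes from combining one of four verbs with a resource path, which is the contrast with SOAP's procedure-style calls.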

The group determined that the REST interface for searching the database should offer the following:

Basic search:

  • Search creator, keywords, title, abstract;
  • Full text search (possibly slow);
  • Support for Boolean operations between words.

Advanced search:

  • Creator
  • Keyword (Use controlled vocabulary for each different keyword set)
  • Title
  • Abstract
  • Temporal, e.g. year
  • Spatial
  • Funding agency
  • Organization
  • Support Boolean operations between those search fields
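The two search modes could be exposed as query URLs along the following lines. This is a sketch only: the base URL and the parameter names (`q`, `scope`, `op`, and the advanced field names) are hypothetical, and only AND/OR are shown as Boolean operators.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names; PROJECTDB had not fixed these.
BASE_URL = "https://projectdb.example.org/search"

ADVANCED_FIELDS = {
    "creator", "keyword", "title", "abstract",
    "year", "spatial", "funding", "organization",
}
OPERATORS = ("AND", "OR")


def basic_search_url(words, operator="AND", full_text=False):
    """Basic search over creator, keywords, title and abstract.

    `operator` joins the words; `full_text` widens the search to the whole
    document, which the group noted may be slow.
    """
    if operator not in OPERATORS:
        raise ValueError("operator must be AND or OR")
    params = {
        "q": f" {operator} ".join(words),
        "scope": "full" if full_text else "summary",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def advanced_search_url(criteria, operator="AND"):
    """Advanced search: one parameter per field, combined with a Boolean operator."""
    unknown = set(criteria) - ADVANCED_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    if operator not in OPERATORS:
        raise ValueError("operator must be AND or OR")
    params = dict(sorted(criteria.items()))  # stable parameter order
    params["op"] = operator
    return f"{BASE_URL}/advanced?{urlencode(params)}"
```

Rejecting unknown field names up front keeps the advanced form tied to the agreed field list, and a controlled vocabulary for keywords could be checked in the same place.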

The REST search service should then return either a browse list (ID, creator, title, keywords, temporal coverage) or, alternatively, the full document.
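A sketch of turning a search response into that browse list, or returning the full documents instead. The `<results>`/`<project>` response layout and its element and attribute names are assumptions made for illustration; real PROJECTDB documents would follow the (modified) EML schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BrowseEntry:
    """One row of the browse list returned by a search."""
    project_id: str
    creator: str
    title: str
    keywords: List[str]
    temporal: Optional[str]


def to_browse_list(results_xml: str, full: bool = False):
    """Return BrowseEntry rows, or the complete <project> documents if `full`.

    Assumes a hypothetical response of the form
    <results><project id="..">..</project>...</results>.
    """
    root = ET.fromstring(results_xml)
    projects = root.findall("project")
    if full:
        # Alternative mode: hand back each complete document as a string.
        return [ET.tostring(p, encoding="unicode") for p in projects]
    entries = []
    for p in projects:
        cov = p.find("temporalCoverage")
        temporal = f"{cov.get('begin')}-{cov.get('end')}" if cov is not None else None
        entries.append(BrowseEntry(
            project_id=p.get("id", ""),
            creator=p.findtext("creator", default=""),
            title=p.findtext("title", default=""),
            keywords=[k.text or "" for k in p.findall("keyword")],
            temporal=temporal,
        ))
    return entries
```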

Database selection: eXist

<a href="http://databits.lternet.edu/files/image002.jpg"><img src="http://databits.lternet.edu/files/image002.jpg"></a>

eXist is a native XML database management system written in Java. It runs easily in the Apache Tomcat servlet container.

Corinna Gries, who is familiar with eXist, says its internal user management is a bit primitive, but other than that it will do the job nicely. She currently keeps the data list for Central Arizona Phoenix (CAP) in eXist, with over 700 EML documents. Keyword searches through a Java interface are fast.
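eXist also exposes an HTTP REST interface that accepts an XQuery in a `_query` request parameter, with `_howmany` limiting the number of hits returned. The sketch below only builds such a request URL and does not contact a server. The host, port, `/exist/rest` path and the `/db/projectdb` collection are placeholder assumptions, and the parameter names should be checked against the documentation for the installed eXist version. The query relies on EML child elements such as `keyword` being unqualified (no namespace prefix).

```python
from urllib.parse import quote, urlencode

# Placeholder assumptions: a local eXist install and a hypothetical collection.
EXIST_REST = "http://localhost:8080/exist/rest"
COLLECTION = "/db/projectdb"


def xquery_string_literal(text: str) -> str:
    """Quote user input as an XQuery string literal.

    '&' must become '&amp;', and a double quote is escaped by doubling it.
    """
    return '"' + text.replace("&", "&amp;").replace('"', '""') + '"'


def keyword_xquery(keyword: str) -> str:
    """XQuery returning the title elements of documents containing the keyword."""
    return (
        f'for $k in collection("{COLLECTION}")//keyword'
        f"[. = {xquery_string_literal(keyword)}] "
        "return root($k)//title"
    )


def keyword_query_url(keyword: str, howmany: int = 20) -> str:
    """GET URL asking eXist's REST interface to run the keyword query."""
    params = {"_query": keyword_xquery(keyword), "_howmany": howmany}
    return f"{EXIST_REST}{quote(COLLECTION)}?{urlencode(params)}"
```

Escaping the user's keyword before it is spliced into the XQuery matters because the query text is executed by the server, not merely matched.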

Demonstrations of eXist are online at: http://exist-db.org/

Metacat was also discussed as an alternative database. After discussion with Inigo San Gil, Mark Servilla, and Corinna, the group felt eXist was the more appropriate system, while acknowledging that the project could be done with either.

AJAX for data entry and presentation

Wade Sheldon presented the pros and cons of Ajax (Asynchronous JavaScript and XML). According to Wade, it is a very efficient way to “skin” a website. With an XSLT (Extensible Stylesheet Language Transformations) style sheet, the interface can easily be made to look like the calling page, so each LTER site can customize the PROJECTDB interface to match its existing website. Additionally, the style sheets could be “manufactured” and given to each LTER site's webmaster, who would then only need to know how to handle the XML generated on their own system.
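Python's standard library has no XSLT processor, so the sketch below swaps in `string.Template` to show the same separation of concerns: one shared project XML record, and a per-site “skin” that controls layout and styling. In PROJECTDB the skins would be XSLT style sheets; the record fields, site templates and `render` function here are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape
from string import Template

# Shared sample project record, identical for every site (hypothetical fields).
PROJECT_XML = """<project>
  <title>Example stream study</title>
  <creator>A. Researcher</creator>
  <abstract>Placeholder abstract text.</abstract>
</project>"""

# Per-site "skins"; in PROJECTDB these would be XSLT style sheets.
SITE_SKINS = {
    "BES": Template('<div style="color:#2a5d2a"><h2>$title</h2>'
                    '<p><em>$creator</em></p><p>$abstract</p></div>'),
    "BNZ": Template('<section class="bnz-project"><h1>$title</h1>'
                    '<p>$abstract</p><footer>$creator</footer></section>'),
}


def render(project_xml: str, site: str) -> str:
    """Fill the site's template with HTML-escaped values from the shared XML."""
    root = ET.fromstring(project_xml)
    values = {
        name: escape(root.findtext(name, default=""))
        for name in ("title", "creator", "abstract")
    }
    return SITE_SKINS[site].substitute(values)
```

Adding a new site means adding one template; the project data and the code that reads it stay unchanged, which is the property that makes centrally “manufactured” style sheets practical.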

Online EML Editor

Central Arizona Phoenix and the Network Office are jointly developing an online EML editor. Information stored in EML format in the eXist database can be maintained with this editor, which will provide online forms to simplify managing that information. The editor will give access to the information in the PROJECTDB eXist database, but LTER sites that choose to keep their legacy data in their own database format can provide their own means of editing and still participate in PROJECTDB.
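Behind each form, an editor of this kind has to turn field values into EML. A minimal sketch, assuming hypothetical form field names and handling only the EML project's `title`, `personnel` (surname and role) and `abstract` elements, without schema validation; the default role value is also an assumption.

```python
import xml.etree.ElementTree as ET


def project_from_form(form: dict) -> ET.Element:
    """Build a minimal EML-style <project> element from submitted form fields.

    Simplified: only title, personnel (surname + role) and abstract are
    handled, and the result is not validated against the EML schema.
    Form field names are hypothetical.
    """
    project = ET.Element("project")
    ET.SubElement(project, "title").text = form["title"].strip()
    for person in form.get("personnel", []):
        node = ET.SubElement(project, "personnel")
        name = ET.SubElement(node, "individualName")
        ET.SubElement(name, "surName").text = person["surname"]
        # Assumed default when the form leaves the role blank.
        ET.SubElement(node, "role").text = person.get("role", "principalInvestigator")
    abstract = ET.SubElement(project, "abstract")
    # Blank lines in the text box separate <para> elements.
    for paragraph in form.get("abstract", "").split("\n\n"):
        if paragraph.strip():
            ET.SubElement(abstract, "para").text = paragraph.strip()
    return project
```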

Use Cases

Use case scenarios were developed to determine the requirements for the various elements of the system.

Ken Ramsey researched how to write use case scenarios and found a good guide online, “Writing Effective Use case Examples”: http://www.gatherspace.com/static/use_case_example.html

The group worked together using video teleconferencing and developed use cases for the system.  It turned out to be an iterative process in which actors, goals, and other attributes that make up a use case scenario were identified.

In one example, the scenario in which a research project is described, the following key elements were identified:

  • Title: Describe a Research Project
  • Goal: Provide the ability to collect and organize information for annual reports using information about active research projects.
  • Actors: site manager, site leadership, report writer, administrator.

Once identified, the use cases can be placed into a grid and ranked by factors such as scope, complexity, and priority.
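The use case elements and the ranking grid could be modeled as follows. The three-level scale and the sort order (highest priority first, then lowest complexity, then smallest scope) are assumptions for illustration; the article does not say how the group weighed the factors. The `describe` record reuses the elements listed for the “Describe a Research Project” scenario.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed three-level scale for every ranking factor.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class UseCase:
    title: str
    goal: str
    actors: List[str] = field(default_factory=list)
    scope: str = "medium"
    complexity: str = "medium"
    priority: str = "medium"


def rank(use_cases):
    """Highest priority first, then lowest complexity, then smallest scope (assumed order)."""
    return sorted(
        use_cases,
        key=lambda uc: (-LEVELS[uc.priority], LEVELS[uc.complexity], LEVELS[uc.scope]),
    )


def grid(use_cases) -> str:
    """Render the ranked use cases as a plain-text grid."""
    header = f"{'Use case':<32}{'Priority':<10}{'Complexity':<12}{'Scope':<8}"
    rows = [
        f"{uc.title:<32}{uc.priority:<10}{uc.complexity:<12}{uc.scope:<8}"
        for uc in rank(use_cases)
    ]
    return "\n".join([header, *rows])


describe = UseCase(
    title="Describe a Research Project",
    goal=("Provide the ability to collect and organize information for annual "
          "reports using information about active research projects."),
    actors=["site manager", "site leadership", "report writer", "administrator"],
)
```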

As use cases are identified and their elements assigned, a framework emerges that can be handed to a software team so it can efficiently produce an application that meets all the requirements. This work is in progress at the time of writing; Ken Ramsey, Suzanne Remillard and Kristin Vanderbilt are the lead authors.

Specifications

Specifications will be written once the use case scenarios are complete. The team is following the use case structure and methods set forth by Ramsey and is using Google Docs to edit the document collaboratively. Another collaborative tool in this effort is a version control system (see SVN, below).

Possible change to EML schema

It was determined that the existing EML schema will need modification to describe projects. Margaret O’Brien is working on EML schema changes that will facilitate the PROJECTDB system. One example concerns the EML Project descriptor: in the Project description node, the 'KeywordSet' attribute will be given an optional NAME sub-attribute to allow tracking of the Habitat/Ecosystem/System for the project.
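Assuming the optional attribute ends up spelled `name` on each `keywordSet` element (for example `<keywordSet name="Habitat">`; the final spelling was not settled at the time of writing), a harvester or client could group a project's keywords as in the sketch below. Sets written before the change lack the attribute and fall into an "unnamed" group. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def keywords_by_set(project_xml: str) -> dict:
    """Group a project's keywords by the proposed optional `name` on keywordSet.

    Sets without the attribute (everything written before the proposed
    schema change) are grouped under "unnamed".
    """
    root = ET.fromstring(project_xml)
    groups = defaultdict(list)
    for kw_set in root.iter("keywordSet"):
        label = kw_set.get("name", "unnamed")
        groups[label].extend(
            k.text.strip() for k in kw_set.findall("keyword") if k.text
        )
    return dict(groups)
```

Because the attribute is optional, existing documents keep working unchanged; they simply contribute to the "unnamed" group.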

SVN

Subversion (SVN) is being used to coordinate the work of multiple programmers across multiple revisions. Subversion (as opposed to attempts at overthrowing authority!) is revision control software, and Mark Servilla of the Network Office is supporting its use for this project. It tracks all work and all changes to all elements, even those written in different languages, and allows the project to be “rolled back” to any earlier point in time. It also lets developers work together toward a common goal even when they are separated in space or time. The Network Office is adopting Subversion as a replacement for the older revision control system, Concurrent Versions System (CVS); Subversion has a more advanced storage format and is faster.

Project Workshop Afterthoughts

This network collaboration approach has proven effective for developing information management tools and shows continued promise for the future. That said, this project would not be possible without the support of the LTER Network Office, James Brunt, and the entire LNO staff. The LNO facility lets participants focus on specific issues for a dedicated period in a supportive environment with ample resources, both technological and human. The Polycom VTC system allowed participants who could not travel during the workshop dates to interact meaningfully and contribute significantly to the team products. Special thanks to James Williams for his efforts to keep the remote participants connected.