Developing a Searchable Document and Imagery Archive for the GCE-LTER Web Site

Spring 2008

In 2007 the Georgia Coastal Ecosystems LTER program began its second cycle of NSF funding, and as part of the transition from GCE-I to GCE-II we conducted a top-to-bottom review of our integrated information system. One major conclusion of this review was that we needed to do a better job of managing the various electronic resource files that are acquired during the course of GCE research and project management activities, including documents (e.g. publication reprints, reports, protocols), imagery (e.g. rendered maps, photos, logos) and other types of static files. During GCE-I, many of these resources were informally organized in server directories using a file system management approach, with online access provided via URL on various public and private GCE web pages. The only effective way to search for some categories of files was using Google Site Search, and many people ignored the web site entirely and contacted GCE IM staff directly for assistance locating specific files.

We also noted in our review that several types of files are already being managed effectively, with file information and network paths stored in relational databases. For example, links to both publicly accessible and private reprints and presentations are stored in the GCE bibliographic database. In addition, links to organism photos and other relevant files are stored in the GCE taxonomic database. Both of these databases are also integrated with the centralized GCE personnel and metadata databases to support cross-referencing and dynamic linking between personnel records, data sets, publications, and species information. Consequently, to provide a more integrated solution, we decided to leverage and extend our existing centralized databases and web framework rather than adopt a separate stand-alone file archival system.

We began by developing a database schema to store information about files not already managed in databases. The primary tables are ‘Resources’, which contains top-level information about each resource, such as type, title, and abstract, and ‘ResourceFiles’, which contains physical file details. These tables are linked via foreign key relationships to lookup tables for type, category and theme, as well as to web directory information for URL generation. Resources are also linked to search key words and to an authors table, which is actually a junction table to the GCE personnel database. Attributes were also included to support record management operations on dynamic web pages (e.g. the ‘DisplayOnWeb’ bit field in ‘Resources’, which permits entries to be taken offline), as well as file type-based and custom thumbnail images for each resource (‘IconURL’ in ‘FileTypes’ and ‘Thumbnail’ in ‘ResourceFiles’, respectively). The database schema was implemented in SQL Server 2000, and SQL views were developed for searching and displaying database contents. Additional views dynamically integrate information for general file resources with information for reprints and publications in the GCE bibliography and for images and other files in the GCE taxonomic database, supporting transparent cross-database queries and record display.
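As an illustration only, the following Python sketch uses an in-memory SQLite database to mimic the kind of schema and display view described above. It is not the SQL Server 2000 schema itself: only the table names ‘Resources’, ‘ResourceFiles’ and ‘FileTypes’ and the columns ‘DisplayOnWeb’, ‘IconURL’ and ‘Thumbnail’ come from the text. All other column names, the view name, the placement of the file type key, and the sample rows are assumptions.

```python
import sqlite3

# Simplified, hypothetical schema. Only the table names and the columns
# DisplayOnWeb, IconURL and Thumbnail are taken from the article; every
# other name (including the view name) is assumed for illustration.
SCHEMA = """
CREATE TABLE FileTypes (
    FileTypeID INTEGER PRIMARY KEY,
    Extension  TEXT NOT NULL,
    IconURL    TEXT
);
CREATE TABLE Resources (
    ResourceID   INTEGER PRIMARY KEY,
    Title        TEXT NOT NULL,
    Abstract     TEXT,
    DisplayOnWeb INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE ResourceFiles (
    FileID     INTEGER PRIMARY KEY,
    ResourceID INTEGER NOT NULL REFERENCES Resources(ResourceID),
    FileTypeID INTEGER NOT NULL REFERENCES FileTypes(FileTypeID),
    FileName   TEXT NOT NULL,
    Thumbnail  TEXT
);
-- Display view: hide offline records and fall back to the file type
-- icon when a file has no custom thumbnail.
CREATE VIEW vwResourceDisplay AS
SELECT r.ResourceID, r.Title, f.FileName,
       COALESCE(f.Thumbnail, t.IconURL) AS DisplayImage
FROM Resources r
JOIN ResourceFiles f ON f.ResourceID = r.ResourceID
JOIN FileTypes t ON t.FileTypeID = f.FileTypeID
WHERE r.DisplayOnWeb = 1;
"""


def build_demo_db():
    """Create the sketch schema and load three made-up resources."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO FileTypes VALUES (?, ?, ?)",
        [(1, "pdf", "/icons/pdf.gif"), (2, "jpg", "/icons/jpg.gif")],
    )
    conn.executemany(
        "INSERT INTO Resources VALUES (?, ?, ?, ?)",
        [(1, "Sampling protocol", None, 1),
         (2, "Site map", None, 1),
         (3, "Draft report", None, 0)],  # taken offline
    )
    conn.executemany(
        "INSERT INTO ResourceFiles VALUES (?, ?, ?, ?, ?)",
        [(1, 1, 1, "protocol.pdf", None),
         (2, 2, 2, "map.jpg", "/thumbs/map.jpg"),
         (3, 3, 1, "draft.pdf", None)],
    )
    return conn


def list_display_rows(conn):
    """Return (Title, DisplayImage) pairs for records shown on the web."""
    return conn.execute(
        "SELECT Title, DisplayImage FROM vwResourceDisplay ORDER BY ResourceID"
    ).fetchall()
```

In this sketch the offline record is filtered out by the view, and the PDF falls back to its file type icon because it has no custom thumbnail.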

After populating the database tables with information about resource files stored on the GCE file server (using a combination of data mining and manual entry approaches), we developed dynamic web applications for querying this database and displaying file details. The main search page contains a top panel with drop-down menus for selecting file type, category and theme, and text boxes for specifying search text in the title, abstract and key words, as well as author last name (fig. 1). Record display option fields are also provided for specifying sorting, abstract display and records per page. Below the search panel is a dynamically generated "browse" interface that displays the contents of the database hierarchically by type, category and theme. Clicking on any level in the hierarchy automatically executes a search for files matching the corresponding terms.
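One way a search panel like this could be translated into a database query is sketched below in Python. This is a hedged illustration, not the GCE web application code: the view name `vwResourceSearch`, the column names, and the function `build_search_query` are all hypothetical. The sketch only shows the general technique of adding one parameterized condition per filled-in field and restricting the sort column to a fixed list.

```python
def build_search_query(file_type=None, category=None, theme=None,
                       title_text=None, author_last_name=None,
                       sort_by="Title"):
    """Return (sql, params) covering only the fields the user filled in.

    The view name and all column names are assumptions for illustration.
    """
    # Sort columns cannot be bound as parameters, so restrict them to a
    # fixed list instead of inserting user text into the SQL string.
    allowed_sorts = {"Title", "Year", "AuthorLastName"}
    if sort_by not in allowed_sorts:
        raise ValueError(f"unsupported sort column: {sort_by}")

    clauses, params = [], []

    # Drop-down selections are exact matches.
    for column, value in (("FileType", file_type),
                          ("Category", category),
                          ("Theme", theme)):
        if value:
            clauses.append(f"{column} = ?")
            params.append(value)

    # Text boxes are substring or prefix matches.
    if title_text:
        clauses.append("Title LIKE ?")
        params.append(f"%{title_text}%")
    if author_last_name:
        clauses.append("AuthorLastName LIKE ?")
        params.append(f"{author_last_name}%")

    sql = "SELECT * FROM vwResourceSearch"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += f" ORDER BY {sort_by}"
    return sql, params
```

A link in the browse hierarchy could reuse the same function by passing only the type, category and theme for the level that was clicked.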

Figure 1. GCE document and imagery archive search and browse interface

Search results are displayed in a summary table, with results grouped by category and theme (fig. 2). Complete titles are displayed along with contributor name, year, and abstract. Display of abstracts can be controlled using options in the search panel and also toggled by pressing the "Abstracts" or "Hide Abstracts" button at the top of the page.

Clicking on the file icon or thumbnail image downloads the file directly, as indicated by the link help text (displayed by hovering over the image). Clicking on the title opens a document details page that includes all available information on the file, including title, abstract, contributors, file date and version, download link and file size (fig. 3). The corresponding virtual directory hierarchy is also listed in the "Archive" field, with each entry hyperlinked to execute a query for documents matching the respective type, category and theme. Corresponding breadcrumb navigation links are also provided for parity with the rest of the GCE web site. A citation is provided for every resource, which is either drawn from the bibliographic database (for reprints and presentation files) or generated from authorship, origination date and title information in the resource database or taxonomic database (for non-bibliographic entries).
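The citation logic described above can be sketched in a few lines of Python. The citation layout, the function name `format_citation`, and the "Anonymous" fallback are assumptions made for this example; the actual GCE citation format is not specified here.

```python
def format_citation(authors, year, title, stored_citation=None):
    """Return a citation string for an archive entry.

    If a stored citation exists (e.g. for a reprint already held in a
    bibliographic database) it is used as-is; otherwise one is built
    from authorship, year and title. The layout is hypothetical.
    """
    if stored_citation:
        return stored_citation

    if not authors:
        author_text = "Anonymous"
    elif len(authors) == 1:
        author_text = authors[0]
    else:
        author_text = ", ".join(authors[:-1]) + " and " + authors[-1]

    # Strip trailing periods so initials or titles ending in "." do not
    # produce a doubled period.
    return f"{author_text.rstrip('.')}. {year}. {title.rstrip('.')}."
```

For example, `format_citation(["Doe, J.", "Roe, R."], 2007, "Marsh elevation map")` yields `Doe, J. and Roe, R. 2007. Marsh elevation map.`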

Figure 2. GCE document and imagery archive search results page

An identical set of web applications is also available on the private GCE web site for project participants. These applications support searching and browsing for all resources in the public archive as well as many additional files that are not publicly available, such as restricted-access reprints, project governance information, unreleased presentations from project meetings, and confidential personnel information. The distinction between public and private resources is controlled using the "PublicAccess" bit field in the "Resources" table, and reinforced using appropriate web directory access permissions on the server. GCE participants can use web forms on the private site to archive new files and update prior entries, with access control enforced based on project role and login so that only the original contributor, a co-author, or site IM can revise prior archive entries. JPEG thumbnail images are generated automatically on upload for supported image file types using a server-side thumbnail generation component (ASPThumb).
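The access rules in the preceding paragraph can be expressed compactly, as in the Python sketch below. It is a simplified model under stated assumptions: the `ArchiveEntry` class, the role label `"IM"`, and both function names are hypothetical, and the real system also relies on web directory permissions that are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveEntry:
    """Minimal stand-in for one archive record (hypothetical fields)."""
    title: str
    contributor: str
    coauthors: set = field(default_factory=set)
    public_access: bool = False  # mirrors the "PublicAccess" bit field


def can_revise(user, role, entry):
    """Only the original contributor, a co-author, or site IM may revise."""
    return (role == "IM"
            or user == entry.contributor
            or user in entry.coauthors)


def visible_entries(entries, logged_in):
    """Public visitors see public entries; logged-in participants see all."""
    return [e for e in entries if logged_in or e.public_access]
```

Keeping the revision rule in a single function means the upload form and the update form can apply exactly the same check.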

Figure 3. GCE document and imagery archive resource details page

The new GCE document and imagery archive provides many important benefits to web visitors and project participants. First and foremost, all public and private file-based GCE resources can now be archived, discovered, and accessed using an integrated web-based interface. Stable URLs and full citations are also provided for all resources to support external hyperlinks and appropriate attribution for GCE content providers. Thumbnail images are also provided for all imagery, including maps, site and organism photos, and logos, to permit visual browsing for items of interest.

This archive also simplifies file management and web content management for IM staff. GCE personnel can now archive files on their own, and the dynamic cross-referencing of bibliography reprints and publications, as well as species list photographs, alleviates the burden of maintaining information about these resources in multiple databases and web pages. File versioning is also handled automatically, with date/time stamps appended to file names on upload to prevent overwriting of prior versions. URLs for general or specific archive queries or resources can also be included in web navigation menus and page links, providing direct access to frequently requested content. Additionally, recent additions to the archive are automatically listed on the dynamic GCE news page, ensuring that visitors are aware of new uploads and updates of interest.
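The file versioning approach mentioned above, appending a date/time stamp to each uploaded file name, might look like the following Python sketch. The stamp format and the function name `versioned_name` are assumptions; the exact naming pattern used on the GCE server is not given in this article.

```python
from datetime import datetime
from pathlib import PurePosixPath


def versioned_name(filename, uploaded_at):
    """Insert an upload timestamp before the file extension.

    The YYYYMMDD_HHMMSS stamp format is an assumed convention.
    """
    path = PurePosixPath(filename)
    stamp = uploaded_at.strftime("%Y%m%d_%H%M%S")
    return f"{path.stem}_{stamp}{path.suffix}"
```

With this scheme, uploading `marsh_map.jpg` at 09:30:05 on 14 March 2008 would store the file as `marsh_map_20080314_093005.jpg`, so a later upload of the same file name cannot overwrite it.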