
Automating Data Harvests with the GCE Data Toolbox

Issue: Fall 2013

Wade Sheldon (GCE), John Chamblee (CWT), and Richard Cary (CWT)

As described in the Spring 2013 issue of Databits, funding from the ARRA award to the LTER Network (Chamblee, 2013a) and an NSF SI2 grant to Tony Fountain and colleagues (Gries, 2013) allowed us to make substantial improvements to both the functionality and usability of the GCE Data Toolbox for MATLAB software in 2012-2013. Accompanying funding for user training and support also allowed us to introduce the software to more potential users, and to work directly with new and existing users to help them take full advantage of it (Chamblee, 2013a; Henshaw and Gries, 2013; Peterson, 2013). Beyond improving the software itself, this intensive effort allowed us to develop critical user support resources (https://gce-svn.marsci.uga.edu/trac/GCE_Toolbox/wiki/Support). We also established an email list and wiki pages to encourage ongoing peer support and collaboration. The same effort provided the momentum needed to remove the remaining GCE-specific code from the main distribution and to open the Subversion repository to public access. This completed a 12-year transition of the toolbox from a proprietary GCE-LTER software tool to an open-source community software package.

Information Managers can use the GCE Data Toolbox in many ways to process, analyze, and distribute the data their LTER sites collect. The use case that has resonated most at training events, however, is automating sensor data harvesting and quality control. The toolbox is well suited to this task because it combines two things: prebuilt import filters and editable metadata templates. The import filters cover many common data logger formats (e.g., Campbell Scientific Instruments, Sea-Bird Electronics) and data services (e.g., NOAA HADS, NOAA NCDC, USGS NWIS, Data Turbine). The metadata templates contain attribute descriptors and QA/QC rules. By applying a metadata template on import, users can parse, document, and quality control raw data in a single workflow step (figure 1).
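The toolbox does this in MATLAB, and its real import filters and template format are not reproduced here. The Python sketch below is only a rough illustration of the idea. It applies a hypothetical metadata template (units, descriptions, and rule-based flags per column) while parsing a small delimited file, so that documentation and QA/QC flagging happen in the same pass as parsing. Several details are assumptions made for this example:

- the template structure;
- the flag codes "I" (invalid) and "Q" (questionable);
- the function name `import_with_template`.

```python
import csv
import io

# Hypothetical metadata template (NOT the GCE Data Toolbox format): attribute
# descriptors plus QA/QC rules. Each rule is (test, flag); the first rule whose
# test returns True assigns its flag to the value.
TEMPLATE = {
    "Temp_Air": {
        "units": "degC",
        "description": "Air temperature",
        "rules": [
            (lambda x: x < -40 or x > 50, "I"),  # invalid: outside sensor range
            (lambda x: x > 40, "Q"),             # questionable: unusually high
        ],
    },
    "RH": {
        "units": "%",
        "description": "Relative humidity",
        "rules": [(lambda x: x < 0 or x > 100, "I")],  # invalid: impossible value
    },
}


def import_with_template(text, template, time_col="Timestamp"):
    """Parse delimited text, attach metadata, and flag values in one pass."""
    reader = csv.DictReader(io.StringIO(text))
    data = {time_col: [], "columns": {}}
    for name, spec in template.items():
        meta = {k: v for k, v in spec.items() if k != "rules"}
        data["columns"][name] = {"meta": meta, "values": [], "flags": []}

    for row in reader:
        data[time_col].append(row[time_col])
        for name, spec in template.items():
            value = float(row[name])
            # Empty string means "no flag"; otherwise the first matching rule wins.
            flag = next((code for test, code in spec["rules"] if test(value)), "")
            data["columns"][name]["values"].append(value)
            data["columns"][name]["flags"].append(flag)
    return data


raw = (
    "Timestamp,Temp_Air,RH\n"
    "2013-09-01 00:00,21.5,88\n"
    "2013-09-01 00:15,45.0,101\n"
)
ds = import_with_template(raw, TEMPLATE)
print(ds["columns"]["Temp_Air"]["flags"])  # ['', 'Q']
print(ds["columns"]["RH"]["flags"])        # ['', 'I']
```

Keeping the rules in the template rather than in the parsing code is what allows one template to be reused across files from the same logger.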

GCE Data Toolbox workflows

Figure 1. Coupling import filters with metadata templates containing pre-defined QA/QC rules allows the GCE Data Toolbox to accomplish three major stages of the data management cycle in one step.

Metadata content and QA/QC rules can be defined, copied, imported, and even synchronized across multiple templates using form-based GUI applications. This greatly simplifies the management and re-use of this critical and time-intensive information. Tools and demo workflows are provided for generating basic data distribution websites, complete with indexed data summary pages, plots, and downloadable data files (e.g., see http://gce-lter.marsci.uga.edu/public/file_pickup/toolbox_demo/data/index.html). For production use, these pages can easily be “skinned” with XSLT and CSS to match the appearance of an LTER site’s web pages. Once a data harvesting workflow is configured, it can be executed on a timed basis using MATLAB’s built-in timer objects. Each run automatically appends newly acquired data to existing data sets, building long-term time series (figure 2). The GCE Data Toolbox therefore provides a comprehensive yet practical out-of-the-box solution for many real-time sensor data harvesting scenarios (Chamblee, 2013b). Six LTER sites have already implemented harvesters using this software (GCE, CWT, AND, NWT, NTL, and HBR), and several other sites plan to do so. We are currently compiling a set of documented use cases for an upcoming LTER Network News article to encourage further adoption.
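The toolbox's harvest manager relies on MATLAB timer objects, and the Python sketch below is only an analogy under stated assumptions. Both names in it are hypothetical:

- `HarvestTimer` runs a harvest function at a fixed interval on a background thread, using `threading.Event.wait` as the timer.
- `append_new_records` appends only records newer than the last stored timestamp. This is one simple way to grow a long-term time series without duplicating overlapping downloads.

Records are assumed to be `(timestamp, value)` tuples.

```python
import threading
from datetime import datetime


def append_new_records(existing, incoming):
    """Return existing records plus any incoming records newer than the last one.

    Records are (timestamp, value) tuples; `existing` is assumed sorted by time.
    """
    last = existing[-1][0] if existing else None
    fresh = {}
    for ts, value in incoming:
        if last is None or ts > last:
            fresh[ts] = value  # a repeated timestamp keeps the later value
    return existing + sorted(fresh.items())


class HarvestTimer:
    """Call `harvest()` every `interval` seconds on a background thread."""

    def __init__(self, interval, harvest):
        self.interval = interval
        self.harvest = harvest
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout (time to harvest) and True once
        # stop() has set the event, which ends the loop.
        while not self._stop.wait(self.interval):
            self.harvest()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


store = [(datetime(2013, 9, 1, 0, 0), 21.5)]
batch = [
    (datetime(2013, 9, 1, 0, 0), 21.5),   # overlaps the stored record
    (datetime(2013, 9, 1, 0, 15), 21.7),
    (datetime(2013, 9, 1, 0, 30), 21.9),
]
store = append_new_records(store, batch)
print(len(store))  # 3
```

A 15-minute schedule would then be `HarvestTimer(900, run_harvest).start()`, where `run_harvest` is a site's own (hypothetical) fetch-and-append function. Keying the merge on the last stored timestamp assumes that records arrive in time order and that older values are never revised. A harvester that must accept late corrections would need a fuller merge.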

Harvest Timers

Figure 2. Data harvesting workflows can be executed on a schedule by registering them with the GCE Data Toolbox harvest manager. Harvests can then be queried, started or stopped using the toolbox GUI application, and continue to run in the background as long as MATLAB is running.

Although the supplemental funding has now ended, the GCE and CWT LTER sites continue to collaborate on toolbox development as part of core IM activities, with input from the broader toolbox user community. Recent work includes developing a fully automated web dashboard application for monitoring the status of real-time data harvesting systems (figure 3), which emails quality reports when problems arise. It also includes completing support for registering metadata content in the Metabase Metadata Management System for archiving data in PASTA. Support for generating EML metadata directly from the toolbox is also nearly complete. Next year we will seek additional funds to continue developing the toolbox software. We will also keep looking for opportunities to lead or participate in hands-on training events that engage more potential user groups.
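The dashboard and its email reports are part of the toolbox and are not reproduced here. The Python sketch below is a hedged illustration of the kind of checks such a monitor might perform. It classifies each harvest in one of two problem states:

- "stale", when its newest record is too old;
- "qc_warning", when too large a fraction of its values is flagged.

It then builds, but does not send, an alert message with the standard library's `email.message.EmailMessage`. The station names, thresholds, status labels, function names, and addresses are all hypothetical.

```python
from datetime import datetime, timedelta
from email.message import EmailMessage


def harvest_status(last_record, flagged_fraction, now,
                   max_age=timedelta(hours=2), max_flagged=0.1):
    """Classify a harvest as 'stale', 'qc_warning', or 'ok' (hypothetical rules)."""
    if now - last_record > max_age:
        return "stale"        # no new data within the expected window
    if flagged_fraction > max_flagged:
        return "qc_warning"   # too many values flagged by QA/QC rules
    return "ok"


def build_alert(statuses, sender, recipient):
    """Build an email listing every non-ok harvest, or return None if all are ok."""
    problems = {name: s for name, s in statuses.items() if s != "ok"}
    if not problems:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Harvest problems at {len(problems)} station(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"{n}: {s}" for n, s in sorted(problems.items())))
    return msg


now = datetime(2013, 10, 1, 12, 0)
stations = {  # name: (time of newest record, fraction of values flagged)
    "met_station": (datetime(2013, 10, 1, 11, 45), 0.02),
    "creek_sonde": (datetime(2013, 10, 1, 6, 0), 0.00),
    "marsh_tower": (datetime(2013, 10, 1, 11, 50), 0.25),
}
statuses = {n: harvest_status(ts, frac, now) for n, (ts, frac) in stations.items()}
msg = build_alert(statuses, "harvester@example.org", "im-team@example.org")
print(msg["Subject"])  # Harvest problems at 2 station(s)
```

Sending the message would be a separate step, for example `smtplib.SMTP(host).send_message(msg)` against a site's own mail server.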


Harvest Dashboard

Figure 3. Automated data harvest dashboard web page generated by the GCE Data Toolbox. Multiple views are supported using XSLT stylesheets, including a view of selected variables (shown) and a view of all variables.

Literature Cited

Chamblee, J. 2013a. GCE and CWT Host Successful Workshop to Demonstrate, Improve, and Promote the Adoption of the GCE Data Toolbox for Matlab. LTER Databits - Information Management Newsletter of the Long Term Ecological Research Network, Spring, 2013 issue. (http://databits.lternet.edu/spring-2013/gce-and-cwt-host-successful-workshop-demonstrate-improve-and-promote-adoption-gce-data-t)

Chamblee, J. 2013b. Coweeta LTER Upgrades Sensor Stations by Implementing the GCE Data Toolbox for Matlab to Stream Data. LTER Databits - Information Management Newsletter of the Long Term Ecological Research Network, Spring, 2013 issue. (http://databits.lternet.edu/spring-2013/coweeta-lter-upgrades-sensor-stations-implementing-gce-data-toolbox-matlab-stream-data)

Gries, C. 2013. Integrating Open Source Data Turbine with the GCE Data Toolbox for MATLAB. LTER Databits - Information Management Newsletter of the Long Term Ecological Research Network, Spring, 2013 issue. (http://databits.lternet.edu/spring-2013/integrating-open-source-data-turbine-gce-data-toolbox-matlab)

Henshaw, D. and Gries, C. 2013. Sensor Networks Training Conducted at LNO. LTER Databits - Information Management Newsletter of the Long Term Ecological Research Network, Spring, 2013 issue. (http://databits.lternet.edu/spring-2013/sensor-networks-training-conducted-lno)

Peterson, F. 2013. My Experiences as a Participant in the Sensor Training Workshop. LTER Databits - Information Management Newsletter of the Long Term Ecological Research Network, Spring, 2013 issue. (http://databits.lternet.edu/spring-2013/my-experiences-participant-sensor-training-workshop)