My Experiences as a Participant in the Sensor Training Workshop
Fox Peterson (AND)
The sensor software tools training workshop held the week of April 22, 2013, at the LTER Network Office in Albuquerque was outstanding. I wanted to write up a brief overview of my experiences as a participant, in hopes of summarizing the workshop and getting others excited about participating in data management workshops in the future. A recurring theme throughout the workshop was "middleware": a tool that bridges raw data input and database storage in a way that provides standardized quality control and reduces the need to write redundant programs or repeat processes to consolidate data. I'll focus on that theme here, and on how I think I will use middleware in my work with the Carbon-budget group, Biomass group, Biomicrometeorology lab, and DIRT (detritus input and removal treatment) network at H.J. Andrews (AND).
On the first day of the workshop, we learned about the CUAHSI Observations Data Model (ODM) and the associated data import and visualization tools in the Hydrologic Information System. Basically, the ODM tools take raw data, assign relevant metadata, perform user-specified quality checks, and organize that combination of metadata and data in a database (we used Microsoft SQL Server). What was fantastic about this tool to me was that once the metadata and data were joined, the process never needed to be repeated unless a new sensor was added. Although I don't generally work with hydrologic data, I tested the tool on data from two sonic anemometers (one in Antarctica and one at the H.J. Andrews) and was able to reuse the metadata values shared by both anemometers while specifying only those values specific to each site. This was very convenient! Another great feature for those who do work with hydrologic data is that the CUAHSI HydroDesktop can import directly from several key stream gage websites, such as USGS sites, and put the extracted data into a database. I have searched USGS stream data before and been annoyed at having to click on the links of multiple gages, download the data by hand, and convert it into a format I can use; HydroDesktop would be a huge time saver for scientists in fields like urban aquatics.
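To make the "join metadata once, then reuse it" idea concrete, here is a minimal sketch in Python. It is not the ODM schema or any CUAHSI tool: the table layout, the metadata fields, the range limits, and the readings are all made up for illustration, and SQLite stands in for SQL Server so the example runs anywhere.

```python
import sqlite3

# Hypothetical metadata shared by both sonic anemometers (illustrative values only).
SHARED_METADATA = {
    "variable": "wind_speed",
    "units": "m/s",
    "valid_min": 0.0,
    "valid_max": 60.0,
}

# Hypothetical site-specific values layered on top of the shared ones.
SITE_METADATA = {
    "ANT": {"site_name": "Antarctica"},
    "AND": {"site_name": "H.J. Andrews"},
}


def build_metadata(site_code):
    """Combine the shared metadata with the values specific to one site."""
    merged = dict(SHARED_METADATA)
    merged.update(SITE_METADATA[site_code])
    merged["site_code"] = site_code
    return merged


def quality_flag(value, meta):
    """A simple user-specified check: missing values and out-of-range values."""
    if value is None:
        return "missing"
    if value < meta["valid_min"] or value > meta["valid_max"]:
        return "out_of_range"
    return "ok"


def load_readings(conn, site_code, readings):
    """Attach metadata and a quality flag to each reading, then store the rows."""
    meta = build_metadata(site_code)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations ("
        "site_code TEXT, site_name TEXT, variable TEXT, units TEXT, "
        "timestamp TEXT, value REAL, flag TEXT)"
    )
    rows = [
        (meta["site_code"], meta["site_name"], meta["variable"], meta["units"],
         timestamp, value, quality_flag(value, meta))
        for timestamp, value in readings
    ]
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Made-up readings: one plausible, one out of range, one missing.
    load_readings(conn, "AND", [("2013-04-22T00:00", 2.5),
                                ("2013-04-22T00:10", 75.0),
                                ("2013-04-22T00:20", None)])
    load_readings(conn, "ANT", [("2013-04-22T00:00", 12.1)])
    for row in conn.execute("SELECT site_code, timestamp, value, flag FROM observations"):
        print(row)
```

Adding a second anemometer only required a new entry in the site-specific dictionary; the shared values and the quality check were written once, which is the convenience I noticed with the real tools.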
On the second day, we were fortunate to have Matt Miller of Data Turbine come and speak. Admittedly, I am not sure my skills in "networks" were quite to the point that I fully benefited from his talk, but I was able to get the basic gist of it and play with the Data Turbine tools. Data Turbine essentially acts as a harvester between users and streaming data; moreover, the streams are sharable and controllable through the Ring Buffer Network Bus (RBNB), which offers both added connectivity and added security for data assimilation. Matt showed us how to set up a local server and download sample data from the North Temperate Lakes LTER, and then plot and quality check this data in near real-time. He was very attentive to the importance in some fields of having not only extremely fast data but also access to this data as soon as possible, and to how memory and power constraints come into play with this type of operation. I feel that I will be able to use what I learned from Matt to think about our meteorological data for the biomicrometeorology group and C budgeting on AND Watershed 1. The second afternoon's lecture was my favorite because it was about my favorite tool, MATLAB! Wade Sheldon, LTER's reigning MATLAB expert, introduced us to the GCE Data Toolbox and helped us develop data templates for our own loggers, as well as play with some existing data he had from a Campbell data logger. I could talk for quite some time about this tool, but mostly I am eager to use it for all of my own data imports, develop templates, and share these templates with Wade. I would love to become a part of the development of the GCE Data Toolbox. Additionally, some of the functions that run with the GCE Data Toolbox are extremely helpful; Wade has modified some functions, such as the MATLAB "whos", to do quality checks or locate errors in imports. This will greatly expedite data pre-processing!
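For readers who, like me, are newer to the networking side, the ring buffer idea behind the morning's Data Turbine session can be sketched in a few lines of Python. This is only a conceptual illustration, not the RBNB API: the class, the function names, and the spike threshold are all hypothetical. The point is that a fixed-size buffer keeps the newest frames available for near real-time checks while bounding memory use.

```python
from collections import deque


class RingBuffer:
    """Fixed-capacity buffer: once full, each new frame overwrites the oldest."""

    def __init__(self, capacity):
        self._frames = deque(maxlen=capacity)

    def put(self, timestamp, value):
        """Store one (timestamp, value) frame from the incoming stream."""
        self._frames.append((timestamp, value))

    def latest(self, n=1):
        """Return up to the n most recent frames, oldest first."""
        if n <= 0:
            return []
        return list(self._frames)[-n:]

    def __len__(self):
        return len(self._frames)


def spike_flags(frames, max_step):
    """Flag a frame as a spike if it jumps more than max_step from the previous one."""
    flags = []
    previous = None
    for _, value in frames:
        if previous is not None and abs(value - previous) > max_step:
            flags.append("spike")
        else:
            flags.append("ok")
        previous = value
    return flags


if __name__ == "__main__":
    buffer = RingBuffer(capacity=3)
    # Made-up water temperature readings arriving one at a time.
    for second, temperature in enumerate([10.0, 10.1, 10.2, 25.0, 10.3]):
        buffer.put(second, temperature)
    recent = buffer.latest(3)
    print(recent)
    print(spike_flags(recent, max_step=5.0))
```

The capacity is the knob that trades memory for history, which is one way to picture the memory and power constraints Matt described.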
On the third day, we began with a talk on Kepler. I will honestly say this is probably the tool that I will end up using the most of all. One great thing about Kepler is that it integrates rather seamlessly with R and can be used to generate fairly open-ended R-based workflows. I kept thinking about most of my data with Biomass, DIRT, and C-budget. This is not traditional "sensor network" data, but it is big, messy data; if we view humans collecting observations as a sort of sensor that gathers information irregularly, then this observational data fits within the "sensor network" framework. Many humans gathering data are not comfortable with tools like SQL Server or MATLAB, and prefer spreadsheets. However, if they were given the opportunity to simply make a flat file and turn on a pre-defined Kepler workflow that imports this spreadsheet into a database like SQL Server and produces meaningful output (like graphs or analyses in R), I believe they would accept it. This would be fantastic for DIRT; for example, we could standardize metadata and data storage across scientists and sites by running the same workflows. That Kepler and R are both open source is also a huge plus, as MATLAB and SQL Server can be too "spendy" for some! The afternoon on the third day was devoted to R, and that was good practice for me, because I prefer MATLAB and always need time to brush up on my R in order to stay fluent.
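The flat-file-to-database workflow I have in mind for DIRT could look something like the sketch below. It is written in plain Python rather than as a Kepler workflow, so it shows only the shape of the steps (read, standardize, store, summarize). The column names, the sample values, and the summary are hypothetical, and SQLite again stands in for SQL Server.

```python
import csv
import io
import sqlite3

# Hypothetical columns that every site's spreadsheet export would have to provide.
REQUIRED_COLUMNS = ["site", "plot", "treatment", "soil_c_pct"]


def read_flat_file(handle):
    """Read a CSV export and standardize each row; fail early on missing columns."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [
        (row["site"].strip().upper(),
         row["plot"].strip(),
         row["treatment"].strip().lower(),
         float(row["soil_c_pct"]))
        for row in reader
    ]


def load(conn, rows):
    """Store the standardized rows in a single shared table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples "
        "(site TEXT, plot TEXT, treatment TEXT, soil_c_pct REAL)"
    )
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def summarize(conn):
    """Produce a small summary: sample count and mean value per treatment."""
    query = (
        "SELECT treatment, COUNT(*), AVG(soil_c_pct) "
        "FROM samples GROUP BY treatment ORDER BY treatment"
    )
    return conn.execute(query).fetchall()


def run_workflow(handle, conn):
    """The whole pre-defined workflow: read, standardize, store, summarize."""
    load(conn, read_flat_file(handle))
    return summarize(conn)


if __name__ == "__main__":
    # Made-up spreadsheet export with inconsistent capitalization and spacing.
    sample = io.StringIO(
        "site,plot,treatment,soil_c_pct\n"
        "and,1,Control,4.0\n"
        "and,2,control ,6.0\n"
        "and,3,Double Litter,8.0\n"
    )
    for line in run_workflow(sample, sqlite3.connect(":memory:")):
        print(line)
```

The person collecting the data would only ever touch the spreadsheet; the standardization and storage steps stay the same across scientists and sites because everyone runs the same workflow.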
On the final day, we had an excellent presentation on the National Ecological Observatory Network (NEON) from Jeff Taylor. All I can say is, wow, if I was fired up about sensor networks at the site level, I'm even more fired up about site networks at the synoptic level. NEON is a huge dream of standardization, homogenization, and synthesis, with the ability to produce some very powerful results that may be able to affect non-science consumers. As a member of the C-budget group and originally a forester, I believe that the arguments something like NEON could put forth for important topics such as climate change and forest restoration would absolutely make a huge impact for many people. I had the good fortune to talk to Jeff afterwards and share emails, and his soil scientists and I are going to speak about the standard ways of sampling soil stores and effluxes in the footprint of eddy covariance towers. This will be super because I will get to share with them the challenges of working in complex terrain, and they will be able to help me establish a framework of what is the "norm" in non-complex terrain. I can't wait to be more involved with information management!
I want to give a HUGE thanks to Don, Corinna, and the many trainers. I am also fortunate to have met many new "nerdy friends" who I know I will be drawing on in the future. This is always an added benefit of a well-run workshop: one comes away not only with new tools, but with a big, extensible, friendly "support team". Learning about different types of middleware that automate metadata assignment and quality control makes me more and more comfortable with larger and faster data, and I am eager to see how data management will evolve in the next few years with the availability of these fantastic tools. THANKS EVERYONE!