
An Incoming Computer Scientist’s View on Unifying Standards and Procedures in the LTER Community

Issue: Fall 2011

Irbis M. Gallegos (Visiting Assistant Professor, University of Texas at El Paso)

I recently attended the LTER EIM’11 Conference in Santa Barbara, CA. Even though I am a computer scientist with multidisciplinary research experience in developing cyberinfrastructure for geoscientists and environmental scientists, I must admit that I was excited about attending the conference. I knew I was about to meet top researchers from the environmental science community and, beyond that, I was about to present what I thought would be the silver bullet for providing data quality assurance for scientific sensor data.

During the poster session at the conference reception, as I walked around, meeting great people and learning about the fascinating projects developed within the different LTER communities, I discovered other work that overlapped to a great extent with my research effort. At first I felt anxious, thinking that I had spent the last three years of my research career working on something that had already been done, but then I realized that this wasn’t necessarily true. After stopping for a moment to think about the situation, I realized that even though some of the work was similar to mine, my work was really the next step toward where the LTER data quality communities could be moving in the near future. Also, the fact that my research was so similar to that of others, even though it had been conducted independently of LTER efforts, assured me that I was in fact making progress and going in the right direction, inasmuch as my work could contribute to the LTER community’s data quality goals.

After I had convinced myself that my work was indeed novel, I attended the conference’s birds-of-a-feather session on “automating data processing and quality control using workflow software.” I heard great talks about efforts to automate data quality processes through workflow systems, a 10-year research effort to automate data quality checking using off-the-shelf back-end statistical analysis software, and the most recent effort on detecting environmental events in near-real time. The last two were of particular interest to me because they relate to the topic of my dissertation work.

At that point, I asked myself, “How is it that I spent three years working on a cyberinfrastructure approach to improve data quality processes in scientific sensor data and did not know about these other efforts?” I knew I had done a literature review, that I had talked to my environmental scientist colleagues about current efforts, and that I had even browsed through the LTER community website as I was looking for job opportunities. In addition, it seemed that most of the data quality research used expert processes similar to those in my work to detect anomalies in the data. Yet the expert knowledge associated with such processes did not surface in my literature review. I realized that the LTER community has an opportunity to improve the ways in which emerging ideas are disseminated.

The remaining text is based on my personal experience as an outsider to the community. In my opinion, one of the main challenges for the LTER community is to establish the means to effectively communicate the accomplishments and research efforts of its members. For example, at my university, I collaborate with scientists conducting research at the Jornada-LTER site. However, I am unaware of other research efforts at this site. My only contact with the LTER community is through the scientists with whom I collaborate, and other efforts are hard to discover. As a result, it is possible that other research efforts on data quality are being conducted at the same research site without others being aware of them. Thus, scientists lose the opportunity to collaborate and share information.

In addition, it appears that the LTER sites are further subdivided into smaller, individual communities, each with its own data procedures. For instance, my research work has a component that involves identifying data quality processes that are common to the scientific communities that collect data through sensors. As soon as I started working on this component, I recognized that even the communities sensing the same type of data, e.g., eddy covariance data, had different data quality processes. The data analysis and data verification infrastructures are different as well. Thus, attaining interoperability and sharing data procedures between sites can be extremely difficult.

Another challenge is how to document and share the data standards and processes that should be common to the different LTER research sites. For example, as I was working on my research, I came across a well-known issue in the data quality community: having to adapt to the formats of different data loggers. Other approaches face the same issue. Because each logger writes its data in its own format, data files need to be parsed before they can be analyzed. This challenge has been addressed through the development of different “parsers” that read and interpret the data files. However, such an approach is not scalable because a new parser has to be developed for every type of data file that needs to be verified. The challenge could be addressed if the community defined a data standard for such data files. The standard, if endorsed by the community, would provide the means for scientific instrumentation vendors to develop instruments that generate consistent data formats. A similar need is evident with data quality standards, e.g., for measurement units, which could facilitate the specification of data properties of interest.
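
To make the parser scalability problem concrete, here is a minimal Python sketch. It is an illustration only: the two logger formats, the field names, and every function name are hypothetical and do not describe any actual LTER tool or vendor format. Each format needs its own parser before the readings can be checked in a common form, and a file from any logger not yet registered cannot be read at all.

```python
import csv
import io
from typing import Callable, Dict, List

# One reading, normalized to field name -> numeric value.
Record = Dict[str, float]


def parse_csv_logger(text: str) -> List[Record]:
    """Hypothetical format A: comma-separated values with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: float(value) for name, value in row.items()} for row in reader]


def parse_keyvalue_logger(text: str) -> List[Record]:
    """Hypothetical format B: one reading per line, space-separated name=value pairs."""
    records = []
    for line in text.strip().splitlines():
        pairs = (item.split("=", 1) for item in line.split())
        records.append({name: float(value) for name, value in pairs})
    return records


# Every new logger format means writing and registering another parser here.
PARSERS: Dict[str, Callable[[str], List[Record]]] = {
    "csv_logger": parse_csv_logger,
    "keyvalue_logger": parse_keyvalue_logger,
}


def load_readings(text: str, logger_format: str) -> List[Record]:
    """Dispatch to the parser for the given format, or fail if none exists."""
    try:
        parser = PARSERS[logger_format]
    except KeyError:
        raise ValueError(f"no parser for logger format {logger_format!r}") from None
    return parser(text)
```

Under these assumptions, `load_readings("temp_c,rh\n21.5,40\n", "csv_logger")` and `load_readings("temp_c=21.5 rh=40\n", "keyvalue_logger")` yield the same normalized records, while a third format raises an error until someone writes a parser for it. With a community-endorsed file standard, a single parser would suffice.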

The good news is that the LTER community has been building the sense of community needed for scientists to have confidence in the work of their colleagues. I find it promising that, contrary to the common belief that scientists are reluctant to openly share collected and derived datasets, scientists are willing to share the processes and tools used to create datasets. However, the fact that different efforts, e.g., those related to data quality processes, overlap with each other (even though they are developed independently at different institutions) highlights the need to establish procedures and processes for documenting and sharing efforts within the community, as well as to establish standards.