The IMC meets the PASTA Challenge
John Chamblee (CWT), Margaret O’Brien (SBC)
John and Margaret are Co-Chairs of the LTER Network Information Management Committee
In January 2013, the LTER Network Office launched a production version of the new LTER data repository based on a framework and application suite known as PASTA (Provenance Aware Synthesis and Tracking Architecture). Its launch was a significant achievement by the LNO developers, and sites are on track to have their data in PASTA by the end of this year. This is excellent news, but there are still issues to resolve as we move forward. Here, we review some recent accomplishments and challenges associated with the ongoing rollout and implementation of PASTA.
As of this writing, there are 872 data packages from 12 sites loaded into the PASTA framework. Many sites are putting recent supplement funds to use adapting their EML-generating systems to meet PASTA's structural and data requirements. Others are thinking through their overall approaches to PASTA submission. In the process, many sites are re-evaluating their inventories, re-designing and/or integrating time series, and generally improving the quality of the data going into PASTA. These efforts will all contribute to greater overall availability of LTER data.
The accelerated timeline for PASTA’s development also brings some significant challenges for both policy and site-level implementation. The adoption of PASTA as the basis of the LTER data cataloging system means that we will switch from the current metadata-only catalog to one designed for both data and metadata. Overall, this step will improve availability and accessibility because PASTA provides DOI-based identification for data packages, has the capacity for synthesis data provenance tracking, and more fully exploits the Ecological Metadata Language schema.
Some of the issues that involve the IMC are:
- Now that PASTA is available and we are loading data into it, how will we (scientists and IMs) make the most of it? How will workflows be developed, advertised, and used by our synthesis working groups? This will be a major topic at the upcoming IMC meeting in Fairbanks. Additionally, an IMC workshop will design documented workflows and provide practical, real-world experience that will inform best practices for EML metadata and PASTA development (Sheldon et al., 2013).
- NSF has stressed that LTER should have a “one-stop shop” for all data. The IMC will provide feedback to the NIS web portal development process to ensure that LTER data are available to scientists in a way that makes them easily discoverable and accessible. In 2013, an IMC working group will enhance the LTER Controlled Vocabulary to further enable automatically enhanced searches of the catalog (Porter et al., 2013).
- Sites have always handled data that are sensitive or have restricted access (Network "Type II", http://www.lternet.edu/policies/data-access). In some cases, data distribution may even be prohibited by legally binding use or redistribution agreements (although it may be beneficial to post metadata). The PASTA framework requires that a data object be attached to every metadata record, and EML is fully capable of defining highly specific access rules. PASTA will be the basis of our one-stop catalog, but site-compatible practices for handling all levels of access are still being developed. The IMC will outline a full set of sites’ catalog use cases and suggest potential solutions for handling all forms of sensitive, restricted, or provisional data at the Network level.
- The current policy of collecting user information at the time of download continues to be a major topic of debate within the LTER Network, and several committees have outlined the costs and benefits of tracking data usage. The practice of collecting data usage information is linked to our data access policy, and so falls within the purview of the Science Council. Should changes in policy be enacted, the IMC will then determine their implications for our generation of data packages and for the operational definition of "restricted data" (see previous point). We will also consider the implications of practices we develop for LTER as a DataONE node, since the access control policies we set will affect the ways in which DataONE displays our data.
We continue to operate in a period of rapid change. The good news is that we are making rapid progress. In many ways this is a watershed moment in LTER, when we are poised to capitalize on our long-term investments. We have focused here on the challenges, but the fact remains that we should all be very proud of PASTA’s existence as a production framework, and that the IMC has pulled together to do what it takes to make it our "one-stop shop" for LTER data. As we move forward, the results of our collective effort will pay dividends in terms of both data availability and network-wide collaboration for years to come.