Using PASTA Audit Web Services

Issue: Fall 2013

John Porter (VCR)

One way to motivate investigators to share their data is to show them that shared data is serving a scientific purpose. No one wants to spend the time and effort to produce data resources that are never accessed (a write-only dataset). Additionally, some investigators want to know who is accessing their data. One way to meet these information needs is to publish periodic reports on downloads and to distribute these reports to the contacts or authors for each dataset.

The PASTA data portal (https://portal.lternet.edu) provides access to Audit services under its “Tools” menu. For example, the “Data Package Access Report” allows searches for information regarding downloads of specific components of data packages (entire packages, metadata, quality reports and specific entities) filtered by dates, users and user groups. As such, these reports are a valuable tool if you wish to assess resource usage for annual reports or other purposes. However, the pages produced are not ideal for showing investigators how the data they provided are being used. Issues for investigators include:

  1. you need to log in to view audit pages, and many investigators don’t remember their LTERnet passwords;
  2. audit reports need to be specifically requested, so they can't be automatically provided on a periodic basis;
  3. listings tend to be in a lengthy log format, reporting the date and time of each individual download of a resource; and
  4. although you can search for all the downloads within a given scope (e.g., knb-lter-vcr), there is no mechanism for retrieving only packages related to a particular investigator.  Therefore, multiple searches are required to retrieve all of the packages related to a researcher.

Fortunately, all of these issues can be addressed by using the underlying web services that are used “behind the scenes” in the web portal.

PASTA supports a wide array of audit web services (see: https://pasta.lternet.edu/audit/docs/api for full documentation) that allow programs, as well as users, to access the information on PASTA usage.  Invoking a web service returns an XML file containing the requested information.  Programs can easily parse the returned information to create individualized summaries that can be shared with investigators.  
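As a rough illustration of what invoking one of these services looks like from a program, the sketch below only builds a query URL; it does not contact PASTA. The base address is inferred from the documentation URL above, and the service name ("report") and filter names (serviceMethod, fromTime, toTime) are assumptions to be checked against the API documentation.

```python
from urllib.parse import urlencode

# Assumed base address, inferred from the documentation URL in the text.
AUDIT_BASE = "https://pasta.lternet.edu/audit"


def audit_url(service, **filters):
    """Build the URL for an audit web service call.

    `service` and the filter names are supplied by the caller; the ones used
    in the example below are assumptions, not confirmed API parameters.
    """
    # Drop filters that were left unset, then percent-encode the rest.
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = f"{AUDIT_BASE}/{service.strip('/')}"
    return f"{url}?{query}" if query else url


# Hypothetical query: entity downloads during calendar year 2013.
url = audit_url(
    "report",
    serviceMethod="readDataEntity",
    fromTime="2013-01-01T00:00:00",
    toTime="2013-12-31T23:59:59",
)
print(url)
```

Keeping URL construction in one small function means the filter names can be corrected in a single place once they have been verified against the documentation.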

There are some “tricks of the trade” when writing programs to access PASTA.  The examples given here are from a Python program written to provide reports to VCR/LTER investigators, which is available at: https://svn.lternet.edu/websvn/listing.php?repname=VCR&path=%2Ftrunk%2FPASTAsummary

  • PASTA Basics – You need to understand the basics of how PASTA identifies particular resources, in particular that each package has a scope (e.g., knb-lter-vcr), an identifier (e.g., 25) and a revision (e.g., 29) – typically strung together with slashes, so that knb-lter-vcr/25/29 specifies a unique package. As discussed below, entities (data files) associated with a particular package have resourceIds that can be appended to the packageId to refer to an individual entity – but these are relatively cryptic and not something you’d ever want to type in yourself.
  • Authentication – Just as the Portal requires a login to access audit reports, so do the web services.  Most programming languages support access to web services using variants of the CURL library.  Rather than prompting for credentials on each request, a program can generate a “userData” string that encodes the username and password for HTTP Basic authentication (here using Python):

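import base64

uName = 'uid=VCR,o=LTER,dc=ecoinformatics,dc=org'
pWord = 'myPassword'
# Base64-encode "user:password"; b64encode adds no line breaks, so nothing needs stripping
userData = "Basic " + base64.b64encode((uName + ":" + pWord).encode("utf-8")).decode("ascii")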

The userData string can be used to add an “Authorization” header to a request (here with Python's counterpart to CURL: urllib2 in Python 2, urllib.request in Python 3):

req.add_header('Authorization', userData)
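Putting the two fragments above together, a minimal Python 3 sketch of an authenticated request might look like the following. The helper names are hypothetical, the URL is simply the documentation address from this article, and the request is built but never sent.

```python
import base64
import urllib.request


def basic_auth_header(user_dn, password):
    """Return the value of an HTTP Basic 'Authorization' header."""
    raw = f"{user_dn}:{password}".encode("utf-8")
    # b64encode returns bytes without line breaks; decode to a header-safe str.
    return "Basic " + base64.b64encode(raw).decode("ascii")


def authenticated_request(url, user_dn, password):
    """Build (but do not send) a request carrying the Authorization header."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(user_dn, password))
    return req


req = authenticated_request(
    "https://pasta.lternet.edu/audit/docs/api",
    "uid=VCR,o=LTER,dc=ecoinformatics,dc=org",
    "myPassword",
)
# urllib.request.urlopen(req) would send the request; it is not called here.
print(req.get_header("Authorization")[:6])
```

Note that Base64 is an encoding, not encryption: anyone who sees the header can recover the password, so requests carrying it should go over https.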

  • Selecting Resources – Each resource in PASTA has a unique identifier (resourceId) that must be used to access it. Thus, accessing audit records for a given data entity (i.e., data file) requires that you first use the normal PASTA search services to find the appropriate entity. For example, http://pasta.lternet.edu/package/data/eml/knb-lter-vcr/25/29 lists the resourceIds for the two entities associated with that package: ed6cf6c6df81ce0de14caf429aef63ef and c5b325e8182f30350782fb06be53be7c. These can then be used to access the relevant audit log entries for those entities.
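The identifier scheme described in the list above (scope, identifier and revision, with an entity's resourceId appended) can be sketched in a few lines of Python. The function names are hypothetical, and the assumption that an entity's address is the package listing URL plus the entity's resourceId follows the description above rather than the API documentation.

```python
from typing import NamedTuple

# Base of the example listing URL given in the text.
PACKAGE_BASE = "http://pasta.lternet.edu/package/data/eml"


class PackageId(NamedTuple):
    scope: str
    identifier: int
    revision: int


def parse_package_id(text):
    """Split 'scope/identifier/revision' into its three parts."""
    parts = text.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected scope/identifier/revision, got {text!r}")
    scope, identifier, revision = parts
    return PackageId(scope, int(identifier), int(revision))


def entity_list_url(pkg):
    """URL that lists the entity resourceIds for one package revision."""
    return f"{PACKAGE_BASE}/{pkg.scope}/{pkg.identifier}/{pkg.revision}"


def entity_url(pkg, entity_id):
    """Assumed address of one entity: the listing URL plus its resourceId."""
    return f"{entity_list_url(pkg)}/{entity_id}"


pkg = parse_package_id("knb-lter-vcr/25/29")
print(entity_list_url(pkg))
print(entity_url(pkg, "ed6cf6c6df81ce0de14caf429aef63ef"))
```

Generating these strings in code is the practical answer to the cryptic entity identifiers: the program retrieves them from the listing and never requires anyone to type them.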

As noted above, PASTA returns results as easily parsed XML. There is such a wide variety of XML processing systems that I won’t go into the details of how that information is extracted, but anything, including a simple string search, can be used.
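For readers who want a starting point, here is a minimal sketch using Python's standard xml.etree.ElementTree module to tally downloads. The sample document is made up for illustration, and its element names (auditReport, auditRecord, resourceId, user) are assumptions; substitute whatever the audit service actually returns.

```python
import xml.etree.ElementTree as ET
from collections import Counter

BASE = "http://pasta.lternet.edu/package/data/eml/knb-lter-vcr/25/29"

# Made-up sample records; element names are assumptions, not the real schema.
SAMPLE_REPORT = f"""\
<auditReport>
  <auditRecord>
    <entryTime>2013-09-01T10:15:00</entryTime>
    <resourceId>{BASE}/ed6cf6c6df81ce0de14caf429aef63ef</resourceId>
    <user>public</user>
  </auditRecord>
  <auditRecord>
    <entryTime>2013-09-03T08:02:00</entryTime>
    <resourceId>{BASE}/ed6cf6c6df81ce0de14caf429aef63ef</resourceId>
    <user>uid=VCR,o=LTER,dc=ecoinformatics,dc=org</user>
  </auditRecord>
  <auditRecord>
    <entryTime>2013-09-04T16:40:00</entryTime>
    <resourceId>{BASE}/c5b325e8182f30350782fb06be53be7c</resourceId>
    <user>public</user>
  </auditRecord>
</auditReport>
"""


def tally(xml_text, field="resourceId"):
    """Count audit records grouped by the text of one child element."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for record in root.iter("auditRecord"):
        value = (record.findtext(field) or "").strip()
        if value:  # skip records where the field is missing or empty
            counts[value] += 1
    return counts


by_resource = tally(SAMPLE_REPORT)
by_user = tally(SAMPLE_REPORT, field="user")
for resource, n in by_resource.most_common():
    print(n, resource)
```

Collapsing the per-download log into counts like these is what turns a lengthy audit listing into something an investigator can read at a glance.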

In the sample program, XML is used both for retrieving data and for output.  The PASTAsummary.py program burrows down into the package structure and even the original EML metadata to extract the title of the dataset, its list of contacts, and the entities it contains.  It then queries the PASTA audit services to produce an XML structure containing the summaries.  That XML can then be converted using a stylesheet into a customized HTML display (see figure) that can be saved or emailed.

Figure: Sample output from the PASTAsummary.py program
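The output step described above can be sketched as follows. This is not the code from PASTAsummary.py: the element and attribute names (packageSummary, contact, entity, downloads) are hypothetical, as are the title and contact values, and the real program's output format may differ.

```python
import xml.etree.ElementTree as ET


def build_summary(package_id, title, contacts, entity_counts):
    """Return an XML string summarizing downloads for one data package.

    All element and attribute names here are illustrative assumptions.
    """
    root = ET.Element("packageSummary", {"packageId": package_id})
    ET.SubElement(root, "title").text = title
    for name in contacts:
        ET.SubElement(root, "contact").text = name
    for entity_id, downloads in entity_counts.items():
        entity = ET.SubElement(root, "entity", {"downloads": str(downloads)})
        entity.text = entity_id
    return ET.tostring(root, encoding="unicode")


# Placeholder title, contact and counts, for illustration only.
summary = build_summary(
    "knb-lter-vcr/25/29",
    "Hypothetical dataset title",
    ["A. Investigator"],
    {
        "ed6cf6c6df81ce0de14caf429aef63ef": 2,
        "c5b325e8182f30350782fb06be53be7c": 1,
    },
)
print(summary)
```

Python's standard library has no XSLT processor, so applying a stylesheet to this XML would need a separate tool or a third-party library.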

The utility of the web services in PASTA demonstrates that this approach could be applied more widely to LTER Network systems, exposing capabilities without restricting the formats of outputs.