Using Python for Web Services and GIS Analysis

Issue: Spring 2013

John Porter (VCR)

Computer languages are a bit like cups of coffee - there is always another one not too far away! They also have their day. FORTRAN once ruled the roost, but is now used only for specialized applications. Similarly, C, PASCAL, Java and C++ have all had their time at the "top" as the "hot" language, and continue to be widely used today. So why do we need Python, a scripting language named not after a snake, but a comedy troupe (Monty Python)? And why is it such a "hot" language today? To answer those questions, I'll talk a bit about the characteristics of Python and give some examples of how it is used at the Virginia Coast Reserve LTER.

Python is an object-based, modular, scripting language with simple syntax. Taking those in order: every value in Python is an object, and comes with methods/functions that can be used to manipulate it. Beyond its basic functionality, Python uses "modules" that can be imported to support specific features (e.g., dates, database interfaces), thus reducing the "bloat" of trying to load all possible functions at one time. Python is a scripting language because one of its best uses is short programs in applications where blazing speed is not an issue. Python is interpreted rather than compiled, which makes it slower to run, but easier to write and debug. Finally, gone are the declaration statements, curly braces, and semicolons beloved by Java and C programmers. Instead, Python uses white space for organization: line ends serve as statement terminators and indentation indicates the scope of loops. A major goal of Python's design was to have programs that are easy to read, as well as to write... which is an important consideration in LTER, where long-term durability of software is often an issue.
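As a minimal illustration of these points (the site name and years here are invented), a few lines suffice to show importing a module, calling methods attached to objects, and using indentation to mark a loop:

# Import only the module we need, rather than loading everything
from datetime import date

# Every value is an object with methods attached to it
site = "hog island"
print site.title()                        # string method: prints "Hog Island"

# Indentation alone marks the loop body - no braces or semicolons
for year in range(1989, 1992):
    print date(year, 1, 1).isoformat()    # prints 1989-01-01, 1990-01-01, 1991-01-01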

One of the main reasons for using Python is that support for it is increasingly being built into 3rd party applications. For example, ESRI's ArcGIS software now supports the "ArcPy" library of functions, allowing you to run any of the tools built in to ArcGIS directly in Python. This allows automation of frequently performed tasks without requiring access to the ArcGIS graphical user interface. An additional benefit is that running a script in Python that uses ArcGIS functions is much faster than running those same functions in ArcGIS itself, which is a massive program. For example, one application I wrote that required 45 minutes to run as an ArcGIS Model, ran in only 22 minutes when using Python, presumably because much of the "overhead" associated with the ArcGIS interface was eliminated. Running ArcGIS tools directly in Python also aids with looping, because you can use the string manipulation functions in Python to create lists of files to be processed, then use a FOR loop to process each of them.
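A sketch of what such a loop can look like, assuming ArcGIS 10.x with the arcpy module (the workspace path, wildcard, and buffer distance are all hypothetical):

# Loop over feature classes with ArcPy (ArcGIS 10.x).
# The workspace, wildcard, and buffer distance are made-up examples.
import arcpy

arcpy.env.workspace = "C:/data/vcr"          # hypothetical workspace

# Build a list of feature classes to process, then loop over each of them
for fc in arcpy.ListFeatureClasses("marsh_*"):
    outFc = fc.replace("marsh_", "buffer_")  # string manipulation names the output
    arcpy.Buffer_analysis(fc, outFc, "100 Meters")
    print "Buffered " + fc + " -> " + outFc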

One nice feature of ArcGIS is that it will create basic Python programs for you. The ModelBuilder tool allows you to create "models" using a graphical interface (essentially a box-and-arrow diagram of a workflow). Once the model is complete, you can "export" it to a Python program. I frequently use this feature to develop and test the core functionality of a program, then edit the resulting Python script to add looping and other functions. One downside of using Python with ArcGIS is that there are still some rare, but annoying, cases where ArcPy will indicate that a function has completed while it is actually still running, causing file access errors when a subsequent statement tries to manipulate the file.
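One way to work around those timing problems - sketched here with hypothetical layer names, not an official fix - is to pause and poll for the output before the next statement touches it:

# After a geoprocessing call, wait until the output actually exists
# before the next statement manipulates it. Layer names are hypothetical.
import time
import arcpy

arcpy.CopyFeatures_management("marsh_1990", "marsh_1990_copy")

# Poll for up to 30 seconds before giving up
for attempt in range(30):
    if arcpy.Exists("marsh_1990_copy"):
        break
    time.sleep(1)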

We also use Python in association with R to facilitate the import of some data logger files that have extensive header information that needs to be "mined" for data (such as unit location, serial number, etc.). Parsing data out of headers is difficult to impossible in R, but because Python supports "regular expressions", it is a powerful tool for extracting data from complex (but regular) file structures. We use Python to pre-process the data logger files, collecting information from the headers and prepending it to the actual data, thus creating a comma-separated-value (CSV) file where each line contains information from the header (e.g., the location where the instrument was deployed) along with the periodic readings (e.g., temperature, pressure). The CSV file is then ready for further processing using R.
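The sketch below illustrates the approach; the file names, header layout, and field labels are all invented for illustration:

# Mine a logger header with regular expressions, then prepend the
# header values to each data row. The file layout is hypothetical.
import re

location, serial = None, None
outFile = open('logger_out.csv', 'w')
for line in open('logger_raw.txt'):
    # Header lines look like "# Location: Hog Island" or "# Serial: 12345"
    m = re.match(r'#\s*Location:\s*(.+)', line)
    if m:
        location = m.group(1).strip()
        continue
    m = re.match(r'#\s*Serial:\s*(\d+)', line)
    if m:
        serial = m.group(1)
        continue
    # Remaining lines are comma-separated readings; prepend the header info
    if line.strip() and location and serial:
        outFile.write(location + ',' + serial + ',' + line)
outFile.close()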

A final example of Python use has been working with some of the web services provided by PASTA. Python's urllib2 module allows a script to act as a web client when interacting with web services, and the standard library also includes several modules for manipulating XML documents. Here is a code snippet that queries PASTA to fetch an EML document and extracts the title and a list of entities:

# Import the needed modules
import urllib2
from datetime import datetime
import xml.etree.ElementTree as ET

# set the information for the dataset selection
pastaScope='knb-lter-vcr'
pastaId=25
pastaVersion=27
pastaFromTime='2012-01-01T00:00:00'
pastaToTime='2013-03-31T23:59:00'

# Set up an authentication string
# Note that you can then save the userData string for future use.
uName='uid=ME,o=LTER,dc=ecoinformatics,dc=org'
pWord='mypassword'
userData="Basic " + (uName + ":" + pWord).encode("base64").rstrip()

# Get the data package metadata to extract dataset title and entities
# set the URL up for query and prepare a request
emlUrl="http://pasta.lternet.edu/package/metadata/eml/"+pastaScope+"/"+str(pastaId)+"/"+str(pastaVersion)
emlReq=urllib2.Request(emlUrl)
emlReq.add_header('Authorization', userData)
# execute the request and fetch the document
emlSock=urllib2.urlopen(emlReq,timeout=60)
emlString=emlSock.read()

# Parse the EML document returned using the ElementTree Module
emlRoot=ET.fromstring(emlString)

# print the title and search info, converting numbers to strings and concatenating them together using +
print "Downloads of dataset "+pastaScope+"."+str(pastaId)+"."+str(pastaVersion)+" between "+pastaFromTime+" and "+pastaToTime+":\n"+emlRoot.find('./dataset/title').text

# use the 'findall' method of the emlRoot object to find all the entities, count them, then list them
entityRecords=emlRoot.findall('.//entityName')
print "contains "+str(len(entityRecords))+" data entities"
for entityRecord in entityRecords:
    print entityRecord.text

The resulting output when run is:

Downloads of dataset knb-lter-vcr.25.27 between 2012-01-01T00:00:00 and 2013-03-31T23:59:00:
Hourly Meteorological Data for the Virginia Coast Reserve LTER 1989-present
contains 2 data entities
VCR97018
VCR_LTER_Hourly_Weather.zip

Some good Python tutorials (which feature interactive windows that allow you to run exercises without actually installing Python on your system) are at http://www.learnpython.org/ and http://www.codecademy.com/tracks/python/. If you'd prefer a free, full university-level course, try https://www.udacity.com/. Their Computer Science courses lean heavily on Python - and if you want, you can even pay to get academic credit.