Skip to Content

What does PostgreSQL, the world's most advanced open source database, offer beyond the standard of an object-relational database management system?

Printer-friendly versionPrinter-friendly version
Issue: 
Spring 2006

- Sabine Grabner (MCR)

I am about to develop new information management facilities for MCR, one of the two newest stations within the LTER network, established in September 2004. My vision is to design and implement a relational database schema and related middleware and web-interfaces for inserting, updating and querying of datasets. On insert or update of data and metadata as content in the database, an application would generate or update a metadata document in EML with as little user interaction as possible or even automatically. The newest version of the EML document is automatically stored in the database and METACAT.

Since the most elegant and comprehensive collection of data is worth nothing once its meaning fades from the brains of their creators, it needs an even more elegant way to describe data and keep track of data manipulation, like EML. Especially in the context of our long term research goal, the value of our data is bound to its description. For my understanding it is also important to store the data itself in relational tables so that queries for future data demand or cross-site queries can be managed more easily than from static text files. The fact that an existing data format needs to be converted - a time-consuming task of writing parsing scripts - makes the use of those data very unattractive and makes the data management redundant in the first place. By using a RDB, data can be not only queried as a whole dataset, which context is somewhat arbitrary, but also by spatial, temporal or physical means.

For MCR's DBMS, I decided to use the open source object-relational database PostgreSQL. Besides its technical strength in the areas of administration, encoding, master-slave replication for distributed server architecture, backup, security, licensing, ACIDity and SQL standard compliance, and the fact that all this comes for free, PostgreSQL is also extensible, and this is one of its most attractive features which complements my goals.

Some extension packages come with the core PostgreSQL distribution, and many other extensions can be found at pgFoundry and GBorg or you can even design and implement new functions, operators, and custom data types yourself and add to the PostgreSQL server. The web sites also feature a whole list of software like graphical clients, packages to convert data and programs from other systems (such as MySQL, Oracle) into PostgreSQL, procedural languages, drivers, interfaces, database design tools, monitoring tools, administrator tools and developer tools.

The bitter part of open source software usually is the poor documentation, although this is not true for PostgreSQL. The online documentation is a straight forward reference, while the authors of the PostgreSQL guide book incorporated strongly their personal flavor and opinions, which is great for a newbie or if you've got some time to sit down and read as opposed to just pick up for one immediate simple answer.

So, what does this sweeping adjective "extensible" mean for storing, describing and accessing of very heterogeneous scientific data sampled by physical measurements, human observations or imaging, in varying temporal and spatial scale, as a function of time and/or space?

In the following I will cover three extensions which sound very promising for LTER information management purposes. xml2 and tsearch2 come in the core distribution and PostGIS is downloadable from http://postgis.refractions.net/index.php.

xml2 lets you execute XPath queries against XML documents stored in a PostgreSQL database and convert XML documents using XSLT style sheets. This feature allows storing metadata which are described by EML as a whole document and could eventually lead into a redesign of the database schema of METACAT for performance improvement. tsearch2 is a full-text indexing and searching package that lets you turn your PostgreSQL server into a search engine.

PostGIS adds support for geographic objects in PostgreSQL, allowing it to be used as a spatial database for geographic information systems (GIS), following the OpenGIS Simple Features Specification for SQL. Open source software compatible with a PostGIS enhanced PostgreSQL database as a backend, include GRASS GIS and MapServer (a development environment for building spatially-enabled internet applications).

Besides publicly available extensions such as these, you have the option of writing your own extensions (server-side programming) in Procedural Languages or High Level Languages such as C and C++. In the latter case simply compile your code into a dynamic object file (.dll or .so) and add it to the PostgreSQL server. The procedural language PL/pgSQL basically adds control structures such as conditionals, loops, and exception handling to the SQL language. As addition to server-side programming, PostgreSQL features a whole list of API's for client-side programming supporting a number of programming languages such as C, C++, Java, Perl, PHP, TCL, Python and many more.

In the following I will list some other notable features and important outlooks. Unique to PostgreSQL is GiST indexing, in addition to the supported indexes B-tree, Hash and R-tree. GiST stands for "Generalized Search Tree" which is a height-balanced tree structure and provides a possibility to create custom data types with indexed access methods. Unlike most other DBMSs PostgreSQL uses the multi-version concurrency control (MVCC) model to coordinate multi-user updates as opposed to locking a table against updates and queries during a single-user update. One of the improvements people from the PostgreSQL development team are working on right now is advancing the support of database server clustering.

This article on PostgreSQL, the world's most advanced open source database, is neither meant to point out any standard features a relational database management system RDBMS has to fulfill to comply the definition nor whether it is good or bad to use any non-standard features by means of portability of a schema. It is rather meant to point out some features which are unique to the open source client/server object-relational DBMS PostgreSQL with particular focus on those most useful to the needs of LTER information management. If I intrigued you on PostgreSQL, I would suggest both, the online reference and the PostgreSQL guide book for some further reading. If you are interested in a tabular comparison of DBMSs, wikipedia hosts an apparently up to date resource.