Skip to Content

SchemaSpy: No dust will collect on this Database Documentation.

Printer-friendly versionPrinter-friendly version
Fall 2010

M.Gastil-Buhl (MCR)

SchemaSpy is a very useful program for creating database documentation from the database itself, and because it does so automatically, it is easy to keep such documents current.  Of the six products I trialed, not only is this program the most universal but also creates the most useful description of database structure. DBScribe, described in Databits (W.Sheldon, Spring 2009), although a good product, only runs on PCs and cannot connect to a remote PostgreSQL host.

Recently I ported some of the databases Wade Sheldon wrote for GCE-LTER from SQL-Server to PostgreSQL. Those familiar with Microsoft SQL-Server Management Studio know how convenient it is to generate Entity-Relationship Diagrams (ERD) and MySQL Workbench offers a similar feature. SchemaSpy works for all the rDBMS used by LTER sites (MySQL, Oracle, MS SQL-Server, and PostgreSQL). It is independent of the OS (PC, Mac, Linux) because it is written in Java. It can access a database on the local host or a remote host. It runs on the command line so it can be invoked automatically at regular intervals. (There is also a GUI built for it which I did not test.) To summarize: SchemaSpy creates (or refreshes) the documentation with a single command; that part is not interactive. The documentation itself is an interactive webpage. The features I describe below are within that webpage.

Figure 2. Database documentation in tabbed html pages generated by SchemaSpy

Figure 1. The documentation is published as a tree of html pages which appear as tabs.

The tables in the ERD are mapped links to expand or contract scope to explore the data model of a whole database, just one schema of the database, or selected tables. The view of part of the ERD can be interactively expanded to show just the neighboring parent and child tables or all relations within two generations either side at once with toggle. SchemaSpy automatically places parent tables to the left and child tables to the right of the selected table. The foreign key relationships between tables point directly to the referenced columns (shown as rows in the box for each table). In most ERD-drawing tools the point where this line connects is not automatically aligned with specific columns. Figure 2, below, is a screenhot; the SchemaSpy homepage offers a live example. Sharp eyes will note the row counts for each table. As this database was just ported, no data has been entered yet. Because SchemaSpy is easy to refresh, I did not wait until the final version of the port is completed. Even small changes, such as converting the SQL-Server bit data type to boolean in PostgreSQL are worth updating in the documentation, especially when development is a collaboration between sites.  Figure 2 shows just a subset of the tables in the GCE implementation of ProjectDB; compare to the complete ERD shown in Sheldon & Carpenter, this issue. A disadvantage of diagrams drawn by SchemaSpy is that they are optimized for interactive browsing rather than creating a graphic that fits a whole ERD into one page.  (For that I use another graphviz-based tool.) 

Entity-Relationship Diagram of a subset of the GCE implementation of ProjectDB

Figure 2. Entity-Relationship Diagram of a subset of the GCE implementation of ProjectDB as drawn by SchemaSpy. Primary key columns are green.

Complete and accessible documentation is necessary, but not sufficient; it must be maintainable. How large a fraction of our work time can we afford to spend documenting and updating the documentation? Time invested depreciates as the continually developing systems diverge from static documentation. Recently, during the VTC with our NSF program manager at the September IMC meeting, we were reminded of the value of "sustainable software". A necessary part of that is maintaining useful documentation that is both technically rich as well as meaningful to non-technical stakeholders. The easier it is to maintain documentation, the more likely it will be kept up to date by a busy Information Manager. SchemaSpy can be re-run with one command to update the documentation of a database, using only the database itself.

Further Information:

  • SchemaSpy homepage on sourceforge:
  • Described in 'Java Power Tools' by O'Reilly, 2008, Chapter 30 'Automatically Generating Technical Documentation'.
  • Dependency: Graphviz (optional but the most useful part.)
  • Summary on Freshmeat:
    • License: LGPL "Lesser" General Public License [2]
    • OS Independent because implemented in Java
    • "SchemaSpy analyzes database metadata to reverse engineer dynamic Entity Relationship (ER) diagrams. It works with just about any JDBC-compliant database and can identify Ruby on Rails style databases, as well as other implied relationships."
  • GUI version also in Java.