A useable simulation model archive: Does it really exist?
Mark E. Harmon, Richardson Chair and Professor, Department of Ecosystems and Society, Oregon State University, Corvallis, OR, 97331
Edward B. Rastetter, Senior Scientist, The Ecosystems Center, Marine Biological Laboratory, Woods Hole, MA, 02543
At some point in a career everyone no doubt wonders what their legacy will be. That is certainly the case for the two of us, as we are nearer the end than the start of our careers as ecologists. While this might seem a narcissistic obsession, there is a practical side as well for everyone else. One of the hallmarks of humans as a species is the ability to pass knowledge, skills, technology, and experience from one generation to the next. So it is sensible to wonder what will be passed along from one scientific generation to the next. For scientists, the hope is to pass along critical concepts, facts, technologies, samples, and observations. If this does not happen then each generation must start over and scientific progress will be delayed because old findings have to be rediscovered.
It is not enough to want to pass a legacy along - there must be mechanisms to enable this process. In thinking over the various parts of our own legacies, we are confident that for some parts the mechanisms are well established. Almost anything appearing in a journal article or book, be it data, concepts, or knowledge, will be archived in various libraries, online databases, and other well-established and redundant systems such as JSTOR, Questia, SCOPUS, the Web of Science, and even the Library of Congress. Ecological data themselves can be entered, documented, and retrieved using several systems such as CDIAC (Carbon Dioxide Information Analysis Center), the US LTER's NIS (Network Information System), and the Ecological Society of America's Ecological Archives, all of which will continue to evolve but at least are in place. The ability to pass along physical samples is less sure, however. Systems to archive, curate, and retrieve this material are widespread, ranging from classical organism collections in museums to site-based systems such as Hubbard Brook's Sample Archive. The main issue is whether or not one can take advantage of these systems based on one's location and type of sample. What we do not seem to be able to find is a system to enable legacies for ecological simulation models (Thornton et al. 2005). The rest of this commentary describes what we think such a system would look like and the steps required to create it.
Before describing this system, let us first be clear why we think simulation models are in particular need of attention. There are many forms of models used in ecology. Analog models (e.g., microcosms) are generally not saved, although no doubt some are in museums and many are described in publications. Because analog models have largely been replaced by digital models, there is probably little need to be concerned about the ability to recreate them outside of an educational setting. Conceptual (e.g., N-saturation model of Aber et al. 1989) and analytical (e.g., NEE model of Shaver et al. 2007) models are usually relatively simple and well documented in publications. The same is usually true for empirical models, as the main purpose of many publications is to use data to develop these relationships. While the data used to develop empirical models are often inadequately presented in publications, there are at least systems (as noted above) to store everything from the raw to the processed, cleaned data. Simulation models are generally more complicated than analytical ones, and although they are described in publications, there is generally not enough room to do this fully; hence much of the information needed to use or recreate them exists outside the publication system. Moreover, unlike either an analog or analytical model, digital simulation models are generally not simple to recreate and, given the limited description in publications, may be impossible to recreate exactly from that source alone. This is unfortunate because simulation models have become a way to synthesize ecological knowledge, explore integrative hypotheses, and analyze complex systems. As these reflect relatively new ways to think about ecological systems, failing to pass simulation models from one generation to the next could slow progress in the ecological sciences. In a sense it would be similar to every generation having to reinvent the elemental analyzer or some other critical piece of technology that we currently take for granted.
Developing a system to usefully document, archive, and retrieve ecological simulation models will involve considerable thought. Part of the complexity of this effort is reflected by the fact that simulation models are really an amalgamation of concepts, hypotheses, data, and technology. Fortunately, parts of other systems can be reused and modified to create this new model archiving system. For example, data are usually used to drive simulation models and data are a primary model output. Documentation of these model-related data can take advantage of existing systems such as LTER's NIS that document, archive, and retrieve data (spatial or non-spatial). Model parameters can also be described using these systems; however, additional information on the source, transformation (parameters are often derived from data, and this process needs to be described), uncertainty, and other aspects of model parameters needs to be added. It is extremely useful to understand how sensitive a model is to changes in driving variables and parameters. While some of this information may be described in publications, the detailed examinations of sensitivity often undertaken by model developers are generally not formally documented. It would be useful for future users if a sensitivity analysis were part of every simulation model's documentation (e.g., Grimm et al. 2014). As simulation models are developed it is not unusual to have multiple versions of models, and while it may not be practical to save every version, those that represent significant milestones of development (e.g., either a major change in functionality or publication of a key analysis) should be archived. Fortunately, conventions developed for other forms of software development can be used.
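As one illustration of the kind of sensitivity analysis that could accompany an archived model, the sketch below perturbs each parameter in turn by a fixed fraction and reports the relative change in output per relative change in parameter. It is only a minimal sketch under stated assumptions: `toy_model`, its parameter names, and the 10% perturbation are hypothetical stand-ins, not part of any model or documentation standard cited here.

```python
def toy_model(params, driver):
    """Hypothetical stand-in for an archived model: one carbon pool, annual time step."""
    carbon = params["initial_carbon"]
    for npp in driver:
        # Pool gains a fraction of annual production and loses a fraction to decay.
        carbon += npp * params["allocation"] - carbon * params["decay_rate"]
    return carbon


def one_at_a_time_sensitivity(model, params, driver, perturbation=0.10):
    """Relative change in output divided by relative change in each parameter."""
    baseline = model(params, driver)
    sensitivities = {}
    for name, value in params.items():
        perturbed = dict(params)  # copy so the archived parameter set is not altered
        perturbed[name] = value * (1 + perturbation)
        output = model(perturbed, driver)
        sensitivities[name] = ((output - baseline) / baseline) / perturbation
    return sensitivities


# Hypothetical parameter set and driver (50 years of constant production).
example_params = {"initial_carbon": 100.0, "allocation": 0.5, "decay_rate": 0.05}
example_driver = [10.0] * 50
example_result = one_at_a_time_sensitivity(toy_model, example_params, example_driver)
```

A table of such values, stored alongside the parameter documentation, would give a future user a first indication of which parameters deserve the most care when the model is applied to a new site.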
While storing of the computer code (i.e., source and executable files) will not be challenging per se, one cannot expect computer code created on one operating system environment to automatically be useable on another or for the code to run under some future operating system. Therefore, in addition to the code itself, it may be necessary to archive the operating system in which that code was developed, which in turn might mean also physically saving the hardware able to run that operating system. Alternatively, new technology now allows the development of virtual machines to simulate one operating system and associated hardware on another. These virtual machines can be more easily passed from one generation of system to the next. Finally, as the point of archiving simulation models would be to use them again, this process would be greatly helped by archiving input and output data that can be used to test if the recreated model is acting as expected and to serve as a template for formatting new parameter and driver files to be used with the archived model.
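The test of whether a recreated model "is acting as expected" could be as simple as comparing its output with the archived reference output, value by value, within a tolerance. The sketch below assumes a hypothetical long-format CSV layout (columns `year`, `variable`, `value`) and a hypothetical tolerance; an actual archive would define its own formats and acceptance criteria.

```python
import csv
import io
import math


def read_output(csv_text):
    """Parse model output text into a {(year, variable): value} dictionary."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {(int(r["year"]), r["variable"]): float(r["value"]) for r in rows}


def compare_outputs(reference, rerun, rel_tol=1e-6):
    """Return a list of (key, reference value, rerun value) for every disagreement."""
    mismatches = []
    for key in sorted(set(reference) | set(rerun)):
        if key not in reference or key not in rerun:
            # A record present in only one of the two outputs is a mismatch.
            mismatches.append((key, reference.get(key), rerun.get(key)))
        elif not math.isclose(reference[key], rerun[key], rel_tol=rel_tol):
            mismatches.append((key, reference[key], rerun[key]))
    return mismatches


# Hypothetical archived output and two hypothetical reruns.
archived_text = "year,variable,value\n1,biomass,10.0\n2,biomass,12.5\n"
good_rerun_text = "year,variable,value\n1,biomass,10.0\n2,biomass,12.5\n"
bad_rerun_text = "year,variable,value\n1,biomass,10.0\n2,biomass,13.0\n"

archived = read_output(archived_text)
good_mismatches = compare_outputs(archived, read_output(good_rerun_text))
bad_mismatches = compare_outputs(archived, read_output(bad_rerun_text))
```

An empty list indicates that the rerun reproduces the archived output within the chosen tolerance. A tolerance is used rather than exact equality because floating-point results can differ slightly across hardware, compilers, and operating systems.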
We envision an archival system where not only the full information needed to recreate the model is available, but the model itself is available and usable under any future operating system. Imagine, for example, being able to rerun the original Botkin et al. (1972) JABOWA simulations for Hubbard Brook from your computer without having to recode the model. Bear in mind the original JABOWA was developed on punch cards for an IBM mainframe computer, a system not currently available to anyone outside of a museum. In the archive we envision, the JABOWA code and a simulated IBM operating system would be archived on a virtual machine so that the model could be rerun not only with the original input files but with any newly created input file in the same format.
We have few illusions as to the challenges to be faced in developing the proposed system, and while this issue has been previously noted by others (Thornton et al. 2005), little appears to have been done to address it. This indicates to us that perhaps one of the largest challenges is to have scientists and funding agencies recognize the need for such a system and to understand that it would be different from what currently exists. We may be mistaken, but in our conversations with others we have the impression that there is a widespread belief that such a system already exists (how could it not?) or that current systems for data will be sufficient. We are not really sure a general system exists and suspect those that might are not sufficient without some modification. Another challenge is that a system to document, archive, and retrieve simulation models will cost time and resources in development and in use. Those using data systems will understand that proper documentation and archiving of data can add 25-35% effort to a project, which in a fixed-budget world means fewer publications and presentations. We would expect the same costs for a simulation model system; unless scientists and funding agencies support these costs in terms of lower short-term productivity, simulation model developers will be reluctant to bear them. This would indeed be unfortunate, in that failure to accept these short-term costs will likely come at the expense of long-term productivity.
Aber, JD, KJ Nadelhoffer, P Steudler, and JM Melillo. 1989. Nitrogen saturation in northern forest ecosystems. BioScience 39: 378-386.
Botkin, DB, JF Janak, and JR Wallis. 1972. Some ecological consequences of a computer model of forest growth. Journal of Ecology 60: 849-873.
Grimm, V, J Augusiak, A Focks, BM Frank, F Gabsi, ASA Johnston, C Liu, BT Martin, M Meli, V Radchuk, P Thorbek, and SF Railsback. 2014. Towards better modelling and decision support: Documenting model development, testing, and analysis using TRACE. Ecological Modelling 280: 129-139.
Shaver, GR, LE Street, EB Rastetter, MT van Wijk, and M Williams. 2007. Functional convergence in regulation of net CO2 flux in heterogeneous tundra landscapes in Alaska and Sweden. Journal of Ecology 95: 80--817.
Thornton, PE, RB Cook, BH Braswell, BE Law, HH Shugart, BT Rhyne, and LA Hook. 2005. Archiving numerical models of biogeochemical dynamics. Eos, Transactions American Geophysical Union 86: 431-431.