
Sustainable development

Issue: 
Spring 2014

Inigo San Gil (MCM,LNO)

Abstract

In this article, we cover some sustainable code theory and practices. To exemplify these practices, we describe the practices adopted by the Drupal Ecological Information Management System (DEIMS) working group over the last year, detailing their advantages and challenges.

Outline

After a brief introduction and motivation (why would anyone invest in sustainable code development practices?), this article describes specific experiences with sustainable code practices, including a detailed list of the sustainable practices adopted by DEIMS. Having lived with these new methodologies for over a year now, we look back and offer some reflections on what works and which pitfalls we have eliminated from our development pipelines. We emphasize the outcomes that helped our day-to-day development routine. In the spirit of transparency and full disclosure, we balance the article by noting the overhead and costs involved in adopting these practices. For the impatient, here is a summary: overall, adopting sensible sustainable development practices is an investment worth making, whether your daily work goes in tandem with a thriving open source community or you work on a small team building custom, closed source solutions.

Motivation: The good in sustainability.

Why would anyone change his or her project development modus operandi1 and adopt costly practices? When your project is aimed at a large user base, sustainability becomes critical. You will have to look for efficiencies that let your project attain its original goals over its lifecycle with the least effort. Aspects to take into account when looking for better sustainability practices include licensing, development process, project management, community process and marketing. Servilla and Costa (2010) described the advantages of two of them: openness and the use of a software version control system. Following on openness, an important early decision is whether to choose an open source platform or a proprietary base solution. Some proprietary products may have a clear edge in meeting your project needs; however, your choice may influence the sustainability of the project. Arguments for and against open source abound (von Hippel and von Krogh, 2003). At the very least, we should caution that choosing open source is not per se a winning criterion. Not all open source projects are the same: choose a project backed by a strong, active, thriving and respectful community. Each of the adjectives we used to qualify 'open source' is relevant.

Strong. Small open source projects face daunting challenges. Most (but not all) open source projects begin small, and many terminate during the maturing process or even before. Lack of enthusiasm, competition and dwindling support are common causes of the premature abandonment of a promising open source initiative. If you have a choice between adopting a solution sponsored by a strong open source initiative and a nascent, weak one, you may want to avoid the risk of the weaker one, all other parameters being roughly equal. For example, Drupal has over a million registered users and 33,513 developers2, whereas our earlier attempt (Aguilar et al., 2010) to meet the metadata editor challenge, based on the promising XForms technology, is de facto obsolete with the announcement of discontinued support by Mozilla (2014) and the World Wide Web Consortium.

Active. What level of activity? The total level of activity may offer you little actionable information. The activity level has to be viewed in the perspective of the community that supports it, and of the balance between users and developers. It is easy to forget about the size of a project when gauging its activity level. Large projects will likely have more total activity than small ones, but they are not necessarily more active than a small, vibrant project. Measuring activity and activity trends is elusive. Some guides we used: ask how active and engaged the community around the components of the solution being evaluated is. Grade the overall activity relative to the smaller parts that compose the project. If the information is available to you, also evaluate the context: examine similar projects of similar size and age. Another factor that may reveal the level of activity is the release schedule, and the typical content of those releases. Do not confuse activity with stability: stable projects tend to have longer release cycles and a smaller volume of changes. All Drupal projects, for example, offer clear measures of use, adoption, releases and commits. On any given Drupal project page, you will see the following data under the "project information" section: a maintenance status, a development status, reported installs, the number of downloads, the date the project was last updated, and other statistics about the maintenance state compiled in graphs, such as the number of new issues, the response rate, the response time lag to an issue, the number of open bugs and the number of participants. Per se, these statistics do not reveal much, but in the context of other similar projects they are of great guidance. Not all projects offer tools to gauge the health of the system, but the mere existence of such tools should weigh in your decision.
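The advice to grade activity relative to project size and to similar peers can be made concrete. The following is a minimal Python sketch, not a tool DEIMS used: it assumes you have collected a yearly commit count and a contributor count for each candidate from public project pages, normalizes activity per contributor, and reports a z-score against a peer group. The `ProjectStats` class, its field names and the sample figures are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class ProjectStats:
    """Hypothetical snapshot of public statistics for one project."""
    name: str
    commits_last_year: int
    contributors: int


def activity_per_contributor(p: ProjectStats) -> float:
    # Normalize raw activity by community size, so a large project does not
    # look "more active" simply because it has more people committing.
    return p.commits_last_year / max(p.contributors, 1)


def relative_activity(candidate: ProjectStats, peers: list[ProjectStats]) -> float:
    """Z-score of the candidate's normalized activity within a peer group.

    Positive values mean more activity per contributor than the peer average.
    """
    values = [activity_per_contributor(p) for p in peers]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All peers are equally active; there is no spread to compare against.
        return 0.0
    return (activity_per_contributor(candidate) - mu) / sigma


if __name__ == "__main__":
    # Hypothetical peer group of projects of similar age.
    peers = [
        ProjectStats("large-project", commits_last_year=1200, contributors=60),
        ProjectStats("medium-project", commits_last_year=300, contributors=20),
        ProjectStats("small-project", commits_last_year=250, contributors=5),
    ]
    for p in peers:
        print(f"{p.name}: {relative_activity(p, peers):+.2f}")
```

Normalizing by contributors is only one choice; reported installs or issue response times could serve as the denominator instead, and any single number should still be read alongside the release history and stability considerations discussed above.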

Thriving. Is the community around this product a happy, growing community, or has it stalled? Is it plagued by disenchanted participants? Have many forks surfaced, owned by disgruntled participants? In contrast, a thriving community moves forward, and its members radiate enthusiasm about their involvement.

Beware of open source projects whose major contributions come from developers on the payroll of the project. Lack of substantial voluntary contributions may denote a lack of faith and passion for the project, ingredients that increase the long-term viability and sustainability of any software project.

Respectful. Good projects take in feedback and adapt to necessary changes. Holier-than-thou attitudes and a tight grip on the project code denote a dismissive and disrespectful attitude towards the constituents, and are often a sign that the project lacks a future. Communities built on a code of conduct that fosters integration, diversity and participation are more likely to thrive than communities controlled by a few people who leave little room for innovation and progress. The diversity of the community is also a good indicator of its health, especially the gender ratio, as Petersen (2014) argues in her analysis of the influence of diversity on open source communities in the US and abroad.

Beyond the choice of an open source project as a base framework, there are several choices you can make to ensure the viability of your project. In addition to the aforementioned version control system, you may want to ensure your project is ready to be deployed. Avoid custom scripts and obscure licenses (ship a copy of a well-known license instead): streamlining deployment is key in the adoption process. A complex installation may erode the confidence of the adopter. One of DEIMS' key objectives was to package the product in the simplest form possible. The DEIMS adopter can choose between building from source and deploying a pre-built package. Long-winded documentation may deter the adopter. If the install process resembles what the adopter expects, your project's chances of pleasing adopters increase. For example, installing the DEIMS package is almost the same as installing Drupal; however, this was not the case in the first DEIMS release.

Sustainable practices include the creation of quality documentation in several formats. Good documentation that targets both developers and users encourages them to commit to the use and maintenance of the software. Documentation is not just a set of guidelines; it is a reassuring sign that the project meets one of the critical pillars of sustainability. Ironically, documentation is one of the first things dropped from the task list when pressed by tight schedules. Software project documentation is akin to the metadata of a dataset. Make sure you budget resources to produce documentation in the form of how-tos, videos, guided tours and the like.

As an added bonus, the longevity of your project may be helped by offering product-oriented training to your adopters. Hands-on, relevant exercises give attendees a chance to actually practice. If possible, design the exercises around issues of interest to your attendees for a more satisfying experience.

Another component of sustainability within the development process is a public repository. GitHub was launched in 2008, and today it is billed as the largest code-project host in the world, with over 13 million code projects at the time of writing (Wikipedia, 2014). GitHub's success is based on the ease with which users can create, use and share code, much more so than with SourceForge, another long-standing repository. In addition, credit attribution and cooperation tools are well implemented. An excellent collection of indexed help pages helps newcomers start using GitHub in no time. As a plus, GitHub lowered the git knowledge needed to get started: simple apps can be downloaded for any platform, and they turn relatively complex git operations into usable and intuitive workflows. Giving credit to contributors in GitHub is easy, which is both stimulating and challenging for the developer. GitHub places the details of the development process at the forefront of the project, providing a clean view of the chronological, developer-centric activity. Branches, commits and pull requests are all accompanied by good documentation for the visitor.3

Embracing sustainability: from theory to practice

In this section, you will find a summary of what DEIMS did differently during the last year in terms of sustainability. We highlight practices in place since the inception of the project, and how they have changed over time.

A unified LTER Information Management system. Perhaps the most important sustainability aspect that DEIMS continues to exercise is the development of a common information management base platform for LTER sites (Gries et al., 2010). In my mind, nothing is more valuable than keeping a united team in the face of change and challenge, not as a traditional union, but as a cost-saving effort. The DEIMS team acted on the realization that the LTER sites had been solving the same problems with different approaches and technologies over the last few decades. Decades ago, this decentralized approach made sense: development choices were driven by each site's in-house expertise. After all, each LTER site is primarily accountable to the NSF review panels for its own site-specific work.

In the age of decentralized information services (the cloud) and collaboration, access to information and sharing have never been easier. In fact, collaboration is so easy that it is hardly justifiable to work alone on problems for which solutions already exist. Information sharing barriers are low enough to justify an investment in your LTER network partners' products, which in turn produce a greater return on investment than facing the same challenges alone.

Adopted Agile as the development process while we were intensively scaffolding the new major version. This meant we held daily scrums with the developer team, and LTER participation in them resulted in important efficiencies for the project. Agile (Highsmith, 2004) principles can be summarized as close cooperation, rapid development, and delivery of the highest-value product at each development iteration. Close cooperation and rapid development meant that we were intensively engaged with a developer team that moved faster than we could have imagined. If you want to experience the returns that Agile promises, your team will have to engage in the Agile process actively and with self-motivation. How does Agile relate to sustainability? IT projects need to move at the pace of the underlying industry. Waterfall, the development process followed in earlier decades, in which all milestones of the project were defined at the beginning, turned out to be inflexible when it came to adopting new trends and components, frequently rendering projects obsolete before their delivery date. For example, a five-year schedule to release a software product is an anachronism in an industry where major products change unexpectedly every six months. Many major software corporations have moved to Agile, which tends to create more relevant and sustainable projects. DEIMS lacked any structured development workflow in its initial stages: it was largely two Drupal enthusiasts guided by two information managers, using Google Code as the public code repository.

Some argue that waterfall has a place in large, long-cycle projects; however, to our knowledge there is no clear consensus. We found several beneficial features in the Agile process. We could test a critical piece of the architecture before any dependency was built around it, saving us time. We could also evaluate the real cost of deploying a feature before committing entirely to completing it. This pre-evaluation was used to drop some costly features, such as a responsive template for a slide show.

Fig 1. A schematic comparing two major approaches to the software development process. Note that waterfall starts from the top and does not move into the next stage of development until the previous one is complete. In contrast, Agile breaks the overall project into atomic components, and each component goes through the entire development cycle, repeating the process constantly and allowing the team to define and revisit the highest-value component at each stage.

Agile vs. Waterfall

Adopted GitHub as the project repository. We have a core project called DEIMS where the common core is installed. LTER sites need certain extensions that are sometimes unique to the site's living model. All extensions and related customizations (such as the code that migrates content from the previous information management sources) are also hosted on GitHub (see the Arctic LTER, Luquillo LTER, International LTER, McMurdo LTER, North Temperate Lakes LTER and Sevilleta LTER). As mentioned earlier, before this release DEIMS used Google Code as its public repository. While Google Code served well to accept community contributions and to market the product, it lacked the awesome development workflows of GitHub.

 

Fig 2. A simplified schematic of a GitHub based typical development workflow.

Github Workflow

An eye on the near future. Knowing where your stack is heading when you decide on practical implementations for the immediate future is a condicio sine qua non for any sustainable software effort. The best predictive knowledge you can use is based on science. In the absence of such knowledge, we found near-term knowledge useful: we enrolled a development team with deep knowledge of the new version of Drupal. In contrast, the earlier DEIMS release did not account for upcoming changes. You should design with an eye on the advancements of your supporting framework, which in software projects may be a mix of Java, JavaScript, PHP, derivations of such languages and many other components of your final stack. Where is the main trend heading in the next release? What features are being deprecated, and what tools are maturing?

One Drupal aspect I value the most is the sincerity of the Drupal steering committee. For the last three years, I have listened to the hour-long "state of Drupal" talk offered at the main Drupal conference hosted in the United States. In it, Drupal founder Dries Buytaert (2012, 2013, 2014) analyzes the placement of Drupal in the current web context, and where Drupal should be positioned as a community. This analysis ensures that while the Drupal community works intensively on the current platform, the designs for the next major release are being worked on and applied. In the last four years, the Drupal community has experienced two major version changes: the Drupal-seven release has usability as its main focus, while the forthcoming Drupal-eight release emphasizes the underlying engine. With D8, Drupal makes a brave leap from a procedural-leaning code base to an object-oriented one. DEIMS has been developed with Drupal-seven at its core, but newer critical components were developed with the next iteration in mind, making this DEIMS release future-proof (Reid, 2014). Object-oriented DEIMS components include those handling the Ecological Metadata Language, the Biological Data Profile and the International Organization for Standardization (ISO) 1911X standards, as well as the harvest list services. DEIMS has implemented seventy-five percent of the Drupal-8 module backports that Reid covers.

Steering the human capital. A sustainable project must have good governance. DEIMS mirrors Drupal's diversity within its ranks. DEIMS works with open input from its constituents, stakeholders and the public: the issue queues at both GitHub and drupal.org are not limited to a subset of LTER people, but open to the world for feedback and, therefore, for contributions. Questions are addressed openly and contributions are accepted on their merits, with development welcome from anyone who conforms to the Drupal coding practices.

A serene and playful arena. Sustainability relies on the people who make up the ecosystem. Just as in the Anthropocene, the human dimension plays a fundamental role in software development. How we work as a group determines the viability of a project. Scholarly analyses (Koziolek, 2011) often fail to mention this critical pillar of sustainability, perhaps because of the difficulty of objectively quantifying and assessing the human role. One surprising aspect of Drupal's crowded issue queues is the relative absence of unconstructive comments. Experts are patient and point newcomers in the right direction, and overall the queues stay focused on the task at hand. As a result, a Google search will likely land you on the relevant topic, sans spam or abuse. Part moderation, part code of conduct, the forums are inviting and useful. DEIMS added issue reporting to both GitHub and drupal.org, while continuing to use Google Groups and email for group communication.

Reuse Readiness. As Marshall and Downs (2008) argue, "The reusability of software products is a factor often neglected in common measures of technology maturity". When you consider your project, ask yourself: how can a new developer reuse and repurpose your software project? Can he or she extend your project framework to conform to new requirements? The DEIMS design strictly followed this pattern. DEIMS reuses 86 contributed extensions on top of the 41 modular pieces in the Drupal core. The vast majority of the project consists of existing, living and actively maintained code. DEIMS created only 11 entirely custom modular extensions, some of them to meet du jour4 requirements, such as a content export specification, or to consume external services, such as the client for the LTER unit dictionary.
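The "vast majority" claim can be checked with a rough worked figure. Reading the counts above as 86 contributed modules plus 41 core modules reused, against 11 custom modules, and counting modules rather than lines of code (an assumption, since module sizes differ widely), the reused share is:

```latex
\text{reused share} = \frac{86 + 41}{86 + 41 + 11} = \frac{127}{138} \approx 0.92,
\qquad
\text{custom share} = \frac{11}{138} \approx 0.08
```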

Exit strategies. A sustainable design must have a clear technological exit strategy. DEIMS offers several paths that an adopter can use to gather and export its content. DEIMS can provide the entire cross-related content using SQL queries rendered as comma-delimited files. DEIMS can also expose content through services, expressed as schema-conforming XML, schema-less XML, and JSON, which is favored by most modern applications. DEIMS also used the Drupal Migrate framework, dividing content into sources, destinations and mappings, and implementing plugins to adapt the sources and destinations, and even to tweak the mappings for the oh-so-common exceptions.
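The source, mapping and destination split described above can be illustrated outside Drupal. The sketch below is written in Python, not PHP, and does not use the Drupal Migrate API; it only mirrors the idea. An SQL query acts as the source, a dictionary of field mappings with optional transform functions (standing in for plugins that handle exceptions) renames and cleans each record, and two destinations write comma-delimited text and JSON. The `dataset` table, its columns and the mapping entries are hypothetical.

```python
import csv
import io
import json
import sqlite3
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical field map: destination field -> (source column, optional transform).
# A transform plays the role of a small plugin that handles an exception in the
# source data, such as a missing value or stray whitespace.
FieldMap = dict[str, tuple[str, Optional[Callable[[object], object]]]]


def source_rows(conn: sqlite3.Connection, query: str) -> Iterator[dict]:
    """Source: yield each row of an SQL query as a dict keyed by column name."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    for row in cur:
        yield dict(zip(columns, row))


def apply_mapping(rows: Iterable[dict], field_map: FieldMap) -> Iterator[dict]:
    """Mapping: rename source columns and apply any per-field transform."""
    for row in rows:
        record = {}
        for dest_field, (src_col, transform) in field_map.items():
            value = row.get(src_col)
            record[dest_field] = transform(value) if transform else value
        yield record


def to_csv(records: list[dict]) -> str:
    """Destination 1: comma-delimited text with a header row."""
    buf = io.StringIO()
    if records:
        writer = csv.DictWriter(buf, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)
    return buf.getvalue()


def to_json(records: list[dict]) -> str:
    """Destination 2: a JSON array of objects."""
    return json.dumps(records, indent=2)


def demo_records() -> list[dict]:
    # In-memory stand-in for a site's dataset metadata table (hypothetical).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (id INTEGER, title TEXT, begin_date TEXT)")
    conn.executemany(
        "INSERT INTO dataset VALUES (?, ?, ?)",
        [(1, "  Stream chemistry ", "1989-01-01"), (2, "Snow depth", None)],
    )
    field_map: FieldMap = {
        "identifier": ("id", str),
        "title": ("title", lambda v: (v or "").strip()),
        "start": ("begin_date", lambda v: v if v else "unknown"),
    }
    rows = source_rows(conn, "SELECT id, title, begin_date FROM dataset ORDER BY id")
    records = list(apply_mapping(rows, field_map))  # consume before closing
    conn.close()
    return records


if __name__ == "__main__":
    records = demo_records()
    print(to_csv(records))
    print(to_json(records))
```

Keeping the transforms in the field map, rather than in the source query, is what lets the same source feed several destinations; the exception handling lives in one place and can be tweaked without touching either end.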

Challenges

Changing habits is not trivial. Think of the many times you go to the grocery store with your eco-friendly bags5. Better yet, look around you in the checkout lane: how many people still load their goodies into plastic or paper bags? If your checkout lanes are not much different from those in Albuquerque, you may be wondering why such a simple habit has not yet been widely adopted, despite its benefits. The analogy holds for software development. Change brings out our deepest fears, yet we need to embrace change to succeed in providing society with the best predictive knowledge that science can offer.

Versioning systems such as Subversion or Git have been around for a long time, yet some of us still manage to skip the learning curve and keep our own idiosyncratic stash of versions and what-did-I-do-yesterday confusion.

Documentation is one of the first things dropped from the task list when pressed by tight schedules. Yet, a software project without documentation is like a dataset without context (metadata). 

Complexity. The data generated by LTER are complex, heterogeneous, incomplete and unwieldy. Not surprisingly, the systems that document these data and information are complex too. With each iteration that extends the use of the information system, the complexity of the management system increases.

Funding model and structure. Information management at LTER as a community is not far from the cooperative model necessary to thrive in this increasingly complex information delivery ecosystem. However, funding allocations present a challenge that has to be overcome. DEIMS managed to bring together funds from several sites to develop a product that benefits all, but the mechanisms we used to fund the working group and project were neither ideal nor easy to implement.

Summary and Concluding Remarks

We conclude this short account of our research and experiences with sustainability in the hope of opening a dialogue on this important topic. We saved many hours by following practices that required us to learn something new (Agile, Gitflow, GitHub and the Drupal API, to mention a few). A few final take-home ideas follow.

Open your code to the wider community.  Let strangers tell you about bugs, problems and possible enhancements. Let people request a new feature, suggest changes, and also, let them contribute fixes to your living project.

Do not build from scratch. Whatever your needs, ideas or recipes, they have all been invented before, several times. Adopt a seasoned, open source solution that is synergistic with your project goals. Joining and extending an existing project will give you a head start in community, project maintenance, sustainability and maturity.

Evaluate and test your options. When embracing an existing project, make a list of pros and cons. Look for red flags: predictable impediments that would doom your courageous initiative.

Find what moves you most in the sustainable lifecycle, and invest your efforts there; most developers find their peak productivity when doing things they consider fun, engaging or challenging.

Finally, a shameless plea for collaboration. Please join in with your excellent expertise and enthusiasm to power the circulation of knowledge for the ecological community. There are countless ways you can contribute; here is a short list.

Documentation big and small. Issues you find. Features you miss.

Testing: Can you test some aspect of the project?

Developing: Intrigued by the ISO dataset object? The queue that waits for external services? The deployment process? There are ample new (even for Drupal) ways in which we used the Drupal API. If developing is amusing to you, you will find many opportunities in a codebase of this magnitude. Check the GitHub issue queue and the Drupal issue queue for DEIMS.

Translations: You could help translate the project (its components, from documentation to labels). Drupal has an awesome translation management system!

Funding: We are open to your sponsorship.

You are most welcome to join the DEIMS grass-roots initiative, and we would love your (anticipated) contributions.

Footnotes and Comments

1Modus Operandi aka MO. Oh, yes, it is Latin for routine, habit, common method, standard protocol or, more literally, method of operation.

2Number of active Drupal developers as indicated on the project front page. Resource visited in June 2014 at http://drupal.org

3One interesting consequence of this level of transparency at GitHub is how open source is changing how we are hired. GitHub and other public repositories provide employers with the ability to judge a developer by the amount and quality of her contributions. The footprint we leave on the web provides a metric which may be arguably more important (see Lopp, 2013) than the old 'list of mastered computing languages' and 'roles' that appear in classic resumes.

4From Merriam-Webster: popular, fashionable, or prominent at a particular time <the buzzword du jour>

5Bagging in western Europe is optional: Do you want a plastic bag? We've got them! They range from 10 to 20 euro cents per bag and are just as conveniently available in the checkout lane. The bagging experience spectrum is substantially different.

References

Aguilar, Raul; Pan, Jerry; Gries, Corinna; San Gil, Inigo and Palanisamy, Giri. A flexible online metadata editing and management system (2010). Ecological Informatics 5: 26–31. Resource at: http://caplter.asu.edu/docs/papers/2010/CAPLTER/Aguilar_etal_2010.pdf

Buytaert, Dries. Denver DrupalCon Keynote (2012). Resource at: http://www.youtube.com/watch?v=RddJvlbSY88

Buytaert, Dries. Portland DrupalCon Keynote (2013). Resource at: http://www.youtube.com/watch?v=PCLx4fRHmCk

Buytaert, Dries. Austin DrupalCon Keynote (2014). Resource at: http://austin2014.drupal.org/keynote-dries-buytaert

Callaway, Tom. How to know your free or open source software project is doomed to fail (2009). Resource at: http://spot.livejournal.com/308370.html, last accessed June 2014.

Gries, Corinna; San Gil, Inigo; Vanderbilt, Kristin and Garritt, Hap. Drupal developments in the LTER Network (2010). Databits, Spring 2010 ed. Resource at: http://databits.lternet.edu/spring-2010/drupal-developments-lter-network

Highsmith, Jim.  Agile Project Management: Creating Innovative Products (2004). Addison-Wesley.

Hippel, Eric von and Krogh, Georg von. Open source software and the “private-collective” innovation model: Issues for organization science (2003). Organization Science 14(2): 209–223.

Koziolek, Heiko. Sustainability evaluation of software architectures: a systematic review. (2011) Proceedings of the joint ACM SIGSOFT conference--QoSA and ACM SIGSOFT symposium--ISARCS on Quality of software architectures--QoSA and architecting critical systems--ISARCS. 

Lopp, Michael. The Engineer, the designer and the dictator (2013). Portland DrupalCon. Resource at: http://www.youtube.com/watch?v=rK4Om-_My7Q

Marshall, James J. and Downs, Robert. Reuse readiness levels as a measure of software reusability (2008). IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2008), Vol. 3. IEEE.

Mozilla. Archive of Obsolete Content: XForms (2014). Resource at: https://developer.mozilla.org/en-US/docs/Archive/Web/XForms

Penzenstadler, Birgit; Bauer, Veronika; Calero, Coral and Franch, Xavier. Sustainability in software engineering: A systematic literature review (2012). 16th International Conference on Evaluation & Assessment in Software Engineering, Ciudad Real, Spain, 32–41.

Petersen, Erynn. Austin DrupalCon Keynote (2014). Resource at: http://www.youtube.com/watch?v=zoia8WZ6q5w

Reid, Dave. Future proof your Drupal 7 site (2014). Resource at: http://www.youtube.com/watch?v=z3_MqGLjqkA

Servilla, Mark and Costa, Duane. Openness and transparency in the development of the LTER Network Information System (2010). Databits, Spring 2010 ed. Resource at: http://databits.lternet.edu/spring-2010/openness-and-transparency-development-lter-network-information-system

Tate, Kevin. Sustainable Software Development: An Agile Perspective (2005). Addison-Wesley. ISBN: 0321286081

Wikipedia. GitHub (2014). Resource at: http://en.wikipedia.org/wiki/GitHub