SIO Ocean Informatics Update: Growing Infrastructure in Support of Scientific Research

Issue: Spring 2010

Karen Baker (PAL, CCE) and Jerry Wanetick (CCE, PAL)

We report on the growth of an information infrastructure that began with the Ocean Informatics Initiative in 2002 at Scripps Institution of Oceanography. The initial aim to support the scientific research of two LTER sites, PAL and CCE, has expanded to include additional projects. In order to inform our future development, we look back on some of the milestones.

We define information infrastructure as encompassing both computational and informatics capacity. Two key points have guided the work of Ocean Informatics. First, we view the growth of infrastructure and its articulation as an ongoing process -- both conceptually and in practice. Second, we recognize the multi-dimensional nature of infrastructure. Working within the context of the intertwined social, organizational, and technical dimensions is sometimes called a 'sociotechnical' approach.

Infrastructure comes center stage as a result of expanding expectations associated with data. Today, data and information management work is growing because of the increasing quantity and diversity of data, the increasing sophistication of data analysis and visualization, and the increasing number of standards and exchange requirements. Scientific work is expanding concurrently as researchers assume responsibility not only for data use in support of traditional local scientific work but also for preparing data for reuse by disciplines outside the traditional boundaries of the intended domain. Data reuse is stimulated by new, integrative scientific approaches and goals.

Tasks related to instrument platforms and computational hardware have evolved, moving from a single central computer to distributed computational arrangements and hybrid storage solutions, and from independent applications to distributed information systems, online data repositories, and web service architectures. The next steps are in sight: machine virtualization and cloud computing. In addition, analytic work has exploded and now includes collaborative efforts and community processes, particularly those relating to semantic issues and standards.

Four elements of the Ocean Informatics (OI) infrastructure are summarized in Table 1. Each element is described in three dimensions: social, organizational and technical.

Table 1. Multiple Dimensions of some Ocean Informatics infrastructural elements

| Element | Social dimension | Organizational dimension | Technical dimension |
|---|---|---|---|
| Foundation | Roles | IM strategy | Design process |
| Collaboration | Teams | Shared resources | Shared solutions |
| Networking | Communities and communication | Policy, personnel, resources, and identity | Online connectivity and applications |
| Environments | Learning | Information | Distributed collaboration |

1. Foundation

The foundational elements of our informatics approach are the roles that delineate the distribution of work, the information management strategies that frame the work, and the design approach that defines the ongoing work process. With individual roles within information management evolving as rapidly as computer applications and data requirements, the work of OI participants includes mediation, translation, and articulation. We anticipate mediation requirements increasing over time even as workflow tasks become more automated. We foresee information expertise requirements expanding and diversifying from the need for a programmer, a system administrator, and a data manager to include the expertise of information managers, systems architects, infrastructure designers, informaticians, and data scientists. Organizationally, we recognize the importance of developing an information management strategy, and of doing so in multiple arenas at multiple levels: group, project, community, institutional, and domain. First and foremost this involves understanding informatics as a scientific discipline replete with theoretical and empirical concerns. Our understanding of the design process has changed as computational capabilities have moved from ftp to gopher to WWW and most recently to emergent immersive environments framed by Web 2.0 and Google Earth. The OI design approach is integrative, keeping the long term firmly in mind while undertaking everyday work. We have moved from a focus on individual file systems to relational information systems to a vision of distributed systems, abstraction layers, and web services.

2. Collaboration

New types of interdisciplinary collaboration are key to imagining and planning for contemporary connectivity. For Ocean Informatics this has taken the form of multi-project, collaborative teams that bring together different disciplines (e.g., biology, physics, and science studies), different projects (e.g., LTER, CalCOFI, and SCCOOS), and different organizations (university and government labs). To support needed expertise and facilitate shared resources, a recharge facility, the Computational Infrastructure Services (CIS), was established in 2008 at the division level. With CIS, shared solutions have expanded to include desktop support, a ticketed help line, shared printers, and augmented storage. On the horizon are virtualization of servers and participation in a collocation facility to address physical platform location at an institutional level.

3. Networking

As the roles associated with data and information continue to change and diversify, communities and communications become important to the formation of working groups. Within the LTER, the information management committee is an active community of coworkers on which we draw. Organizationally, there are policy, personnel, resources, and identity to consider. At the institutional level, we have established a co-directorship of OI that reports at the division level. The technical dimension of networking involves developing and maintaining online connectivity and applications. Technological services and tools make it possible to bridge individual repository arrangements, but creating a coordinated web remains a grand challenge. Networks require extensive infrastructure in place at the individual, institutional, and national levels.

4. Environment

In planning and carrying out the work of Ocean Informatics, we emphasize learning and a learning environment by continually forming and reforming reading groups and creating professional development and leadership opportunities in addition to training in new technology use. Organizationally, we focus on infrastructure as central to creating an information environment. Such arrangements ensure data and information are able to travel among laboratories, crossing projects and data repository boundaries. A distributed, collaborative environment is the very public, online digital realm within which we operate, and the substrate for much of our work.

Table 2. Ocean Informatics Timeline

| Time frame | Milestone | Implementation Details |
|---|---|---|
| 2002 | Need for new approach to shared infrastructure recognized | Conceptualized Ocean Informatics as infrastructure to support PAL LTER |
| 2003 | Ocean Informatics Initiative launched | Baker, Wanetick, Jackson, 2003 paper |
| 2004 | Shared servers & backups expanded | iOcean, server online |
| 2004 | Multi-project effort launched | Began funded support with CCE LTER |
| 2005 | Staff expanded and informatics reading group initiated | Discussion groups launch with Berners-Lee and The Semantic Web |
| 2005 | Dictionary work launched | Unit Dictionary prototype followed by implementation of units in local system |
| 2006 | Open Directory implemented | Federated authentication and authorization; LDAP and Kerberos |
| 2006 | Collaboration server and software instantiated | iSurf, collaboration server online |
| 2006 | Design studio established | Physical manifestation of OI with design table to facilitate co-design activities |
| 2007 | Initiated collaboration with WHOI IM | Began joint discussions; resulted in paper by Baker and Chandler, 2008, DSR |
| 2007 | Digital storage infrastructure expanded | From physical disks to RAID configuration |
| 2007 | DataZoo information system launch | Development of multi project architecture |
| 2007 | Added two new projects | Began funded support with CalCOFI |
| 2008 | Recharge Facility established | Desktop support initiated with centralized remote-office; shared printing established |
| 2008 | Semantic dimension added to DataZoo | Launch of qualifier system with units and attributes as DataZoo semantic integration |
| 2008 | DataZoo information environment launch | Multi-component approach for differing data collections |
| 2009 | Ocean Informatics institutional integration | Ocean Informatics directors report at division level |
| 2009 | DataZoo web service architecture launch | LTER Unit Registry, OI Controlled Vocabularies, OI Plot Server |
| 2010 | Facility Services expanded | Help request implemented; cishelp@sio.ucsd.edu |
| 2010 | Collaboration server updated | vSurf, server virtualization online |

Toward an Information Environment

Table 2 provides a timeline of Ocean Informatics milestone events.

Co-founders Karen Baker and Jerry Wanetick summarized the initial Ocean Informatics vision in two early publications highlighting design, collaborative care, informatics, and the concept of an information environment (Jackson and Baker, 2004. Ecological Design, Collaborative Care, and Ocean Informatics. In Proceedings of the Participatory Design Conference, 27-31 July, Toronto; Baker, Jackson, and Wanetick, 2005. Strategies Supporting Heterogeneous Data and Interdisciplinary Collaboration: Towards an Ocean Informatics Environment. In Proceedings of the 38th Hawaii International Conference on System Sciences (HICSS), 3-6 January, Big Island, Hawaii, pp. 1-10, IEEE, New Brunswick, NJ).

It's a long way from the time of individual punchcard deck submissions and centralized computer centers to working on shared infrastructure with distributed servers in collaborative arrangements. Ocean Informatics strives to maintain sensitivity to the new understandings and transformational aspects that emerge from the interplay of traditional computational arrangements with new concepts featuring design processes and collaboration enabled by new technologies. Attention to development of multi-dimensional infrastructure facilitates the move from individual data services and independent systems to new types of information environments.