
A Special Issue of Science on Data

Spring 2011

Karen Baker (PAL, CCE)

A recent issue of Science (11 Feb 2011) focuses on data and the data deluge. It seems to reveal our nascent understanding of data as a complex entity with multiple scales and with similarities as well as differences across disciplines. The eleven articles in this special issue present different perspectives on data from the areas of climate, ecology, social sciences, health, stem cells, genomics, and neuroscience as well as visualization, signal processing, and metaknowledge. Data issues abound, with categories named and frequently identified without the benefit of a broader or shared context. This results in a jumble of ill-described black boxes of data.

Adding to the confusion are statements found in summative articles, such as one that proclaims “Scientists are wasting much of the data they are creating.” Such a leading sentence is misleading in its oversimplification of complex issues. Buried in this statement is blame, and perhaps the suggestion that a choice was made not to pursue an alternative, i.e., not to waste data. Following the same line of reasoning, which includes assigning blame for data difficulties, the author could just as well have started off with the equally unsuitable statement that “scientists are taking too much data”.

This issue of Science is important in that it provides a readily accessible aggregation of disciplinary articles, a set of overviews of data in diverse fields. And there are additional hints of new understandings and new ways of knowing. The cover of the special issue itself (see first image below) is not composed of hardware images or streams of 0’s and 1’s suggesting binary bits but rather shows a semantically-summative word cloud. That is, Science Magazine, a science culture barometer of sorts, takes a semantic turn in using a word cloud to convey the notion of data issues. Another word cloud generated in 2008 from a working group abstract on Designing Infrastructure at a Computer-Supported Cooperative Work (CSCW) Conference (see second image below) provides an interesting comparative opportunity. In the NSF articles, the most frequent terms identified are data, research, information, new, climate, science, analysis, visualization, researchers and access while from the workshop studying how we do our science, the top terms include cyberinfrastructure/CI, CSCW, designing, group, research, interoperability, practitioners, collaborative, heterogeneous, and long-term. The former set highlights the what of science while the latter provides a window into issues involved in carrying out science today.

Though this special issue of Science represents a small step toward interdisciplinary understanding of data issues, I remain puzzled at the difficulty in conveying the perspective that those close to the data origin realize from experience, sometimes called the downstream or bottom-up viewpoint. From this vantage point, there is a lack of appropriate conventions and vocabularies for dealing with the current digital-age capacity to generate data. There is no need to assign blame; rather, there is a critical need to recognize that this is a remarkable time of transition, a time when we are in the midst of developing new conventions and vocabularies in all fields and at all levels of organization.

Science Cover