Abstract

James L. Mullins
Purdue University Library, West Lafayette, Indiana, USA

Enabling international access to data sets: the Distributed Data Curation Center (D2C2)

Recognition of the role that libraries and librarians must play in managing massive datasets, and thereby enabling e-science and e-research, is rising steeply. The latest indication of this is the report of the Association of Research Libraries (ARL) workshop New Collaborative Relationships: The Role of Academic Libraries in the Digital Data Universe. The workshop was held in Arlington, Virginia, USA, on September 26-27, 2006, and its report was issued by NSF under the title To Stand the Test of Time: Long-term Stewardship of Digital Data Sets in Science and Engineering. The presenter of this proposed paper was one of the participants and was invited to represent academic libraries. The complete report is available at http://www.arl.org/info/events/digdatarpt.pdf.

To date, few research university libraries have determined their role in working with disciplinary colleagues, computer scientists, information technologists, and statisticians, possibly because the challenge is immense and complex. Meeting it, however, requires the collaboration of librarians. They can draw upon their specialized knowledge of organizing data according to agreed-upon standards, ontologies, and taxonomies, and they can help create middleware that links to and retrieves available and needed data.

Case Study: During the past several years, the Purdue University Libraries have become aware of the issues associated with the management and curation of data sets, both massive ones and those specific to a laboratory or research team. The Libraries recognized that it could be difficult to engage researchers and faculty in disciplines that hold a certain “perception” of librarians and the role of libraries. They therefore determined that it was necessary to create an entity that could serve as a “neutral place”, through which librarians and colleagues throughout the university could interact and, ultimately, apply for grants. From this, the Distributed Data Curation Center (D2C2) was conceived and created.

The aim of the D2C2 is to address curation issues and to work on problems related to unorganized, disparate, heterogeneous, and distributed data, as well as data workflows and environments. It will work closely with other agencies, centers, and groups doing related work so that practices and standards can be shared, reviewed, and evaluated. The D2C2 can build on the efforts of others who have been working in this area for the past few years, and the applications and solutions developed in the D2C2 are expected to benefit others in turn. It is unlikely that a single “one size fits all” solution in this area will ever be agreed upon. The D2C2 therefore aims to develop insights, applications, and systems that support the distributed nature of curation.

The acting director of D2C2 is the associate dean for research of the Libraries. To convey that this is nonetheless a university-wide initiative, the advisory board reports to the provost and the vice president for research. The board is composed of the deans of agriculture, engineering, science, and libraries, along with the vice president for information technology and highly respected Purdue researchers in science and engineering. As required of all Purdue research centers, D2C2 reports directly to a dean, in this case the dean of libraries. For more information about D2C2, see the Purdue Centers website, http://dagon.admin.purdue.edu/cgi-bin/ci.cgi, and scroll down to Distributed Data Curation Center.

This paper will examine the reasons for creating the Distributed Data Curation Center (D2C2) at Purdue University, USA. It will also discuss the challenges of establishing the center and the progress it has made to date.