DOI: 10.1145/1816123.1816173
research-article

Digital libraries for scientific data discovery and reuse: from vision to practical reality

Published: 21 June 2010

Abstract

Science and technology research is becoming not only more distributed and collaborative, but more highly instrumented. Digital libraries provide a means to capture, manage, and access the data deluge that results from these research enterprises. We have conducted research on data practices and participated in developing data management services for the Center for Embedded Networked Sensing since its founding in 2002 as a National Science Foundation Science and Technology Center. Over the course of eight years, our digital library strategy has shifted dramatically in response to changing technologies, practices, and policies. We report on the development of several DL systems and on the lessons learned, which include the difficulty of anticipating data requirements from nascent technologies, building systems for highly diverse work practices and data types, the need to bind together multiple single-purpose systems, the lack of incentives to manage and share data, the complementary nature of research and development in understanding practices, and sustainability.

Published In

JCDL '10: Proceedings of the 10th annual joint conference on Digital libraries
June 2010
424 pages
ISBN:9781450300858
DOI:10.1145/1816123
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. collaborative research
  2. cyberinfrastructure
  3. data deluge
  4. distributed research
  5. escience

Conference

JCDL10: Joint Conference on Digital Libraries
June 21-25, 2010
Gold Coast, Queensland, Australia

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%

Cited By

  • (2021) Enriching the metadata of map images: a deep learning approach with GIS-based data augmentation. International Journal of Geographical Information Science, 36:4 (799-821). DOI: 10.1080/13658816.2021.1968407. Online publication date: 23-Aug-2021
  • (2020) Bibliographie. Qu’est-ce que le travail scientifique des données ? (349-411). DOI: 10.4000/books.oep.14792. Online publication date: 18-Dec-2020
  • (2019) E-RPID PEARC 2019. Practice and Experience in Advanced Research Computing 2019: Rise of the Machines (learning) (1-4). DOI: 10.1145/3332186.3333255. Online publication date: 28-Jul-2019
  • (2017) Uncertainty about the long-term. Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries (257-260). DOI: 10.5555/3200334.3200366. Online publication date: 19-Jun-2017
  • (2017) Uncertainty about the Long-Term: Digital Libraries, Astronomy Data, and Open Source Software. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (1-4). DOI: 10.1109/JCDL.2017.7991584. Online publication date: Jun-2017
  • (2015) Ship space to database: Motivations to manage research data for the deep subseafloor biosphere. Proceedings of the American Society for Information Science and Technology, 51:1 (1-10). DOI: 10.1002/meet.2014.14505101056. Online publication date: 24-Apr-2015
  • (2015) Data Practices and Curation Vocabulary DPCVocab. Journal of the Association for Information Science and Technology, 66:3 (616-633). DOI: 10.1002/asi.23184. Online publication date: 1-Mar-2015
  • (2014) The ups and downs of knowledge infrastructures in science. Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries (257-266). DOI: 10.5555/2740769.2740814. Online publication date: 8-Sep-2014
  • (2014) Building Digital Collections Using Open Source Digital Repository Software. International Journal of Digital Library Systems, 4:1 (10-24). DOI: 10.4018/ijdls.2014010102. Online publication date: 1-Jan-2014
  • (2014) The ups and downs of knowledge infrastructures in science: Implications for data management. IEEE/ACM Joint Conference on Digital Libraries (257-266). DOI: 10.1109/JCDL.2014.6970177. Online publication date: Sep-2014
