ABSTRACT
In 2008, the National Science Foundation released the DataNet solicitation, which presented an ambitious vision for a comprehensive data curation cyberinfrastructure in support of fourth-paradigm science. The program subsequently funded two projects, DataONE and the Data Conservancy. The authors put forth an uncertainty framework for understanding the larger socio-cultural issues that influence the progress of DataNet projects and of cyberinfrastructure projects in general. This framework highlights the key technical, organizational, scientific, and institutional contexts that the projects must consider as they mature.
A research agenda for data curation cyberinfrastructure