Building a research library for the history of the web
Source: International Conference on Digital Libraries
Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries
Chapel Hill, NC, USA
SESSION: Digital preservation
Pages: 95-102
Year of Publication: 2006
ISBN: 1-59593-354-9
Authors
William Y. Arms, Cornell University, Ithaca, NY
Selcuk Aya, Cornell University, Ithaca, NY
Pavel Dmitriev, Cornell University, Ithaca, NY
Blazej J. Kot, Cornell University, Ithaca, NY
Ruth Mitchell, Cornell University, Ithaca, NY
Lucia Walle, Cornell University, Ithaca, NY
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1141753.1141771

ABSTRACT

This paper describes the building of a research library for studying the Web, especially research on how the structure and content of the Web change over time. The library is particularly aimed at supporting social scientists for whom the Web is both a fascinating social phenomenon and a mirror on society. The library is built on the collections of the Internet Archive, which has been preserving a crawl of the Web every two months since 1996. The technical challenges in organizing this data for research fall into two categories: high-performance computing to transfer and manage the very large amounts of data, and human-computer interfaces that empower research by non-computer specialists.

