ABSTRACT
The rapid dissemination of new research results on the World Wide Web poses new challenges for search engines. In this paper we describe a new approach to finding scientific papers relevant to a predefined research area. Unlike other approaches, we do not search for web pages that contain certain keywords; instead, we search for web pages created by scientists who are active in the research area under consideration. The names of these scientists are obtained from the DBLP server [9]. The HomePageSearch system finds the home pages belonging to these names, and Mops finds research papers close to those home pages, builds an index of the papers, and makes it accessible on the web. We conclude that such focused crawling is very effective for building high-quality collections and indices of scientific papers using ordinary desktop hardware.
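The idea of collecting papers "close to" an author's home page can be illustrated with a small sketch. This is not the Mops implementation: the abstract does not say how closeness is measured, so the sketch assumes it means a bounded number of link hops, and it assumes papers can be recognized by file suffix. The function name `find_papers`, the suffix list, the example URLs, and the in-memory link graph (used here in place of real HTTP fetching) are all hypothetical.

```python
from collections import deque

# Assumption: papers are recognized purely by file suffix.
PAPER_SUFFIXES = (".ps", ".ps.gz", ".pdf")


def find_papers(home_pages, links, max_depth=2):
    """Breadth-first crawl from each home page, following at most
    max_depth links, and collect URLs that look like papers.

    home_pages: dict mapping author name -> home page URL
    links:      dict mapping page URL -> list of URLs it links to
                (stands in for fetching and parsing real pages)
    """
    papers = {}
    for author, start in home_pages.items():
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            url, depth = queue.popleft()
            if url.lower().endswith(PAPER_SUFFIXES):
                found.append(url)      # a paper: record it, do not expand
                continue
            if depth == max_depth:
                continue               # too far from the home page
            for target in links.get(url, []):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
        papers[author] = sorted(found)
    return papers


# Toy data: one author whose publication list is one hop from the home page.
HOME_PAGES = {"A. Author": "http://example.org/~author/"}
LINKS = {
    "http://example.org/~author/": [
        "http://example.org/~author/pubs.html",
        "http://example.org/news.html",
    ],
    "http://example.org/~author/pubs.html": [
        "http://example.org/~author/p1.pdf",
        "http://example.org/~author/p2.ps.gz",
    ],
    "http://example.org/news.html": ["http://example.org/far/a.html"],
    "http://example.org/far/a.html": ["http://example.org/far/deep.html"],
    "http://example.org/far/deep.html": ["http://example.org/far/too_deep.pdf"],
}

if __name__ == "__main__":
    for author, urls in find_papers(HOME_PAGES, LINKS).items():
        print(author, urls)
```

With the default depth limit of 2, the two papers linked from the publication page are collected, while the document four hops away is ignored. In a real crawler the `links` dictionary would be replaced by page fetching and link extraction, and the author names would come from a bibliography such as DBLP.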
References
- 1. Alf-Christian Achilles. The Collection of Computer Science Bibliographies. http://liinwww.ira.uka.de/bibliography/.
- 2. ACM. The ACM Research Repository. http://www.acm.org/repository/.
- 3. Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley-Longman, May 1999.
- 4. Andrew Birrell and Paul McJones. pstotext. http://www.research.digital.com/SRC/virtualpaper/pstotext.html.
- 5. Soumen Chakrabarti, Martin van den Berg, and Byron Dom. Focused crawling: A new approach to topic-specific Web resource discovery. In Proceedings of the Eighth World-Wide Web Conference, 1999.
- 6. IBM. DB2 Product Family. http://www.software.ibm.com/data/db2/.
- 7. David M. Jones. The Hypertext Bibliography Project. http://theory.lcs.mit.edu/~dmjones/hbp/.
- 8. Steve Lawrence, C. Lee Giles, and Kurt Bollacker. Digital libraries and autonomous citation indexing. IEEE Computer, 32(6):67-71, 1999. http://citeseer.nj.nec.com/cs/.
- 9. Michael Ley. DBLP Computer Science Bibliography. http://dblp.uni-trier.de/.
- 10. Y. H. Li and Anil K. Jain. Classification of text documents. The Computer Journal, 41(8):537-546, 1998.
- 11. New Zealand Digital Library. Computer Science Technical Reports. http://www.nzdl.org/.
- 12. Udi Manber and Sun Wu. GLIMPSE: A tool to search through entire file systems. In Usenix Winter 1994 Technical Conference, pages 23-32, 1994.
- 13. Andrew McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore. Automating the construction of internet portals with machine learning. Information Retrieval, 3(2):127-163, 2000. http://www.cora.jprc.com/.
- 14. Thomas M. Mitchell. Machine Learning. McGraw-Hill, 1997.
- 15. NCSTRL. Networked Computer Science Technical Library. http://www.ncstrl.org/.
- 16. Jason Rennie and Andrew McCallum. Using reinforcement learning to spider the Web efficiently. In Proceedings of the Sixteenth International Conference on Machine Learning, 1999.
- 17. Jonathan Shakes, Marc Langheinrich, and Oren Etzioni. Dynamic reference sifting: A case study in the homepage domain. In Proceedings of the Sixth International World Wide Web Conference, 1997. http://ahoy.cs.washington.edu:6060/.
Index Terms
- Finding scientific papers with homepagesearch and MOPS