Abstract
This article presents a high-level discussion of some problems in information retrieval that are unique to web search engines. The goal is to raise awareness and stimulate research in these areas.