DOI: 10.1145/1076034.1076067
Article

Using ODP metadata to personalize search

Published: 15 August 2005

ABSTRACT

The Open Directory Project (ODP) is one of the largest collaborative efforts to manually annotate web pages. It involves over 65,000 editors and has produced metadata specifying the topic and importance of more than 4 million web pages. Still, since this is only about 0.05 percent of the web pages indexed by Google, is the effort enough to make a difference? In this paper we discuss how these metadata can be exploited to achieve high-quality personalized web search. First, we introduce an additional criterion for ranking web pages: the distance between a user profile, defined using ODP topics, and the set of ODP topics covered by each URL returned by a regular web search. We show empirically that this enhancement yields better results than current Google web search. Then, in the second part of the paper, we investigate the limits of biasing PageRank on ODP subtopics in order to automatically extend these metadata to the whole web.
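
The first criterion, ranking results by how far their ODP topics lie from a user's ODP-topic profile, can be illustrated with a minimal Python sketch. It assumes that topics are ODP category paths such as `Top/Computers/Internet/Searching`, that the distance between two topics is the number of edges separating them in the category tree, and that a URL's distance to the profile is the smallest such distance over all topic pairs. The function names, the aggregation rule, and the tie-breaking rule are hypothetical illustrations, not the paper's exact method.

```python
from typing import Dict, Iterable, List


def topic_path(topic: str) -> List[str]:
    """Split an ODP category path such as 'Top/Computers/Internet' into its parts."""
    return [part for part in topic.split("/") if part]


def tree_distance(a: str, b: str) -> int:
    """Edges between two ODP topics in the category tree, counted through
    their deepest common ancestor (0 when the topics are identical)."""
    pa, pb = topic_path(a), topic_path(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def profile_distance(profile: Iterable[str], url_topics: Iterable[str]) -> float:
    """Assumed aggregation: smallest tree distance between any profile topic
    and any topic the URL is listed under; infinity if either set is empty."""
    topics = list(url_topics)  # materialize so it can be scanned once per profile topic
    dists = [tree_distance(p, t) for p in profile for t in topics]
    return min(dists) if dists else float("inf")


def rerank(results: List[str], profile: Iterable[str],
           odp_topics: Dict[str, List[str]]) -> List[str]:
    """Hypothetical re-ranking: sort by profile distance, breaking ties by the
    URL's position in the original result list."""
    profile = list(profile)  # reused for every URL, so it must not be a one-shot iterator
    scored = [
        (profile_distance(profile, odp_topics.get(url, [])), pos, url)
        for pos, url in enumerate(results)
    ]
    return [url for _, _, url in sorted(scored)]
```

For example, with a profile of `["Top/Computers/Internet/Searching"]`, a result listed under `Top/Computers/Internet/Searching/Search_Engines` (distance 1) moves ahead of one listed under `Top/Arts/Music` (distance 5), and results with no ODP entry keep their original relative order at the end. Sorting by distance first lets the profile dominate; a weighted combination with the original rank would be a softer alternative.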

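Biasing PageRank on an ODP subtopic can be sketched as personalized (topic-sensitive) PageRank in which the random surfer teleports only to pages listed under that subtopic. The Python sketch below assumes an in-memory link graph, a damping factor of 0.85, and uniform teleportation over the seed pages. `assign_topic`, which labels a page with the subtopic under which it scores highest, is a hypothetical illustration of how such scores might extend the metadata, not the procedure evaluated in the paper.

```python
def biased_pagerank(links, seeds, damping=0.85, iters=200, tol=1e-10):
    """Personalized PageRank by power iteration.

    links: dict mapping each page to its list of out-neighbours; every
           out-neighbour must also appear as a key.
    seeds: pages listed under the chosen ODP subtopic; the random surfer
           teleports only to these, uniformly (an assumption of this sketch).
    """
    nodes = list(links)
    seed_set = set(seeds) & set(nodes)
    if not seed_set:
        raise ValueError("none of the seed pages is in the graph")
    teleport = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {n: (1 - damping) * teleport[n] for n in nodes}
        dangling = 0.0
        for n in nodes:
            out = links[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    new[m] += share
            else:
                dangling += damping * rank[n]
        # Rank held by pages without out-links is sent back to the seed pages,
        # so the scores keep summing to 1.
        for n in nodes:
            new[n] += dangling * teleport[n]
        diff = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if diff < tol:
            break
    return rank


def assign_topic(scores_by_topic, page):
    """Hypothetical labelling rule: the topic whose biased PageRank
    gives `page` the highest score."""
    return max(scores_by_topic, key=lambda topic: scores_by_topic[topic].get(page, 0.0))
```

Running `biased_pagerank` once per subtopic, with that subtopic's ODP-listed URLs as `seeds`, yields one score vector per topic. Raw scores computed from seed sets of very different sizes are not necessarily comparable, so a real labelling rule would likely need some normalization.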
Published in

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN: 1595930345
DOI: 10.1145/1076034

Copyright © 2005 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%
