ABSTRACT
The Open Directory Project (ODP) is one of the largest collaborative efforts to manually annotate web pages. It involves over 65,000 editors and has produced metadata specifying topic and importance for more than 4 million web pages. Still, given that this number is only about 0.05 percent of the web pages indexed by Google, is this effort enough to make a difference? In this paper we discuss how these metadata can be exploited to achieve high-quality personalized web search. First, we introduce an additional criterion for web page ranking, namely the distance between a user profile defined using ODP topics and the sets of ODP topics covered by each URL returned in regular web search. We empirically show that this enhancement yields better results than current web search using Google. Then, in the second part of the paper, we investigate the boundaries of biasing PageRank on subtopics of the ODP in order to automatically extend these metadata to the whole web.
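To make the ranking criterion concrete, the following is a minimal sketch of re-ranking search results by the distance between a user profile and each URL's ODP topics. The abstract does not specify the distance measure, so the edge-counting tree distance, the minimum-over-pairs aggregation, and all function names here (`topic_distance`, `profile_distance`, `rerank`) are illustrative assumptions rather than the method used in the paper.

```python
from typing import List, Sequence, Tuple


def topic_distance(a: str, b: str) -> int:
    """Count tree edges between two ODP-style topic paths, e.g. 'Arts/Music/Jazz'.

    Assumption: plain edge counting through the deepest common ancestor.
    """
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def profile_distance(profile: Sequence[str], url_topics: Sequence[str]) -> float:
    """Smallest distance between any profile topic and any topic of the URL.

    URLs without ODP annotations get an infinite distance (assumption).
    """
    if not profile or not url_topics:
        return float("inf")
    return min(topic_distance(p, t) for p in profile for t in url_topics)


def rerank(results: List[Tuple[str, List[str]]], profile: Sequence[str]) -> List[str]:
    """Sort (url, topics) pairs by profile distance.

    sorted() is stable, so ties keep the search engine's original order.
    """
    ordered = sorted(results, key=lambda r: profile_distance(profile, r[1]))
    return [url for url, _ in ordered]
```

With a profile of `["Arts/Music"]`, a result annotated `Arts/Music/Jazz` would move ahead of one annotated `Science/Physics`, and unannotated URLs would fall to the end while keeping their relative order.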
Index Terms
- Using ODP metadata to personalize search