DOI: 10.1145/1076034.1076067
Article

Using ODP metadata to personalize search

Published: 15 August 2005

Abstract

The Open Directory Project is clearly one of the largest collaborative efforts to manually annotate web pages. This effort involves over 65,000 editors and resulted in metadata specifying topic and importance for more than 4 million web pages. Still, given that this number is just about 0.05 percent of the Web pages indexed by Google, is this effort enough to make a difference? In this paper we discuss how these metadata can be exploited to achieve high quality personalized web search. First, we address this by introducing an additional criterion for web page ranking, namely the distance between a user profile defined using ODP topics and the sets of ODP topics covered by each URL returned in regular web search. We empirically show that this enhancement yields better results than current web search using Google. Then, in the second part of the paper, we investigate the boundaries of biasing PageRank on subtopics of the ODP in order to automatically extend these metadata to the whole web.
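
The re-ranking criterion described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the distance measure, so the code below assumes the simplest candidate: the number of edges between two topics in the ODP topic tree, with a URL scored by its closest topic to any topic in the user profile. The function names, the `Top/...` path notation, and the example URLs are all hypothetical and only serve the illustration; this is not the authors' implementation.

```python
from typing import Dict, List, Sequence


def tree_distance(topic_a: str, topic_b: str) -> int:
    """Count the edges between two topic paths in a topic tree.

    Topics are written as slash-separated paths (an assumed notation).
    """
    a, b = topic_a.split("/"), topic_b.split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Walk up from each topic to the deepest shared ancestor.
    return (len(a) - common) + (len(b) - common)


def profile_distance(profile: Sequence[str], url_topics: Sequence[str]) -> float:
    """Smallest tree distance between any profile topic and any URL topic.

    Taking the minimum is an assumption; URLs without topics get infinity.
    """
    if not profile or not url_topics:
        return float("inf")
    return min(tree_distance(p, t) for p in profile for t in url_topics)


def rerank(
    results: List[str],
    topics_by_url: Dict[str, List[str]],
    profile: Sequence[str],
) -> List[str]:
    """Order results by distance to the profile.

    sorted() is stable, so ties keep the search engine's original order.
    """
    return sorted(
        results,
        key=lambda url: profile_distance(profile, topics_by_url.get(url, [])),
    )


# Hypothetical data: one profile topic and three search results.
profile = ["Top/Computers/Software/Databases"]
topics_by_url = {
    "a.example": ["Top/Arts/Music"],
    "b.example": ["Top/Computers/Software"],
    "c.example": [],  # not annotated in the directory
}
results = ["a.example", "b.example", "c.example"]
reranked = rerank(results, topics_by_url, profile)
```

In this sketch, results with no directory annotation sort last and keep their original relative order, which is one simple way to handle the fact that only a small fraction of web pages carry ODP metadata.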

Published In

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN: 1595930345
DOI: 10.1145/1076034
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. biased PageRank
  2. metadata
  3. open directory
  4. personalized search

Qualifiers

  • Article

Conference

SIGIR05
Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

  • (2023) Personalized and Diversified: Ranking Search Results in an Integrated Way. ACM Transactions on Information Systems 42:3 (1-25). DOI: 10.1145/3631989. Online publication date: 9-Nov-2023
  • (2023) Can Automated Metadata Extraction Make Scientific Data More Navigable? 2023 IEEE 19th International Conference on e-Science (e-Science) (1-10). DOI: 10.1109/e-Science58273.2023.10254801. Online publication date: 9-Oct-2023
  • (2021) Learning a Fine-Grained Review-based Transformer Model for Personalized Product Search. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (123-132). DOI: 10.1145/3404835.3462911. Online publication date: 11-Jul-2021
  • (2021) OpenMatch: An Open Source Library for Neu-IR Research. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (2531-2535). DOI: 10.1145/3404835.3462789. Online publication date: 11-Jul-2021
  • (2021) Analysis of User Generated Content Based on a Recommender System and Augmented Reality. Telematics and Computing (207-228). DOI: 10.1007/978-3-030-89586-0_17. Online publication date: 1-Nov-2021
  • (2020) Group-based Personalization Using Topical User Profile. Adjunct Publication of the 28th ACM Conference on User Modeling, Adaptation and Personalization (181-186). DOI: 10.1145/3386392.3399559. Online publication date: 14-Jul-2020
  • (2020) Personalized Entity Search by Sparse and Scrutable User Profiles. Proceedings of the 2020 Conference on Human Information Interaction and Retrieval (427-431). DOI: 10.1145/3343413.3378011. Online publication date: 14-Mar-2020
  • (2020) LARQ: Learning to Ask and Rewrite Questions for Community Question Answering. Natural Language Processing and Chinese Computing (318-330). DOI: 10.1007/978-3-030-60457-8_26. Online publication date: 2-Oct-2020
  • (2020) Personalization in text information retrieval. Journal of the Association for Information Science and Technology 71:3 (349-369). DOI: 10.1002/asi.24234. Online publication date: 28-Jan-2020
  • (2019) A Zero Attention Model for Personalized Product Search. Proceedings of the 28th ACM International Conference on Information and Knowledge Management (379-388). DOI: 10.1145/3357384.3357980. Online publication date: 3-Nov-2019
