DOI: 10.1145/1498759.1498823
research-article

Mining user web search activity with layered bayesian networks or how to capture a click in its context

Published: 09 February 2009

Abstract

Mining user web search activity has a broad range of potential applications, including web result pre-fetching, automatic search query reformulation, click spam detection, estimation of document relevance, and prediction of user satisfaction. The analysis is difficult because the data that search engines record while users interact with them, although abundant, is very noisy. In this work, we explore the utility of mining the search behavior of users, represented by observed variables such as the time the user spends on a page and whether the user reformulated his or her query. As a case study, we examine how much this data contributes to predicting the relevance of a document in the absence of document content models. To this end, we first propose a method for grouping the interactions of a particular user according to the different tasks he or she undertakes. Treating each task as a distinct information need, we then propose a Bayesian network that models these interactions holistically, with the aim of identifying distinct patterns of search behavior. Finally, we combine these patterns with a set of custom features and use gradient boosted decision trees to predict the relevance of a set of query-document pairs for which we have relevance assessments. The experimental results confirm the potential of our model: predicting document relevance from a model of the user's search and click behavior yields significant improvements in precision over a baseline that uses only click and query features, with no Bayesian network input.
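The abstract does not specify the task-grouping rule, the structure or parameters of the Bayesian network, or the custom features. Purely as an illustration of the general pipeline, the Python sketch below makes several hypothetical assumptions that are not the authors' method: an `Interaction` record, a task-splitting heuristic based on time gap and query-term overlap, and a toy network with one latent behavior-pattern node ("satisfied" or "struggling") and two observed children (a long-dwell flag and a reformulation flag) with hand-picked probabilities. It computes the pattern posterior by exact enumeration and averages it per task, yielding the kind of behavior-derived feature that could be passed to a gradient boosted relevance model.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One logged search interaction (hypothetical schema, not the paper's)."""
    timestamp: float      # seconds since the start of the log
    query: str
    dwell_seconds: float  # time spent on the clicked page (0 if no click)
    reformulated: bool    # whether the user's next query reformulated this one


def group_into_tasks(log, max_gap=1800.0, min_overlap=0.1):
    """Split one user's interactions into tasks.

    Assumed heuristic: open a new task only when the time gap exceeds
    max_gap seconds AND the Jaccard overlap of query terms with the
    previous interaction is below min_overlap.
    """
    tasks = []
    for item in sorted(log, key=lambda x: x.timestamp):
        if tasks:
            prev = tasks[-1][-1]
            a = set(prev.query.lower().split())
            b = set(item.query.lower().split())
            overlap = len(a & b) / len(a | b) if a | b else 0.0
            if item.timestamp - prev.timestamp <= max_gap or overlap >= min_overlap:
                tasks[-1].append(item)
                continue
        tasks.append([item])
    return tasks


# Toy Bayesian network: latent Pattern -> LongDwell, Pattern -> Reformulated.
# All probabilities below are made-up illustration values, not learned ones.
PRIOR = {"satisfied": 0.6, "struggling": 0.4}
P_LONG_DWELL = {"satisfied": 0.8, "struggling": 0.3}
P_REFORMULATE = {"satisfied": 0.2, "struggling": 0.7}


def pattern_posterior(item, dwell_threshold=30.0):
    """P(Pattern | observations) by exact enumeration over the latent node."""
    long_dwell = item.dwell_seconds >= dwell_threshold
    joint = {}
    for pattern, prior in PRIOR.items():
        p_dwell = P_LONG_DWELL[pattern] if long_dwell else 1.0 - P_LONG_DWELL[pattern]
        p_ref = P_REFORMULATE[pattern] if item.reformulated else 1.0 - P_REFORMULATE[pattern]
        joint[pattern] = prior * p_dwell * p_ref
    total = sum(joint.values())
    return {pattern: value / total for pattern, value in joint.items()}


def task_features(task):
    """Aggregate per-interaction posteriors into task-level features."""
    probs = [pattern_posterior(item)["satisfied"] for item in task]
    return {"n_interactions": len(task), "mean_p_satisfied": sum(probs) / len(probs)}


log = [
    Interaction(0.0, "cheap flights paris", 5.0, True),
    Interaction(40.0, "cheap flights paris june", 95.0, False),
    Interaction(9000.0, "python dataclass tutorial", 120.0, False),
]
tasks = group_into_tasks(log)
print([len(t) for t in tasks])  # [2, 1]
for task in tasks:
    print(task_features(task))
```

In this toy log, the first two queries are 40 seconds apart and share terms, so they form one task; the third, issued hours later with no shared terms, starts a new one. A real system would estimate the network's parameters from logged data (for example with EM, since the pattern node is never observed) rather than fixing them by hand.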

    Information

    Published In

    WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining
    February 2009
    314 pages
    ISBN:9781605583907
    DOI:10.1145/1498759
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. click-through data
    2. query log analysis
    3. relevance prediction
    4. user modelling
    5. web retrieval

    Conference

    WSDM'09

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 26
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 16 Feb 2025

    Cited By

    • (2023) Formally Modeling Users in Information Retrieval. In A Behavioral Economics Approach to Interactive Information Retrieval, pp. 23-64. DOI: 10.1007/978-3-031-23229-9_2. Online publication date: 18 Feb 2023.
    • (2022) Electroencephalography and Self-assessment Evaluation of Engagement with Online Exhibitions: Case Study of Google Arts and Culture. In Culture and Computing, pp. 316-331. DOI: 10.1007/978-3-031-05434-1_21. Online publication date: 16 Jun 2022.
    • (2021) Does More Context Help? Effects of Context Window and Application Source on Retrieval Performance. ACM Transactions on Information Systems 40(2), pp. 1-40. DOI: 10.1145/3474055. Online publication date: 27 Sep 2021.
    • (2020) The Curious Case of Session Identification. In Experimental IR Meets Multilinguality, Multimodality, and Interaction, pp. 69-74. DOI: 10.1007/978-3-030-58219-7_6. Online publication date: 22 Sep 2020.
    • (2018) Social Search. In Social Information Access, pp. 213-276. DOI: 10.1007/978-3-319-90092-6_7. Online publication date: 3 May 2018.
    • (2017) A data-intensive approach for discovering user similarities in social behavioral interactions based on the bayesian network. Neurocomputing 219(C), pp. 364-375. DOI: 10.1016/j.neucom.2016.09.042. Online publication date: 5 Jan 2017.
    • (2016) Modeling clicks using document popularity. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, pp. 1021-1026. DOI: 10.1145/2851613.2851691. Online publication date: 4 Apr 2016.
    • (2014) A Top-N Recommender Model with Partially Predefined Structure. In Proceedings of the 2014 IEEE 11th International Conference on e-Business Engineering, pp. 112-119. DOI: 10.1109/ICEBE.2014.29. Online publication date: 5 Nov 2014.
    • (2014) Your Search Behavior and Your Personality. In Pervasive Computing and the Networked World, pp. 459-470. DOI: 10.1007/978-3-319-09265-2_47. Online publication date: 2014.
    • (2013) Mining search and browse logs for web search. ACM Transactions on Intelligent Systems and Technology 4(4), pp. 1-37. DOI: 10.1145/2508037.2508038. Online publication date: 8 Oct 2013.
