ABSTRACT
Millions of users retrieve information from the Internet using search engines. Mining these user sessions can provide valuable information about the quality of user experience and the perceived quality of search results. Search engines often rely on accurate estimates of Click Through Rate (CTR) to evaluate the quality of user experience. The vast heterogeneity of the user population and the presence of automated software programs (bots) can result in high variance in the estimates of CTR. To improve the estimation accuracy of user experience metrics like CTR, we argue that it is important to identify typical and atypical user sessions in clickstreams. Our approach to identifying these sessions is based on detecting outliers using Mahalanobis distance in the user session space. Our user session model incorporates several key clickstream characteristics, including a novel conformance score obtained by Markov Chain analysis. Editorial results show that our approach to identifying typical and atypical sessions has a precision of about 89%. Filtering out the atypical sessions reduces the uncertainty (95% confidence interval) of the mean CTR by about 40%. These results demonstrate that identifying typical and atypical user sessions is extremely valuable for cleaning "noisy" user session data and thereby increasing accuracy in evaluating user experience.
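The two ingredients named in the abstract, a Markov-chain conformance score and Mahalanobis-distance outlier detection, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the conformance score or list the session features, so the sketch assumes the score is the mean log-probability of a session's transitions under a first-order Markov chain with add-one smoothing, and it restricts the session space to two features so the covariance matrix can be inverted in closed form. All function names and the event labels are hypothetical.

```python
import math
from collections import defaultdict


def fit_transitions(sessions, smoothing=1.0):
    """Estimate first-order Markov transition probabilities from sessions.

    Each session is a list of event labels. Additive smoothing (an
    assumption of this sketch) keeps unseen transitions at non-zero
    probability.
    """
    states = sorted({event for session in sessions for event in session})
    counts = defaultdict(lambda: defaultdict(float))
    for session in sessions:
        for a, b in zip(session, session[1:]):
            counts[a][b] += 1.0
    probs = {}
    for a in states:
        total = sum(counts[a].values()) + smoothing * len(states)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in states}
    return probs


def conformance_score(session, probs):
    """Mean log-probability of the session's transitions.

    Hypothetical definition: higher (closer to 0) means the session looks
    more like the training sessions. Every event must have been seen when
    fitting `probs`, otherwise a KeyError is raised.
    """
    steps = list(zip(session, session[1:]))
    if not steps:
        return 0.0
    return sum(math.log(probs[a][b]) for a, b in steps) / len(steps)


def mahalanobis_2d(points):
    """Mahalanobis distance of each 2-D point from the sample mean.

    Uses the sample covariance (n - 1 denominator) and the closed-form
    inverse of a 2x2 matrix. Requires a non-singular covariance matrix.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances
```

In use, each session would be mapped to a feature vector (for example its length and its conformance score), and sessions whose Mahalanobis distance exceeds a chosen cutoff would be flagged as atypical. The choice of cutoff, like the feature set, is left open here because the abstract does not specify it.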