DOI: 10.1145/1367497.1367617
research-article

Characterizing typical and atypical user sessions in clickstreams

Published: 21 April 2008

ABSTRACT

Millions of users retrieve information from the Internet using search engines. Mining these user sessions can provide valuable information about the quality of user experience and the perceived quality of search results. Search engines often rely on accurate estimates of click-through rate (CTR) to evaluate the quality of user experience. The vast heterogeneity of the user population and the presence of automated software programs (bots) can result in high variance in CTR estimates. To improve the estimation accuracy of user experience metrics such as CTR, we argue that it is important to identify typical and atypical user sessions in clickstreams. Our approach to identifying these sessions is based on detecting outliers using Mahalanobis distance in the user session space. Our user session model incorporates several key clickstream characteristics, including a novel conformance score obtained by Markov chain analysis. Editorial results show that our approach to identifying typical and atypical sessions has a precision of about 89%. Filtering out the atypical sessions reduces the uncertainty (95% confidence interval) of the mean CTR by about 40%. These results demonstrate that our approach to identifying typical and atypical user sessions is extremely valuable for cleaning "noisy" user session data and thereby increasing accuracy in evaluating user experience.
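
The abstract names three ingredients: a Markov chain conformance score, Mahalanobis-distance outlier detection, and the 95% confidence interval of the mean CTR. The following Python sketch illustrates how such pieces might fit together. It is not the paper's implementation: the abstract does not define the conformance score, so the mean log transition probability used here is an assumption, as are the function names, the additive smoothing, the normal-approximation interval (z = 1.96), and the distance cutoff passed to `is_typical`.

```python
import math
from collections import Counter, defaultdict

import numpy as np


def fit_transitions(sessions, smoothing=1.0):
    """Estimate first-order Markov transition probabilities from event sequences."""
    states = sorted({event for seq in sessions for event in seq})
    counts = defaultdict(Counter)
    for seq in sessions:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a in states:
        # Additive smoothing keeps unseen transitions at a nonzero probability.
        total = sum(counts[a].values()) + smoothing * len(states)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in states}
    return probs


def conformance(seq, probs):
    """Assumed score: mean log-probability of a session's transitions.

    Every event in `seq` must have appeared in the training sessions.
    """
    steps = list(zip(seq, seq[1:]))
    if not steps:
        return 0.0
    return sum(math.log(probs[a][b]) for a, b in steps) / len(steps)


def mahalanobis_distances(X):
    """Mahalanobis distance of each row of X from the sample mean."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # The pseudo-inverse tolerates a singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(np.maximum(d2, 0.0))


def is_typical(X, threshold):
    """Boolean mask: True where a session lies within the distance cutoff."""
    return mahalanobis_distances(X) <= threshold


def ci_half_width(values, z=1.96):
    """Half-width of a normal-approximation confidence interval for the mean."""
    values = np.asarray(values, dtype=float)
    return z * values.std(ddof=1) / math.sqrt(len(values))


# Toy data (hypothetical): "Q" = query event, "C" = click event.
training = [["Q", "C", "Q", "C"], ["Q", "C"], ["Q", "C", "C"], ["Q", "Q", "C"]]
probs = fit_transitions(training)
print(conformance(["Q", "C", "Q", "C"], probs))       # alternating query/click
print(conformance(["Q", "Q", "Q", "Q", "Q"], probs))  # repeated queries, no clicks

# Each row is one session: (number of clicks, number of queries).
features = [[2, 1], [3, 1], [2, 2], [3, 2], [4, 2],
            [3, 3], [2, 1], [4, 3], [3, 2], [200, 1]]
ctr = np.array([0.5, 0.33, 0.5, 0.67, 0.5, 0.33, 0.5, 0.5, 0.67, 0.0])

mask = is_typical(features, threshold=2.5)  # 2.5 is an arbitrary cutoff
print(ci_half_width(ctr), ci_half_width(ctr[mask]))
```

In a fuller pipeline the conformance score would be one column of the feature matrix alongside the other clickstream characteristics, so that a single Mahalanobis distance summarizes how far a session sits from the bulk of the population. Because the sample mean and covariance are themselves influenced by outliers, a robust covariance estimate may be preferable on real data.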


            Published in

            WWW '08: Proceedings of the 17th international conference on World Wide Web
            April 2008
            1326 pages
            ISBN: 9781605580852
            DOI: 10.1145/1367497

            Copyright © 2008 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Acceptance Rates

            Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
