Evaluating the Markov assumption for web usage mining
Source: Workshop on Web Information and Data Management
Proceedings of the 5th ACM international workshop on Web information and data management
New Orleans, Louisiana, USA
SESSION: Web clustering and usage mining
Pages: 82 - 89  
Year of Publication: 2003
ISBN:1-58113-725-7
Authors
Søren Jespersen  Linkage Software
Torben Bach Pedersen  Aalborg University
Jesper Thorhauge  Conzentrate
Sponsors
ACM: Association for Computing Machinery
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/956699.956717

ABSTRACT

Web usage mining concerns the discovery of common browsing patterns, i.e., pages requested in sequence, from web logs. To cope with the enormous amounts of data, several aggregated structures based on statistical models of web surfing have appeared, e.g., the Hypertext Probabilistic Grammar (HPG) model [2]. These techniques typically rely on the Markov assumption with history depth n, i.e., it is assumed that the next requested page depends only on the last n pages visited. This assumption does not always hold, so false browsing patterns may be discovered. However, to our knowledge there has been no systematic study of the validity of the Markov assumption with respect to web usage mining and the resulting quality of the mined browsing patterns.

In this paper we systematically investigate the quality of browsing patterns mined from structures based on the Markov assumption. Formal measures of quality, based on the closeness of the mined patterns to the true traversal patterns, are defined, and an extensive experimental evaluation is performed on two substantial real-world data sets. The results indicate that a large number of rules must be considered to achieve high quality, that long rules are generally more distorted than shorter rules, and that the model yields knowledge of higher quality when applied to more random usage patterns. Thus we conclude that Markov-based structures for web usage mining are best suited for tasks demanding less accuracy, such as pre-fetching, personalization, and targeted ads.
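
To illustrate how the Markov assumption with history depth n can produce a false browsing pattern, here is a minimal Python sketch. It is not the HPG model from [2] and not the authors' implementation; it is a plain transition-count model, and the page names, session data, and the function names `build_markov_model` and `path_probability` are all hypothetical. Two sessions share page B, and with n = 1 the model assigns a nonzero probability to a path that no user actually followed.

```python
from collections import Counter, defaultdict


def build_markov_model(sessions, n=1):
    """Count transitions from each length-n history to the next page."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(n, len(session)):
            history = tuple(session[i - n:i])
            counts[history][session[i]] += 1
    return counts


def path_probability(counts, path, n=1):
    """Probability of continuing along `path` after its first n pages,
    as estimated by an order-n transition-count model."""
    prob = 1.0
    for i in range(n, len(path)):
        history = tuple(path[i - n:i])
        total = sum(counts[history].values())
        if total == 0:
            return 0.0  # this history was never observed
        prob *= counts[history][path[i]] / total
    return prob


# Hypothetical sessions: nobody ever browsed A -> B -> E.
sessions = [["A", "B", "C"], ["D", "B", "E"]]
false_path = ["A", "B", "E"]

model1 = build_markov_model(sessions, n=1)
model2 = build_markov_model(sessions, n=2)

p1 = path_probability(model1, false_path, n=1)  # depth 1 forgets how B was reached
p2 = path_probability(model2, false_path, n=2)  # depth 2 remembers the page before B

print(p1, p2)
```

With depth 1 the model only remembers that the user is on B, so the path A, B, E looks as likely as the real path A, B, C. With depth 2 the false path gets probability zero, at the cost of a larger model.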


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8] R. Cooley, P. Tan, and J. Srivastava. WebSIFT: the web site information filter system. In Proc. of WebKDD, pp. 163-182, 1999.
[9]
[10]
[11] R. Kimball and R. Merz. The Data Webhouse Toolkit. Wiley, 2000.
[12]
[13] S. Ross. A First Course in Probability. Prentice Hall, 1998.
[14] M. Spiliopoulou and L. C. Faulstich. WUM: a Web Utilization Miner. In Proc. of WebDB, pp. 184-203, 1998.
[15]

