research-article

Preprocessing the web server logs: an illustrative approach for effective usage mining

Published: 16 May 2012

Abstract

Data preprocessing is an essential step in discovering behavioral patterns from web logs. Analyzing web logs helps system administrators ensure adequate bandwidth and maintain server capacity for their business websites. A web log file records user activity over a period of time and offers valuable insight into how a website is actually used: it provides an account of usage in a regular working system, as opposed to the artificial setting of a usability lab. This paper focuses on the preprocessing techniques implemented in a specially designed Web Sift (WebIS) tool on an IIS web server, and also proposes some efficient heuristics and techniques.
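
The abstract does not spell out the preprocessing steps, so the following is only a minimal sketch of what web log preprocessing commonly involves: parsing, cleaning, and session identification. It is not the WebIS tool or the authors' heuristics. It assumes the W3C extended log format that IIS can write, where a "#Fields:" directive names the columns. The suffix list, the robot markers, the 30-minute session timeout, and the sample log lines are all illustrative assumptions made up for this example.

```python
from datetime import datetime, timedelta

# Assumed cleaning rules (illustrative only, not taken from the paper).
STATIC_SUFFIXES = (".css", ".js", ".gif", ".jpg", ".png", ".ico")
ROBOT_MARKERS = ("bot", "crawler", "spider")
SESSION_TIMEOUT = timedelta(minutes=30)

# Made-up sample data in W3C extended log format.
SAMPLE_LOG = """#Software: Microsoft Internet Information Services 7.5
#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(User-Agent)
2012-05-16 10:00:00 10.0.0.1 GET /index.aspx 200 Mozilla/5.0
2012-05-16 10:00:01 10.0.0.1 GET /style.css 200 Mozilla/5.0
2012-05-16 10:05:00 10.0.0.1 GET /products.aspx 200 Mozilla/5.0
2012-05-16 10:06:00 10.0.0.2 GET /index.aspx 200 Googlebot/2.1
2012-05-16 11:00:00 10.0.0.1 GET /contact.aspx 200 Mozilla/5.0
2012-05-16 11:01:00 10.0.0.1 GET /missing.aspx 404 Mozilla/5.0
"""


def parse_w3c(lines):
    """Yield one dict per request, using the #Fields directive as the header."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split()
            if len(values) == len(fields):  # skip malformed rows
                yield dict(zip(fields, values))


def clean(records):
    """Keep successful (2xx) page requests made by non-robot clients."""
    for r in records:
        uri = r["cs-uri-stem"].lower()
        agent = r.get("cs(User-Agent)", "").lower()
        if uri.endswith(STATIC_SUFFIXES):
            continue  # embedded resources, not page views
        if any(marker in agent for marker in ROBOT_MARKERS):
            continue  # crude robot detection by user-agent substring
        if not r["sc-status"].startswith("2"):
            continue  # failed or redirected requests
        yield r


def sessionize(records):
    """Group page requests per (IP, user agent); start a new session after
    an idle gap longer than SESSION_TIMEOUT. Assumes time-ordered input."""
    sessions = []
    current = {}    # user key -> index of that user's open session
    last_seen = {}  # user key -> timestamp of the previous request
    for r in records:
        ts = datetime.strptime(r["date"] + " " + r["time"], "%Y-%m-%d %H:%M:%S")
        key = (r["c-ip"], r.get("cs(User-Agent)", ""))
        if key not in last_seen or ts - last_seen[key] > SESSION_TIMEOUT:
            current[key] = len(sessions)
            sessions.append([])
        sessions[current[key]].append(r["cs-uri-stem"])
        last_seen[key] = ts
    return sessions


sessions = sessionize(clean(parse_w3c(SAMPLE_LOG.splitlines())))
print(sessions)
```

On the sample data, the stylesheet request, the robot request, and the 404 are dropped, and the 55-minute gap before the last page request opens a second session. Identifying users by IP address plus user agent is a common simplification. It breaks down behind proxies and shared addresses, which is one reason real tools add further heuristics.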

    Published In

    ACM SIGSOFT Software Engineering Notes  Volume 37, Issue 3
    May 2012
    129 pages
    ISSN:0163-5948
    DOI:10.1145/2180921

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 16 May 2012
    Published in SIGSOFT Volume 37, Issue 3

    Author Tags

    1. clustering
    2. pattern discovery
    3. patterns summary
    4. preprocessing
    5. sequential patterns
    6. web usage mining

    Qualifiers

    • Research-article

    Cited By

    • (2018) Development of Weblog Pre-Processing System: A Parallel Approach. 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI), pp. 132-135. DOI: 10.1109/ICOEI.2018.8553715. Online publication date: May 2018.
    • (2016) LogMine. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1573-1582. DOI: 10.1145/2983323.2983358. Online publication date: 24 Oct 2016.
    • (2014) Who and what links to the Internet Archive. International Journal on Digital Libraries 14:3-4, pp. 101-115. DOI: 10.1007/s00799-014-0111-5. Online publication date: 1 Aug 2014.
    • (2013) Access patterns for robots and humans in web archives. Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries, pp. 339-348. DOI: 10.1145/2467696.2467722. Online publication date: 22 Jul 2013.
