DOI: 10.1145/1810617.1810685
Poster

Automatic extraction of structure, content and usage data statistics of web sites

Published: 13 June 2010

Abstract

In this paper we present a web mining tool that automatically extracts structure, content and usage data statistics from web sites. The work is motivated by the fact that web mining has three axes (web structure mining, web content mining and web usage mining), which use structure, content and usage data, respectively. Our aim is to use the multi-threaded web crawler we developed to automatically extract from web pages the data associated with each of these three axes, so that several useful descriptive statistics can then be computed and advanced mathematical and statistical methods applied. We describe the system and present some experimental results.
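The abstract does not describe how the crawler is built. As a rough illustration only, the Python sketch below shows one way a multi-threaded crawler could collect structure data (out-links per page) and content data (visible words per page) from a single site, then summarize them with descriptive statistics. Everything in it is an assumption and not the authors' implementation: the names `PageParser`, `PageStats`, `analyse`, `crawl` and `describe`, the level-by-level thread-pool strategy, and the injected `fetch` callable, which lets the example run on an in-memory site instead of the network. Usage data is left out of the sketch.

```python
# Hypothetical sketch: not the crawler described in the paper.
from __future__ import annotations

import statistics
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class PageParser(HTMLParser):
    """Collects href targets (structure) and visible words (content)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.words: list[str] = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:  # ignore script and style text
            self.words.extend(data.split())


@dataclass
class PageStats:
    url: str
    out_links: list[str]  # absolute, fragment-free link targets (structure data)
    word_count: int       # visible words on the page (content data)


def analyse(url: str, html: str) -> PageStats:
    """Parse one page and resolve its links against the page URL."""
    parser = PageParser()
    parser.feed(html)
    parser.close()  # flush any trailing buffered text
    links = [urldefrag(urljoin(url, href))[0] for href in parser.links]
    return PageStats(url, links, len(parser.words))


def crawl(start: str, fetch: Callable[[str], Optional[str]],
          max_pages: int = 100, workers: int = 8) -> dict[str, PageStats]:
    """Breadth-first crawl of one host, visiting each frontier level in parallel.

    `fetch` returns the HTML of a URL, or None if it cannot be retrieved.
    """
    host = urlparse(start).netloc
    seen = {start}
    frontier = [start]
    results: dict[str, PageStats] = {}

    def visit(url: str) -> Optional[PageStats]:
        html = fetch(url)  # fetching and parsing run in a worker thread
        return None if html is None else analyse(url, html)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            next_frontier: list[str] = []
            # Only the main thread updates `seen` and `results`, so no lock is needed.
            for page in pool.map(visit, frontier):
                if page is None:
                    continue
                results[page.url] = page
                for link in page.out_links:
                    same_host = urlparse(link).netloc == host
                    if same_host and link not in seen and len(seen) < max_pages:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return results


def describe(results: dict[str, PageStats]) -> dict[str, float]:
    """Descriptive statistics over the crawled pages."""
    if not results:
        return {"pages": 0}
    out_degree = [len(set(p.out_links)) for p in results.values()]
    words = [p.word_count for p in results.values()]
    return {
        "pages": len(results),
        "mean_out_degree": statistics.mean(out_degree),
        "mean_words": statistics.mean(words),
        "median_words": statistics.median(words),
        "stdev_words": statistics.pstdev(words),
    }


if __name__ == "__main__":
    # Tiny in-memory "site" so the example runs without network access.
    site = {
        "http://example.org/": '<a href="/a">A</a> <a href="/b">B</a> welcome home',
        "http://example.org/a": '<p>page a text</p><a href="/">home</a>',
        "http://example.org/b": '<script>var x = 1;</script><p>b</p>',
    }
    print(describe(crawl("http://example.org/", site.get)))
```

Worker threads only fetch and parse, and all shared bookkeeping happens in the calling thread, which keeps the sketch free of locks. For real sites, `fetch` would have to wrap an HTTP client with timeouts, politeness delays and robots.txt handling.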

Published In

HT '10: Proceedings of the 21st ACM conference on Hypertext and hypermedia
June 2010
328 pages
ISBN:9781450300414
DOI:10.1145/1810617

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classification algorithm
  2. content and usage data
  3. crawling
  4. structure
  5. web mining

Qualifiers

  • Poster

Conference

HT '10
HT '10: 21st ACM Conference on Hypertext and Hypermedia
June 13-16, 2010
Toronto, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 378 of 1,158 submissions, 33%

Bibliometrics

Article Metrics

  • Total citations: 0
  • Total downloads: 233
  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 05 Mar 2025