DOI: 10.1145/1242572.1242736
Article

Classifying web sites

Published: 08 May 2007

ABSTRACT

In this paper, we present a novel method for the classification of Web sites. This method exploits both structure and content of Web sites in order to discern their functionality. It allows for distinguishing between eight of the most relevant functional classes of Web sites. We show that a pre-classification of Web sites utilizing structural properties considerably improves a subsequent textual classification with standard techniques. We evaluate this approach on a dataset comprising more than 16,000 Web sites with about 20 million crawled and 100 million known Web pages. Our approach achieves an accuracy of 92% for the coarse-grained classification of these Web sites.
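The two-stage idea in the abstract (a structural pre-classification followed by a textual classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Site` fields, the outlink threshold, the group names, the class labels, and the keyword lists are all hypothetical placeholders chosen for illustration, and a simple keyword count stands in for the "standard techniques" of text classification that the abstract mentions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Site:
    # All fields are hypothetical features, not taken from the paper.
    pages: int           # number of crawled pages
    avg_outlinks: float  # mean number of external links per page
    text: str            # concatenated text of the crawled pages


def structural_group(site: Site) -> str:
    """Stage 1: coarse pre-classification from structure only.

    The threshold of 10 is invented for this sketch.
    """
    return "link-heavy" if site.avg_outlinks >= 10 else "content-heavy"


# Stage 2: one small keyword model per structural group.
# Labels and keywords are illustrative assumptions.
KEYWORDS = {
    "link-heavy": {
        "directory": {"category", "listing", "links"},
        "blog": {"posted", "comment", "archive"},
    },
    "content-heavy": {
        "shop": {"cart", "price", "checkout"},
        "academic": {"research", "faculty", "course"},
    },
}


def classify(site: Site) -> str:
    """Pick the structural group first, then score text within that group."""
    group = structural_group(site)
    words = Counter(site.text.lower().split())
    scores = {
        label: sum(words[w] for w in keywords)
        for label, keywords in KEYWORDS[group].items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    shop = Site(pages=50, avg_outlinks=2.0,
                text="Add to cart and checkout at a low price")
    print(classify(shop))
```

The design point the sketch tries to capture is that the textual stage only has to separate classes within one structural group, which is a smaller problem than separating all classes at once.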

References

  1. E. Amitay, D. Carmel, A. Darlow, R. Lempel, and A. Soffer, The Connectivity Sonar: Detecting Site Functionality by Structural Patterns, Proc. 14th Conf. on Hypertext and Hypermedia, Nottingham, United Kingdom, 2003.
  2. M. Ester, H.-P. Kriegel, and M. Schubert, Web Site Mining: A New Way to Spot Competitors, Customers and Suppliers in the World Wide Web, Proc. 8th Int. Conf. on Knowledge Discovery and Data Mining, Edmonton, Canada, 2002.
  3. C. Lindemann and L. Littig, Coarse-grained Classification of Web Sites by Their Structural Properties, Proc. 8th Int. Workshop on Web Information and Data Management, Arlington, VA, 2006.
  4. Yahoo! Mindset, http://mindset.research.yahoo.com

        • Published in
          WWW '07: Proceedings of the 16th international conference on World Wide Web
          May 2007
          1382 pages
          ISBN:9781595936547
          DOI:10.1145/1242572

          Copyright © 2007 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
