Article DOI: 10.1145/1012807.1012821

Automatic categorization of web sites based on source types

Published: 09 August 2004

Abstract

An important issue with the Web is verification of the accuracy, currency and authenticity of the information associated with Web sites. One way to address this problem is to identify the "source" or "sponsor" of the Web site. However, source identification is non-trivial because the source of a Web site cannot always be determined by the URL or content of the site. In this paper, we propose a method for source identification that uses various types of inbound, outbound and internal interactions that arise due to hyperlinks between and within Web sites.
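
To make the three kinds of interaction concrete, the following is a minimal sketch of how inbound, outbound and internal hyperlinks might be counted for a single site. It is not the method proposed in the paper, whose features and classifier are not described in this abstract. The function names `site_of` and `interaction_counts`, the use of the URL host as the definition of a "site", and the sample links are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def site_of(url):
    """Return the lower-cased host of a URL (assumed stand-in for 'site')."""
    return urlparse(url).netloc.lower()


def interaction_counts(links, site):
    """Count internal, outbound and inbound hyperlinks for one site.

    links: iterable of (source_url, target_url) pairs (hypothetical input).
    site:  host name of the site being described.
    """
    counts = {"internal": 0, "outbound": 0, "inbound": 0}
    for src, dst in links:
        src_site, dst_site = site_of(src), site_of(dst)
        if src_site == site and dst_site == site:
            counts["internal"] += 1   # link stays within the site
        elif src_site == site:
            counts["outbound"] += 1   # link leaves the site
        elif dst_site == site:
            counts["inbound"] += 1    # link arrives from another site
        # links that touch neither end of the site are ignored
    return counts


# Hypothetical sample data, for illustration only.
links = [
    ("http://a.example/index.html", "http://a.example/about.html"),
    ("http://a.example/index.html", "http://b.example/"),
    ("http://c.example/links.html", "http://a.example/"),
    ("http://b.example/", "http://c.example/"),
]
counts = interaction_counts(links, "a.example")
```

Counts of this kind could serve as input features to a classifier over source types, though which features and which classifier the authors actually use is not stated here.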

Published In

HYPERTEXT '04: Proceedings of the fifteenth ACM conference on Hypertext and hypermedia
August 2004
284 pages
ISBN:1581138482
DOI:10.1145/1012807
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classification
  2. site interaction
  3. source categorization
  4. web site categorization

Qualifiers

  • Article

Conference

HT04
HT04: 15th Conference on Hypertext and Hypermedia
August 9 - 13, 2004
Santa Cruz, CA, USA

Acceptance Rates

Overall Acceptance Rate 378 of 1,158 submissions, 33%

Cited By

  • (2011) Intelligent Web-History Based on a Hybrid Clustering Algorithm for Future-Internet Systems. Proceedings of the 2011 13th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, pages 145-152. DOI: 10.1109/SYNASC.2011.24. Online publication date: 26-Sep-2011
  • (2011) SUT. Computer Networks: The International Journal of Computer and Telecommunications Networking, 55:13, pages 3001-3014. DOI: 10.1016/j.comnet.2011.06.005. Online publication date: 1-Sep-2011
  • (2010) Semantic Space models for classification of consumer webpages on metadata attributes. Journal of Biomedical Informatics, 43:5, pages 725-735. DOI: 10.1016/j.jbi.2010.06.005. Online publication date: 1-Oct-2010
  • (2007) Extraction of Anchor-Related Text and Its Evaluation by User Studies. Human Interface and the Management of Information. Methods, Techniques and Tools in Information Design, pages 446-455. DOI: 10.1007/978-3-540-73345-4_51. Online publication date: 2007
  • (2005) Survey of semantic text portion for building web directory from people's views. Proceedings of the 2005 International Conference on Active Media Technology, 2005. (AMT 2005), pages 96-101. DOI: 10.1109/AMT.2005.1505277. Online publication date: 2005
