Clustering documents in a web directory
Source: Workshop On Web Information And Data Management
Proceedings of the 5th ACM international workshop on Web information and data management
New Orleans, Louisiana, USA
SESSION: Web clustering and usage mining
Pages: 66 - 73  
Year of Publication: 2003
ISBN:1-58113-725-7
Authors
Giordano Adami  ITC-irst, Povo, Italy
Paolo Avesani  ITC-irst, Povo, Italy
Diego Sona  ITC-irst, Povo, Italy
Sponsors
ACM: Association for Computing Machinery
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/956699.956715

ABSTRACT

Hierarchical categorization of documents is receiving growing interest due to the proliferation of topic hierarchies for text documents. The main drawback of hierarchical supervised classifiers is their high demand for labeled examples: the number required depends on the number of topics in the taxonomy. Hence, bootstrapping a huge hierarchy with a proper set of labeled examples is a critical issue. In this paper, we propose some solutions to the bootstrapping problem that use the taxonomy definition implicitly or explicitly: a baseline approach in which documents are classified according to class labels, and two clustering approaches in which training is constrained by a priori knowledge of the taxonomy structure at both the terminological and the topological level. In particular, we propose the TaxSOM model, which clusters a set of documents into a predefined hierarchy of classes by directly exploiting knowledge of both their topological organization and their lexical description. Experimental evaluation was performed on a set of taxonomies taken from the Google Web directory.
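
As an illustration of the general idea behind a label-based baseline, the following minimal Python sketch assigns each document to the taxonomy node whose label keywords it most resembles. It is not the authors' implementation: the abstract does not specify the matching procedure, so the cosine similarity over raw term counts, the function names (`tokenize`, `cosine`, `classify_by_labels`), and the toy taxonomy and documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and keep purely alphabetic tokens (a deliberately crude tokenizer).
    return [w for w in text.lower().split() if w.isalpha()]


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_by_labels(documents, taxonomy):
    # Hypothetical baseline: no labeled documents are used; each node is
    # represented only by the keywords attached to its label.
    label_vecs = {
        node: Counter(tokenize(" ".join(keywords)))
        for node, keywords in taxonomy.items()
    }
    assignments = {}
    for doc_id, text in documents.items():
        doc_vec = Counter(tokenize(text))
        # Ties (including all-zero similarity) fall to the first node listed.
        assignments[doc_id] = max(
            label_vecs, key=lambda node: cosine(doc_vec, label_vecs[node])
        )
    return assignments


# Toy data, invented for illustration only.
taxonomy = {
    "Arts/Music": ["music", "bands", "instruments"],
    "Science/Physics": ["physics", "quantum", "particles"],
}
documents = {
    "d1": "Guitar instruments and rock bands",
    "d2": "Quantum particles in modern physics",
}

assignments = classify_by_labels(documents, taxonomy)
print(assignments)
```

A sketch like this ignores the topology of the hierarchy entirely; the clustering approaches described in the abstract differ precisely in that they also constrain training with the taxonomy structure.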


