A scalability analysis of classifiers in text categorization
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
SESSION: Text categorization
Pages: 96 - 103  
Year of Publication: 2003
ISBN:1-58113-646-3
Authors
Yiming Yang  Carnegie Mellon University, Pittsburgh, PA
Jian Zhang  Carnegie Mellon University, Pittsburgh, PA
Bryan Kisiel  Carnegie Mellon University, Pittsburgh, PA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/860435.860455

ABSTRACT

Real-world applications of text categorization often require a system to deal with tens of thousands of categories defined over a large taxonomy. This paper addresses the problem with respect to a set of popular algorithms in text categorization, including Support Vector Machines, k-nearest neighbor, ridge regression, linear least squares fit and logistic regression. By providing a formal analysis of the computational complexity of each classification method, followed by an investigation of the use of different classifiers in a hierarchical categorization setting, we show how the scalability of a method depends on the topology of the hierarchy and the category distributions. In addition, we obtain tight bounds on the complexities by using a power law to approximate category distributions over a hierarchy. Experiments with kNN and SVM classifiers on the OHSUMED corpus are reported as concrete examples.

