Rule-based word clustering for text classification
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
POSTER SESSION: Posters
Pages: 445 - 446  
Year of Publication: 2003
ISBN:1-58113-646-3
Authors
Hui Han  The Pennsylvania State University, University Park, PA
Eren Manavoglu  The Pennsylvania State University, University Park, PA
C. Lee Giles  The Pennsylvania State University, University Park, PA
Hongyuan Zha  The Pennsylvania State University, University Park, PA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 70,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/860435.860543

ABSTRACT

This paper introduces a rule-based, context-dependent word clustering method, with the rules derived from various domain databases and the orthographic properties of the words. Besides significant dimensionality reduction, our experiments show that such rule-based word clustering improves the overall accuracy of extracting bibliographic fields from references by 8%, and the class-specific performance on the line classification of document headers by 18.32% on average.
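
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' rule set: the word lists, the cluster labels (`:firstname:`, `:capword:`, and so on), and the single context rule below are all hypothetical stand-ins, chosen only to show how database lookups and orthographic tests can collapse many distinct words into a few cluster tokens.

```python
import re

# Hypothetical stand-ins for the "domain databases" mentioned in the abstract.
FIRST_NAMES = {"hui", "eren", "lee"}
US_STATES = {"pa", "ny", "ca"}
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}


def cluster_word(word):
    """Map one raw token to a cluster label (all labels are illustrative)."""
    core = word.strip(".,;:()")
    if not core:
        return ":punct:"
    lower = core.lower()
    # Rules derived from (toy) domain databases.
    if lower in MONTHS:
        return ":month:"
    if lower in US_STATES and core.isupper():
        return ":state:"
    if lower in FIRST_NAMES:
        return ":firstname:"
    # Rules derived from orthographic properties of the token.
    if re.fullmatch(r"(19|20)\d{2}", core):
        return ":year:"
    if re.fullmatch(r"\d+", core):
        return ":digits:"
    if re.fullmatch(r"[A-Z]", core):
        return ":initial:"
    if core[0].isupper() and core[1:].islower():
        return ":capword:"
    # No rule fired: keep the lowercased word itself as its own feature.
    return lower


def cluster_line(line):
    """Cluster every token, then apply one simple context-dependent rule."""
    labels = [cluster_word(w) for w in line.split()]
    for i in range(1, len(labels)):
        # Assumed context rule: a capitalized word right after a first name
        # or an initial is treated as a last name.
        if labels[i] == ":capword:" and labels[i - 1] in (":firstname:", ":initial:"):
            labels[i] = ":lastname:"
    return labels


print(cluster_line("Hui Han, University Park, PA 2003"))
```

Under these assumptions, every four-digit year becomes `:year:` and every capitalized word becomes `:capword:`, so a classifier sees a much smaller feature vocabulary than the raw word set. That is the kind of dimensionality reduction the abstract refers to.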


REFERENCES

2
S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.

5
K. Seymore, A. McCallum, and R. Rosenfeld. Learning hidden Markov model structure for information extraction. In AAAI 99 Workshop on Machine Learning for Information Extraction, 1999.

6
N. Slonim and N. Tishby. The power of word clusters for text classification. In ECIR, 2001.

7
V. Vapnik. Statistical Learning Theory. 1998.

