Rule-based word clustering for text classification
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
POSTER SESSION: Posters
Pages: 445-446
Year of Publication: 2003
ISBN: 1-58113-646-3
Authors
Hui Han, The Pennsylvania State University, University Park, PA
Eren Manavoglu, The Pennsylvania State University, University Park, PA
C. Lee Giles, The Pennsylvania State University, University Park, PA
Hongyuan Zha, The Pennsylvania State University, University Park, PA
ABSTRACT
This paper introduces a rule-based, context-dependent word clustering method whose rules are derived from various domain databases and from the orthographic properties of words. In our experiments, such rule-based word clustering not only reduces dimensionality significantly but also improves the overall accuracy of extracting bibliographic fields from references by 8%, and the class-specific performance on the line classification of document headers by 18.32% on average.
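As a rough illustration of the idea described in the abstract, the sketch below maps each token to a cluster label using a small ordered rule list. It is not the authors' rule set: the `MONTHS` lookup stands in for a "domain database", the regular expressions stand in for orthographic rules, and every label name (`:month:`, `:year:`, `:page_range:`, `:number:`, `:initial:`) is a hypothetical choice made for this example. Context dependence, which the abstract mentions, is left out for brevity.

```python
import re

# Hypothetical stand-in for a "domain database"; the paper's actual
# databases and cluster labels are not given in this record.
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}

def cluster_token(token: str) -> str:
    """Map one token to a cluster label using ordered, illustrative rules."""
    core = token.strip(".,;:()")          # drop surrounding punctuation
    if not core:
        return ":punct:"
    if core.lower() in MONTHS:            # dictionary (database) rule
        return ":month:"
    if re.fullmatch(r"(19|20)\d{2}", core):   # orthographic rule: 4-digit year
        return ":year:"
    if re.fullmatch(r"\d+-{1,2}\d+", core):   # orthographic rule: 391--407
        return ":page_range:"
    if core.isdigit():
        return ":number:"
    if len(core) == 1 and core.isupper():     # single capital, e.g. an initial
        return ":initial:"
    return core.lower()                   # no rule fired: keep the word itself

def cluster_line(line: str) -> list[str]:
    """Apply the token rules to a whitespace-tokenized line."""
    return [cluster_token(tok) for tok in line.split()]

if __name__ == "__main__":
    print(cluster_line("V. Vapnik. Statistical Learning Theory. 1998."))
    print(cluster_line("pages 391--407, May 1990"))
```

Because many distinct surface forms (every year, every page range, every initial) collapse into one label each, the feature vocabulary seen by a downstream classifier shrinks, which is the dimensionality reduction the abstract refers to.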