Article

On redundancy of training corpus for text categorization: a perspective of geometry

Authors:
Shuigeng Zhou, Fudan University, Shanghai, China
Jihong Guan, Tongji University, Shanghai, China

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 2005, Pages 671–672
https://doi.org/10.1145/1076034.1076183

Published: 15 August 2005

ABSTRACT

No abstract available.

References

W. Lam and C. Y. Ho. Using a generalized instance set for automatic text categorization. In Proc. of SIGIR'98, pp. 81–89, ACM, 1998.
Y. Yang and X. Liu. A re-examination of text categorization. In Proc. of SIGIR'99, ACM, 1999.
Y. Yang, J. Zhang and B. Kisiel. A Scalability Analysis of Classifiers in Text Categorization. In Proc. of SIGIR'03, pp. 96–103, ACM, 2003.
D. V. Khmelev and W. J. Teahan. A Repetition Based Measure for Verification of Text Collections and for Text Categorization. In Proc. of SIGIR'03, pp. 104–110, ACM, 2003.

Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Data management systems
    1. Database management system engines

Published in
SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN: 1595930345
DOI: 10.1145/1076034
General Chairs:
Ricardo Baeza-Yates, University of Chile, Chile
Nivio Ziviani, Federal University of Minas Gerais, Brazil
Program Chairs:
Gary Marchionini, University of North Carolina, USA
Alistair Moffat, University of Melbourne, Australia
John Tait, University of Sunderland, UK
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
kNN text categorization
redundancy
training corpus
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%
