Associative text categorization exploiting negated words

Published: 23 April 2006

Abstract

Associative classification has recently been applied to text document categorization. However, unlike the classification of structured data, the quality of the generated classifier is rather low. This effect is mainly due to the poor precision of the generated rules.

To increase the precision of associative classifiers, we propose the use of classification rules that include negated words, i.e., words that the considered document should not contain. Rules take the form "If a document includes words A and B, but not word Z, then it belongs to class C1". Mining classification rules with negated words quickly becomes intractable as the support threshold decreases. We tackle this problem by means of an opportunistic approach, in which negated words are generated only to specialize rules that may wrongly classify training documents. Hence, precision is increased without losing recall.

Experiments on the Reuters corpus show that our classifier based on negated words achieves good precision and recall, while yielding the easily interpretable model typical of associative classifiers.
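
The rule form and the opportunistic idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the `Rule` class, the `specialize` function, the greedy word selection, and the toy training documents are all hypothetical and chosen only for illustration. The sketch adds negated words to a rule only when the rule covers training documents of the wrong class, and it only considers words that never occur in correctly covered documents, so the rule keeps firing on every training document it already classified correctly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: required words AND NOT negated words => label."""
    required: frozenset  # words the document must contain
    negated: frozenset   # words the document must not contain
    label: str           # class assigned when the rule fires

    def matches(self, words):
        words = set(words)
        return self.required <= words and not (self.negated & words)


def specialize(rule, training_docs):
    """Add negated words only if the rule misclassifies training documents.

    training_docs is a list of (words, label) pairs. This greedy selection
    is an assumption made for the sketch, not the published method.
    """
    covered = [(set(w), y) for w, y in training_docs if rule.matches(w)]
    wrong = [w for w, y in covered if y != rule.label]
    if not wrong:
        return rule  # rule is already precise on the training data
    right = [w for w, y in covered if y == rule.label]

    # Candidate negated words occur in misclassified documents but never
    # in correctly classified ones, so negating them cannot reduce the
    # number of correctly covered training documents.
    candidates = set().union(*wrong) - set().union(*right) - rule.required
    negated = set(rule.negated)
    remaining = wrong
    while remaining and candidates:
        # Pick the word that excludes the most remaining misclassified docs.
        best = max(sorted(candidates),
                   key=lambda t: sum(t in d for d in remaining))
        if not any(best in d for d in remaining):
            break  # no candidate excludes any remaining document
        negated.add(best)
        candidates.discard(best)
        remaining = [d for d in remaining if best not in d]
    return Rule(rule.required, frozenset(negated), rule.label)


# Toy training set (invented for illustration only).
training = [
    (["wheat", "export", "tonnes"], "grain"),
    (["wheat", "export", "price"], "grain"),
    (["wheat", "export", "ship", "port"], "ship"),
]
rule = Rule(frozenset({"wheat", "export"}), frozenset(), "grain")
refined = specialize(rule, training)
print(sorted(refined.negated))
```

In this toy setting the original rule fires on all three documents, including the one labeled "ship". The refined rule excludes that document while still matching both "grain" documents, which mirrors the abstract's goal of raising precision without losing recall.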


    Published In

    SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
    April 2006
    1967 pages
    ISBN:1595931082
    DOI:10.1145/1141277
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. association rules
    2. text classification

    Qualifiers

    • Article

    Conference

    SAC06
    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

    Cited By

    • (2012) GAMoN. Artificial Intelligence, 191-192, pp. 61-95. DOI: 10.1016/j.artint.2012.07.003. Online publication date: 1-Nov-2012
    • (2012) Automatic Filtering of Valuable Features for Text Categorization. Advanced Data Mining and Applications, pp. 284-295. DOI: 10.1007/978-3-642-35527-1_24. Online publication date: 2012
    • (2010) Classification Inductive Rule Learning with Negated Features. Advanced Data Mining and Applications, pp. 125-136. DOI: 10.1007/978-3-642-17316-5_12. Online publication date: 2010
    • (2010) Nonredundant Generalized Rules and Their Impact in Classification. Advances in Intelligent Information Systems, pp. 3-25. DOI: 10.1007/978-3-642-05183-8_1. Online publication date: 2010
    • (2009) Olex. IEEE Transactions on Knowledge and Data Engineering, 21:8, pp. 1118-1132. DOI: 10.1109/TKDE.2008.206. Online publication date: 1-Aug-2009
    • (2007) Learning rules with negation for text categorization. Proceedings of the 2007 ACM symposium on Applied computing, pp. 409-416. DOI: 10.1145/1244002.1244098. Online publication date: 11-Mar-2007