DOI: 10.1145/1774088.1774306
research-article

A study on interestingness measures for associative classifiers

Published: 22 March 2010

ABSTRACT

Associative classification is a rule-based approach to classifying data that relies on association rule mining to discover associations between a set of features and a class label. Support and confidence are the de facto "interestingness measures" used for discovering relevant association rules. The support-confidence framework has also been used in most, if not all, associative classifiers. Although support and confidence are appropriate measures for building a strong model in many cases, they are not ideal, and other measures could be better suited.
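To make the two standard measures concrete, here is a minimal sketch of how support and confidence could be computed for a single class association rule of the form "features → class label". The toy records, the item strings, and the helper name `support_confidence` are illustrative assumptions, not taken from the paper.

```python
from typing import FrozenSet, List, Tuple

# A record is a set of feature items plus a class label (hypothetical layout).
Record = Tuple[FrozenSet[str], str]


def support_confidence(
    records: List[Record], antecedent: FrozenSet[str], label: str
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> label."""
    n = len(records)
    # Records whose features contain every item of the antecedent.
    covered = [(items, cls) for items, cls in records if antecedent <= items]
    # Of those, the records that also carry the rule's class label.
    both = sum(1 for _, cls in covered if cls == label)
    support = both / n if n else 0.0
    confidence = both / len(covered) if covered else 0.0
    return support, confidence


# Toy data, invented for illustration only.
toy = [
    (frozenset({"outlook=sunny", "windy=no"}), "play"),
    (frozenset({"outlook=sunny", "windy=yes"}), "stay"),
    (frozenset({"outlook=rain", "windy=no"}), "play"),
    (frozenset({"outlook=sunny", "windy=no"}), "play"),
]

sup, conf = support_confidence(toy, frozenset({"outlook=sunny"}), "play")
print(f"support={sup:.2f} confidence={conf:.2f}")
```

Here support is the fraction of all records that match both the antecedent and the class, while confidence is the fraction of antecedent-matching records that carry the class.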

There are many other rule interestingness measures already used in machine learning, data mining and statistics. This work focuses on using 53 different objective measures for associative classification rules. A wide range of UCI datasets is used to study the impact of different "interestingness measures" on different phases of associative classifiers, based on the number of rules generated and the accuracy obtained. The results show that there are interestingness measures that can significantly reduce the number of rules for almost all datasets while the accuracy of the model is hardly jeopardized or is even improved. However, no single measure emerges as an obvious winner.
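As a rough illustration of how swapping the interestingness measure changes which rules survive filtering, the sketch below prunes a small rule set once by confidence and once by lift. Lift is used only as a familiar example of an alternative objective measure; the abstract does not list the 53 measures studied, and the rule counts, thresholds, and names (`Rule`, `prune`) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    label: str
    n_both: int   # records matching antecedent and class label
    n_ante: int   # records matching antecedent
    n_label: int  # records carrying the class label
    n_total: int  # all records


def confidence(r: Rule) -> float:
    return r.n_both / r.n_ante


def lift(r: Rule) -> float:
    # Confidence divided by the prior probability of the class.
    return confidence(r) / (r.n_label / r.n_total)


def prune(rules: List[Rule], measure: Callable[[Rule], float],
          threshold: float) -> List[Rule]:
    """Keep only rules whose measure value reaches the threshold."""
    return [r for r in rules if measure(r) >= threshold]


# Invented counts for illustration only.
rules = [
    Rule(frozenset({"a"}), "pos", 40, 50, 80, 100),
    Rule(frozenset({"b"}), "pos", 18, 20, 80, 100),
    Rule(frozenset({"c"}), "neg", 12, 20, 20, 100),
]

by_conf = prune(rules, confidence, 0.7)
by_lift = prune(rules, lift, 1.1)
print([sorted(r.antecedent) for r in by_conf])
print([sorted(r.antecedent) for r in by_lift])
```

In this toy case the confidence filter keeps the two majority-class rules, whereas the lift filter drops the rule that merely restates the class prior and keeps the minority-class rule, which is the kind of difference in rule sets that a study of measures would compare.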


Published in

SAC '10: Proceedings of the 2010 ACM Symposium on Applied Computing
March 2010
2712 pages
ISBN: 9781605586397
DOI: 10.1145/1774088

      Copyright © 2010 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SAC '10 Paper Acceptance Rate: 364 of 1,353 submissions, 27%
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
