ABSTRACT
Associative classification is a rule-based approach that classifies data using association rule mining, discovering associations between a set of features and a class label. Support and confidence are the de facto "interestingness measures" used to discover relevant association rules, and the support-confidence framework has been used in most, if not all, associative classifiers. Although support and confidence are appropriate for building a strong model in many cases, they are not ideal, and other measures could be better suited.
Many other rule interestingness measures are already used in machine learning, data mining and statistics. This work focuses on using 53 different objective measures for associative classification rules. A wide range of UCI datasets is used to study the impact of different "interestingness measures" on different phases of associative classifiers, based on the number of rules generated and the accuracy obtained. The results show that some interestingness measures can significantly reduce the number of rules for almost all datasets, while the accuracy of the model is hardly jeopardized or is even improved. However, no single measure emerges as an obvious winner.
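As a minimal sketch of what an objective measure computes for a class association rule (features → class label), the Python below derives support and confidence from raw counts, plus lift as one familiar alternative. The function name `rule_measures`, the toy rows, and the choice of lift are illustrative assumptions; the abstract does not list which 53 measures were studied.

```python
from typing import Dict, FrozenSet, List, Tuple

# One record: (set of feature items, class label). Hypothetical toy format.
Row = Tuple[FrozenSet[str], str]


def rule_measures(rows: List[Row], antecedent: FrozenSet[str], label: str) -> Dict[str, float]:
    """Score the rule `antecedent -> label` on a list of labelled records."""
    n = len(rows)
    # Records containing every item of the antecedent.
    n_a = sum(1 for items, _ in rows if antecedent <= items)
    # Records carrying the class label.
    n_c = sum(1 for _, c in rows if c == label)
    # Records matching both the antecedent and the label.
    n_ac = sum(1 for items, c in rows if antecedent <= items and c == label)

    support = n_ac / n if n else 0.0            # P(antecedent and label)
    confidence = n_ac / n_a if n_a else 0.0     # P(label | antecedent)
    # Lift compares confidence with the label's base rate P(label).
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Toy data, invented for illustration only.
TOY_ROWS: List[Row] = [
    (frozenset({"outlook=sunny", "windy=no"}), "play"),
    (frozenset({"outlook=sunny", "windy=yes"}), "stay"),
    (frozenset({"outlook=rain", "windy=no"}), "play"),
    (frozenset({"outlook=sunny", "windy=no"}), "play"),
]

measures = rule_measures(TOY_ROWS, frozenset({"outlook=sunny"}), "play")
print(measures)
```

On this toy data the rule has fairly high confidence but a lift below 1, because the class "play" is already common. Support and confidence alone do not capture that distinction, which is the kind of gap alternative measures are meant to address.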