ABSTRACT
Multi-label classification (MLC) is the task of assigning multiple class labels to an object based on the features that describe the object. One of the most effective MLC methods is known as Classifier Chains (CC). This approach consists in training q binary classifiers linked in a chain, y1 → y2 → ... → yq, with each responsible for classifying a specific label in {l1, l2, ..., lq}. The chaining mechanism allows each individual classifier to incorporate the predictions of the previous ones as additional information at classification time. Thus, possible correlations among labels can be automatically exploited. Nevertheless, CC suffers from two important drawbacks: (i) the label ordering is decided at random, although it usually has a strong effect on predictive accuracy; (ii) all labels are inserted into the chain, although some of them might carry irrelevant information to discriminate the others. In this paper we tackle both problems at once, by proposing a novel genetic algorithm capable of searching for a single optimized label ordering, while at the same time taking into consideration the utilization of partial chains. Experiments on benchmark datasets demonstrate that our approach is able to produce models that are both simpler and more accurate.
- M. P. Basgalupp, R. C. Barros, A. C. P. L. F. de Carvalho, A. A. Freitas, and D. D. Ruiz. Legal-tree: A lexicographic multi-objective genetic algorithm for decision tree induction. In Proc. of the 2009 ACM Symposium on Applied Computing, SAC'09, pages 1085--1090, March 2009. Google ScholarDigital Library
- L. Bull, E. Bernad o-Mansilla, and J. Holmes. Learning classier systems in data mining: An introduction. In Learning Classier Systems in Data Mining, volume 125 of Studies in Computational Intelligence, pages 1--15. Springer, 2008.Google ScholarCross Ref
- P. N. da Silva, E. C. Gon calves, A. Plastino, and A. A. Freitas. Distinct chains for different instances: An effective strategy for multi-label classier chains. In Proc. of the 2014 ECML/PKDD Conf., LCNS 8725, pages 453--468, September 2014.Google ScholarDigital Library
- P. Domingos. The role of occam's razor in knowledge discovery. Data Mining and Knowledge Discovery, 3(4):409--425, December 1999. Google ScholarDigital Library
- A. E. Eiben and J. E. Smith. Introduction to Evolutionary Computing. SpringerVerlag, 2003. Google ScholarCross Ref
- A. A. Freitas. Data Mining and Knowledge Discovery with Evolutionary Algorithms. Springer, 2002. Google ScholarDigital Library
- A. A. Freitas. A critical review of multi-objective optimization in data mining: A position paper. SIGKDD Explor. Newsl., 6(2):77--86, December 2004. Google ScholarDigital Library
- A. A. Freitas. A review of evolutionary algorithms for data mining. In Data Mining and Knowledge Discovery Handbook, pages 371--400. Springer US, July 2010.Google Scholar
- A. A. Freitas. Comprehensible classication models: A position paper. SIGKDD Explor. Newsl., 15(1):1--10,June 2013. Google ScholarDigital Library
- E. C. Gon calves. A human-centered approach for mining hybrid-dimensional association rules. In Proc. of the 2014 Conf. on Information Fusion, FUSION'14, pages 1--8, July 2014.Google Scholar
- E. C. Gon calves, A. Plastino, and A. A. Freitas. A genetic algorithm for optimizing the label ordering in multi-label classier chains. In Proc. of the 25th Int'l Conf. on Tools with Articial Intelligence, ICTAI'13, pages 469--476, November 2013. Google ScholarDigital Library
- I. Gon calves and S. Silva. Balancing learning and overtting in genetic programming with interleaved sampling of training data. In Proc. of EuroGP 2013, pages 73--84, April 2013. Google ScholarDigital Library
- Y. Guo and S. Gu. Multi-label classication using conditional dependency networks. In Proc. of the 22nd Int'l Joint Conf. on Articial Intelligence, IJCAI'11, pages 1300--1305, July 2011. Google ScholarDigital Library
- M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. Witten. The weka data mining software: an update. ACM SIGKDD Explor.,11(1):10--18, November 2009. Google ScholarDigital Library
- N. Japkowicz and M. Shah. Evaluating Learning Algorithms: A Classication Perspective. Cambridge University Press, New York, 2011. Google ScholarDigital Library
- A. Kumar, S. Vembu, A. K. Menon, and C. Elkan. Beam search algorithms for multilabel learning. Machine Learning, 92(1):65--89, July 2013. Google ScholarDigital Library
- P. Larra~naga, C. M. H. Kuijpers, R. H. Murga, I. Inza, and S. Dizdarevic. Genetic algorithms for the travelling salesman problem: A review of representations and operators. Articial Intelligence Review, 13(2):129--170, April 1999. Google ScholarDigital Library
- E. Levy, O. E. David, and N. S. Netanyahu. Genetic algorithms and deep learning for automatic painter classication. In Proc. of the 2014 Genetic and Evolutionary Computation Conference, GECCO'14, pages 1143--1150, July 2014. Google ScholarDigital Library
- P. Machado, J. Tavares, F. B. Pereira, and E. Costa. Vehicle routing problem: Doing it the evolutionary way. In Proc. of the 2002 Genetic and Evolutionary Computation Conference, GECCO'02, pages 690--696, July 2002. Google ScholarDigital Library
- G. Madjarov, D. Kocev, D. Gjorgjevikj, and S. D zeroski. An extensive experimental comparison of methods for multi-label learning. Pattern Recognition, 45(9):3084--3104, September 2012. Google ScholarDigital Library
- J. Read, L. Martino, and D. Luengo. Efficient monte carlo optimization for multi-label classier chains. In Proc. of the 38th Int'l Conf. on Acoustics, Speech, and Signal Processing, ICASSP'13, May 2013.Google Scholar
- J. Read, B. Pfahringer, G. Holmes, and E. Frank. Classier chains for multi-label classication. Machine Learning, 85(3):333--359, December 2011. Google ScholarDigital Library
- A. E. Romero and L. M. de Campos. A probabilistic methodology for multilabel classication. Intelligent Data Analysis, 18(5):911--926, September 2014. Google ScholarDigital Library
- L. E. Sucar, C. Bielza, E. F. Morales, P. H. Leal, J. H. Zaragoza, and P. Larra~naga. Multi-label classication with bayesian network-based chain classiers. Pattern Recogn. Lett., 41:14--22, May 2014. Google ScholarDigital Library
- G. Tsoumakas, I. Katakis, and I. Vlahavas. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook, pages 667--685. Springer, 2010.Google Scholar
- G. Tsoumakas, E. S. Xious, J. Vilcek, and I. P. Vlahavas. Mulan: A java library for multi-label learning. Journal of Machine Learning Research, 12:2411--2414, July 2011. Google ScholarDigital Library
- M. J. Zaki and W. Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, New York, 2014. Google ScholarCross Ref
- M.-L. Zhang and K. Zhang. Multi-label learning by exploiting label dependency. In Proc. of the 16th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, KDD'10, pages 999--1008, July 2010. Google ScholarDigital Library
- M.-L. Zhang and Z.-H. Zhoug. A review on multi-label learning algorithms. IEEE Trans. on Knowledge and Data Engineering, 26(8):1819--1837, August 2014.Google ScholarCross Ref
Index Terms
- Simpler is Better: a Novel Genetic Algorithm to Induce Compact Multi-label Chain Classifiers
Recommendations
A Genetic Algorithm for Optimizing the Label Ordering in Multi-label Classifier Chains
ICTAI '13: Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial IntelligenceFirst proposed in 2009, the classifier chains model (CC) has become one of the most influential algorithms for multi-label classification. It is distinguished by its simple and effective approach to exploit label dependencies. The CC method involves the ...
Classifier chains for positive unlabelled multi-label learning
AbstractIn traditional multi-label setting it is assumed that all relevant labels are assigned to the given instance. In positive unlabelled setting, only some of relevant labels are assigned. The appearance of a label means that the instance ...
Conditional entropy based classifier chains for multi-label classification
AbstractIn many real-world problems, data samples are simultaneously associated with multiple labels, instead of a single label. Multi-label classification deals with such problems, and has extensive applications in many fields. Among the many ...
Comments