DOI: 10.1145/1830483.1830668

Genetic rule extraction optimizing Brier score

Published: 07 July 2010

Abstract

Most highly accurate predictive modeling techniques produce opaque models. When comprehensible models are required, rule extraction is sometimes used to generate a transparent model based on the opaque one. Naturally, the extracted model should be as similar as possible to the opaque model. This criterion, called fidelity, is therefore a key part of the optimization function in most rule extraction algorithms. To the best of our knowledge, all existing rule extraction algorithms targeting fidelity use 0/1 fidelity, i.e., they maximize the number of identical classifications. In this paper, we suggest and evaluate a rule extraction algorithm utilizing a more informed fidelity criterion. More specifically, the novel algorithm, which is based on genetic programming, minimizes the difference in probability estimates between the extracted and the opaque models by using the generalized Brier score as fitness function. Experimental results from 26 UCI data sets show that the suggested algorithm obtained considerably higher accuracy and significantly better AUC than both the exact same rule extraction algorithm maximizing 0/1 fidelity and the standard tree inducer J48. Somewhat surprisingly, rule extraction using the more informed fidelity metric normally resulted in less complex models, ensuring that the improved predictive performance was not achieved at the expense of comprehensibility.
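To make the abstract's distinction concrete, the sketch below contrasts the coarse 0/1 fidelity used by earlier rule extraction algorithms with a generalized (multi-class) Brier score computed between the opaque model's probability estimates and those of a candidate rule set. This is not code from the paper: the function names, the NumPy-based formulation, and the toy data are assumptions used purely for illustration of the fitness measures described in the abstract.

```python
import numpy as np

def brier_score_fidelity(opaque_probs, extracted_probs):
    """Generalized (multi-class) Brier score between the class-probability
    estimates of the opaque model and those of the extracted rule model.
    Lower values mean closer agreement, so a GP can minimize it directly."""
    p = np.asarray(opaque_probs, dtype=float)
    q = np.asarray(extracted_probs, dtype=float)
    # Mean over instances of the squared distance between probability vectors.
    return float(np.mean(np.sum((p - q) ** 2, axis=1)))

def zero_one_fidelity(opaque_probs, extracted_probs):
    """The coarser 0/1 fidelity: the fraction of instances on which the
    two models predict the same class label."""
    p = np.asarray(opaque_probs, dtype=float)
    q = np.asarray(extracted_probs, dtype=float)
    return float(np.mean(np.argmax(p, axis=1) == np.argmax(q, axis=1)))

if __name__ == "__main__":
    # Toy example (hypothetical values): three instances, two classes.
    opaque = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
    rules = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(brier_score_fidelity(opaque, rules))  # 0.14, to be minimized
    print(zero_one_fidelity(opaque, rules))     # 1.0, all labels agree
```

In the toy example the rule set reproduces every class label, so 0/1 fidelity is already maximal, yet its probability estimates still differ from the opaque model's; that residual difference is exactly the information a Brier-score fitness can exploit and a 0/1 criterion cannot.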



Published In

GECCO '10: Proceedings of the 12th annual conference on Genetic and evolutionary computation
July 2010
1520 pages
ISBN: 9781450300728
DOI: 10.1145/1830483

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

1. Brier score
    2. genetic programming
    3. rule extraction

    Qualifiers

    • Research-article

    Conference

    GECCO '10

    Acceptance Rates

    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%



    Cited By

• (2022) Machine Learning post-hoc interpretability: a systematic mapping study. Proceedings of the XVIII Brazilian Symposium on Information Systems, pp. 1-8. DOI: 10.1145/3535511.3535512. Online publication date: 16-May-2022.
• (2021) The Matthews Correlation Coefficient (MCC) is More Informative Than Cohen's Kappa and Brier Score in Binary Classification Assessment. IEEE Access, vol. 9, pp. 78368-78381. DOI: 10.1109/ACCESS.2021.3084050. Online publication date: 2021.
• (2021) Development and validation of a prediction model for actionable aspects of frailty in the text of clinicians' encounter notes. Journal of the American Medical Informatics Association. DOI: 10.1093/jamia/ocab248. Online publication date: 13-Nov-2021.
• (2019) Modeling medical decision making by support vector machines, explaining by rules of evolutionary algorithms with feature selection. Expert Systems with Applications, 40(7), pp. 2677-2686. DOI: 10.1016/j.eswa.2012.11.007. Online publication date: 10-Dec-2019.
• (2018) The extrapolation of soil great groups using multinomial logistic regression at regional scale in arid regions of Iran. Geoderma, vol. 315, pp. 36-48. DOI: 10.1016/j.geoderma.2017.11.030. Online publication date: Apr-2018.
• (2014) Post-evolution of variable-length class prototypes to unlock decision making within support vector machines. Applied Soft Computing, 25(C), pp. 159-173. DOI: 10.1016/j.asoc.2014.09.017. Online publication date: 1-Dec-2014.
• (2011) Evolving accurate and comprehensible classification rules. 2011 IEEE Congress of Evolutionary Computation (CEC), pp. 1436-1443. DOI: 10.1109/CEC.2011.5949784. Online publication date: Jun-2011.
