A methodology for comparing classifiers that allow the control of bias
Source: Symposium on Applied Computing
Proceedings of the 2006 ACM Symposium on Applied Computing
Dijon, France
Session: Data mining (DM)
Pages: 582-587
Year of Publication: 2006
ISBN: 1-59593-108-2
Authors
Anton Zamolotskikh  University of Dublin, Dublin, Ireland
Sarah Jane Delany  Dublin Institute of Technology, Dublin, Ireland
Pádraig Cunningham  University of Dublin, Dublin, Ireland
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1141277.1141411

ABSTRACT

This paper presents False Positive-Critical Classifiers Comparison, a new technique for pairwise comparison of classifiers that allow the control of bias. Naïve Bayes, k-Nearest Neighbour and Support Vector Machine classifiers were evaluated on five datasets of unsolicited and legitimate e-mail messages to confirm the advantage of the technique over Receiver Operating Characteristic (ROC) curves. The results suggest that the technique may be useful both for choosing the better classifier when the ROC curves do not show clear differences and for showing that the difference between two classifiers is not significant when the ROC curves suggest that it might be. Spam filtering is a typical application for such a comparison tool, as it requires a classifier to be biased toward negative prediction and to have an upper limit on the rate of false positives. Finally, a summary of the evaluation is presented, which confirms that Support Vector Machines outperform the other methods in most cases, while the Naïve Bayes classifier works well in a narrow but relevant range of false positive rates.
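The constraint described in the abstract, an upper limit on the false positive rate, can be illustrated with a small sketch. This is not the paper's False Positive-Critical Classifiers Comparison procedure, which the abstract does not specify. It only shows the simpler idea of comparing two score-producing classifiers by the true positive rate each reaches while staying under a chosen false positive limit. The function name `tpr_at_fpr_limit`, the toy scores, and the convention that spam is the positive class are all assumptions made for illustration.

```python
from typing import List, Tuple


def tpr_at_fpr_limit(
    scores: List[float], labels: List[int], max_fpr: float
) -> Tuple[float, float]:
    """Return (threshold, TPR) for the threshold with the highest TPR
    among those whose false positive rate does not exceed max_fpr.

    Assumptions (illustrative, not taken from the paper):
      - labels: 1 = spam (positive), 0 = legitimate (negative)
      - a message is flagged as spam when score >= threshold
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Default: flag nothing, which gives FPR = 0 and TPR = 0.
    best_threshold, best_tpr = float("inf"), 0.0

    # Try every observed score as a candidate threshold, strictest first.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        fpr, tpr = fp / n_neg, tp / n_pos
        if fpr <= max_fpr and tpr > best_tpr:
            best_threshold, best_tpr = threshold, tpr
    return best_threshold, best_tpr


# Toy data (hypothetical): four spam and four legitimate messages,
# scored by two hypothetical classifiers A and B.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]
scores_b = [0.9, 0.5, 0.4, 0.3, 0.85, 0.2, 0.1, 0.05]

for limit in (0.0, 0.25):
    _, tpr_a = tpr_at_fpr_limit(scores_a, labels, limit)
    _, tpr_b = tpr_at_fpr_limit(scores_b, labels, limit)
    print(f"FPR limit {limit}: TPR A = {tpr_a}, TPR B = {tpr_b}")
```

On this toy data the two classifiers differ sharply when no false positives are tolerated and are indistinguishable at a looser limit, which is the kind of range-dependent behaviour the abstract attributes to Naïve Bayes. A real comparison would also need a significance test across datasets or folds, which this sketch leaves out.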


