A methodology for comparing classifiers that allow the control of bias
Source: Proceedings of the 2006 ACM Symposium on Applied Computing, Dijon, France
Session: Data mining (DM)
Pages: 582-587
Year of Publication: 2006
ISBN: 1-59593-108-2
ABSTRACT
This paper presents False Positive-Critical Classifiers Comparison, a new technique for the pairwise comparison of classifiers that allow the control of bias. Naïve Bayes, k-Nearest Neighbour and Support Vector Machine classifiers were evaluated on five datasets containing unsolicited and legitimate e-mail messages to confirm the advantage of the technique over Receiver Operating Characteristic (ROC) curves. The evaluation results suggest that the technique may be useful both for choosing the better classifier when the ROC curves do not show clear differences and for showing that the difference between two classifiers is not significant when the ROC curves suggest that it might be. Spam filtering is a typical application for such a comparison tool, as it requires a classifier to be biased toward negative (legitimate) predictions and to keep its false positive rate below some upper limit. Finally, a summary of the evaluation is presented, which confirms that Support Vector Machines outperform the other methods in most cases, while the Naïve Bayes classifier works well in a narrow but relevant range of false positive rates.
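The constraint described in the abstract, keeping the false positive rate below an upper limit, can be illustrated with a short sketch. This is not the paper's False Positive-Critical Classifiers Comparison procedure, whose details are not given on this page; it only reads each classifier's ROC curve and reports the best true positive rate reachable without exceeding a chosen false positive rate. The function names `roc_points` and `tpr_at_fpr_limit`, the toy scores, and the convention that label 1 means spam are all hypothetical, introduced for illustration.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs for every distinct score threshold.

    Assumed convention: label 1 = spam (positive), 0 = legitimate (negative);
    a message is predicted spam when its score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged as spam
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def tpr_at_fpr_limit(scores: Sequence[float], labels: Sequence[int], max_fpr: float) -> float:
    """Best TPR reachable by any threshold whose FPR does not exceed max_fpr."""
    return max(tpr for fpr, tpr in roc_points(scores, labels) if fpr <= max_fpr)


if __name__ == "__main__":
    # Hypothetical toy data: 4 spam and 4 legitimate messages.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    scores_a = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.05]
    scores_b = [0.95, 0.6, 0.55, 0.5, 0.65, 0.45, 0.3, 0.2]
    for limit in (0.25, 0.0):
        a = tpr_at_fpr_limit(scores_a, labels, limit)
        b = tpr_at_fpr_limit(scores_b, labels, limit)
        print(f"FPR <= {limit}: TPR A = {a}, TPR B = {b}")
    # FPR <= 0.25: TPR A = 1.0, TPR B = 1.0
    # FPR <= 0.0: TPR A = 0.5, TPR B = 0.25
```

With these made-up scores the two classifiers tie when a 25% false positive rate is allowed, but classifier A keeps more spam detections when no false positives are tolerated. This is the kind of difference confined to a narrow range of false positive rates that a glance at the whole ROC curve can easily miss.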