ABSTRACT
Naive Bayes and logistic regression perform well in different regimes. While the former is a very simple generative model that is efficient to train and performs well empirically in many applications, the latter is a discriminative model that often achieves better accuracy and can be shown to outperform naive Bayes asymptotically. In this paper, we propose a novel hybrid model, partitioned logistic regression, which has several advantages over both naive Bayes and logistic regression. This model separates the original feature space into several disjoint feature groups. Individual models on these groups of features are learned using logistic regression, and their predictions are combined using the naive Bayes principle to produce a robust final estimate. We show both theoretically and empirically that our model outperforms the alternatives. In addition, when applied to a practical task, email spam filtering, it improves the normalized AUC score at a 10% false-positive rate by 28.8% and 23.6% over naive Bayes and logistic regression, respectively, when trained on exactly the same examples.
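The combination scheme described above can be sketched in a few lines. The following is a minimal illustration, not the paper's exact procedure: the toy data, the two hypothetical feature groups ("content" and "metadata"), and the gradient-descent training details are all assumptions made for the example. The key step is the naive Bayes combination rule, which sums the per-group log-odds and subtracts the class-prior log-odds once for each extra group.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_lr(X, y, lr=0.1, epochs=500):
    """Plain batch gradient descent on the logistic loss (bias folded in)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def log_odds(X, w):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ w

def partitioned_lr_predict(groups, weights, prior):
    # Naive Bayes combination of K per-group posteriors:
    # combined log-odds = sum_k s_k - (K - 1) * logit(prior)
    logit_prior = np.log(prior / (1.0 - prior))
    s = sum(log_odds(Xk, wk) for Xk, wk in zip(groups, weights))
    return sigmoid(s - (len(groups) - 1) * logit_prior)

# Toy data: two disjoint feature groups, each weakly informative about y.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 400)
X1 = y[:, None] + 0.8 * rng.standard_normal((400, 2))  # e.g. "content" features
X2 = y[:, None] + 0.8 * rng.standard_normal((400, 3))  # e.g. "metadata" features

# Train one logistic regression per group, then combine.
w1 = train_lr(X1, y.astype(float))
w2 = train_lr(X2, y.astype(float))
p = partitioned_lr_predict([X1, X2], [w1, w2], prior=y.mean())
acc = ((p > 0.5) == y).mean()
```

Because the groups are modeled separately, each per-group classifier is trained on a lower-dimensional problem, and the naive Bayes step only assumes independence *between* groups rather than between individual features.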