Abstract
We consider the problem of learning classifiers in structured domains, where some objects have a subset of features that are inherently absent due to complex relationships between the features. Unlike the case where a feature exists but its value is not observed, we focus here on the case where a feature may not even exist (be structurally absent) for some of the samples. The common approach for handling missing features in discriminative models is to first complete their unknown values and then apply a standard classification procedure to the completed data. This paper instead addresses features that are known not to exist, rather than having an unknown value. We show how incomplete data can be classified directly, without any completion of the missing features, using a max-margin learning framework. We formulate an objective function, based on the geometric interpretation of the margin, that aims to maximize the margin of each sample in its own relevant subspace. Under this formulation, the linearly separable case can be solved by a binary search over a series of second-order cone programs (SOCPs), convex problems that can be solved efficiently. We also describe two approaches for optimizing the general case: an approximation that can be solved as a standard quadratic program (QP), and an iterative procedure for solving the exact problem. By avoiding the pre-processing phase in which the data is completed, both of these approaches can offer considerable computational savings. More importantly, we show that the elegant handling of missing values by our approach allows it to outperform other methods when the missing values have non-trivial structure, while remaining competitive when the values are missing at random. We demonstrate our results on several standard benchmarks and two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images.
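The central quantity in the formulation above is the geometric margin of each sample measured only over the features that exist for that sample. The following minimal numpy sketch (not the paper's optimization code; the function name, toy data, and NaN encoding of absent features are illustrative assumptions) shows how such instance-specific margins can be computed: for a sample with present-feature mask m, the weight vector is restricted to that subspace before normalizing.

```python
import numpy as np

def instance_margins(w, b, X, y):
    """Geometric margin of each sample in its own relevant subspace.

    Absent (structurally non-existing) features are encoded as NaN.
    For sample i with present-feature mask m_i, the margin is
        y_i * (w[m_i] . x_i[m_i] + b) / ||w[m_i]||,
    i.e. the classifier is restricted to the features that exist for x_i.
    """
    margins = np.empty(len(X))
    for i, (x, yi) in enumerate(zip(X, y)):
        m = ~np.isnan(x)                 # features that exist for this sample
        w_i = w[m]                       # weight vector restricted to that subspace
        margins[i] = yi * (w_i @ x[m] + b) / np.linalg.norm(w_i)
    return margins

# Toy data: the second sample lacks feature 1 (structurally absent -> NaN).
X = np.array([[1.0, 2.0],
              [3.0, np.nan]])
y = np.array([1, -1])
w = np.array([1.0, 1.0])
print(instance_margins(w, b=0.0, X=X, y=y))   # [3/sqrt(2), -3.0]
```

Because the normalizer ||w[m_i]|| differs per sample, maximizing the minimum of these margins is not a standard SVM objective; this is what motivates the SOCP binary search in the separable case and the QP approximation otherwise.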