Abstract
We consider the problem of learning classifiers in structured domains, where some objects have a subset of features that are inherently absent due to complex relationships between the features. Unlike the case where a feature exists but its value is not observed, we focus here on the case where a feature may not even exist (be structurally absent) for some of the samples. The common approach for handling missing features in discriminative models is to first complete their unknown values and then apply a standard classification procedure to the completed data. This paper instead addresses features that are known not to exist, rather than having an unknown value. We show how incomplete data can be classified directly, without any completion of the missing features, using a max-margin learning framework. We formulate an objective function, based on the geometric interpretation of the margin, that aims to maximize the margin of each sample in its own relevant subspace. Under this formulation, the linearly separable case can be solved by a binary search over a series of second-order cone programs (SOCPs), convex problems that can be solved efficiently. We also describe two approaches for optimizing the general case: an approximation that can be solved as a standard quadratic program (QP), and an iterative procedure for solving the exact problem. By avoiding the pre-processing phase in which the data is completed, both of these approaches can offer considerable computational savings. More importantly, we show that the elegant handling of missing values by our approach allows it to outperform other methods when the missing values have non-trivial structure, while remaining competitive when the values are missing at random. We demonstrate our results on several standard benchmarks and two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images.
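The central quantity in the formulation above is the geometric margin of each sample measured only over the features that exist for that sample. The following minimal numpy sketch (not the paper's optimization code; the function name, toy data, and NaN encoding of absent features are illustrative assumptions) shows how such instance-specific margins can be computed: for a sample with present-feature mask m, the weight vector is restricted to that subspace before normalizing.

```python
import numpy as np

def instance_margins(w, b, X, y):
    """Geometric margin of each sample in its own relevant subspace.

    Absent (structurally non-existing) features are encoded as NaN.
    For sample i with present-feature mask m_i, the margin is
        y_i * (w[m_i] . x_i[m_i] + b) / ||w[m_i]||,
    i.e. the classifier is restricted to the features that exist for x_i.
    """
    margins = np.empty(len(X))
    for i, (x, yi) in enumerate(zip(X, y)):
        m = ~np.isnan(x)                 # features that exist for this sample
        w_i = w[m]                       # weight vector restricted to that subspace
        margins[i] = yi * (w_i @ x[m] + b) / np.linalg.norm(w_i)
    return margins

# Toy data: the second sample lacks feature 1 (structurally absent -> NaN).
X = np.array([[1.0, 2.0],
              [3.0, np.nan]])
y = np.array([1, -1])
w = np.array([1.0, 1.0])
print(instance_margins(w, b=0.0, X=X, y=y))   # [3/sqrt(2), -3.0]
```

Because the normalizer ||w[m_i]|| differs per sample, maximizing the minimum of these margins is not a standard SVM objective; this is what motivates the SOCP binary search in the separable case and the QP approximation otherwise.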