ACM Home Page
Please provide us with feedback. Feedback
Learning sparse metrics via linear programming
Full text PdfPdf (756 KB)
Source Conference on Knowledge Discovery in Data archive
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Philadelphia, PA, USA
SESSION: Research track papers table of contents
Pages: 367 - 373  
Year of Publication: 2006
ISBN:1-59593-339-5
Authors
Rómer Rosales  Siemens Medical Solutions, Malvern, PA
Glenn Fung  Siemens Medical Solutions, Malvern, PA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 14,   Downloads (12 Months): 86,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1150402.1150444
What is a DOI?

ABSTRACT

Calculation of object similarity, for example through a distance function, is a common part of data mining and machine learning algorithms. This calculation is crucial for efficiency since distances are usually evaluated a large number of times, the classical example being query-by-example (find objects that are similar to a given query object). Moreover, the performance of these algorithms depends critically on choosing a good distance function. However, it is often the case that (1) the correct distance is unknown or chosen by hand, and (2) its calculation is computationally expensive (e.g., such as for large dimensional objects). In this paper, we propose a method for constructing relative-distance preserving low-dimensional mapping (sparse mappings). This method allows learning unknown distance functions (or approximating known functions) with the additional property of reducing distance computation time. We present an algorithm that given examples of proximity comparisons among triples of objects (object i is more like object j than object k), learns a distance function, in as few dimensions as possible, that preserves these distance relationships. The formulation is based on solving a linear programming optimization problem that finds an optimal mapping for the given dataset and distance relationships. Unlike other popular embedding algorithms, this method can easily generalize to new points, does not have local minima, and explicitly models computational efficiency by finding a mapping that is sparse, i.e. one that depends on a small subset of features or dimensions. Experimental evaluation shows that the proposed formulation compares favorably with a state-of-the art method in several publicly available datasets.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
V. Athitsos, J. Alon, S. Sclaroff, and G. Kollios. Boostmap: A method for efficient approximate similarity rankings. In Computer Vision and Pattern Recognition, 2004.
 
2
 
3
 
4
 
5
T. Cox and M. Cox. Multidimensional Scaling. Chapman & Hall, London, 1994.
6
 
7
 
8
 
9
I. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1989.
 
10
 
11
S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323--2326, 2000.
 
12
M. Schultz and T. Joachims. Learning a distance metric from relative comparisons. In Advances in Neural Information Processing Systems, 2003.
 
13
J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319--2323, 2000.
 
14
K. C. Toh, M. J. Todd, and R. Tutuncu. SDPT3 - a Matlab software package for semidefinite programming. Optimization Methods and Software, 11:545--581, 1999.
 
15
 
16
K. Weinberger, J. Blitzer, and L. Saul. Distance metric learning for large margin nearest neighbor classification. In Advances in Neural Information Processing Systems 18, 2006.
 
17
K. Q. Weinberger, B. D. Packer, and L. K. Saul. Unsupervised learning of image manifolds by semidefinite programming. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, Barbados, January 2005.
 
18
E. Xing, A. Ng, M. Jordan, and S. Russell. Distance metric learning, with application to clustering with side information. In Advances in Neural Information Processing Systems, 2002.

Collaborative Colleagues:
Rómer Rosales: colleagues
Glenn Fung: colleagues