Metric Learning from Probabilistic Labels

Authors:
Mengdi Huai (State University of New York at Buffalo, Buffalo, NY, USA)
Chenglin Miao (State University of New York at Buffalo, Buffalo, NY, USA)
Yaliang Li (Tencent Medical AI Lab, Palo Alto, CA, USA)
Qiuling Suo (State University of New York at Buffalo, Buffalo, NY, USA)
Lu Su (State University of New York at Buffalo, Buffalo, NY, USA)
Aidong Zhang (State University of New York at Buffalo, Buffalo, NY, USA)

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
Pages 1541–1550
https://doi.org/10.1145/3219819.3219976

Published: 19 July 2018

ABSTRACT

Metric learning aims to learn a distance metric that captures the relationships among instances, and its importance has long been recognized in many fields. Traditional metric learning settings implicitly assume that the labels associated with the instances are deterministic. In many real-world applications, however, labels come naturally with probabilities rather than deterministic values, so existing metric learning methods do not work well in these applications. To tackle this challenge, we study how to effectively learn a distance metric from datasets that contain probabilistic information, and we propose two novel metric learning mechanisms for two types of probabilistic labels: the instance-wise probabilistic label and the group-wise probabilistic label. Unlike existing metric learning methods, the proposed mechanisms learn distance metrics directly from the probabilistic labels with high accuracy. We also analyze the two mechanisms theoretically and provide bounds on the sample complexity of each. Finally, extensive experiments on real-world datasets verify the desirable properties of the proposed mechanisms.

Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information systems applications
    1. Data mining

Published in
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN:9781450355520
DOI:10.1145/3219819
General Chairs: Yike Guo (Imperial College London), Faisal Farooq (IBM)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
distance measure
metric learning
probabilistic labels
Qualifiers
- research-article
Acceptance Rates
KDD '18 paper acceptance rate: 107 of 983 submissions, 11%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%