research-article

Online Adaptive Asymmetric Active Learning for Budgeted Imbalanced Data

Authors:
Yifan Zhang, South China University of Technology, Guangzhou, China
Peilin Zhao, South China University of Technology, Guangzhou, China
Jiezhang Cao, South China University of Technology, Guangzhou, China
Wenye Ma, Tencent AI Lab, Shenzhen, China
Junzhou Huang, Tencent AI Lab, Shenzhen, China
Qingyao Wu, South China University of Technology, Guangzhou, China
Mingkui Tan, South China University of Technology, Guangzhou, China

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 2018, Pages 2768–2777. https://doi.org/10.1145/3219819.3219948

Published: 19 July 2018

ABSTRACT

This paper investigates Online Active Learning (OAL) for imbalanced unlabeled data streams, where only a limited budget of labels can be queried to optimize a cost-sensitive performance measure. OAL applies to many real-world problems, such as anomaly detection in healthcare, finance, and network security. These problems pose two key challenges: the query budget is often limited, and the ratio between the two classes is highly imbalanced. To address these challenges, existing OAL work adopts either asymmetric losses or asymmetric queries (an isolated asymmetric strategy) to tackle the imbalance, and uses first-order methods to optimize the cost-sensitive measure. However, these methods may suffer from two deficiencies: (1) a poor ability to handle imbalanced data, due to the isolated asymmetric strategy; and (2) a relatively slow convergence rate, due to the first-order optimization. In this paper, we propose a novel Online Adaptive Asymmetric Active (OA3) learning algorithm, which is based on a new asymmetric strategy (merging the asymmetric-loss and asymmetric-query strategies) and second-order optimization. We theoretically analyze its bounds and empirically evaluate it on four real-world online anomaly detection tasks. Promising results confirm the effectiveness and robustness of the proposed algorithm in various application domains.
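
To make the ingredients named in the abstract concrete, the following is a minimal sketch of a budgeted online active learner that combines an asymmetric query rule, an asymmetric (class-weighted) loss, and a second-order update. It is not the paper's OA3 update, which the abstract does not specify. As assumptions for illustration, it uses a margin-based randomized query with probability delta / (delta + |p|), a class-weighted hinge loss, and a diagonal AROW-style confidence update; the function name `run_stream` and all parameter names are hypothetical.

```python
import random


def run_stream(stream, dim, budget, delta_pos=1.0, delta_neg=1.0,
               cost_pos=1.0, cost_neg=1.0, r=1.0, seed=0):
    """Illustrative budgeted online active learner (not the paper's OA3).

    stream yields (x, y) pairs: x is a list of floats, y is +1 or -1.
    The label y is only used when the learner decides to query it.
    """
    rng = random.Random(seed)
    w = [0.0] * dim        # weight vector
    sigma = [1.0] * dim    # diagonal covariance: per-feature uncertainty
    queries = 0
    predictions = []
    for x, y in stream:
        p = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if p >= 0 else -1
        predictions.append(y_hat)
        if queries >= budget:
            continue  # budget spent: keep predicting, stop learning
        # Asymmetric query (assumed form): one smoothing parameter per
        # predicted class, so one class can be queried more eagerly.
        delta = delta_pos if y_hat > 0 else delta_neg
        if rng.random() >= delta / (delta + abs(p)):
            continue  # label not requested for this instance
        queries += 1
        # Asymmetric loss (assumed form): class-weighted hinge loss.
        cost = cost_pos if y > 0 else cost_neg
        loss = cost * max(0.0, 1.0 - y * p)
        if loss == 0.0:
            continue
        # Second-order step (diagonal AROW-style, an assumption):
        # features with larger sigma receive larger updates.
        v = sum(s * xi * xi for s, xi in zip(sigma, x))
        beta = 1.0 / (v + r)
        alpha = loss * beta
        for i, xi in enumerate(x):
            w[i] += alpha * y * sigma[i] * xi
            sigma[i] -= beta * (sigma[i] * xi) ** 2
    return w, sigma, queries, predictions
```

In this sketch, raising `delta_pos` relative to `delta_neg` spends more of the budget on instances predicted as positive, and raising `cost_pos` makes mistakes on the positive (typically rare) class more expensive. Both knobs act together, which is the sense in which the two asymmetric strategies are merged here.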

Index Terms

Online Adaptive Asymmetric Active Learning for Budgeted Imbalanced Data
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Anomaly detection
2. Theory of computation
  1. Design and analysis of algorithms
    1. Online algorithms
      1. Online learning algorithms
  2. Theory and algorithms for application domains
    1. Machine learning theory
      1. Active learning

Published in
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819
General Chairs: Yike Guo (Imperial College London), Faisal Farooq (IBM)
Copyright © 2018 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
active learning
anomaly detection
cost-sensitive learning
imbalance data
online learning
query budget
Acceptance Rates
KDD '18 Paper Acceptance Rate: 107 of 983 submissions, 11%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%