ABSTRACT
This paper investigates Online Active Learning (OAL) for imbalanced unlabeled data streams, where only a limited budget of labels can be queried to optimize a cost-sensitive performance measure. OAL applies to many real-world problems, such as anomaly detection in healthcare, finance, and network security. These problems pose two key challenges: the query budget is often limited, and the ratio between the two classes is highly imbalanced. To address these challenges, existing OAL work adopts either asymmetric losses or asymmetric queries (an isolated asymmetric strategy) to tackle the imbalance, and uses first-order methods to optimize the cost-sensitive measure. However, such methods may suffer from two deficiencies: (1) a poor ability to handle imbalanced data, due to the isolated asymmetric strategy; and (2) a relatively slow convergence rate, due to first-order optimization. In this paper, we propose a novel Online Adaptive Asymmetric Active (OA3) learning algorithm, which is based on a new asymmetric strategy (merging both the asymmetric loss and asymmetric query strategies) and second-order optimization. We theoretically analyze its bounds and empirically evaluate it on four real-world online anomaly detection tasks. Promising results confirm the effectiveness and robustness of the proposed algorithm in various application domains.
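To make the three ingredients named in the abstract concrete (asymmetric losses, asymmetric queries, and a second-order update under a label budget), the following is a minimal sketch and not the OA3 algorithm itself. Everything specific in it is an assumption made for illustration: the class name `AsymmetricActiveLearner`, the margin-based query probability `delta / (delta + |p|)`, the cost-weighted hinge loss, the diagonal covariance approximation, and all parameter names and default values. The update is an AROW-style confidence-weighted step with a diagonal covariance, swapped in here as a simple stand-in for second-order optimization.

```python
import random


class AsymmetricActiveLearner:
    """Illustrative sketch only; NOT the OA3 algorithm from the paper.

    All names, formulas, and defaults below are assumptions for illustration.
    """

    def __init__(self, dim, budget, delta_pos=1.0, delta_neg=0.1,
                 cost_pos=2.0, cost_neg=1.0, r=1.0, seed=0):
        self.w = [0.0] * dim          # weight vector (mean of the model)
        self.sigma = [1.0] * dim      # diagonal covariance (assumed approximation)
        self.budget = budget          # maximum number of label queries
        self.queries_used = 0
        self.delta_pos = delta_pos    # query smoothing when predicting +1
        self.delta_neg = delta_neg    # query smoothing when predicting -1
        self.cost_pos = cost_pos      # loss weight for the rare positive class
        self.cost_neg = cost_neg      # loss weight for the negative class
        self.r = r                    # regularization of the second-order step
        self.rng = random.Random(seed)

    def margin(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def should_query(self, p):
        # Asymmetric query: a different smoothing parameter per predicted class.
        if self.queries_used >= self.budget:
            return False
        delta = self.delta_pos if p >= 0 else self.delta_neg
        return self.rng.random() < delta / (delta + abs(p))

    def update(self, x, y, p):
        # Asymmetric loss: hinge loss scaled by a class-dependent cost.
        cost = self.cost_pos if y > 0 else self.cost_neg
        loss = cost * max(0.0, 1.0 - y * p)
        if loss == 0.0:
            return
        # Second-order (AROW-style) step with a diagonal covariance.
        v = sum(s * xi * xi for s, xi in zip(self.sigma, x))
        beta = 1.0 / (v + self.r)
        alpha = loss * beta
        for i, xi in enumerate(x):
            s = self.sigma[i]
            self.w[i] += alpha * y * s * xi
            self.sigma[i] = s - beta * s * s * xi * xi

    def step(self, x, oracle):
        """Predict for x; query the oracle and update only if the sampler says so."""
        p = self.margin(x)
        prediction = 1 if p >= 0 else -1
        if self.should_query(p):
            self.queries_used += 1
            self.update(x, oracle(x), p)
        return prediction
```

In use, each stream item is passed to `step` together with a labeling function (a hypothetical `oracle` that returns `+1` or `-1`). The learner always returns a prediction, but it consumes budget only when the sampler fires. Setting `delta_pos` larger than `delta_neg` is one way to spend more of the budget on instances predicted to belong to the rare class; the paper's actual query rule and update may differ.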