
Positive and Unlabeled Learning for Anomaly Detection with Multi-features

Authors:
Jiaqi Zhang, Nanyang Technological University, Singapore
Zhenzhen Wang, Nanyang Technological University, Singapore
Junsong Yuan, Nanyang Technological University, Singapore
Yap-Peng Tan, Nanyang Technological University, Singapore

MM '17: Proceedings of the 25th ACM international conference on Multimedia, October 2017, Pages 854–862
https://doi.org/10.1145/3123266.3123304

Published: 19 October 2017


ABSTRACT

Anomaly detection is of great interest in big data applications, and both supervised and unsupervised learning have been applied to it. It remains challenging, however, for two reasons: (1) for supervised learning, training data for anomalous samples is difficult to acquire; and (2) for unsupervised learning, performance may be unsatisfactory due to the lack of training data. To address these limitations, we propose a hybrid solution that uses both normal (positive) data and unlabeled data (which may be positive or negative) for semi-supervised anomaly detection. In particular, we introduce a new framework based on Positive and Unlabeled (PU) learning that uses multiple features to detect anomalies. We extend previous PU learning methods to (1) better address the class imbalance problem that is typical of anomaly detection, and (2) handle multiple features. An iterative algorithm learns the anomaly classifier incrementally from the labeled normal data and the unlabeled data. We verify the proposed method on three benchmark datasets and one synthetic dataset. Experimental results show that it outperforms existing methods under different class priors and different proportions of given positive classes.
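
To make the PU setting concrete, the following is a minimal sketch of a generic two-step, iterative PU heuristic. It is not the algorithm proposed in the paper, whose details the abstract does not give. It assumes a single feature vector per sample, Euclidean distance, and a nearest-centroid rule, and it ignores class priors and multiple features. The function names `centroid` and `iterative_pu_labels` and the parameter `seed_fraction` are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def iterative_pu_labels(positives, unlabeled, seed_fraction=0.1, max_iter=20):
    """Return one boolean per unlabeled point: True means flagged as anomalous.

    Hypothetical heuristic (an assumption, not the paper's method):
    1. Seed the anomaly set with the unlabeled points farthest from the
       centroid of the labeled normal (positive) data.
    2. Re-assign every unlabeled point to the nearer of the normal centroid
       and the anomaly centroid, and repeat until nothing changes.
    """
    pos_center = centroid(positives)
    ranked = sorted(range(len(unlabeled)),
                    key=lambda i: math.dist(unlabeled[i], pos_center),
                    reverse=True)
    n_seed = max(1, int(seed_fraction * len(unlabeled)))
    flagged = [False] * len(unlabeled)
    for i in ranked[:n_seed]:
        flagged[i] = True

    for _ in range(max_iter):
        negatives = [u for u, f in zip(unlabeled, flagged) if f]
        if not negatives:
            break
        normals = positives + [u for u, f in zip(unlabeled, flagged) if not f]
        neg_center = centroid(negatives)
        norm_center = centroid(normals)
        updated = [math.dist(u, neg_center) < math.dist(u, norm_center)
                   for u in unlabeled]
        if updated == flagged:
            break  # assignment is stable
        flagged = updated
    return flagged


# Toy data: labeled normal points on a small grid around the origin.
positives = [[x * 0.1, y * 0.1] for x in range(-3, 4) for y in range(-3, 4)]
# Unlabeled data: nine normal-looking points followed by two distant ones.
unlabeled = [[0.05 * i, -0.05 * i] for i in range(-4, 5)]
unlabeled += [[5.0, 5.0], [5.5, 4.5]]

flags = iterative_pu_labels(positives, unlabeled)
print(flags)  # only the last two (distant) points are flagged
```

On this toy data the seed step picks a single distant point, and the first refinement pass adds the second one, which illustrates how an anomaly set can grow incrementally from labeled normal data plus unlabeled data.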



Published in
MM '17: Proceedings of the 25th ACM international conference on Multimedia
October 2017
2028 pages
ISBN: 9781450349062
DOI: 10.1145/3123266
General Chairs:
Qiong Liu, FXPAL, USA
Rainer Lienhart, Universität Augsburg, Germany
Haohong Wang, TCL America, USA
Program Chairs:
Sheng-Wei "Kuan-Ta" Chen, Academia Sinica, Taiwan
Susanne Boll, University of Oldenburg, Germany
Phoebe Chen, La Trobe University, Australia
Gerald Friedland, Lawrence Livermore National Lab, USA
Jia Li, Google, USA
Shuicheng Yan, Qihoo 360, China
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
anomaly detection
intrusion detection
pu learning
semi-supervised learning
Qualifiers
- research-article
Conference

Acceptance Rates
MM '17 Paper Acceptance Rate: 189 of 684 submissions, 28%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%
Article Metrics
- Total Citations: 16
- Total Downloads: 636
- Downloads (last 12 months): 55
- Downloads (last 6 weeks): 10