DOI: 10.1145/3123266.3123304
research-article

Positive and Unlabeled Learning for Anomaly Detection with Multi-features

Published: 19 October 2017

ABSTRACT

Anomaly detection is of great interest in big data applications, and both supervised and unsupervised learning have been applied to it. It remains a challenging problem, however, because (1) for supervised learning, training data for anomalous samples is difficult to acquire, and (2) for unsupervised learning, performance may be unsatisfactory due to the lack of training data. To address these limitations, we propose a hybrid solution that uses both normal (positive) data and unlabeled data (which may be positive or negative) for semi-supervised anomaly detection. In particular, we introduce a new framework based on Positive and Unlabeled (PU) learning that uses multiple features to detect anomalies. We extend previous PU learning methods to (1) better address the unbalanced class problem, which is typical of anomaly detection, and (2) handle multiple features. An iterative algorithm is proposed to learn the anomaly classifier incrementally from the labeled normal data and the unlabeled data. The proposed method is verified on three benchmark datasets and one synthetic dataset. Experimental results show that it outperforms existing methods under different class priors and different proportions of given positive classes.
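The abstract does not spell out the iterative algorithm, so the following is only a minimal sketch of the general idea of learning incrementally from labeled normal data and unlabeled data. It is a generic two-step PU heuristic with a nearest-centroid classifier, not the authors' method, and it ignores the multi-feature and class-prior aspects entirely. The names `iterative_pu`, `seed_fraction`, and `max_iter`, and the toy data, are assumptions made for illustration.

```python
import math


def centroid(points):
    """Mean of a non-empty list of equal-length feature vectors."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def iterative_pu(positives, unlabeled, seed_fraction=0.1, max_iter=20):
    """Flag unlabeled points as anomalous (True) or normal (False).

    Hypothetical illustration: seed_fraction and max_iter are assumed
    knobs, not parameters taken from the paper.
    """
    pos_center = centroid(positives)

    # Step 1: seed the anomaly set with the unlabeled points that lie
    # farthest from the centroid of the labeled normal data.
    order = sorted(range(len(unlabeled)),
                   key=lambda i: distance(unlabeled[i], pos_center),
                   reverse=True)
    n_seed = max(1, int(seed_fraction * len(unlabeled)))
    anomalous = set(order[:n_seed])

    # Step 2: refit the anomaly centroid and grow the anomaly set with
    # every remaining point that is closer to it than to the normal one.
    for _ in range(max_iter):
        neg_center = centroid([unlabeled[i] for i in anomalous])
        new = {i for i, x in enumerate(unlabeled)
               if i not in anomalous
               and distance(x, neg_center) < distance(x, pos_center)}
        if not new:
            break  # no label changed, so the procedure has converged
        anomalous |= new

    return [i in anomalous for i in range(len(unlabeled))]


# Toy data: labeled normal points near the origin, and an unlabeled set
# that mixes 8 normal points with 4 anomalies near (5, 5).
positives = [(0.1 * i, 0.1 * j) for i in range(-2, 3) for j in range(-2, 3)]
unlabeled = ([(0.15 * i, -0.15 * i) for i in range(-4, 4)]
             + [(5.0 + 0.2 * i, 5.0) for i in range(4)])
flags = iterative_pu(positives, unlabeled)
print(flags)
```

On this toy data only one point is used as the seed, and the remaining three anomalies are picked up in the first refinement pass. The set only grows in this sketch, which keeps it simple but means an early mistake is never undone.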


  • Published in

    MM '17: Proceedings of the 25th ACM international conference on Multimedia
    October 2017
    2028 pages
    ISBN:9781450349062
    DOI:10.1145/3123266

    Copyright © 2017 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States




    Acceptance Rates

    MM '17 Paper Acceptance Rate: 189 of 684 submissions, 28%
    Overall Acceptance Rate: 995 of 4,171 submissions, 24%
