skip to main content
10.1145/3219819.3220022acmotherconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

DILOF: Effective and Memory Efficient Local Outlier Detection in Data Streams

Published: 19 July 2018 Publication History

Abstract

With precipitously growing demand to detect outliers in data streams, many studies have been conducted aiming to develop extensions of well-known outlier detection algorithm called Local Outlier Factor (LOF), for data streams. However, existing LOF-based algorithms for data streams still suffer from two inherent limitations: 1) Large amount of memory space is required. 2) A long sequence of outliers is not detected. In this paper, we propose a new outlier detection algorithm for data streams, called DILOF that effectively overcomes the limitations. To this end, we first develop a novel density-based sampling algorithm to summarize past data and then propose a new strategy for detecting a sequence of outliers. It is worth noting that our proposing algorithms do not require any prior knowledge or assumptions on data distribution. Moreover, we accelerate the execution time of DILOF about 15 times by developing a powerful distance approximation technique. Our comprehensive experiments on real-world datasets demonstrate that DILOF significantly outperforms the state-of-the-art competitors in terms of accuracy and execution time. The source code for the proposed algorithm is available at our website: http://di.postech.ac.kr/DILOF.

References

[1]
C. C. Aggarwal and S. Sathe. 2015. Theoretical Foundations and Algorithms for Outlier Ensembles. ACM SIGKDD Exploarations Newsletter 17, 1 (2015).
[2]
M. Arjovsky, S. Chintala, and L. Bottou. 2017. Wasserstein GAN. arXiv:1701.07875 (2017).
[3]
M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander. 2000. LOF: Identifying Density-based Local Outliers. In SIGMOD. ACM, 93--104.
[4]
K. Grauman and T. Darrell. 2004. Fast Counter Matching Using Approximate Earth Mover's Distance. In CVPR.
[5]
A. K. Jain. 2010. Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31, 8 (2010).
[6]
G. Kollios, D. Gunopulos, N. Koudas, and S. Berchtold. 2003. Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Transactions on Knowledge and Data Engineering 15, 5 (2003).
[7]
X. Mu, F. Zhu, J. Du, E. Lim, and Z. Zhou. 2017. Streaming Classification with Emerging New Class by Class Matrix Sketching. In Proceedings of the Thiry-First AAAI Conference on Artificial Intelligence. 2373--2379.
[8]
B. Poczos, L. Xiong, and J. Schneider. 2011. Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. 599--608.
[9]
D. Pokrajac, A. Lazarevic, and L. J. Latecki. 2007. Incremental Local Outlier Detection for Data Streams. In Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Data Mining. 504--515.
[10]
F. Ros and S. Guillaume. 2016. DENDIS: A new density-based sampling for clustering algorithm. Expert Systems With Applications 56, 1 (2016).
[11]
M. Salehi, C. Leckie, J. C. Bezdek, T. Vaithianathan, and X. Zhang. 2016. Fast Memory Efficient Local Outlier Detection in Data Streams. IEEE Transactions on Knowledge and Data Engineering 28, 12 (2016).
[12]
Y. Tang, L. H. U, Y. Cai, N. Mamoulis, and R. Cheng. 2013. Earth Mover's Distance based Similarity Search at Scale. Proceedings of the VLDB Endowment 7, 4 (2013).
[13]
Y. Xiao, B. Liu, Z. Hao, and L. Cao. 2014. A K-Farthest-Neighbor-based approach for support vector data description. Applied Intelligence 41, 1 (2014).
[14]
K. Yamanishi, J. Takeuchi, and G. Williams. 2000. On-line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms. In KDD. 320--324.
[15]
Y. Yan, L. Cao, C. Kuhlman, and E. A. Rundensteiner. 2017. Distributed Local Outlier Detection in Big Data. In KDD. 1225--1234.
[16]
Y. Yan, L. Cao, and E. A. Rundensteiner. 2017. Scalable Top-n Local Outlier Detection. In KDD. 1235--1244.
[17]
D. Yang, E. A. Rundensteiner, and M. O. Ward. 2011. Summarization and matching of density-based clusters in streaming environments. Proceedings of the VLDB Endowment 5, 2 (2011).

Cited By

View all
  • (2024)Toward Ubiquitous Interaction-Attentive and Extreme-Aware Crowd Activity Level PredictionACM Transactions on Intelligent Systems and Technology10.1145/368206315:6(1-26)Online publication date: 29-Jul-2024
  • (2024)An Explore–Exploit Workload-Bounded Strategy for Rare Event Detection in Massive Energy Sensor Time SeriesACM Transactions on Intelligent Systems and Technology10.1145/365764115:4(1-25)Online publication date: 17-Apr-2024
  • (2024)A Systematic Literature Review of Novelty Detection in Data Streams: Challenges and OpportunitiesACM Computing Surveys10.1145/365728656:10(1-37)Online publication date: 12-Apr-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN:9781450355520
DOI:10.1145/3219819
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 July 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data streams
  2. density-based sampling
  3. outlier detection

Qualifiers

  • Research-article

Funding Sources

  • the National Research Foundation of Korea (NRF)
  • Naver Corporation
  • Institute for Information &communications Technology Promotion (IITP)
  • Next-Generation Information Computing Development Program

Conference

KDD '18
Sponsor:

Acceptance Rates

KDD '18 Paper Acceptance Rate 107 of 983 submissions, 11%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)103
  • Downloads (Last 6 weeks)6
Reflects downloads up to 03 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Toward Ubiquitous Interaction-Attentive and Extreme-Aware Crowd Activity Level PredictionACM Transactions on Intelligent Systems and Technology10.1145/368206315:6(1-26)Online publication date: 29-Jul-2024
  • (2024)An Explore–Exploit Workload-Bounded Strategy for Rare Event Detection in Massive Energy Sensor Time SeriesACM Transactions on Intelligent Systems and Technology10.1145/365764115:4(1-25)Online publication date: 17-Apr-2024
  • (2024)A Systematic Literature Review of Novelty Detection in Data Streams: Challenges and OpportunitiesACM Computing Surveys10.1145/365728656:10(1-37)Online publication date: 12-Apr-2024
  • (2024)Unsupervised Adaptive Fleet Battery Pack Fault Detection With Concept Drift Under Evolving EnvironmentIEEE Transactions on Automation Science and Engineering10.1109/TASE.2024.336300221:3(2276-2288)Online publication date: Jul-2024
  • (2024)A Review of Unsupervised Anomaly Detection Techniques for Health Insurance Fraud2024 IEEE 10th International Conference on Big Data Computing Service and Machine Learning Applications (BigDataService)10.1109/BigDataService62917.2024.00028(141-149)Online publication date: 15-Jul-2024
  • (2024)Leveraging the Christoffel function for outlier detection in data streamsInternational Journal of Data Science and Analytics10.1007/s41060-024-00581-2Online publication date: 13-Jun-2024
  • (2024)Revisiting streaming anomaly detection: benchmark and evaluationArtificial Intelligence Review10.1007/s10462-024-10995-w58:1Online publication date: 7-Nov-2024
  • (2024)Multi-view anomaly detection via hybrid instance-neighborhood aligning and cross-view reasoningMultimedia Systems10.1007/s00530-024-01526-230:6Online publication date: 10-Oct-2024
  • (2024)Adaptive Plug-and-Play Framework for Time Series Anomaly Detection with Temporal DriftAdvanced Data Mining and Applications10.1007/978-981-96-0840-9_28(401-416)Online publication date: 13-Dec-2024
  • (2024)Unsupervised Time Series Anomaly Detection for Edge Computing Applications: A ReviewIoT Edge Intelligence10.1007/978-3-031-58388-9_6(173-198)Online publication date: 29-Mar-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media