DOI: 10.1145/1321440.1321552
Research article

Detecting distance-based outliers in streams of data

Published: 06 November 2007

Abstract

This work presents a method for detecting distance-based outliers in data streams. We adopt the sliding window model, in which outlier queries are issued to detect anomalies in the current window. Two algorithms are presented. The first answers outlier queries exactly but has larger space requirements. The second is derived directly from the exact one, has limited memory requirements, and returns an approximate answer based on accurate estimations with a statistical guarantee. Several experiments confirm the effectiveness of the proposed approach and the high quality of the approximate solutions.
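
To make the query model concrete, the following is a minimal sketch of an exact outlier query over a sliding window. It is not the paper's algorithm: the abstract does not define the outlier criterion, so the sketch assumes the common distance-based definition (a point is an outlier if fewer than k other points in the window lie within distance R of it), a count-based window, and a naive quadratic scan. The class name SlidingWindowOutliers and the parameters window_size, radius, and k are hypothetical.

```python
from collections import deque
from math import dist


class SlidingWindowOutliers:
    """Naive exact distance-based outlier queries over a count-based sliding window.

    Assumption (not taken from the abstract): a point is an outlier if fewer
    than k other points in the current window lie within distance `radius`.
    """

    def __init__(self, window_size, radius, k):
        # A bounded deque evicts the oldest point automatically on overflow.
        self.window = deque(maxlen=window_size)
        self.radius = radius
        self.k = k

    def insert(self, point):
        """Add a new stream element to the window."""
        self.window.append(tuple(point))

    def query(self):
        """Return the points of the current window that have fewer than k neighbors."""
        points = list(self.window)
        outliers = []
        for i, p in enumerate(points):
            # Count the other window points within the given radius of p.
            neighbors = sum(
                1
                for j, q in enumerate(points)
                if i != j and dist(p, q) <= self.radius
            )
            if neighbors < self.k:
                outliers.append(p)
        return outliers


detector = SlidingWindowOutliers(window_size=5, radius=1.0, k=2)
for x in [0.0, 0.5, 1.0, 10.0, 0.8, 0.2]:
    detector.insert((x,))
result = detector.query()
print(result)
```

This sketch stores every point of the window and rescans all pairs at query time, so it illustrates only the exact-answer, higher-memory end of the trade-off described in the abstract. An approximate variant would keep a bounded summary of the window instead of the window itself.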

    Published In

    CIKM '07: Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
    November 2007
    1048 pages
    ISBN:9781595938039
    DOI:10.1145/1321440
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. anomaly detection
    2. data streams
    3. distance-based outliers

    Qualifiers

    • Research-article

    Conference

    CIKM07

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
