LOF: identifying density-based local outliers

Authors:
Markus M. Breunig

Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 Munich, Germany

Hans-Peter Kriegel

Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 Munich, Germany

Raymond T. Ng

Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4 Canada

Jörg Sander

Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 Munich, Germany


ACM SIGMOD Record, Volume 29, Issue 2, June 2000, pp. 93–104. https://doi.org/10.1145/335191.335388

Published: 16 May 2000

Abstract

For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys many desirable properties. Using real-world datasets, we demonstrate that LOF can be used to find outliers which appear to be meaningful but cannot otherwise be identified with existing approaches. Finally, a careful performance evaluation of our algorithm shows that our approach of finding local outliers can be practical.
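The abstract describes LOF only informally. As an illustration, here is a minimal brute-force sketch in Python that follows the LOF definitions as we understand them from the full paper: the k-distance neighborhood of a point (with `k` standing in for the paper's MinPts parameter), the reachability distance reach-dist_k(p, o) = max(k-distance(o), d(p, o)), the local reachability density (the inverse of the mean reachability distance to the neighbors), and LOF as the mean ratio of the neighbors' densities to the point's own density. The function names, the use of Euclidean distance, and the toy data are assumptions made for illustration; this is not the authors' implementation. It runs in O(n^2) time and assumes no point coincides with k or more other points.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def k_distance_neighbors(points: List[Point], i: int, k: int) -> Tuple[float, List[int]]:
    """Return the k-distance of points[i] and the indices of its k-distance neighborhood.

    Euclidean distance is an assumption of this sketch. The neighborhood holds
    every other point no farther than the k-distance, so ties can make it
    larger than k.
    """
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]


def local_outlier_factors(points: List[Point], k: int) -> List[float]:
    """Brute-force O(n^2) LOF scores (illustrative sketch; k plays the role of MinPts).

    Assumes no point coincides with k or more other points; otherwise a local
    reachability density can be infinite, which surfaces here as ZeroDivisionError.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(points)")

    k_dist: List[float] = []
    neigh: List[List[int]] = []
    for i in range(n):
        d, nb = k_distance_neighbors(points, i, k)
        k_dist.append(d)
        neigh.append(nb)

    def reach_dist(p: int, o: int) -> float:
        # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
        return max(k_dist[o], math.dist(points[p], points[o]))

    # lrd_k(p): inverse of the mean reachability distance from p to its neighbors.
    lrd = [len(neigh[p]) / sum(reach_dist(p, o) for o in neigh[p]) for p in range(n)]

    # LOF_k(p): mean over neighbors o of lrd(o) / lrd(p).
    return [sum(lrd[o] for o in neigh[p]) / (len(neigh[p]) * lrd[p]) for p in range(n)]


if __name__ == "__main__":
    # Hypothetical toy data: a tight 3x3 grid, a loose 3x3 grid, and one
    # point just outside the tight grid.
    tight = [(x * 0.5, y * 0.5) for x in range(3) for y in range(3)]
    loose = [(20 + x * 4.0, 20 + y * 4.0) for x in range(3) for y in range(3)]
    data = tight + loose + [(3.0, 3.0)]
    for point, score in zip(data, local_outlier_factors(data, k=3)):
        print(point, round(score, 2))
```

On this toy data the point at (3.0, 3.0) gets a LOF of about 5, while every point in both grids scores between roughly 0.8 and 1.2, even though the loose grid's spacing (4.0) is larger than the outlier's distance to the tight grid (about 2.8). A single global distance threshold could not separate these cases; comparing each point's density with that of its own neighbors can, which is the "local" aspect the abstract refers to.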

Index Terms

1. Information systems
  1. Data management systems
    1. Database design and models
  2. Information systems applications

Recommendations

LOF: identifying density-based local outliers
SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data

Improving Detection Efficiency: Optimizing Block Size in the Local Outlier Factor (LOF) Algorithm
Rough Sets
Detecting outliers in data is essential in various fields, such as finance, healthcare, and many other domains with anomalies. Among well-known outlier detection algorithms, Local Outlier Factor (LOF) is widely used for identifying unusual data ...
LDBOD: A novel local distribution based outlier detector

As an important research direction in KDD field, outlier detection has been drawing much attention from different communities. In this paper, two novel algorithms LDBOD and LDBOD+ for outlier detection are proposed. Similar to LOF, they also aim to find ...

Published in
ACM SIGMOD Record Volume 29, Issue 2
June 2000
609 pages
ISSN: 0163-5808
DOI: 10.1145/335191
Editors:
Weidong Chen, Southern Methodist Univ., Dallas, TX
Jeffrey Naughton, Univ. of Wisconsin-Madison, Madison
Philip A. Bernstein, Microsoft

SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data
May 2000
604 pages
ISBN: 1581132174
DOI: 10.1145/342009
Chairmen:
Maggie Dunham, Southern Methodist Univ.
Jeffrey F. Naughton, Univ. of Wisconsin-Madison
Weidong Chen, Southern Methodist Univ.
Nick Koudas, AT&T Labs
Copyright © 2000 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
database mining
outlier detection
Qualifiers
- article

Article Metrics
- 5,038 Total Citations
- 14,826 Total Downloads
- Downloads (Last 12 months): 5,480
- Downloads (Last 6 weeks): 774