LOF: identifying density-based local outliers

Published: 16 May 2000

Abstract

For many KDD applications, such as detecting criminal activities in e-commerce, finding the rare instances, or outliers, can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to its surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys many desirable properties. Using real-world datasets, we demonstrate that LOF can be used to find outliers that appear to be meaningful but cannot otherwise be identified with existing approaches. Finally, a careful performance evaluation of our algorithm confirms that our approach of finding local outliers can be practical.
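
The abstract describes LOF only informally, so the following is a minimal sketch rather than the authors' algorithm. It assumes the commonly cited definitions of k-distance, reachability distance, and local reachability density, which are not stated in the abstract itself. The function name `lof_scores`, the brute-force distance computation, and the Euclidean metric are illustrative assumptions, and the sketch assumes the data contain no groups of more than `k` identical points (which would make a density infinite).

```python
import math


def lof_scores(points, k):
    """Return one LOF-style score per point (brute force, O(n^2) distances).

    Assumption: this follows the commonly cited LOF definitions, not
    anything spelled out in the abstract. Requires 1 <= k < len(points).
    """
    n = len(points)
    # Pairwise Euclidean distances (metric choice is an assumption).
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []      # distance from each point to its k-th nearest neighbor
    neighbors = []   # indices within the k-distance (ties are included)
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbors.append([j for d, j in others if d <= kd])

    def local_reachability_density(i):
        # Reachability distance of i from neighbor j: max(k_dist(j), d(i, j)).
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]

    # Score: average ratio of the neighbors' densities to the point's own.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for point, score in zip(data, lof_scores(data, k=2)):
        print(point, round(score, 2))
```

On this toy input, the four clustered points should receive scores close to 1, while the isolated point receives a much larger score. That matches the idea of a degree of outlierness relative to the local neighborhood, as opposed to a binary label.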

Published in

ACM SIGMOD Record, Volume 29, Issue 2
June 2000
609 pages
ISSN: 0163-5808
DOI: 10.1145/335191

SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data
May 2000
604 pages
ISBN: 1581132174
DOI: 10.1145/342009

Copyright © 2000 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States
