DOI: 10.1145/1401890.1401983
research-article

Local peculiarity factor and its application in outlier detection

Published: 24 August 2008

Abstract

Peculiarity oriented mining (POM), aiming to discover peculiarity rules hidden in a dataset, is a new data mining method. In the past few years, many results and applications on POM have been reported. However, there is still a lack of theoretical analysis. In this paper, we prove that the peculiarity factor (PF), one of the most important concepts in POM, can accurately characterize the peculiarity of data with respect to the probability density function of a normal distribution, but is unsuitable for more general distributions. Thus, we propose the concept of local peculiarity factor (LPF). It is proved that the LPF has the same ability as the PF for a normal distribution and is the so-called µ-sensitive peculiarity description for general distributions. To demonstrate the effectiveness of the LPF, we apply it to outlier detection problems and give a new outlier detection algorithm called LPF-Outlier. Experimental results show that LPF-Outlier is an effective outlier detection algorithm.
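
The abstract does not state the formulas, so the following is only a minimal sketch under stated assumptions, not the authors' LPF-Outlier algorithm. It assumes that the PF of a point is the sum of its distances to all points in the dataset, each raised to a power alpha, and that the LPF restricts that sum to the k nearest neighbours. The function names, the Euclidean distance, and the defaults alpha = 0.5 and k = 3 are illustrative choices.

```python
import math


def peculiarity_factor(points, alpha=0.5):
    """Assumed PF: sum of distance**alpha from each point to every point."""
    return [sum(math.dist(p, q) ** alpha for q in points) for p in points]


def local_peculiarity_factor(points, k=3, alpha=0.5):
    """Assumed LPF: the same sum, restricted to the k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        # Distances to all other points, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(d ** alpha for d in dists[:k]))
    return scores


def top_outliers(points, n=1, k=3, alpha=0.5):
    """Indices of the n points with the highest assumed LPF score."""
    scores = local_peculiarity_factor(points, k, alpha)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:n]


# Hypothetical toy data: four clustered points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(top_outliers(points, n=1))  # [4]
```

Ranking points by such a score and reporting the top n is one simple way to turn a peculiarity measure into an outlier detector; the paper's own procedure may differ in its details.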



      Published In

      KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2008
      1116 pages
      ISBN:9781605581934
      DOI:10.1145/1401890
      General Chair: Ying Li
      Program Chairs: Bing Liu, Sunita Sarawagi
      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. µ-sensitive peculiarity description
      2. data mining
      3. local peculiarity factor
      4. outlier detection
      5. peculiarity factor

      Qualifiers

      • Research-article

      Conference

      KDD08

      Acceptance Rates

      KDD '08 Paper Acceptance Rate 118 of 593 submissions, 20%;
      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


