skip to main content
10.1145/2339530.2339667acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Integrating community matching and outlier detection for mining evolutionary community outliers

Authors Info & Claims
Published:12 August 2012Publication History

ABSTRACT

Temporal datasets, in which data evolves continuously, exist in a wide variety of applications, and identifying anomalous or outlying objects from temporal datasets is an important and challenging task. Different from traditional outlier detection, which detects objects that have quite different behavior compared with the other objects, temporal outlier detection tries to identify objects that have different evolutionary behavior compared with other objects. Usually objects form multiple communities, and most of the objects belonging to the same community follow similar patterns of evolution. However, there are some objects which evolve in a very different way relative to other community members, and we define such objects as evolutionary community outliers. This definition represents a novel type of outliers considering both temporal dimension and community patterns. We investigate the problem of identifying evolutionary community outliers given the discovered communities from two snapshots of an evolving dataset. To tackle the challenges of community evolution and outlier detection, we propose an integrated optimization framework which conducts outlier-aware community matching across snapshots and identification of evolutionary outliers in a tightly coupled way. A coordinate descent algorithm is proposed to improve community matching and outlier detection performance iteratively. Experimental results on both synthetic and real datasets show that the proposed approach is highly effective in discovering interesting evolutionary community outliers.

Skip Supplemental Material Section

Supplemental Material

307_t_talk_6.mp4

mp4

508.7 MB

References

  1. T. Abeel, Y. Van de Peer, and Y. Saeys. Java-ML: A Machine Learning Library. Journal of Machine Learning Research, 10:931--934, Jun 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. C. C. Aggarwal and P. S. Yu. Outlier Detection for High Dimensional Data. SIGMOD Records, 30:37--46, May 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. C. C. Aggarwal and P. S. Yu. Outlier Detection with Uncertain Data. In Proc. of the SIAM Intl. Conf. on Data Mining (SDM), 483--493, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  4. C. C. Aggarwal, Y. Zhao, and P. S. Yu. Outlier Detection in Graph Streams. In Proc. of the 27th Intl. Conf. on Data Engineering (ICDE)}, 399--409. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. Alon, S. Sclaroff, G. Kollios, and V. Pavlovic. Discovering Clusters in Motion Time-Series Data. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)}, volume 1, 375--381. IEEE Computer Society, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. P. Bertsekas. Non-Linear Programming (2nd Edition). Athena Scientific, 1999.Google ScholarGoogle Scholar
  7. M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying Density-Based Local Outliers. In Proc. of the 2000 ACM SIGMOD Intl. Conf. on Management of Data (SIGMOD), 93--104. ACM, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. V. Chandola, A. Banerjee, and V. Kumar. Anomaly Detection: A Survey. ACM Surveys, 41(3), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. W. W. Cohen and J. Richman. Learning to Match and Cluster Large High-Dimensional Data Sets for Data Integration. In Proc. of the 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (SIGKDD), 475--480. ACM, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. E. Dimitriadou, A. Weingessel, and K. Hornik. Voting-Merging: An Ensemble Method for Clustering. In Proc. of the Intl. Conf. on Artificial Neural Networks (ICANN), 217--224. Springer, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Dudoit and J. Fridlyand. Bagging to Improve the Accuracy of a Clustering Procedure. Bioinformatics, 19(9):1090--1099, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  12. E. Eskin. Anomaly Detection over Noisy Data using Learned Probability Distributions. In Proc. of the 17th Intl. Conf. on Machine Learning (ICML), 255--262. Morgan Kaufmann Publishers Inc., 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. J. Fox. Outliers in Time Series. Journal of the Royal Statistical Society. Series B (Methodological), 34(3):350--363, 1972.Google ScholarGoogle ScholarCross RefCross Ref
  14. J. Gao, F. Liang, W. Fan, Y. Sun, and J. Han. Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models. In Proc. of the 23rd Annual Conf. on Neural Information Processing Systems (NIPS), 585--593. Curran Associates, Inc., 2009.Google ScholarGoogle Scholar
  15. J. Gao, F. Liang, W. Fan, C. Wang, Y. Sun, and J. Han. On Community Outliers and their Efficient Detection in Information Networks. In Proc. of the 16th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (SIGKDD), 813--822, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Y. Ge, H. Xiong, Z. Zhou, H. Ozdemir, J. Yu, and K. C. Lee. Top-Eye: Top-K Evolving Trajectory Outlier Detection. In Proc. of the 19th ACM Conf. on Information and Knowledge Management (CIKM), 1733--1736, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Ghoting, M. E. Otey, and S. Parthasarathy. LOADED: Link-Based Outlier and Anomaly Detection in Evolving Data Sets. In Proc. of the 4th IEEE Intl. Conf. on Data Mining (ICDM), 387--390, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. V. J. Hodge and J. Austin. A Survey of Outlier Detection Methodologies. AI Review, 22(2):85--126, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. W. Hu, Y. Liao, and V. R. Vemuri. Robust Anomaly Detection Using Support Vector Machines. In Proc. of the Intl. Conf. on Machine Learning (ICML), 282--289. Morgan Kaufmann Publishers Inc, 2003.Google ScholarGoogle Scholar
  20. M. Jakobsson and N. A. Rosenberg. CLUMPP: A Cluster Matching and Permutation Program for Dealing with Label Switching and Multimodality in Analysis of Population Structure. Bioinformatics, 23:1801--1806, Jul 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. E. M. Knorr and R. T. Ng. Algorithms for Mining Distance-Based Outliers in Large Datasets. In Proc. of the 24th Intl. Conf. on Very Large Data Bases (VLDB), 392--403. Morgan Kaufmann, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. E. M. Knorr, R. T. Ng, and V. Tucakov. Distance-Based Outliers: Algorithms and Applications. The VLDB Journal, 8:237--253, Feb 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. Kottke and Y. Sun. Motion Estimation Via Cluster Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16:1128--1132, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek. LoOP: Local Outlier Probabilities. In Proc. of the 18th ACM Conf. on Information and Knowledge Management (CIKM), 1649--1652. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek. Interpreting and Unifying Outlier Scores. In Proc. of the 11th SIAM Intl. Conf. on Data Mining(SDM), 13--24. SIAM / Omnipress, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  26. J.-G. Lee, J. Han, and X. Li. Trajectory Outlier Detection: A Partition-and-Detect Framework. In Proc. of the 27th Intl. Conf. on Data Engineering (ICDE), 140--149. IEEE, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. B. Long, Z. M. Zhang, and P. S. Yu. Combining Multiple Clusterings by Soft Correspondence. In Proc. of the 5th IEEE Intl. Conf. on Data Mining (ICDM), 282--289. IEEE Computer Society, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. M. J. Miller, A. D. Olson, and S. S. Thorgeirsson. Computer Analysis of Two-Dimensional Gels: Automatic Matching. ElectroPhoresis, 5(5):297--303, 1984.Google ScholarGoogle Scholar
  29. D. Pokrajac, A. Lazarevic, and L. J. Latecki. Incremental Local Outlier Detection for Data Streams. In IEEE Symposium on Computational Intelligence and Data Mining (CIDM), 504--515. IEEE, Apr 2007.Google ScholarGoogle Scholar
  30. S. Ramaswamy, R. Rastogi, and K. Shim. Efficient Algorithms for Mining Outliers from Large Data Sets. SIGMOD Records, 29:427--438, May 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Y. Sun, J. Han, J. Gao, and Y. Yu. iTopicModel: Information Network-Integrated Topic Modeling. In Proc. of the 9th IEEE Intl. Conf. on Data Mining (ICDM), 493--502. IEEE Computer Society, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Integrating community matching and outlier detection for mining evolutionary community outliers

        Recommendations

        Reviews

        Christoph F. Strnadl

        Several algorithms are able to identify certain so-called communities, regions of data points with more cohesion than the rest. But what happens if these communities and memberships evolve and both concepts are not necessarily preserved over time__?__ This easily accessible paper answers the question of how to detect outliers—data points whose behavior is different from the rest of the community they initially belonged to—in evolving datasets. Think of stockbrokers who deviate from investment trends or scientific authors who change their co-authorship networks. Initial inputs to the proposed detection algorithm are P and Q , two partitions of a (constant) set of objects in a varying number of communities, corresponding to the two points in time to be compared. Obviously, a single comparison of community memberships P and Q in terms of a correspondence matrix S is too naive to discriminate between an outlier of a community and its core members. Therefore, the authors introduce an additional "outlierness" score, A , for a given object with regard to a community of Q . Because outlierness is not a crisp concept, the total outlierness score has to be constrained by a certain threshold to obtain a convergent algorithm. Besides a rigorous exposition of the algorithm (sufficient to actually implement it), the authors also describe some theoretical properties, such as convergence and running time. Using both synthetic and real datasets (for example, subsets of data from the Internet Movie DataBase and the Digital Bibliography and Library Project), the authors convincingly demonstrate the applicability of their approach. I definitely recommend this paper to researchers in theoretical or applied computer science with an interest in (statistical) communities and outlier detection. Online Computing Reviews Service

        Access critical reviews of Computing literature here

        Become a reviewer for Computing Reviews.

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
          August 2012
          1616 pages
          ISBN:9781450314626
          DOI:10.1145/2339530

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 12 August 2012

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate1,133of8,635submissions,13%

          Upcoming Conference

          KDD '24

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader