
Integrating community matching and outlier detection for mining evolutionary community outliers

Authors:
Manish Gupta
Univ of Illinois at Urbana-Champaign, Urbana, IL, USA

Jing Gao
State Univ of New York, Buffalo, Buffalo, NY, USA

Yizhou Sun
Univ of Illinois at Urbana-Champaign, Urbana, IL, USA

Jiawei Han
Univ of Illinois at Urbana-Champaign, Urbana, IL, USA

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2012, Pages 859–867
https://doi.org/10.1145/2339530.2339667

Published: 12 August 2012


ABSTRACT

Temporal datasets, in which data evolves continuously, arise in a wide variety of applications, and identifying anomalous or outlying objects in them is an important and challenging task. Unlike traditional outlier detection, which detects objects whose behavior differs markedly from that of other objects, temporal outlier detection tries to identify objects whose evolutionary behavior differs from that of other objects. Objects usually form multiple communities, and most objects belonging to the same community follow similar patterns of evolution. However, some objects evolve in a very different way from other community members, and we define such objects as evolutionary community outliers. This definition represents a novel type of outlier that considers both the temporal dimension and community patterns. We investigate the problem of identifying evolutionary community outliers given the communities discovered from two snapshots of an evolving dataset. To tackle the challenges of community evolution and outlier detection, we propose an integrated optimization framework that conducts outlier-aware community matching across snapshots and identification of evolutionary outliers in a tightly coupled way. A coordinate descent algorithm is proposed to improve community matching and outlier detection performance iteratively. Experimental results on both synthetic and real datasets show that the proposed approach is highly effective in discovering interesting evolutionary community outliers.

Supplemental Material

307_t_talk_6.mp4 (mp4, 508.7 MB)

Reviews

Reviewer: Christoph F. Strnadl

Several algorithms are able to identify certain so-called communities, regions of data points with more cohesion than the rest. But what happens if these communities and memberships evolve and both concepts are not necessarily preserved over time? This easily accessible paper answers the question of how to detect outliers (data points whose behavior is different from the rest of the community they initially belonged to) in evolving datasets. Think of stockbrokers who deviate from investment trends or scientific authors who change their co-authorship networks.

Initial inputs to the proposed detection algorithm are P and Q, two partitions of a (constant) set of objects in a varying number of communities, corresponding to the two points in time to be compared. Obviously, a single comparison of community memberships P and Q in terms of a correspondence matrix S is too naive to discriminate between an outlier of a community and its core members. Therefore, the authors introduce an additional "outlierness" score, A, for a given object with regard to a community of Q. Because outlierness is not a crisp concept, the total outlierness score has to be constrained by a certain threshold to obtain a convergent algorithm.

Besides a rigorous exposition of the algorithm (sufficient to actually implement it), the authors also describe some theoretical properties, such as convergence and running time. Using both synthetic and real datasets (for example, subsets of data from the Internet Movie DataBase and the Digital Bibliography and Library Project), the authors convincingly demonstrate the applicability of their approach. I definitely recommend this paper to researchers in theoretical or applied computer science with an interest in (statistical) communities and outlier detection.

Online Computing Reviews Service
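The alternating idea described in the review (estimate a correspondence between the communities of the two snapshots, then score objects that the correspondence explains poorly, and repeat) can be illustrated with a minimal Python sketch. This is not the authors' formulation. It assumes hard community labels, keeps one outlier score per object rather than one per object and community, and splits a fixed outlierness budget in proportion to each object's residual. The names `match_and_score`, `p`, `q`, `a`, and `mu` are hypothetical and chosen only for this illustration.

```python
from typing import List, Tuple


def match_and_score(
    p: List[int], q: List[int], k1: int, k2: int, mu: float, iters: int = 50
) -> Tuple[List[List[float]], List[float]]:
    """Alternate between outlier-aware matching and outlier scoring.

    p[o], q[o]: community label of object o in the first / second snapshot.
    mu: budget on the total outlierness (assumed stand-in for the threshold).
    Returns (S, a): S[i][j] is the estimated share of community i that maps
    to community j; a[o] in [0, 1] is the outlier score of object o.
    """
    n = len(p)
    a = [0.0] * n  # start by treating every object as a regular member
    S = [[1.0 / k2] * k2 for _ in range(k1)]
    for _ in range(iters):
        # Step 1: re-estimate S, down-weighting objects that look like outliers.
        S = [[0.0] * k2 for _ in range(k1)]
        for o in range(n):
            S[p[o]][q[o]] += 1.0 - a[o]
        for i in range(k1):
            row_total = sum(S[i])
            if row_total > 0:
                S[i] = [v / row_total for v in S[i]]
            else:
                S[i] = [1.0 / k2] * k2  # no usable evidence: uniform row
        # Step 2: squared error between each object's observed membership
        # (one-hot in q) and the row of S predicted by its old community.
        r = [
            sum(((1.0 if q[o] == j else 0.0) - S[p[o]][j]) ** 2 for j in range(k2))
            for o in range(n)
        ]
        total_r = sum(r)
        # Spread the budget mu in proportion to the residuals, capped at 1.
        new_a = [min(1.0, mu * ro / total_r) if total_r > 0 else 0.0 for ro in r]
        converged = max((abs(x - y) for x, y in zip(new_a, a)), default=0.0) < 1e-9
        a = new_a
        if converged:
            break
    return S, a


if __name__ == "__main__":
    # Ten objects, two communities per snapshot; object 4 switches community.
    p = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    q = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
    S, a = match_and_score(p, q, k1=2, k2=2, mu=1.0)
    print("correspondence:", S)
    print("outlier scores:", a)
```

Down-weighting suspected outliers in the matching step is what couples the two tasks in this sketch: the lone switching object stops pulling its old community's row of S toward the new community, so the correspondence sharpens and the object's residual grows in turn.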

Published in
KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012
1616 pages
ISBN:9781450314626
DOI:10.1145/2339530
General Chair:
Qiang Yang, Hong Kong University of Science and Technology
Program Chairs:
Deepak Agarwal, LinkedIn
Jian Pei, Simon Fraser University
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 August 2012
Author Tags
anomaly detection
community matching
ecoutlier
evolutionary community outliers
temporal outliers
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
Article Metrics
- Total Citations: 67
- Total Downloads: 1,291
- Downloads (Last 12 months): 20
- Downloads (Last 6 weeks): 2
