ABSTRACT
Temporal datasets, in which data evolves continuously, arise in a wide variety of applications, and identifying anomalous or outlying objects in them is an important and challenging task. Unlike traditional outlier detection, which finds objects whose behavior differs markedly from that of other objects, temporal outlier detection seeks objects whose evolutionary behavior differs from that of other objects. Objects usually form multiple communities, and most objects in the same community follow similar patterns of evolution. Some objects, however, evolve very differently from the other members of their community; we define such objects as evolutionary community outliers. This definition captures a novel type of outlier that accounts for both the temporal dimension and community patterns. We investigate the problem of identifying evolutionary community outliers given the communities discovered in two snapshots of an evolving dataset. To tackle the challenges of community evolution and outlier detection, we propose an integrated optimization framework that performs outlier-aware community matching across snapshots and identifies evolutionary outliers in a tightly coupled way. We propose a coordinate descent algorithm that iteratively improves both community matching and outlier detection. Experimental results on both synthetic and real datasets show that the proposed approach is highly effective in discovering interesting evolutionary community outliers.
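The alternating idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not state the objective, so the code assumes a simple trimmed least-squares stand-in. The function name `match_and_score`, the membership matrices `P` and `Q`, the correspondence matrix `S`, and the `trim` parameter are all hypothetical names chosen for illustration.

```python
import numpy as np

def match_and_score(P, Q, n_iter=20, trim=0.1):
    """Alternate between community matching and outlier scoring.

    P: (n, k1) community memberships of n objects at snapshot 1.
    Q: (n, k2) community memberships of the same objects at snapshot 2.
    Returns S (k1, k2), a correspondence matrix with P @ S ~ Q, and one
    outlier score per object. Assumed objective, not the paper's.
    """
    n = P.shape[0]
    weights = np.ones(n)  # 1 = trusted object, 0 = suspected outlier
    for _ in range(n_iter):
        # Step 1 (matching): weighted least-squares fit of S, so that
        # suspected outliers do not distort the correspondence.
        sw = np.sqrt(weights)[:, None]
        S, *_ = np.linalg.lstsq(sw * P, sw * Q, rcond=None)
        # Step 2 (outlier detection): score each object by how far its
        # new membership is from what the matching predicts.
        scores = np.sum((Q - P @ S) ** 2, axis=1)
        # Drop objects scoring above the (1 - trim) quantile from the next fit.
        cutoff = np.quantile(scores, 1.0 - trim)
        weights = np.where(scores > cutoff, 0.0, 1.0)
    return S, scores

# Toy data: 20 objects, 2 communities whose labels swap between snapshots.
P = np.zeros((20, 2))
P[:10, 0] = 1.0
P[10:, 1] = 1.0
S_true = np.array([[0.0, 1.0], [1.0, 0.0]])
Q = P @ S_true
Q[0] = [1.0, 0.0]  # object 0 does not follow its community

S, scores = match_and_score(P, Q)
print(np.round(S, 3))
print(int(np.argmax(scores)))
```

In this toy setup, object 0 is the only one that departs from its community's evolution, so it receives the largest score, and excluding it from the fit lets `S` recover the label swap. Coupling the two steps is the point of the design: a matching fitted on all objects is pulled toward the outliers, and outlier scores computed from a distorted matching are less reliable.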
Index Terms
- Integrating community matching and outlier detection for mining evolutionary community outliers