ABSTRACT
Anomaly detection in time series has received considerable attention over the past two decades. However, existing techniques either cannot locate the anomalies within an anomalous time series or require users to provide the length of potential anomalies. To address these limitations, we propose a self-learning online anomaly detection algorithm that automatically identifies anomalous time series, as well as the exact locations where the anomalies occur in the detected time series. We evaluate our approach on several real datasets, including two CPU manufacturing datasets from Intel. We demonstrate that our approach can successfully detect the correct anomalies without requiring any prior knowledge about the data.
Index Terms
- A Self-Learning and Online Algorithm for Time Series Anomaly Detection, with Application in CPU Manufacturing