|
ABSTRACT
The goal is to monitor multiple numerical streams, and determine which pairs are correlated with lags, as well as the value of each such lag. Lag correlations (and anti-correlations) are frequent, and very interesting in practice: For example, a decrease in interest rates typically precedes an increase in house sales by a few months; higher amounts of fluoride in the drinking water may lead to fewer dental cavities, some years later. Additional settings include network analysis, sensor monitoring, financial data analysis, and moving object tracking. Such data streams are often correlated (or anti-correlated), but with an unknown lag.We propose BRAID, a method to detect lag correlations between data streams. BRAID can handle data streams of semi-infinite length, incrementally, quickly, and with small resource consumption. We also provide a theoretical analysis, which, based on Nyquist's sampling theorem, shows that BRAID can estimate lag correlations with little, and often with no error at all. Our experiments on real and realistic data show that BRAID detects the correct lag perfectly most of the time (the largest relative error was about 1%); while it is up to 40,000 times faster than the naive implementation.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Daniel J. Abadi , Don Carney , Ugur Çetintemel , Mitch Cherniack , Christian Convey , Sangdon Lee , Michael Stonebraker , Nesime Tatbul , Stan Zdonik, Aurora: a new model and architecture for data stream management, The VLDB Journal — The International Journal on Very Large Data Bases, v.12 n.2, p.120-139, August 2003
[doi> 10.1007/s00778-003-0095-z]
|
| |
2
|
|
 |
3
|
|
| |
4
|
|
| |
5
|
R. P. Brent. Algorithm for Minimization without Derivatives. Dover Publications, 2002.
|
| |
6
|
D. Carney, U. Cetintemel, A. Rasin, S. B. Zdonik, M. Cherniack, and M. Stonebraker. Operator scheduling in a data stream manager. VLDB, pages 838--849, Sept. 2003.
|
| |
7
|
S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. A. Shah. Telegraphcq: Continuous dataflow processing for an uncertain world. CIDR, Jan. 2003.
|
| |
8
|
S. Chandrasekaran and M. J. Franklin. Remembrance of streams past: Overload-sensitive management of archived streams. VLDB, pages 348--359, August-September 2004.
|
 |
9
|
|
 |
10
|
|
 |
11
|
|
 |
12
|
|
 |
13
|
Christos Faloutsos , M. Ranganathan , Yannis Manolopoulos, Fast subsequence matching in time-series databases, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.419-429, May 24-27, 1994, Minneapolis, Minnesota, United States
|
| |
14
|
|
 |
15
|
|
 |
16
|
|
 |
17
|
A. C. Gilbert , S. Guha , P. Indyk , S. Muthukrishnan , M. Strauss, Near-optimal sparse fourier representations via sampling, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, May 19-21, 2002, Montreal, Quebec, Canada
[doi> 10.1145/509907.509933]
|
| |
18
|
|
| |
19
|
S. Guha, C. Kim, and K. Shim. Xwave: Approximate extended wavelets for streaming data. VLDB, pages 288--299, August-September 2004.
|
| |
20
|
|
| |
21
|
|
 |
22
|
|
| |
23
|
E. J. Keogh. Exact indexing of dynamic time warping. VLDB, pages 406--417, Aug. 2002.
|
 |
24
|
Eamonn Keogh , Kaushik Chakrabarti , Michael Pazzani , Sharad Mehrotra, Locally adaptive dimensionality reduction for indexing large time series databases, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.151-162, May 21-24, 2001, Santa Barbara, California, United States
|
| |
25
|
K. Koper, T. Wallace, S. Taylor, and H. Hartse. Forensic seismology and the sinking of the kursk. EOS Trans., AGU, 82, pages 37,45--46, 2001.
|
| |
26
|
N. Koudas, B. C. Ooi, K.-L. Tan, and R. Zhang. Approximate nn queries on streams with guaranteed error/performance bounds. VLDB, pages 804--815, August-September 2004.
|
| |
27
|
B. P. Lathi. Signal Processing and Linear Systems. Oxford University Press, 1998.
|
 |
28
|
|
| |
29
|
R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. S. Manku, C. Olston, J. Rosenstein, and R. Varma. Query processing, approximation, and resource management in a data stream management system. CIDR, Jan. 2003.
|
| |
30
|
S. Papadimitriou, A. Brockwell, and C. Faloutsos. Adaptive, hands-off stream mining. VLDB, pages 560--571, Sept. 2003.
|
| |
31
|
N. Tatbul, U. Cetintemel, S. B. Zdonik, M. Cherniack, and M. Stonebraker. Load shedding in a data stream manager. VLDB, pages 309--320, Sept. 2003.
|
| |
32
|
M. Wang, T. Madhyastha, N. H. Chang, S. Papadimitriou, and C. Faloutsos. Data mining meets performance evaluation: Fast algorithms for modeling bursty traffic. ICDE, Feb. 2002.
|
| |
33
|
B.-K. Yi, N. Sidiropoulos, T. Johnson, H. Jagadish, C. Faloutsos, and A. Biliris. Online data mining for co-evolving time sequences. ICDE, pages 13--22, 2000.
|
| |
34
|
Y. Zhu and D. Shasha. Statistical monitoring of thousands of data streams in real time. VLDB, pages 358--369, Aug. 2002.
|
 |
35
|
|
CITED BY 3
|
Wei Peng , Tong Sun , Philip Rose , Tao Li, A semi-automatic system with an iterative learning method for discovering the leading indicators in business processes, Proceedings of the 2007 international workshop on Domain driven data mining, p.33-42, August 12-12, 2007, San Jose, California
|
|
|
|
|
|
|