research-article

Detection of Highly Correlated Live Data Streams

Authors:
Rakan Alseghayer, Department of Computer Science, University of Pittsburgh, USA
Daniel Petrov, Department of Computer Science, University of Pittsburgh, USA
Panos K. Chrysanthis, Department of Computer Science, University of Pittsburgh, USA
Mohamed Sharaf, University of Queensland, Brisbane, Australia
Alexandros Labrinidis, Department of Computer Science, University of Pittsburgh, USA

BIRTE '17: Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics, August 2017, Article No. 3, Pages 1–8. https://doi.org/10.1145/3129292.3129298

Published: 28 August 2017

ABSTRACT

More and more organizations (commercial, health, government, and security) base their decisions on real-time analysis of large volumes of fast-arriving data streams. For such analysis to yield actionable information in real time and at the right time, the most recent data must be processed within a specified delay target. Effective solutions for analyzing such data streams rely on two techniques: (1) incremental sliding-window computation of aggregates, which avoids unnecessary recomputation, and (2) intelligent scheduling of computational steps and operations. In this paper, we propose a solution that combines both techniques to find highly correlated data streams in real time, using the Pearson Correlation Coefficient as the correlation metric between two windows of data streams. Specifically, we propose to partition a set of data streams into micro-batches that capture the delay target, use sliding windows within a range as the subsequences of values exhibiting a certain level of correlation, utilize the idea of sufficient statistics to incrementally compute the Pearson Correlation Coefficient of pairs of sliding windows, and adopt deadline-aware priority scheduling to detect the highly correlated pairs of data streams. Our experimental results show that our scheme, and in particular our Price-DCS with warm start scheduling algorithm, outperforms existing approaches and enables a high degree of interactivity in correlating micro-batches of live data streams.
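The incremental idea named in the abstract, keeping sufficient statistics so that the Pearson Correlation Coefficient of a sliding window can be updated in constant time per arriving value, can be illustrated with a minimal Python sketch. It relies on the standard identity r = (nΣxy − ΣxΣy) / sqrt((nΣx² − (Σx)²)(nΣy² − (Σy)²)) and assumes two aligned, equally sampled streams and a fixed window length. The class `SlidingPearson` and the helper `highly_correlated_pairs` are hypothetical names; the helper scans pairs exhaustively, and neither the paper's micro-batch partitioning nor its deadline-aware scheduling (including Price-DCS) is reproduced here.

```python
import math
from collections import deque
from typing import Deque, Dict, List, Optional, Sequence, Tuple


class SlidingPearson:
    """Sliding-window Pearson correlation of two aligned streams, maintained
    through the sufficient statistics n, Σx, Σy, Σx², Σy² and Σxy."""

    def __init__(self, window: int) -> None:
        if window < 2:
            raise ValueError("window must hold at least two points")
        self.window = window
        self.buf: Deque[Tuple[float, float]] = deque()  # pairs inside the window
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def _update(self, x: float, y: float, sign: float) -> None:
        # sign = +1 adds a pair to the running sums, sign = -1 removes it.
        self.sx += sign * x
        self.sy += sign * y
        self.sxx += sign * x * x
        self.syy += sign * y * y
        self.sxy += sign * x * y

    def push(self, x: float, y: float) -> None:
        """Append the newest pair and evict the oldest once the window is full."""
        self.buf.append((x, y))
        self._update(x, y, 1.0)
        if len(self.buf) > self.window:
            old_x, old_y = self.buf.popleft()
            self._update(old_x, old_y, -1.0)

    def full(self) -> bool:
        return len(self.buf) == self.window

    def pcc(self) -> Optional[float]:
        """Return the PCC of the current window, or None if it is undefined."""
        n = len(self.buf)
        if n < 2:
            return None
        # n²-scaled covariance and variances; the scale cancels in the ratio.
        cov = n * self.sxy - self.sx * self.sy
        var_x = n * self.sxx - self.sx * self.sx
        var_y = n * self.syy - self.sy * self.sy
        if var_x <= 0.0 or var_y <= 0.0:
            return None  # a constant window has no defined correlation
        return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(
    streams: Dict[str, Sequence[float]], window: int, threshold: float
) -> List[Tuple[str, str]]:
    """Hypothetical helper: report every pair of streams whose PCC over some
    full window reaches `threshold` (exhaustive scan, no scheduling)."""
    names = sorted(streams)
    hits: List[Tuple[str, str]] = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sp = SlidingPearson(window)
            for x, y in zip(streams[a], streams[b]):
                sp.push(x, y)
                r = sp.pcc()
                if sp.full() and r is not None and r >= threshold:
                    hits.append((a, b))
                    break  # one qualifying window is enough for this pair
    return hits


if __name__ == "__main__":
    demo = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [5, 4, 3, 2, 1]}
    print(highly_correlated_pairs(demo, window=3, threshold=0.9))  # [('a', 'b')]
```

Subtracting evicted values keeps each update O(1), but on long streams it can accumulate floating-point error; periodically recomputing the sums from the buffered window is one common remedy.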

Index Terms

1. Information systems
  1. Information retrieval
    1. Users and interactive retrieval
  2. World Wide Web
    1. Web searching and information discovery
      1. Personalization

Published in
BIRTE '17: Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics
August 2017
49 pages
ISBN: 9781450354257
DOI: 10.1145/3129292
Editors:
Damianos Chatziantoniou, Athens University of Economics and Business, Greece
Malu Castellanos, Teradata Aster, USA
Panos K. Chrysanthis, University of Pittsburgh, USA
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
correlation
data exploration
data streams
search
subsequence
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
BIRTE '17 Paper Acceptance Rate: 6 of 11 submissions, 55%
Overall Acceptance Rate: 12 of 21 submissions, 57%