DOI: 10.1145/3129292.3129298
research-article

Detection of Highly Correlated Live Data Streams

Published: 28 August 2017

ABSTRACT

More and more organizations (commercial, health, government, and security) base their decisions on real-time analysis of large volumes of fast-arriving data streams. For such analysis to yield actionable information at the right time, the most recent data must be processed within a specified delay target. Effective solutions for analyzing such data streams rely on two techniques: (1) incremental sliding-window computation of aggregates, to avoid unnecessary recomputation, and (2) intelligent scheduling of computational steps and operations. In this paper, we propose a solution that combines both techniques to find highly correlated data streams in real time, using the Pearson Correlation Coefficient as the correlation metric for two windows of data streams. Specifically, we propose to partition a set of data streams into micro-batches that capture the delay target, use sliding windows within a range as the subsequences of values exhibiting a certain level of correlation, utilize sufficient statistics to incrementally compute the Pearson Correlation Coefficient of pairs of sliding windows, and adopt deadline-aware priority scheduling to detect the highly correlated pairs of data streams. Our experimental results show that our scheme, and in particular our Price-DCS scheduling algorithm with warm start, outperforms existing ones and enables a high degree of interactivity in correlating micro-batches of live data streams.
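
The idea of computing the Pearson Correlation Coefficient incrementally from sufficient statistics can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `SlidingPearson`, its `window` parameter, and the choice of a fixed-size window that slides by one pair at a time are assumptions made for illustration only. The sketch keeps the running sums Σx, Σy, Σx², Σy², and Σxy, so each new pair of values updates the coefficient in constant time instead of rescanning the window.

```python
from collections import deque
from math import sqrt


class SlidingPearson:
    """Pearson correlation over the most recent `window` pairs.

    Hypothetical helper for illustration; each push costs O(1).
    """

    def __init__(self, window):
        if window < 2:
            raise ValueError("window must hold at least two pairs")
        self.window = window
        self.pairs = deque()  # values currently inside the window
        # Sufficient statistics: running sums over the window.
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def push(self, x, y):
        """Add the newest pair and evict the oldest one if the window is full."""
        self.pairs.append((x, y))
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y
        if len(self.pairs) > self.window:
            ox, oy = self.pairs.popleft()
            self.sx -= ox
            self.sy -= oy
            self.sxx -= ox * ox
            self.syy -= oy * oy
            self.sxy -= ox * oy

    def correlation(self):
        """Return the coefficient, or None if it is undefined for this window."""
        n = len(self.pairs)
        if n < 2:
            return None
        cov = n * self.sxy - self.sx * self.sy
        var_x = n * self.sxx - self.sx * self.sx
        var_y = n * self.syy - self.sy * self.sy
        if var_x <= 0 or var_y <= 0:
            return None  # a constant window has no defined correlation
        return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    sp = SlidingPearson(window=4)
    for x, y in zip([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12]):
        sp.push(x, y)
    print(sp.correlation())
```

Running sums of this kind can lose precision when values are large relative to their variance, so a production system would likely need a numerically more careful update; the sketch favors clarity over robustness.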

      • Published in

        BIRTE '17: Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics
        August 2017
        49 pages
        ISBN: 9781450354257
        DOI: 10.1145/3129292

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        BIRTE '17 Paper Acceptance Rate: 6 of 11 submissions, 55%
        Overall Acceptance Rate: 12 of 21 submissions, 57%