skip to main content
10.1145/2983323.2983775acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

PISA: An Index for Aggregating Big Time Series Data

Published:24 October 2016Publication History

ABSTRACT

Aggregation operation plays an important role in time series database management. As the amount of data increases, current solutions such as summary table and MapReduce-based methods struggle to respond to such queries with low latency. Other approaches such as segment tree based methods have a poor insertion performance when the data size exceeds the available memory. This paper proposes a new segment tree based index called PISA, which has fast insertion performance and low latency for aggregation queries. PISA uses a forest to overcome the performance disadvantages of insertions in traditional segment trees. By defining two kinds of tags, namely code number and serial number, we propose an algorithm to accelerate queries by avoiding reading unnecessary data on disk. The index is stored on disk and only takes a few hundred bytes of memory for billions of data points. PISA can be easily implemented on both traditional databases and NoSQL systems, examples including MySQL and Cassandra. It handles aggregation queries within milliseconds on a commodity server for a time range that may contain tens of billions of data points.

References

  1. V. Abramova and J. Bernardino. NoSQL databases: MongoDB vs Cassandra. In C3S2E, pages 14--22. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. P. Bhatotia, M. Dischinger, R. Rodrigues, and U. A. Acar. Slider: Incremental sliding-window computations for large-scale data analysis. CITI, Universidade Nova de Lisboa, Lisbon, 2012.Google ScholarGoogle Scholar
  3. M. de Berg, O. Cheong, M. van Kreveld, and M. Overmars. More geometric data structures. Computational Geometry: Algorithms and Applications, pages 219--241, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  4. J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. CACM, 51(1):107--113, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. Gamper, M. Böhlen, and C. S. Jensen. Temporal aggregation. In Encyclopedia of database systems, pages 2924--2929. Springer, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  6. J. Gordevičius, J. Gamper, and M. Böhlen. Parsimonious temporal aggregation. VLDBJ, 21(3):309--332, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Kornacker and Behm. Impala: A modern, open-source SQL engine for Hadoop. CIDR, 2015.Google ScholarGoogle Scholar
  8. W. Lehner, B. Cochrane, H. Pirahesh, and M. Zaharioudakis. Applying mass query optimization to speed up automatic summary table refresh. In ICDE, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Li, J. Chen, C. Jin, R. Zhang, and A. Zhou. MR-tree: an efficient index for MapReduce. IJCS, 27(6):828--838, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Y. Li, G. Kim, L. Wen, and H. Bae. Mhb-tree: A distributed spatial index method for document based nosql database system. In UITA, pages 489--497. Springer, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  11. G. Mahlknecht, A. Dignös, and J. Gamper. Efficient computation of parsimonious temporal aggregation. In ADIS, pages 320--333. Springer, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  12. D. P. Mehta and S. Sahni. Handbook of data structures and applications. CRC Press, 2004. Google ScholarGoogle ScholarCross RefCross Ref
  13. B. Moon, I. F. V. Lopez, and V. Immanuel. Efficient algorithms for large-scale temporal aggregation. TKDE, 15(3):744--759, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. B. Navathe and R. Ahmed. A temporal relational model and a query language. Information Sciences, 49(1):147--175, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. P. Nikolov and G. Pierre. Aggregate queries in NoSQL cloud data stores. University Amsterdam, 2011.Google ScholarGoogle Scholar
  16. D. Pallett. Improving query performance of holistic aggregate queries for real-time data exploration. 2014.Google ScholarGoogle Scholar
  17. Z. Parker, S. Poe, and S. V. Vrbsky. Comparing NoSQL MongoDB to an SQL DB. In ACMSE, pages 5:1--5:6, New York, NY, USA, 2013. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. D. Peng, K. Duan, and L. Xie. Improving the performance of aggregate queries with cached tuples in MapReduce. IJDTA, 6(1):13--24, 2013.Google ScholarGoogle Scholar
  19. V. Raman, G. Attaluri, R. Barber, N. Chainani, D. Kalmuk, V. KulandaiSamy, J. Leenstra, S. Lightstone, S. Liu, G. M. Lohman, et al. DB2 with BLU acceleration: So much more than just a column store. VLDB, 6(11):1080--1091, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. V. Raman, G. Attaluri, R. Barber, N. Chainani, D. Kalmuk, V. KulandaiSamy, J. Leenstra, S. Lightstone, S. Liu, G. M. Lohman, et al. DB2 with BLU acceleration: So much more than just a column store. VLDB, 6(11):1080--1091, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. K. Tangwongsan, M. Hirzel, S. Schneider, and K.-L. Wu. General incremental sliding-window aggregation. VLDB, 8(7):702--713, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. V. K. Wei. Generalized hamming weights for linear codes. IEEE TIT, 37(5):1412--1418, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. J. Yang and J. Widom. Incremental computation and maintenance of temporal aggregates. VLDBJ, 12(3):262--283, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. PISA: An Index for Aggregating Big Time Series Data

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
          October 2016
          2566 pages
          ISBN:9781450340731
          DOI:10.1145/2983323

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 24 October 2016

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          CIKM '16 Paper Acceptance Rate160of701submissions,23%Overall Acceptance Rate1,861of8,427submissions,22%

          Upcoming Conference

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader