ABSTRACT
Aggregation operation plays an important role in time series database management. As the amount of data increases, current solutions such as summary table and MapReduce-based methods struggle to respond to such queries with low latency. Other approaches such as segment tree based methods have a poor insertion performance when the data size exceeds the available memory. This paper proposes a new segment tree based index called PISA, which has fast insertion performance and low latency for aggregation queries. PISA uses a forest to overcome the performance disadvantages of insertions in traditional segment trees. By defining two kinds of tags, namely code number and serial number, we propose an algorithm to accelerate queries by avoiding reading unnecessary data on disk. The index is stored on disk and only takes a few hundred bytes of memory for billions of data points. PISA can be easily implemented on both traditional databases and NoSQL systems, examples including MySQL and Cassandra. It handles aggregation queries within milliseconds on a commodity server for a time range that may contain tens of billions of data points.
- V. Abramova and J. Bernardino. NoSQL databases: MongoDB vs Cassandra. In C3S2E, pages 14--22. ACM, 2013. Google ScholarDigital Library
- P. Bhatotia, M. Dischinger, R. Rodrigues, and U. A. Acar. Slider: Incremental sliding-window computations for large-scale data analysis. CITI, Universidade Nova de Lisboa, Lisbon, 2012.Google Scholar
- M. de Berg, O. Cheong, M. van Kreveld, and M. Overmars. More geometric data structures. Computational Geometry: Algorithms and Applications, pages 219--241, 2008.Google ScholarCross Ref
- J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. CACM, 51(1):107--113, 2008. Google ScholarDigital Library
- J. Gamper, M. Böhlen, and C. S. Jensen. Temporal aggregation. In Encyclopedia of database systems, pages 2924--2929. Springer, 2009.Google ScholarCross Ref
- J. Gordevičius, J. Gamper, and M. Böhlen. Parsimonious temporal aggregation. VLDBJ, 21(3):309--332, 2012. Google ScholarDigital Library
- M. Kornacker and Behm. Impala: A modern, open-source SQL engine for Hadoop. CIDR, 2015.Google Scholar
- W. Lehner, B. Cochrane, H. Pirahesh, and M. Zaharioudakis. Applying mass query optimization to speed up automatic summary table refresh. In ICDE, 2001.Google ScholarDigital Library
- C. Li, J. Chen, C. Jin, R. Zhang, and A. Zhou. MR-tree: an efficient index for MapReduce. IJCS, 27(6):828--838, 2014. Google ScholarDigital Library
- Y. Li, G. Kim, L. Wen, and H. Bae. Mhb-tree: A distributed spatial index method for document based nosql database system. In UITA, pages 489--497. Springer, 2013.Google ScholarCross Ref
- G. Mahlknecht, A. Dignös, and J. Gamper. Efficient computation of parsimonious temporal aggregation. In ADIS, pages 320--333. Springer, 2015.Google ScholarCross Ref
- D. P. Mehta and S. Sahni. Handbook of data structures and applications. CRC Press, 2004. Google ScholarCross Ref
- B. Moon, I. F. V. Lopez, and V. Immanuel. Efficient algorithms for large-scale temporal aggregation. TKDE, 15(3):744--759, 2003. Google ScholarDigital Library
- S. B. Navathe and R. Ahmed. A temporal relational model and a query language. Information Sciences, 49(1):147--175, 1989. Google ScholarDigital Library
- P. Nikolov and G. Pierre. Aggregate queries in NoSQL cloud data stores. University Amsterdam, 2011.Google Scholar
- D. Pallett. Improving query performance of holistic aggregate queries for real-time data exploration. 2014.Google Scholar
- Z. Parker, S. Poe, and S. V. Vrbsky. Comparing NoSQL MongoDB to an SQL DB. In ACMSE, pages 5:1--5:6, New York, NY, USA, 2013. ACM. Google ScholarDigital Library
- D. Peng, K. Duan, and L. Xie. Improving the performance of aggregate queries with cached tuples in MapReduce. IJDTA, 6(1):13--24, 2013.Google Scholar
- V. Raman, G. Attaluri, R. Barber, N. Chainani, D. Kalmuk, V. KulandaiSamy, J. Leenstra, S. Lightstone, S. Liu, G. M. Lohman, et al. DB2 with BLU acceleration: So much more than just a column store. VLDB, 6(11):1080--1091, 2013. Google ScholarDigital Library
- V. Raman, G. Attaluri, R. Barber, N. Chainani, D. Kalmuk, V. KulandaiSamy, J. Leenstra, S. Lightstone, S. Liu, G. M. Lohman, et al. DB2 with BLU acceleration: So much more than just a column store. VLDB, 6(11):1080--1091, 2013. Google ScholarDigital Library
- K. Tangwongsan, M. Hirzel, S. Schneider, and K.-L. Wu. General incremental sliding-window aggregation. VLDB, 8(7):702--713, 2015. Google ScholarDigital Library
- V. K. Wei. Generalized hamming weights for linear codes. IEEE TIT, 37(5):1412--1418, 1991. Google ScholarDigital Library
- J. Yang and J. Widom. Incremental computation and maintenance of temporal aggregates. VLDBJ, 12(3):262--283, 2003. Google ScholarDigital Library
Index Terms
PISA: An Index for Aggregating Big Time Series Data
Recommendations
Dual-PISA: An index for aggregation operations on time series data
AbstractAggregation operations play an essential role in time series database management. As the number of data increases, it is difficult for current solutions, such as summary table and MapReduce-based methods to respond to such queries with ...
Highlights- The paper proposes a new online index that supports fast aggregation queries.
- ...
PISA: Performance Models for Index Structures with and without Aggregated Data
SSDBM '99: Proceedings of the 11th International Conference on Scientific and Statistical Database ManagementDifferent models to estimate the performance of tree-based index structures exist. Materialized aggregates in the inner nodes of such index structures are used to speed up range queries on aggregates. This is achieved by avoiding traversing the index ...
Research of Temporal Information Index Strategy Based on HBase
In view of the demand of massive unstructured data storage and fast retrieval, in this study, it puts forward a distributed and unstructured database HBase under the Hadoop platform to store massive temporal data, and to build a temporal data storage ...
Comments