ABSTRACT
Modern data analytical tasks often witness very wide tables, from a few hundred columns to a few thousand. While it is commonly agreed that column stores are an appropriate data format for wide tables and analytical workloads, the physical order of columns has not been investigated. Column ordering plays a critical role in I/O performance, because in wide tables accessing the columns in a single horizontal partition may involve multiple disk seeks. An optimal column ordering will incur minimal cumulative disk seek costs for the set of queries applied to the data. In this paper, we aim to find such an optimal column layout to maximize I/O performance. Specifically, we study two problems for column stores on HDFS: column ordering and column duplication. Column ordering seeks an approximately optimal order of columns; column duplication complements column ordering in that some columns may be duplicated multiple times to reduce contention among the queries' diverse requirements on the column order. We consider an actual fine-grained cost model for column accesses and propose algorithms that take a query workload as input and output a column ordering strategy with or without storage redundancy that significantly improves the overall I/O performance. Experimental results over real-life data and production query workloads confirm the effectiveness of the proposed algorithms in diverse settings.
- http://drill.apache.org/.Google Scholar
- http://impala.io/.Google Scholar
- http://parquet.apache.org/documentation/latest/.Google Scholar
- https://berlinbuzzwords.de/sites/berlinbuzzwords.de/ files/media/documents/ted_dunning-what_and_why_and_how_apache_drill.pdf.Google Scholar
- https://cwiki.apache.org/confluence/display/hive/languagemanual+orc.Google Scholar
- https://cwiki.apache.org/confluence/display/hive/ parquet.Google Scholar
- https://en.wikipedia.org/wiki/hamiltonian_path_problem.Google Scholar
- http://spark.apache.org/sql/.Google Scholar
- https://prestodb.io/.Google Scholar
- http://www.odbms.org/2014/03/star-schema-benchmark/.Google Scholar
- http://www.seagate.com/staticfiles/docs/pdf/datasheet/disc/barracuda-ds1737--1--1111us.pdf.Google Scholar
- http://www.seagate.com/staticfiles/docs/pdf/datasheet/disc/savvio10k5-fips-data-sheet-ds1727-4-1201-us.pdf.Google Scholar
- http://www.seagate.com/www-content/ product-content/constellation-fam/constellation-es/constellation-es-3/en-us/docs/constellation-es-3-data-sheet-ds1769-1-1210us.pdf.Google Scholar
- http://www.tpc.org/tpch/.Google Scholar
- D. Abadi, P. A. Boncz, S. Harizopoulos, S. Idreos, and S. Madden. The design and implementation of modern column-oriented database systems. Foundations and Trends in Databases, 5(3):197--280, 2013. Google ScholarDigital Library
- D. J. Abadi, P. A. Boncz, and S. Harizopoulos. Column-oriented database systems. PVLDB, 2(2):1664--1665, 2009. Google ScholarDigital Library
- D. J. Abadi, S. Madden, and N. Hachem. Column-stores vs. row-stores: how different are they really? In SIGMOD, pages 967--980, 2008. Google ScholarDigital Library
- D. J. Abadi, D. S. Myers, D. J. DeWitt, and S. R. Madden. Materialization strategies in a column-oriented dbms. In ICDE, pages 466--475, 2007.Google ScholarCross Ref
- S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. Blinkdb: queries with bounded errors and bounded response times on very large data. In EuroSys, pages 29--42, 2013. Google ScholarDigital Library
- A. Ailamaki, D. J. DeWitt, M. D. Hill, and M. Skounakis. Weaving relations for cache performance. In VLDB, pages 169--180, 2001. Google ScholarDigital Library
- I. Alagiannis, S. Idreos, and A. Ailamaki. H2o: a hands-free adaptive store. In SIGMOD, pages 1103--1114, 2014. Google ScholarDigital Library
- R. H. Arpaci-Dusseau and A. C. Arpaci-Dusseau. Hard disk drives. In Operating Systems: Three Easy Pieces. Arpaci-Dusseau Books, 0.80 edition, May 2014.Google Scholar
- C. Curino, E. Jones, Y. Zhang, and S. Madden. Schism: a workload-driven approach to database replication and partitioning. PVLDB, 3(1--2):48--57, 2010. Google ScholarDigital Library
- A. Floratou, J. M. Patel, E. J. Shekita, and S. Tata. Column-oriented storage techniques for mapreduce. PVLDB, 4(7):419--429, 2011. Google ScholarDigital Library
- K. S. G., H. S. G., and M. B. Taghadosi. Importance of the initial conditions and the time schedule in the simulated annealing. In R. Chibante, editor, Simulated Annealing, Theory with Applications, chapter 12, pages 217--234. Sciyo, August 2010.Google Scholar
- M. Grund, J. Krüger, H. Plattner, A. Zeier, P. Cudre-Mauroux, and S. Madden. Hyrise: a main memory hybrid storage engine. PVLDB, 4(2):105--116, 2010. Google ScholarDigital Library
- S. Guo, J. Xiong, W. Wang, and R. Lee. Mastiff: A mapreduce-based system for time-based big data analytics. In CLUSTER, pages 72--80, 2012. Google ScholarDigital Library
- R. A. Hankins and J. M. Patel. Data morphing: an adaptive, cache-conscious storage technique. In VLDB, pages 417--428, 2003. Google ScholarDigital Library
- Y. He, R. Lee, Y. Huai, Z. Shao, N. Jain, X. Zhang, and Z. Xu. Rcfile: A fast and space-efficient data placement structure in mapreduce-based warehouse systems. In ICDE, pages 1199--1208, 2011. Google ScholarDigital Library
- Y. Huai, S. Ma, R. Lee, O. O'Malley, and X. Zhang. Understanding insights into the basic structure and essential issues of table placement methods in clusters. PVLDB, 6(14):1750--1761, 2013. Google ScholarDigital Library
- D. M. Jacobson and J. Wilkes. Disk scheduling algorithms based on rotational position. Citeseer, 1991.Google Scholar
- A. Jindal and J. Dittrich. Relax and let the database do the partitioning online. In BIRTE, pages 65--80, 2011.Google Scholar
- A. Jindal, E. Palatinus, V. Pavlov, and J. Dittrich. A comparison of knives for bread slicing. PVLDB, 6(6):361--372, 2013. Google ScholarDigital Library
- A. Jindal, J. Quiané-Ruiz, and J. Dittrich. Trojan data layouts: right shoes for a running elephant. In SOCC, page 21, 2011. Google ScholarDigital Library
- S. Kirkpatrick et al. Optimization by simmulated annealing. Science, 220(4598):671--680, 1983.Google ScholarCross Ref
- Y. Li and J. M. Patel. Widetable: An accelerator for analytical data processing. PVLDB, 7(10):907--918, 2014. Google ScholarDigital Library
- T. W. Manikas and J. T. Cain. Genetic algorithms vs. simulated annealing: A comparison of approaches for solving the circuit partitioning problem. 1996.Google Scholar
- S. Navathe, S. Ceri, G. Wiederhold, and J. Dou. Vertical partitioning algorithms for database design. TODS, 9(4):680--710, 1984. Google ScholarDigital Library
- Y. Nourani and B. Andresen. A comparison of simulated annealing cooling strategies. Journal of Physics A: Mathematical and General, 31(41):8373, 1998.Google ScholarCross Ref
- C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD, pages 1099--1110, 2008. Google ScholarDigital Library
- S. Papadomanolakis and A. Ailamaki. Autopart: Automating schema design for large scientific databases using data partitioning. In SSDBM, pages 383--392, 2004. Google ScholarDigital Library
- C. Ruemmler and J. Wilkes. An introduction to disk drive modeling. Computer, 27(3):17--28, 1994. Google ScholarDigital Library
- J. Shafer, S. Rixner, and A. Cox. The hadoop distributed filesystem: Balancing portability and performance. In ISPASS, pages 122--133, 2010.Google ScholarCross Ref
- D.'Śl'zak, J. Wróblewski, V. Eastwood, and P. Synak. Brighthouse: an analytic data warehouse for ad-hoc queries. PVLDB, 1(2):1337--1345, 2008. Google ScholarDigital Library
- M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, et al. C-store: a column-oriented dbms. In VLDB, pages 553--564, 2005. Google ScholarDigital Library
- L. Sun, M. J. Franklin, S. Krishnan, and R. S. Xin. Fine-grained partitioning for aggressive data skipping. In SIGMOD, pages 1115--1126, 2014. Google ScholarDigital Library
- H. Szu and R. Hartley. Fast simulated annealing. Physics letters A, 122(3):157--162, 1987.Google Scholar
- A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Anthony, H. Liu, and R. Murthy. Hive - a petabyte scale data warehouse using hadoop. In ICDE, pages 996--1005, 2010.Google ScholarCross Ref
- R. Van Meter. Observing the effects of multi-zone disks. In The Usenix Technical Conference, pages 19--30, 1997. Google ScholarDigital Library
- T. White. The hadoop distributed file system. In Hadoop: The Definitive Guide. O'Reilly Media, Inc., 4 edition, March 2015. Google ScholarDigital Library
- D. Whitley. A genetic algorithm tutorial. Statistics and computing, 4(2):65--85, 1994.Google ScholarCross Ref
- Y. Yan, L. J. Chen, and Z. Zhang. Error-bounded sampling for analytics on big sparse data. PVLDB, 7(13):1508--1519, 2014. Google ScholarDigital Library
- M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. In HotCloud, page 95, 2010. Google ScholarDigital Library
- J. Zhou, N. Bruno, and W. Lin. Advanced partitioning techniques for massively distributed computation. In SIGMOD, pages 13--24, 2012. Google ScholarDigital Library
Index Terms
- Wide Table Layout Optimization based on Column Ordering and Duplication
Recommendations
A column pre-ordering strategy for the unsymmetric-pattern multifrontal method
A new method for sparse LU factorization is presented that combines a column pre-ordering strategy with a right-looking unsymmetric-pattern multifrontal numerical factorization. The column ordering is selected to give a good a priori upper bound on fill-...
A column approximate minimum degree ordering algorithm
Sparse Gaussian elimination with partial pivoting computes the factorization PAQ = LU of a sparse matrix A, where the row ordering P is selected during factorization using standard partial pivoting with row interchanges. The goal is to select a column ...
Finding Optimal Ordering of Sparse Matrices for Column-Oriented Parallel Cholesky Factorization
In this paper, we consider the problem of finding fill-preserving sparse matrix orderings for parallel factorization. That is, given a large sparse symmetric and positive definite matrix A that has been ordered by some fill-reducing ordering, we want to ...
Comments