ABSTRACT
The rapid growth in data volumes and in the complexity of applied analytical tasks poses a major challenge for visualization solutions. The experience must remain highly interactive so that users stay engaged and can explore data insightfully. Query processing usually dominates the cost of generating a visualization. Therefore, to achieve acceptable response times, one needs to utilize backend capabilities to the fullest and apply techniques such as caching and prefetching. In this paper we discuss the key data processing components in Tableau: the query processor, query caches, the Tableau Data Engine [1, 2], and Data Server. Furthermore, we cover recent performance improvements related to the number and quality of remote queries, broader reuse of cached data, and the application of inter- and intra-query parallelism.
- Richard Wesley, Matthew Eldridge, and Pawel T. Terlecki. 2011. An analytic data engine for visualization in tableau. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data (SIGMOD '11). ACM, New York, NY, USA, 1185--1194. DOI=http://doi.acm.org/10.1145/1989323.1989449
- Richard Michael Grantham Wesley and Pawel Terlecki. 2014. Leveraging compression in the tableau data engine. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data (SIGMOD '14). ACM, New York, NY, USA, 563--573. DOI=http://doi.acm.org/10.1145/2588555.2595639
- Boncz, P., Zukowski, M., and Nes, N. MonetDB/X100: Hyper-Pipelining Query Execution. In International Conference on Innovative Data Systems Research (CIDR), Jan. 2005, 225--237.
- G. Graefe, "Volcano: An extensible and parallel query evaluation system," IEEE Transactions on Knowledge and Data Engineering, 120--135, 1994.
- J. Zhou, P. Larson, and R. Chaiken. Incorporating partitioning and parallel plans into the SCOPE optimizer. In ICDE, 2010.
- LibXL. http://www.libxl.com/
- Abadi, D. J., Madden, S. R., and Hachem, N. 2008. Column-stores vs. row-stores: how different are they really? In Proceedings of the 2008 ACM SIGMOD international Conference on Management of Data (Vancouver, Canada, June 09 - 12, 2008). SIGMOD '08. ACM, New York, NY, 967--980.
- Boncz, P. Monet: A Next-Generation DBMS Kernel For Query-Intensive Applications. Doctoral Thesis, Universiteit van Amsterdam, Amsterdam, The Netherlands, May 2002.
- Zukowski, Marcin, and Peter A. Boncz. "Vectorwise: Beyond column stores." IEEE Data Engineering Bulletin 35.1 (2012): 21--27.
- Shivnath Babu and Herodotos Herodotou (2013), "Massively Parallel Databases and MapReduce Systems", Foundations and Trends® in Databases: Vol. 5: No. 1, pp 1--104. http://dx.doi.org/10.1561/1900000036
- Franz Färber, Norman May, Wolfgang Lehner, Philipp Große, Ingo Müller, Hannes Rauhe, and Jonathan Dees. The SAP HANA Database -- An Architecture Overview. IEEE Data Engineering Bulletin, 35(1):28--33, 2012.
- Anikiej K. Multi-core Parallelization of Vectorized Queries {dissertation}. University of Warsaw and VU University of Amsterdam, 2010.
- P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In Proceedings of SIGMOD Conference, 1979.
- M. Majster-Cederbaum. Elimination of redundant operations in relational queries with general selection operators. Computing, 34(4):303--323, 1984.
- A. V. Aho, C. Beeri, and J. D. Ullman. The theory of joins in relational databases. ACM Trans. on Database Systems, 4(3):297--314, 1979.
- A. V. Aho, Y. Sagiv, and J. D. Ullman. Efficient optimization of a class of relational expression. ACM Trans. on Database Systems, 4(4):435--454, 1979.
- Nikolaus Ott, Klaus Horländer, Removing redundant join operations in queries involving views, Information Systems, Volume 10, Issue 3, 1985, Pages 279--288.
- Y. Sagiv and M. Yannakakis. Equivalences among relational expressions with the union and difference operator. Journal of the ACM, 27(4):633--655, 1980.
- Halevy, Alon Y. "Answering queries using views: A survey." The VLDB Journal 10.4 (2001): 270--294.
- Sara Cohen, Werner Nutt, and Yehoshua Sagiv. 2003. Containment of Aggregate Queries. In Proceedings of the 9th International Conference on Database Theory (ICDT '03), Diego Calvanese, Maurizio Lenzerini, and Rajeev Motwani (Eds.). Springer-Verlag, London, UK, 111--125.
- Chandra A.K., Merlin P.M. Optimal implementation of conjunctive queries in relational databases. In: Proc. Ninth Annual ACM Symposium on Theory of Computing. pp 77--90, 1977
- Zhang X., Ozsoyoglu M.Z. On efficient reasoning with implication constraints. In: Proc. of DOOD. pp 236--252, 1993
- Chaudhuri S., Vardi M. Optimizing real conjunctive queries. In: Proc. of PODS. pp 59--70, Washington D.C., USA, 1993
- Chaudhuri S., Vardi M. On the complexity of equivalence between recursive and nonrecursive datalog programs. In: Proc. of PODS. pp 55--66, Minneapolis, Minn., USA, 1994
- Kolaitis P., Martin D., Thakur M. On the complexity of the containment problem for conjunctive queries with built-in predicates. In: Proc. of PODS. pp 197--204, Seattle, Wash., USA, 1998
- Tsatalos O.G., Solomon M.H., Ioannidis Y.E. The GMAP: a versatile tool for physical data independence. In: Proc. of VLDB. pp 367--378, Santiago, Chile, 1994
- Tsatalos O.G., Solomon M.H., Ioannidis Y.E. The GMAP: a versatile tool for physical data independence. VLDB J. (2):101--118, 1996
- Chaudhuri, S., Krishnamurthy, R., Potamianos, S., & Shim, K. (1995, March). Optimizing queries with materialized views. In 2013 IEEE 29th International Conference on Data Engineering (ICDE) (pp. 190--190). IEEE Computer Society.
- Goldstein J., Larson P.A. Optimizing queries using materialized views: a practical, scalable solution. In: Proc. of SIGMOD. pp 331--342, 2001
- Jarke M. Common subexpression isolation in multiple query optimization. Query Processing in Database Systems, Kim W, Reiner DS, Batory DS (eds.). Springer: Berlin, 1985.
- Park J, Segev A. Using common subexpressions to optimize multiple queries. Proceedings of the 4th International Conference on Data Engineering. IEEE Computer Society: Washington, DC, 1988; 311--319.
- Sellis T. Multiple query optimization. ACM Transactions on Database Systems 1988; 13(1):23--52.
- Cosar A, Lim E, Srivastava J. Multiple query optimization with depth-first branch-and-bound and dynamic query ordering. CIKM 93, Proceedings of the Second International Conference on Information and Knowledge Management. ACM, 1993; 433--438.
- Chen F, Dunham M. Common subexpression processing in multiple-query processing. IEEE Transactions on Knowledge and Data Engineering 1988; 10(3):493--499.
- Roy P et al. Efficient and extensible algorithms for multi query optimization. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. ACM Press: New York, 2000; 249--260.
- Tan K, Lu H. Workload scheduling for multiple query processing. Information Processing Letters 1995; 55(5):251--257.
- Tan K, Lu H. Scheduling multiple queries in symmetric multiprocessors. Information Sciences 1996; 95(1/2):125--153.
- Dalvi N et al. Pipelining in multi-query optimization. J. Comput. Syst. Sci. 2003; 66(4):728--762.
- O'Gorman, Kevin, Amr El Abbadi, and Divyakant Agrawal. "Multiple query optimization in middleware using query teamwork." Software: Practice and Experience 35.4 (2005): 361--391.
- Stolte, C., Tang, D., and Hanrahan, P. 2008. Polaris: a system for query, analysis, and visualization of multidimensional databases. Commun. ACM 51, 11 (Nov. 2008), 75--84.
- http://redis.io/
- Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured storage system." ACM SIGOPS Operating Systems Review 44.2 (2010): 35--40.
- https://www.faa.gov/data_research/
- Milena G. Ivanova, Martin L. Kersten, Niels J. Nes, and Romulo A.P. Gonçalves. 2009. An architecture for recycling intermediates in a column-store. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data (SIGMOD '09), Carsten Binnig and Benoit Dageville (Eds.). ACM, New York, NY, USA, 309--320.
- Parag Agrawal, Daniel Kifer, Christopher Olston, Scheduling shared scans of large data files, Proceedings of the VLDB Endowment, v.1 n.1, August 2008.
- Prasanth Jayachandran, Karthik Tunga, Niranjan Kamat, Arnab Nandi. Combining User Interaction, Speculative Query Execution and Sampling in the DICE System. PVLDB 7(13): 1697--1700 (2014).
- Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte. 2012. Dynamic workload driven data integration in tableau. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD '12). ACM, New York, NY, USA, 807--816. http://doi.acm.org/10.1145/2213836.2213961
- Shaul Dar, Michael J. Franklin, Björn Þór Jónsson, Divesh Srivastava, Michael Tan. Semantic Data Caching and Replacement, Proceedings of the 22th International Conference on Very Large Data Bases, p.330--341, September 03-06, 1996.
Index Terms
- On Improving User Response Times in Tableau