ABSTRACT
We demonstrate VerdictDB, the first platform-independent approximate query processing (AQP) system. Unlike existing AQP systems, which are tightly integrated into a specific database engine, VerdictDB operates at the driver level, acting as middleware between users and off-the-shelf database systems. VerdictDB therefore requires no changes to database internals: it rewrites incoming queries so that executing the rewritten queries under standard relational semantics yields approximate answers to the original queries. To estimate the error of these answers, VerdictDB uses a novel technique called variational subsampling, which can be computed efficiently in SQL. In this demonstration, we showcase VerdictDB's performance benefits (speedups of up to two orders of magnitude) over issuing the same queries directly to existing query engines. We also show that the approximate answers returned by VerdictDB are nearly identical to the exact answers. We use Apache Spark SQL and Amazon Redshift as two examples of modern distributed query platforms. The audience can explore VerdictDB through a web-based interface (e.g., Hue or Apache Zeppelin), issuing queries and visualizing their answers. VerdictDB is currently open source and available under the Apache License, Version 2.0.
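To make the rewriting idea concrete, the Python sketch below uses SQLite as a stand-in for an off-the-shelf engine such as Spark SQL or Redshift. It is a simplified illustration under our own assumptions, not VerdictDB's actual rewritten SQL, sample layout, or API: the table names `orders` and `orders_sample`, the column `price`, and the subsample id column `sid` (assigned here when the sample is built) are all hypothetical. The rewritten query computes the sample estimate g0 of `AVG(price)` and, following our simplified reading of variational subsampling, splits the sample into b disjoint subsamples of random sizes n_i, then estimates Var(g0) ≈ (1/(b·n)) · Σ n_i·(g_i − g0)², where g_i is the estimate on subsample i and n is the total sample size.

```python
import math
import random
import sqlite3

# Hypothetical rewrite of the user's query
#   SELECT AVG(price) FROM orders
# into a query over a pre-built sample table. Each sampled row carries a
# subsample id (sid); the CTEs compute the sample estimate g0 and a
# subsampling-based variance estimate for g0 using only plain SQL.
REWRITTEN_AVG = """
WITH per_sid AS (
    SELECT sid, COUNT(*) AS n_i, AVG(price) AS g_i
    FROM orders_sample
    GROUP BY sid
),
overall AS (
    SELECT SUM(n_i) AS n,
           SUM(n_i * g_i) * 1.0 / SUM(n_i) AS g0,
           COUNT(*) AS b
    FROM per_sid
)
SELECT o.g0,
       SUM(p.n_i * (p.g_i - o.g0) * (p.g_i - o.g0)) / (o.b * o.n) AS var_g0
FROM per_sid AS p CROSS JOIN overall AS o
GROUP BY o.g0, o.b, o.n
"""


def build_demo_db(n_rows=100_000, sample_ratio=0.01, b=100, seed=0):
    """Create a synthetic base table and a uniform sample with random sids."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (price REAL)")
    prices = [rng.uniform(0, 100) for _ in range(n_rows)]
    conn.executemany("INSERT INTO orders VALUES (?)", [(p,) for p in prices])
    conn.execute("CREATE TABLE orders_sample (price REAL, sid INTEGER)")
    # Bernoulli sample; each kept row gets a subsample id drawn from 1..b,
    # so the b subsamples are disjoint and have (slightly) varying sizes.
    sample = [(p, rng.randint(1, b)) for p in prices if rng.random() < sample_ratio]
    conn.executemany("INSERT INTO orders_sample VALUES (?, ?)", sample)
    return conn


def approximate_avg(conn):
    """Run the rewritten query; return (estimate, estimated standard error)."""
    g0, var_g0 = conn.execute(REWRITTEN_AVG).fetchone()
    return g0, math.sqrt(var_g0)


if __name__ == "__main__":
    conn = build_demo_db()
    estimate, stderr = approximate_avg(conn)
    exact = conn.execute("SELECT AVG(price) FROM orders").fetchone()[0]
    print(f"approx={estimate:.2f} +/- {1.96 * stderr:.2f} (95%), exact={exact:.2f}")
```

Running the script prints the approximate average with a 95% interval next to the exact answer computed from the full table. Because every step is ordinary SQL (GROUP BY plus aggregates), the engine itself needs no modification; SUM and COUNT would additionally need scaling by the inverse sampling ratio, which this sketch omits.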
Index Terms
- Demonstration of VerdictDB, the Platform-Independent AQP System