research-article
DOI: 10.1145/3183713.3193538

Demonstration of VerdictDB, the Platform-Independent AQP System

Published: 27 May 2018

ABSTRACT

We demonstrate VerdictDB, the first platform-independent approximate query processing (AQP) system. Unlike existing AQP systems, which are tightly integrated into a specific database, VerdictDB operates at the driver level, acting as middleware between users and off-the-shelf database systems. In other words, VerdictDB requires no modifications to the database internals; it relies solely on rewriting incoming queries so that the standard execution of the rewritten queries under relational semantics yields approximate answers to the original queries. VerdictDB exploits a novel error-estimation technique called variational subsampling, which is amenable to efficient computation via SQL. In this demonstration, we showcase VerdictDB's performance benefits (up to two orders of magnitude) over queries issued directly to existing query engines. We also illustrate that the approximate answers returned by VerdictDB are nearly identical to the exact answers. We use Apache Spark SQL and Amazon Redshift as two examples of modern distributed query platforms. The audience can explore VerdictDB through a web-based interface (e.g., Hue or Apache Zeppelin) to issue queries and visualize their answers. VerdictDB is open source and available under the Apache License, Version 2.0.
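
To make the middleware idea in the abstract more concrete, the Python sketch below answers an average from a pre-built sample table using only standard SQL, and derives an error estimate from per-subsample aggregates computed in a single GROUP BY. It is an illustration under stated assumptions, not VerdictDB's actual rewriting rules or API: the tables `orders` and `orders_sample`, the column `sid`, and both function names are hypothetical, SQLite stands in for the backend database, and the error formula is a simplified subsampling-style estimate written in the spirit of variational subsampling rather than the paper's exact estimator.

```python
import math
import random
import sqlite3


def build_demo_db(n_rows=20000, sample_ratio=0.1, n_subsamples=20, seed=7):
    """Create a base table and a uniform sample tagged with subsample ids.

    All table and column names here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (price REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?)",
        [(rng.uniform(1.0, 100.0),) for _ in range(n_rows)],
    )
    # Each sampled row receives a random subsample id (sid), so subsample
    # sizes are allowed to differ from one another.
    conn.execute("CREATE TABLE orders_sample (price REAL, sid INTEGER)")
    rows = conn.execute("SELECT price FROM orders").fetchall()
    sample = [
        (price, rng.randrange(n_subsamples))
        for (price,) in rows
        if rng.random() < sample_ratio
    ]
    conn.executemany("INSERT INTO orders_sample VALUES (?, ?)", sample)
    return conn


def approximate_avg(conn):
    """Approximate AVG(price) over orders by querying only orders_sample."""
    # Stand-in for a rewritten query: one pass over the sample yields the
    # average and the row count of every subsample.
    per_subsample = conn.execute(
        "SELECT AVG(price), COUNT(*) FROM orders_sample GROUP BY sid"
    ).fetchall()
    n = sum(count for _, count in per_subsample)
    # The size-weighted mean of subsample averages equals the sample average.
    estimate = sum(avg * count for avg, count in per_subsample) / n
    # Scale each subsample's deviation by sqrt(n_i / n) so that deviations
    # from subsamples of different sizes are comparable, then combine them.
    scaled = [
        math.sqrt(count / n) * (avg - estimate) for avg, count in per_subsample
    ]
    std_err = math.sqrt(sum(d * d for d in scaled) / len(scaled))
    return estimate, std_err


if __name__ == "__main__":
    conn = build_demo_db()
    exact = conn.execute("SELECT AVG(price) FROM orders").fetchone()[0]
    estimate, std_err = approximate_avg(conn)
    print(f"exact={exact:.3f} approx={estimate:.3f} std_err={std_err:.3f}")
```

Because the sample and its subsample ids are stored as an ordinary table, both the estimate and its error come from one aggregate query that any SQL engine can run, which is the property that lets this style of approach stay outside the database internals.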

Published in

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN: 9781450347037
DOI: 10.1145/3183713

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SIGMOD '18 paper acceptance rate: 90 of 461 submissions, 20%
Overall acceptance rate: 785 of 4,003 submissions, 20%