Demonstration of VerdictDB, the Platform-Independent AQP System

Authors:
Wen He, University of Michigan, Ann Arbor, MI, USA
Yongjoo Park, University of Michigan, Ann Arbor, MI, USA
Idris Hanafi, University of Michigan, Ann Arbor, MI, USA
Jacob Yatvitskiy, University of Michigan, Ann Arbor, MI, USA
Barzan Mozafari, University of Michigan, Ann Arbor, MI, USA

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data, May 2018, Pages 1665–1668
https://doi.org/10.1145/3183713.3193538

Published: 27 May 2018

ABSTRACT

We demonstrate VerdictDB, the first platform-independent approximate query processing (AQP) system. Unlike existing AQP systems that are tightly integrated into a specific database, VerdictDB operates at the driver level, acting as middleware between users and off-the-shelf database systems. In other words, VerdictDB requires no modifications to the database internals; it simply rewrites incoming queries such that the standard execution of the rewritten queries under relational semantics yields approximate answers to the original queries. VerdictDB exploits a novel technique for error estimation called variational subsampling, which is amenable to efficient computation via SQL. In this demonstration, we showcase VerdictDB's performance benefits (up to two orders of magnitude) compared to issuing the same queries directly to existing query engines. We also illustrate that the approximate answers returned by VerdictDB are nearly identical to the exact answers. We use Apache Spark SQL and Amazon Redshift as two examples of modern distributed query platforms. We allow the audience to explore VerdictDB using a web-based interface (e.g., Hue or Apache Zeppelin) to issue queries and visualize their answers. VerdictDB is open source and available under the Apache License, Version 2.0.
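As a rough illustration of the query-rewriting idea described in the abstract, the following Python sketch rewrites an AVG query so that it runs as ordinary SQL over a pre-built sample table, and derives an error estimate from per-subsample aggregates. It is not VerdictDB's actual rewriter or its exact variational subsampling procedure; it is a simplified subsampling-style estimate under stated assumptions. The tables `orders` and `orders_sample`, the column `sid`, the constants, and all helper names are hypothetical, and SQLite stands in for the backend database.

```python
import math
import random
import sqlite3
import statistics

SAMPLE_RATIO = 0.02   # assumed fraction of base rows kept in the sample table
NUM_SUBSAMPLES = 20   # assumed number of disjoint subsamples


def load_demo_data(conn, n_rows=50_000, seed=7):
    """Create a toy base table and a uniform sample tagged with subsample ids."""
    rng = random.Random(seed)
    conn.execute("CREATE TABLE orders (price REAL)")
    conn.execute("CREATE TABLE orders_sample (price REAL, sid INTEGER)")
    for _ in range(n_rows):
        price = rng.uniform(0.0, 100.0)
        conn.execute("INSERT INTO orders VALUES (?)", (price,))
        # Keep each row with probability SAMPLE_RATIO and assign it to
        # exactly one subsample, chosen uniformly at random.
        if rng.random() < SAMPLE_RATIO:
            sid = rng.randrange(NUM_SUBSAMPLES)
            conn.execute("INSERT INTO orders_sample VALUES (?, ?)", (price, sid))


def rewrite_avg(table, column):
    """Rewrite SELECT AVG(column) FROM table into plain SQL over the sample."""
    return (
        f"SELECT sid, AVG({column}) AS sub_avg, COUNT(*) AS sub_cnt "
        f"FROM {table}_sample GROUP BY sid"
    )


def approximate_avg(conn, table, column):
    """Run the rewritten query; return (estimate, estimated standard error)."""
    rows = conn.execute(rewrite_avg(table, column)).fetchall()
    total = sum(cnt for _, _, cnt in rows)
    # The overall estimate is the count-weighted mean of the subsample averages.
    estimate = sum(avg * cnt for _, avg, cnt in rows) / total
    # The spread of the subsample averages, scaled down because the full
    # sample is roughly len(rows) times larger than a single subsample.
    sub_avgs = [avg for _, avg, _ in rows]
    std_err = statistics.stdev(sub_avgs) / math.sqrt(len(sub_avgs))
    return estimate, std_err


conn = sqlite3.connect(":memory:")
load_demo_data(conn)
exact = conn.execute("SELECT AVG(price) FROM orders").fetchone()[0]
approx, std_err = approximate_avg(conn, "orders", "price")
print(f"exact={exact:.2f}  approx={approx:.2f} +/- {1.96 * std_err:.2f}")
```

The rewritten query uses only GROUP BY and standard aggregates, so in this sketch the backend needs no changes; the middleware layer only has to combine the per-subsample rows into an answer and an error bound.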

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Query optimization
      2. Online analytical processing engines

Published in
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN: 9781450347037
DOI: 10.1145/3183713
General Chairs: Gautam Das (University of Texas at Arlington, USA), Christopher Jermaine (Rice University, USA), Philip Bernstein (Microsoft Research, USA)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
approximate query processing
data analytics
Qualifiers
- research-article
Acceptance Rates
SIGMOD '18 paper acceptance rate: 90 of 461 submissions, 20%
Overall acceptance rate: 785 of 4,003 submissions, 20%