ABSTRACT
A fundamental problem in database systems is choosing the best physical design, i.e., a small set of auxiliary structures that enable the fastest execution of future queries. Almost all commercial databases come with designer tools that create a number of indices or materialized views (together comprising the physical design) that they exploit during query processing. Existing designers are what we call nominal; that is, they assume that their input parameters are precisely known and equal to some nominal values. For instance, since future workload is often not known a priori, it is common for these tools to optimize for past workloads in hopes that future queries and data will be similar. In practice, however, these parameters are often noisy or missing. Since nominal designers do not take the influence of such uncertainties into account, they find designs that are sub-optimal and remarkably brittle. Often, as soon as the future workload deviates from the past, their overall performance falls off a cliff, leading to customer discontent and expensive redesigns. Thus, we propose a new type of database designer that is robust against parameter uncertainties, so that overall performance degrades more gracefully when future workloads deviate from the past. Users express their risk tolerance by deciding on how much nominal optimality they are willing to trade for attaining their desired level of robustness against uncertain situations. To the best of our knowledge, this paper is the first to adopt the recent breakthroughs in the theory of robust optimization to build a practical framework for solving some of the most fundamental problems in databases, replacing today's brittle designs with a principled world of robust designs that can guarantee predictable and consistent performance.
- CliffGuard: A General Framework for Robust and Efficient Database Optimization. http://www.cliffguard.org.Google Scholar
- S. Acharya, P. B. Gibbons, and V. Poosala. Aqua: A fast decision support system using approximate query answers. In VLDB, 1999. Google ScholarDigital Library
- S. Acharya, P. B. Gibbons, and V. Poosala. Congressional samples for approximate answering of group-by queries. In SIGMOD, May 2000. Google ScholarDigital Library
- S. Agarwal, H. Milner, A. Kleiner, A. Talwalkar, M. Jordan, S. Madden, B. Mozafari, and I. Stoica. Knowing when you're wrong: Building fast and reliable approximate query processing systems. In SIGMOD, 2014. Google ScholarDigital Library
- S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. Blinkdb: queries with bounded errors and bounded response times on very large data. In EuroSys, 2013. Google ScholarDigital Library
- S. Agarwal, A. Panda, B. Mozafari, A. P. Iyer, S. Madden, and I. Stoica. Blink and it's done: Interactive queries on very large data. PVLDB, 2012. Google ScholarDigital Library
- S. Agrawal, S. Chaudhuri, and V. Narasayya. Materialized view and index selection tool for microsoft sql server 2000. 2001.Google Scholar
- I. Alagiannis, D. Dash, K. Schnaitter, A. Ailamaki, and N. Polyzotis. An automated, yet interactive and portable db designer. In SIGMOD, 2010. Google ScholarDigital Library
- I. Alagiannis, S. Idreos, and A. Ailamaki. H2o: a hands-free adaptive store. In SIGMOD, 2014. Google ScholarDigital Library
- M. Armbrust and et. al. Piql: Success-tolerant query processing in the cloud. PVLDB, 5, 2011. Google ScholarDigital Library
- B. Babcock and S. Chaudhuri. Towards a robust query optimizer: A principled and practical approach. In SIGMOD, 2005. Google ScholarDigital Library
- B. Babcock, S. Chaudhuri, and G. Das. Dynamic sample selection for approximate query processing. In VLDB, 2003.Google ScholarDigital Library
- A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. Robust optimization. Princeton University Press, 2009.Google ScholarCross Ref
- D. Bertsimas and D. B. Brown. Constrained stochastic lqc: a tractable approach. Automatic Control, IEEE Transactions on, 52(10), 2007.Google Scholar
- D. Bertsimas and et. al. Theory and applications of robust optimization. SIAM, 53, 2011. Google ScholarDigital Library
- D. Bertsimas, O. Nohadani, and K. M. Teo. Robust nonconvex optimization for simulation-based problems. Operations Research, 2007.Google Scholar
- D. Bertsimas and M. Sim. The price of robustness. Operations research, 52(1), 2004. Google ScholarDigital Library
- J. R. Birge and et. al. Improving thin-film manufacturing yield with robust optimization. Applied optics, 50, 2011.Google Scholar
- S. P. Boyd and L. Vandenberghe. Convex optimization. Cambridge university press, 2004. Google ScholarCross Ref
- D. P. Brown, J. Chaware, and M. Koppuravuri. Index selection in a database system, Mar. 3 2009. US Patent 7,499,907.Google Scholar
- N. Bruno, S. Chaudhuri, A. C. König, V. R. Narasayya, R. Ramamurthy, and M. Syamala. Autoadmin project at microsoft research: Lessons learned. IEEE Data Eng. Bull., 34(4), 2011.Google Scholar
- C. Chatfield. Time-series forecasting. CRC Press, 2002.Google Scholar
- S. Chaudhuri, G. Das, and V. Narasayya. Optimized stratified sampling for approximate query processing. TODS, 2007. Google ScholarDigital Library
- S. Chaudhuri, A. K. Gupta, and V. Narasayya. Compressing sql workloads. In SIGMOD, 2002. Google ScholarDigital Library
- S. Chaudhuri, H. Lee, and V. R. Narasayya. Variance aware optimization of parameterized queries. In SIGMOD, 2010. Google ScholarDigital Library
- S. Chaudhuri and V. Narasayya. Self-tuning database systems: A decade of progress. In VLDB, 2007. Google ScholarDigital Library
- S. Chaudhuri, V. Narasayya, M. Datar, et al. Linear programming approach to assigning benefit to database physical design structures, Nov. 21 2006. US Patent 7,139,778.Google Scholar
- A. N. K. Chen and et. al. Heuristics for selecting robust database structures with dynamic query patterns. EJOR, 168, 2006.Google Scholar
- X. Chen, M. Sim, and P. Sun. A robust optimization perspective on stochastic programming. Operations Research, 55, 2007. Google ScholarDigital Library
- F. Chu, J. Y. Halpern, and P. Seshadri. Least expected cost query optimization: An exercise in utility. In PODS, 1999. Google ScholarDigital Library
- B. e. Dageville. Automatic sql tuning in oracle 10g. In VLDB, 2004. Google ScholarDigital Library
- K. Deb. Geneas: A robust optimal design technique for mechanical component design. 1997.Google ScholarCross Ref
- X. Ding and J. Le. Adaptive projection in column-stores. In FSKD, 2011.Google ScholarCross Ref
- I. Doltsinis and et. al. Robust design of non-linear structures using optimization methods. Computer methods in applied mechanics and engineering, 194, 2005.Google Scholar
- D. Donjerkovic and R. Ramakrishnan. Probabilistic optimization of top n queries. In VLDB, 1999. Google ScholarDigital Library
- K. E. Gebaly and A. Aboulnaga. Robustness in automatic physical database design. In Advances in database technology, 2008. Google ScholarDigital Library
- G. Graefe and H. Kuno. Adaptive indexing for relational keys. In ICDEW, 2010.Google ScholarCross Ref
- G. Graefe and H. Kuno. Self-selecting, self-tuning, incrementally optimized indexes. In ICEDT, 2010. Google ScholarDigital Library
- F. Halim and et. al. Stochastic database cracking: Towards robust adaptive indexing in main-memory column-stores. PVLDB, 5, 2012. Google ScholarDigital Library
- M. Holze, A. Haschimi, and N. Ritter. Towards workload-aware self-management: Predicting significant workload shifts. In CDEW, 2010.Google ScholarCross Ref
- K.-L. Hsiung, S.-J. Kim, and S. Boyd. Power control in lognormal fading wireless channels with uptime probability specifications via robust geometric programming. In American Control Conference, 2005.Google Scholar
- R. Huang, R. Chirkova, and Y. Fathi. Two-stage stochastic view selection for data-analysis queries. In Advances in Databases and Information Systems, 2013.Google ScholarCross Ref
- S. Idreos, M. L. Kersten, and S. Manegold. Database cracking. In CIDR, 2007.Google Scholar
- S. Idreos, M. L. Kersten, and S. Manegold. Self-organizing tuple reconstruction in column-stores. In SIGMOD, 2009. Google ScholarDigital Library
- A. C. Konig and S. U. Nabar. Scalable exploration of physical database design. In ICDE, 2006. Google ScholarDigital Library
- M. Kormilitsin, R. Chirkova, Y. Fathi, and M. Stallmann. View and index selection for query-performance improvement: quality-centered algorithms and heuristics. In CIKM, 2008. Google ScholarDigital Library
- A. Lamb, M. Fuller, R. Varadarajan, N. Tran, B. Vandiver, L. Doshi, and C. Bear. The vertica analytic database: C-store 7 years later. PVLDB, 5(12), 2012. Google ScholarDigital Library
- K.-H. Lee and G.-J. Park. A global robust optimization using kriging based approximation model. JSME, 49, 2006.Google Scholar
- J. LeFevre, J. Sankaranarayanan, H. Hacigumus, J. Tatemura, N. Polyzotis, and M. J. Carey. Opportunistic physical design for big data analytics. In SIGMOD, 2014. Google ScholarDigital Library
- J. Li, A. C. König, V. Narasayya, and S. Chaudhuri. Robust estimation of resource consumption for sql queries using statistical techniques. PVLDB, 5(11), 2012. Google ScholarDigital Library
- W. Liang, H. Wang, and M. E. Orlowska. Materialized view selection under the maintenance time constraint. Data & Knowledge Engineering, 37(2), 2001. Google ScholarDigital Library
- S. S. Lightstone, G. Lohman, and D. Zilio. Toward autonomic computing with db2 universal database. SIGMOD Record, 31(3), 2002. Google ScholarDigital Library
- R. G. Lorenz and S. P. Boyd. Robust minimum variance beamforming. Signal Processing, IEEE Transactions on, 53(5), 2005. Google ScholarDigital Library
- C. Maier and et. al. Parinda: an interactive physical designer for postgresql. In ICEDT, 2010. Google ScholarDigital Library
- I. Mami and Z. Bellahsene. A survey of view selection methods. SIGMOD, 41(1), 2012. Google ScholarDigital Library
- V. Markl and et. al. Robust query processing through progressive optimization. In SIGMOD, 2004. Google ScholarDigital Library
- B. Mozafari, C. Curino, A. Jindal, and S. Madden. Performance and resource modeling in highly-concurrent OLTP workloads. In SIGMOD, 2013. Google ScholarDigital Library
- B. Mozafari, C. Curino, and S. Madden. Dbseer: Resource and performance prediction for building a next generation database cloud. In CIDR, 2013.Google Scholar
- B. Mozafari, E. Z. Y. Goh, and D. Y. Yoon. Cliffguard: An extended report. Technical report, University of Michigan, Ann Arbor, 2015.Google ScholarDigital Library
- G. Palermo, C. Silvano, and V. Zaccaria. Robust optimization of soc architectures: A multi-scenario approach. In Embedded Systems for Real-Time Multimedia, 2008.Google ScholarCross Ref
- S. Papadomanolakis and A. Ailamaki. An integer linear programming approach to database design. In ICDE Workshop, 2007. Google ScholarDigital Library
- D. Patil, S. Yun, S.-J. Kim, A. Cheung, M. Horowitz, and S. Boyd. A new method for design of robust digital circuits. In ISQED, 2005. Google ScholarDigital Library
- V. Raman and et. al. Constant-time query processing. In ICDE, 2008. Google ScholarDigital Library
- K. Schnaitter, S. Abiteboul, T. Milo, and N. Polyzotis. On-line index selection for shifting workloads. In ICDE Workshop, 2007. Google ScholarDigital Library
- A. Shukla, P. Deshpande, J. F. Naughton, et al. Materialized view selection for multidimensional datasets. In VLDB, volume 98, 1998. Google ScholarDigital Library
- Z. A. Talebi, R. Chirkova, and Y. Fathi. An integer programming approach for the view and index selection problem. Data & Knowledge Engineering, 83, 2013. Google ScholarDigital Library
- Q. T. Tran, I. Jimenez, R. Wang, N. Polyzotis, and A. Ailamaki. Rita: An index-tuning advisor for replicated databases. arXiv preprint arXiv:1304.1411, 2013.Google Scholar
- J. Tu, K. K. Choi, and Y. H. Park. A new study on reliability-based design optimization. Journal of Mechanical Design, 121(4), 1999.Google ScholarCross Ref
- G. Valentin and et. al. Db2 advisor: An optimizer smart enough to recommend its own indexes. In ICDE, 2000.Google ScholarCross Ref
- A. van der Vaart. Asymptotic statistics, volume 3. Cambridge university press, 2000.Google Scholar
- R. Varadarajan, V. Bharathan, A. Cary, J. Dave, and S. Bodagala. Dbdesigner: A customizable physical design tool for vertica analytic database. In ICDE, 2014.Google ScholarCross Ref
- J. X. Yu, X. Yao, C.-H. Choi, and G. Gou. Materialized view selection as constrained evolutionary optimization. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 33(4), 2003. Google ScholarDigital Library
- K. Zeng, S. Gao, J. Gu, B. Mozafari, and C. Zaniolo. Abs: a system for scalable approximate queries with accuracy guarantees. In SIGMOD, 2014. Google ScholarDigital Library
- K. Zeng, S. Gao, B. Mozafari, and C. Zaniolo. The analytical bootstrap: a new method for fast error estimation in approximate query processing. In SIGMOD, 2014. Google ScholarDigital Library
- Y. Zhang. General robust-optimization formulation for nonlinear programming. JOTA, 132, 2007.Google Scholar
- D. C. Zilio and et. al. Recommending materialized views and indexes with the ibm db2 design advisor. In ICAC, 2004. Google ScholarDigital Library
Index Terms
- CliffGuard: A Principled Framework for Finding Robust Database Designs
Recommendations
Performance Evaluation of NoSQL Multi-Model Data Stores in Polyglot Persistence Applications
IDEAS '16: Proceedings of the 20th International Database Engineering & Applications SymposiumNoSQL data store systems have recently been introduced as alternatives to traditional relational database management systems. These data stores systems implement simpler and scalable data models that increase the performance and efficiency of a new kind ...
Storing XML (with XSD) in SQL Databases: Interplay of Logical and Physical Designs
Much of business XML data has accompanying XSD specifications. In many scenarios, "shredding such XML data into a relational storage is a popular paradigm. Optimizing evaluation of XPath queries over such XML data requires paying careful attention to ...
A performance evaluation of in-memory databases
The popularity of NoSQL databases has increased due to the need of (1) processing vast amount of data faster than the relational database management systems by taking the advantage of highly scalable architecture, (2) flexible (schema-free) data ...
Comments