Abstract
In the last decade, improvements in the single-core performance of CPUs have stagnated. Consequently, methods for developing and optimizing software for these platforms have to be reconsidered. Software must be optimized so that the available single-core performance is exploited more effectively. This can be achieved by reducing the number of instructions that need to be executed. In this article, we show that layered database applications execute many redundant, nonessential instructions that can be eliminated without affecting the course of execution or the output of the application. This elimination is performed using a vertical integration process that breaks down the different layers of layered database applications. By doing so, applications are reduced to their essence, and as a consequence, transformations that were not possible before can be carried out across both the application code and the data access code. We show that this vertical integration process can be fully automated and, as such, integrated into an operational workflow. Experimental evaluation of this approach shows that up to 95% of the instructions can be eliminated. The reduction in instructions leads to a more efficient use of the available hardware resources. This results in greatly improved application performance and a significant reduction in energy consumption.
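To make the idea of collapsing layers concrete, the following is a minimal sketch in Python. It is a toy illustration under stated assumptions, not the automated process described in the article: the table `ORDERS`, the three layer functions, and the hand-written integrated version are all hypothetical. The layered path copies and re-packages every row at each layer boundary, while the integrated path computes the same result in a single loop over the data.

```python
# Toy illustration only: all names here are hypothetical and the
# "integration" is done by hand, not by the article's automated process.

ORDERS = [
    {"id": 1, "customer": "ann", "amount": 30},
    {"id": 2, "customer": "bob", "amount": 20},
    {"id": 3, "customer": "ann", "amount": 50},
]


def data_layer_select(table, column, value):
    """Data access layer: generic filter that copies every matching row."""
    return [dict(row) for row in table if row[column] == value]


def api_layer_fetch(customer):
    """API layer: re-packages the rows as tuples for the application."""
    rows = data_layer_select(ORDERS, "customer", customer)
    return [(row["id"], row["customer"], row["amount"]) for row in rows]


def total_layered(customer):
    """Application layer: unpacks the tuples again just to sum one field."""
    total = 0
    for _id, _customer, amount in api_layer_fetch(customer):
        total += amount
    return total


def total_integrated(customer):
    """Same computation after collapsing the layers into a single loop."""
    total = 0
    for row in ORDERS:
        if row["customer"] == customer:
            total += row["amount"]
    return total
```

Both functions return the same total for any customer, but the integrated version performs no intermediate copying or re-packaging, which is the kind of nonessential work the abstract refers to.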
Reducing Layered Database Applications to their Essence through Vertical Integration