Understanding and Combating Memory Bloat in Managed Data-Intensive Systems

Authors:
Khanh Nguyen, University of California, Irvine, CA
Kai Wang, University of California, Irvine, CA
Yingyi Bu, University of California, Irvine, CA
Lu Fang, University of California, Irvine, CA
Guoqing Xu, University of California, Irvine, CA

ACM Transactions on Software Engineering and Methodology, Volume 26, Issue 4, Article No. 12, pp. 1–41. https://doi.org/10.1145/3162626

Published: 03 January 2018

Abstract

The past decade has witnessed increasing demands on data-driven business intelligence that led to the proliferation of data-intensive applications. A managed object-oriented programming language such as Java is often the developer’s choice for implementing such applications, due to its quick development cycle and rich suite of libraries and frameworks. While the use of such languages makes programming easier, their automated memory management comes at a cost. When the managed runtime meets large volumes of input data, memory bloat is significantly magnified and becomes a scalability-prohibiting bottleneck.

This article first studies, analytically and empirically, the impact of bloat on the performance and scalability of large-scale, real-world data-intensive systems. To combat bloat, we design a novel compiler framework, called Facade, that can generate highly efficient data manipulation code by automatically transforming the data path of an existing data-intensive application. The key treatment is that in the generated code, the number of runtime heap objects created for data classes in each thread is (almost) statically bounded, leading to significantly reduced memory management cost and improved scalability. We have implemented Facade and used it to transform seven common applications on three real-world, already well-optimized data processing frameworks: GraphChi, Hyracks, and GPS. Our experimental results are very positive: the generated programs (1) achieved a 3% to 48% reduction in execution time and up to an 88× reduction in GC time, (2) consumed up to 50% less memory, and (3) scaled to much larger datasets.
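
The abstract does not say how the generated code keeps the number of data objects bounded. As a rough illustration only, and not Facade's actual output, the sketch below packs records into a single off-heap buffer and reads them through one reusable view object, so the number of heap objects stays constant as the record count grows. The names `PointPage`, `PointFacade`, and `FacadeSketch` are hypothetical and chosen for this example.

```java
import java.nio.ByteBuffer;

/** Hypothetical page of (x, y) records packed into one off-heap buffer. */
final class PointPage {
    static final int RECORD_BYTES = 2 * Double.BYTES;
    private final ByteBuffer buffer;
    private final int capacity;
    private int count = 0;

    PointPage(int capacity) {
        this.capacity = capacity;
        this.buffer = ByteBuffer.allocateDirect(capacity * RECORD_BYTES);
    }

    /** Stores one record and returns its index; no per-record heap object is created. */
    int add(double x, double y) {
        if (count == capacity) throw new IllegalStateException("page full");
        int ref = count++;
        buffer.putDouble(ref * RECORD_BYTES, x);
        buffer.putDouble(ref * RECORD_BYTES + Double.BYTES, y);
        return ref;
    }

    int size() { return count; }
    double x(int ref) { return buffer.getDouble(ref * RECORD_BYTES); }
    double y(int ref) { return buffer.getDouble(ref * RECORD_BYTES + Double.BYTES); }
}

/** Hypothetical reusable view that is rebound to one record at a time. */
final class PointFacade {
    private PointPage page;
    private int ref;

    PointFacade bind(PointPage page, int ref) {
        this.page = page;
        this.ref = ref;
        return this;
    }

    double squaredNorm() {
        double x = page.x(ref), y = page.y(ref);
        return x * x + y * y;
    }
}

final class FacadeSketch {
    /** Visits every record through a single facade, so the object count does not grow with the data. */
    static double sumOfSquaredNorms(PointPage page) {
        PointFacade facade = new PointFacade();
        double total = 0.0;
        for (int ref = 0; ref < page.size(); ref++) {
            total += facade.bind(page, ref).squaredNorm();
        }
        return total;
    }

    public static void main(String[] args) {
        PointPage page = new PointPage(1_000);
        for (int i = 0; i < 1_000; i++) page.add(i, 1.0);
        System.out.println(sumOfSquaredNorms(page));
    }
}
```

In this sketch a thousand records cost one buffer and one facade rather than a thousand `Point` objects, which is the kind of reduction in garbage-collector work the abstract attributes to a bounded object count.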

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
      2. Source code generation

Published in
ACM Transactions on Software Engineering and Methodology Volume 26, Issue 4
October 2017
128 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3177744
Editor: David S. Rosenblum, National University of Singapore, Singapore
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Publication History
- Published: 3 January 2018
- Revised: 1 October 2017
- Accepted: 1 October 2017
- Received: 1 July 2016

Author Tags
Big data
managed languages
memory management
performance optimization
Qualifiers
- research-article
- Research
- Refereed