Understanding and Combating Memory Bloat in Managed Data-Intensive Systems

Published: 03 January 2018

Abstract

The past decade has witnessed increasing demands on data-driven business intelligence that led to the proliferation of data-intensive applications. A managed object-oriented programming language such as Java is often the developer’s choice for implementing such applications, due to its quick development cycle and rich suite of libraries and frameworks. While the use of such languages makes programming easier, their automated memory management comes at a cost. When the managed runtime meets large volumes of input data, memory bloat is significantly magnified and becomes a scalability-prohibiting bottleneck.

This article first studies, analytically and empirically, the impact of bloat on the performance and scalability of large-scale, real-world data-intensive systems. To combat bloat, we design a novel compiler framework, called Facade, that can generate highly efficient data manipulation code by automatically transforming the data path of an existing data-intensive application. The key treatment is that in the generated code, the number of runtime heap objects created for data classes in each thread is (almost) statically bounded, leading to significantly reduced memory management cost and improved scalability. We have implemented Facade and used it to transform seven common applications on three real-world, already well-optimized data processing frameworks: GraphChi, Hyracks, and GPS. Our experimental results are very positive: the generated programs (1) achieved a 3% to 48% reduction in execution time and up to an 88× reduction in GC time, (2) consumed up to 50% less memory, and (3) scaled to much larger datasets.
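To make the idea of a statically bounded object count concrete, the following is a minimal Java sketch of one general way to achieve it: record fields are stored in a flat buffer, and a single reusable accessor object is rebound to each record in turn. The names `PointPage`, `PointFacade`, and `FacadeSketch` are hypothetical and chosen only for illustration. The abstract does not describe the code Facade generates, so this should be read as an assumption-laden illustration of the principle, not as the framework's actual output.

```java
import java.nio.ByteBuffer;

/** Hypothetical example: records live in one flat buffer, not as individual heap objects. */
final class PointPage {
    static final int RECORD_BYTES = 2 * Long.BYTES; // two long fields per record: x and y
    private final ByteBuffer buf;
    private int count = 0;

    PointPage(int capacity) {
        buf = ByteBuffer.allocate(capacity * RECORD_BYTES);
    }

    /** Appends a record and returns its index, which serves as a reference to it. */
    int add(long x, long y) {
        int base = count * RECORD_BYTES;
        buf.putLong(base, x);
        buf.putLong(base + Long.BYTES, y);
        return count++;
    }

    long getX(int ref) { return buf.getLong(ref * RECORD_BYTES); }
    long getY(int ref) { return buf.getLong(ref * RECORD_BYTES + Long.BYTES); }
    int size() { return count; }
}

/** Reusable accessor: one instance can stand in for any record in the page. */
final class PointFacade {
    private final PointPage page;
    private int ref;

    PointFacade(PointPage page) { this.page = page; }

    PointFacade bind(int ref) { this.ref = ref; return this; }
    long x() { return page.getX(ref); }
    long y() { return page.getY(ref); }
}

final class FacadeSketch {
    /** Sums x + y over all records while allocating a single accessor object. */
    static long sumAll(PointPage page) {
        PointFacade f = new PointFacade(page); // the only accessor allocated in this loop
        long total = 0;
        for (int i = 0; i < page.size(); i++) {
            f.bind(i);
            total += f.x() + f.y();
        }
        return total;
    }

    public static void main(String[] args) {
        PointPage page = new PointPage(1_000);
        for (int i = 0; i < 1_000; i++) {
            page.add(i, 2L * i);
        }
        System.out.println(sumAll(page));
    }
}
```

In this sketch the number of heap objects is independent of the number of records: a thousand or a billion points still cost one buffer and one accessor per traversal, so the garbage collector has almost nothing to trace for the data path. The trade-off is that records are addressed by integer index rather than by object reference, which is the kind of rewriting a compiler would need to automate.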

      • Published in

        ACM Transactions on Software Engineering and Methodology  Volume 26, Issue 4
        October 2017
        128 pages
        ISSN:1049-331X
        EISSN:1557-7392
        DOI:10.1145/3177744

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 3 January 2018
        • Revised: 1 October 2017
        • Accepted: 1 October 2017
        • Received: 1 July 2016

        Qualifiers

        • research-article
        • Research
        • Refereed
