ABSTRACT
Processor design has turned toward parallelism and heterogeneous cores to achieve performance and energy efficiency. Developers find high-level languages attractive as they use abstraction to offer productivity and portability over these hardware complexities. Over the past few decades, researchers have developed increasingly advanced mechanisms to deliver performance despite the overheads naturally imposed by this abstraction. Recent work has demonstrated that such mechanisms can be exploited to attack overheads that arise in emerging high-level languages, which provide strong abstractions over parallelism. However, current implementation of existing popular high-level languages, such as Java, offer little by way of abstractions that allow the developer to achieve performance in the face of extensive hardware parallelism.
In this paper, we present a small set of extensions to the Java programming language that aims to achieve both high performance and high productivity with minimal programmer effort. We incorporate ideas from languages like X10 and AJ to develop five annotations in Java for achieving asynchronous task parallelism and data-centric concurrency control. These annotations allow the use of a highly efficient implementation of a work-stealing scheduler for task parallelism. We evaluate our proposal by refactoring classes from a number of existing multithreaded open source projects to use our new annotations. Our results suggest that these annotations significantly reduce the programming effort while achieving performance improvements up to 30% compared to conventional approaches.
- AJWS. https://github.com/vivkumar/ajws.Google Scholar
- jMetal. http://jmetal.sourceforge.net/.Google Scholar
- JTransforms. https://sites.google.com/site/piotrwendykier/software/jtransforms.Google Scholar
- Simple-Java-XML-Parser. https://github.com/thebuzzmedia/simple-java-xml-parser.Google Scholar
- TryCatchWS. https://github.com/vivkumar/TryCatchWS.Google Scholar
- Tiobe index for ranking the popularity of programming languages. http://www.tiobe.com/tiobe_index, June 2016.Google Scholar
- G. Agha. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge, MA, USA, 1986. Google ScholarCross Ref
- B. Alpern, S. Augart, S. M. Blackburn, M. Butrico, A. Cocchi, P. Cheng, J. Dolby, S. J. Fink, D. Grove, M. Hind, K. S. McKinley, M. Mergen, J. E. B. Moss, T. Ngo, V. Sarkar, and M. Trapp. The Jikes RVM Project: Building an open source research community. IBM System Journal, 44(2):399--418, 2005. Google ScholarDigital Library
- R. Blumofe and C. Leiserson. Scheduling multithreaded computations by work stealing. Journal of the ACM (JACM), 46(5):720--748, 1999. Google ScholarDigital Library
- S. Borkar and A. A. Chien. The future of microprocessors. Commun. ACM, 54(5):67--77, May 2011. Google ScholarDigital Library
- V. Cavé, J. Zhao, J. Shirako, and V. Sarkar. Habanero-java: The new adventures of old x10. In Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, PPPJ '11, pages 51--61, New York, NY, USA, 2011. ACM. Google ScholarDigital Library
- B. Chamberlain, D. Callahan, and H. Zima. Parallel programmability and the Chapel language. International Journal of High Performance Computing Applications, 21(3):291, 2007. Google ScholarDigital Library
- P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. X10: an object-oriented approach to non-uniform cluster computing. In Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, OOPSLA '05, pages 519--538, New York, NY, USA, 2005. ACM. Google ScholarDigital Library
- S. Cherem, T. Chilimbi, and S. Gulwani. Inferring locks for atomic sections. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '08, pages 304--315, New York, NY, USA, 2008. ACM. Google ScholarDigital Library
- A. A. Chien. ICC++-A C++ dialect for high performance parallel computing. volume 4, pages 19--23, New York, NY, USA, Apr. 1996. ACM. Google ScholarDigital Library
- B. Demsky and P. Lam. Views: Synthesizing fine-grained concurrency control. ACM Trans. Softw. Eng. Methodol., 22(1):4:1--4:33, Mar. 2013. Google ScholarDigital Library
- J. Dolby, C. Hammer, D. Marino, F. Tip, M. Vaziri, and J. Vitek. A data-centric approach to synchronization. ACM Trans. Program. Lang. Syst., 34(1):4:1--4:48, May 2012. Google ScholarDigital Library
- T. Ekman and G. Hedin. The JastAdd extensible Java compiler. In Proceedings of the 22nd Annual ACM SIGPLAN Conference on Object-oriented Programming Systems and Applications, OOPSLA '07, pages 1--18, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
- S. J. Fink and F. Qian. Design, implementation and evaluation of adaptive recompilation with on-stack replacement. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, CGO '03, pages 241--252, Washington, DC, USA, 2003. IEEE Computer Society. Google ScholarDigital Library
- M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, PLDI '98, pages 212--223, New York, NY, USA, 1998. ACM. Google ScholarDigital Library
- J. B. Goodenough. Exception handling: Issues and a proposed notation. Commun. ACM, 18(12):683--696, Dec. 1975. Google ScholarDigital Library
- M. Grossman, S. Imam, and V. Sarkar. HJ-OpenCL: Reducing the gap between the JVM and accelerators. In Proceedings of the Principles and Practices of Programming on The Java Platform, PPPJ '15, pages 2--15. ACM, 2015. Google ScholarDigital Library
- C. Hammer, J. Dolby, M. Vaziri, and F. Tip. Dynamic detection of atomic-set-serializability violations. In Proceedings of the 30th International Conference on Software Engineering, ICSE '08, pages 231--240, New York, NY, USA, 2008. ACM. Google ScholarDigital Library
- U. Hölzle, C. Chambers, and D. Ungar. Debugging optimized code with dynamic deoptimization. In Proceedings of the ACM SIGPLAN 1992 Conference on Programming Language Design and Implementation, PLDI '92, pages 32--43, New York, NY, USA, 1992. ACM. Google ScholarDigital Library
- S. Imam and V. Sarkar. Habanero-java library: A java 8 framework for multicore programming. In Proceedings of the 2014 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools, PPPJ '14, pages 75--86, New York, NY, USA, 2014. ACM. Google ScholarCross Ref
- S. Imam, J. Zhao, and V. Sarkar. Euro-par 2015: Parallel processing: 21st international conference on parallel and distributed computing, vienna, austria, august 24-28, 2015, proceedings. chapter A Composable Deadlock-Free Approach to Object-Based Isolation, pages 426--437. Springer Berlin Heidelberg, Berlin, Heidelberg, 2015.Google Scholar
- N. Kidd, T. Reps, J. Dolby, and M. Vaziri. Finding concurrency-related bugs using random isolation. In Proceedings of the 10th International Conference on Verification, Model Checking, and Abstract Interpretation, VMCAI '09, pages 198--213, Berlin, Heidelberg, 2009. Springer-Verlag. Google ScholarDigital Library
- V. Kumar, S. M. Blackburn, and D. Grove. Friendly barriers: Efficient work-stealing with return barriers. In Proceedings of the 10th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, VEE '14, pages 165--176, New York, NY, USA, 2014. ACM. Google ScholarDigital Library
- V. Kumar, D. Frampton, S. M. Blackburn, D. Grove, and O. Tardieu. Work-stealing without the baggage. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA '12, pages 297--314, New York, NY, USA, 2012. ACM. Google ScholarDigital Library
- V. Kumar, A. Sbîrlea, A. Jayaraj, Z. Budimlić, D. Majeti, and V. Sarkar. Heterogeneous work-stealing across CPU and DSP cores. In High Performance Extreme Computing Conference (HPEC), 2015 IEEE, pages 1--6, Sept 2015.Google ScholarCross Ref
- J. Larus and C. Kozyrakis. Transactional memory. Commun. ACM, 51(7):80--88, July 2008. Google ScholarDigital Library
- Y. Lin, K. Wang, S. M. Blackburn, A. L. Hosking, and M. Norrish. Stop and go: Understanding yieldpoint behavior. In Proceedings of the 2015 International Symposium on Memory Management, ISMM '15, pages 70--80, New York, NY, USA, 2015. ACM. Google ScholarDigital Library
- D. Marino, C. Hammer, J. Dolby, M. Vaziri, F. Tip, and J. Vitek. Detecting deadlock in programs with data-centric synchronization. In Proceedings of the 2013 International Conference on Software Engineering, ICSE '13, pages 322--331, Piscataway, NJ, USA, 2013. IEEE Press. Google ScholarDigital Library
- M. Petito. Eclipse refactoring. http://people.clarkson.edu/~dhou/courses/EE564-s07/Eclipse-Refactoring.pdf, 5:2010, 2007.Google Scholar
- V. Saraswat, B. Bloom, I. Peshansky, O. Tardieu, and D. Grove. X10 language specification, 2011.Google Scholar
- V. Sundaresan, D. Maier, P. Ramarao, and M. Stoodley. Experiences with multi-threading and dynamic class loading in a java just-in-time compiler. In Proceedings of the International Symposium on Code Generation and Optimization, CGO '06, pages 87--97, Washington, DC, USA, 2006. IEEE Computer Society. Google ScholarDigital Library
- R. Vallée-Rai, P. Co, E. Gagnon, L. Hendren, P. Lam, and V. Sundaresan. Soot - a Java bytecode optimization framework. In Proceedings of the 1999 Conference of the Centre for Advanced Studies on Collaborative Research, CASCON '99, pages 13--. IBM Press, 1999. Google ScholarDigital Library
- M. Vaziri, F. Tip, and J. Dolby. Associating synchronization constraints with data in an object-oriented language. In Conference Record of the 33rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '06, pages 334--345, New York, NY, USA, 2006. ACM. Google ScholarDigital Library
- M. Vaziri, F. Tip, J. Dolby, C. Hammer, and J. Vitek. A type system for data-centric synchronization. In Proceedings of the 24th European Conference on Object-oriented Programming, ECOOP'10, pages 304--328, Berlin, Heidelberg, 2010. Springer-Verlag. Google ScholarDigital Library
- D. A. Wheeler. SLOCCount. http://www.dwheeler.com/sloccount/, 2001.Google Scholar
- G. V. Wilson. Parallel Programming Using C++. MIT Press, 1996.Google ScholarDigital Library
- A. Yonezawa, editor. ABCL: An Object-oriented Concurrent System. MIT Press, 1990. Google ScholarDigital Library
- Integrating Asynchronous Task Parallelism and Data-centric Atomicity
Recommendations
A work-stealing scheduler for X10's task parallelism with suspension
PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel ProgrammingThe X10 programming language is intended to ease the programming of scalable concurrent and distributed applications. X10 augments a familiar imperative object-oriented programming model with constructs to support light-weight asynchronous tasks as well ...
A work-stealing scheduler for X10's task parallelism with suspension
PPOPP '12The X10 programming language is intended to ease the programming of scalable concurrent and distributed applications. X10 augments a familiar imperative object-oriented programming model with constructs to support light-weight asynchronous tasks as well ...
Integrating Asynchronous Task Parallelism with MPI
IPDPS '13: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed ProcessingEffective combination of inter-node and intra-node parallelism is recognized to be a major challenge for future extreme-scale systems. Many researchers have demonstrated the potential benefits of combining both levels of parallelism, including increased ...
Comments