skip to main content
research-article

Isolation for nested task parallelism

Published:29 October 2013Publication History
Skip Abstract Section

Abstract

Isolation--the property that a task can access shared data without interference from other tasks--is one of the most basic concerns in parallel programming. Whilethere is a large body of past work on isolated task-parallelism, the integration of isolation, task-parallelism, and nesting of tasks has been a difficult and unresolved challenge. In this pa- per, we present a programming and execution model called Otello where isolation is extended to arbitrarily nested parallel tasks with irregular accesses to heap data. At the same time, no additional burden is imposed on the programmer, who only exposes parallelism by creating and synchronizing parallel tasks, leaving the job of ensuring isolation to the underlying compiler and runtime system.

Otello extends our past work on Aida execution model and the delegated isolation mechanism [22] to the setting of nested parallelism. The basic runtime construct in Aida and Otello is an assembly: a task equipped with a region in the shared heap that it owns. When an assembly A conflicts with an assembly B, A transfers--or delegates--its code and owned region to a carefully selected assembly C in a way that will ensure isolation with B, leaving the responsibility of re-executing task A to C. The choice of C depends on the nesting relationship between A and B.We have implemented Otello on top of the Habanero Java (HJ) parallel programming language [8], and used this implementation to evaluate Otello on collections of nested task-parallel benchmarks and non-nested transactional benchmarks from past work. On the nested task-parallel benchmarks, Otello achieves scalability comparable to HJ programs without built-in isolation, and the relative overhead of Otello is lower than that of many published data-race detection algorithms that detect the isolation violations (but do not enforce isolation). For the transactional benchmarks, Otello incurs lower overhead than a state-of-the-art software transactional memory system (Deuce STM).

References

  1. Kunal Agrawal, Jeremy T. Fineman, and Jim Sukha. Nested parallelism in transactional memory. In Proceedings of PPoPP'08, Salt Lake City, UT, USA, pages 163--174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Woongki Baek, Chi Cao Minh, Martin Trautmann, Christos Kozyrakis, and Kunle Olukotun. The OpenTM transactional application programming interface. In Proceedings of PACT'07, pages 376387. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Rajkishore Barik, Jisheng Zhao, and Vivek Sarkar. Interproce- dural strength reduction of critical sections in explicity-parallel programs. In Proceedings of PACT'13, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Ganesh Bikshandi, Jose G. Castanos, Sreedhar B. Kodali, V. Krishna Nandivada, Igor Peshansky, Vijay A. Saraswat, Sayantan Sur, Pradeep Varma, and Tong Wen. Efficient, portable implementation of asynchronous multi-place programs. In Proceedings of PPoPP'09. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work-stealing. In Proceedings of FOCS'94, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Robert L. Bocchino, Stephen Heumann, Nima Honarmand, Sarita V. Adve, Vikram S. Adve, Adam Welc, and Tatiana Shpeisman. Safe nondeterminism in a deterministic-by-default parallel language. In Proceedings of POPL'11, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Martin Burtscher, Milind Kulkarni, Dimitrios Prountzos, and Keshav Pingali. On the scalability of an automatically parallelized irregular application. In Proceedings of LCPC'08, pages 109 123. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Vincent Cave , Jisheng Zhao, Jun Shirako, and Vivek Sarkar. Habanero-Java: the New Adventures of Old X10. In Proceedings of PPPJ'11, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Robit Chandra, Ramesh Menon, Leo Dagum, David Kohr, Dror Maydan, and Jeff McDonald. Parallel programming in OpenMP. Morgan Kaufmann Publishers Inc., 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. Charles et al. X10: an object-oriented approach to non-uniform cluster computing. In Proceedings of OOPSLA'05, New York, NY, USA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Deuce STM - Java Software Transactional Memory. http://www.deucestm.org/.Google ScholarGoogle Scholar
  12. David Dice, Ori Shalev, and Nir Shavit. Transactional locking II. In Proceedings of DISC'06, pages 194208, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Alejandro Duran et al. Barcelona OpenMP Tasks Suite: a set of benchmarks targeting the exploitation of task parallelism in OpenMP. In Proceedings of ICPP'09, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Cormac Flanagan and Stephen N. Freund. FastTrack: efficient and precise dynamic race detection. In Proceedings of PLDI'09, pages 121 133. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Tim Harris, James R. Larus, and Ravi Rajwar. Transactional Memory, 2nd Edition. Morgan and Claypool, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. JSTAMP. Jstamp. http://demsky.eecs.uci.edu/software.php.Google ScholarGoogle Scholar
  17. Eric Koskinen and Maurice Herlihy. Checkpoints and continuations instead of nested transactions. In Proceedings of SPAA'08, pages 160168, jun 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Milind Kulkarni, Martin Burtscher, Rajasekhar Inkulu, Keshav Pingali, and Calin Cascaval. How much parallelism is there in irregular applications' In Proceedings of PPoPP'09, pages 314. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Milind Kulkarni, Keshav Pingali, Bruce Walter, Ganesh Ramanarayanan, Kavita Bala, and L. Paul Chew. Optimistic parallelism requires abstractions. In Proceedings of PLDI'07, pages 211222, New York, NY, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. Larus and R. Rajwar. Transactional Memory. Morgan and Claypool, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  21. Roberto Lublinerman, Swarat Chaudhuri, and Pavol Cerny. Parallel programming with object assemblies. In Proceedings of OOPSLA'09, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Roberto Lublinerman, Jisheng Zhao, Zoran Budimlic , Swarat Chaudhuri, and Vivek Sarkar. Delegated isolation. In Proceedings of OOPSLA'11, NY, USA, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Mario Me ndez-Loj, Donald Nguyen, Dimitrios Prountzos, Xin Sui, M. Amber Hassaan, Milind Kulkarni, Martin Burtscher, and Keshav Pingalil. Structure-driven optimizations for amor- phous data-parallel programs. In Proceedings of PPoPP'10, pages 3 14, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Vijay Menon, Ali reza Adl-tabatabai, Steven Balensiefer, Richard L. Hudson, Adam Welc, Tatiana Shpeisman, and Bratin Saha. Practical weak-atomicity semantics for java stm. In Proceedings of ACM Symposium on Parallellism in Algorithms and Architectures, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Chi Cao Minh, JaeWoong Chung, Christos Kozyrakis, and Kunle Olukotun. STAMP: Stanford transactional applications for multiprocessing. In Proceedings of IISWC'08, pages 3546, 2008.Google ScholarGoogle Scholar
  26. V. Krishna Nandivada, Jun Shirako, Jisheng Zhao, and Vivek Sarkar. A transformation framework for optimizing task- parallel programs. ACM Trans. Program. Lang. Syst., 35(1):3, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. OpenMP Application Program Interface, version 3.0, May 2008. http://www.openmp.org/mp-documents/spec30.pdf.Google ScholarGoogle Scholar
  28. Raghavan Raman, Jisheng Zhao, Vivek Sarkar, Martin Vechev, and Eran Yahav. Scalable and precise dynamic datarace detection for structured parallelism. In Proceedings of PLDI'12, pages 531 542. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Jun Shirako, David M. Peixotto, Vivek Sarkar, and William N. Scherer. Phasers: a unified deadlock-free construct for collective and point-to-point synchronization. In Proceedings of ICS'08, pages 277--288. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Haris Volos, Adam Welc, Ali-reza Adl-tabatabai, Tatiana Sh- peisman, Xinmin Tian, and Ravi Narayanaswamy. NePaLTM: Design and implementation of nested parallelism for trans- actional memory systems. In Proceedings of ECOOP'09, jun 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Isolation for nested task parallelism

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM SIGPLAN Notices
            ACM SIGPLAN Notices  Volume 48, Issue 10
            OOPSLA '13
            October 2013
            867 pages
            ISSN:0362-1340
            EISSN:1558-1160
            DOI:10.1145/2544173
            Issue’s Table of Contents
            • cover image ACM Conferences
              OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
              October 2013
              904 pages
              ISBN:9781450323741
              DOI:10.1145/2509136

            Copyright © 2013 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 29 October 2013

            Check for updates

            Qualifiers

            • research-article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader