DOI: 10.1145/2517349.2522715
Research article · Open access

Dandelion: a compiler and runtime for heterogeneous systems

Published: 03 November 2013

ABSTRACT

Computer systems increasingly rely on heterogeneity to achieve greater performance, scalability and energy efficiency. Because heterogeneous systems typically comprise multiple execution contexts with different programming abstractions and runtimes, programming them remains extremely challenging.

Dandelion is a system designed to address this programmability challenge for data-parallel applications. Dandelion provides a unified programming model for heterogeneous systems that span diverse execution contexts including CPUs, GPUs, FPGAs, and the cloud. It adopts the .NET LINQ (Language INtegrated Query) approach, integrating data-parallel operators into general purpose programming languages such as C# and F#. It therefore provides an expressive data model and native language integration for user-defined functions, enabling programmers to write applications using standard high-level languages and development tools.
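
To make the programming model concrete, the following is a minimal sketch of the LINQ style Dandelion builds on, using only stock System.Linq operators: a data-parallel pipeline whose stages call an ordinary user-defined C# function. It illustrates the native language integration described above, not Dandelion's own API (a hedged sketch of that appears after the next paragraph).

    using System;
    using System.Linq;

    class WordCount
    {
        // An ordinary user-defined C# function. LINQ operators call it
        // directly -- the native language integration for user-defined
        // functions that the abstract describes.
        static string Normalize(string token) => token.Trim().ToLowerInvariant();

        static void Main()
        {
            string[] lines = { "The quick brown fox", "the lazy dog" };

            // A data-parallel pipeline built from standard LINQ operators.
            var counts = lines
                .SelectMany(line => line.Split(' '))   // tokenize each line
                .Select(Normalize)                     // apply the user-defined function
                .GroupBy(word => word)                 // group occurrences by word
                .Select(g => new { Word = g.Key, Count = g.Count() });

            foreach (var c in counts)
                Console.WriteLine($"{c.Word}: {c.Count}");
        }
    }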

Dandelion automatically and transparently distributes data-parallel portions of a program to available computing resources, including compute clusters for distributed execution and CPU and GPU cores of individual nodes for parallel execution. To enable automatic execution of .NET code on GPUs, Dandelion cross-compiles .NET code to CUDA kernels and uses the PTask runtime [85] to manage GPU execution. This paper discusses the design and implementation of Dandelion, focusing on the distributed CPU and GPU implementation. We evaluate the system using a diverse set of workloads.
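
As a rough illustration of where Dandelion hooks into a query: the paper's examples opt into Dandelion with an AsDandelion() call, in the style of PLINQ's AsParallel(). The sketch below uses a no-op stub for that entry point so it runs standalone; the stub's signature, the NearestCenter helper, and the surrounding types are assumptions made for illustration, not the paper's verified API. In the real system, the runtime would cross-compile the Select lambda to a CUDA kernel and schedule it through PTask [85].

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // No-op stand-in for Dandelion's entry point. The AsDandelion() name
    // mirrors PLINQ's AsParallel(); the signature here is an assumption for
    // illustration. In the real system this call would hand the query to
    // Dandelion's compiler/runtime instead of falling back to LINQ-to-Objects.
    static class DandelionStub
    {
        public static IEnumerable<T> AsDandelion<T>(this IEnumerable<T> source) => source;
    }

    class NearestCenterDemo
    {
        // A user-defined function of the kind Dandelion would cross-compile
        // to a CUDA kernel for GPU execution.
        static int NearestCenter(double p, double[] centers)
        {
            int best = 0;
            for (int i = 1; i < centers.Length; i++)
                if (Math.Abs(centers[i] - p) < Math.Abs(centers[best] - p))
                    best = i;
            return best;
        }

        static void Main()
        {
            double[] points  = { 0.1, 0.9, 0.4, 2.2 };
            double[] centers = { 0.0, 1.0, 2.0 };

            // The query text is ordinary LINQ; only the execution target
            // changes once the runtime takes over.
            var assignment = points
                .AsDandelion()                           // opt in to Dandelion execution
                .Select(p => NearestCenter(p, centers))  // candidate for a GPU kernel
                .GroupBy(id => id);                      // grouping on CPU cores or the cluster

            foreach (var g in assignment)
                Console.WriteLine($"center {g.Key}: {g.Count()} points");
        }
    }

The design point this sketch is meant to convey: opting in is the only change. The operators, the user-defined function, and the data model all remain standard C#.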


Supplemental Material

d1-04-christopher-rossbach.mp4 (MP4, 1.2 GB)

References

1. Apache YARN. http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html.
2. The CCI project. http://cciast.codeplex.com/.
3. The LINQ project. http://msdn.microsoft.com/en-us/library/vstudio/bb397926.aspx.
4. The PLINQ project. http://msdn.microsoft.com/en-us/library/dd460688.aspx.
5. Sort benchmark home page. http://sortbenchmark.org/.
6. IBM 709 electronic data-processing system: advance description. I.B.M., White Plains, NY, 1957.
7. MATLAB plug-in for CUDA. https://developer.nvidia.com/matlab-cuda, 2007.
8. JCuda: Java bindings for CUDA. http://www.jcuda.org/jcuda/JCuda.html, 2012.
9. J. S. Auerbach, D. F. Bacon, P. Cheng, and R. M. Rabbah. Lime: a Java-compatible and synthesizable language for heterogeneous architectures. In OOPSLA, 2010.
10. C. Augonnet, J. Clet-Ortega, S. Thibault, and R. Namyst. Data-aware task scheduling on multi-accelerator based platforms. In 16th International Conference on Parallel and Distributed Systems, Shanghai, China, Dec. 2010.
11. C. Augonnet and R. Namyst. StarPU: a unified runtime system for heterogeneous multi-core architectures.
12. C. Augonnet, S. Thibault, R. Namyst, and M. Nijhuis. Exploiting the Cell/BE architecture with the StarPU unified runtime system. In SAMOS '09, pages 329--339, 2009.
13. E. Ayguadé, R. M. Badia, F. D. Igual, J. Labarta, R. Mayo, and E. S. Quintana-Ortí. An extension of the StarSs programming model for platforms with multiple GPUs. In Proceedings of the 15th International Euro-Par Conference on Parallel Processing, Euro-Par '09, pages 851--862, Berlin, Heidelberg, 2009. Springer-Verlag.
14. R. M. Badia, J. Labarta, R. Sirvent, J. M. Pérez, J. M. Cela, and R. Grima. Programming grid applications with GRID Superscalar. Journal of Grid Computing, 1:2003, 2003.
15. C. Banino, O. Beaumont, L. Carter, J. Ferrante, A. Legrand, and Y. Robert. Scheduling strategies for master-slave tasking on heterogeneous processor platforms. 2004.
16. M. Bauer, S. Treichler, E. Slaughter, and A. Aiken. Legion: expressing locality and independence with logical regions. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '12, pages 66:1--66:11, Los Alamitos, CA, USA, 2012. IEEE Computer Society Press.
17. A. Bayoumi, M. Chu, Y. Hanafy, P. Harrell, and G. Refai-Ahmed. Scientific and engineering computing using ATI Stream technology. Computing in Science and Engineering, 11(6):92--97, 2009.
18. P. Bellens, J. M. Perez, R. M. Badia, and J. Labarta. CellSs: a programming model for the Cell BE architecture. In SC 2006.
19. B. Billerbeck, N. Craswell, D. Fetterly, and M. Najork. Microsoft Research at TREC 2011 Web Track. In Proc. of the 20th Text Retrieval Conference, 2011.
20. H. Bos, W. de Bruijn, M. Cristea, T. Nguyen, and G. Portokalidis. FFPF: fairly fast packet filters. In Proceedings of OSDI '04, 2004.
21. Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst. HaLoop: efficient iterative data processing on large clusters. Proc. VLDB Endow., 3(1--2):285--296, Sept. 2010.
22. I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: stream computing on graphics hardware. ACM Transactions on Graphics, 2004.
23. J. Bueno, L. Martinell, A. Duran, M. Farreras, X. Martorell, R. M. Badia, E. Ayguade, and J. Labarta. Productive cluster programming with OmpSs. In Proceedings of the 17th International Conference on Parallel Processing - Volume Part I, Euro-Par '11, pages 555--566, Berlin, Heidelberg, 2011. Springer-Verlag.
24. P. Calvert. Part II dissertation, Computer Science Tripos, University of Cambridge, June 2010.
25. B. Catanzaro, M. Garland, and K. Keutzer. Copperhead: compiling an embedded data parallel language. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP '11, pages 47--56, 2011.
26. B. Catanzaro, N. Sundaram, and K. Keutzer. A MapReduce framework for programming graphics processors. In Workshop on Software Tools for MultiCore Systems, 2008.
27. C. Chambers, A. Raniwala, F. Perry, S. Adams, R. Henry, R. Bradshaw, and N. Weizenbaum. FlumeJava: easy, efficient data-parallel pipelines. In PLDI '10.
28. S. C. Chiu, W.-k. Liao, A. N. Choudhary, and M. T. Kandemir. Processor-embedded distributed smart disks for I/O-intensive workloads: architectures, performance models and evaluation. J. Parallel Distrib. Comput., 65(4):532--551, 2005.
29. E. Chung, J. Davis, and J. Lee. LINQits: big data on little clients. In Proceedings of the 40th International Symposium on Computer Architecture (ISCA), 2013.
30. C. H. Crawford, P. Henning, M. Kistler, and C. Wright. Accelerating computing with the Cell Broadband Engine processor. In CF 2008, 2008.
31. M.-L. Cristea, W. de Bruijn, and H. Bos. FPL-3: towards language support for distributed packet processing. In Proceedings of IFIP Networking 2005, 2005.
32. M. L. Cristea, C. Zissulescu, E. Deprettere, and H. Bos. FPL-3E: towards language support for reconfigurable packet processing. In Proceedings of the 5th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, SAMOS '05, pages 82--92, Berlin, Heidelberg, 2005. Springer-Verlag.
33. J. Currey, S. Baker, and C. J. Rossbach. Supporting iteration in a heterogeneous dataflow engine. In SFMA, 2013.
34. A. Currid. TCP offload to the rescue. Queue, 2(3):58--65, 2004.
35. A. L. Davis and R. M. Keller. Data flow program graphs. IEEE Computer, 15(2):26--41, 1982.
36. W. de Bruijn and H. Bos. PipesFS: fast Linux I/O in the UNIX tradition. ACM SIGOPS Operating Systems Review, 42(5), July 2008. Special issue on R&D in the Linux kernel.
37. W. de Bruijn, H. Bos, and H. Bal. Application-tailored I/O with Streamline. ACM Trans. Comput. Syst., 29:6:1--6:33, May 2011.
38. Z. DeVito, N. Joubert, F. Palacios, S. Oakley, M. Medina, M. Barrientos, E. Elsen, F. Ham, A. Aiken, K. Duraisamy, E. Darve, J. Alonso, and P. Hanrahan. Liszt: a domain-specific language for building portable mesh-based PDE solvers. In Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 9:1--9:12, New York, NY, USA, 2011. ACM.
39. N. T. Duong, Q. A. P. Nguyen, A. T. Nguyen, and H.-D. Nguyen. Parallel PageRank computation using GPUs. In Proceedings of the Third Symposium on Information and Communication Technology, SoICT '12, pages 223--230, 2012.
40. J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: a runtime for iterative MapReduce. In HPDC '10. ACM, 2010.
41. S. Ewen, K. Tzoumas, M. Kaufmann, and V. Markl. Spinning fast iterative data flows. VLDB, 2012.
42. C. Fetzer and K. Högstedt. Self*: a dataflow oriented component framework for pervasive dependability. In 8th IEEE International Workshop on Object-Oriented Real-Time Dependable Systems (WORDS 2003), 15--17 January 2003, Guadalajara, Mexico, pages 66--73. IEEE Computer Society, 2003.
43. N. K. Govindaraju, B. Lloyd, W. Wang, M. Lin, and D. Manocha. Fast computation of database operations using graphics processors. In ACM SIGGRAPH 2005 Courses, SIGGRAPH '05, 2005.
44. J. Gray, A. Szalay, A. Thakar, P. Kunszt, C. Stoughton, D. Slutz, and J. Vandenberg. Data mining the SDSS SkyServer database. In Distributed Data and Structures 4: Records of the 4th International Meeting, pages 189--210, Paris, France, March 2002. Carleton Scientific. Also available as MSR-TR-2002-01.
45. K. Gregory and A. Miller. C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++. Microsoft Press, 2012.
46. D. Grewe and M. O'Boyle. A static task partitioning approach for heterogeneous systems using OpenCL. Compiler Construction, 6601:286--305, 2011.
47. V. Gupta, K. Schwan, N. Tolia, V. Talwar, and P. Ranganathan. Pegasus: coordinated scheduling for virtualized accelerator-based systems. In Proceedings of the 2011 USENIX Annual Technical Conference, USENIX ATC '11, Berkeley, CA, USA, 2011. USENIX Association.
48. T. D. Han and T. S. Abdelrahman. hiCUDA: a high-level directive-based language for GPU programming. In GPGPU 2009.
49. B. He, W. Fang, Q. Luo, N. K. Govindaraju, and T. Wang. Mars: a MapReduce framework on graphics processors. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, PACT '08, pages 260--269, 2008.
50. B. He, M. Lu, K. Yang, R. Fang, N. K. Govindaraju, Q. Luo, and P. V. Sander. Relational query coprocessing on graphics processors. ACM Trans. Database Syst., 34(4):21:1--21:39, Dec. 2009.
51. B. He, K. Yang, R. Fang, M. Lu, N. Govindaraju, Q. Luo, and P. Sander. Relational joins on graphics processors. In SIGMOD '08, 2008.
52. The HIVE project. http://hadoop.apache.org/hive/.
53. A. Hormati, Y. Choi, M. Kudlur, R. M. Rabbah, T. Mudge, and S. A. Mahlke. Flextream: adaptive compilation of streaming applications for heterogeneous architectures. In PACT, pages 214--223, 2009.
54. T. Hruby, K. van Reeuwijk, and H. Bos. Ruler: high-speed packet matching and rewriting on NPUs. In ANCS '07: Proceedings of the 3rd ACM/IEEE Symposium on Architecture for Networking and Communications Systems, pages 1--10, New York, NY, USA, 2007. ACM.
55. S. S. Huang, A. Hormati, D. F. Bacon, and R. M. Rabbah. Liquid Metal: object-oriented programming across the hardware/software boundary. In ECOOP, pages 76--103, 2008.
56. Intel. Math Kernel Library. http://developer.intel.com/software/products/mkl/.
57. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys 2007.
58. W. Jiang and G. Agrawal. MATE-CG: a MapReduce-like framework for accelerating data-intensive computations on heterogeneous clusters. In International Parallel and Distributed Processing Symposium, pages 644--655, 2012.
59. V. J. Jiménez, L. Vilanova, I. Gelado, M. Gil, G. Fursin, and N. Navarro. Predictive runtime code scheduling for heterogeneous architectures. In HiPEAC 2009.
60. P. K., V. K. K., A. S. H. B., S. Balasubramanian, and P. Baruah. Cost efficient PageRank computation using GPU. 2011.
61. S. Kato, K. Lakshmanan, R. Rajkumar, and Y. Ishikawa. TimeGraph: GPU scheduling for real-time multi-tasking environments. In Proceedings of the 2011 USENIX Annual Technical Conference, 2011.
62. K. Keeton, D. A. Patterson, and J. M. Hellerstein. A case for intelligent disks (IDISKs). SIGMOD Rec., 27(3):42--52, 1998.
63. Khronos Group. The OpenCL Specification, Version 1.2, 2012.
64. A. Kloeckner. PyCUDA. https://pypi.python.org/pypi/pycuda, 2012.
65. E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click modular router. ACM Trans. Comput. Syst., 18, August 2000.
66. M. D. Linderman, J. D. Collins, H. Wang, and T. H. Meng. Merge: a programming model for heterogeneous multi-core systems. SIGPLAN Not., 43(3):287--296, Mar. 2008.
67. M. Linetsky. Programming Microsoft DirectShow. Wordware Publishing Inc., Plano, TX, USA, 2001.
68. O. Loques, J. Leite, and E. V. Carrera E. P-RIO: a modular parallel-programming environment. IEEE Concurrency, 6:47--57, January 1998.
69. C.-K. Luk, S. Hong, and H. Kim. Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, pages 45--55, 2009.
70. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD. ACM, 2010.
71. M. D. McCool and B. D'Amora. Programming using RapidMind on the Cell BE. In SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, page 222, 2006.
72. F. McSherry, D. G. Murray, R. Isaacs, and M. Isard. Differential dataflow. In CIDR, 2013.
73. S. R. Mihaylov, Z. G. Ives, and S. Guha. REX: recursive, delta-based data-centric computation. Proc. VLDB Endow., 5(11):1280--1291, July 2012.
74. D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: a timely dataflow system. In SOSP, 2013.
75. D. G. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy, and S. Hand. CIEL: a universal execution engine for distributed dataflow computing. In NSDI, 2011.
76. P. Newton and J. C. Browne. The CODE 2.0 graphical parallel programming language. In Proceedings of the 6th International Conference on Supercomputing, ICS '92, pages 167--177, 1992.
77. NVIDIA. The Thrust library. https://developer.nvidia.com/thrust/.
78. NVIDIA. CUDA Toolkit 4.0 CUBLAS Library, 2011.
79. NVIDIA. NVIDIA CUDA 5.0 Programming Guide, 2013.
80. A. Prasad, J. Anantpur, and R. Govindarajan. Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '11, pages 152--163, 2011.
81. P. C. Pratt-Szeliga, J. W. Fawcett, and R. D. Welch. Rootbeer: seamlessly using GPUs from Java. In HPCC-ICESS, pages 375--380, 2012.
82. J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand, and S. Amarasinghe. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pages 519--530, New York, NY, USA, 2013. ACM.
83. V. T. Ravi, M. Becchi, W. Jiang, G. Agrawal, and S. Chakradhar. Scheduling concurrent applications on a cluster of CPU-GPU nodes. In Proceedings of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID '12, pages 140--147, 2012.
84. E. Riedel, C. Faloutsos, G. A. Gibson, and D. Nagle. Active disks for large-scale data processing. Computer, 34(6):68--74, 2001.
85. C. Rossbach, J. Currey, M. Silberstein, B. Ray, and E. Witchel. PTask: operating system abstractions to manage GPUs as compute devices. In SOSP, 2011.
86. A. Rungsawang and B. Manaskasemsak. Fast PageRank computation on a GPU cluster. In Proceedings of the 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing, PDP '12, pages 450--456, 2012.
87. S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP 2008.
88. M. Segal and K. Akeley. The OpenGL Graphics System: a specification, version 4.3. Technical report, OpenGL.org, 2012.
89. M. Silberstein, B. Ford, I. Keidar, and E. Witchel. GPUfs: integrating file systems with GPUs. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13. ACM, 2013.
90. K. Spafford, J. S. Meredith, and J. S. Vetter. Maestro: data orchestration and tuning for OpenCL devices. In P. D'Ambra, M. R. Guarracino, and D. Talia, editors, Euro-Par (2), volume 6272 of Lecture Notes in Computer Science, pages 275--286. Springer, 2010.
91. E. Sun, D. Schaa, R. Bagley, N. Rubin, and D. Kaeli. Enabling task-level scheduling on heterogeneous platforms. In Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, GPGPU-5, pages 84--93, 2012.
92. G. Teodoro, T. Pan, T. Kurc, J. Kong, L. Cooper, N. Podhorszki, S. Klasky, and J. Saltz. High-throughput analysis of large microscopy image datasets on CPU-GPU cluster platforms. 2013.
93. W. Thies, M. Karczmarek, and S. P. Amarasinghe. StreamIt: a language for streaming applications. In CC 2002.
94. S.-Z. Ueng, M. Lathara, S. S. Baghsorkhi, and W.-M. W. Hwu. CUDA-Lite: reducing GPU programming complexity. In LCPC 2008.
95. U. Verner, A. Schuster, and M. Silberstein. Processing data streams with hard real-time constraints on heterogeneous systems. In Proceedings of the International Conference on Supercomputing, ICS '11, pages 120--129, New York, NY, USA, 2011. ACM.
96. Y. Weinsberg, D. Dolev, T. Anker, M. Ben-Yehuda, and P. Wyckoff. Tapping into the fountain of CPUs: on operating system support for programmable devices. In ASPLOS 2008.
97. P. Wittek and S. Darányi. Accelerating text mining workloads in a MapReduce-based distributed GPU environment. J. Parallel Distrib. Comput., 73(2):198--206, Feb. 2013.
98. H. Wu, G. Diamos, S. Cadambi, and S. Yalamanchili. Kernel Weaver: automatically fusing database primitives for efficient GPU computation. In Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-45, 2012.
99. Y. Yan, M. Grossman, and V. Sarkar. JCUDA: a programmer-friendly interface for accelerating Java programs with CUDA. In Euro-Par, pages 887--899, 2009.
100. Y. Yu, P. K. Gunda, and M. Isard. Distributed aggregation for data-parallel computing: interfaces and implementations. In SOSP, pages 247--260, 2009.
101. Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI), pages 1--14, 2008.
102. Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. K. Gunda, J. Currey, F. McSherry, and K. Achan. Some sample programs written in DryadLINQ. Technical Report MSR-TR-2008-74, Microsoft Research, May 2008.
103. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In NSDI, 2012.
104. H. Zaragoza, N. Craswell, M. Taylor, S. Saria, and S. Robertson. Microsoft Cambridge at TREC-13: Web and HARD tracks. In Proc. of the 13th Text Retrieval Conference, 2004.

Published in

SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
November 2013, 498 pages
ISBN: 9781450323888
DOI: 10.1145/2517349

Copyright © 2013 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 131 of 716 submissions, 18%
