DOI: 10.1145/2517349.2522715
Research article · Open access

Dandelion: a compiler and runtime for heterogeneous systems

Published: 03 November 2013

ABSTRACT

Computer systems increasingly rely on heterogeneity to achieve greater performance, scalability and energy efficiency. Because heterogeneous systems typically comprise multiple execution contexts with different programming abstractions and runtimes, programming them remains extremely challenging.

Dandelion is a system designed to address this programmability challenge for data-parallel applications. Dandelion provides a unified programming model for heterogeneous systems that span diverse execution contexts including CPUs, GPUs, FPGAs, and the cloud. It adopts the .NET LINQ (Language INtegrated Query) approach, integrating data-parallel operators into general purpose programming languages such as C# and F#. It therefore provides an expressive data model and native language integration for user-defined functions, enabling programmers to write applications using standard high-level languages and development tools.
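
To make the programming model concrete, the following is a minimal sketch of the LINQ style Dandelion builds on, using only stock System.Linq operators: a data-parallel pipeline whose stages call an ordinary user-defined C# function. It illustrates the native language integration described above, not Dandelion's own API (a hedged sketch of that appears after the next paragraph).

    using System;
    using System.Linq;

    class WordCount
    {
        // An ordinary user-defined C# function. LINQ operators call it
        // directly -- the native language integration for user-defined
        // functions that the abstract describes.
        static string Normalize(string token) => token.Trim().ToLowerInvariant();

        static void Main()
        {
            string[] lines = { "The quick brown fox", "the lazy dog" };

            // A data-parallel pipeline built from standard LINQ operators.
            var counts = lines
                .SelectMany(line => line.Split(' '))   // tokenize each line
                .Select(Normalize)                     // apply the user-defined function
                .GroupBy(word => word)                 // group occurrences by word
                .Select(g => new { Word = g.Key, Count = g.Count() });

            foreach (var c in counts)
                Console.WriteLine($"{c.Word}: {c.Count}");
        }
    }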

Dandelion automatically and transparently distributes data-parallel portions of a program to available computing resources, including compute clusters for distributed execution and CPU and GPU cores of individual nodes for parallel execution. To enable automatic execution of .NET code on GPUs, Dandelion cross-compiles .NET code to CUDA kernels and uses the PTask runtime [85] to manage GPU execution. This paper discusses the design and implementation of Dandelion, focusing on the distributed CPU and GPU implementation. We evaluate the system using a diverse set of workloads.
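
As a rough illustration of where Dandelion hooks into a query: the paper's examples opt into Dandelion with an AsDandelion() call, in the style of PLINQ's AsParallel(). The sketch below uses a no-op stub for that entry point so it runs standalone; the stub's signature, the NearestCenter helper, and the surrounding types are assumptions made for illustration, not the paper's verified API. In the real system, the runtime would cross-compile the Select lambda to a CUDA kernel and schedule it through PTask [85].

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // No-op stand-in for Dandelion's entry point. The AsDandelion() name
    // mirrors PLINQ's AsParallel(); the signature here is an assumption for
    // illustration. In the real system this call would hand the query to
    // Dandelion's compiler/runtime instead of falling back to LINQ-to-Objects.
    static class DandelionStub
    {
        public static IEnumerable<T> AsDandelion<T>(this IEnumerable<T> source) => source;
    }

    class NearestCenterDemo
    {
        // A user-defined function of the kind Dandelion would cross-compile
        // to a CUDA kernel for GPU execution.
        static int NearestCenter(double p, double[] centers)
        {
            int best = 0;
            for (int i = 1; i < centers.Length; i++)
                if (Math.Abs(centers[i] - p) < Math.Abs(centers[best] - p))
                    best = i;
            return best;
        }

        static void Main()
        {
            double[] points  = { 0.1, 0.9, 0.4, 2.2 };
            double[] centers = { 0.0, 1.0, 2.0 };

            // The query text is ordinary LINQ; only the execution target
            // changes once the runtime takes over.
            var assignment = points
                .AsDandelion()                           // opt in to Dandelion execution
                .Select(p => NearestCenter(p, centers))  // candidate for a GPU kernel
                .GroupBy(id => id);                      // grouping on CPU cores or the cluster

            foreach (var g in assignment)
                Console.WriteLine($"center {g.Key}: {g.Count()} points");
        }
    }

The design point this sketch is meant to convey: opting in is the only change. The operators, the user-defined function, and the data model all remain standard C#.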


Supplemental Material

d1-04-christopher-rossbach.mp4 (MP4, 1.2 GB)

References

1. Apache YARN. http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html.
2. The CCI project. http://cciast.codeplex.com/.
3. The LINQ project. http://msdn.microsoft.com/en-us/library/vstudio/bb397926.aspx.
4. The PLINQ project. http://msdn.microsoft.com/en-us/library/dd460688.aspx.
5. Sort benchmark home page. http://sortbenchmark.org/.
6. IBM 709 electronic data-processing system: advance description. I.B.M., White Plains, NY, 1957.
7. MATLAB plug-in for CUDA. https://developer.nvidia.com/matlab-cuda, 2007.
8. JCuda: Java bindings for CUDA. http://www.jcuda.org/jcuda/JCuda.html, 2012.
9. J. S. Auerbach, D. F. Bacon, P. Cheng, and R. M. Rabbah. Lime: a Java-compatible and synthesizable language for heterogeneous architectures. In OOPSLA, 2010.
10. C. Augonnet, J. Clet-Ortega, S. Thibault, and R. Namyst. Data-aware task scheduling on multi-accelerator based platforms. In 16th International Conference on Parallel and Distributed Systems, Shanghai, China, Dec. 2010.
11. C. Augonnet and R. Namyst. StarPU: a unified runtime system for heterogeneous multi-core architectures.
12. C. Augonnet, S. Thibault, R. Namyst, and M. Nijhuis. Exploiting the Cell/BE architecture with the StarPU unified runtime system. In SAMOS '09, pages 329--339, 2009.
13. E. Ayguadé, R. M. Badia, F. D. Igual, J. Labarta, R. Mayo, and E. S. Quintana-Ortí. An extension of the StarSs programming model for platforms with multiple GPUs. In Proceedings of the 15th International Euro-Par Conference on Parallel Processing, Euro-Par '09, pages 851--862, Berlin, Heidelberg, 2009. Springer-Verlag.
14. R. M. Badia, J. Labarta, R. Sirvent, J. M. Pérez, J. M. Cela, and R. Grima. Programming grid applications with GRID Superscalar. Journal of Grid Computing, 1:2003, 2003.
15. C. Banino, O. Beaumont, L. Carter, J. Ferrante, A. Legrand, and Y. Robert. Scheduling strategies for master-slave tasking on heterogeneous processor platforms. 2004.
16. M. Bauer, S. Treichler, E. Slaughter, and A. Aiken. Legion: expressing locality and independence with logical regions. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '12, pages 66:1--66:11, Los Alamitos, CA, USA, 2012. IEEE Computer Society Press.
17. A. Bayoumi, M. Chu, Y. Hanafy, P. Harrell, and G. Refai-Ahmed. Scientific and engineering computing using ATI Stream technology. Computing in Science and Engineering, 11(6):92--97, 2009.
18. P. Bellens, J. M. Perez, R. M. Badia, and J. Labarta. CellSs: a programming model for the Cell BE architecture. In SC 2006.
19. B. Billerbeck, N. Craswell, D. Fetterly, and M. Najork. Microsoft Research at TREC 2011 Web Track. In Proc. of the 20th Text Retrieval Conference, 2011.
20. H. Bos, W. de Bruijn, M. Cristea, T. Nguyen, and G. Portokalidis. FFPF: fairly fast packet filters. In Proceedings of OSDI '04, 2004.
21. Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst. HaLoop: efficient iterative data processing on large clusters. Proc. VLDB Endow., 3(1--2):285--296, Sept. 2010.
22. I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: stream computing on graphics hardware. ACM Transactions on Graphics, 2004.
23. J. Bueno, L. Martinell, A. Duran, M. Farreras, X. Martorell, R. M. Badia, E. Ayguade, and J. Labarta. Productive cluster programming with OmpSs. In Proceedings of the 17th International Conference on Parallel Processing - Volume Part I, Euro-Par '11, pages 555--566, Berlin, Heidelberg, 2011. Springer-Verlag.
24. P. Calvert. Part II dissertation, Computer Science Tripos, University of Cambridge, June 2010.
25. B. Catanzaro, M. Garland, and K. Keutzer. Copperhead: compiling an embedded data parallel language. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP '11, pages 47--56, 2011.
26. B. Catanzaro, N. Sundaram, and K. Keutzer. A MapReduce framework for programming graphics processors. In Workshop on Software Tools for MultiCore Systems, 2008.
27. C. Chambers, A. Raniwala, F. Perry, S. Adams, R. Henry, R. Bradshaw, and N. Weizenbaum. FlumeJava: easy, efficient data-parallel pipelines. In PLDI '10.
28. S. C. Chiu, W.-k. Liao, A. N. Choudhary, and M. T. Kandemir. Processor-embedded distributed smart disks for I/O-intensive workloads: architectures, performance models and evaluation. J. Parallel Distrib. Comput., 65(4):532--551, 2005.
29. E. Chung, J. Davis, and J. Lee. LINQits: big data on little clients. In Proceedings of the 40th International Symposium on Computer Architecture (ISCA), 2013.
30. C. H. Crawford, P. Henning, M. Kistler, and C. Wright. Accelerating computing with the Cell Broadband Engine processor. In CF 2008, 2008.
31. M.-L. Cristea, W. de Bruijn, and H. Bos. FPL-3: towards language support for distributed packet processing. In Proceedings of IFIP Networking 2005, 2005.
32. M. L. Cristea, C. Zissulescu, E. Deprettere, and H. Bos. FPL-3E: towards language support for reconfigurable packet processing. In Proceedings of the 5th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, SAMOS '05, pages 82--92, Berlin, Heidelberg, 2005. Springer-Verlag.
33. J. Currey, S. Baker, and C. J. Rossbach. Supporting iteration in a heterogeneous dataflow engine. In SFMA, 2013.
34. A. Currid. TCP offload to the rescue. Queue, 2(3):58--65, 2004.
35. A. L. Davis and R. M. Keller. Data flow program graphs. IEEE Computer, 15(2):26--41, 1982.
36. W. de Bruijn and H. Bos. PipesFS: fast Linux I/O in the UNIX tradition. ACM SIGOPS Operating Systems Review, 42(5), July 2008. Special issue on R&D in the Linux kernel.
37. W. de Bruijn, H. Bos, and H. Bal. Application-tailored I/O with Streamline. ACM Trans. Comput. Syst., 29:6:1--6:33, May 2011.
38. Z. DeVito, N. Joubert, F. Palacios, S. Oakley, M. Medina, M. Barrientos, E. Elsen, F. Ham, A. Aiken, K. Duraisamy, E. Darve, J. Alonso, and P. Hanrahan. Liszt: a domain-specific language for building portable mesh-based PDE solvers. In Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 9:1--9:12, New York, NY, USA, 2011. ACM.
39. N. T. Duong, Q. A. P. Nguyen, A. T. Nguyen, and H.-D. Nguyen. Parallel PageRank computation using GPUs. In Proceedings of the Third Symposium on Information and Communication Technology, SoICT '12, pages 223--230, 2012.
40. J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: a runtime for iterative MapReduce. In HPDC '10. ACM, 2010.
41. S. Ewen, K. Tzoumas, M. Kaufmann, and V. Markl. Spinning fast iterative data flows. VLDB, 2012.
42. C. Fetzer and K. Högstedt. Self*: a dataflow oriented component framework for pervasive dependability. In 8th IEEE International Workshop on Object-Oriented Real-Time Dependable Systems (WORDS 2003), 15--17 January 2003, Guadalajara, Mexico, pages 66--73. IEEE Computer Society, 2003.
43. N. K. Govindaraju, B. Lloyd, W. Wang, M. Lin, and D. Manocha. Fast computation of database operations using graphics processors. In ACM SIGGRAPH 2005 Courses, SIGGRAPH '05, 2005.
44. J. Gray, A. Szalay, A. Thakar, P. Kunszt, C. Stoughton, D. Slutz, and J. Vandenberg. Data mining the SDSS SkyServer database. In Distributed Data and Structures 4: Records of the 4th International Meeting, pages 189--210, Paris, France, March 2002. Carleton Scientific. Also available as MSR-TR-2002-01.
45. K. Gregory and A. Miller. C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++. Microsoft Press, 2012.
46. D. Grewe and M. O'Boyle. A static task partitioning approach for heterogeneous systems using OpenCL. Compiler Construction, 6601:286--305, 2011.
47. V. Gupta, K. Schwan, N. Tolia, V. Talwar, and P. Ranganathan. Pegasus: coordinated scheduling for virtualized accelerator-based systems. In Proceedings of the 2011 USENIX Annual Technical Conference, USENIX ATC '11, Berkeley, CA, USA, 2011. USENIX Association.
48. T. D. Han and T. S. Abdelrahman. hiCUDA: a high-level directive-based language for GPU programming. In GPGPU 2009.
49. B. He, W. Fang, Q. Luo, N. K. Govindaraju, and T. Wang. Mars: a MapReduce framework on graphics processors. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, PACT '08, pages 260--269, 2008.
50. B. He, M. Lu, K. Yang, R. Fang, N. K. Govindaraju, Q. Luo, and P. V. Sander. Relational query coprocessing on graphics processors. ACM Trans. Database Syst., 34(4):21:1--21:39, Dec. 2009.
51. B. He, K. Yang, R. Fang, M. Lu, N. Govindaraju, Q. Luo, and P. Sander. Relational joins on graphics processors. In SIGMOD '08, 2008.
52. The HIVE project. http://hadoop.apache.org/hive/.
53. A. Hormati, Y. Choi, M. Kudlur, R. M. Rabbah, T. Mudge, and S. A. Mahlke. Flextream: adaptive compilation of streaming applications for heterogeneous architectures. In PACT, pages 214--223, 2009.
54. T. Hruby, K. van Reeuwijk, and H. Bos. Ruler: high-speed packet matching and rewriting on NPUs. In ANCS '07: Proceedings of the 3rd ACM/IEEE Symposium on Architecture for Networking and Communications Systems, pages 1--10, New York, NY, USA, 2007. ACM.
55. S. S. Huang, A. Hormati, D. F. Bacon, and R. M. Rabbah. Liquid Metal: object-oriented programming across the hardware/software boundary. In ECOOP, pages 76--103, 2008.
56. Intel. Math Kernel Library. http://developer.intel.com/software/products/mkl/.
57. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys 2007.
58. W. Jiang and G. Agrawal. MATE-CG: a MapReduce-like framework for accelerating data-intensive computations on heterogeneous clusters. In International Parallel and Distributed Processing Symposium, pages 644--655, 2012.
59. V. J. Jiménez, L. Vilanova, I. Gelado, M. Gil, G. Fursin, and N. Navarro. Predictive runtime code scheduling for heterogeneous architectures. In HiPEAC 2009.
60. P. K., V. K. K., A. S. H. B., S. Balasubramanian, and P. Baruah. Cost efficient PageRank computation using GPU. 2011.
61. S. Kato, K. Lakshmanan, R. Rajkumar, and Y. Ishikawa. TimeGraph: GPU scheduling for real-time multi-tasking environments. In Proceedings of the 2011 USENIX Annual Technical Conference, 2011.
62. K. Keeton, D. A. Patterson, and J. M. Hellerstein. A case for intelligent disks (IDISKs). SIGMOD Rec., 27(3):42--52, 1998.
63. Khronos Group. The OpenCL Specification, Version 1.2, 2012.
64. A. Kloeckner. PyCUDA. https://pypi.python.org/pypi/pycuda, 2012.
65. E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click modular router. ACM Trans. Comput. Syst., 18, August 2000.
66. M. D. Linderman, J. D. Collins, H. Wang, and T. H. Meng. Merge: a programming model for heterogeneous multi-core systems. SIGPLAN Not., 43(3):287--296, Mar. 2008.
67. M. Linetsky. Programming Microsoft DirectShow. Wordware Publishing Inc., Plano, TX, USA, 2001.
68. O. Loques, J. Leite, and E. V. Carrera E. P-RIO: a modular parallel-programming environment. IEEE Concurrency, 6:47--57, January 1998.
69. C.-K. Luk, S. Hong, and H. Kim. Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, pages 45--55, 2009.
70. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD. ACM, 2010.
71. M. D. McCool and B. D'Amora. Programming using RapidMind on the Cell BE. In SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, page 222, 2006.
72. F. McSherry, D. G. Murray, R. Isaacs, and M. Isard. Differential dataflow. In CIDR, 2013.
73. S. R. Mihaylov, Z. G. Ives, and S. Guha. REX: recursive, delta-based data-centric computation. Proc. VLDB Endow., 5(11):1280--1291, July 2012.
74. D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: a timely dataflow system. In SOSP, 2013.
75. D. G. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy, and S. Hand. CIEL: a universal execution engine for distributed dataflow computing. In NSDI, 2011.
76. P. Newton and J. C. Browne. The CODE 2.0 graphical parallel programming language. In Proceedings of the 6th International Conference on Supercomputing, ICS '92, pages 167--177, 1992.
77. NVIDIA. The Thrust library. https://developer.nvidia.com/thrust/.
78. NVIDIA. CUDA Toolkit 4.0 CUBLAS Library, 2011.
79. NVIDIA. NVIDIA CUDA 5.0 Programming Guide, 2013.
80. A. Prasad, J. Anantpur, and R. Govindarajan. Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '11, pages 152--163, 2011.
81. P. C. Pratt-Szeliga, J. W. Fawcett, and R. D. Welch. Rootbeer: seamlessly using GPUs from Java. In HPCC-ICESS, pages 375--380, 2012.
82. J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand, and S. Amarasinghe. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pages 519--530, New York, NY, USA, 2013. ACM.
83. V. T. Ravi, M. Becchi, W. Jiang, G. Agrawal, and S. Chakradhar. Scheduling concurrent applications on a cluster of CPU-GPU nodes. In Proceedings of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID '12, pages 140--147, 2012.
84. E. Riedel, C. Faloutsos, G. A. Gibson, and D. Nagle. Active disks for large-scale data processing. Computer, 34(6):68--74, 2001.
85. C. Rossbach, J. Currey, M. Silberstein, B. Ray, and E. Witchel. PTask: operating system abstractions to manage GPUs as compute devices. In SOSP, 2011.
86. A. Rungsawang and B. Manaskasemsak. Fast PageRank computation on a GPU cluster. In Proceedings of the 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing, PDP '12, pages 450--456, 2012.
87. S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP 2008.
88. M. Segal and K. Akeley. The OpenGL Graphics System: a specification, version 4.3. Technical report, OpenGL.org, 2012.
89. M. Silberstein, B. Ford, I. Keidar, and E. Witchel. GPUfs: integrating file systems with GPUs. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13. ACM, 2013.
90. K. Spafford, J. S. Meredith, and J. S. Vetter. Maestro: data orchestration and tuning for OpenCL devices. In P. D'Ambra, M. R. Guarracino, and D. Talia, editors, Euro-Par (2), volume 6272 of Lecture Notes in Computer Science, pages 275--286. Springer, 2010.
91. E. Sun, D. Schaa, R. Bagley, N. Rubin, and D. Kaeli. Enabling task-level scheduling on heterogeneous platforms. In Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, GPGPU-5, pages 84--93, 2012.
92. G. Teodoro, T. Pan, T. Kurc, J. Kong, L. Cooper, N. Podhorszki, S. Klasky, and J. Saltz. High-throughput analysis of large microscopy image datasets on CPU-GPU cluster platforms. 2013.
93. W. Thies, M. Karczmarek, and S. P. Amarasinghe. StreamIt: a language for streaming applications. In CC 2002.
94. S.-Z. Ueng, M. Lathara, S. S. Baghsorkhi, and W.-M. W. Hwu. CUDA-Lite: reducing GPU programming complexity. In LCPC 2008.
95. U. Verner, A. Schuster, and M. Silberstein. Processing data streams with hard real-time constraints on heterogeneous systems. In Proceedings of the International Conference on Supercomputing, ICS '11, pages 120--129, New York, NY, USA, 2011. ACM.
96. Y. Weinsberg, D. Dolev, T. Anker, M. Ben-Yehuda, and P. Wyckoff. Tapping into the fountain of CPUs: on operating system support for programmable devices. In ASPLOS 2008.
97. P. Wittek and S. Darányi. Accelerating text mining workloads in a MapReduce-based distributed GPU environment. J. Parallel Distrib. Comput., 73(2):198--206, Feb. 2013.
98. H. Wu, G. Diamos, S. Cadambi, and S. Yalamanchili. Kernel Weaver: automatically fusing database primitives for efficient GPU computation. In Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-45, 2012.
99. Y. Yan, M. Grossman, and V. Sarkar. JCUDA: a programmer-friendly interface for accelerating Java programs with CUDA. In Euro-Par, pages 887--899, 2009.
100. Y. Yu, P. K. Gunda, and M. Isard. Distributed aggregation for data-parallel computing: interfaces and implementations. In SOSP, pages 247--260, 2009.
101. Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI), pages 1--14, 2008.
102. Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. K. Gunda, J. Currey, F. McSherry, and K. Achan. Some sample programs written in DryadLINQ. Technical Report MSR-TR-2008-74, Microsoft Research, May 2008.
103. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In NSDI, 2012.
104. H. Zaragoza, N. Craswell, M. Taylor, S. Saria, and S. Robertson. Microsoft Cambridge at TREC-13: Web and HARD tracks. In Proc. of the 13th Text Retrieval Conference, 2004.

Published in

SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
November 2013, 498 pages
ISBN: 9781450323888
DOI: 10.1145/2517349

Copyright © 2013 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 131 of 716 submissions, 18%
