Can traditional programming bridge the ninja performance gap for parallel computing applications?

Published: 23 April 2015

References

  1. Arora, N., Shringarpure, A., Vuduc, R.W. Direct N-body kernels for multicore platforms. In ICPP (2009), 379--387.
  2. Asanovic, K., Bodik, R., Catanzaro, B., Gebis, J., Husbands, P., Keutzer, K., Patterson, D.A., Plishker, W.L., Shalf, J., et al. The Landscape of Parallel Computing Research: A View from Berkeley. Technical Report UCB/EECS-183, 2006.
  3. Bienia, C., Kumar, S., Singh, J.P., Li, K. The PARSEC benchmark suite: Characterization and architectural implications. In PACT (2008), 72--81.
  4. Brace, A., Gatarek, D., Musiela, M. The market model of interest rate dynamics. Mathematical Finance 7, 2 (1997), 127--155.
  5. Chen, Y.K., Chhugani, J., et al. Convergence of recognition, mining and synthesis workloads and its implications. IEEE 96, 5 (2008), 790--807.
  6. Chhugani, J., Nguyen, A.D., et al. Efficient implementation of sorting on multi-core SIMD CPU architecture. PVLDB 1, 2 (2008), 1313--1324.
  7. Dally, W.J. The end of denial architecture and the rise of throughput computing. In Keynote Speech at Design Automation Conference (2010).
  8. Datta, K. Auto-tuning Stencil Codes for Cache-based Multicore Platforms. PhD thesis, EECS Department, University of California, Berkeley (Dec 2009).
  9. Fowler, M. Domain Specific Languages, 1st edn. Addison-Wesley Professional, Boston, MA, 2010.
  10. Giles, M.B. Monte Carlo Evaluation of Sensitivities in Computational Finance. Technical report. Oxford University Computing Laboratory, 2007.
  11. Intel. A quick, easy and reliable way to improve threaded performance, 2010. software.intel.com/articles/intel-cilk-plus.
  12. Ismail, L., Guerchi, D. Performance evaluation of convolution on the cell broadband engine processor. IEEE PDS 22, 2 (2011), 337--351.
  13. Kachelrieß, M., Knaup, M., Bockenbach, O. Hyperfast perspective cone-beam backprojection. IEEE Nuclear Science 3 (2006), 1679--1683.
  14. Kim, C., Chhugani, J., Satish, N., et al. FAST: Fast architecture sensitive tree search on modern CPUs and GPUs. In SIGMOD (2010), 339--350.
  15. Lee, V.W., Kim, C., Chhugani, J., Deisher, M., Kim, D., Nguyen, A.D., Satish, N., et al. Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU. In ISCA (2010), 451--460.
  16. Mudge, T.N. Power: A first-class architectural design constraint. IEEE Computer 34, 4 (2001), 52--58.
  17. Nguyen, A., Satish, N., et al. 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In SC10 (2010), 1--13.
  18. Nuzman, D., Henderson, R. Multi-platform auto-vectorization. In CGO (2006), 281--294.
  19. Nvidia. CUDA C Best Practices Guide 3, 2 (2010).
  20. Podlozhnyuk, V. Black--Scholes option pricing. Nvidia, 2007. http://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/BlackScholes/doc/BlackScholes.pdf.
  21. Ryoo, S., Rodrigues, C.I., Baghsorkhi, S.S., Stone, S.S., Kirk, D.B., Hwu, W.M.W. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP (2008), 73--82.
  22. Satish, N., Kim, C., Chhugani, J., et al. Fast sort on CPUs and GPUs: A case for bandwidth oblivious SIMD sort. In SIGMOD (2010), 351--362.
  23. Satish, N., Kim, C., Chhugani, J., Saito, H., Krishnaiyer, R., Smelyanskiy, M., et al. Can traditional programming bridge the Ninja performance gap for parallel computing applications? In ISCA (2012), 440--451.
  24. Smelyanskiy, M., Holmes, D., et al. Mapping high-fidelity volume rendering to CPU, GPU and many-core. IEEE TVCG 15, 6 (2009), 1563--1570.
  25. Sukop, M.C., Thorne, D.T., Jr. Lattice Boltzmann Modeling: An Introduction for Geoscientists and Engineers, 2006.
  26. Tian, X., Saito, H., Girkar, M., Preis, S., Kozhukhov, S., Cherkasov, A.G., Nelson, C., Panchenko, N., Geva, R. Compiling C/C++ SIMD extensions for function and loop vectorization on multicore-SIMD processors. In IPDPS Workshops (Springer, NY, 2012), 2349--2358.

      • Published in

        Communications of the ACM, Volume 58, Issue 5
        May 2015
        80 pages
        ISSN: 0001-0782
        EISSN: 1557-7317
        DOI: 10.1145/2766485
        Editor: Moshe Y. Vardi

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed
