research-article

Evolution of thread-level parallelism in desktop applications

Published: 19 June 2010

Abstract

As the effective limits of clock frequency and instruction-level parallelism have been reached, microprocessor vendors have shifted strategy to increasing the number of processing cores on a single chip with each generation. The implicit expectation is that software developers will write their applications with concurrency in mind to take advantage of this sudden change in direction. In this study we analyze whether software developers for laptop/desktop machines have followed the recent hardware trends by creating software for chip multi-processing. We conduct a study of a wide range of applications on Microsoft Windows 7 and Apple's OS X Snow Leopard, measuring thread-level parallelism (TLP) on a high-performance workstation and a low-power desktop. In addition, we explore graphics processing units (GPUs) and their impact on chip multi-processing. We compare our findings to a study done 10 years ago, which concluded that a second core was sufficient to improve system responsiveness. Our results on today's machines show that, 10 years later, 2-3 cores are, surprisingly, more than adequate for most applications and that the GPU often remains under-utilized. However, in some application-specific domains an 8-core SMT system with a 240-core GPU can be effectively utilized. Overall, these studies suggest that many-core architectures are not a natural fit for current desktop/laptop applications.
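The abstract does not define the thread-level parallelism (TLP) metric it measures. As a rough sketch, and assuming the definition used in the earlier study by Flautner et al. (2000), TLP is the average number of concurrently running threads over the time in which at least one thread is running. The function name and the input layout below are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def thread_level_parallelism(concurrency_fractions: Sequence[float]) -> float:
    """Estimate TLP from a concurrency histogram.

    concurrency_fractions[i] is the fraction of wall-clock time during which
    exactly i threads were running (index 0 is idle time). This input layout
    is an assumption made for illustration.

    Assumed definition: TLP = sum(i * c_i for i >= 1) / (1 - c_0),
    i.e. idle time is factored out of the average.
    """
    if not concurrency_fractions:
        raise ValueError("need at least the idle fraction c_0")

    idle = concurrency_fractions[0]
    if idle >= 1.0:
        # The machine was never busy, so there is no parallelism to report.
        return 0.0

    # Index 0 contributes nothing to the weighted sum.
    weighted = sum(i * c for i, c in enumerate(concurrency_fractions))
    return weighted / (1.0 - idle)


if __name__ == "__main__":
    # Idle half the time, one thread a quarter, two threads a quarter.
    print(thread_level_parallelism([0.5, 0.25, 0.25]))
```

Under this definition, a value near 1 means the application rarely runs more than one thread at a time, however many cores are available.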


Published in

ACM SIGARCH Computer Architecture News, Volume 38, Issue 3 (ISCA '10), June 2010, 508 pages. ISSN: 0163-5964. DOI: 10.1145/1816038

ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture, June 2010, 520 pages. ISBN: 9781450300537. DOI: 10.1145/1815961

Copyright © 2010 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
