Abstract
As the effective limits of clock frequency and instruction-level parallelism have been reached, microprocessor vendors have shifted strategy: each generation now increases the number of processing cores on a single chip. The implicit expectation is that software developers will write their applications with concurrency in mind to take advantage of this sudden change in direction. In this study we analyze whether developers of laptop/desktop software have followed recent hardware trends by creating applications designed for chip multiprocessing. We examine a wide range of applications on Microsoft Windows 7 and Apple's OS X Snow Leopard, measuring Thread-Level Parallelism (TLP) on a high-performance workstation and a low-power desktop. In addition, we explore graphics processing units (GPUs) and their impact on chip multiprocessing. We compare our findings to a study done 10 years ago, which concluded that a second core was sufficient to improve system responsiveness. Our results on today's machines show that, 10 years later, 2-3 cores are surprisingly still more than adequate for most applications and that the GPU often remains under-utilized. However, in some application-specific domains an 8-core SMT system with a 240-core GPU can be utilized effectively. Overall, these studies suggest that many-core architectures are not a natural fit for current desktop/laptop applications.
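The TLP metric referenced here follows the definition used in the decade-earlier study by Flautner et al.: the average number of threads running concurrently, with fully idle time excluded. A minimal sketch of that computation (function name and input layout are illustrative, not from the paper):

```python
def tlp(c):
    """Thread-Level Parallelism in the sense of Flautner et al. (2000).

    c[i] is the fraction of wall-clock time during which exactly i
    threads are running; c[0] is the fully idle fraction.  TLP is the
    average concurrency over the non-idle (busy) time only.
    """
    assert abs(sum(c) - 1.0) < 1e-9, "time fractions must sum to 1"
    busy = 1.0 - c[0]          # exclude fully idle time
    if busy == 0.0:
        return 0.0             # machine never ran a thread
    # Weighted average: sum over i of (i threads) * (time fraction c[i])
    return sum(i * ci for i, ci in enumerate(c)) / busy

# Example: idle 40% of the time, one thread running 45% of the
# time, two threads 15% of the time; mathematically 0.75 / 0.6 = 1.25.
print(tlp([0.40, 0.45, 0.15]))
```

A TLP of 1.25 on such a trace means that even during busy periods the machine rarely keeps a second core occupied, which is the kind of evidence behind the "2-3 cores are more than adequate" conclusion above.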
Evolution of thread-level parallelism in desktop applications
ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture