Abstract
As the effective limits of clock frequency and instruction-level parallelism have been reached, microprocessor vendors have shifted strategy: each generation now increases the number of processing cores on a single chip. The implicit expectation is that software developers will write their applications with concurrency in mind to take advantage of this sudden change in direction. In this study we analyze whether developers of laptop/desktop software have followed recent hardware trends by creating applications designed for chip multiprocessing. We examine a wide range of applications on Microsoft Windows 7 and Apple's OS X Snow Leopard, measuring Thread-Level Parallelism (TLP) on a high-performance workstation and a low-power desktop. In addition, we explore graphics processing units (GPUs) and their impact on chip multiprocessing. We compare our findings to a study done 10 years ago, which concluded that a second core was sufficient to improve system responsiveness. Our results on today's machines show that, 10 years later, 2-3 cores are surprisingly still more than adequate for most applications and that the GPU often remains under-utilized. However, in some application-specific domains an 8-core SMT system with a 240-core GPU can be utilized effectively. Overall, these studies suggest that many-core architectures are not a natural fit for current desktop/laptop applications.
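The TLP metric referenced here follows the definition used in the decade-earlier study by Flautner et al.: the average number of threads running concurrently, with fully idle time excluded. A minimal sketch of that computation (function name and input layout are illustrative, not from the paper):

```python
def tlp(c):
    """Thread-Level Parallelism in the sense of Flautner et al. (2000).

    c[i] is the fraction of wall-clock time during which exactly i
    threads are running; c[0] is the fully idle fraction.  TLP is the
    average concurrency over the non-idle (busy) time only.
    """
    assert abs(sum(c) - 1.0) < 1e-9, "time fractions must sum to 1"
    busy = 1.0 - c[0]          # exclude fully idle time
    if busy == 0.0:
        return 0.0             # machine never ran a thread
    # Weighted average: sum over i of (i threads) * (time fraction c[i])
    return sum(i * ci for i, ci in enumerate(c)) / busy

# Example: idle 40% of the time, one thread running 45% of the
# time, two threads 15% of the time; mathematically 0.75 / 0.6 = 1.25.
print(tlp([0.40, 0.45, 0.15]))
```

A TLP of 1.25 on such a trace means that even during busy periods the machine rarely keeps a second core occupied, which is the kind of evidence behind the "2-3 cores are more than adequate" conclusion above.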
Evolution of thread-level parallelism in desktop applications
ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture