ABSTRACT
In a multi-programmed computing environment, threads of execution exhibit different runtime characteristics and hardware resource requirements. Not only do the behaviors of distinct threads differ, but each thread may also vary in its performance and resource usage over time. A heterogeneous chip multiprocessor (CMP) architecture consists of processor cores and caches of varying size and complexity. Prior work has shown that heterogeneous CMPs can meet the needs of a multi-programmed computing environment better than a homogeneous CMP system. In particular, a combination of cores with different caches and instruction issue widths better accommodates threads with different computational requirements.

A central issue in the design and use of heterogeneous systems is determining an assignment of tasks to processors that better exploits the hardware resources in order to improve performance. In this paper we argue that the benefits of heterogeneous CMPs are bolstered by the use of a dynamic assignment policy, i.e., a runtime mechanism that observes the behavior of the running threads and exploits thread migration between the cores. We validate our analysis by means of simulation. Specifically, our model assumes a combination of Alpha EV5 and Alpha EV6 processors and of integer and floating-point programs from the SPEC2000 benchmark suite. We show that a dynamic assignment can outperform a static one by 20% to 40% on average and by as much as 80% in extreme cases, depending on the degree of multithreading simulated.
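The contrast between static and dynamic assignment can be illustrated with a toy model. The sketch below is not the paper's policy or simulator: it assumes one thread per core, that each thread's IPC on both core types is known for every interval (an oracle the abstract does not claim), and a fixed, hypothetical migration penalty. All names and numbers are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical (ipc_on_small_core, ipc_on_big_core) for one thread in one interval.
# "Small" and "big" loosely stand for EV5-like and EV6-like cores; values are made up.
Phase = Tuple[float, float]


def pick_big_core_threads(phases: Dict[str, Phase], n_big: int) -> Set[str]:
    """Give the big cores to the threads that gain the most IPC from them."""
    by_gain = sorted(phases, key=lambda t: phases[t][1] - phases[t][0], reverse=True)
    return set(by_gain[:n_big])


def run(trace: List[Dict[str, Phase]], n_big: int, dynamic: bool,
        migration_cost: float = 0.0) -> float:
    """Return the IPC summed over all threads and intervals.

    Static: the assignment chosen for the first interval is kept throughout.
    Dynamic: the assignment is recomputed every interval, and a fixed
    (assumed) penalty is charged for each thread newly moved to a big core.
    """
    on_big = pick_big_core_threads(trace[0], n_big)
    total = 0.0
    for interval in trace:
        if dynamic:
            new_on_big = pick_big_core_threads(interval, n_big)
            total -= migration_cost * len(new_on_big - on_big)
            on_big = new_on_big
        for thread, (small_ipc, big_ipc) in interval.items():
            total += big_ipc if thread in on_big else small_ipc
    return total


# Two threads whose phases swap: A benefits from the big core first, B later.
trace = [
    {"A": (0.5, 1.5), "B": (0.8, 1.0)},
    {"A": (0.9, 1.0), "B": (0.4, 1.6)},
]
static_total = run(trace, n_big=1, dynamic=False)
dynamic_total = run(trace, n_big=1, dynamic=True, migration_cost=0.1)
print(f"static={static_total:.2f} dynamic={dynamic_total:.2f}")
```

In this contrived trace the dynamic policy wins because the two threads change phase in opposite directions; with a large enough migration penalty or stable phases, the static assignment would do as well or better.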
Index Terms
- Dynamic thread assignment on heterogeneous multiprocessor architectures