ABSTRACT
Applications on today's high-end systems typically make varying load demands over time. A single application may have many different phases during its lifetime, and workload mixes show interleaved phases. Memory-intensive work or phases may exhibit performance saturation at frequencies below the maximum possible for the processors because of the disparity between processor and memory speeds. Performance saturation is a sign of over-provisioning and leads to energy-inefficient systems. Computers using heterogeneous processors, with the same ISA but different implementation details, have been proposed as a way of reducing power while avoiding or limiting performance degradation. However, using heterogeneous processors effectively is complicated and requires intelligent scheduling. The research reported here explores the use of a heterogeneous system of processors with identical ISAs and implementation details, but with differing voltages and frequencies. The scheduler uses the execution characteristics of each application to predict its future processing needs and then schedules it on a processor that matches those needs, if one is available. The predictions are used to minimize the performance loss to the system as a whole rather than that of a single application. The result limits system power while minimizing total performance loss. A prototype implementation on a Power4 four-processor system is presented. The prototype scheduler is validated using both synthetic and real-world benchmarks. The prototype shows reasonable predictor accuracy and significant power savings for memory-bound applications.
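To make the scheduling idea concrete, the following is a minimal sketch in Python, not the prototype's actual predictor or policy. It assumes a simple two-part model in which only the CPU-bound fraction of a task's run time scales with frequency, and it assigns tasks to fixed-frequency processors so that the summed predicted loss is smallest. The names `predicted_loss`, `assign`, and `cpu_fraction`, and the example frequencies, are hypothetical.

```python
from itertools import permutations


def predicted_loss(cpu_fraction, freq, f_max):
    """Predicted fractional performance loss at freq relative to f_max.

    Assumed model (illustrative only): at f_max, a fraction cpu_fraction
    of the run time scales with 1/frequency; the remainder is memory
    stall time that does not change with processor frequency.
    """
    relative_time = cpu_fraction * (f_max / freq) + (1.0 - cpu_fraction)
    return 1.0 - 1.0 / relative_time


def assign(tasks, freqs):
    """Map each task to a distinct processor, minimizing total predicted loss.

    tasks: dict of task name -> cpu_fraction (0.0 = fully memory-bound).
    freqs: list of per-processor frequencies (any consistent unit).
    Returns (total_loss, {task name: processor index}), or None when
    there are more tasks than processors.
    Exhaustive search; adequate for a handful of processors.
    """
    f_max = max(freqs)
    names = list(tasks)
    best = None
    for procs in permutations(range(len(freqs)), len(names)):
        total = sum(
            predicted_loss(tasks[name], freqs[p], f_max)
            for name, p in zip(names, procs)
        )
        if best is None or total < best[0]:
            best = (total, dict(zip(names, procs)))
    return best


if __name__ == "__main__":
    # Hypothetical workload: one CPU-bound and one memory-bound task
    # on two processors running at different frequencies.
    workload = {"cpu_bound": 1.0, "mem_bound": 0.2}
    print(assign(workload, [1000, 500]))
```

In this sketch the memory-bound task lands on the slower processor because its predicted loss there is small, which mirrors the saturation argument in the abstract. The objective is the total loss across all tasks, not the loss of any single task.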
Index Terms
- Scheduling for heterogeneous processors in server systems