ABSTRACT
Interconnection networks have been deployed as the communication fabric in a wide spectrum of parallel computer systems, ranging from chip multiprocessors (CMPs) and embedded multicore systems-on-a-chip (SoCs) to clusters and server blades. Recent technology trends have permitted rapid growth of chip resources, faster clock rates, and wider communication bandwidths. However, these trends have also increased power consumption, which is becoming a key limiting factor in the design of such scalable interconnected systems. Power-aware networks, therefore, need to become inherent components of single- and multi-chip parallel systems. In the hardware arena, recent research on interconnection network power management has employed limited-scope techniques that mostly focus on reducing the power consumed by the network communication links. As these limited-scope techniques are not tailored to the applications running on the network, power savings and the corresponding impact on network latency vary significantly from one application to the next, as we demonstrate in this paper; in many cases, network performance can suffer severely. In the software arena, extensive research on compile-time optimizations has produced parallelizing compilers that can efficiently map an application onto hardware for high performance. However, research into power-aware parallelizing compilers is in its infancy. In this paper, we take the first steps toward tailoring applications' communication needs at run-time for low power. We propose software techniques that extend the flow of a parallelizing compiler in order to direct run-time network power reduction. We target network links, a significant power consumer in these systems, allowing dynamic voltage scaling (DVS) instructions extracted during static compilation to orchestrate link voltage and frequency transitions for power savings during application run-time. Concurrently, an online hardware mechanism measures network congestion levels and adapts these off-line DVS settings to maximize network performance. Our simulations over three existing parallel systems, ranging from very fine-grained single-chip to coarse-grained multi-chip architectures, show that link power consumption can be reduced by up to 76.3%, with a minor increase in latency ranging from 0.18% to 6.78% across a number of benchmark suites.
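The interplay between the off-line DVS settings and the online congestion-driven adaptation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `LinkLevel` values, the `adapt_level` function, and the congestion threshold are hypothetical and are not taken from the paper. The only general fact it relies on is that CMOS dynamic power scales roughly with V² · f.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkLevel:
    voltage: float    # volts (illustrative value, not from the paper)
    frequency: float  # GHz (illustrative value, not from the paper)


# Hypothetical DVS levels for one link, slowest first.
LEVELS = [
    LinkLevel(0.9, 0.25),
    LinkLevel(1.2, 0.5),
    LinkLevel(1.8, 1.0),
]


def relative_power(level, reference=LEVELS[-1]):
    """Dynamic power relative to the fastest level, using P ~ V^2 * f."""
    return (level.voltage ** 2 * level.frequency) / (
        reference.voltage ** 2 * reference.frequency
    )


def adapt_level(static_index, congestion, threshold=0.75):
    """Start from the compiler-chosen level (static_index) and raise it by
    one step when measured congestion (0.0 to 1.0) exceeds the threshold.
    This policy is an assumed stand-in for the online hardware mechanism."""
    if congestion > threshold:
        return min(static_index + 1, len(LEVELS) - 1)
    return static_index


if __name__ == "__main__":
    chosen = adapt_level(static_index=0, congestion=0.9)
    print(chosen, relative_power(LEVELS[chosen]))
```

In this sketch the compile-time setting is only a starting point: a congested link is moved to a faster level at the cost of higher power, while an uncongested link keeps the lower-power setting selected off-line.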