Low-power inter-core communication through cache partitioning in embedded multiprocessors

Authors:

Xiangrong Zhou,

Peter Petrov

SBCCI '09: Proceedings of the 22nd Annual Symposium on Integrated Circuits and System Design: Chip on the Dunes

Article No.: 5, Pages 1 - 6

https://doi.org/10.1145/1601896.1601903

Published: 31 August 2009

Abstract

We present an application-driven customization methodology for energy-efficient inter-core communication in embedded multiprocessors. The methodology leverages configurable cache architectures and integrates software and hardware support to achieve energy-efficient data sharing between producer and consumer tasks. The technique is especially beneficial for data-streaming applications that exploit pipeline parallelism, where computational phases are mapped to separate processor cores. Application-driven data cache partitioning achieves low-power, low-latency inter-core data sharing with no coherence misses. The basic premise is to use cache partitioning to separate the private data from the several shared data buffers used by each producer/consumer task. Such partitioning yields two benefits: 1) data cache accesses caused by the processor and by the coherence mechanism need to access only one cache partition instead of the entire cache structure, resulting in significant power reductions; 2) interference (caused by both processor and coherence activity) between private data and the several shared data buffers is eliminated, which in turn enables the efficient implementation of application-driven remote cache updates at synchronization boundaries.
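The two benefits named in the abstract can be illustrated with a toy model. The sketch below is not the authors' design: it assumes way-based partitioning (the abstract does not say how partitions are formed), uses the number of ways probed per lookup as a rough stand-in for dynamic access energy, and invents the names `PartitionedCache`, `buf_in`, `buf_out`, and the addresses purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionedCache:
    """Toy set-associative cache whose ways are split into named partitions."""
    num_sets: int
    line_bytes: int
    partitions: dict                            # name -> list of way indices (hypothetical layout)
    tags: dict = field(default_factory=dict)    # (set index, way) -> stored tag
    ways_probed: int = 0                        # crude proxy for lookup energy

    def access(self, addr: int, partition: str) -> bool:
        """Look up addr, probing only the ways owned by `partition`."""
        line = addr // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        ways = self.partitions[partition]
        self.ways_probed += len(ways)
        for way in ways:
            if self.tags.get((index, way)) == tag:
                return True                     # hit inside the partition
        # Miss: fill a way that belongs to this partition only, so data in
        # other partitions is never evicted (simplistic replacement policy).
        victim = ways[tag % len(ways)]
        self.tags[(index, victim)] = tag
        return False


def demo():
    """Compare ways probed with partitioning against a plain 4-way lookup."""
    layout = {"private": [0, 1], "buf_in": [2], "buf_out": [3]}
    cache = PartitionedCache(num_sets=64, line_bytes=32, partitions=layout)
    for i in range(100):
        cache.access(0x1000 + 4 * i, "buf_in")    # consume from input buffer
        cache.access(0x8000 + 4 * i, "private")   # touch task-private data
        cache.access(0x2000 + 4 * i, "buf_out")   # produce to output buffer
    unpartitioned = 300 * 4                       # every access probes all 4 ways
    return cache.ways_probed, unpartitioned
```

In this model each lookup touches only the ways of one partition, and a miss on private data can only replace private lines, so a shared-buffer line stays resident regardless of private-data traffic. Real energy figures depend on the cache organization and technology, which this sketch does not attempt to capture.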

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory

Published In

SBCCI '09: Proceedings of the 22nd Annual Symposium on Integrated Circuits and System Design: Chip on the Dunes

August 2009

325 pages

ISBN:9781605587059

DOI:10.1145/1601896

General Chair: Ivan Saraiva (UFRN, Brazil)

Program Chairs: Renato Perez Ribas (UFRGS, Brazil), Calvin Plett (Carleton University, Canada)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SBC: Brazilian Computer Society
SIGDA: ACM Special Interest Group on Design Automation
SBMICRO: Brazilian Microelectronics Society
IEEE Circuits & Systems Society

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SBCCI '09: 22nd Symposium on Integrated Circuits and System Design

August 31 - September 3, 2009

Natal, Brazil

Acceptance Rates

SBCCI '09 Paper Acceptance Rate 50 of 119 submissions, 42%;

Overall Acceptance Rate 133 of 347 submissions, 38%
