DOI: 10.1145/1601896.1601903
research-article

Low-power inter-core communication through cache partitioning in embedded multiprocessors

Published: 31 August 2009

ABSTRACT

We present an application-driven customization methodology for energy-efficient inter-core communication in embedded multiprocessors. The methodology leverages configurable cache architectures and integrates software and hardware support to achieve energy-efficient data sharing between producer and consumer tasks. The technique is especially beneficial for data-streaming applications that exploit pipeline parallelism, where computational phases are mapped to separate processor cores. Application-driven data cache partitioning achieves low-power, low-latency (no coherence misses) inter-core data sharing. The basic premise of the proposed technique is to use cache partitioning to separate each producer/consumer task's private data from the several shared data buffers it uses. This partitioning yields two benefits: 1) data cache accesses issued by the processor and by the coherence mechanism touch only a single cache partition instead of the entire cache structure, which significantly reduces power; 2) interference (caused by both processor and coherence activity) between private data and the several shared data buffers is eliminated, which in turn enables the efficient implementation of application-driven remote cache updates at synchronization boundaries.
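The two benefits named in the abstract can be illustrated with a toy model. The sketch below is not the paper's design: it assumes way-based partitioning selected by address range, LRU replacement inside each partition, and uses "ways probed per access" as a crude stand-in for access energy. All names (`Partition`, `PartitionedCache`, `demo`) and all sizes and addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """An address range bound to a fixed number of cache ways (assumed scheme)."""
    name: str
    base: int   # first byte address mapped to this partition
    size: int   # bytes of address space mapped to this partition
    ways: int   # cache ways reserved for this partition

    def covers(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


class PartitionedCache:
    """Toy set-associative cache whose ways are split among partitions."""

    def __init__(self, partitions, num_sets=64, line_size=32):
        self.partitions = list(partitions)
        self.num_sets = num_sets
        self.line_size = line_size
        self.total_ways = sum(p.ways for p in self.partitions)
        # Per partition, per set: list of resident tags, least recent first.
        self.lines = {p.name: [[] for _ in range(num_sets)]
                      for p in self.partitions}
        self.ways_probed = 0       # energy proxy with partitioning
        self.ways_probed_full = 0  # same accesses if every way were probed

    def access(self, addr: int) -> bool:
        """Look up addr in its own partition only; return True on a hit."""
        part = next((p for p in self.partitions if p.covers(addr)), None)
        if part is None:
            raise ValueError(f"address {addr:#x} is not mapped to a partition")
        block = addr // self.line_size
        index = block % self.num_sets
        tag = block // self.num_sets
        resident = self.lines[part.name][index]
        self.ways_probed += part.ways
        self.ways_probed_full += self.total_ways
        hit = tag in resident
        if hit:
            resident.remove(tag)   # refresh LRU position
        elif len(resident) == part.ways:
            resident.pop(0)        # evict within this partition only
        resident.append(tag)
        return hit


def demo():
    """Fill a shared buffer, stream private data, then re-read the buffer."""
    cache = PartitionedCache([
        Partition("private", base=0x00000, size=0x10000, ways=2),
        Partition("shared_buf", base=0x10000, size=0x800, ways=1),
    ])
    buf = range(0x10000, 0x10800, 32)
    for addr in buf:                  # producer fills the shared buffer
        cache.access(addr)
    for addr in range(0, 0x4000, 32):  # unrelated private traffic
        cache.access(addr)
    rehits = sum(cache.access(addr) for addr in buf)
    return rehits, cache.ways_probed, cache.ways_probed_full
```

In this toy run the private stream is larger than the private partition, yet every line of the shared buffer is still resident when it is read again, and each access probes only the ways of its own partition. Real hardware would also have to model coherence traffic and the remote updates at synchronization boundaries, which this sketch leaves out.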

Published in

SBCCI '09: Proceedings of the 22nd Annual Symposium on Integrated Circuits and System Design: Chip on the Dunes
August 2009
325 pages
ISBN: 9781605587059
DOI: 10.1145/1601896

Copyright © 2009 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SBCCI '09 paper acceptance rate: 50 of 119 submissions, 42%. Overall acceptance rate: 133 of 347 submissions, 38%.