DOI: 10.1145/605397.605401
Article

Temporally silent stores

Published: 01 October 2002

ABSTRACT

Recent work has shown that silent stores (stores which write a value matching the one already stored at the memory location) occur quite frequently and can be exploited to reduce memory traffic and improve performance. This paper extends the definition of silent stores to encompass sets of stores that change the value stored at a memory location, but only temporarily, and subsequently return a previous value of interest to the memory location. The stores that cause the value to revert are called temporally silent stores. We redefine multiprocessor sharing to account for temporal silence and show that in the limit, up to 45% of communication misses in scientific and commercial applications can be eliminated by exploiting values that change only temporarily. We describe a practical mechanism that detects temporally silent stores and removes the coherence traffic they cause in conventional multiprocessors. We find that up to 42% of communication misses can be eliminated with a simple extension to the MESI protocol. Further, we examine application and operating system code to provide insight into the temporal silence phenomenon, and we characterize temporal silence by examining value frequencies and dynamic instruction distances between temporally silent pairs. These studies indicate that the operating system is heavily involved in temporal silence in both commercial and scientific workloads, and that while detectable synchronization primitives contribute substantially, significant opportunity exists outside these references.
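To make the distinction between silent and temporally silent stores concrete, the following is a minimal sketch that classifies a sequence of stores to a single memory location. It assumes a simplified notion of the "previous value of interest": the value the location held just before the first value-changing store of the current episode. The paper's own definition (tied to multiprocessor sharing) may differ, and the function name `classify_stores` and the `StoreClass` record are hypothetical illustrations, not the paper's detection mechanism or its MESI extension.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreClass:
    index: int   # position of the store in the trace
    value: int   # value written by the store
    kind: str    # "silent", "temporally_silent", or "nonsilent"


def classify_stores(initial: int, stores: List[int]) -> List[StoreClass]:
    """Classify stores to one location (hypothetical, simplified model).

    Assumption: the "previous value of interest" is the value held before the
    first value-changing store of the current episode. A store that restores
    that value is temporally silent and ends the episode.
    """
    current = initial
    value_of_interest: Optional[int] = None  # set when the location first changes
    result = []
    for i, v in enumerate(stores):
        if v == current:
            # Conventional silent store: rewrites the value already present.
            kind = "silent"
        elif value_of_interest is not None and v == value_of_interest:
            # Location reverts to the remembered value: temporally silent.
            kind = "temporally_silent"
            value_of_interest = None
        else:
            kind = "nonsilent"
            if value_of_interest is None:
                # Remember the value held before this episode's first change.
                value_of_interest = current
        current = v
        result.append(StoreClass(i, v, kind))
    return result


# A lock word: acquire (0 -> 1), redundant store of 1, release (1 -> 0), redundant 0.
kinds = [s.kind for s in classify_stores(0, [1, 1, 0, 0])]
print(kinds)  # ['nonsilent', 'silent', 'temporally_silent', 'silent']
```

The lock example mirrors the observation in the abstract that synchronization primitives are a major source of temporal silence: the release store restores the value other processors last observed, so under this simplified model it is temporally silent rather than a genuine change.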

      • Published in

        ACM Conferences
        ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
        October 2002
        318 pages
        ISBN:1581135742
        DOI:10.1145/605397
        • ACM SIGARCH Computer Architecture News, Volume 30, Issue 5
          Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating Systems
          December 2002
          296 pages
          ISSN:0163-5964
          DOI:10.1145/635506
        • ACM SIGOPS Operating Systems Review, Volume 36, Issue 5
          December 2002
          296 pages
          ISSN:0163-5980
          DOI:10.1145/635508
        • ACM SIGPLAN Notices, Volume 37, Issue 10
          October 2002
          296 pages
          ISSN:0362-1340
          EISSN:1558-1160
          DOI:10.1145/605432

        Copyright © 2002 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 October 2002


        Qualifiers

        • Article

        Acceptance Rates

        ASPLOS X Paper Acceptance Rate: 24 of 175 submissions, 14%. Overall Acceptance Rate: 535 of 2,713 submissions, 20%.
