ABSTRACT
Recent work has shown that silent stores (stores that write a value matching the one already stored at the memory location) occur frequently and can be exploited to reduce memory traffic and improve performance. This paper extends the definition of silent stores to encompass sets of stores that change the value at a memory location only temporarily and subsequently restore a previous value of interest. The stores that cause the value to revert are called temporally silent stores. We redefine multiprocessor sharing to account for temporal silence and show that, in the limit, up to 45% of communication misses in scientific and commercial applications can be eliminated by exploiting values that change only temporarily. We describe a practical mechanism that detects temporally silent stores and removes the coherence traffic they cause in conventional multiprocessors, and find that a simple extension to the MESI protocol eliminates up to 42% of communication misses. Further, we examine application and operating system code to provide insight into the temporal silence phenomenon, and we characterize temporal silence by examining value frequencies and dynamic instruction distances between temporally silent pairs. These studies indicate that the operating system is heavily involved in temporal silence in both commercial and scientific workloads, and that although detectable synchronization primitives contribute substantially, significant opportunity exists outside these references.
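To make the distinction between silent and temporally silent stores concrete, the following is a minimal sketch, not the detection mechanism the paper describes. It assumes a hypothetical store trace of (address, value) pairs and takes the initial memory contents as the "previous value of interest" for each address; the function name `classify_stores`, the label strings, and the default of 0 for untouched memory are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def classify_stores(
    trace: Iterable[Tuple[int, int]],
    initial: Optional[Dict[int, int]] = None,
) -> List[str]:
    """Label each store in a single-processor trace.

    Assumption: the value of interest (the reference) for an address is
    its initial content and never changes during the trace. Addresses
    missing from `initial` are assumed to hold 0.
    """
    reference = dict(initial or {})  # value a remote observer last saw (assumed fixed)
    current = dict(reference)        # value currently held at each address
    labels = []
    for addr, value in trace:
        cur = current.get(addr, 0)
        ref = reference.get(addr, 0)
        if value == cur:
            # Writes the value already present: an ordinary silent store.
            labels.append("silent")
        elif value == ref:
            # Changes the current value, but only back to the reference:
            # the intermediate value was temporary.
            labels.append("temporally_silent")
        else:
            # Moves the location away from the reference value.
            labels.append("ordinary")
        current[addr] = value
    return labels


# Hypothetical lock word at address 0x10: acquire (1), redundant write, release (0).
labels = classify_stores([(0x10, 1), (0x10, 1), (0x10, 0), (0x10, 0)], {0x10: 0})
print(labels)
```

In this toy trace the release of the lock word is the temporally silent store, since it returns the location to the value it held before the acquire. A fuller model would also update the reference value whenever another processor observes the line, which this sketch deliberately omits.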
Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating Systems