DOI: 10.1145/1183401.1183440
Article

Feedback-directed memory disambiguation through store distance analysis

Published: 28 June 2006

Abstract

Feedback-directed optimization has become an increasingly important tool in the design of optimizing compilers. Memory distance analysis, which is based on profiling, has shown much promise in predicting data locality and memory dependences, and has been used in locality-based optimizations and memory disambiguation. In this paper, we apply a form of memory distance, called store distance, to the problem of memory disambiguation in out-of-order issue processors. Store distance is defined as the number of store references between a load and the previous store accessing the same memory location. By generating a representative store distance for each load instruction, we can apply a cooperative compiler/micro-architecture scheme to direct run-time load speculation. Using its store distance annotation, the processor can, in most cases, accurately determine the specific store instruction on which a load depends. Our experiments show that the proposed store distance method performs much better than the previous distance-based memory disambiguation scheme, and yields performance very close to that of perfect memory disambiguation. The store-distance-based scheme also outperforms the store set technique when predictor space is relatively small, and achieves performance comparable to that of a 16K-entry store set implementation for both floating-point and integer programs.
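To make the definition of store distance concrete, the following is a minimal sketch of how it could be measured from a memory trace. It is not the paper's profiler: the trace format (tuples of instruction address, "S" or "L", and memory address), the function names, and the choice of the most frequent distance as the "representative" value for each load are all assumptions made for illustration, and access sizes are ignored.

```python
from collections import Counter, defaultdict


def store_distances(trace):
    """Yield (load_pc, distance) for every load that has a prior store
    to the same address. Distance = number of stores executed between
    that prior store and the load (0 means the immediately preceding store).
    """
    stores_seen = 0          # running count of dynamic stores
    last_store_seq = {}      # address -> sequence number of its latest store
    for pc, kind, addr in trace:
        if kind == "S":
            stores_seen += 1
            last_store_seq[addr] = stores_seen
        elif kind == "L" and addr in last_store_seq:
            yield pc, stores_seen - last_store_seq[addr]


def representative_distances(trace):
    """Assumed policy: pick the most frequent distance per static load."""
    per_load = defaultdict(Counter)
    for pc, dist in store_distances(trace):
        per_load[pc][dist] += 1
    return {pc: hist.most_common(1)[0][0] for pc, hist in per_load.items()}


# Hypothetical trace: (instruction address, kind, memory address)
trace = [
    (0x10, "S", 0xA0),
    (0x14, "S", 0xB0),
    (0x18, "L", 0xA0),  # one intervening store (to 0xB0): distance 1
    (0x1C, "L", 0xB0),  # no intervening store: distance 0
    (0x20, "L", 0xC0),  # never stored to: no distance reported
]

observed = list(store_distances(trace))
annotations = representative_distances(trace)
```

In this sketch, a load annotated with distance d would be taken to depend on the store that lies d stores before it in program order; how the hardware consumes the annotation is described in the paper itself, not here.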


Published In

ICS '06: Proceedings of the 20th annual international conference on Supercomputing
June 2006
385 pages
ISBN:1595932828
DOI:10.1145/1183401


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. memory disambiguation
  2. store distance

Qualifiers

  • Article

Conference

ICS06: International Conference on Supercomputing 2006
June 28 - July 1, 2006
Cairns, Queensland, Australia

Acceptance Rates

ICS '06 Paper Acceptance Rate: 37 of 141 submissions, 26%
Overall Acceptance Rate: 629 of 2,180 submissions, 29%
