research-article

Predictor virtualization

Published: 01 March 2008

Abstract

Many hardware optimizations rely on collecting information about program behavior at runtime and storing it in lookup tables. To be accurate and effective, these optimizations usually require large dedicated on-chip tables. Although technology advances offer more on-chip resources, those resources are allocated to enlarging conventional on-chip cache hierarchies.
This work proposes Predictor Virtualization, a technique that uses the existing memory hierarchy to emulate large predictor tables. We demonstrate the benefits of this technique by virtualizing a state-of-the-art data prefetcher. Full-system, cycle-accurate simulations demonstrate that the virtualized prefetcher preserves the performance benefits of the original design while reducing the on-chip storage dedicated to the predictor table from 60 KB to less than 1 KB.
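
The idea of emulating a large predictor table with the existing memory hierarchy can be illustrated with a toy software model. The sketch below is not the paper's hardware design: the class name, the LRU replacement policy, the dictionary standing in for the memory-resident table, and the string "patterns" are all assumptions made for illustration. It only shows the general shape of the technique, in which a small dedicated structure holds recently used entries and the full table lives in ordinary memory.

```python
from collections import OrderedDict


class VirtualizedPredictorTable:
    """Toy model: a small dedicated buffer in front of a large backing table.

    All names and policies here are hypothetical, chosen for illustration only.
    """

    def __init__(self, on_chip_entries):
        self.on_chip_entries = on_chip_entries
        self.on_chip = OrderedDict()  # small dedicated storage, kept in LRU order
        self.backing = {}             # stands in for the table held in the memory hierarchy
        self.hits = 0
        self.misses = 0

    def _install(self, key, value):
        # Place the entry on chip and spill the least recently used one if full.
        self.on_chip[key] = value
        self.on_chip.move_to_end(key)
        if len(self.on_chip) > self.on_chip_entries:
            old_key, old_value = self.on_chip.popitem(last=False)
            self.backing[old_key] = old_value

    def lookup(self, key):
        # Fast path: the entry is already in the small dedicated buffer.
        if key in self.on_chip:
            self.hits += 1
            self.on_chip.move_to_end(key)
            return self.on_chip[key]
        # Slow path: fetch the entry from the backing table, if it exists.
        self.misses += 1
        if key in self.backing:
            value = self.backing[key]
            self._install(key, value)
            return value
        return None

    def update(self, key, value):
        # New or refreshed predictor state always lands on chip first.
        self._install(key, value)


table = VirtualizedPredictorTable(on_chip_entries=2)
for pc in (0x10, 0x20, 0x30):
    table.update(pc, f"pattern@{pc:#x}")

print(table.lookup(0x10))                            # entry was spilled, then refetched
print(table.hits, table.misses, len(table.on_chip))  # dedicated storage stays at 2 entries
```

In this model the dedicated storage never grows beyond `on_chip_entries`, while the number of entries the predictor can remember is bounded only by the backing table. The price is the slower lookup path on a miss in the small buffer.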


Published In

ACM SIGOPS Operating Systems Review, Volume 42, Issue 2
ASPLOS '08
March 2008
339 pages
ISSN: 0163-5980
DOI: 10.1145/1353535

ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
March 2008
352 pages
ISBN: 9781595939586
DOI: 10.1145/1346281

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. caches
  2. memory hierarchy
  3. metadata
  4. predictor virtualization
