Implementing branch-predictor decay using quasi-static memory cells

Published: 01 June 2004

Abstract

With semiconductor technology advancing toward deep submicron, leakage energy is of increasing concern, especially for large on-chip array structures such as caches and branch predictors. Recent work has suggested that larger, aggressive branch predictors can and should be used in order to improve microprocessor performance. A further consideration is that more aggressive branch predictors, especially multiported predictors for multiple branch prediction, may be thermal hot spots, thus further increasing leakage. Moreover, as the branch predictor holds state that is transient and predictive, elements can be discarded without adverse effect. For these reasons, it is natural to consider applying decay techniques, already shown to reduce leakage energy for caches, to branch-prediction structures.

Due to the structural difference between caches and branch predictors, applying decay techniques to branch predictors is not straightforward. This paper explores the strategies for exploiting spatial and temporal locality to make decay effective for bimodal, gshare, and hybrid predictors, as well as the branch target buffer (BTB). Furthermore, the predictive behavior of branch predictors steers them towards decay based not on state-preserving, static storage cells, but rather quasi-static, dynamic storage cells. This paper will examine the results of implementing decaying branch-predictor structures with dynamic (appropriately, decaying) cells rather than the standard static SRAM cell.

Overall, this paper demonstrates that decay techniques can apply to more than just caches, with the branch predictor and BTB as an example. We show decay can either be implemented at the architectural level, or with a wholesale replacement of static storage cells with quasi-static storage cells, which naturally implement decay. More importantly, decay techniques can be applied and should be applied to other such transient and/or predictive structures.
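
To make the idea of architectural-level decay concrete, the following is a minimal sketch of a bimodal predictor whose rows are switched off after a period without accesses. It is an illustration under stated assumptions, not the paper's implementation: the class name `DecayingBimodal`, the row size, the decay interval, the reset value, and the timestamp-based bookkeeping are all hypothetical choices made here. Real hardware would more plausibly use small per-row counters driven by a global tick, and the quasi-static cells described in the abstract would lose state on their own without any explicit counter.

```python
from dataclasses import dataclass, field
from typing import List

WEAKLY_TAKEN = 2  # assumed reset value for a 2-bit saturating counter


@dataclass
class DecayingBimodal:
    """Toy bimodal predictor with row-granularity decay (illustrative only)."""
    n_entries: int = 1024       # total 2-bit counters (assumed)
    row_size: int = 32          # counters per physical row (assumed)
    decay_interval: int = 4096  # idle cycles before a row is turned off (assumed)
    counters: List[int] = field(init=False)
    last_access: List[int] = field(init=False)
    active: List[bool] = field(init=False)

    def __post_init__(self) -> None:
        n_rows = self.n_entries // self.row_size
        self.counters = [WEAKLY_TAKEN] * self.n_entries
        self.last_access = [0] * n_rows
        self.active = [False] * n_rows  # every row starts powered off

    def _touch(self, pc: int, now: int) -> int:
        """Return the counter index for pc, waking or resetting its row."""
        idx = (pc >> 2) % self.n_entries
        row = idx // self.row_size
        # Decay is modelled lazily: the idle check happens on the next access.
        if self.active[row] and now - self.last_access[row] >= self.decay_interval:
            self.active[row] = False
        if not self.active[row]:
            # A deactivated row has lost its state, so it restarts from the default.
            start = row * self.row_size
            self.counters[start:start + self.row_size] = [WEAKLY_TAKEN] * self.row_size
            self.active[row] = True
        self.last_access[row] = now
        return idx

    def predict(self, pc: int, now: int) -> bool:
        return self.counters[self._touch(pc, now)] >= 2

    def update(self, pc: int, taken: bool, now: int) -> None:
        idx = self._touch(pc, now)
        step = 1 if taken else -1
        self.counters[idx] = min(3, max(0, self.counters[idx] + step))

    def live_rows(self, now: int) -> int:
        """Rows still powered at time `now`; a rough proxy for leakage."""
        return sum(
            1 for row, on in enumerate(self.active)
            if on and now - self.last_access[row] < self.decay_interval
        )
```

Because predictor state is only a hint, a decayed row costs at most some extra mispredictions when it is reawakened, never incorrect execution. In this sketch, a branch trained toward not-taken falls back to the default prediction if its row sits idle for longer than `decay_interval`.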

    • Published in

      ACM Transactions on Architecture and Code Optimization, Volume 1, Issue 2
      June 2004
      119 pages
      ISSN:1544-3566
      EISSN:1544-3973
      DOI:10.1145/1011528
      Copyright © 2004 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
