Abstract
As semiconductor technology advances into the deep-submicron regime, leakage energy is of increasing concern, especially for large on-chip array structures such as caches and branch predictors. Recent work has suggested that larger, more aggressive branch predictors can and should be used to improve microprocessor performance. A further consideration is that more aggressive branch predictors, especially multiported predictors for multiple branch prediction, may be thermal hot spots, thus further increasing leakage. Moreover, because the branch predictor holds state that is transient and predictive, elements can be discarded without adverse effect. For these reasons, it is natural to consider applying decay techniques, already shown to reduce leakage energy for caches, to branch-prediction structures.

Because caches and branch predictors differ structurally, applying decay techniques to branch predictors is not straightforward. This paper explores strategies for exploiting spatial and temporal locality to make decay effective for bimodal, gshare, and hybrid predictors, as well as the branch target buffer (BTB). Furthermore, the predictive behavior of branch predictors steers them towards decay based not on state-preserving, static storage cells, but rather on quasi-static, dynamic storage cells. This paper examines the results of implementing decaying branch-predictor structures with dynamic (appropriately, decaying) cells rather than the standard static SRAM cell.

Overall, this paper demonstrates that decay techniques can apply to more than just caches, with the branch predictor and BTB as an example. We show that decay can be implemented either at the architectural level or through a wholesale replacement of static storage cells with quasi-static storage cells, which naturally implement decay. More importantly, decay techniques can and should be applied to other such transient and/or predictive structures.
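As an illustration of architectural-level decay, the following Python sketch models a bimodal predictor whose rows of 2-bit counters are switched off after sitting idle for a fixed number of decay intervals. It is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: the class name `DecayingBimodal`, the table geometry, the row-granularity idle counters, the `tick` interface, and the choice to reset a reactivated row to weakly taken are all hypothetical.

```python
WEAKLY_TAKEN = 2  # 2-bit counters hold 0..3; predict taken when the value is >= 2


class DecayingBimodal:
    """Bimodal predictor whose counter rows are turned off after being idle.

    All parameters below are illustrative assumptions, not values from the paper.
    """

    def __init__(self, num_rows=64, counters_per_row=16, decay_intervals=4):
        self.num_rows = num_rows
        self.counters_per_row = counters_per_row
        self.decay_intervals = decay_intervals  # idle intervals before a row is shut off
        self.table = [[WEAKLY_TAKEN] * counters_per_row for _ in range(num_rows)]
        self.idle = [0] * num_rows        # intervals since each row was last accessed
        self.active = [True] * num_rows   # False means the row is powered down

    def _locate(self, pc):
        # Drop the two low PC bits, then split the table index into (row, column).
        index = (pc >> 2) % (self.num_rows * self.counters_per_row)
        return divmod(index, self.counters_per_row)

    def _touch(self, row):
        if not self.active[row]:
            # The row lost its state while off; restart it from the default value.
            self.table[row] = [WEAKLY_TAKEN] * self.counters_per_row
            self.active[row] = True
        self.idle[row] = 0

    def predict(self, pc):
        row, col = self._locate(pc)
        self._touch(row)
        return self.table[row][col] >= 2

    def update(self, pc, taken):
        row, col = self._locate(pc)
        self._touch(row)
        value = self.table[row][col]
        self.table[row][col] = min(3, value + 1) if taken else max(0, value - 1)

    def tick(self):
        """Advance one decay interval and deactivate rows idle for too long."""
        for row in range(self.num_rows):
            if self.active[row]:
                self.idle[row] += 1
                if self.idle[row] >= self.decay_intervals:
                    self.active[row] = False

    def active_fraction(self):
        # Fraction of rows still powered: a rough proxy for remaining leakage.
        return sum(self.active) / self.num_rows
```

Because the stored state is only predictive, a deactivated row costs at most some mispredictions while its counters retrain; program correctness is unaffected. Tracking idleness per row rather than per counter is one way to exploit the spatial locality mentioned above while keeping the bookkeeping overhead small.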
Index Terms
- Implementing branch-predictor decay using quasi-static memory cells