ABSTRACT
Compiler-directed Computation Reuse (CCR) enhances program execution speed and efficiency by eliminating dynamic computation redundancy. In this approach, the compiler designates large program regions for potential reuse. At run time, the execution results of these reusable regions are recorded in hardware buffers for future reuse. Previous work shows that CCR can yield significant performance improvements in general applications. A major limitation of that work is that the compiler relies on value profiling to identify reusable regions, which makes the scheme difficult to deploy in many software production environments. This paper presents a new hardware model that alleviates the need for value profiling at compile time. The compiler may designate reusable regions that later prove to be inappropriate. The hardware mechanism monitors the dynamic behavior of compiler-designated regions and selectively activates the profitable ones at run time. Experimental results show that the proposed design uses hardware buffer resources more effectively, deploys computation regions rapidly, and improves reuse accuracy, all of which promote more flexible compiler methods of identifying reusable computation regions.
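The idea of recording region results and activating only the profitable regions can be illustrated with a small software model. The sketch below is not the hardware design from the paper. It assumes a hypothetical policy in which each region is observed for a fixed number of attempts and is deactivated if its reuse hit rate falls below a threshold. The names `ReuseBuffer`, `RegionState`, `window`, and `min_hit_rate` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RegionState:
    """Per-region bookkeeping (hypothetical fields, not taken from the paper)."""
    attempts: int = 0
    hits: int = 0
    active: bool = True
    memo: dict = field(default_factory=dict)  # input tuple -> recorded result


class ReuseBuffer:
    """Toy software model of a reuse buffer with dynamic region activation."""

    def __init__(self, window=8, min_hit_rate=0.5):
        self.window = window              # assumed observation period, in attempts
        self.min_hit_rate = min_hit_rate  # assumed profitability threshold
        self.regions = {}

    def execute(self, region_id, compute, inputs):
        """Run a compiler-designated region, reusing a recorded result if possible."""
        state = self.regions.setdefault(region_id, RegionState())
        if not state.active:
            # Region was judged unprofitable: bypass the buffer entirely.
            return compute(*inputs)

        state.attempts += 1
        if inputs in state.memo:
            state.hits += 1
            result = state.memo[inputs]   # reuse: skip the computation
        else:
            result = compute(*inputs)
            state.memo[inputs] = result   # record for future reuse

        # After the observation window, deactivate regions that rarely hit
        # and release their buffer entries.
        if (state.attempts >= self.window
                and state.hits / state.attempts < self.min_hit_rate):
            state.active = False
            state.memo.clear()
        return result
```

In this model, a region that is called repeatedly with the same few inputs stays active and skips most recomputation. A region whose inputs never repeat is switched off after `window` attempts, which frees its entries for other regions. A real mechanism would also have to invalidate entries when memory read by a region changes, which this sketch ignores.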
Index Terms
- Hardware support for dynamic activation of compiler-directed computation reuse
Special Issue: Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '00)