ABSTRACT
Multi-core processors, with low communication costs and high availability of execution cores, will increase the use of execution and compilation models that use short threads to expose parallelism. Current branch predictors seek to incorporate large amounts of control flow history to maximize accuracy. However, when that history is absent, the predictor fails to work as intended. Thus, modern predictors are almost useless for threads below a certain length. Using a Speculative Multithreaded (SpMT) architecture as an example of a system that generates shorter threads, this work examines techniques to improve branch prediction accuracy when a new thread begins to execute on a different core. This paper proposes a minor change to the branch predictor that gives virtually the same performance on short threads as an idealized predictor that incorporates unknowable pre-history of a spawned speculative thread. At the same time, strong performance on long threads is preserved. The proposed technique sets the global history register of the spawned thread to the initial value of the program counter. This novel and simple design reduces branch mispredictions by 29% and provides as much as a 13% IPC improvement on selected SPEC2000 benchmarks.
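The idea of seeding the global history register (GHR) from the program counter can be illustrated with a minimal sketch. The abstract does not say which predictor organization, table size, or PC bits are used, so everything below is an assumption made for illustration: a toy gshare-style predictor, a hypothetical `seed_history_from_pc` helper, and the choice of the low-order PC bits after dropping a 2-bit instruction alignment.

```python
class GsharePredictor:
    """Toy gshare-style predictor. Sizes and hashing are illustrative
    assumptions, not the configuration evaluated in the paper."""

    def __init__(self, history_bits=12):
        self.history_bits = history_bits
        self.mask = (1 << history_bits) - 1
        # 2-bit saturating counters, initialized to weakly taken (2).
        self.counters = [2] * (1 << history_bits)
        self.ghr = 0  # global history register

    def seed_history_from_pc(self, start_pc):
        # Hypothetical helper: when a thread is spawned, fill the GHR with
        # low-order bits of its starting PC instead of leaving stale or
        # zeroed history. The bit selection here is an assumption.
        self.ghr = (start_pc >> 2) & self.mask

    def _index(self, pc):
        # Classic gshare hash: XOR of PC bits and global history.
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc):
        # Predict taken when the counter is in one of the two upper states.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the branch outcome into the history register.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```

In this sketch, two threads spawned at the same PC start with the same history value, so the first branches of the second thread index the counters that the first thread trained. That reproducibility is the property the seeding is meant to provide; whether it matches the paper's hardware in detail is not something the abstract confirms.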