Abstract
This paper proposes a cycle accounting architecture for Simultaneous Multithreading (SMT) processors that estimates, while the threads run simultaneously on the SMT processor, the execution time each thread would have had if it had executed alone. This is done by accounting each cycle to a base, miss event, or waiting cycle component during multi-threaded execution. Single-threaded alone execution time is then estimated as the sum of the base and miss event components; the waiting cycle component represents the cycles lost due to SMT execution. The cycle accounting architecture incurs reasonable hardware cost (around 1KB of storage) and estimates single-threaded performance with average prediction errors of around 7.2% for two-program workloads and 11.7% for four-program workloads.
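The accounting described above can be illustrated with a minimal sketch. It assumes each cycle of a thread has already been classified into one of the three components; how the hardware performs that classification is not covered here, and the names `Component`, `account`, and `estimated_alone_cycles` as well as the trace values are hypothetical.

```python
from collections import Counter
from enum import Enum


class Component(Enum):
    """The three per-thread cycle components named in the abstract."""
    BASE = "base"
    MISS_EVENT = "miss_event"
    WAITING = "waiting"  # cycles lost due to SMT execution


def account(trace):
    """Tally an already-classified per-thread cycle trace (hypothetical input)."""
    counters = Counter({component: 0 for component in Component})
    counters.update(trace)
    return counters


def estimated_alone_cycles(counters):
    """Alone execution time estimate: base plus miss event cycles."""
    return counters[Component.BASE] + counters[Component.MISS_EVENT]


# Made-up trace for one thread: 6 base, 3 miss event, 3 waiting cycles.
trace = (
    [Component.BASE] * 6
    + [Component.MISS_EVENT] * 3
    + [Component.WAITING] * 3
)

counters = account(trace)
alone = estimated_alone_cycles(counters)
total = sum(counters.values())
print(f"SMT cycles: {total}, estimated alone cycles: {alone}, "
      f"waiting cycles: {counters[Component.WAITING]}")
```

In this toy trace the thread occupies 12 cycles under SMT, of which 9 are attributed to its estimated alone execution and 3 to waiting.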
The cycle accounting architecture has several important applications to system software and its interaction with SMT hardware. First, the estimated single-threaded alone execution time gives system software an accurate picture of the processor cycles each thread actually consumed. Using the alone execution time instead of the total execution time (timeslice) may make system software scheduling policies more effective. Second, a new class of thread-progress aware SMT fetch policies, based on per-thread progress indicators, enables system software level priorities to be enforced at the hardware level.
Per-thread cycle accounting in SMT processors
ASPLOS XIV: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2009)