research-article

Per-thread cycle accounting in SMT processors

Published: 07 March 2009

Abstract

This paper proposes a cycle accounting architecture for Simultaneous Multithreading (SMT) processors that estimates, while the threads run simultaneously on the SMT processor, the execution time each thread would have had if it had been executed alone. This is done by accounting each cycle during multi-threaded execution to one of three components: base, miss event, or waiting cycle. Single-threaded alone execution time is then estimated as the sum of the base and miss event components; the waiting cycle component represents the cycles lost due to SMT execution. The cycle accounting architecture incurs reasonable hardware cost (around 1 KB of storage) and estimates single-threaded performance with average prediction errors of around 7.2% for two-program workloads and 11.7% for four-program workloads.
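
The accounting arithmetic described above can be illustrated with a minimal Python sketch. It only models the bookkeeping (three per-thread counters and the sum of two of them); it says nothing about how the hardware decides which component a given cycle belongs to. The class name `CycleCounters`, its fields, the helper `relative_error`, and all numbers are assumptions made for illustration and do not come from the paper.

```python
from dataclasses import dataclass

COMPONENTS = ("base", "miss_event", "waiting")


@dataclass
class CycleCounters:
    """Hypothetical per-thread counters, one per accounting component."""
    base: int = 0
    miss_event: int = 0
    waiting: int = 0

    def account(self, component: str) -> None:
        """Charge one cycle of SMT execution to exactly one component."""
        if component not in COMPONENTS:
            raise ValueError(f"unknown component: {component}")
        setattr(self, component, getattr(self, component) + 1)

    @property
    def smt_cycles(self) -> int:
        """Cycles observed for this thread during multi-threaded execution."""
        return self.base + self.miss_event + self.waiting

    @property
    def estimated_alone_cycles(self) -> int:
        """Estimated single-threaded time: base plus miss event components."""
        return self.base + self.miss_event


def relative_error(estimated: float, measured: float) -> float:
    """Prediction error of an estimate against a measured alone run."""
    return abs(estimated - measured) / measured


# Made-up trace: 6 base cycles, 3 miss event cycles, 1 waiting cycle.
thread0 = CycleCounters()
for component in ["base"] * 6 + ["miss_event"] * 3 + ["waiting"]:
    thread0.account(component)

print(thread0.smt_cycles)              # 10
print(thread0.estimated_alone_cycles)  # 9
```

In this toy trace the single waiting cycle is the only cycle attributed to SMT execution, so the estimate of the alone execution time is 9 of the 10 observed cycles.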

The cycle accounting architecture has several important applications to system software and its interaction with SMT hardware. First, the estimated single-threaded alone execution time gives system software an accurate picture of the processor cycles each thread actually consumed. Basing scheduling policies on the alone execution time instead of the total execution time (timeslice) may make them more effective. Second, a new class of thread-progress-aware SMT fetch policies, based on per-thread progress indicators, enables priorities set at the system software level to be enforced at the hardware level.
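
As a rough illustration of the second application, the sketch below shows one way a progress indicator could drive a fetch decision. The abstract does not define the indicator or the policy, so everything here is an assumption: `progress` is taken to be the estimated alone cycles divided by the elapsed SMT cycles, `target_share` is a hypothetical per-thread goal set by system software, and `pick_thread_to_fetch` is an invented helper that favors the thread furthest behind its goal.

```python
def progress(estimated_alone_cycles: int, elapsed_smt_cycles: int) -> float:
    """Assumed indicator: fraction of alone-execution speed a thread achieves."""
    if elapsed_smt_cycles <= 0:
        raise ValueError("elapsed_smt_cycles must be positive")
    return estimated_alone_cycles / elapsed_smt_cycles


def pick_thread_to_fetch(progress_by_thread: dict, target_share: dict) -> int:
    """Pick the thread whose progress is lowest relative to its target.

    Hypothetical policy for illustration only, not the paper's fetch policy.
    """
    return min(
        progress_by_thread,
        key=lambda tid: progress_by_thread[tid] / target_share[tid],
    )


# Made-up numbers: thread 0 should run at full alone speed, thread 1 at half.
progress_by_thread = {0: progress(800, 1000), 1: progress(500, 1000)}
target_share = {0: 1.0, 1: 0.5}

# Thread 0 is at 0.8 of its target and thread 1 is at 1.0 of its target,
# so the sketch gives fetch priority to thread 0.
print(pick_thread_to_fetch(progress_by_thread, target_share))  # 0
```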

    • Published in

      ACM SIGARCH Computer Architecture News, Volume 37, Issue 1
      ASPLOS 2009
      March 2009
      346 pages
      ISSN: 0163-5964
      DOI: 10.1145/2528521
      • ACM Conferences
        ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
        March 2009
        358 pages
        ISBN: 9781605584065
        DOI: 10.1145/1508244

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
