ABSTRACT
The introduction of massively multithreaded (MMT) processors, comprised of a large number of cores with many shared resources, has made task scheduling, in particular task to hardware thread assignment, one of the most promising ways to improve system performance. However, finding an optimal task assignment for a workload running on MMT processors is an NP-complete problem. Due to the fact that the performance of the best possible task assignment is unknown, the room for improvement of current task-assignment algorithms cannot be determined. This is a major problem for the industry because it could lead to: (1)~A waste of resources if excessive effort is devoted to improving a task assignment algorithm that already provides a performance that is close to the optimal one, or (2)~significant performance loss if insufficient effort is devoted to improving poorly-performing task assignment algorithms.
In this paper, we present a method based on Extreme Value Theory that allows the prediction of the performance of the optimal task assignment in MMT processors. We further show that executing a sample of several hundred or several thousand random task assignments is enough to obtain, with very high confidence, an assignment with a performance that is close to the optimal one. We validate our method with an industrial case study for a set of multithreaded network applications running on an UltraSPARC~T2 processor.
- OpenSPARC™ T2 Core Microarchitecture Specification. Sun Microsystems, Inc, 2007.Google Scholar
- OpenSPARC™ T2 System-On-Chip (SOC) Microarchitecture Specification. Sun Microsystems, Inc, 2007.Google Scholar
- Netra Data Plane Software Suite 2.0 Update 2 Reference Manual. Sun Microsystems, Inc, 2008.Google Scholar
- Netra Data Plane Software Suite 2.0 Update 2 User's Guide. Sun Microsystems, Inc, 2008.Google Scholar
- Software. Hardware. Complete. http://www.oracle.com/ocom/groups/public/@ocom/documents /webcontent/044518.pdf, 2010.Google Scholar
- C. Acosta, F. Cazorla, A. Ramirez, and M. Valero. Thread to Core Assignment in SMT On-Chip Multiprocessors. In SBAC-PAD '09: Proceedings of the 21st International Symposium on Computer Architecture and High Performance Computing, 2009. Google ScholarDigital Library
- A. V. Aho and M. J. Corasick. Efficient String Matching: An Aid to Bibliographic Search. Commun. ACM, 18, 1975. Google ScholarDigital Library
- A. Azzalini. Statistical Inference Based on the Likelihood. Chapman and Hall, London, 1996.Google Scholar
- A. A. Balkema and L. de Haan. Residual life time at great age. Annals of Probability, 2: 792--804, 1974.Google ScholarCross Ref
- J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels. Statistics of Extremes: Theory and Applications. John Wiley and Sons, Ltd, 2004.Google ScholarCross Ref
- E. Castillo. Extreme value theory in engineering. Academic Press, Inc., 1988.Google Scholar
- E. Castillo and A. Hadi. Fitting the Generalized Pareto Distribution to data. Journal of the American Statistical Association, 92, 1997.Google Scholar
- S. Kim, and Y. Solihin}contention_predictD. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In HPCA '05: Proceedings of the 11th International Symposium on High-Performance Computer Architecture, 2005. Google ScholarDigital Library
- S. J. Chapman. Essentials of MATLAB Programming. Cengage Learning, 2009.Google Scholar
- H. Chernoff. On the distribution of the likelihood ratio. Annals of Mathematical Statistics, 25, 1954.Google Scholar
- W. G. Cochran. Sampling techniques. Wiley, 1977.Google Scholar
- K. J. Connolly. Law of Internet Security and Privacy. Aspen Publishers, 2003. Google ScholarDigital Library
- M. De Vuyst, R. Kumar, and D. Tullsen. Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors. In IPDPS '06: Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium, 2006. Google ScholarDigital Library
- D. Eckhoff, T. Limmer, and F. Dressler. Hash Tables for Efficient Flow Monitoring: Vulnerabilities and Countermeasures. In LCN '09: Proceedings of the 34th Conference on Local Computer Networks, 2009.Google Scholar
- A. El-Moursy, R. Garg, D. Albonesi, and S. Dwarkadas. Compatible phase co-scheduling on a CMP of multi-threaded processors. In IPDPS '06: Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium, 2006. Google ScholarDigital Library
- S. Eyerman and L. Eeckhout. Per-thread cycle accounting in SMT processors. In ASPLOS '09: Proceeding of the 14th international conference on Architectural support for programming languages and operating systems, 2009. Google ScholarDigital Library
- S. Eyerman and L. Eeckhout. Probabilistic job symbiosis modeling for SMT processor scheduling. In ASPLOS '10: Proceeding of the 15th international conference on Architectural support for programming languages and operating systems, 2010. Google ScholarDigital Library
- A. Fedorova, M. Seltzer, C. Small, and D. Nussbaum. Performance of multithreaded chip multiprocessors and implications for operating system design. In ATEC '05: Proceedings of the annual conference on USENIX Annual Technical Conference, 2005. Google ScholarDigital Library
- M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, 1979. Google ScholarDigital Library
- M. Gilli and E. Kellezi. An application of extreme value theory for measuring financial risk. Computational Economics, 27, 2006. Google ScholarDigital Library
- R. Gioiosa, F. Petrini, K. Davis, and F. Lebaillif-Delamare. Analysis of system overhead on parallel computers. In ISSPIT '04: 4th IEEE International Symposium on Signal Processing and Information Technology, Rome, Italy, 2004.Google ScholarCross Ref
- M. Greenwood. The natural duration of cancer. Reports on Public Health and Medical Subjects, 33: 1--26, 1926.Google Scholar
- S. Grimshaw. Computing the maximum likelihood estimates for the Generalized Pareto Distribution to data. Technometrics, 35, 1993.Google Scholar
- F. Guo and T. Chiueh. Traffic Analysis: From Stateful Firewall to Network Intrusion Detection System. In RPE Report, 2004.Google Scholar
- J. R. M. Hosking and J. R. Wallis. Parameter and quantile estimation for the generalised pareto distribution. Technometrics, 29, 1987. Google ScholarDigital Library
- Internet Engineering Task Force (IETF). Request for Comments (RFC), http://www.rfc-editor.org/.Google Scholar
- R. Jain, C. J. Hughes, and S. V. Adve. Soft real-time scheduling on simultaneous multithreaded processors. In RTSS '02: Proceedings of the 23rd IEEE Real-Time Systems Symposium, 2002. Google ScholarDigital Library
- Y. Jiang, X. Shen, J. Chen, and R. Tripathi. Analysis and approximation of optimal co-scheduling on chip multiprocessors. In PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques, 2008. Google ScholarDigital Library
- E. L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 52 (282): 457--481, 1958.Google ScholarCross Ref
- E. Kellezi and M. Gilli. Extreme value theory for tail-related risk measures. Fame research paper series, International Center for Financial Asset Management and Engineering, 2000.Google Scholar
- J. Kihm, A. Settle, A. Janiszewski, and D. A. Connors. Understanding the impact of inter-thread cache interference on ILP in modern SMT processors. The Journal of Instruction Level Parallelism, 7, 2005.Google Scholar
- Kunze, Mudigonda, Jason, and Vin}Kokku04CaseR. Kokku, T. L. Riché, A. Kunze, J. Mudigonda, J. Jason, and H. M. Vin. A case for run-time adaptation in packet processing systems. SIGCOMM Comput. Commun. Rev., 34 (1), 2004. Google ScholarDigital Library
- R. Kumar, D. M. Tullsen, P. Ranganathan, N. P. Jouppi, and K. I. Farkas. Single-ISA heterogenous multi-core architectures for multithreaded workload performance. In ISCA '04: Proceedings of the 31st Annual International Symposium on Computer Architecture, 2004. Google ScholarDigital Library
- R. L. McGregor, C. D. Antonopoulos, and D. S. Nikolopoulos. Scheduling algorithms for effective thread pairing on hybrid multiprocessors. In IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, 2005. Google ScholarDigital Library
- M. Norton. Optimizing Pattern Matching for Intrusion Detection, http://docs.idsresearch.org/OptimizingPatternMatchingForIDS.pdf, 2004.Google Scholar
- Probe network monitor. http://www.ntop.org.Google Scholar
- F. Petrini, D. Kerbyson, and S. Pakin. The case of the missing supercomputer performance: Achieving optimal performance on the 8,192 processors of ASCI Q. In SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing, 2003. Google ScholarDigital Library
- J. I. Pickands. Statistical inference using extreme value order statistics. Annals of Statististics, 3: 119--131, 1975.Google ScholarCross Ref
- P. Radojković, V. Cakarević, J. Verdú, A. Pajuelo, F. J. Cazorla, M. Nemirovsky, and M. Valero. Thread to strand binding of parallel network applications in massive multi-threaded systems. In PPoPP-2010: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Bangalore, India, 2010. Google ScholarDigital Library
- A. Settle, J. Kihm, A. Janiszewski, and D. Connors. Architectural support for enhanced SMT job scheduling. In PACT '04: Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, 2004. Google ScholarDigital Library
- D. Shelepov, J. C. S. Alcaide, S. Jeffery, A. Fedorova, N. Perez, Z. F. Huang, S. Blagodurov, and V. Kumar. Hass: A scheduler for heterogeneous multicore systems. In ACM SIGOPS Operating Systems Review, 2009. Google ScholarDigital Library
- T. Sherwood, G. Varghese, and B. Calder. A Pipelined Memory Architecture for High Throughput Network Processors. In ISCA '03: Proceedings of the 30th Annual International Symposium on Computer Architecture, 2003. Google ScholarDigital Library
- E. Shmueli, G. Almasi, J. Brunheroto, J. Castanos, G. Dozsa, S. Kumar, and D. Lieber. Evaluating the effect of replacing CNK with Linux on the compute-nodes of Blue Gene/L. In ICS '08: Proceedings of the 22nd Annual International Conference on Supercomputing, 2008. Google ScholarDigital Library
- A. Snavely and D. M. Tullsen. Symbiotic job scheduling for a simultaneous multithreaded processor. In ASPLOS 2000: Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, 2000. Google ScholarDigital Library
- A. Snavely, D. M. Tullsen, and G. Voelker. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor. In Proceedings of the 2002 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2002. Google ScholarDigital Library
- Snort network intrusion prevention and detection system. http://www.snort.org/.Google Scholar
- N. Tajvidi. Design and implementation of statistical computations for Generalized Pareto Distributions. Technical Report, Chalmers University of Technology, 1996.Google Scholar
- D. Tam, R. Azimi, and M. Stumm. Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors. In EuroSys '07: Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, 2007. Google ScholarDigital Library
- TCPDUMP/LIBPCAP public repository. http://www.tcpdump.org/.Google Scholar
- S. B. Vardeman. Statistics for Engineering Problem Solving. PWS publishing company, 1993.Google Scholar
- V. Cakarević, P. Radojković, J. Verdú, A. Pajuelo, F. Cazorla, M. Nemirovsky, and M. Valero. Characterizing the resource-sharing levels in the UltraSPARC T2 processor. In MICRO-42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, NY, USA, 2009. Google ScholarDigital Library
- J. Verdú. Analysis and Architectural Support for Parallel Stateful Packet Processing, PhD Thesis. Universitat Politècnica de Catalunya, 2008.Google Scholar
- Vermont (VERsatile MONitoring Toolkit). http://vermont.berlios.de/.Google Scholar
- S. S. Wilks. The large-sample distribution of the likelihood ratio for testing composite hypotheses. Annals of Mathematical Statistics, 9, 1938.Google Scholar
- S. S. Wilks. Mathematical Statistics. Princeton University Press, 1943.Google Scholar
- Wireshark network protocol analyzer. http://www.wireshark.org/.Google Scholar
- T. Wolf, N. Weng, and C.-H. Tai. Design considerations for network processor operating systems. In ANCS '05: Proceedings of ACM/IEEE Symposium on Architectures for Networking and Communication Systems, Princeton, NJ, 2005. Google ScholarDigital Library
Index Terms
- Optimal task assignment in multithreaded processors: a statistical approach
Recommendations
Optimal task assignment in multithreaded processors: a statistical approach
ASPLOS '12The introduction of massively multithreaded (MMT) processors, comprised of a large number of cores with many shared resources, has made task scheduling, in particular task to hardware thread assignment, one of the most promising ways to improve system ...
Optimal task assignment in multithreaded processors: a statistical approach
ASPLOS '12The introduction of massively multithreaded (MMT) processors, comprised of a large number of cores with many shared resources, has made task scheduling, in particular task to hardware thread assignment, one of the most promising ways to improve system ...
An evaluation of speculative instruction execution on simultaneous multithreaded processors
Modern superscalar processors rely heavily on speculative execution for performance. For example, our measurements show that on a 6-issue superscalar, 93% of committed instructions for SPECINT95 are speculative. Without speculation, processor resources ...
Comments