ABSTRACT
A current trend in research on multithreading processors is chip multithreading (CMT), which aims to exploit thread-level parallelism (TLP) and to improve the performance of software built on traditional threading components, e.g. pthreads. However, CMT is essentially a straightforward extension of conventional symmetric multiprocessor (SMP) techniques, and it will suffer from the same limits on scalable multithreaded processing if it is built only on the traditional sequential-computation-based framework. Considering these limitations of sequential-processor-based multithreading, we take a different approach and develop a multithreading processor dedicated to thread-level parallelism. Our processor, named Fuce, is based on continuation-based multithreading. A thread is defined as a block of sequentially ordered instructions that is executed exclusively. Every execution of a thread is triggered by one or more events, each of which is called a continuation. The hardware cost and performance of the Fuce processor are evaluated by means of a hardware implementation on an FPGA and by software simulation.
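The abstract's execution model — a thread runs exclusively to completion and fires only after all of its triggering continuations have arrived — can be sketched in software. The following is a minimal illustrative model, not the Fuce hardware: the names (`Thread`, `fan_in`, `continue_to`) and the event-counting scheme are assumptions made for exposition.

```python
# Sketch of continuation-based multithreading: a thread is a non-preemptible
# block that becomes ready only when all of its expected continuations
# (events) have been delivered. Names and structure are illustrative.
from collections import deque

class Thread:
    def __init__(self, name, fan_in, body):
        self.name = name
        self.fan_in = fan_in   # continuations required before the thread fires
        self.count = fan_in    # continuations still outstanding
        self.body = body       # runs exclusively, never interrupted

ready = deque()
log = []

def continue_to(thread):
    """Deliver one continuation (event) to `thread`."""
    thread.count -= 1
    if thread.count == 0:
        thread.count = thread.fan_in  # re-arm for a possible later activation
        ready.append(thread)

def run():
    # Each ready thread executes its whole body before the next one starts,
    # modeling exclusive, non-interruptible execution.
    while ready:
        t = ready.popleft()
        t.body()
        log.append(t.name)

# Example: thread C fires only after both A and B have completed.
c = Thread("C", fan_in=2, body=lambda: None)
a = Thread("A", fan_in=1, body=lambda: continue_to(c))
b = Thread("B", fan_in=1, body=lambda: continue_to(c))
continue_to(a)
continue_to(b)
run()
# log is ["A", "B", "C"]
```

The fan-in counter plays the role of the synchronization trigger: no thread is ever suspended mid-execution, so scheduling reduces to delivering continuations and dispatching threads whose counters reach zero.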
Fuce: the continuation-based multithreading processor