Warp Processors
Source: ACM Transactions on Design Automation of Electronic Systems (TODAES)
Volume 11, Issue 3 (July 2006)
SECTION: Online Only: ACM Transactions on Design Automation of Electronic Systems, vol. 11, issue 3 (Novel Paradigms in System-Level Design)
Pages: 659 - 681  
Year of Publication: 2006
ISSN:1084-4309
Authors
Roman Lysecky, University of Arizona, Tucson, AZ
Greg Stitt, University of California, Riverside, CA
Frank Vahid, University of California, Riverside, CA
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1142980.1142986

ABSTRACT

We describe a new processing architecture, known as a warp processor, that utilizes a field-programmable gate array (FPGA) to improve the speed and energy consumption of a software binary executing on a microprocessor. Unlike previous approaches that also improve software using an FPGA but do so using a special compiler, a warp processor achieves these improvements completely transparently and operates from a standard binary. A warp processor dynamically detects the binary's critical regions, reimplements those regions as a custom hardware circuit in the FPGA, and replaces the software region by a call to the new hardware implementation of that region. While not all benchmarks can be improved using warp processing, many can, and the improvements are dramatically better than those achievable by more traditional architecture improvements. The hardest part of warp processing is that of dynamically reimplementing code regions on an FPGA, requiring partitioning, decompilation, synthesis, placement, and routing tools, all having to execute with minimal computation time and data memory so as to coexist on chip with the main processor. We describe the results of developing our warp processor. We developed a custom FPGA fabric specifically designed to enable lean place and route tools, and we developed extremely fast and efficient versions of partitioning, decompilation, synthesis, technology mapping, placement, and routing. Warp processors achieve overall application speedups of 6.3X with energy savings of 66% across a set of embedded benchmark applications. We further show that our tools utilize acceptably small amounts of computation and memory which are far less than traditional tools. Our work illustrates the feasibility and potential of warp processing, and we can foresee the possibility of warp processing becoming a feature in a variety of computing domains, including desktop, server, and embedded applications.
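
The first step named in the abstract, dynamically detecting a binary's critical regions, can be illustrated with a small sketch. The abstract does not say how detection is done, so everything below is an assumption made for illustration: the code treats each taken backward branch as one loop iteration and ranks loops by iteration count. The function name `find_critical_regions` and the trace format are hypothetical and are not taken from the paper.

```python
from collections import Counter


def find_critical_regions(branch_trace, top_n=1):
    """Return the top_n most frequently iterated loops in a branch trace.

    branch_trace: iterable of (branch_addr, target_addr) pairs, one per
    taken branch. ASSUMPTION (not from the paper): a backward branch
    (target <= branch address) marks one loop iteration, and the loop
    body spans the address range [target_addr, branch_addr].
    """
    counts = Counter()
    for branch_addr, target_addr in branch_trace:
        if target_addr <= branch_addr:  # backward branch: count one iteration
            counts[(target_addr, branch_addr)] += 1
    # Rank candidate regions by how often their back-edge was taken.
    return [region for region, _ in counts.most_common(top_n)]


# Hypothetical trace: a hot loop at 0x100..0x120, a cold loop at
# 0x200..0x210, and a few forward branches that are ignored.
trace = [(0x120, 0x100)] * 1000 + [(0x210, 0x200)] * 5 + [(0x50, 0x300)] * 3
hot = find_critical_regions(trace)
```

In this sketch, `hot` holds the address range of the single most frequently iterated loop, which would be the candidate handed to the partitioning, decompilation, synthesis, placement, and routing steps the abstract lists.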

