Spatial computation

Published: 07 October 2004

Abstract

This paper describes a computer architecture, Spatial Computation (SC), which is based on the translation of high-level language programs directly into hardware structures. SC program implementations are completely distributed, with no centralized control. SC circuits are optimized for wires at the expense of computation units.

In this paper we investigate a particular implementation of SC: ASH (Application-Specific Hardware). Under the assumption that computation is cheaper than communication, ASH replicates computation units to simplify interconnect, building a system which uses very simple, completely dedicated communication channels. As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ASH relies on very simple hardware primitives, using no associative structures, no multiported register files, no scheduling logic, no broadcast, and no clocks. As a consequence, ASH hardware is fast and extremely power efficient.

In this work we demonstrate three features of ASH: (1) that such architectures can be built by automatic compilation of C programs; (2) that distributed computation is in some respects fundamentally different from monolithic superscalar processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33% slower in performance (3.5x worst-case).
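The execution model described in the abstract can be illustrated with a small software toy. The sketch below is not the ASH compiler or its circuits; it is a minimal Python model, under stated assumptions, of an expression `(a + b) * (a - b)` in which every operation has its own unit, every link is a dedicated one-producer, one-consumer channel, and a unit fires as soon as all of its inputs hold a token. The names `Channel`, `Node`, `run`, and `evaluate` are hypothetical and chosen only for this illustration. Because the abstract rules out broadcast, the input `a` is delivered on two separate channels rather than shared.

```python
from collections import deque
import operator


class Channel:
    """A dedicated point-to-point link: one producer, one consumer (assumed model)."""

    def __init__(self):
        self.tokens = deque()


class Node:
    """One operation with its own dedicated unit (hypothetical name)."""

    def __init__(self, name, fn, inputs, outputs):
        self.name = name
        self.fn = fn
        self.inputs = inputs
        self.outputs = outputs

    def ready(self):
        # A node may fire only when every input channel holds a token.
        return all(ch.tokens for ch in self.inputs)

    def fire(self):
        # Consume one token per input and send the result on each output channel.
        args = [ch.tokens.popleft() for ch in self.inputs]
        result = self.fn(*args)
        for ch in self.outputs:
            ch.tokens.append(result)


def run(nodes):
    """Fire ready nodes until none is ready.

    This loop exists only because the sketch is a sequential software
    simulation; it stands in for units reacting independently to their inputs.
    """
    fired = []
    progress = True
    while progress:
        progress = False
        for node in nodes:
            if node.ready():
                node.fire()
                fired.append(node.name)
                progress = True
    return fired


def evaluate(a, b):
    """Compute (a + b) * (a - b) on a three-node dataflow graph."""
    # No broadcast: each consumer of `a` and `b` gets its own channel.
    a_add, b_add, a_sub, b_sub = Channel(), Channel(), Channel(), Channel()
    total, diff, out = Channel(), Channel(), Channel()
    # The list order is deliberately not the dependency order.
    nodes = [
        Node("mul", operator.mul, [total, diff], [out]),
        Node("add", operator.add, [a_add, b_add], [total]),
        Node("sub", operator.sub, [a_sub, b_sub], [diff]),
    ]
    for ch, value in ((a_add, a), (b_add, b), (a_sub, a), (b_sub, b)):
        ch.tokens.append(value)
    fired = run(nodes)
    return out.tokens.popleft(), fired
```

Calling `evaluate(5, 3)` yields the value 16, with `mul` firing only after `add` and `sub` have produced their tokens, even though `mul` is listed first. The firing order follows from data availability alone, which is the property the sketch is meant to show.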

Published In

ACM SIGARCH Computer Architecture News, Volume 32, Issue 5
ASPLOS 2004
December 2004
283 pages
ISSN: 0163-5964
DOI: 10.1145/1037947
  • ASPLOS XI: Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
    October 2004
    296 pages
    ISBN: 1581138040
    DOI: 10.1145/1024393
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. application-specific hardware
  2. dataflow machine
  3. low-power
  4. spatial computation

Qualifiers

  • Article

Article Metrics

  • Downloads (Last 12 months): 60
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 19 Feb 2025

Cited By

  • (2011) 10x10: A General-purpose Architectural Approach to Heterogeneity and Energy Efficiency. Procedia Computer Science 4 (1987-1996). DOI: 10.1016/j.procs.2011.04.217. Online publication date: 2011
  • (2010) Impact of high-level transformations within the ROCCC framework. ACM Transactions on Architecture and Code Optimization 7:4 (1-36). DOI: 10.1145/1880043.1880044. Online publication date: 30-Dec-2010
  • (2023) Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (1395-1408). DOI: 10.1145/3613424.3614246. Online publication date: 28-Oct-2023
  • (2022) A Survey of FPGA-Based Vision Systems for Autonomous Cars. IEEE Access 10 (132525-132563). DOI: 10.1109/ACCESS.2022.3230282. Online publication date: 2022
  • (2022) Overview of SDC. Software Defined Chips (27-76). DOI: 10.1007/978-981-19-6994-2_2. Online publication date: 21-Oct-2022
  • (2021) Fifer: Practical Acceleration of Irregular Applications on Reconfigurable Architectures. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture (1064-1077). DOI: 10.1145/3466752.3480048. Online publication date: 18-Oct-2021
  • (2021) SARA: Scaling a Reconfigurable Dataflow Accelerator. 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA) (1041-1054). DOI: 10.1109/ISCA52012.2021.00085. Online publication date: Jun-2021
  • (2021) Fluid: An Asynchronous High-level Synthesis Tool for Complex Program Structures. 2021 27th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC) (1-8). DOI: 10.1109/ASYNC48570.2021.00009. Online publication date: Sep-2021
  • (2020) SOFF: An OpenCL High-Level Synthesis Framework for FPGAs. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) (295-308). DOI: 10.1109/ISCA45697.2020.00034. Online publication date: May-2020
  • (2020) A Hybrid Systolic-Dataflow Architecture for Inductive Matrix Algorithms. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) (703-716). DOI: 10.1109/HPCA47549.2020.00063. Online publication date: Feb-2020
