Optimizing instruction-set extensible processors under data bandwidth constraints
Source: Design, Automation, and Test in Europe
Proceedings of the conference on Design, automation and test in Europe
Nice, France
SESSION: Application-specific architectures
Pages: 588-593
Year of Publication: 2007
ISBN: 978-3-9810801-2-4
Authors
Kubilay Atasu  Imperial College London and Bogazici University, Istanbul
Robert G. Dimond  Imperial College London
Oskar Mencer  Imperial College London
Wayne Luk  Imperial College London
Can Özturan  Bogazici University, Istanbul
Günhan Dündar  Bogazici University, Istanbul
Sponsors
IEEE Council on Electronic Design Automation (CEDA)
The EDA Consortium
EDAA : European Design and Automation Association
SIGDA : ACM Design Automation
RAS : RAS
The IEEE Computer Society TTTC
ECSI
Publisher
EDA Consortium, San Jose, CA, USA

ABSTRACT

We present a methodology for generating optimized architectures for data-bandwidth-constrained extensible processors. We describe a scalable Integer Linear Programming (ILP) formulation that extracts the most profitable set of instruction-set extensions given the available data bandwidth and transfer latency. Unlike previous approaches, we differentiate between the number of inputs and outputs of instruction-set extensions and the number of register file ports. This differentiation makes our approach applicable to architectures that include architecturally visible state registers and dedicated data transfer channels. We support a comprehensive design space exploration to characterize the area/performance trade-offs for various applications. We evaluate our approach using actual ASIC implementations to demonstrate that our automatically customized processors meet timing within the target silicon area. For an embedded processor with only two register read ports and one register write port, we obtain up to 4.3× speed-up with extensions incurring only a 35% area overhead.

