ABSTRACT
Big data analytics using JVM-based MapReduce frameworks has become a popular approach to addressing the explosive growth of data sizes. Adopting FPGAs in datacenters as accelerators to improve performance and energy efficiency is also attracting increasing attention. However, integrating FPGAs into such JVM-based frameworks raises the challenge of poor programmability: programmers must not only rewrite Java/Scala programs in C/C++ or OpenCL but also, to achieve high performance, account for the intricacies of FPGAs. To address this challenge, we present S2FA (Spark-to-FPGA-Accelerator), an automation framework that generates FPGA accelerator designs from Apache Spark programs written in Scala. S2FA bridges the semantic gap between object-oriented languages and HLS C while achieving high performance through learning-based design space exploration. Evaluation results show that our generated FPGA designs achieve up to 49.9× performance improvement for several machine learning applications compared to their corresponding implementations on the JVM.
S2FA: An Accelerator Automation Framework for Heterogeneous Computing in Datacenters
2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)