### Design and Evaluation of a Scalable Engine for 3D-FFT Computation in an FPGA Cluster

#### Abstract

The three-dimensional Fast Fourier Transform (3D-FFT) is commonly used to solve the partial differential equations that describe the evolution of many physical systems, such as the motion of viscous fluids governed by the Navier–Stokes equations. Simulating such problems requires a parallel High-Performance Computing (HPC) architecture, because the size of the problem grows with the cube of the FFT size and each grid point is represented by several double-precision floating-point complex numbers. Designers of modern HPC systems are considering FPGAs as components of this architecture because they combine effective hardware acceleration with dedicated communication facilities. Furthermore, the network topology can be optimized for the specific calculation that the cluster must perform, which matters especially for algorithms limited by the data exchange delay between processors. In this paper, we explore an HPC design that uses FPGA accelerators to compute the 3D-FFT. We devise a scalable FFT engine based on a custom radix-2 double-precision core that implements the Decimation in Frequency version of the Cooley–Tukey FFT algorithm. The FFT engine can be adapted to different technology constraints and network topologies by adjusting the number of cores and the configuration parameters so as to minimize the overall calculation time. We compare the possible configurations against the technological limits of available hardware. Finally, we evaluate the bandwidth required for continuous FFT execution in the APEnet toroidal mesh network.
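As an illustration only, the following Python sketch shows the two ideas the abstract relies on: a radix-2 Decimation in Frequency butterfly, and a 3D-FFT built by applying a 1D FFT along each axis in turn. It is a software model under stated assumptions, not the FPGA engine described in the paper. The function names (`fft_dif`, `fft3d`, `grid_bytes`) and the `values_per_point` parameter are hypothetical, and the cubic grid of nested lists is chosen for readability rather than speed.

```python
import cmath


def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT.

    len(x) must be a power of two. Returns the DFT in natural order.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    half = n // 2
    # DIF butterfly: the sum feeds the even-indexed outputs, and the
    # twiddled difference feeds the odd-indexed outputs.
    top = [x[k] + x[k + half] for k in range(half)]
    bot = [(x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n)
           for k in range(half)]
    out = [0j] * n
    out[0::2] = fft_dif(top)
    out[1::2] = fft_dif(bot)
    return out


def fft3d(a):
    """3D-FFT of an n x n x n nested list a[i][j][k], one axis at a time."""
    n = len(a)
    # Axis k: rows are contiguous, so transform them directly (copies the input).
    a = [[fft_dif(a[i][j]) for j in range(n)] for i in range(n)]
    # Axis j: gather each column, transform it, scatter it back.
    for i in range(n):
        for k in range(n):
            col = fft_dif([a[i][j][k] for j in range(n)])
            for j in range(n):
                a[i][j][k] = col[j]
    # Axis i: same gather/transform/scatter pattern.
    for j in range(n):
        for k in range(n):
            col = fft_dif([a[i][j][k] for i in range(n)])
            for i in range(n):
                a[i][j][k] = col[i]
    return a


def grid_bytes(n, values_per_point=1):
    """Memory for an n^3 grid; one double-precision complex value is 16 bytes.

    values_per_point is an assumed parameter, not a figure from the paper.
    """
    return n ** 3 * values_per_point * 16
```

The gather and scatter steps along the second and third axes are where a distributed implementation has to exchange data between nodes, which is why the abstract treats network bandwidth as a limiting factor. `grid_bytes` makes the cubic growth concrete: doubling `n` multiplies the footprint by eight.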

DOI: http://dx.doi.org/10.18517/ijaseit.9.2.8308

Published by INSIGHT - Indonesian Society for Knowledge and Human Development