The finite-difference time-domain (FDTD) algorithm is a numerical stencil computation method that is widely used to solve electromagnetic simulation problems. However, the algorithm is both compute- and memory-intensive, so software implementations on CPUs usually achieve limited simulation efficiency. Hardware accelerators have recently proved effective in improving the performance of various stencil computations. In this paper, we propose a hardware architecture for the 3D FDTD algorithm with a practical convolutional perfectly matched layer (CPML) boundary condition and implement it on a field programmable gate array (FPGA). By adopting a chained processing element (PE) array and a temporal parallelism strategy, the proposed accelerator achieves a peak throughput of 608 mega cells per second (Mcells/s), approximately 6 times higher than that of other reported FPGA designs. Moreover, the accelerator sustains more than 467 Mcells/s across different grid sizes and numbers of CPML layers without any modification to the hardware design, demonstrating the performance stability and flexibility of the architecture across various applications.
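To make the phrase "stencil computation" concrete, the following is a minimal sketch of the standard Yee-scheme update in Python. It is not the accelerator's design and rests on assumptions not stated above: uniform cubic cells in normalized units (Courant number `c = 0.5`, H scaled by the free-space impedance), vacuum everywhere, and E held at zero on the outermost layer of nodes instead of a CPML. The helper names (`make_grid`, `fdtd_step`, `mcells_per_second`) are hypothetical. The last function shows how a Mcells/s figure is obtained: grid cells times time steps, divided by wall-clock seconds.

```python
# Minimal 3D FDTD (Yee) sketch: normalized units, vacuum, no CPML (illustrative only).

FIELDS = ("Ex", "Ey", "Ez", "Hx", "Hy", "Hz")


def index(i, j, k, ny, nz):
    """Flat index of cell (i, j, k) in a row-major nx*ny*nz array."""
    return (i * ny + j) * nz + k


def make_grid(nx, ny, nz):
    """Allocate the six field components as zero-filled flat lists."""
    return {name: [0.0] * (nx * ny * nz) for name in FIELDS}


def fdtd_step(f, nx, ny, nz, c=0.5):
    """Advance H, then E, by one leapfrog time step.

    c is the Courant number c0*dt/dx (assumed uniform cubic cells);
    the 3D stability limit is c <= 1/sqrt(3).
    """
    Ex, Ey, Ez, Hx, Hy, Hz = (f[name] for name in FIELDS)

    def at(i, j, k):
        return index(i, j, k, ny, nz)

    # Faraday's law: H -= c * curl(E), using forward differences.
    for i in range(nx - 1):
        for j in range(ny - 1):
            for k in range(nz - 1):
                p = at(i, j, k)
                Hx[p] -= c * ((Ez[at(i, j + 1, k)] - Ez[p]) - (Ey[at(i, j, k + 1)] - Ey[p]))
                Hy[p] -= c * ((Ex[at(i, j, k + 1)] - Ex[p]) - (Ez[at(i + 1, j, k)] - Ez[p]))
                Hz[p] -= c * ((Ey[at(i + 1, j, k)] - Ey[p]) - (Ex[at(i, j + 1, k)] - Ex[p]))

    # Ampere's law: E += c * curl(H), using backward differences.
    # E on the outermost layer of nodes is never updated, so it stays 0 (PEC-like box).
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                p = at(i, j, k)
                Ex[p] += c * ((Hz[p] - Hz[at(i, j - 1, k)]) - (Hy[p] - Hy[at(i, j, k - 1)]))
                Ey[p] += c * ((Hx[p] - Hx[at(i, j, k - 1)]) - (Hz[p] - Hz[at(i - 1, j, k)]))
                Ez[p] += c * ((Hy[p] - Hy[at(i - 1, j, k)]) - (Hx[p] - Hx[at(i, j - 1, k)]))


def mcells_per_second(nx, ny, nz, steps, seconds):
    """Throughput in mega cells per second: cell updates per second / 1e6."""
    return nx * ny * nz * steps / seconds / 1e6


if __name__ == "__main__":
    n = 9
    grid = make_grid(n, n, n)
    grid["Ez"][index(4, 4, 4, n, n)] = 1.0  # initial point excitation at the centre
    for _ in range(20):
        fdtd_step(grid, n, n, n)
    print(max(abs(v) for name in FIELDS for v in grid[name]))
    print(mcells_per_second(200, 200, 200, 1000, 10.0))
```

Each cell update reads only its immediate neighbours, and this nearest-neighbour locality is what makes streaming, pipelined hardware attractive for stencil codes. As a worked example of the metric, `mcells_per_second(200, 200, 200, 1000, 10.0)` returns `800.0`: 8,000,000 cells advanced 1000 steps in 10 s.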