A Memory Efficient and Fast Sparse Matrix Vector Product on a GPU

By Adam Dziekonski, Adam Lamecki, and Michal Mrozowski
Progress In Electromagnetics Research, Vol. 116, 49-63, 2011


This paper proposes a new sparse matrix storage format that allows an efficient implementation of the sparse matrix-vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it combines a low memory footprint with high throughput. The new format, which we call Sliced ELLR-T, has been designed specifically to accelerate the iterative solution of large, sparse, complex-valued systems of linear equations arising in computational electromagnetics. Numerical tests have shown that the new implementation reaches 69 GFLOPS in complex single-precision arithmetic, a speedup by a factor of six over an optimized implementation on a six-core Intel Xeon 5680 Central Processing Unit (CPU). In terms of speed, the new format is as fast as the best format published so far, and at the same time it does not introduce the redundant zero elements that other formats must store to ensure fast memory access. Compared with previously published solutions, the format therefore allows significantly larger problems to be handled on low-cost commodity GPUs with a limited amount of on-board memory.
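The abstract names the format but does not spell out its layout. The Python sketch below shows one plausible reading of the idea, under stated assumptions. It infers the following from the name and from the earlier ELLPACK-R and ELLR-T formats. Rows are grouped into slices of S rows. Each slice is padded only to its own longest row, rounded up to a multiple of T. A per-row length array lets each row stop at its true length. T threads cooperate on one row. The names `SlicedEllrT`, `build_sliced_ellr_t`, `spmv` and `plain_ellpack_slots` and the interleaved index formula are hypothetical illustrations, not the authors' code. The paper's exact layout, and how it keeps stored zeros to a minimum, may differ. GPU threads are emulated with plain loops.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SlicedEllrT:
    """Hypothetical sliced ELLPACK-R-style container (illustrative sketch)."""
    n_rows: int
    slice_size: int        # S: rows per slice (assumed parameter)
    threads_per_row: int   # T: emulated GPU threads cooperating on one row
    row_len: List[int]     # true nonzero count per row (the "R" in ELLPACK-R)
    slice_ptr: List[int]   # offset of each slice in vals/cols, plus a final end offset
    vals: List[complex]    # values, padded per slice
    cols: List[int]        # column indices, padded per slice


def build_sliced_ellr_t(indptr: Sequence[int], indices: Sequence[int],
                        data: Sequence[complex], slice_size: int,
                        threads_per_row: int) -> SlicedEllrT:
    """Convert a CSR matrix (indptr/indices/data) into the sliced layout."""
    n_rows = len(indptr) - 1
    T = threads_per_row
    row_len = [indptr[i + 1] - indptr[i] for i in range(n_rows)]
    slice_ptr: List[int] = [0]
    vals: List[complex] = []
    cols: List[int] = []
    for start in range(0, n_rows, slice_size):
        nrows_s = min(slice_size, n_rows - start)
        # Pad only to the longest row of THIS slice, rounded up to a multiple of T.
        longest = max(row_len[start:start + nrows_s])
        width = ((longest + T - 1) // T) * T
        s_vals = [0j] * (width * nrows_s)   # padding slots hold zeros
        s_cols = [0] * (width * nrows_s)    # padding slots point at column 0
        for r in range(nrows_s):
            row = start + r
            for k in range(row_len[row]):
                # Element k of local row r goes to step k // T, lane k % T.
                # At a fixed step, consecutive (row, lane) pairs are adjacent
                # in memory, mimicking coalesced access on a GPU.
                idx = (k // T) * (nrows_s * T) + r * T + (k % T)
                s_vals[idx] = data[indptr[row] + k]
                s_cols[idx] = indices[indptr[row] + k]
        vals.extend(s_vals)
        cols.extend(s_cols)
        slice_ptr.append(len(vals))
    return SlicedEllrT(n_rows, slice_size, T, row_len, slice_ptr, vals, cols)


def spmv(m: SlicedEllrT, x: Sequence[complex]) -> List[complex]:
    """y = A @ x, emulating T threads per row with plain loops."""
    T = m.threads_per_row
    y = [0j] * m.n_rows
    for s, start in enumerate(range(0, m.n_rows, m.slice_size)):
        nrows_s = min(m.slice_size, m.n_rows - start)
        base = m.slice_ptr[s]
        for r in range(nrows_s):
            row = start + r
            partial = [0j] * T  # one running sum per emulated thread
            for lane in range(T):
                # Each lane stops at the true row length, so padding is never read.
                for k in range(lane, m.row_len[row], T):
                    idx = base + (k // T) * (nrows_s * T) + r * T + lane
                    partial[lane] += m.vals[idx] * x[m.cols[idx]]
            y[row] = sum(partial)  # stands in for the reduction across T lanes
    return y


def plain_ellpack_slots(row_len: Sequence[int]) -> int:
    """Slots stored by plain ELLPACK, which pads every row to the global maximum."""
    return len(row_len) * max(row_len, default=0)


if __name__ == "__main__":
    # 5x5 complex test matrix in CSR form; row lengths are 2, 1, 4, 1, 1.
    indptr = [0, 2, 3, 7, 8, 9]
    indices = [0, 1, 1, 0, 1, 2, 3, 3, 4]
    data = [1 + 1j, 2, 3, 4, 5j, 6, 7, 8, 9 - 1j]
    x = [1, 1j, 2, -1, 0.5]
    m = build_sliced_ellr_t(indptr, indices, data, slice_size=2, threads_per_row=2)
    print(spmv(m, x))
    print(len(data), len(m.vals), plain_ellpack_slots(m.row_len))  # 9 14 20
```

For this toy matrix the sketch stores 14 slots for 9 nonzeros, whereas plain ELLPACK, padded to the longest row, stores 20. The saving grows when row lengths vary strongly between slices. On a GPU, each (row, lane) pair would map to one thread. The index formula makes consecutive threads read consecutive addresses at each step, which is what makes coalesced loads possible.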


Adam Dziekonski, Adam Lamecki, and Michal Mrozowski, "A Memory Efficient and Fast Sparse Matrix Vector Product on a GPU," Progress In Electromagnetics Research, Vol. 116, 49-63, 2011.