Vol. 116
2011-04-20
A Memory Efficient and Fast Sparse Matrix Vector Product on a GPU
By Adam Dziekonski, Adam Lamecki, and Michal Mrozowski
Progress In Electromagnetics Research, Vol. 116, 49-63, 2011
Abstract
This paper proposes a new sparse matrix storage format that allows an efficient implementation of the sparse matrix-vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it combines a low memory footprint with high throughput. The new format, which we call Sliced ELLR-T, has been designed specifically to accelerate the iterative solution of large, sparse, complex-valued systems of linear equations arising in computational electromagnetics. Numerical tests show that the new implementation reaches 69 GFLOPS in complex single-precision arithmetic. Compared with an optimized implementation on a six-core Central Processing Unit (CPU) (Intel Xeon 5680), this corresponds to a speedup by a factor of six. In terms of speed, the new format is as fast as the best format published so far, yet it does not introduce the redundant zero elements that must otherwise be stored to ensure fast memory access. As a result, significantly larger problems than with previously published solutions can be handled on low-cost commodity GPUs with a limited amount of on-board memory.
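The abstract does not spell out the storage layout, so the following is only a minimal sketch, under stated assumptions, of the general idea behind sliced ELLPACK-type formats such as Sliced ELLR-T. Rows are grouped into slices of `slice_size` rows, and each slice is padded only to its own longest row rather than to the longest row of the whole matrix. Values are stored column-major inside a slice, and, as in ELLR-T, `t` cooperating workers (threads on a GPU) each accumulate a partial sum for one row. It runs on the CPU in pure Python. The function names, the parameters `slice_size` and `t`, and the exact memory layout are hypothetical and are not claimed to match the authors' CUDA implementation.

```python
# Hypothetical sketch of a sliced ELLPACK-style SpMV (CPU, pure Python).
# Not the authors' exact Sliced ELLR-T layout or GPU thread mapping.
from typing import List, Tuple

Triplet = Tuple[int, int, complex]  # (row, col, value), COO-style input


def build_sliced_ell(n_rows: int, triplets: List[Triplet],
                     slice_size: int, t: int):
    """Pack a sparse matrix into slices of `slice_size` rows.

    Each slice is padded only to its own longest row, rounded up to a
    multiple of `t`. Inside a slice, entries are stored column-major, so
    entry k of consecutive rows is adjacent in memory (the pattern a GPU
    warp could read in a coalesced way).
    """
    rows: List[List[Tuple[int, complex]]] = [[] for _ in range(n_rows)]
    for r, c, v in triplets:
        rows[r].append((c, v))
    row_len = [len(r) for r in rows]

    values: List[complex] = []
    cols: List[int] = []
    slice_ptr = [0]    # offset of each slice in values/cols
    slice_width = []   # padded entries per row in each slice
    for start in range(0, n_rows, slice_size):
        block = rows[start:start + slice_size]
        width = max((len(r) for r in block), default=0)
        width = -(-width // t) * t  # round up to a multiple of t
        for k in range(width):              # column-major within slice
            for i in range(slice_size):
                if i < len(block) and k < len(block[i]):
                    c, v = block[i][k]
                else:
                    c, v = 0, 0j            # explicit padding entry
                cols.append(c)
                values.append(v)
        slice_ptr.append(len(values))
        slice_width.append(width)
    return values, cols, row_len, slice_ptr, slice_width


def spmv_sliced_ell(n_rows: int, x: List[complex], values: List[complex],
                    cols: List[int], row_len: List[int],
                    slice_ptr: List[int], slice_size: int, t: int):
    """Compute y = A x; `t` partial sums per row emulate T threads/row."""
    y = [0j] * n_rows
    for s, base in enumerate(slice_ptr[:-1]):
        for i in range(slice_size):
            row = s * slice_size + i
            if row >= n_rows:
                break  # unused rows in the last slice
            partial = [0j] * t
            # Stop at the true row length (as in ELLPACK-R): padding is
            # stored but never multiplied.
            for k in range(row_len[row]):
                idx = base + k * slice_size + i
                partial[k % t] += values[idx] * x[cols[idx]]
            y[row] = sum(partial)  # reduction of the t partial sums
    return y


# One long row (row 0) would force plain ELLPACK to pad every row to 3.
trip = [(0, 0, 1 + 1j), (0, 1, 2), (0, 3, 3),
        (1, 1, 4), (2, 2, 5j), (3, 0, 6), (3, 3, 7)]
x = [1, 1j, 2, 1]
vals, cols, rl, ptr, width = build_sliced_ell(4, trip, slice_size=2, t=1)
y = spmv_sliced_ell(4, x, vals, cols, rl, ptr, slice_size=2, t=1)
# y == [4+3j, 4j, 10j, 13]
```

For this 4 × 4 example with 7 nonzeros, a plain ELLPACK layout would store 4 × 3 = 12 entries, whereas the per-slice padding in this sketch stores 10. Choosing `t` > 1 rounds slice widths up to a multiple of `t`, trading some memory for parallelism within a row.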
Citation
Adam Dziekonski, Adam Lamecki, and Michal Mrozowski, "A Memory Efficient and Fast Sparse Matrix Vector Product on a GPU," Progress In Electromagnetics Research, Vol. 116, 49-63, 2011.
doi:10.2528/PIER11031607