Multi-GPU parallel compressible flow solver and its performance analysis

LAI Jianqi; LI Hua; ZHANG Ran; CHANG Qing

doi:10.7527/S1000-6893.2018.21944

ACTA AERONAUTICA ET ASTRONAUTICA SINICA

2018, Vol. 39, Issue 9: 121944-121953

Fluid Mechanics and Flight Mechanics

College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China

Received date: 2017-12-19

Revised date: 2018-02-08

Online published: 2018-04-02

Supported by

National Natural Science Foundation of China (11472004)

Abstract

To solve large-scale compressible flow problems efficiently, parallel computing based on Graphics Processing Units (GPUs) is studied. A multi-GPU parallel compressible flow solver based on Message Passing Interface + Compute Unified Device Architecture (MPI+CUDA) is built on the NVIDIA GTX 1070. The solver works on structured meshes and uses the upwind finite volume scheme AUSM⁺-up for spatial discretization. A one-dimensional domain decomposition divides the computational grid into subdomains of equal size so that the load is balanced among the GPUs. Using a supersonic inlet case, the parallel performance on a single GPU and the scalability on multiple GPUs are analyzed. The numerical results show that on a single GPU the solver achieves a speedup ratio of 37 to 46, greatly improving computational efficiency. On four GPUs, the speedup ratio increases from 47 to 143 and the parallel efficiency stays above 70%, demonstrating good scalability of the solver.
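The one-dimensional domain decomposition mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `decompose_1d`, the one-cell halo width, and the rule for distributing leftover cells are all assumptions made for illustration. The sketch splits the cells along one grid axis into contiguous slabs of (nearly) equal size, one per GPU, and records the halo-extended range each slab would need from its neighbours.

```python
def decompose_1d(n_cells, n_parts, halo=1):
    """Split n_cells along one axis into n_parts contiguous slabs.

    Hypothetical helper: returns one dict per rank with the owned
    range [start, stop) and the halo-extended range [lo, hi).
    """
    if n_parts <= 0 or n_cells < n_parts:
        raise ValueError("need at least one cell per part")
    base, extra = divmod(n_cells, n_parts)
    parts = []
    start = 0
    for rank in range(n_parts):
        # The first `extra` ranks take one additional cell, so slab
        # sizes differ by at most one (assumed balancing rule).
        size = base + (1 if rank < extra else 0)
        stop = start + size
        # Halo cells are clipped at the physical domain boundaries.
        lo = max(start - halo, 0)
        hi = min(stop + halo, n_cells)
        parts.append({"rank": rank, "start": start, "stop": stop,
                      "lo": lo, "hi": hi})
        start = stop
    return parts


if __name__ == "__main__":
    # Illustrative grid size only; not taken from the paper.
    for part in decompose_1d(1000, 4):
        print(part)
```

When the cell count is divisible by the number of GPUs, every slab has exactly the same size, which matches the equal-size split the abstract describes. In an MPI+CUDA setting, each rank would typically exchange only its halo layers with its two neighbours.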

Key words: Graphics Processing Unit (GPU); Compute Unified Device Architecture (CUDA); parallel computing; speedup ratio; parallel efficiency
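The abstract does not spell out how the speedup ratio and the parallel efficiency are defined. The sketch below uses the common textbook definitions as an assumption: speedup is the reference wall-clock time divided by the accelerated time, and parallel efficiency on n devices is the single-device time divided by n times the multi-device time. The function names and the timings in the example are hypothetical and are not taken from the paper.

```python
def speedup(t_reference, t_accelerated):
    """Speedup ratio: reference run time over accelerated run time."""
    if t_reference <= 0 or t_accelerated <= 0:
        raise ValueError("run times must be positive")
    return t_reference / t_accelerated


def parallel_efficiency(t_single, t_multi, n_devices):
    """Efficiency of n_devices relative to one device (1.0 is ideal)."""
    if n_devices <= 0:
        raise ValueError("n_devices must be positive")
    return speedup(t_single, t_multi) / n_devices


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only for illustration.
    t_cpu, t_one_gpu, t_four_gpus = 100.0, 2.5, 0.8
    print(speedup(t_cpu, t_one_gpu))
    print(parallel_efficiency(t_one_gpu, t_four_gpus, 4))
```

Under these definitions, an efficiency above 0.7 on four GPUs means the four-GPU run is at least 2.8 times faster than the single-GPU run on the same grid.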

Cite this article

LAI Jianqi, LI Hua, ZHANG Ran, CHANG Qing. Multi-GPU parallel compressible flow solver and its performance analysis[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2018, 39(9): 121944-121953. DOI: 10.7527/S1000-6893.2018.21944

