Fluid Mechanics and Flight Mechanics

Multi-GPU parallel compressible flow solver and its performance analysis

  • LAI Jianqi ,
  • LI Hua ,
  • ZHANG Ran ,
  • CHANG Qing
  • College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China

Received date: 2017-12-19

  Revised date: 2018-02-08

  Online published: 2018-04-02

Supported by

National Natural Science Foundation of China (11472004)

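Cite this article

LAI Jianqi, LI Hua, ZHANG Ran, CHANG Qing. Multi-GPU parallel compressible flow solver and its performance analysis[J]. Acta Aeronautica et Astronautica Sinica, 2018, 39(9): 121944-121953. DOI: 10.7527/S1000-6893.2018.21944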

Abstract

To enable efficient large-scale numerical solution of compressible flow problems, parallel computing based on Graphics Processing Units (GPUs) is studied. A multi-GPU parallel compressible flow solver based on Message Passing Interface + Compute Unified Device Architecture (MPI+CUDA) is built on the NVIDIA GTX 1070. The solver uses a finite volume method on structured grids, with the AUSM+UP upwind scheme for spatial discretization. A one-dimensional domain decomposition method partitions the computational grid into equally sized subdomains so that the load is balanced across the GPUs. Using a supersonic inlet test case, the single-GPU parallel performance and the multi-GPU scalability of the solver are analyzed. The numerical results show that parallel computing on a single GPU achieves a speedup ratio of 37 to 46, greatly improving computational efficiency. With four GPUs, the speedup ratio increases from 47 to 143 and the parallel efficiency remains above 70%, demonstrating the good scalability of the solver.
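The one-dimensional domain decomposition and the parallel efficiency metric mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `decompose_1d` and `parallel_efficiency`, the ghost-layer width of 2, and the rule for distributing leftover cells are assumptions made for illustration. The efficiency formula used here is the common definition (single-GPU time divided by the product of GPU count and multi-GPU time); the abstract does not state which definition the paper uses.

```python
from typing import Dict, List


def decompose_1d(n_cells: int, n_parts: int, n_ghost: int = 2) -> List[Dict[str, int]]:
    """Split n_cells along one grid direction into n_parts contiguous slabs.

    Slab sizes differ by at most one cell, so each GPU gets a nearly equal load.
    n_ghost is a hypothetical ghost-layer width used for the halo exchange
    between neighbouring slabs; physical boundaries get no halo here.
    """
    if n_parts <= 0 or n_cells < n_parts:
        raise ValueError("need at least one cell per part")
    base, extra = divmod(n_cells, n_parts)
    slabs = []
    start = 0
    for rank in range(n_parts):
        # The first `extra` ranks take one additional cell each.
        size = base + (1 if rank < extra else 0)
        end = start + size  # exclusive upper index
        slabs.append({
            "rank": rank,
            "start": start,
            "end": end,
            "ghost_lo": n_ghost if rank > 0 else 0,
            "ghost_hi": n_ghost if rank < n_parts - 1 else 0,
        })
        start = end
    return slabs


def parallel_efficiency(t_single: float, t_multi: float, n_gpus: int) -> float:
    """Efficiency = (single-GPU time) / (n_gpus * multi-GPU time)."""
    return t_single / (n_gpus * t_multi)


if __name__ == "__main__":
    # Hypothetical grid: 1000 cells in the split direction, 4 GPUs.
    for slab in decompose_1d(1000, 4):
        print(slab)
```

In an MPI+CUDA setting, each slab would typically be owned by one MPI rank bound to one GPU, with only the ghost layers exchanged between neighbouring ranks at each time step. Splitting along a single direction keeps every rank's neighbour count at two or fewer.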
