Multi-GPU parallel compressible flow solver and its performance analysis

doi:10.7527/S1000-6893.2018.21944

Fluid Mechanics and Flight Mechanics

LAI Jianqi, LI Hua, ZHANG Ran, CHANG Qing

College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China

Received: 2017-12-19  Revised: 2018-02-08  Online: 2018-09-15  Published: 2018-04-02
Corresponding author: LI Hua, E-mail: lihuakd@tom.com
Supported by: National Natural Science Foundation of China (11472004)

Abstract

Abstract: To enable efficient numerical solution of large-scale compressible flow problems, parallel computing on graphics processing units (GPUs) is studied. A multi-GPU parallel compressible flow solver based on the Message Passing Interface and Compute Unified Device Architecture (MPI+CUDA) is built on the NVIDIA GTX 1070. The solver uses a finite volume method on structured meshes, with the upwind AUSM⁺-up scheme for spatial discretization. A one-dimensional domain decomposition method partitions the computational grid into equal-sized subdomains so that the load is balanced across the GPUs. Using a supersonic inlet test case, the single-GPU parallel performance and the multi-GPU scalability of the solver are analyzed. The numerical results show that single-GPU parallel computing achieves a speedup of 37 to 46 times, greatly improving computational efficiency. With four GPUs, the speedup increases from 47 to 143 times and the parallel efficiency stays above 70%, indicating that the solver scales well.

Key words: Graphics Processing Units (GPU), Compute Unified Device Architecture (CUDA), parallel computing, speedup ratio, parallel efficiency
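
The one-dimensional domain decomposition and the two performance metrics named in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names `split_1d`, `speedup`, and `parallel_efficiency` are hypothetical, the timings in the demo are made-up numbers, and the efficiency formula is the common strong-scaling definition (one-GPU time divided by n times the n-GPU time), which the abstract does not spell out.

```python
def split_1d(n_cells, n_parts):
    """Split n_cells along one grid direction into n_parts contiguous slabs.

    Returns a list of (start, stop) index ranges, one per GPU (hypothetical
    helper). Slab sizes differ by at most one cell, so the load is balanced.
    """
    base, extra = divmod(n_cells, n_parts)
    ranges = []
    start = 0
    for rank in range(n_parts):
        # The first `extra` ranks take one additional cell each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def speedup(t_reference, t_parallel):
    """Speedup: reference run time divided by parallel run time."""
    return t_reference / t_parallel


def parallel_efficiency(t_one_gpu, t_n_gpus, n_gpus):
    """Strong-scaling efficiency (assumed definition): T_1 / (n * T_n)."""
    return t_one_gpu / (n_gpus * t_n_gpus)


if __name__ == "__main__":
    # 1024 cells in the split direction over 4 GPUs gives four equal slabs.
    print(split_1d(1024, 4))
    # Made-up timings in seconds, for illustration only.
    print(speedup(460.0, 10.0))
    print(parallel_efficiency(100.0, 32.0, 4))
```

When the cell count divides evenly by the number of GPUs, every slab has the same size, which corresponds to the equal-sized partition described in the abstract; the remainder handling only matters for grids that do not divide evenly.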

CLC number: V211.3

LAI Jianqi, LI Hua, ZHANG Ran, CHANG Qing. Multi-GPU parallel compressible flow solver and its performance analysis[J]. Acta Aeronautica et Astronautica Sinica, 2018, 39(9): 121944-121953.
