Acta Aeronautica et Astronautica Sinica > 2018, Vol. 39 Issue (9): 121944-121953   doi: 10.7527/S1000-6893.2018.21944

Multi-GPU parallel compressible flow solver and its performance analysis

LAI Jianqi, LI Hua, ZHANG Ran, CHANG Qing

  1. College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China
  • Received: 2017-12-19 Revised: 2018-02-08 Published: 2018-09-15 Online: 2018-04-02
  • Corresponding author: LI Hua E-mail: lihuakd@tom.com
  • Supported by:
    National Natural Science Foundation of China (11472004)

Abstract: To solve large-scale compressible flow problems efficiently, parallel computing based on graphics processing units (GPUs) is studied. A multi-GPU parallel compressible flow solver based on the Message Passing Interface and the Compute Unified Device Architecture (MPI+CUDA) is built on the NVIDIA GTX 1070. The solver uses a finite volume method on structured grids, with the upwind AUSM+UP scheme for spatial discretization. A one-dimensional domain decomposition method divides the computational grid into equal-sized subdomains so that the load is balanced across the GPUs. Single-GPU parallel performance and multi-GPU scalability are analyzed on a supersonic inlet test case. The numerical results show that a single GPU achieves a speedup of 37 to 46 times, greatly improving computational efficiency. With four GPUs, the speedup increases from 47 to 143 times while the parallel efficiency stays above 70%, indicating that the parallel algorithm scales well.
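The load-balancing and performance metrics mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MPI+CUDA code: the function names (`split_1d`, `speedup`, `parallel_efficiency`) are hypothetical, the cell counts and timings are made up, and the efficiency definition used here (single-GPU time divided by the number of GPUs times the multi-GPU time) is an assumed, commonly used convention that the abstract does not spell out.

```python
def split_1d(n_cells, n_parts):
    """Split n_cells cells along one grid direction into n_parts contiguous
    slabs whose sizes differ by at most one cell (half-open index ranges)."""
    if n_parts <= 0 or n_cells < n_parts:
        raise ValueError("need 1 <= n_parts <= n_cells")
    base, extra = divmod(n_cells, n_parts)
    ranges = []
    start = 0
    for rank in range(n_parts):
        # The first `extra` slabs take one additional cell each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def speedup(t_reference, t_parallel):
    """Speedup: reference wall time divided by parallel wall time."""
    return t_reference / t_parallel


def parallel_efficiency(t_single_gpu, t_multi_gpu, n_gpus):
    """Assumed definition: fraction of ideal n_gpus-fold scaling achieved
    relative to a single-GPU run."""
    return t_single_gpu / (n_gpus * t_multi_gpu)


if __name__ == "__main__":
    # Hypothetical grid: 1024 cells in the split direction, 4 GPUs.
    for rank, (lo, hi) in enumerate(split_1d(1024, 4)):
        print(f"GPU {rank}: cells [{lo}, {hi}) -> {hi - lo} cells")
    # Hypothetical timings in seconds, not taken from the paper.
    print("speedup:", speedup(4600.0, 100.0))
    print("efficiency:", parallel_efficiency(100.0, 32.0, 4))
```

When the cell count is divisible by the number of GPUs, every slab has the same size, which corresponds to the equal-sized partition described above. Otherwise the remainder is spread one cell at a time over the first slabs.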

Key words: Graphics Processing Units (GPU), Compute Unified Device Architecture (CUDA), parallel computing, speedup ratio, parallel efficiency

CLC number: