Acta Aeronautica et Astronautica Sinica, 2020, Vol. 41, Issue 10: 123859. doi: 10.7527/S1000-6893.2020.23859

Implementation of hybrid MPI+OpenMP parallelization on an unstructured CFD solver and its applications in massive unsteady simulations

WANG Nianhua1, CHANG Xinghua1, ZHAO Zhong2, ZHANG Laiping1,2   

  1. State Key Laboratory of Aerodynamics, China Aerodynamics Research and Development Center, Mianyang 621000, China;
    2. Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China
  • Received:2020-02-02 Revised:2020-03-10 Published:2020-03-06
  • Supported by:
    National Key Research and Development Program of China (2016YFB0200701); National Natural Science Foundation of China (11532016, 11672324)

Abstract: In conventional engineering applications, the computational cost of unsteady flow simulations such as multi-body or store separation is massive, and it grows further when higher accuracy is sought by refining the grid or adopting higher-order methods. Unsteady simulation is therefore a time-consuming and expensive part of CFD engineering practice, and improving its scalability and efficiency is necessary. To exploit the potential of multi-core processors with both distributed and shared memory, a hybrid parallel strategy is implemented in HyperFLOW, the second-order accurate unstructured finite volume CFD solver developed by the authors' team: the Message Passing Interface (MPI) is used for communication between compute nodes, while OpenMP shared-memory threading is used within each node. Two granularities of hybrid parallelization, coarse-grain and fine-grain, are first implemented and compared on an in-house cluster using a steady turbulent case on the Common Research Model (CRM) with about 40 million unstructured grid cells. The results show that the coarse-grain mode is more efficient for small-scale runs with few processes and partitions, reaching its highest efficiency at 16 threads, whereas the fine-grain mode is better suited to large-scale parallel computation, reaching its highest efficiency at 8 threads. The scalability of the hybrid parallelization in unsteady simulations is then verified on the wing/store separation standard model, for which unstructured overset grids with 0.36 billion and 2.88 billion cells are generated. With the peer-to-peer (P2P) grid reading mode and an optimized implicit overset-grid assembly strategy, reading the grids and completing the overset assembly take only tens of seconds. On the 0.36-billion-cell grid, the unsteady turbulent flow field of the separation process is computed and the parallel efficiency is measured: on 12 288 cores the efficiency reaches 90% on the in-house cluster (relative to 768 cores) and 70% on the Tianhe-2 supercomputer (relative to 384 cores), and the computed six-degree-of-freedom (6DOF) results agree well with the experimental data. Finally, a parallel efficiency test with the 2.88-billion-cell grid on the in-house cluster shows that the efficiency on 4.9×10^4 cores reaches 55.3% (relative to 4 096 cores).
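
As a concrete illustration of the two granularities described above: in the coarse-grain mode one OpenMP parallel region spans the whole iteration and each thread advances its own grid partition, whereas in the fine-grain mode each MPI rank owns a partition and OpenMP threads share its inner face/cell loops. The sketch below shows only the fine-grain pattern for a generic face-based residual loop; the data structures, the computeFlux function and the atomic-update strategy are illustrative assumptions, not HyperFLOW's actual code.

    // Minimal fine-grain MPI+OpenMP sketch: each MPI rank owns one grid
    // partition and threads its face loop with OpenMP. All types and
    // function names below are illustrative placeholders.
    #include <mpi.h>
    #include <omp.h>
    #include <cstdio>
    #include <vector>

    struct Face { int left, right; };   // indices of the two cells sharing a face

    // Hypothetical scalar "flux" function (a Roe/HLLC-type Riemann solver
    // would appear here in a real second-order finite volume solver).
    double computeFlux(double ul, double ur) { return 0.5 * (ul + ur); }

    // Fine-grain parallelism: OpenMP threads share the face loop of this
    // rank's partition. Atomic updates avoid write conflicts when two
    // threads touch the same cell; face coloring is a common alternative.
    void accumulateResidual(const std::vector<Face>& faces,
                            const std::vector<double>& u,
                            std::vector<double>& res)
    {
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < static_cast<int>(faces.size()); ++i) {
            const double f = computeFlux(u[faces[i].left], u[faces[i].right]);
            #pragma omp atomic
            res[faces[i].left]  -= f;
            #pragma omp atomic
            res[faces[i].right] += f;
        }
    }

    int main(int argc, char** argv)
    {
        // MPI_THREAD_FUNNELED is sufficient when only the master thread
        // of each rank calls MPI (e.g. for the halo exchange).
        int provided = 0;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank = 0, nRanks = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nRanks);
        if (rank == 0)
            std::printf("%d MPI ranks x %d OpenMP threads\n",
                        nRanks, omp_get_max_threads());

        // ... read this rank's grid partition and build faces/u/res here ...
        std::vector<Face> faces;        // placeholder: empty partition
        std::vector<double> u, res;

        accumulateResidual(faces, u, res);  // threaded work inside the rank
        // ... exchange ghost-cell data between ranks with MPI here ...

        MPI_Finalize();
        return 0;
    }

Initializing MPI with MPI_THREAD_FUNNELED matches the common arrangement in which only the master thread of each rank performs the inter-partition halo exchange, while all threads share the arithmetic-heavy loops.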
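
The efficiency figures quoted above are measured relative to a non-unit baseline core count. Assuming the usual definition, i.e. speedup normalized by the increase in core count, the bookkeeping reduces to the small sketch below; the timing numbers are placeholders chosen only to reproduce the 90% value reported for 12 288 versus 768 cores, not measured data.

    // Parallel efficiency relative to a baseline run: with wall-clock time
    // T_ref on N_ref cores and T_N on N cores,
    //   speedup    S = T_ref / T_N
    //   efficiency E = S / (N / N_ref) = (T_ref * N_ref) / (T_N * N).
    // All timings below are placeholders, not measured HyperFLOW data.
    #include <cstdio>

    double relativeEfficiency(double tRef, int nRef, double tN, int n)
    {
        return (tRef * nRef) / (tN * n);
    }

    int main()
    {
        const double tRef = 1000.0;        // hypothetical seconds on 768 cores
        const int    nRef = 768;
        const int    n    = 12288;
        const double tN   = tRef / 14.4;   // i.e. a 14.4x speedup on 16x the cores
        std::printf("efficiency = %.1f%%\n",
                    100.0 * relativeEfficiency(tRef, nRef, tN, n));   // 90.0%
        return 0;
    }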

Key words: MPI+OpenMP hybrid parallelization, parallel efficiency, computational fluid dynamics, overset grids, unsteady simulation
