Fluid Mechanics and Flight Mechanics


Implementation of hybrid MPI+OpenMP parallelization on unstructured CFD solver and its applications in massive unsteady simulations

  • WANG Nianhua,
  • CHANG Xinghua,
  • ZHAO Zhong,
  • ZHANG Laiping
  • 1. State Key Laboratory of Aerodynamics, China Aerodynamics Research and Development Center, Mianyang 621000, China;
    2. Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China

Received date: 2020-02-02

  Revised date: 2020-03-10

  Online published: 2020-03-06

Supported by

National Key Research and Development Program of China (2016YFB0200701); National Natural Science Foundation of China (11532016, 11672324)


How to cite this article

WANG Nianhua, CHANG Xinghua, ZHAO Zhong, ZHANG Laiping. Implementation of hybrid MPI+OpenMP parallelization on unstructured CFD solver and its applications in massive unsteady simulations[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(10): 123859. DOI: 10.7527/S1000-6893.2020.23859

Abstract

In conventional engineering applications, the computational cost of unsteady flow simulations such as store separation is massive, and it grows further when higher accuracy is sought by refining the grid or adopting higher-order methods. Unsteady flow simulation is therefore a time-consuming and expensive part of CFD engineering practice, and improving its scalability and efficiency is necessary. To exploit the potential of multi-core processors that combine distributed and shared memory, the Message Passing Interface (MPI) is adopted for inter-node communication and OpenMP for intra-node shared-memory parallelism. This paper first implements MPI+OpenMP hybrid parallelization, in both coarse-grain and fine-grain form, in our in-house second-order finite-volume unstructured CFD solver HyperFLOW. The Common Research Model (CRM) with about 40 million unstructured grid cells is employed to test the implementation on an in-house cluster. The results show that coarse-grain hybrid parallelization is superior at small scale and reaches its highest efficiency with 16 threads, whereas the fine-grain mode is better suited to large-scale parallelization and reaches its highest efficiency with 8 threads. In addition, unstructured overset grids with 0.36 billion and 2.88 billion cells are generated for the wing-store separation standard model. Reading these massive grids and completing the overset grid assembly takes only dozens of seconds with the peer-to-peer (P2P) grid reading mode and the optimized implicit overset assembly method. The unsteady store separation process is then simulated and the parallel efficiency is measured. With the 0.36 billion cell grid, the parallel efficiency on 12 288 cores is 90% (relative to 768 cores) on the in-house cluster and 70% (relative to 384 cores) on the Tianhe-2 supercomputer, and the numerical 6-DOF (degree of freedom) results agree well with the experimental data. Finally, for the 2.88 billion cell grid, parallel efficiency tests with 4.9×10⁴ CPU cores are conducted on the in-house cluster, and the parallel efficiency reaches 55.3% (relative to 4 096 cores).
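
For readers unfamiliar with the two-level parallel pattern described above, the following sketch shows, in generic C++ with MPI and OpenMP, how loop-level (fine-grain) OpenMP threading can be nested inside an MPI halo-exchange cycle. It is a minimal toy example under stated assumptions, not the HyperFLOW implementation: the one-dimensional partition, the variable names and the update formula are placeholders, and the coarse-grain alternative (one OpenMP thread per sub-partition of a rank) is only indicated in a comment.

// A minimal sketch of the two-level (MPI+OpenMP) pattern described above:
// MPI ranks own grid partitions and exchange halo data, while OpenMP threads
// share the loops inside each rank (the fine-grain mode).  All names and the
// 1D toy partition are illustrative only; they are not HyperFLOW data structures.
#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    // Request the thread level that suffices when only the main thread calls MPI.
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank = 0, nRanks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nRanks);

    // Toy 1D "partition": nLocal owned cells plus one ghost cell on each side,
    // mimicking the ghost layer of an unstructured grid partition.
    const int nLocal = 1000000;
    std::vector<double> u(nLocal + 2, static_cast<double>(rank));

    const int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    const int right = (rank < nRanks - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int iter = 0; iter < 10; ++iter) {
        // Inter-node layer: non-blocking halo exchange with neighbouring ranks.
        MPI_Request reqs[4];
        MPI_Irecv(&u[0],          1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&u[nLocal + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
        MPI_Isend(&u[1],          1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
        MPI_Isend(&u[nLocal],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);
        MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);

        // Intra-node layer (fine grain): OpenMP threads split the cell loop of
        // one partition.  A coarse-grain variant would instead give each thread
        // a whole sub-partition and keep the inner loops serial.
        std::vector<double> unew(u);
        #pragma omp parallel for schedule(static)
        for (int i = 1; i <= nLocal; ++i)
            unew[i] = 0.5 * (u[i - 1] + u[i + 1]);  // stand-in for a flux/residual update
        u.swap(unew);
    }

    if (rank == 0)
        std::printf("finished on %d ranks x %d threads per rank\n",
                    nRanks, omp_get_max_threads());
    MPI_Finalize();
    return 0;
}

Such a hybrid code is typically built with an MPI compiler wrapper and OpenMP enabled (for example, mpicxx -fopenmp) and launched with one MPI rank per node or socket, with OMP_NUM_THREADS set to the per-rank thread count (8 or 16 threads in the tests reported above).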
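
The abstract does not spell out the P2P grid reading mode; one common interpretation of "peer-to-peer" reading is that every MPI rank opens and reads its own pre-partitioned grid file directly, avoiding a serial read-and-scatter by rank 0. The sketch below illustrates that pattern only; the file naming scheme and binary layout are hypothetical and are not taken from HyperFLOW.

// Illustrative sketch of a peer-to-peer grid reading mode: each MPI rank reads
// its own partition file independently, so no rank has to hold or broadcast the
// whole grid.  The file name "grid_part_<rank>.dat" is a hypothetical convention.
#include <mpi.h>
#include <cstdio>
#include <string>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Each rank builds the name of its own partition file and reads it on its own;
    // no collective communication is needed for the grid input.
    const std::string fname = "grid_part_" + std::to_string(rank) + ".dat";
    std::vector<char> buffer;
    if (FILE* fp = std::fopen(fname.c_str(), "rb")) {
        std::fseek(fp, 0, SEEK_END);
        const long size = std::ftell(fp);
        std::fseek(fp, 0, SEEK_SET);
        buffer.resize(size > 0 ? static_cast<size_t>(size) : 0u);
        const size_t nread = buffer.empty() ? 0 : std::fread(buffer.data(), 1, buffer.size(), fp);
        std::fclose(fp);
        std::printf("rank %d read %zu bytes from %s\n", rank, nread, fname.c_str());
    } else {
        std::printf("rank %d: could not open %s\n", rank, fname.c_str());
    }

    MPI_Finalize();
    return 0;
}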
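
The parallel-efficiency figures quoted above are measured against a non-unit baseline core count. Assuming the standard strong-scaling definition (the usual textbook formula, not quoted from the paper), with T(n) the wall-clock time on n cores and n_ref the baseline core count,

E(n) = \frac{n_{\mathrm{ref}}\, T(n_{\mathrm{ref}})}{n\, T(n)}

so 90% efficiency on 12 288 cores relative to 768 cores corresponds to a speedup of about 0.9 × (12 288 / 768) ≈ 14.4 over the baseline run. Taking 4.9×10⁴ cores to mean 12 × 4 096 = 49 152 cores (an assumption; the abstract gives only the rounded figure), the 55.3% efficiency corresponds to a speedup of roughly 0.553 × 12 ≈ 6.6 relative to the 4 096-core run.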
