Fluid Mechanics and Flight Mechanics

Implementation of hybrid MPI+OpenMP parallelization on unstructured CFD solver and its applications in massive unsteady simulations

  • WANG Nianhua,
  • CHANG Xinghua,
  • ZHAO Zhong,
  • ZHANG Laiping
  • 1. State Key Laboratory of Aerodynamics, China Aerodynamics Research and Development Center, Mianyang 621000, China;
    2. Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China

Received date: 2020-02-02

Revised date: 2020-03-10

Online published: 2020-03-06

Supported by

National Key Research and Development Program of China (2016YFB0200701); National Natural Science Foundation of China (11532016, 11672324)

Abstract

In engineering applications, the computational cost of unsteady flow simulations such as store separation is massive, and grows even larger when higher accuracy is pursued by refining the grid or adopting higher-order methods. Unsteady flow simulation is therefore both time-consuming and expensive in CFD engineering practice, making it necessary to improve its scalability and efficiency. To exploit the potential of multi-core CPU processors with both distributed and shared memory, the Message Passing Interface (MPI) and OpenMP are adopted for inter-node communication and intra-node shared-memory parallelism, respectively. This paper first implements both coarse-grained and fine-grained MPI+OpenMP hybrid parallelization in our in-house code HyperFLOW. The Common Research Model (CRM) with about 40 million unstructured grid cells is employed to test the implementation on an in-house cluster. The results show that coarse-grained hybrid parallelization is superior at small scales and reaches its highest efficiency at 16 threads, whereas the fine-grained approach is better suited to large-scale parallelization and reaches its highest efficiency at 8 threads. In addition, unstructured overset grids with 0.36 billion and 2.88 billion cells are generated for the wing/store separation standard model. Reading these massive grids and completing the overset grid assembly takes only dozens of seconds when the peer-to-peer (P2P) grid reading mode and the optimized implicit overset assembly method are adopted. The unsteady store separation process is then simulated and the parallel efficiency is measured. With 0.36 billion cells, the parallel efficiency on 12 288 cores is 90% (relative to 768 cores) on the in-house cluster and 70% (relative to 384 cores) on the Tianhe 2 supercomputer. The numerical six-degree-of-freedom (6-DOF) results agree well with the experimental data. Finally, for the grid with 2.88 billion cells, parallel efficiency tests are conducted with 4.9×10⁴ CPU cores on the in-house cluster, and the parallel efficiency reaches 55.3% (relative to 4 096 cores).
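To make the fine-grained hybrid model concrete, the following is a minimal sketch, not HyperFLOW code: MPI partitions the mesh across ranks, a halo exchange supplies ghost-cell data, and OpenMP threads then share the unstructured face loop inside each rank. The mesh layout, the placeholder flux, and all names (Face, computeResidual, nCells) are illustrative assumptions.

    // Fine-grained MPI+OpenMP sketch: threaded face loop inside each rank.
    // Build with e.g.: mpicxx -fopenmp hybrid_sketch.cpp
    #include <mpi.h>
    #include <omp.h>
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    struct Face { int owner, neighbor; double area; };

    // OpenMP-threaded residual accumulation over faces. Thread-private
    // buffers sidestep write races when two faces touch the same cell.
    static void computeResidual(const std::vector<Face>& faces,
                                const std::vector<double>& q,
                                std::vector<double>& res)
    {
        std::fill(res.begin(), res.end(), 0.0);
    #pragma omp parallel
        {
            std::vector<double> local(res.size(), 0.0); // per-thread buffer
    #pragma omp for nowait
            for (long f = 0; f < static_cast<long>(faces.size()); ++f) {
                const Face& fc = faces[f];
                double flux = 0.5 * (q[fc.owner] + q[fc.neighbor]) * fc.area; // toy flux
                local[fc.owner]    -= flux;
                local[fc.neighbor] += flux;
            }
    #pragma omp critical  // serial merge of the thread-private buffers
            for (std::size_t c = 0; c < res.size(); ++c) res[c] += local[c];
        }
    }

    int main(int argc, char** argv)
    {
        // FUNNELED: only the master thread calls MPI, which is all the
        // loop-level (fine-grained) hybrid model requires.
        int provided = 0;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        int rank = 0, size = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int nCells = 4;                           // interior cells per rank
        std::vector<double> q(nCells + 1, 1.0 + rank);  // +1 ghost cell
        std::vector<double> res(nCells + 1);
        std::vector<Face> faces;
        for (int c = 0; c < nCells; ++c) faces.push_back({c, c + 1, 1.0});

        // Halo exchange on a periodic 1D chain: send our first cell left,
        // receive the right neighbor's first cell into the ghost slot.
        int left = (rank - 1 + size) % size, right = (rank + 1) % size;
        MPI_Sendrecv(&q[0],      1, MPI_DOUBLE, left,  0,
                     &q[nCells], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        computeResidual(faces, q, res);
        std::printf("rank %d of %d: res[0] = %.3f, %d OpenMP threads\n",
                    rank, size, res[0], omp_get_max_threads());
        MPI_Finalize();
        return 0;
    }

The coarse-grained variant would instead wrap whole subdomain solves in one OpenMP region per node; the loop-level form above is the one the abstract reports scaling best at large core counts.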
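The abstract also credits the P2P grid reading mode for the dozens-of-seconds load times on billion-cell grids. Below is a hedged sketch of that idea using MPI-IO, where every rank reads only its own slice of one shared file instead of rank 0 reading and scattering; the file name grid.bin, the flat-array-of-doubles layout, and the even split are assumptions for illustration, not the paper's actual format.

    // P2P-style parallel grid reading sketch with MPI-IO collective reads.
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank = 0, size = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // Assumed layout: nTotal doubles of grid data, split evenly by rank.
        const MPI_Offset nTotal = 1 << 20;
        const MPI_Offset nLocal = nTotal / size;
        const MPI_Offset offset = rank * nLocal * (MPI_Offset)sizeof(double);

        std::vector<double> block(nLocal);
        MPI_File fh;
        if (MPI_File_open(MPI_COMM_WORLD, "grid.bin", MPI_MODE_RDONLY,
                          MPI_INFO_NULL, &fh) == MPI_SUCCESS) {
            // Collective read: each rank pulls its own byte range directly,
            // so load time stays nearly flat as the core count grows.
            MPI_File_read_at_all(fh, offset, block.data(),
                                 static_cast<int>(nLocal), MPI_DOUBLE,
                                 MPI_STATUS_IGNORE);
            MPI_File_close(&fh);
            if (rank == 0)
                std::printf("each of %d ranks read %lld doubles\n",
                            size, static_cast<long long>(nLocal));
        } else if (rank == 0) {
            std::printf("grid.bin not found; the layout here is illustrative\n");
        }
        MPI_Finalize();
        return 0;
    }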

Cite this article

WANG Nianhua, CHANG Xinghua, ZHAO Zhong, ZHANG Laiping. Implementation of hybrid MPI+OpenMP parallelization on unstructured CFD solver and its applications in massive unsteady simulations[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2020, 41(10): 123859-123859. DOI: 10.7527/S1000-6893.2020.23859
