ACTA AERONAUTICA ET ASTRONAUTICA SINICA ›› 2020, Vol. 41 ›› Issue (10): 123859-123859. doi: 10.7527/S1000-6893.2020.23859

• Fluid Mechanics and Flight Mechanics •

Implementation of hybrid MPI+OpenMP parallelization on unstructured CFD solver and its applications in massive unsteady simulations

WANG Nianhua1, CHANG Xinghua1, ZHAO Zhong2, ZHANG Laiping1,2   

  1. State Key Laboratory of Aerodynamics, China Aerodynamics Research and Development Center, Mianyang 621000, China;
    2. Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China
  • Received: 2020-02-02 Revised: 2020-03-10 Published: 2020-03-06
  • Supported by:
    National Key Research and Development Program of China (2016YFB0200701); National Natural Science Foundation of China (11532016, 11672324)

Abstract: In conventional engineering applications, the computational cost of unsteady flow simulations such as store separation is massive, and it grows even larger when higher accuracy is sought by refining the grid or adopting higher-order methods. Unsteady flow simulation is therefore both time-consuming and expensive in CFD engineering practice, and improving its scalability and efficiency is necessary. To exploit the potential of multi-core CPU processors with both distributed and shared memory, the Message Passing Interface (MPI) is adopted for inter-node communication and OpenMP for intra-node shared-memory parallelism. This paper first implements hybrid MPI+OpenMP parallelization, in both coarse-grain and fine-grain forms, in our in-house code HyperFLOW. The Common Research Model (CRM) with about 40 million unstructured grid cells is employed to test the implementation on an in-house cluster. The results show that coarse-grain hybrid parallelization is superior at small scales and reaches its highest efficiency with 16 threads, whereas the fine-grain approach is better suited to large-scale parallelization and reaches its highest efficiency with 8 threads. In addition, unstructured overset grids with 0.36 billion and 2.88 billion cells are generated for the wing/store separation standard model. Reading these massive grids and completing the overset grid assembly takes only dozens of seconds when the peer-to-peer (P2P) grid reading mode and the optimized implicit overset assembly method are adopted. The unsteady store separation process is then simulated and the parallel efficiency is evaluated. With 0.36 billion cells, the parallel efficiency on 12 288 cores is 90% (relative to 768 cores) on the in-house cluster and 70% (relative to 384 cores) on the Tianhe 2 supercomputer. The computed 6-DOF (degree of freedom) trajectories agree well with the experimental data. Finally, for the grid with 2.88 billion cells, parallel efficiency tests with 4.9×10⁴ CPU cores on the in-house cluster show that the parallel efficiency reaches 55.3% (relative to 4 096 cores).
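The abstract distinguishes coarse-grain hybrid parallelization (one OpenMP region per iteration, each thread owning a block of cells) from fine-grain parallelization (OpenMP applied loop by loop inside each MPI rank). The paper gives no source listing, so the C sketch below is only an illustration of where the threading sits in each strategy under assumed data structures (a per-rank array of cell residuals and a dummy residual kernel); it is not the HyperFLOW implementation, and the function names are hypothetical.

/*
 * Minimal sketch of the two hybrid MPI+OpenMP strategies (not HyperFLOW code).
 * Assumption: each MPI rank owns a contiguous array of cell unknowns/residuals.
 * Compile (assumption): mpicc -fopenmp hybrid_sketch.c -o hybrid_sketch
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NCELLS 100000   /* cells per MPI rank (illustrative size only) */

/* Fine-grain: an OpenMP "parallel for" wraps each computational loop. */
static void residual_fine_grain(double *res, const double *q, int n)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
        res[i] = 0.5 * q[i];          /* stand-in for a flux/residual kernel */
}

/* Coarse-grain: one parallel region; each thread sweeps its own cell block. */
static void residual_coarse_grain(double *res, const double *q, int n)
{
    #pragma omp parallel
    {
        int tid   = omp_get_thread_num();
        int nth   = omp_get_num_threads();
        int chunk = (n + nth - 1) / nth;
        int lo    = tid * chunk;
        int hi    = (lo + chunk < n) ? lo + chunk : n;
        for (int i = lo; i < hi; ++i)
            res[i] = 0.5 * q[i];
    }
}

int main(int argc, char **argv)
{
    int provided, rank;
    /* MPI handles inter-node communication; FUNNELED means only the master
       thread makes MPI calls (e.g. halo exchanges between OpenMP regions). */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *q   = malloc(NCELLS * sizeof *q);
    double *res = malloc(NCELLS * sizeof *res);
    for (int i = 0; i < NCELLS; ++i) q[i] = 1.0;

    residual_fine_grain(res, q, NCELLS);
    residual_coarse_grain(res, q, NCELLS);

    /* A real solver would exchange halo data via MPI here before the next
       iteration; omitted in this sketch. */
    if (rank == 0)
        printf("each MPI rank uses up to %d OpenMP threads\n",
               omp_get_max_threads());

    free(q);
    free(res);
    MPI_Finalize();
    return 0;
}

In this reading, the coarse-grain variant amortizes the cost of spawning threads over a whole iteration but requires explicit work partitioning, which matches the abstract's observation that it performs best at smaller scales, while the fine-grain variant is simpler to load-balance per loop and scales further.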

Key words: MPI+OpenMP hybrid parallelization, parallel efficiency, computational fluid dynamics, overset grids, unsteady simulation
