
Acta Aeronautica et Astronautica Sinica ›› 2026, Vol. 47 ›› Issue (4): 332217. doi: 10.7527/S1000-6893.2025.32217

• Electronics and Electrical Engineering and Control

Deep-stall recovery control based on safety-constrained reinforcement learning

Yu LI¹, Xinlong XU², Kecheng LI³, Chi-yung WEN¹, Ni LI⁴, Xiaoxiong LIU³

  1. Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hong Kong 999077, China
  2. Nanjing Research Institute of Electronic Engineering, Nanjing 210007, China
  3. College of Automation, Northwestern Polytechnical University, Xi’an 710072, China
  4. College of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2025-05-12 Revised: 2025-07-29 Accepted: 2025-09-05 Online: 2025-09-19 Published: 2025-09-18
  • Contact: Xiaoxiong LIU E-mail: liuxiaoxiong@nwpu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62073266); Aeronautical Science Foundation of China (201905053003)

Abstract:

To address deep-stall recovery for V-tail aircraft, this paper proposes a novel hierarchical recovery strategy that combines Penalized Proximal Policy Optimization (P3O) reinforcement learning with a fast predefined-time incremental control approach. First, a six-degree-of-freedom nonlinear model of the V-tail aircraft is established, and the deep-stall recovery problem is formulated as a constrained Markov decision process. Second, existing predefined-time control theory is improved to enhance the transient performance of the state responses under a given convergence time. Based on this improved theory and nonlinear incremental dynamic inversion, an angular rate controller is designed that ensures the angular rate accurately tracks the decision commands within a user-defined time. The predefined-time stability of the controller is proven using Lyapunov stability theory. Subsequently, a P3O-based decision-making network is constructed to improve safety during deep-stall recovery, with safety constraints incorporated as penalty terms that guide the agent toward safe recovery actions. Finally, simulations and Monte Carlo experiments are conducted to validate the proposed strategy. The results demonstrate its superior performance in terms of rapidity, robustness, safety, and interpretability.

Key words: V-tail aircraft, deep-stall recovery, penalized proximal policy optimization, fast predefined-time control, state safety constraints
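
The abstract states that safety constraints enter the P3O decision-making network as penalty terms. As a rough illustration of that idea only, the following is a minimal sketch of a penalized clipped-surrogate loss in the form in which P3O is commonly presented. It is not the authors' implementation: the function name `p3o_loss`, the parameter names (`kappa`, `clip_eps`, `cost_limit`, `cost_return`), and all default values are assumptions chosen for the example.

```python
import numpy as np

def p3o_loss(ratio, adv_reward, adv_cost, cost_return, cost_limit,
             clip_eps=0.2, kappa=20.0, gamma=0.99):
    """Penalized PPO-style loss (to be minimized) for one batch.

    All names and defaults are illustrative assumptions, not values
    taken from the paper.
    """
    ratio = np.asarray(ratio, dtype=float)          # pi_new(a|s) / pi_old(a|s)
    adv_reward = np.asarray(adv_reward, dtype=float)
    adv_cost = np.asarray(adv_cost, dtype=float)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)

    # Reward term: standard PPO clipped surrogate, negated for minimization.
    loss_reward = -np.mean(np.minimum(ratio * adv_reward, clipped * adv_reward))

    # Cost term: pessimistic clipped cost surrogate plus the current
    # constraint violation (expected cost return minus the allowed limit).
    surrogate_cost = np.mean(np.maximum(ratio * adv_cost, clipped * adv_cost))
    violation = surrogate_cost + (1.0 - gamma) * (cost_return - cost_limit)

    # ReLU penalty: contributes only when a violation is predicted.
    return loss_reward + kappa * max(0.0, violation)
```

In this sketch the penalty vanishes while the predicted cost stays within `cost_limit`, so the update reduces to an ordinary PPO step; once the limit is exceeded, the penalty weight `kappa` pushes the policy back toward the safe region.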
