Acta Aeronautica et Astronautica Sinica, 2026, Vol. 47, Issue 4: 332217. doi: 10.7527/S1000-6893.2025.32217

Deep-stall recovery control based on safety-constrained reinforcement learning

Yu LI1, Xinlong XU2, Kecheng LI3, Chi-yung WEN1, Ni LI4, Xiaoxiong LIU3

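  1. Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hong Kong 999077
  2. Nanjing Research Institute of Electronic Engineering, Nanjing 210007
  3. College of Automation, Northwestern Polytechnical University, Xi'an 710072
  4. College of Aeronautics, Northwestern Polytechnical University, Xi'an 710072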
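  • Received: 2025-05-12 Revised: 2025-07-29 Accepted: 2025-09-05 Online: 2025-09-19 Published: 2025-09-18
  • Corresponding author: Xiaoxiong LIU E-mail: liuxiaoxiong@nwpu.edu.cn
  • Funding:
    National Natural Science Foundation of China (62073266); Aeronautical Science Foundation of China (201905053003)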

Deep-stall recovery control based on safety-constrained reinforcement learning

Yu LI1, Xinlong XU2, Kecheng LI3, Chi-yung WEN1, Ni LI4, Xiaoxiong LIU3

  1. Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hong Kong 999077, China
  2. Nanjing Research Institute of Electronic Engineering, Nanjing 210007, China
  3. College of Automation, Northwestern Polytechnical University, Xi'an 710072, China
  4. College of Aeronautics, Northwestern Polytechnical University, Xi'an 710072, China
  • Received:2025-05-12 Revised:2025-07-29 Accepted:2025-09-05 Online:2025-09-19 Published:2025-09-18
  • Contact: Xiaoxiong LIU E-mail:liuxiaoxiong@nwpu.edu.cn
  • Supported by:
    National Natural Science Foundation of China(62073266);Aeronautical Science Foundation of China(201905053003)

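Abstract:

To address deep-stall recovery of V-tail aircraft, a hierarchical strategy for safe deep-stall recovery is proposed that combines penalized proximal policy optimization (P3O) learning with a fast predefined-time incremental control method. First, the six-degree-of-freedom nonlinear equations of motion of the V-tail aircraft are established, and the safe deep-stall recovery problem is cast as a constrained Markov decision process. Second, existing predefined-time control theory is improved to speed up the response of the system states within the predefined convergence time. Based on the improved theory and nonlinear incremental dynamic inversion, an angular rate controller is designed so that the angular rates quickly track the decision commands within the predefined time, and its predefined-time stability is proven using Lyapunov stability theory. Next, in view of the aircraft's safety requirements during deep-stall recovery, a reinforcement learning decision network based on P3O is constructed, in which the safety constraints are converted into penalty terms to achieve safe recovery decisions. Finally, the proposed strategy is verified through a series of simulations and Monte Carlo experiments. The results show that the strategy has significant advantages in rapidity, robustness, safety, and policy rationality, and that it shows good application potential.

Key words: V-tail aircraft, deep-stall recovery, penalized proximal policy optimization, fast predefined-time control, state safety constraints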
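
As a point of reference for the term "predefined-time", the worked equation below shows a classical scalar predefined-time law and its settling time. It is a textbook-style illustration only and is not the improved law of this paper, which the abstract does not state. The symbols x, η, and T_c are assumptions introduced here.

```latex
% Scalar system with a user-chosen time bound T_c > 0 and exponent 0 < \eta < 1
\dot{x} = -\frac{\pi}{2\eta T_c}\left(|x|^{1-\eta} + |x|^{1+\eta}\right)\operatorname{sign}(x)

% Settling time from x(0) = x_0, using the substitution z = |x|^{\eta}
T(x_0) = \frac{2T_c}{\pi}\int_{0}^{|x_0|^{\eta}} \frac{\mathrm{d}z}{1+z^{2}}
       = \frac{2T_c}{\pi}\arctan\!\left(|x_0|^{\eta}\right) < T_c
```

Because the arctangent is bounded by π/2, the state reaches zero before T_c for every initial condition, so the convergence time bound is a tunable design parameter rather than a function of the initial state.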

Abstract:

To address deep-stall recovery of V-tail aircraft, this paper proposes a hierarchical strategy for safe deep-stall recovery that combines Penalized Proximal Policy Optimization (P3O) reinforcement learning with a fast predefined-time incremental control approach. First, a six-degree-of-freedom nonlinear model of the V-tail aircraft is established, and the deep-stall recovery problem is formulated as a constrained Markov decision process. Second, existing predefined-time control theory is improved to speed up the state response within a given convergence time. Based on this improved theory and nonlinear incremental dynamic inversion, an angular rate controller is designed that ensures the angular rates track the decision commands within the user-defined time. The predefined-time stability of the controller is proven via Lyapunov stability theory. Subsequently, a P3O-based decision-making network is constructed to improve safety during deep-stall recovery, with the safety constraints incorporated as penalty terms that guide the agent toward safe recovery actions. Finally, a series of simulations and Monte Carlo experiments are conducted to validate the proposed strategy. The results demonstrate its advantages in rapidity, robustness, safety, and interpretability.

Key words: V-tail aircraft, deep-stall recovery, penalized proximal policy optimization, fast predefined-time control, state safety constraints
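
To illustrate how safety constraints can enter the learning objective as penalty terms, the sketch below computes a P3O-style loss in the general form found in the P3O literature: a PPO clipped reward surrogate plus a ReLU-penalized clipped cost surrogate. It is a minimal sketch under stated assumptions, not the authors' implementation. The function name `p3o_loss`, the parameters `kappa`, `clip_eps`, and `cost_limit`, and all default values are hypothetical, and the specific flight-state constraints used in the paper are not given in the abstract.

```python
import numpy as np

def p3o_loss(ratio, adv_reward, adv_cost, cost_return, cost_limit,
             kappa=20.0, clip_eps=0.2, gamma=0.99):
    """P3O-style penalized loss (to be minimized). All names are illustrative."""
    ratio = np.asarray(ratio, dtype=float)            # pi_new(a|s) / pi_old(a|s)
    adv_reward = np.asarray(adv_reward, dtype=float)  # reward advantages
    adv_cost = np.asarray(adv_cost, dtype=float)      # cost advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)

    # Reward term: standard PPO clipped surrogate, negated so it is minimized.
    loss_reward = -np.mean(np.minimum(ratio * adv_reward, clipped * adv_reward))

    # Cost term: pessimistic clipped surrogate plus the current constraint gap.
    surrogate_cost = np.mean(np.maximum(ratio * adv_cost, clipped * adv_cost))
    violation = surrogate_cost + (1.0 - gamma) * (cost_return - cost_limit)

    # ReLU penalty: active only when the estimated constraint is violated.
    return loss_reward + kappa * max(0.0, violation)
```

When the estimated cost stays below `cost_limit`, the penalty vanishes and the update reduces to plain PPO. Once the constraint is violated, the penalty grows linearly with the violation, scaled by `kappa`.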

CLC number: