Research on deep-stall recovery control based on safety-constrained reinforcement learning

李煜 (1,2), 徐新龙 (1,3), 李珂澄 (1), 温志涌 (2), 李霓 (1), 刘小雄 (1)

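  1. Northwestern Polytechnical University
  2. The Hong Kong Polytechnic University
  3. The 28th Research Institute of China Electronics Technology Group Corporation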
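  • Received: 2025-05-12 Revised: 2025-09-15 Published: 2025-09-18 Online: 2025-09-18
  • Corresponding author: 刘小雄
  • Supported by:
    National Natural Science Foundation of China; Aeronautical Science Foundation of China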

Research on deep-stall recovery control based on safety-constrained reinforcement learning

  • Received: 2025-05-12 Revised: 2025-09-15 Online: 2025-09-18 Published: 2025-09-18

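Abstract: To address the deep-stall recovery problem of V-tail aircraft, this paper combines penalized proximal policy optimization reinforcement learning with a fast predefined-time incremental control method and proposes a hierarchical strategy for safe deep-stall recovery. First, the six-degree-of-freedom nonlinear equations of motion of the V-tail aircraft are established, and the safe deep-stall recovery problem is formulated as a constrained Markov decision process. Second, the existing predefined-time control theory is improved to speed up the system state response within the predefined convergence time; based on the improved theory and the nonlinear incremental dynamic inversion control method, an angular rate controller is designed so that the angular rate can quickly track the decision commands within the predefined time, and its predefined-time stability is proven via Lyapunov stability theory. Then, in light of the aircraft's safety requirements during deep-stall recovery, a reinforcement learning decision network based on penalized proximal policy optimization is constructed, in which the safety constraints are converted into penalty terms to achieve safe decision-making for deep-stall recovery. Finally, the proposed safe deep-stall recovery strategy is validated through a series of simulations and Monte Carlo experiments. The results show that the strategy has clear advantages in rapidity, robustness, safety, and the reasonableness of its decisions, and that it has good application potential.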

Keywords: V-tail aircraft, deep-stall recovery, penalized proximal policy optimization, fast predefined-time control, state safety constraints

Abstract: To address the deep-stall recovery problem of V-tail aircraft, this paper proposes a novel hierarchical deep-stall recovery strategy that combines Penalized Proximal Policy Optimization (P3O) reinforcement learning with a fast predefined-time incremental control approach. First, a six-degree-of-freedom nonlinear model of the V-tail aircraft is established, and the deep-stall recovery problem is formulated as a constrained Markov Decision Process. Second, the existing predefined-time control theory is improved to enhance the transient performance of the state responses under a given convergence time. Based on this improved theory and a nonlinear incremental dynamic inversion method, an angular rate controller is designed that ensures the angular rate accurately tracks the decision commands within the user-defined time. The predefined-time stability of the controller is proven via Lyapunov stability theory. Subsequently, a decision-making network based on P3O is constructed to improve safety during deep-stall recovery, with the safety constraints incorporated as penalty terms that guide the agent toward safe recovery actions. Finally, a series of simulations and Monte Carlo experiments is conducted to validate the proposed strategy. The results demonstrate its superior performance in terms of rapidity, robustness, safety, and interpretability.

Key words: V-tail aircraft, Deep-stall recovery, Penalized Proximal Policy Optimization, Fast predefined-time control, State safety constraints
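
The idea of turning safety constraints into penalty terms, as summarized in the abstract, can be illustrated with a minimal sketch of a penalized clipped surrogate loss in the general style of P3O. This is not the authors' implementation: the function name `p3o_loss`, its arguments, and the default values of `kappa`, `clip_eps`, and `gamma` are all hypothetical, and the paper's actual reward, cost signals, and hyperparameters are not given on this page.

```python
import numpy as np


def p3o_loss(ratio, adv_reward, adv_cost, cost_return, cost_limit,
             kappa=20.0, clip_eps=0.2, gamma=0.99):
    """Penalized clipped surrogate loss (to be minimized) for one batch.

    All names and defaults are illustrative assumptions, not values
    taken from the paper.
    """
    ratio = np.asarray(ratio, dtype=float)            # pi_new(a|s) / pi_old(a|s)
    adv_reward = np.asarray(adv_reward, dtype=float)  # reward advantages
    adv_cost = np.asarray(adv_cost, dtype=float)      # cost advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)

    # Reward term: standard PPO clipped surrogate, negated so lower is better.
    loss_reward = -np.mean(np.minimum(ratio * adv_reward, clipped * adv_reward))

    # Cost term: pessimistic clipped cost surrogate plus the scaled gap
    # between the current expected cost and the allowed limit.
    surrogate_cost = np.mean(np.maximum(ratio * adv_cost, clipped * adv_cost))
    violation = surrogate_cost + (1.0 - gamma) * (cost_return - cost_limit)

    # ReLU-style penalty: zero while the constraint estimate is satisfied,
    # growing linearly with the estimated violation otherwise.
    loss_cost = kappa * max(0.0, violation)
    return loss_reward + loss_cost
```

In this sketch the penalty vanishes whenever the estimated constraint is met, so the update reduces to ordinary PPO in the safe region, while a violated constraint adds a term that pushes the policy back toward safe behavior.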

CLC number: