基于安全约束强化学习的深失速改出控制研究

  • 李煜 ,
  • 徐新龙 ,
  • 李珂澄 ,
  • 温志涌 ,
  • 李霓 ,
  • 刘小雄
  • 1. 西北工业大学
    2. 香港理工大学
    3. 中国电子科技集团公司第二十八研究所

收稿日期: 2025-05-12

  修回日期: 2025-09-15

  网络出版日期: 2025-09-18

基金资助

国家自然科学基金;航空科学基金

Research on deep-stall recovery control based on safety-constrained reinforcement learning

  • LI Yu ,
  • XU Xin-Long ,
  • LI Ke-Cheng ,
  • WEN Zhi-Yong ,
  • LI Ni ,
  • LIU Xiao-Xiong

Received date: 2025-05-12

  Revised date: 2025-09-15

  Online published: 2025-09-18

摘要

针对V尾飞机的深失速改出问题,本文结合带惩罚的近端策略优化强化学习和快速预定义时间的增量控制方法,提出了一种分层结构的深失速安全改出策略。首先,建立了V尾飞机的六自由度非线性运动方程,并将深失速安全改出问题转化为带有约束的马尔科夫决策过程;其次,改进了现有的预定义时间控制理论,提升了系统状态在预定义收敛时间内的响应速度,并基于改进后的理论和非线性增量动态逆控制方法设计了角速度控制器,确保角速度能够在预定义时间内快速跟踪决策指令,并通过Lyapunov稳定性理论证明了其预定义时间的稳定性;随后,结合深失速改出过程中飞机的安全需求,构建了基于惩罚近端策略优化的强化学习决策网络,将安全约束转化为惩罚项,从而实现深失速改出的安全决策;最后,通过一系列仿真和蒙特卡洛实验对所提出的深失速安全改出策略进行了验证,结果表明该策略在快速性、鲁棒性、安全性以及策略合理性方面具有显著优势,并展示出良好的应用潜力。
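注:摘要中的"预定义时间稳定"通常可由如下 Lyapunov 充分条件刻画。此处给出的是文献中常见的一种形式,仅作示意;论文所改进的具体条件以正文为准:

```latex
% 若存在径向无界的正定函数 V(x),使得沿闭环系统轨迹满足
\dot V(x) \le -\frac{\pi}{\rho\, T_c}\Big( V^{\,1-\rho/2}(x) + V^{\,1+\rho/2}(x) \Big),
\qquad 0<\rho<1,
% 则系统状态自任意初值出发均在时间 T_c 内收敛到原点。
```

其中 \(T_c\) 为用户可预先任意设定的收敛时间上界,且该上界与初始状态无关,这正是预定义时间控制区别于有限时间/固定时间控制之处。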

本文引用格式

李煜, 徐新龙, 李珂澄, 温志涌, 李霓, 刘小雄. 基于安全约束强化学习的深失速改出控制研究[J]. 航空学报, 0: 1-0. DOI: 10.7527/S1000-6893.2025.32217

Abstract

To address the deep-stall recovery problem of V-tail aircraft, this paper proposes a novel hierarchical deep-stall recovery strategy that combines Penalized Proximal Policy Optimization (P3O) reinforcement learning with a fast predefined-time incremental control approach. First, a six-degree-of-freedom nonlinear model of the V-tail aircraft is established, and the deep-stall recovery problem is formulated as a constrained Markov Decision Process. Second, the existing predefined-time control theory is improved to enhance the transient performance of state responses within a given convergence time. Based on this improved theory and a nonlinear incremental dynamic inversion method, an angular rate controller is designed, which ensures that the angular rates accurately track the decision commands within the user-defined time. The predefined-time stability of the controller is theoretically proven via Lyapunov stability theory. Subsequently, a decision-making network based on P3O is constructed to improve safety during deep-stall recovery, where safety constraints are incorporated as penalty terms to guide the agent toward safe recovery actions. Finally, a series of simulations and Monte Carlo experiments are conducted to validate the proposed strategy. The results demonstrate its superior performance in terms of rapidity, robustness, safety, and interpretability.
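As a rough illustration of the penalized formulation described above, the following is a minimal sketch of a P3O-style loss, assuming a generic clipped-PPO surrogate with a ReLU exact penalty on the cost constraint. Variable names, the penalty weight, and the exact form of the cost surrogate are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def p3o_surrogate_loss(ratio, adv_r, adv_c, cost_violation,
                       clip_eps=0.2, kappa=20.0):
    """Sketch of a penalized PPO (P3O-style) loss.

    ratio:          new/old policy probability ratios, shape (N,)
    adv_r, adv_c:   reward and cost advantage estimates, shape (N,)
    cost_violation: estimated J_C(pi) - d (episodic cost minus safety budget)
    kappa:          exact-penalty weight (hypothetical value)
    Returns a scalar loss to *minimize*.
    """
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Standard clipped surrogate on the reward advantage (to be maximized).
    reward_surr = np.mean(np.minimum(ratio * adv_r, clipped * adv_r))
    # Pessimistic (max) surrogate on the cost advantage.
    cost_surr = np.mean(np.maximum(ratio * adv_c, clipped * adv_c))
    # ReLU exact penalty: active only when the constraint is predicted violated.
    penalty = kappa * max(0.0, cost_surr + cost_violation)
    return -(reward_surr - penalty)
```

When the safety budget is respected the penalty term vanishes and the update reduces to plain PPO; once the predicted cost exceeds the budget, the ReLU term dominates and pushes the policy back toward the feasible set, which is how the safety constraint is "converted into a penalty term" in the abstract's sense.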

参考文献

[1] TAYLOR R T, RAY E J. Deep-stall aerodynamic characteristics of T-tail aircraft[R]. USA: NASA, 1965.
[2] NGUYEN D H, LOWENBERG M H, NEILD S A. Analysing dynamic deep stall recovery using a nonlinear frequency approach[J]. Nonlinear Dynamics, 2022, 108: 1179-1196.
[3] 陈永亮,沈宏良,刘昶.飞机深失速改出特性分析与控制[J]. 南京航空航天大学学报, 2007, 39(4): 435-439.
CHEN Y L, SHEN H L, LIU C. Analysis and control of aircraft deep stall recovery characteristics[J]. Journal of Nanjing University of Aeronautics & Astronautics, 2007, 39(4): 435-439 (in Chinese).
[4] 艾文磊.歼击机深失速特性分析及改出控制研究[D]. 南京: 南京航空航天大学, 2015.
AI W L. Characteristics analysis and recovery control for deep-stall of fighters[D]. Nanjing: Nanjing University of Aeronautics and Astronautics, 2015 (in Chinese).
[5] TRUBSHAW E B. Low speed handling with special reference to the super stall[J]. Journal of the Royal Aeronautical Society, 1966, 70(667): 695-704.
[6] Iloputaife O I. Design of deep stall protection for the C-17A[J]. Journal of Guidance, Control, and Dynamics, 1997, 20(4): 760-767.
[7] DEFAZIO P A, LARSEN R. Final committee report on the design, development, and certification of the Boeing 737 MAX[R/OL]. 2020. https://democrats-transportation.house.gov/committee-activity/boeing-737-max-investigation
[8] JIANG H T, XIONG H, ZENG W F, et al. Safely learn to fly aircraft from human: an offline-online reinforcement learning strategy and its application to aircraft stall recovery[J]. IEEE Transactions on Aerospace and Electronic Systems, 2023, 59(6): 8194-8207.
[9] TOMAR D S, GAUCI J, DINGLI A, et al. Automated aircraft stall recovery using reinforcement learning and supervised learning techniques[C]// 2021 IEEE/AIAA 40th Digital Avionics Systems Conference, San Antonio, TX, USA, 2021: 1-7.
[10] CYNTHIA K, DAVID Z M. Using reinforcement learning for AI systems in the mitigation of automation failures and stall recovery in complex aircraft[C]// AIAA SCITECH, Orlando: AIAA, 2024.
[11] AGUSTIN G, GABRIEL T, ROBERTO B. Optimal stall recovery via deep reinforcement learning for a general aviation aircraft[C]// AIAA SCITECH 2024 Forum, AIAA 2024-2408: AIAA, 2024.
[12] 李煜, 陈通文, 王志刚, 温志涌, 刘小雄. 基于预定义时间的直接升力着舰增量控制研究[J/OL]. 航空学报, (2024-11-29)[2025-04-23]. https://hkxb.buaa.edu.cn/EN/10.7527/S1000-6893.2024.31163.
LI Y, CHEN T W, WANG Z G, et al. Research on incremental control of direct lift landing based on predefined-time theory[J/OL]. Acta Aeronautica et Astronautica Sinica, (2024-11-29)[2025-04-23]. https://hkxb.buaa.edu.cn/EN/10.7527/S1000-6893.2024.31163 (in Chinese).
[13] 罗飞, 张军红, 王博, 等. 基于直接升力与动态逆的舰尾流抑制方法[J]. 航空学报, 2021, 42(12): 124770.
LUO F, ZHANG J H, WANG B, et al. Air wake suppression method based on direct lift and nonlinear dynamic inversion control[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(12): 124770 (in Chinese).
[14] KOLB S, HETRU L, FAURE T M, et al. Nonlinear analysis and control of an aircraft in the neighbourhood of deep stall[C]// AIP Conference Proceedings, 2017, 1798(1): 020080.
[15] LI Y, WEN C Y, LIU X X, et al. Prescribed-time fault-tolerant flight control for aircraft subject to structural damage[J]. IEEE Transactions on Aerospace and Electronic Systems, 2024, 61(2): 1848-1859.
[16] 吴慈航,闫建国,钱先云,等.受油机指定时间姿态稳定控制[J].航空学报, 2022, 43(2): 324996.
WU C H, YAN J G, QIAN X Y, et al. Predefined-time attitude stabilization control of receiver aircraft[J]. Acta Aeronautica et Astronautica Sinica, 2022, 43(2): 324996 (in Chinese).
[17] DONG Y, ZOU A M, SUN Z W. Predefined-time predefined-bounded attitude tracking control for rigid spacecraft[J]. IEEE Transactions on Aerospace and Electronic Systems, 2022, 58(1): 464-472.
[18] LI Y, WANG T Q, LIU X X, et al. Predefined-time active fault-tolerant control of transport aircraft subject to control surface failures[J/OL]. IEEE Transactions on Aerospace and Electronic Systems, (2024-12-30)[2025-04-26]. https://ieeexplore.ieee.org/document/10818516.
[19] TIAN D, SHEN H, DAI M. Improving the rapidity of nonlinear tracking differentiator via feedforward[J]. IEEE Transactions on Industrial Electronics, 2014, 61(7): 3736-3743.
[20] KHALIL H K. Nonlinear systems[M]. 3rd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 2002.
[21] YU X, WU Z J. Corrections to stochastic Barbalat's lemma and its applications[J]. IEEE Transactions on Automatic Control, 2014, 59(5): 1386-1390.
[22] ZHANG L R, SHEN L, YANG L, et al. Penalized proximal policy optimization for safe reinforcement learning [DB/OL]. arXiv:2205.11814, 2022.
[23] SCHULMAN J, MORITZ P, LEVINE S, et al. High-dimensional continuous control using generalized advantage estimation[DB/OL]. arXiv:1506.02438, 2016.
[24] KINGMA D P, BA J. Adam: A method for stochastic optimization[DB/OL]. arXiv:1412.6980, 2017.