Electronics and Electrical Engineering and Control

Deep-stall recovery control based on safety-constrained reinforcement learning

  • Yu LI
  • Xinlong XU
  • Kecheng LI
  • Chi-yung WEN
  • Ni LI
  • Xiaoxiong LIU
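  • 1. Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hong Kong 999077, China
    2. Nanjing Research Institute of Electronic Engineering, Nanjing 210007, China
    3. College of Automation, Northwestern Polytechnical University, Xi’an 710072, China
    4. College of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China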

Received: 2025-05-12

Revised: 2025-07-29

Accepted: 2025-09-05

Published online: 2025-09-18

Supported by

National Natural Science Foundation of China (62073266); Aeronautical Science Foundation of China (201905053003)

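Abstract

To address the deep-stall recovery problem of V-tail aircraft, this paper proposes a hierarchical strategy for safe deep-stall recovery that combines Penalized Proximal Policy Optimization (P3O) reinforcement learning with a fast predefined-time incremental control method. First, the six-degree-of-freedom nonlinear equations of motion of the V-tail aircraft are established, and safe deep-stall recovery is formulated as a constrained Markov decision process. Second, the existing predefined-time control theory is improved to speed up the response of the system states within the predefined convergence time. Based on the improved theory and the nonlinear incremental dynamic inversion method, an angular-rate controller is designed that ensures the angular rates quickly track the decision commands within the predefined time, and its predefined-time stability is proven using Lyapunov stability theory. Third, taking into account the safety requirements of the aircraft during deep-stall recovery, a reinforcement-learning decision network based on P3O is constructed in which the safety constraints are converted into penalty terms, enabling safe recovery decisions. Finally, the proposed strategy is validated through a series of simulations and Monte Carlo experiments. The results show that it offers clear advantages in rapidity, robustness, safety, and the rationality of the learned policy, and that it has good potential for practical application.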
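To make "safety constraints converted into penalty terms" more concrete, the sketch below implements a P3O-style penalized clipped surrogate loss, following our reading of the P3O paper by Zhang et al. (arXiv:2205.11814). It is not the authors' code: the function name `p3o_loss`, its arguments, and the defaults (`kappa`, `clip_eps`, `gamma`) are hypothetical, and the abstract does not say which flight-safety limits define the cost signal or how the decision network is trained. The `(1 - gamma)` scaling of the constraint gap is likewise an assumption taken from that formulation.

```python
import numpy as np


def p3o_loss(ratio, adv_r, adv_c, cost_return, cost_limit,
             kappa=20.0, clip_eps=0.2, gamma=0.99):
    """Hypothetical P3O-style loss (to be minimized) for one policy update.

    ratio:       pi_theta(a|s) / pi_old(a|s) for each sampled transition
    adv_r:       reward advantages (e.g. estimated with GAE)
    adv_c:       cost advantages for the safety-violation signal
    cost_return: estimated cost return J_C of the current (old) policy
    cost_limit:  constraint threshold d
    """
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Reward surrogate: standard PPO clipped objective (min), to be maximized.
    l_reward = np.mean(np.minimum(ratio * adv_r, clipped * adv_r))
    # Cost surrogate: pessimistic clip (max), because cost is to be minimized.
    l_cost = np.mean(np.maximum(ratio * adv_c, clipped * adv_c))
    # Exact (ReLU) penalty: nonzero only when the predicted cost exceeds d.
    violation = l_cost + (1.0 - gamma) * (cost_return - cost_limit)
    return -l_reward + kappa * max(0.0, violation)


# Constraint satisfied: the penalty vanishes and the loss is plain PPO.
print(p3o_loss(np.ones(2), np.array([1.0, 2.0]), np.zeros(2), 5.0, 10.0))
```

When the predicted cost stays below the threshold, the ReLU term is zero and the update behaves like ordinary PPO. Once the constraint is predicted to be violated, the penalty dominates the gradient. As in exact-penalty methods generally, `kappa` has to be large enough for the penalized problem to share the solution of the constrained one.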

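Cite this article

LI Y, XU X L, LI K C, WEN C Y, LI N, LIU X X. Deep-stall recovery control based on safety-constrained reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(4): 332217 (in Chinese). DOI: 10.7527/S1000-6893.2025.32217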
