Electronics and Electrical Engineering and Control

Deep-stall recovery control based on safety-constrained reinforcement learning

  • Yu LI,
  • Xinlong XU,
  • Kecheng LI,
  • Chi-yung WEN,
  • Ni LI,
  • Xiaoxiong LIU
  • 1. Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hong Kong 999077, China
    2. Nanjing Research Institute of Electronic Engineering, Nanjing 210007, China
    3. College of Automation, Northwestern Polytechnical University, Xi’an 710072, China
    4. College of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China

Received date: 2025-05-12

  Revised date: 2025-07-29

  Accepted date: 2025-09-05

  Online published: 2025-09-18

Supported by

National Natural Science Foundation of China(62073266);Aeronautical Science Foundation of China(201905053003)

Abstract

To address the deep-stall recovery problem of V-tail aircraft, this paper proposes a novel hierarchical deep-stall recovery strategy that combines Penalized Proximal Policy Optimization (P3O) reinforcement learning with a fast predefined-time incremental control approach. First, a six-degree-of-freedom nonlinear model of the V-tail aircraft is established, and the deep-stall recovery problem is formulated as a constrained Markov decision process. Second, the existing predefined-time control theory is improved to enhance the transient performance of state responses under a given convergence time. Based on this improved theory and a nonlinear incremental dynamic inversion method, an angular rate controller is designed, which ensures that the angular rate accurately tracks the decision commands within the user-defined time. The predefined-time stability of the controller is theoretically proven via Lyapunov stability theory. Subsequently, a decision-making network based on P3O is constructed to improve safety during deep-stall recovery, where safety constraints are incorporated as penalty terms to guide the agent in generating safe recovery actions. Finally, a series of simulations and Monte Carlo experiments are conducted to validate the proposed strategy. The results demonstrate its superior performance in terms of rapidity, robustness, safety, and interpretability.
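The safety-constrained update at the heart of the strategy can be illustrated with a minimal sketch of a P3O-style penalized surrogate loss: the standard clipped PPO objective for the reward, plus an exact-penalty term that activates only when the estimated safety cost exceeds its budget. The function name, arguments, and the simple penalty form below are illustrative assumptions for exposition, not the paper's implementation.

```python
import numpy as np

def p3o_penalized_loss(ratio, adv_r, adv_c, cost_violation, kappa=1.0, eps=0.2):
    """Sketch of a penalized PPO loss for safe RL (to be minimized).

    ratio          : importance ratios pi_new(a|s) / pi_old(a|s) per sample
    adv_r, adv_c   : reward and safety-cost advantage estimates per sample
    cost_violation : estimated cost of the old policy minus the safety budget
    kappa          : penalty coefficient of the exact-penalty term
    eps            : PPO clipping range
    """
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Clipped PPO surrogate for the reward (pessimistic lower bound, maximized).
    surr_r = np.minimum(ratio * adv_r, clipped * adv_r).mean()
    # Pessimistic upper bound on the constraint-cost surrogate under the new policy.
    surr_c = np.maximum(ratio * adv_c, clipped * adv_c).mean()
    # Exact-penalty term: nonzero only when the safety constraint is violated.
    penalty = kappa * max(0.0, surr_c + cost_violation)
    # Minimize negative reward surrogate plus the safety penalty.
    return -surr_r + penalty
```

When the policy satisfies its cost budget the penalty vanishes and the update reduces to plain PPO; a sufficiently large `kappa` recovers the constrained optimum, which is what lets safety constraints "guide the agent" without a separate dual-variable update.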

Cite this article

Yu LI, Xinlong XU, Kecheng LI, Chi-yung WEN, Ni LI, Xiaoxiong LIU. Deep-stall recovery control based on safety-constrained reinforcement learning[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2026, 47(4): 332217. DOI: 10.7527/S1000-6893.2025.32217
