Electronics and Electrical Engineering and Control

Deep-stall recovery control based on safety-constrained reinforcement learning

  • Yu LI,
  • Xinlong XU,
  • Kecheng LI,
  • Chi-yung WEN,
  • Ni LI,
  • Xiaoxiong LIU
  • 1. Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hong Kong 999077, China
    2. Nanjing Research Institute of Electronic Engineering, Nanjing 210007, China
    3. College of Automation, Northwestern Polytechnical University, Xi’an 710072, China
    4. College of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China

Received date: 2025-05-12

  Revised date: 2025-07-29

  Accepted date: 2025-09-05

  Online published: 2025-09-18

Supported by

National Natural Science Foundation of China(62073266);Aeronautical Science Foundation of China(201905053003)

Abstract

To address the deep-stall recovery problem of V-tail aircraft, this paper proposes a novel hierarchical deep-stall recovery strategy that combines Penalized Proximal Policy Optimization (P3O) reinforcement learning with a fast predefined-time incremental control approach. First, a six-degree-of-freedom nonlinear model of the V-tail aircraft is established, and the deep-stall recovery problem is formulated as a constrained Markov decision process. Second, the existing predefined-time control theory is improved to enhance the transient performance of state responses under a given convergence time. Based on this improved theory and a nonlinear incremental dynamic inversion method, an angular rate controller is designed, which ensures that the angular rate accurately tracks the decision commands within the user-defined time. The predefined-time stability of the controller is theoretically proven via Lyapunov stability theory. Subsequently, a decision-making network based on P3O is constructed to improve safety during deep-stall recovery, where safety constraints are incorporated as penalty terms to guide the agent in generating safe recovery actions. Finally, a series of simulations and Monte Carlo experiments are conducted to validate the proposed strategy. The results demonstrate its superior performance in terms of rapidity, robustness, safety, and interpretability.
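The safety-constrained update at the heart of the strategy can be illustrated with a minimal sketch of a P3O-style penalized surrogate loss: the standard clipped PPO objective for the reward, plus an exact-penalty term that activates only when the estimated safety cost exceeds its budget. The function name, arguments, and the simple penalty form below are illustrative assumptions for exposition, not the paper's implementation.

```python
import numpy as np

def p3o_penalized_loss(ratio, adv_r, adv_c, cost_violation, kappa=1.0, eps=0.2):
    """Sketch of a penalized PPO loss for safe RL (to be minimized).

    ratio          : importance ratios pi_new(a|s) / pi_old(a|s) per sample
    adv_r, adv_c   : reward and safety-cost advantage estimates per sample
    cost_violation : estimated cost of the old policy minus the safety budget
    kappa          : penalty coefficient of the exact-penalty term
    eps            : PPO clipping range
    """
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Clipped PPO surrogate for the reward (pessimistic lower bound, maximized).
    surr_r = np.minimum(ratio * adv_r, clipped * adv_r).mean()
    # Pessimistic upper bound on the constraint-cost surrogate under the new policy.
    surr_c = np.maximum(ratio * adv_c, clipped * adv_c).mean()
    # Exact-penalty term: nonzero only when the safety constraint is violated.
    penalty = kappa * max(0.0, surr_c + cost_violation)
    # Minimize negative reward surrogate plus the safety penalty.
    return -surr_r + penalty
```

When the policy satisfies its cost budget the penalty vanishes and the update reduces to plain PPO; a sufficiently large `kappa` recovers the constrained optimum, which is what lets safety constraints "guide the agent" without a separate dual-variable update.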

Cite this article

Yu LI, Xinlong XU, Kecheng LI, Chi-yung WEN, Ni LI, Xiaoxiong LIU. Deep-stall recovery control based on safety-constrained reinforcement learning[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2026, 47(4): 332217. DOI: 10.7527/S1000-6893.2025.32217
