Electronics and Electrical Engineering and Control

Generating new quality flight corridor for reentry aircraft based on reinforcement learning

  • HUI Junpeng,
  • WANG Ren,
  • YU Qidong
  • Research and Development Center, China Academy of Launch Vehicle Technology, Beijing 100076, China

Received date: 2021-06-11

Revised date: 2021-07-05

Online published: 2021-08-03

Abstract

The breakthrough of artificial intelligence provides a new technical approach to research on aircraft reentry guidance. In both reference-trajectory tracking guidance and predictor-corrector guidance, the flight corridor parameters must be designed in advance from manual experience. In this paper, we propose to break through the constraint of the "conical" flight path envelope common to traditional guidance methods by exploiting the natural advantage of reinforcement learning in intelligent decision making. Under the premise of satisfying the dynamic equations and hard path constraints such as heating rate, load factor and dynamic pressure, a large number of "trial-and-error" interactions are carried out between the aircraft and its environment. An effective reward is designed by analogy with the way humans adjust their learning strategies according to feedback. The Proximal Policy Optimization (PPO) algorithm is employed to train a bank angle guidance model that generates bank angle commands online from real-time state information. In this way, a "new quality" flight corridor, completely different from that of traditional guidance methods, is explored. Monte Carlo simulation analysis verifies that the intelligent guidance technology based on reinforcement learning can fully exploit the aircraft's wide-range flight capability and further expand its flight profile.
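To make the reward-shaping idea in the abstract concrete, the following is a minimal sketch (Python/NumPy) of how a per-step reward for such an agent might combine downrange progress with penalties on the three hard path constraints named above: heating rate, load factor and dynamic pressure. All constants, the exponential atmosphere, the stagnation-point heating model and the function names are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

RHO0, HS = 1.225, 7200.0    # sea-level density [kg/m^3], density scale height [m]
KQ = 1.1e-4                 # stagnation-point heating coefficient (assumed)
QDOT_MAX = 1.2e6            # heating-rate limit [W/m^2] (illustrative)
N_MAX = 3.0                 # load-factor limit (illustrative)
QDYN_MAX = 5.0e4            # dynamic-pressure limit [Pa] (illustrative)
MASS, G0 = 900.0, 9.81      # vehicle mass [kg], sea-level gravity [m/s^2]

def path_constraints(h, v, lift, drag):
    """Evaluate the three hard path constraints named in the abstract:
    heating rate, load factor and dynamic pressure."""
    rho = RHO0 * np.exp(-h / HS)                  # exponential atmosphere (assumed)
    q_dot = KQ * np.sqrt(rho) * v ** 3.15         # stagnation-point heating model (assumed)
    q_dyn = 0.5 * rho * v ** 2                    # dynamic pressure
    n_load = np.hypot(lift, drag) / (MASS * G0)   # total aerodynamic load factor
    return q_dot, n_load, q_dyn

def step_reward(h, v, lift, drag, range_to_go, prev_range_to_go):
    """Shaped per-step reward: reward downrange progress, penalise any
    violation of the hard constraints (an episode would also terminate)."""
    q_dot, n_load, q_dyn = path_constraints(h, v, lift, drag)
    progress = (prev_range_to_go - range_to_go) / 1.0e3   # km of range closed
    violation = (max(q_dot / QDOT_MAX - 1.0, 0.0)
                 + max(n_load / N_MAX - 1.0, 0.0)
                 + max(q_dyn / QDYN_MAX - 1.0, 0.0))
    return progress - 10.0 * violation

# Example call with made-up state values: positive reward while all
# constraints are satisfied.
print(step_reward(h=60.0e3, v=5000.0, lift=8.0e3, drag=6.0e3,
                  range_to_go=1.50e6, prev_range_to_go=1.51e6))
```

In a setup of this kind, a PPO agent would take the real-time state (for example altitude, velocity, flight-path angle and range to go) as its observation, output the bank angle command as its action, and be trained against a reward of this general form; any standard PPO implementation could serve for the policy update.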

Cite this article

HUI Junpeng, WANG Ren, YU Qidong. Generating new quality flight corridor for reentry aircraft based on reinforcement learning[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2022, 43(9): 325960-325960. DOI: 10.7527/S1000-6893.2021.25960
