Electronic and Electrical Engineering and Control


Generating new quality flight corridor for reentry aircraft based on reinforcement learning

  • HUI Junpeng ,
  • WANG Ren ,
  • YU Qidong
  • Research and Development Center, China Academy of Launch Vehicle Technology, Beijing 100076, China

Received date: 2021-06-11

  Revised date: 2021-07-05

  Online published: 2021-08-03


Cite this article

HUI Junpeng, WANG Ren, YU Qidong. Generating new quality flight corridor for reentry aircraft based on reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2022, 43(9): 325960. DOI: 10.7527/S1000-6893.2021.25960

Abstract

The breakthrough progress of artificial intelligence provides a new technical approach for research on aircraft reentry guidance. Both reference-trajectory tracking guidance and predictor-corrector guidance require flight corridor parameters to be designed in advance from manual experience. In this paper, we propose to break through the "conical" flight-path envelope constraint common to traditional guidance methods by exploiting the natural advantage of reinforcement learning in intelligent decision making. Under the premise of satisfying the dynamic equations and hard constraints such as heating rate, load factor and dynamic pressure, a large number of "trial-and-error" interactions are carried out between the aircraft and its environment. On the one hand, an effective reward (feedback) signal is designed, drawing on the way humans adjust their learning strategies according to feedback; on the other hand, the Proximal Policy Optimization (PPO) algorithm is employed to train the bank angle guidance model, which generates bank angle commands online from real-time state information. In this way a "new quality" flight corridor, completely different from that of traditional guidance methods, is explored. Monte Carlo simulation analysis verifies that the reinforcement-learning-based intelligent guidance technology can fully exploit the aircraft's wide-range flight capability and further expand the flight profile.
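The abstract describes training a bank angle policy with PPO while enforcing the hard path constraints, which in standard entry-guidance formulations are commonly written as a heating-rate limit (q̇ = k_Q √ρ V³ ≤ q̇_max), a load-factor limit (n = √(L² + D²)/(m g) ≤ n_max) and a dynamic-pressure limit (q = ½ ρ V² ≤ q_max). The Python sketch below shows, purely for illustration, how such a setup might be wired together; it assumes gymnasium and stable-baselines3, and the point-mass dynamics, aerodynamic coefficients, constraint limits and reward weights are placeholder assumptions, not the models or values used in the paper.

# Minimal sketch: PPO training of a bank-angle policy under path constraints.
# Assumptions (not from the paper): gymnasium + stable-baselines3, a point-mass
# longitudinal reentry model, and illustrative constants throughout.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

R_E, MU = 6.371e6, 3.986e14                 # Earth radius [m], gravitational parameter [m^3/s^2]
M, S = 907.0, 0.4839                        # illustrative vehicle mass [kg] and reference area [m^2]
QDOT_MAX, N_MAX, Q_MAX = 1.2e6, 4.0, 6.0e4  # heat-rate [W/m^2], load-factor, dynamic-pressure [Pa] limits

class ReentryEnv(gym.Env):
    """Simplified longitudinal reentry: state = (h, V, gamma), action = bank angle."""
    def __init__(self):
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32)
        self.action_space = spaces.Box(-np.pi / 2, np.pi / 2, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.h, self.V, self.gamma = 80e3, 7000.0, np.deg2rad(-1.0)
        return self._obs(), {}

    def _obs(self):
        return np.array([self.h / 1e5, self.V / 1e4, self.gamma], dtype=np.float32)

    def step(self, action):
        sigma = float(action[0])                       # bank angle command from the policy
        rho = 1.225 * np.exp(-self.h / 7200.0)         # exponential atmosphere
        q = 0.5 * rho * self.V ** 2                    # dynamic pressure
        CL, CD = 0.4, 0.3                              # illustrative constant aerodynamic coefficients
        L, D = q * S * CL, q * S * CD
        r = R_E + self.h
        g = MU / r ** 2
        dt = 1.0
        # point-mass longitudinal dynamics, Euler-integrated for simplicity
        self.V += (-D / M - g * np.sin(self.gamma)) * dt
        self.gamma += ((L * np.cos(sigma) / M - (g - self.V ** 2 / r) * np.cos(self.gamma)) / self.V) * dt
        self.h += self.V * np.sin(self.gamma) * dt
        # hard path constraints: heating rate, load factor, dynamic pressure
        q_dot = 1.74e-4 * np.sqrt(rho / 0.3) * self.V ** 3   # Sutton-Graves-type heat rate, nose radius 0.3 m
        n = np.sqrt(L ** 2 + D ** 2) / (M * 9.81)
        violated = bool(q_dot > QDOT_MAX or n > N_MAX or q > Q_MAX)
        reward = float(1.0 - 10.0 * violated)          # reward-shaping placeholder
        terminated = bool(self.h < 25e3 or self.V < 1800.0 or violated)
        return self._obs(), reward, terminated, False, {}

if __name__ == "__main__":
    model = PPO("MlpPolicy", ReentryEnv(), verbose=1)
    model.learn(total_timesteps=200_000)               # train the bank-angle guidance policy

With placeholders of this kind, calling model.predict(obs) after training would return a bank angle command from the current state, which is the online decision role the abstract assigns to the trained guidance model.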
