Electronics, Electrical Engineering and Control

Intelligent guidance for no-fly zone avoidance based on reinforcement learning

  • Junpeng HUI,
  • Ren WANG,
  • Jifeng GUO
  • 1. School of Astronautics, Harbin Institute of Technology, Harbin 150006, China
    2. China Academy of Aerospace Science and Innovation, Beijing 100176, China
E-mail: hjpbuaa@126.com

Received date: 2022-05-11

  Revised date: 2022-12-08

  Accepted date: 2023-01-17

  Online published: 2023-02-06

Supported by

National-level project

Cite this article

HUI J P, WANG R, GUO J F. Intelligent guidance for no-fly zone avoidance based on reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(11): 327416 (in Chinese). DOI: 10.7527/S1000-6893.2022.27416

Abstract

The rapid development of artificial intelligence (AI) offers a new technical approach to flight vehicle guidance research. To address the problem of a high-speed reentry vehicle avoiding uncertain no-fly zones, we propose a progressive three-stage research framework: predictor-corrector guidance, pre-training of a bank angle guidance model by supervised learning, and further upgrading of that model by reinforcement learning. First, traditional predictor-corrector guidance generates a large number of sample trajectories that avoid no-fly zones, and the bank angle guidance model is pre-trained on these samples by supervised learning. Second, the model is further trained with the Proximal Policy Optimization (PPO) algorithm. Through extensive exploratory interaction between the vehicle and an environment containing uncertain no-fly zones, guided by an effective reward, the method exploits the strong lateral maneuverability of vehicles with a high lift-to-drag ratio. It thereby escapes the constraint that traditional predictor-corrector guidance places on the bank angle solution space, and is expected to yield better avoidance strategies. Comparison with traditional predictor-corrector guidance and with intelligent guidance based on supervised learning verifies that the reinforcement-learning-based approach can fully exploit the vehicle's wide-area flight capability and meets the adaptability requirements of future intelligent decision systems in uncertain avoidance scenarios.
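As a rough illustration of the two training stages named in the abstract, the sketch below computes a supervised imitation loss on bank angles and the standard PPO clipped surrogate objective. The function names, the choice of mean squared error for pre-training, and the clip range epsilon = 0.2 are assumptions made for illustration only. The abstract does not state the paper's loss functions, network structure, reward design, or hyperparameters.

```python
from typing import Sequence


def behavior_cloning_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Stage 1 (assumed form): mean squared error between the bank angles
    output by the guidance model and those of the sample trajectories."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def ppo_clip_objective(ratios: Sequence[float],
                       advantages: Sequence[float],
                       epsilon: float = 0.2) -> float:
    """Stage 2: PPO clipped surrogate objective averaged over a batch.

    ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] = advantage estimate for the same sample
    epsilon       = clip range (0.2 is an illustrative default)
    """
    if len(ratios) != len(advantages) or len(ratios) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for r, adv in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - epsilon, 1 + epsilon].
        clipped = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * adv, clipped * adv)
    return total / len(ratios)


if __name__ == "__main__":
    # Hypothetical numbers, not data from the paper.
    print(behavior_cloning_loss([1.0, 3.0], [0.0, 1.0]))
    print(ppo_clip_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0]))
```

Clipping limits how far a single update can move the policy away from the one that collected the data, so a policy initialized by supervised pre-training would be expected to change gradually during the reinforcement learning stage.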
