Electronic and Electrical Engineering and Control


Trust region policy optimization guidance algorithm for intercepting maneuvering target

  • Wenxue CHEN,
  • Changsheng GAO,
  • Wuxing JING
  • School of Astronautics, Harbin Institute of Technology, Harbin 150001, China
E-mail: gaocs@hit.edu.cn

Received date: 2022-06-09

  Revised date: 2022-06-21

  Accepted date: 2022-07-21

  Online published: 2022-07-25

Supported by

National Natural Science Foundation of China (12072090)


Cite this article

CHEN W X, GAO C S, JING W X. Trust region policy optimization guidance algorithm for intercepting maneuvering target[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(11): 327596. DOI: 10.7527/S1000-6893.2022.27596

Abstract

Considering the high speed and strong maneuverability of hypersonic vehicles in near space, this paper proposes a deep reinforcement learning guidance algorithm based on the Trust Region Policy Optimization (TRPO) algorithm to improve the accuracy, robustness, and intelligence of guidance against targets with different initial states and different maneuver modes. The TRPO-based guidance algorithm consists of two policy (action) networks and one critic network, and maps the state of the target-interceptor relative motion system directly to the interceptor guidance command in an end-to-end manner. During training, continuous action and state spaces are designed, and a reward function that weighs energy consumption, relative distance, and other factors is constructed to accelerate convergence. Finally, interception tests are conducted in different task scenarios using the trained agent model. Simulation results show that, compared with the traditional Proportional Navigation (PN) guidance law and the Improved Proportional Navigation (IPN) guidance law, the proposed algorithm achieves smaller miss distances and more stable, robust interception performance in both learned and unknown scenarios, and can run on computers with a wide range of hardware configurations.
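The reward design described above, which trades off range closure against energy (control) expenditure with a terminal bonus for a successful intercept, can be sketched as follows. This is an illustrative minimal sketch only: the function name, weights, and hit radius are assumptions for demonstration, not the paper's actual values.

```python
def guidance_reward(rel_dist, prev_rel_dist, accel_cmd,
                    w_dist=1.0, w_energy=0.01,
                    hit_radius=5.0, hit_bonus=100.0):
    """Per-step shaped reward for an interception episode (illustrative).

    Rewards closing the interceptor-target range, penalizes the squared
    acceleration command as a proxy for energy consumption, and adds a
    terminal bonus when the relative distance falls within the hit radius.
    All weights are placeholder values.
    """
    r = w_dist * (prev_rel_dist - rel_dist)   # reward range closure
    r -= w_energy * accel_cmd ** 2            # penalize control energy
    if rel_dist < hit_radius:                 # terminal intercept bonus
        r += hit_bonus
    return r
```

A shaping term of this form keeps the reward dense (the agent is rewarded every step for reducing range, not only at intercept), which is one common way to speed up convergence of policy-gradient methods such as TRPO.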
