Spacecraft game decision making for threat avoidance of space targets based on machine learning

doi:10.7527/S1000-6893.2023.29136

Electronics and Electrical Engineering and Control




Honglin ZHANG^(1,2), Jianjun LUO^(1,2), Weihua MA^(1,2)

1. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China
2. Science and Technology on Aerospace Flight Dynamics Laboratory, Xi'an 710072, China

Received: 2023-06-06; Revised: 2023-08-22; Accepted: 2023-11-02; Published online: 2023-11-16; Published in issue: 2024-04-25
Corresponding author: Jianjun LUO, E-mail: jjluo@mail.nwpu.edu.cn; jjluo@nwpu.edu.cn
Supported by: National Natural Science Foundation of China (12072269); Foundation of Science and Technology on Aerospace Flight Dynamics Laboratory (6142210210302)


Abstract:

An intelligent decision-making framework and a deep reinforcement learning (DRL)-based autonomous decision-making method are proposed for spacecraft avoiding the threat posed by approaching space targets. The framework accounts for the maneuverability of space targets and the game-theoretic nature of threat avoidance. It is an intelligent game decision-making framework for spacecraft threat avoidance, built on the Observation-Orientation-Decision-Action (OODA) loop and on machine learning techniques. Building on this framework and on inference of the target's motion intention, a DRL-based spacecraft maneuver decision-making algorithm and its training environment are designed. These give the spacecraft's decision making and control the ability to respond in a game setting and to evade targets exhibiting typical motion intentions. Self-play training is then used to improve the generalization of the autonomous maneuver decision-making algorithm and its adaptability to uncertain target maneuvers. Finally, simulation examples and their analysis verify the effectiveness of the proposed method.

Key words: spacecraft maneuver, intelligent decision-making, threat avoidance, OODA loop, deep reinforcement learning
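The abstract chains the four OODA stages into one decision cycle. The sketch below shows how such a cycle can be wired, assuming each stage is a plain function. All names here (`ooda_cycle`, `infer_intent`, `choose_maneuver`) are hypothetical and not from the paper. The range-trend intent classifier is a placeholder for the learned intent-inference model, and the "thrust away" rule is a placeholder for the trained DRL policy.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = List[float]            # hypothetical relative state [x, y, z, vx, vy, vz]
Impulse = Tuple[float, float, float]

def infer_intent(history: Sequence[State]) -> str:
    """Orient: placeholder classifier (NOT the paper's learned model).

    Labels the target 'approach' if its range shrank over the window, else 'follow'."""
    r_first = math.dist(history[0][:3], (0.0, 0.0, 0.0))
    r_last = math.dist(history[-1][:3], (0.0, 0.0, 0.0))
    return "approach" if r_last < r_first else "follow"

def choose_maneuver(state: State, intent: str, dv_max: float = 0.1) -> Impulse:
    """Decide: placeholder policy that pushes away from an approaching target."""
    r = math.dist(state[:3], (0.0, 0.0, 0.0))
    if intent != "approach" or r == 0.0:
        return (0.0, 0.0, 0.0)
    # state is the target relative to the spacecraft, so thrust the opposite way
    return tuple(-dv_max * c / r for c in state[:3])

def ooda_cycle(observe: Callable[[], List[State]],
               act: Callable[[Impulse], None]) -> Tuple[str, Impulse]:
    history = observe()                        # Observe: recent relative states
    intent = infer_intent(history)             # Orient: infer motion intention
    dv = choose_maneuver(history[-1], intent)  # Decide: pick an impulse
    act(dv)                                    # Act: hand it to the executor
    return intent, dv
```

Keeping every stage behind a function boundary lets a learned intent model or a trained policy replace the placeholders without changing the loop.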

CLC number: V448.2


Honglin ZHANG, Jianjun LUO, Weihua MA. Spacecraft game decision making for threat avoidance of space targets based on machine learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(8): 329136.




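Structure of the policy and value networks
Layer	Policy network units	Policy network activation	Value network units	Value network activation
Input layer	6	-	6	-
Hidden layer 1	64	tanh	64	tanh
Hidden layer 2	32	tanh	32	tanh
Output layer	3	tanh	1	linear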
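The table fixes only the layer widths and activations. Below is a pure-Python sketch of the two forward passes, under these assumptions:

- the layers are fully connected;
- weights use a small Gaussian initialization and biases start at zero;
- the seeds are arbitrary.

The table does not say what the six inputs or three outputs represent. `build_mlp`, `policy_forward` and `value_forward` are illustrative names only.

```python
import math
import random
from typing import List

Vector = List[float]

def init_layer(n_in: int, n_out: int, rng: random.Random):
    # Weight matrix (n_out rows x n_in cols), Gaussian init (assumed); zero biases
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b

def dense(x: Vector, w, b) -> Vector:
    # Affine map w @ x + b
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def build_mlp(sizes: List[int], seed: int = 0):
    rng = random.Random(seed)
    return [init_layer(a, c, rng) for a, c in zip(sizes[:-1], sizes[1:])]

def forward(layers, x: Vector, out_activation) -> Vector:
    # tanh on every hidden layer, caller-chosen activation on the output layer
    for w, b in layers[:-1]:
        x = [math.tanh(v) for v in dense(x, w, b)]
    w, b = layers[-1]
    return [out_activation(v) for v in dense(x, w, b)]

policy_net = build_mlp([6, 64, 32, 3], seed=1)
value_net = build_mlp([6, 64, 32, 1], seed=2)

def policy_forward(obs: Vector) -> Vector:
    return forward(policy_net, obs, math.tanh)       # 3 outputs bounded to [-1, 1]

def value_forward(obs: Vector) -> float:
    return forward(value_net, obs, lambda v: v)[0]   # 1 linear output
```

A tanh output layer keeps each action component bounded, which suits scaling the output to a thrust or velocity-increment limit. The linear value head can represent unbounded returns.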

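Training hyperparameters
Parameter	Value
Learning rate $\alpha$	0.001
Moment-estimate decay rates	$\beta_1 = 0.9,\ \beta_2 = 0.999$
Maximum training episodes	4000
Discount factor	0.99
Clipping factor $\varepsilon$	0.2
Value-function loss coefficient $c_1$	0.5
Entropy coefficient $c_2$	0.01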
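The clipping factor ε and the coefficients c₁ and c₂ match the clipped surrogate objective of proximal policy optimization (PPO). The moment-estimate decay rates are presumably those of the Adam optimizer. The table does not name the algorithm, so reading it as PPO is an assumption. The sketch below computes the per-batch PPO loss and the discounted returns with the listed factor 0.99. `ppo_loss` and `discounted_returns` are hypothetical helpers, and the optimizer step is not shown.

```python
import math
from typing import List, Sequence

EPS_CLIP = 0.2   # clipping factor epsilon
C1 = 0.5         # value-function loss coefficient c1
C2 = 0.01        # entropy coefficient c2
GAMMA = 0.99     # discount factor

def discounted_returns(rewards: Sequence[float], gamma: float = GAMMA) -> List[float]:
    # G_t = r_t + gamma * G_{t+1}, accumulated backwards from the episode end
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

def ppo_loss(logp_new: Sequence[float], logp_old: Sequence[float],
             advantages: Sequence[float], values: Sequence[float],
             returns: Sequence[float], entropies: Sequence[float]) -> float:
    """Loss to minimize: -(L_clip - c1 * L_vf + c2 * S), averaged over the batch."""
    n = len(advantages)
    l_clip = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                          # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - EPS_CLIP), 1.0 + EPS_CLIP)
        l_clip += min(ratio * adv, clipped * adv)                  # pessimistic bound
    l_vf = sum((v - g) ** 2 for v, g in zip(values, returns))     # squared value error
    entropy = sum(entropies)
    return -(l_clip - C1 * l_vf + C2 * entropy) / n
```

Taking the minimum of the clipped and unclipped terms means a probability ratio beyond 1 ± 0.2 cannot increase the objective. This limits how far one update can move the policy.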

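Initial states for each target motion intention
Target intention	Initial state
Fixed-point following	$[0,\ [-50,-40],\ 0,\ 0,\ 0,\ 0]^T$
Hovering following	$[0,\ [-50,-40],\ 0,\ [-0.1,0.1],\ 0,\ 0]^T$
Oscillating following	$[0,\ [-50,-40],\ 0,\ 0,\ 0,\ [-0.1,0.1]]^T$
Coplanar fly-around	$[0,\ [-50,-40],\ 0,\ n y/2,\ 0,\ 0]^T$
Non-coplanar fly-around	$[0,\ [-50,-40],\ 0,\ n y/2,\ 0,\ \pm\sqrt{3}\, n y/2]^T$
Fly-by approach	$[[-21,-19],\ [-110,-90],\ 0,\ 0,\ -3 n x/2,\ 0]^T$
Hopping approach	$[0,\ [-55,-45],\ 0,\ 0,\ [-0.2,-0.1],\ 0]^T$
Spiral approach	$[0,\ [-55,-45],\ 0,\ [-0.1,0.1],\ [-0.2,-0.1],\ [-\sqrt{3}\, v_x,\ \sqrt{3}\, v_x]]^T$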
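Several of these initial states encode classical relative-motion conditions:

- ẋ₀ = n y₀/2 with x₀ = 0 gives a closed fly-around;
- adding ż₀ = ±√3 n y₀/2 tilts that fly-around out of plane;
- ẏ₀ = −3n x₀/2 gives a constant radial offset with steady along-track drift.

The sketch below checks these conditions with the Clohessy-Wiltshire (CW) closed-form solution. It assumes x is radial, y along-track and z cross-track, and that n is the mean motion of a circular reference orbit. The frame, the ordering [x, y, z, vx, vy, vz], the GEO mean motion `N_GEO` and the helper names are illustrative assumptions, not taken from the table.

```python
import math
from typing import List, Sequence

N_GEO = 7.2921e-5  # rad/s; assumed GEO mean motion, for illustration only

def cw_propagate(s0: Sequence[float], n: float, t: float) -> List[float]:
    """Closed-form CW solution; state [x, y, z, vx, vy, vz], x radial, y along-track."""
    x0, y0, z0, vx0, vy0, vz0 = s0
    s, c = math.sin(n * t), math.cos(n * t)
    x = (4 - 3 * c) * x0 + s / n * vx0 + 2 / n * (1 - c) * vy0
    y = 6 * (s - n * t) * x0 + y0 - 2 / n * (1 - c) * vx0 + (4 * s - 3 * n * t) / n * vy0
    z = c * z0 + s / n * vz0
    vx = 3 * n * s * x0 + c * vx0 + 2 * s * vy0
    vy = 6 * n * (c - 1) * x0 - 2 * s * vx0 + (4 * c - 3) * vy0
    vz = -n * s * z0 + c * vz0
    return [x, y, z, vx, vy, vz]

def coplanar_flyaround(y0: float, n: float = N_GEO) -> List[float]:
    # vx0 = n*y0/2 cancels the secular drift: a closed 2:1 ellipse
    return [0.0, y0, 0.0, n * y0 / 2, 0.0, 0.0]

def noncoplanar_flyaround(y0: float, sign: int = 1, n: float = N_GEO) -> List[float]:
    # vz0 = +/- sqrt(3)*n*y0/2 tilts the ellipse into a circle of radius |y0|
    return [0.0, y0, 0.0, n * y0 / 2, 0.0, sign * math.sqrt(3) * n * y0 / 2]

def flyby(x0: float, y0: float, n: float = N_GEO) -> List[float]:
    # vy0 = -3*n*x0/2 holds x constant and drifts y at rate -3*n*x0/2
    return [x0, y0, 0.0, 0.0, -1.5 * n * x0, 0.0]
```

With x₀ ≈ −20 in the fly-by case, the along-track drift rate is positive. A target starting near y₀ ≈ −100 therefore moves toward, and then past, the origin.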

Indicators for each target intention
Target intention	$T_s$ /h	$\Delta v$ /(m·s⁻¹)	$\rho$ /km	$\alpha$ /(°)
Fixed-point following	12.5±1.13	1.48±0.15	102.48±1.54	22.13±0.26
Hovering following	11.5±0.70	1.58±0.11	102.52±1.49	15.18±0.12
Oscillating following	10.5±0.58	1.43±0.10	103.17±1.89	8.62±0.06
Coplanar fly-around	21±3.64	2.51±0.40	102.31±1.96	47.73±0.72
Non-coplanar fly-around	23.5±2.78	2.72±0.28	102.91±1.95	40.38±0.43
Fly-by approach	12±0.59	1.34±0.10	104.19±2.54	44.30±0.24
Hopping approach	16.5±1.94	2.04±0.28	104.71±2.70	38.71±0.46
Spiral approach	16.5±2.60	2.01±0.30	104.77±2.93	42.03±0.47

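Success rate under fixed-intention training and self-play training
Target intention	Fixed-intention training /%	Self-play training /%
Fixed-point following	0	100
Hovering following	18.2	100
Oscillating following	86.1	100
Coplanar fly-around	0	91.8
Non-coplanar fly-around	26.6	84.1
Fly-by approach	0	97.7
Hopping approach	0	98.1
Spiral approach	0	97.7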
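The table can be summarized by the unweighted mean success rate over the eight intentions, computed directly from the listed values. Weighting every intention equally is an assumption, because the number of test runs per intention is not given here.

```python
# Success rates (%) copied from the table, in row order
FIXED_INTENT = [0.0, 18.2, 86.1, 0.0, 26.6, 0.0, 0.0, 0.0]
SELF_PLAY = [100.0, 100.0, 100.0, 91.8, 84.1, 97.7, 98.1, 97.7]

def mean(values):
    # Unweighted arithmetic mean: every intention counts equally (assumption)
    return sum(values) / len(values)

print(f"fixed-intention training: {mean(FIXED_INTENT):.1f} %")  # 16.4 %
print(f"self-play training:       {mean(SELF_PLAY):.1f} %")     # 96.2 %
```

Self-play is at least as successful as fixed-intention training for every single intention. Its lowest value, 84.1 %, is for the non-coplanar fly-around.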


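Velocity increment $\Delta v$ of the spacecraft and the space target
Target intention	Spacecraft $\Delta v$ /(m·s⁻¹)	Space target $\Delta v$ /(m·s⁻¹)
Fixed-point following	6.43	6.82
Hovering following	6.92	7.10
Oscillating following	5.96	6.99
Coplanar fly-around	7.34	8.44
Non-coplanar fly-around	6.98	7.70
Fly-by approach	2.89	3.15
Hopping approach	5.91	6.63
Spiral approach	7.40	7.61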
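Read side by side, the two columns show that the spacecraft spends less Δv than the target for every intention. The short check below computes the per-intention difference and ratio from the tabulated values. The dictionary keys and the `compare` helper are illustrative names.

```python
# Velocity increments (m/s) copied from the table: (spacecraft, space target)
DELTA_V = {
    "fixed-point following": (6.43, 6.82),
    "hovering following": (6.92, 7.10),
    "oscillating following": (5.96, 6.99),
    "coplanar fly-around": (7.34, 8.44),
    "non-coplanar fly-around": (6.98, 7.70),
    "fly-by approach": (2.89, 3.15),
    "hopping approach": (5.91, 6.63),
    "spiral approach": (7.40, 7.61),
}

def compare(table):
    """Map each intention to (target minus spacecraft dv, spacecraft/target ratio)."""
    return {k: (tgt - sc, sc / tgt) for k, (sc, tgt) in table.items()}

for name, (diff, ratio) in compare(DELTA_V).items():
    print(f"{name:24s} difference = {diff:4.2f} m/s, ratio = {ratio:.2f}")
```

The largest gap, about 1.10 m/s, occurs for the coplanar fly-around.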
