Space target rendezvous sequence planning via pointer networks

doi:10.7527/S1000-6893.2023.28698

Flight Mechanics and Guidance & Control


Jiacheng ZHANG (张嘉城)¹,², Yuehe ZHU (朱阅訸)¹,², Yazhong LUO (罗亚中)¹,²

1. College of Aerospace Science, National University of Defense Technology, Changsha 410073, China
2. Hunan Key Laboratory of Intelligent Planning and Simulation for Aerospace Missions, Changsha 410073, China

Received: 2023-04-11 Revised: 2023-04-22 Accepted: 2023-05-06 Published: 2023-08-15 Online: 2023-05-12
Contact: Yazhong LUO E-mail: luoyz@nudt.edu.cn
Supported by:
National Natural Science Foundation of China (12125207)

Abstract:

Traversal rendezvous mission planning, in which a single spacecraft visits multiple space targets, is a highly complex mixed-integer optimization problem that couples the combinatorial optimization of the top-level rendezvous sequence with the continuous optimization of the lower-level flight trajectories. Existing methods optimize the discrete and continuous variables together, which is computationally inefficient and rarely yields the optimal sequence. We propose a learning-based sequence planning method built on pointer networks that can rapidly obtain the optimal or a near-optimal sequence. First, a neural network model for multi-target traversal rendezvous sequence planning is constructed to serve as the decision agent. Second, an unsupervised learning method based on the asynchronous advantage actor-critic algorithm is proposed, which avoids the computational cost of generating labeled training data. Finally, an approximation method that quickly estimates the actual transfer cost is embedded in training to speed up the reward calculation. Case studies show that the proposed training method markedly improves training efficiency and that the trained agent rapidly finds the optimal sequence with a probability above 88.7%.

Key words: aerospace mission planning, rendezvous sequence planning, moving target traveling salesman problem, combinatorial optimization, pointer network, reinforcement learning
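As a rough illustration of how a pointer-style decoder turns target embeddings into a visiting order, the sketch below scores every target with additive attention, masks the targets already visited, and greedily points at the most likely remaining one. It is not the authors' model: the weights are random rather than trained, the LSTM decoder state is replaced by the embedding of the last selected target, and all function and variable names are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def pointer_scores(enc, dec, W_ref, W_q, v):
    """Additive attention scores u_j = v . tanh(W_ref e_j + W_q d)."""
    q = matvec(W_q, dec)
    scores = []
    for e in enc:
        r = matvec(W_ref, e)
        scores.append(sum(vi * math.tanh(ri + qi) for vi, ri, qi in zip(v, r, q)))
    return scores


def masked_softmax(scores, visited, temperature=1.0):
    """Softmax over unvisited targets only; visited targets get probability 0."""
    live = [s / temperature for j, s in enumerate(scores) if j not in visited]
    m = max(live)  # subtract the maximum for numerical stability
    exps = [0.0 if j in visited else math.exp(s / temperature - m)
            for j, s in enumerate(scores)]
    z = sum(exps)
    return [x / z for x in exps]


def greedy_sequence(enc, params, temperature=1.0):
    """Decode a visiting order by pointing at the most likely unvisited target."""
    W_ref, W_q, v = params
    visited, order = set(), []
    dec = [0.0] * len(enc[0])  # hypothetical start query
    while len(order) < len(enc):
        scores = pointer_scores(enc, dec, W_ref, W_q, v)
        probs = masked_softmax(scores, visited, temperature)
        j = max(range(len(enc)), key=probs.__getitem__)
        order.append(j)
        visited.add(j)
        dec = enc[j]  # stand-in for the LSTM decoder state (assumption)
    return order


random.seed(0)
n, d, h = 6, 4, 8  # targets, embedding size, attention size (toy values)
enc = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
params = (
    [[random.gauss(0, 1) for _ in range(d)] for _ in range(h)],
    [[random.gauss(0, 1) for _ in range(d)] for _ in range(h)],
    [random.gauss(0, 1) for _ in range(h)],
)
order = greedy_sequence(enc, params)
print(order)  # a permutation of the target indices 0..5
```

The mask is what makes the output a valid traversal: each target can be pointed at only once, so the decoded order is always a permutation. During training one would typically sample from `probs` instead of taking the maximum, which is where a softmax temperature becomes relevant.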

CLC number: V448

Jiacheng ZHANG, Yuehe ZHU, Yazhong LUO. Space target rendezvous sequence planning via pointer networks[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(15): 528698.

Figures/Tables (13)


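Table 1. Value ranges of the target dynamical properties

Case	Parameter	Minimum	Maximum
Case 1	Initial position, x-component x₀/m	0	100
Case 1	Initial position, y-component y₀/m	0	100
Case 1	Velocity, x-component v_{x,0}/(m·s⁻¹)	1.5	5
Case 1	Velocity, y-component v_{y,0}/(m·s⁻¹)	1.5	5
Case 1	Transfer duration between two targets Δt/s	0	10
Case 2	Semi-major axis a/AU	2.4	3
Case 2	Eccentricity e	0.1	0.2
Case 2	Orbital inclination I₀/(°)	5	15
Case 2	Right ascension of ascending node Ω₀/(°)	0	360
Case 2	Argument of periapsis ω₀/(°)	0	360
Case 2	True anomaly θ₀/(°)	0	360
Case 2	Transfer duration between two targets Δt/d	300	1200
Case 3	Semi-major axis a/km	6990	7280
Case 3	Eccentricity e	0	0.02
Case 3	Orbital inclination I₀/(°)	80	85
Case 3	Right ascension of ascending node Ω₀/(°)	0	360
Case 3	Argument of periapsis ω₀/(°)	0	360
Case 3	True anomaly θ₀/(°)	0	360
Case 3	Transfer duration between two targets Δt/d	30	180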
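To make the Case 1 ranges concrete, the sketch below draws one planar moving target and one transfer duration from the tabulated intervals and propagates the target's position. Two things here are assumptions rather than statements from the source: that values are drawn uniformly within each range, and that each target moves with constant velocity. The names `CASE1_RANGES`, `sample_target`, and `position` are hypothetical.

```python
import random

# Case 1 ranges copied from Table 1 (SI units: m, m/s, s).
CASE1_RANGES = {
    "x0": (0.0, 100.0),
    "y0": (0.0, 100.0),
    "vx0": (1.5, 5.0),
    "vy0": (1.5, 5.0),
}
DT_RANGE = (0.0, 10.0)  # transfer duration between two targets, s


def sample_target(rng):
    """Draw one target; uniform sampling is an assumption."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in CASE1_RANGES.items()}


def position(target, t):
    """Position at time t under assumed constant-velocity motion."""
    return (target["x0"] + target["vx0"] * t,
            target["y0"] + target["vy0"] * t)


rng = random.Random(0)
target = sample_target(rng)
dt = rng.uniform(*DT_RANGE)
print(target, position(target, dt))
```

Because the targets move, the cost of going from one target to the next depends on when the transfer happens, which is what separates this problem from a static traveling salesman problem.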


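Table 3. Parameters of the ant colony algorithm

Parameter	Value	Parameter	Value
Number of ants N	100	Heuristic weight β	2
Maximum number of iterations G	500	Global pheromone evaporation factor ρ	0.1
Pheromone constant Q	20	Local pheromone evaporation factor γ	0.1
Pheromone factor α	2	Selection probability in solution construction q₀	0.1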
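The parameter set in Table 3 (separate local and global evaporation factors plus a selection probability q₀) resembles the ant colony system variant of ant colony optimization. The sketch below shows how such parameters are commonly used in that variant. It assumes that q₀ is the probability of greedily taking the best edge, that the pheromone deposit is Q divided by the tour cost, and that the heuristic matrix `eta` is static. None of these details is confirmed by the source, and the function names are hypothetical.

```python
import random

ALPHA, BETA, Q0 = 2.0, 2.0, 0.1       # pheromone factor, heuristic weight, q0
RHO, GAMMA, Q_CONST = 0.1, 0.1, 20.0  # global / local evaporation, constant Q


def choose_next(current, unvisited, tau, eta, rng):
    """Pseudo-random proportional rule: exploit with probability Q0, else sample."""
    weights = {j: (tau[current][j] ** ALPHA) * (eta[current][j] ** BETA)
               for j in unvisited}
    if rng.random() < Q0:
        return max(weights, key=weights.get)
    pick, acc = rng.random() * sum(weights.values()), 0.0
    for j, w in weights.items():
        acc += w
        if acc >= pick:
            return j
    return j  # guard against floating-point rounding


def local_update(tau, i, j, tau0):
    """Pull the pheromone on a just-used edge toward its initial level tau0."""
    tau[i][j] = (1.0 - GAMMA) * tau[i][j] + GAMMA * tau0


def global_update(tau, best_tour, best_cost):
    """Reinforce the edges of the best tour found so far."""
    deposit = Q_CONST / best_cost
    for i, j in zip(best_tour, best_tour[1:]):
        tau[i][j] = (1.0 - RHO) * tau[i][j] + RHO * deposit


rng = random.Random(0)
tau = [[1.0] * 4 for _ in range(4)]
eta = [[1.0 / (1 + abs(i - j)) for j in range(4)] for i in range(4)]
tour, unvisited = [0], [1, 2, 3]
while unvisited:
    nxt = choose_next(tour[-1], unvisited, tau, eta, rng)
    local_update(tau, tour[-1], nxt, tau0=1.0)
    unvisited.remove(nxt)
    tour.append(nxt)
print(tour)
```

For moving targets the heuristic value of an edge would depend on the departure time, so a practical implementation would recompute it during tour construction instead of reading it from a fixed matrix.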


Table 5. Multi-spacecraft rendezvous sequences optimized with the assistance of the sequence-planning agent

Spacecraft No.	Rendezvous target order	Δv/(km·s⁻¹)
1	2, 78, 3, 19, 14, 37, 18, 41, 62, 80, 106, 13, 121, 0, 107, 10, 9, 34, 63, 25, 27, 61, 101, 94, 90, 83, 28, 77, 73	3.871
2	65, 112, 40, 87, 66, 51, 97, 17, 8, 22, 29, 46, 115, 92, 33, 88, 109, 55, 21, 93, 75, 89	3.299
3	31, 47, 43, 98, 52, 111, 57, 16, 15, 58, 104, 42, 74, 7, 6, 24, 95, 35, 39	3.195
4	69, 122, 103, 117, 91, 118, 84, 100, 48, 82, 60, 85, 99, 5, 120, 119, 54	2.029
5	44, 86, 59, 4, 105, 102, 36, 23, 68, 67, 114, 76	1.737
6	30, 49, 20, 116, 32, 81, 72, 64, 108, 79, 50, 12, 53, 56, 113	1.437
7	71, 96, 45, 1, 38, 11, 110, 26, 70	0.796
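A quick consistency check on the tabulated sequences can be scripted as below. The sequences and Δv values are copied from the table. The check itself (that the seven sequences together visit every target ID from 0 to 122 exactly once, and the summed Δv) is an added illustration, and the variable names are hypothetical.

```python
# Rendezvous sequences and per-spacecraft delta-v (km/s) copied from the table.
sequences = {
    1: ([2, 78, 3, 19, 14, 37, 18, 41, 62, 80, 106, 13, 121, 0, 107, 10, 9,
         34, 63, 25, 27, 61, 101, 94, 90, 83, 28, 77, 73], 3.871),
    2: ([65, 112, 40, 87, 66, 51, 97, 17, 8, 22, 29, 46, 115, 92, 33, 88,
         109, 55, 21, 93, 75, 89], 3.299),
    3: ([31, 47, 43, 98, 52, 111, 57, 16, 15, 58, 104, 42, 74, 7, 6, 24, 95,
         35, 39], 3.195),
    4: ([69, 122, 103, 117, 91, 118, 84, 100, 48, 82, 60, 85, 99, 5, 120,
         119, 54], 2.029),
    5: ([44, 86, 59, 4, 105, 102, 36, 23, 68, 67, 114, 76], 1.737),
    6: ([30, 49, 20, 116, 32, 81, 72, 64, 108, 79, 50, 12, 53, 56, 113], 1.437),
    7: ([71, 96, 45, 1, 38, 11, 110, 26, 70], 0.796),
}

# Flatten all visits and verify that the sequences partition the target set.
all_ids = [t for seq, _ in sequences.values() for t in seq]
is_partition = sorted(all_ids) == list(range(len(all_ids)))
total_dv = sum(dv for _, dv in sequences.values())

print(len(all_ids), is_partition)  # 123 True
print(f"{total_dv:.3f} km/s")      # 16.364 km/s
```

The seven sequences assign each of the 123 targets to exactly one spacecraft, and the per-spacecraft Δv values sum to 16.364 km/s.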



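Hyperparameter	Case 1	Case 2	Case 3
Number of feature attributes	4	6	9
One-dimensional convolution kernel size	1	3	3
Feature embedding dimension	128	512	512
LSTM hidden-layer dimension	128	512	512
Attention matching dimension	128	512	512
Number of targets in training	15	15	15
Total number of environment-interaction episodes	2000	5000	8000
Optimizer learning rate	0.001	0.001	0.001
Initial softmax distillation temperature	5.0	5.0	5.0
Number of targets in testing	15-20	15-20	15-20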

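Performance metric	Case 1	Case 2	Case 3
Model training time/s	23.62	4.77×10³	3.09×10⁴
Actor planning time/ms	0.255	3.74	5.15
Ant colony algorithm convergence time/s	8.31	96.13	312.93
Probability that the Actor finds the optimal sequence/%	96.15	92.38	88.71
Probability that the Actor finds a near-optimal sequence/%	100.00	96.31	92.11
Actor failure probability/%	0	0.82	1.92
Critic computation time/ms	0.183	2.17	2.89
Critic error rate/%	7.21	7.33	9.37
Critic failure probability/%	13.82	16.35	12.55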
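The timing rows above can be turned into two rough figures: the per-instance speed ratio between the ant colony algorithm and the trained Actor, and the number of planning instances after which the one-off training time is recovered. The sketch below does this arithmetic from the tabulated values only. It assumes the two planning times are directly comparable per instance, which the source does not state, and the break-even count is an added illustration, not a result reported by the authors.

```python
import math

cases = ["Case 1", "Case 2", "Case 3"]
train_s = [23.62, 4.77e3, 3.09e4]  # model training time, s
actor_ms = [0.255, 3.74, 5.15]     # Actor planning time, ms
aco_s = [8.31, 96.13, 312.93]      # ant colony convergence time, s

# Ratio of ant colony time to Actor time for a single instance.
speedup = [a / (m / 1000.0) for a, m in zip(aco_s, actor_ms)]

# Smallest number of instances for which training plus Actor planning
# takes less total time than running the ant colony algorithm each time.
break_even = [math.ceil(t / (a - m / 1000.0))
              for t, a, m in zip(train_s, aco_s, actor_ms)]

for name, s, b in zip(cases, speedup, break_even):
    print(f"{name}: speed ratio about {s:.0f}, break-even after {b} instances")
```

Under these assumptions the Actor is on the order of 10⁴ times faster per instance, and the training cost is recovered after 3, 50, and 99 planning instances for Cases 1 to 3 respectively.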

