Acta Aeronautica et Astronautica Sinica > 2023, Vol. 44 Issue (15): 528698-528698   doi: 10.7527/S1000-6893.2023.28698

Traversal rendezvous sequence planning for space targets based on pointer networks

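Jiacheng ZHANG1,2, Yuehe ZHU1,2, Yazhong LUO1,2

  1. College of Aerospace Science, National University of Defense Technology, Changsha 410073, China
  2. Hunan Key Laboratory of Intelligent Planning and Simulation for Aerospace Missions, Changsha 410073, China
  • Received: 2023-04-11 Revised: 2023-04-22 Accepted: 2023-05-06 Published: 2023-08-15 Online: 2023-05-12
  • Corresponding author: Yazhong LUO E-mail: luoyz@nudt.edu.cn
  • Funding:
    National Natural Science Foundation of China (12125207)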

Space target rendezvous sequence planning via pointer networks

Jiacheng ZHANG1,2, Yuehe ZHU1,2, Yazhong LUO1,2

  1. College of Aerospace Science, National University of Defense Technology, Changsha 410073, China
  2. Hunan Key Laboratory of Intelligent Planning and Simulation for Aerospace Missions, Changsha 410073, China
  • Received: 2023-04-11 Revised: 2023-04-22 Accepted: 2023-05-06 Published: 2023-08-15 Online: 2023-05-12
  • Contact: Yazhong LUO E-mail: luoyz@nudt.edu.cn
  • Supported by:
    National Natural Science Foundation of China (12125207)

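Abstract:

Mission planning for a single spacecraft performing traversal rendezvous with multiple targets is a highly complex mixed-integer optimization problem that couples top-level combinatorial optimization of the rendezvous sequence with lower-level continuous optimization of the flight trajectories. Existing methods optimize the discrete and continuous variables together, which is computationally inefficient and makes the optimal sequence hard to find. This paper proposes a pointer-network-based method for multi-target traversal rendezvous sequence planning that can obtain the optimal sequence quickly. First, a neural network model for multi-target traversal rendezvous sequence planning is constructed to serve as the decision agent for sequence planning. Second, an unsupervised learning method based on the asynchronous advantage actor-critic algorithm is proposed, which avoids the computational cost of solving for labeled training data. Finally, to make the reward function cheaper to evaluate, an approximation method that quickly estimates the actual transfer cost is embedded in training. Case studies show that the proposed training method significantly improves training efficiency, and that the trained decision agent can quickly find the optimal sequence with an accuracy of more than 88.7%.

Keywords: space mission planning, rendezvous sequence planning, moving-target traveling salesman problem, combinatorial optimization, pointer network, reinforcement learning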

Abstract:

Traversal rendezvous mission planning of multiple space targets for a single spacecraft is a highly complex mixed-integer programming problem that involves combinatorial optimization of the top-level rendezvous sequence and continuous optimization of the lower-level flight trajectories. Existing methods that optimize all discrete and continuous variables together are inefficient and often fail to find the optimal sequence. We propose a learning-based method that uses pointer networks to obtain the near-optimal sequence efficiently. First, a neural network model for multiple-space-target traversal rendezvous planning is constructed as the decision agent for sequencing. Second, an unsupervised learning method based on the asynchronous advantage actor-critic algorithm is proposed to avoid the high computational cost of obtaining training labels. Finally, an estimation method that rapidly approximates the actual transfer cost is embedded in the training process to make reward calculation more efficient. Case studies show that the proposed training method is efficient, and that the well-trained agent can rapidly predict the optimal sequence with a probability of more than 88.7%.

Key words: aerospace mission planning, rendezvous sequence planning, moving target traveling salesman problem, combinatorial optimization, pointer network, reinforcement learning
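
As a rough illustration of the pointer-network idea named in the abstract, the sketch below shows how a pointer-style attention step can emit a visiting order over a fixed set of targets while masking those already visited. It is not the authors' model: the weights are random and untrained, the target embeddings are random placeholders rather than orbital features, the decoder state is a simple stand-in for a recurrent decoder, and the names `pointer_probs` and `greedy_sequence` are hypothetical. The actor-critic training and the transfer-cost approximation described in the abstract are not reproduced here.

```python
import math
import random


def pointer_probs(enc, dec_state, W1, W2, v, visited):
    """One pointer-attention step: a probability for each input target."""
    def matvec(W, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]

    proj_dec = matvec(W2, dec_state)
    scores = []
    for e, seen in zip(enc, visited):
        if seen:
            # Visited targets are masked so they can never be chosen again.
            scores.append(-math.inf)
            continue
        proj_enc = matvec(W1, e)
        # Additive attention score: u_j = v . tanh(W1 e_j + W2 d)
        scores.append(sum(vk * math.tanh(a + b)
                          for vk, a, b in zip(v, proj_enc, proj_dec)))
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def greedy_sequence(enc, W1, W2, v):
    """Decode a full visiting order by always taking the most probable target."""
    n, dim = len(enc), len(enc[0])
    visited = [False] * n
    # Assumption: start from the mean embedding, then feed back the last
    # chosen target's embedding in place of a recurrent decoder state.
    dec_state = [sum(e[k] for e in enc) / n for k in range(dim)]
    order = []
    for _ in range(n):
        probs = pointer_probs(enc, dec_state, W1, W2, v, visited)
        j = max(range(n), key=lambda i: probs[i])
        order.append(j)
        visited[j] = True
        dec_state = enc[j]
    return order


random.seed(0)
n_targets, dim = 6, 8
enc = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_targets)]
W1 = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
W2 = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
v = [random.gauss(0, 1) for _ in range(dim)]

sequence = greedy_sequence(enc, W1, W2, v)
print(sequence)  # a permutation of the target indices 0..5
```

Because the output distribution ranges over the input positions themselves, the same network can in principle handle different numbers of targets, and the mask guarantees that each target appears exactly once in the decoded sequence.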

CLC number: