Acta Aeronautica et Astronautica Sinica, 2020, Vol. 41, Issue 5: 323630   doi: 10.7527/S1000-6893.2019.23630

Allocation of composite mode on-orbit service resource based on improved DQN

LIU Bingyan1,2, YE Xiongbing1, ZHOU Chifei1, LIU Biliu2

  1. Academy of Military Sciences, Beijing 100091, China;
  2. 32032 Troops, Beijing 100094, China
  • Received: 2019-11-04 Revised: 2019-11-28 Published: 2020-05-15 Online: 2020-01-10
  • Corresponding author: LIU Bingyan E-mail: bingyanl@outlook.com

Abstract: To address the nonlinear multi-objective optimization problem of allocating resources before on-orbit servicing begins, an on-orbit resource allocation model for the composite service mode is constructed, and an on-orbit service resource allocation method is proposed based on a Deep Q-Network (DQN) improved for convergence and stability. The method handles a composite service mode that contains both "one to many" and "many to one" assignments. Provided the expected success rate is met, it gives priority to important service objects while balancing the overall benefit of the allocation against overall energy efficiency, so that tasks are completed as quickly as possible at the expected success rate and with less resource input. Simulation experiments show that the improved DQN method can autonomously allocate spacecraft resources before task execution according to the importance of the service objects. It converges quickly with low training error and shows a clear advantage in optimizing allocation benefit and overall energy consumption.

Key words: on-orbit servicing, integer programming, resource allocation, deep reinforcement learning, neural network
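
The abstract does not give the allocation model itself, so the following is only a toy illustration of what a composite-mode allocation and its success-rate constraint might look like. It is not the authors' formulation. The 0/1 assignment matrix, the probability table `p`, the thresholds in `required`, and the assumption that service attempts on a target succeed independently are all hypothetical.

```python
from math import prod


def target_success(assign, p):
    """Success probability per target.

    assign[i][j] is 1 if spacecraft i serves target j, else 0.
    p[i][j] is the (hypothetical) success probability of that pairing.
    Assumption: attempts on the same target succeed independently, so a
    target fails only if every spacecraft assigned to it fails.
    """
    n_targets = len(p[0])
    return [
        1.0 - prod(1.0 - p[i][j] for i in range(len(assign)) if assign[i][j])
        for j in range(n_targets)
    ]


def is_feasible(assign, p, required):
    """True if every target reaches its required success rate."""
    return all(s >= r for s, r in zip(target_success(assign, p), required))


# Hypothetical data: 2 servicing spacecraft (rows), 3 targets (columns).
p = [[0.9, 0.6, 0.7],
     [0.5, 0.8, 0.7]]
required = [0.85, 0.75, 0.90]

# Composite mode: spacecraft 0 serves targets 0 and 2 ("one to many"),
# and target 2 is served by both spacecraft ("many to one").
assign = [[1, 0, 1],
          [0, 1, 1]]

print(target_success(assign, p))
print(is_feasible(assign, p, required))
```

In this sketch, dropping the second spacecraft from target 2 would save resources but leave that target below its threshold. A learned policy such as the improved DQN described in the abstract has to balance this kind of trade-off.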
