Electronics and Electrical Engineering and Control

Allocation of composite mode on-orbit service resource based on improved DQN

  • LIU Bingyan,
  • YE Xiongbing,
  • ZHOU Chifei,
  • LIU Biliu
  • 1. Academy of Military Sciences, Beijing 100091, China;
    2. 32032 Troops, Beijing 100094, China

Received date: 2019-11-04

Revised date: 2019-11-28

Online published: 2020-01-10

Abstract

To solve the nonlinear multi-objective optimization problem that precedes on-orbit servicing, an on-orbit service resource allocation model under a composite service mode is constructed, and an allocation method based on a Deep Q Network (DQN) with improved convergence and stability is proposed. The approach handles composite service patterns that combine "one-to-many" and "many-to-one" relationships. It prioritizes the allocation of important service objects while satisfying the expected success rate, and simultaneously accounts for the comprehensive benefit of resource allocation and overall energy-consumption efficiency, achieving the combined goal of completing the task efficiently, at the expected success rate, and with less resource input. Simulation results show that the improved DQN method can autonomously allocate spacecraft resources according to the importance of service objects; it converges quickly, trains with low error, and shows clear comparative advantages in allocation benefit and overall energy consumption.
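
The abstract does not spell out which convergence and stability improvements are applied to the DQN. As a minimal illustrative sketch only, and not the authors' implementation, the code below shows one standard stabilization of this kind: the double-Q target update (Double DQN) with a periodically synchronized target network and experience replay, applied to a generic allocation agent. The class names, network sizes, hyperparameters, and the assumption that an allocation decision can be encoded as a flat state vector with one discrete action per candidate servicer-to-target assignment are all hypothetical.

```python
# Illustrative sketch only: a Double DQN agent for a toy "assign servicer to
# target" allocation problem. All names and hyperparameters are hypothetical
# stand-ins, not the paper's implementation.
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Maps an allocation state vector to Q-values, one per candidate assignment."""
    def __init__(self, n_state, n_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_state, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_action),
        )

    def forward(self, x):
        return self.net(x)


class DoubleDQNAgent:
    def __init__(self, n_state, n_action, gamma=0.99, lr=1e-3):
        self.online = QNet(n_state, n_action)
        self.target = QNet(n_state, n_action)
        self.target.load_state_dict(self.online.state_dict())
        self.opt = torch.optim.Adam(self.online.parameters(), lr=lr)
        self.buffer = deque(maxlen=10_000)  # experience replay buffer
        self.gamma, self.n_action = gamma, n_action

    def act(self, state, eps):
        """Epsilon-greedy action selection over candidate assignments."""
        if random.random() < eps:
            return random.randrange(self.n_action)
        with torch.no_grad():
            q = self.online(torch.as_tensor(state, dtype=torch.float32))
        return int(q.argmax())

    def remember(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, float(done)))

    def learn(self, batch_size=64):
        if len(self.buffer) < batch_size:
            return
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s2, done = map(np.array, zip(*batch))
        s = torch.as_tensor(s, dtype=torch.float32)
        a = torch.as_tensor(a, dtype=torch.int64).unsqueeze(1)
        r = torch.as_tensor(r, dtype=torch.float32)
        s2 = torch.as_tensor(s2, dtype=torch.float32)
        done = torch.as_tensor(done, dtype=torch.float32)
        # Double DQN target: the online net selects the next action, the
        # target net evaluates it, which reduces the Q-value overestimation
        # of vanilla DQN and improves training stability.
        with torch.no_grad():
            a2 = self.online(s2).argmax(1, keepdim=True)
            q2 = self.target(s2).gather(1, a2).squeeze(1)
            y = r + self.gamma * (1.0 - done) * q2
        q = self.online(s).gather(1, a).squeeze(1)
        loss = nn.functional.smooth_l1_loss(q, y)  # Huber loss for stability
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

    def sync_target(self):
        """Periodically copy online weights into the frozen target network."""
        self.target.load_state_dict(self.online.state_dict())
```

In a full allocation loop, each transition (state, action, reward, next state, done) would be stored via agent.remember(...), with the reward combining allocation-benefit and energy-consumption terms as the abstract describes, and agent.sync_target() would be called every few hundred learning steps.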

Cite this article

LIU Bingyan, YE Xiongbing, ZHOU Chifei, LIU Biliu. Allocation of composite mode on-orbit service resource based on improved DQN[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(5): 323630. DOI: 10.7527/S1000-6893.2019.23630
