Electronics and Electrical Engineering and Control


Heuristic enhanced reinforcement learning method for large-scale multi-debris active removal mission planning

  • YANG Jianan ,
  • HOU Xiaolei ,
  • HU Yu Hen ,
  • LIU Yong ,
  • PAN Quan ,
  • FENG Qian
  • 1. College of Automation, Northwestern Polytechnical University, Xi'an 710129, China;
    2. Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison 53706, USA

Received date: 2020-06-02

  Revised date: 2020-09-12

  Online published: 2020-10-10

Supported by

National Natural Science Foundation of China (61703343, 61790552); Natural Science Foundation of Shaanxi Province (2018JQ6070); Fundamental Research Funds for the Central Universities (3102018JCC003)


Cite this article

YANG Jianan, HOU Xiaolei, HU Yu Hen, LIU Yong, PAN Quan, FENG Qian. Heuristic enhanced reinforcement learning method for large-scale multi-debris active removal mission planning[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(4): 524354. DOI: 10.7527/S1000-6893.2020.24354

Abstract

The vigorous development of the space industry has made space debris, especially debris in low Earth orbit, a non-negligible threat to space missions. Given the urgency and cost of debris removal, Active multi-Debris Removal (ADR) technology in low Earth orbit has become an indispensable means of alleviating this situation. Aiming at the large-scale multi-debris active removal mission planning problem, a Reinforcement Learning (RL) optimization method is first proposed based on the maximal-reward model of mission planning, and the state, action, and reward function of the problem are defined according to the RL framework. A specialized improved Monte Carlo Tree Search (MCTS) algorithm is then presented based on efficient heuristic factors, using MCTS as its kernel and incorporating efficient heuristic operators and an RL iteration process. Finally, the effectiveness of the proposed algorithm is verified on the complete dataset of the Iridium 33 debris cloud. Compared with related MCTS variants and a greedy heuristic algorithm, the proposed method obtains better planning results more efficiently on the test dataset, achieving a good balance between exploration and exploitation.
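The abstract describes the algorithm only at a high level. As a rough illustration of how a heuristic factor can be folded into MCTS selection, the Python sketch below adds a visit-decaying heuristic bias to the standard UCT score; the node fields, the `heuristic` function, and all parameter values are illustrative assumptions, not the paper's actual operators, orbital transfer costs, or reward model.

```python
import math
import random

# Minimal sketch of heuristic-enhanced MCTS selection, assuming a generic
# heuristic-weighted UCT rule; the paper's actual heuristic operators and
# reward model are not reproduced here.

class Node:
    def __init__(self, parent=None, action=None):
        self.parent = parent      # parent node in the search tree
        self.action = action      # debris index chosen at this step (hypothetical)
        self.children = []
        self.visits = 0
        self.total_reward = 0.0   # accumulated mission reward from rollouts

def uct_with_heuristic(node, heuristic, c=1.4, beta=0.5):
    """Pick the child maximizing UCT plus a decaying heuristic bias.

    `heuristic(action)` stands in for an efficient heuristic factor
    (e.g. a removal-priority / transfer-cost ratio per debris); its
    influence fades as visit counts grow, so the search gradually shifts
    from heuristic guidance toward learned value estimates.
    """
    def score(child):
        if child.visits == 0:
            return float("inf")   # always try unvisited children first
        exploit = child.total_reward / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        bias = beta * heuristic(child.action) / (1.0 + child.visits)
        return exploit + explore + bias
    return max(node.children, key=score)

def backup(node, reward):
    """Propagate a rollout reward back to the root, averaging RL-style."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent

# Toy usage: 5 candidate debris, heuristic favouring low-index targets.
root = Node()
root.children = [Node(parent=root, action=a) for a in range(5)]
for _ in range(200):
    child = uct_with_heuristic(root, heuristic=lambda a: 1.0 / (1 + a))
    backup(child, reward=random.random() * (5 - child.action))
best = max(root.children, key=lambda ch: ch.visits)
print("most visited debris:", best.action)
```

In a full planner the selected child would itself be expanded and rolled out over the remaining debris sequence; the decaying bias term is one common way to blend a domain heuristic with exploration, chosen here purely for illustration.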
