Acta Aeronautica et Astronautica Sinica > 2021, Vol. 42 Issue (4): 524354-524354   doi: 10.7527/S1000-6893.2020.24354

Optimization method for large-scale ADR missions based on heuristic reinforcement learning

杨家男1, 侯晓磊1, HU Yu Hen2, 刘勇1, 潘泉1, 冯乾1   

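  1. College of Automation, Northwestern Polytechnical University, Xi'an 710129;
    2. Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison 53706, USA
  • Received: 2020-06-02 Revised: 2020-09-12 Published: 2020-10-10
  • Corresponding author: YANG Jianan E-mail: yang_jia_nan@mail.nwpu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61703343, 61790552); Natural Science Foundation of Shaanxi Province (2018JQ6070); Fundamental Research Funds for the Central Universities (3102018JCC003)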

Heuristic enhanced reinforcement learning method for large-scale multi-debris active removal mission planning

YANG Jianan1, HOU Xiaolei1, HU Yu Hen2, LIU Yong1, PAN Quan1, FENG Qian1   

  1. College of Automation, Northwestern Polytechnical University, Xi'an 710129, China;
    2. Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison 53706, USA
  • Received: 2020-06-02 Revised: 2020-09-12 Published: 2020-10-10
  • Supported by:
    National Natural Science Foundation of China (61703343, 61790552); Natural Science Foundation of Shaanxi (2018JQ6070); Fundamental Research Funds for the Central Universities (3102018JCC003)

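Abstract: With the rapid growth of space activity, space debris, especially debris in low Earth orbit, has become a threat that space missions cannot ignore. Given the urgency and cost of debris removal, low-orbit multi-debris Active Debris Removal (ADR) has become a necessary means of mitigating the situation. To address mission planning for large-scale multi-debris active removal, a Reinforcement Learning (RL) optimization method is first proposed on the basis of a maximum-reward model of mission planning, and the state, action, and reward function of the problem are defined within the RL framework. Next, building on efficient heuristic factors, a specialized improved Monte Carlo Tree Search (MCTS) algorithm is proposed; it uses MCTS as its core and adds efficient heuristic operators and an RL iteration process. Finally, the effectiveness of the proposed algorithm is verified on the full data set of the Iridium 33 debris cloud. Compared with related MCTS variants and a greedy heuristic algorithm, the proposed method obtains better planning results more efficiently on the test data set and strikes a good balance between exploration and exploitation.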

Key words: space debris removal, mission planning, reinforcement learning, heuristic operator, Monte Carlo tree search

Abstract: The rapid development of the space industry has made space debris a non-negligible threat to future space activities, and Active multi-Debris Removal (ADR) technology has become an indispensable means of alleviating it. For the large-scale multi-debris active removal mission planning problem, a Reinforcement Learning (RL) planning scheme is first proposed based on the maximal-reward optimization model of the ADR problem, and the state, action, and reward function of the problem are defined according to the RL framework. Building on efficient heuristics, a specialized Monte Carlo Tree Search (MCTS) algorithm is then presented, which uses MCTS as its core structure and adds efficient heuristic operators and an RL iteration process. Finally, its effectiveness is tested on the complete large-scale Iridium 33 debris cloud data set. The results show that the method is superior to the original MCTS algorithm and the heuristic greedy algorithm.

Key words: active debris removal, mission planning, reinforcement learning, heuristic operator, Monte Carlo tree search
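
To make the idea in the abstract concrete, the following is a minimal Python sketch of a heuristic-guided MCTS on a toy debris-sequencing problem. It is not the authors' algorithm: the abstract does not give the actual state, action, reward, or heuristic definitions, so everything here is an assumption made for illustration. The state is taken to be the visited sequence plus the remaining transfer budget, an action is the next debris to visit, the return is the summed removal reward, and the heuristic is reward per unit transfer cost. The instance data (`REWARD`, `ORBIT`, `BUDGET`) and all function names are hypothetical, and the RL iteration process mentioned in the abstract is not modeled.

```python
import math
import random

# Hypothetical toy instance (not from the paper): each debris has a removal
# reward and a 1-D "orbit coordinate"; transfer cost is the distance between them.
REWARD = [5.0, 3.0, 8.0, 2.0, 7.0, 4.0]
ORBIT = [0.0, 1.0, 4.0, 4.5, 9.0, 10.0]
BUDGET = 12.0  # total transfer cost allowed for the mission


def cost(i, j):
    return abs(ORBIT[i] - ORBIT[j])


def actions(seq, budget):
    """Unvisited debris that can be reached from the last one within budget."""
    last = seq[-1]
    return [j for j in range(len(REWARD)) if j not in seq and cost(last, j) <= budget]


def heuristic(last, j):
    """Assumed greedy score: reward per unit transfer cost."""
    return REWARD[j] / (cost(last, j) + 1e-9)


class Node:
    def __init__(self, seq, budget, parent=None):
        self.seq, self.budget, self.parent = seq, budget, parent
        self.children = []
        self.untried = actions(seq, budget)
        self.visits, self.total = 0, 0.0


def uct(parent, child, c, scale):
    """UCT score with the mean return normalized to [0, 1]."""
    mean = child.total / child.visits / scale
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def rollout(seq, budget, rng, eps=0.2):
    """Complete a sequence: greedy by heuristic, random with probability eps."""
    seq = list(seq)
    while True:
        acts = actions(seq, budget)
        if not acts:
            return seq
        last = seq[-1]
        if rng.random() < eps:
            j = rng.choice(acts)
        else:
            j = max(acts, key=lambda a: heuristic(last, a))
        budget -= cost(last, j)
        seq.append(j)


def mcts(start, iters=300, c=1.4, seed=0):
    rng = random.Random(seed)
    scale = sum(REWARD)
    root = Node([start], BUDGET)
    best_seq, best_ret = [start], REWARD[start]
    for _ in range(iters):
        node = root
        # Selection: descend by UCT while the node is fully expanded.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: uct(parent, ch, c, scale))
        # Expansion: try the untried action with the best heuristic score first.
        if node.untried:
            last = node.seq[-1]
            j = max(node.untried, key=lambda a: heuristic(last, a))
            node.untried.remove(j)
            child = Node(node.seq + [j], node.budget - cost(last, j), node)
            node.children.append(child)
            node = child
        # Simulation: heuristic-guided rollout to the end of the mission.
        seq = rollout(node.seq, node.budget, rng)
        ret = sum(REWARD[i] for i in seq)
        if ret > best_ret:
            best_seq, best_ret = seq, ret
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += ret
            node = node.parent
    return best_seq, best_ret


if __name__ == "__main__":
    print(mcts(start=0))
```

In this sketch the heuristic enters in two places, the expansion order and the rollout policy, while the `eps` parameter keeps some random exploration in the rollouts. This is one simple way to trade exploration against exploitation; the paper's own operators may differ.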

CLC number: