Acta Aeronautica et Astronautica Sinica, 2020, Vol. 41, Issue 10: 324040. doi: 10.7527/S1000-6893.2020.24040

Strategy solution of non-cooperative target pursuit-evasion game based on branching deep reinforcement learning

LIU Bingyan1,2, YE Xiongbing1, GAO Yong2, WANG Xinbo2, NI Lei3

  1. Academy of Military Sciences, Beijing 100091, China;
  2. 32032 Troops, Beijing 100094, China;
  3. Space Engineering University, Beijing 101416, China
  • Received: 2020-03-31 Revised: 2020-10-25 Published: 2020-11-04
  • Corresponding author: LIU Bingyan E-mail: bingyanl@outlook.com

Abstract: To solve the problem of space rendezvous between a spacecraft and a non-cooperative target, and to ease the limitations of deep reinforcement learning in continuous spaces, this paper proposes a pursuit-evasion game algorithm based on branching deep reinforcement learning that yields a rendezvous strategy for non-cooperative targets. Using differential game theory, the optimal control problem of rendezvous with a non-cooperative target is formulated as a pursuit-evasion game under continuous thrust. To avoid the curse of dimensionality that traditional deep reinforcement learning faces in continuous spaces, a fuzzy inference model is constructed to represent the continuous space, and a branching deep reinforcement learning architecture with multiple parallel neural networks and a shared decision module is proposed. The approach combines optimal control with game theory, effectively overcomes the difficulty that the highly nonlinear differential game model is hard to solve with classical optimal control theory, and further improves the ability of deep reinforcement learning to learn discrete actions. A simulation example verifies the effectiveness of the algorithm.

Key words: non-cooperative targets, space rendezvous, spacecraft pursuit-evasion problem, continuous space, differential game, deep reinforcement learning, branching architecture
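
The abstract names two ingredients: a fuzzy inference model that represents the continuous space, and a branching architecture with parallel networks and a shared decision step. The sketch below is only an illustration of that general idea, not the authors' implementation. The triangular membership functions, the layer sizes, the random weights, and the names `triangular_memberships`, `fuzzy_features`, `BranchingQ`, and `decide` are all assumptions introduced here. It shows why branching helps: with 3 thrust axes and 5 levels per axis, the branches emit 3 × 5 = 15 Q-values instead of the 5³ = 125 a single joint output layer would need.

```python
import numpy as np

def triangular_memberships(x, centers):
    """Membership of scalar x in evenly spaced triangular fuzzy sets (assumed shape)."""
    centers = np.asarray(centers, dtype=float)
    width = centers[1] - centers[0]  # spacing between neighbouring set centers
    return np.clip(1.0 - np.abs(x - centers) / width, 0.0, 1.0)

def fuzzy_features(state, centers):
    """Concatenate the memberships of every state component into one feature vector."""
    return np.concatenate([triangular_memberships(s, centers) for s in state])

class BranchingQ:
    """Hypothetical layout: one small Q-network per action dimension, run in parallel."""

    def __init__(self, n_features, n_hidden, n_branches, n_bins, seed=0):
        rng = np.random.default_rng(seed)
        # Untrained random weights; a real agent would learn these.
        self.w_in = rng.normal(0.0, 0.1, (n_branches, n_features, n_hidden))
        self.w_out = rng.normal(0.0, 0.1, (n_branches, n_hidden, n_bins))

    def q_values(self, features):
        # Each branch b has its own hidden layer and its own output layer.
        hidden = np.tanh(np.einsum("f,bfh->bh", features, self.w_in))
        return np.einsum("bh,bhk->bk", hidden, self.w_out)  # (n_branches, n_bins)

    def decide(self, features, bin_values):
        # Shared decision step: greedy bin per branch, assembled into one action vector.
        q = self.q_values(features)
        return np.asarray(bin_values)[np.argmax(q, axis=1)]

centers = [-1.0, 0.0, 1.0]             # three fuzzy sets per state component
state = [0.5, -0.25]                   # toy two-component relative state
features = fuzzy_features(state, centers)
net = BranchingQ(n_features=features.size, n_hidden=8, n_branches=3, n_bins=5)
thrust_levels = np.linspace(-1.0, 1.0, 5)
action = net.decide(features, thrust_levels)  # one thrust level per axis
```

The output size grows linearly with the number of action dimensions rather than exponentially, which is the property the abstract refers to when it mentions avoiding the curse of dimensionality.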
