Electrical and Electronic Engineering and Control


Strategy solution of non-cooperative target pursuit-evasion game based on branching deep reinforcement learning

  • LIU Bingyan ,
  • YE Xiongbing ,
  • GAO Yong ,
  • WANG Xinbo ,
  • NI Lei
  • 1. Academy of Military Sciences, Beijing 100091, China;
    2. 32032 Troops, Beijing 100094, China;
    3. Space Engineering University, Beijing 101416, China

Received date: 2020-03-31

  Revised date: 2020-10-25

  Online published: 2020-11-04


Cite this article

LIU Bingyan, YE Xiongbing, GAO Yong, WANG Xinbo, NI Lei. Strategy solution of non-cooperative target pursuit-evasion game based on branching deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(10): 324040. DOI: 10.7527/S1000-6893.2020.24040

Abstract

To solve the space rendezvous problem between a spacecraft and a non-cooperative target, and to ease the limitations of applying deep reinforcement learning in continuous spaces, this paper proposes a pursuit-evasion game algorithm based on branching deep reinforcement learning to obtain a space rendezvous strategy. The optimal control problem of rendezvous with a non-cooperative target is formulated, via differential game theory, as a pursuit-evasion game under continuous thrust. To avoid the curse of dimensionality that traditional deep reinforcement learning faces in continuous spaces, a fuzzy inference model is constructed to represent the continuous space, and a branching deep reinforcement learning architecture with multiple parallel neural networks and a shared decision module is proposed. The approach combines optimal control with game theory, effectively overcoming the difficulty of solving the highly nonlinear differential game model with classical optimal control theory, and further improves the ability of deep reinforcement learning to learn over discrete actions. A simulation example verifies the effectiveness of the algorithm.
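
The abstract formulates rendezvous with a non-cooperative target as a continuous-thrust pursuit-evasion differential game but does not reproduce the model here. As a rough illustration of how such a game is commonly posed (a sketch assuming linearized relative orbital dynamics and a terminal miss-distance payoff, not necessarily the authors' exact model):

$$
\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{u}_P - B\mathbf{u}_E,
\qquad \|\mathbf{u}_P\| \le u_P^{\max}, \quad \|\mathbf{u}_E\| \le u_E^{\max},
$$
$$
J = \|\mathbf{x}(t_f)\|, \qquad
(\mathbf{u}_P^*, \mathbf{u}_E^*) = \arg\min_{\mathbf{u}_P} \max_{\mathbf{u}_E} J,
$$

where x is the pursuer's relative state (position and velocity) with respect to the evader, u_P and u_E are the bounded thrust controls of pursuer and evader, and the saddle point (u_P*, u_E*) defines the optimal strategies of both sides. The high nonlinearity the abstract mentions is what makes closed-form saddle-point solutions intractable and motivates a learning-based solver.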
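The branching architecture itself is only named in the abstract: multiple parallel neural networks fed by a shared decision module. Below is a minimal sketch of that general idea in PyTorch; the layer sizes, the per-axis discretization, and all identifiers are illustrative assumptions, not the paper's reported design:

    import torch
    import torch.nn as nn

    class BranchingQNet(nn.Module):
        """Sketch of a branching Q-network: a shared decision module feeds
        multiple parallel branches, one per thrust axis. All sizes are
        illustrative assumptions, not the paper's reported architecture."""

        def __init__(self, state_dim: int, n_branches: int, n_bins: int,
                     hidden: int = 128):
            super().__init__()
            # Shared decision module: features common to every branch.
            self.shared = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            # Parallel branches: each scores the discretized (e.g.,
            # fuzzy-partitioned) thrust levels of one action dimension.
            self.branches = nn.ModuleList(
                [nn.Linear(hidden, n_bins) for _ in range(n_branches)]
            )

        def forward(self, state: torch.Tensor) -> list:
            features = self.shared(state)
            return [branch(features) for branch in self.branches]

    # Usage: 6-D relative state (position + velocity), 3 thrust axes,
    # each discretized into 7 levels.
    net = BranchingQNet(state_dim=6, n_branches=3, n_bins=7)
    q_per_axis = net(torch.randn(1, 6))                    # 3 tensors of shape (1, 7)
    action = [q.argmax(dim=1).item() for q in q_per_axis]  # greedy level per axis

The point of the branching layout is that the number of outputs grows linearly with the number of action dimensions (here 3 x 7 = 21 Q-values) rather than combinatorially over joint actions (7^3 = 343), which is how this family of architectures sidesteps the curse of dimensionality the abstract refers to.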
