基于分支深度强化学习的非合作目标追逃博弈策略求解

doi:10.7527/S1000-6893.2020.24040

Abstract

Abstract: To solve the space rendezvous problem between spacecraft and non-cooperative targets and alleviate application limitations of deep reinforcement learning in continuous space, this paper proposes a pursuit-evasion game algorithm based on branching deep reinforcement learning to obtain the space rendezvous strategy. The differential game is used to solve the optimal control problem of space intersection for non-cooperative targets, which is described as a pursuit-evasion game problem under the action of continuous thrust. To avoid the dimension disaster of the traditional deep reinforcement learning in dealing with continuous space, this paper constructs a fuzzy inference model to represent the continuous space, and proposes a branching deep reinforcement learning architecture with multiple parallel neural networks and a shared decision module. The combination of optimal control and game theory is realized, effectively overcoming the difficulty in solving the highly nonlinear differential game model by the classical optimal control theory, and further improving the training ability of deep reinforcement learning on discrete behaviors. Finally, an example is given to verify the effectiveness of the algorithm.

Key words: non-cooperative targets, space rendezvous, pursuit-evasion problem of spacecraft, continuous space, differential game, deep reinforcement learning, branching architectures

CLC Number:

LIU Bingyan, YE Xiongbing, GAO Yong, WANG Xinbo, NI Lei. Strategy solution of non-cooperative target pursuit-evasion game based on branching deep reinforcement learning[J]. ACTA AERONAUTICAET ASTRONAUTICA SINICA, 2020, 41(10): 324040-324040.

References

[1] 常燕,陈韵,鲜勇, 等. 机动目标的空间交会微分对策制导方法[J].宇航学报, 2016, 37(7):795-801. CHANG Y, CHEN Y, XIAN Y, et al. Differential game guidance for space rendezvous of maneuvering target[J].Journal of Astronautics, 2016, 37(7):795-801(in Chinese).
[2] 柴源,罗建军, 王明明, 等. 基于追逃博弈的非合作目标接近控制[J].宇航总体技术, 2020, 4(1):30-38. CHAI Y, LUO J J, WANG M M, et al. Pursuit-Evasion game control for approaching space non-cooperative target[J].Astronautical Systems Engineering Technology, 2020, 4(1):30-38(in Chinese).
[3] 王强,叶东,范宁军, 等. 基于零控脱靶量的卫星末端追逃控制方法[J].北京理工大学学报, 2016,36(11):1171-1176. WANG Q, YE D, FAN N J, et al. Terminal orbital control of satellite pursuit evasion game based on zero effort miss[J].Transactions of Beijing Institute of Technology, 2016, 36(11):1171-1176(in Chinese).
[4] ISAACS R. Differential games[M]. New York:Wiley, 1965.
[5] FRIEDMAN A. Differential games[M]. Rhode Island:American Mathematical Society, 1974.
[6] DICKMANNS E, WELL K. Approximate solution of optimal control problems using third order hermite polynomial functions[C]//Optimization Techniques IFIP Technical Conference, 1974:1-7.
[7] 张秋华, 孙松涛, 谌颖, 等. 时间固定的两航天器追逃策略及数值求解[J].宇航学报, 2014, 35(5):537-544. ZHANG Q H, SUN S T, CHEN Y, et al. Strategy and numerical solution of pursuit-evasion with fixed duration for two spacecraft[J].Journal of Astronautics, 2014, 35(5):537-544(in Chinese).
[8] 赵琳,周俊峰,刘源, 等. 三维空间"追-逃-防"三方微分对策方法[J].系统工程与电子技术, 2019, 41(2):322-335. ZHAO L, ZHOU J F, LIU Y, et al. Three-body differential game approach of pursuit-evasion-defense in three dimensional space[J].Systems Engineering and Electronics, 2019, 41(2):322-335(in Chinese).
[9] 郝志伟,孙松涛,张秋华, 等. 半直接配点法在航天器追逃问题求解中的应用[J].宇航学报, 2019, 40(6):628-635. HAO Z W, SUN S T, ZHANG Q H, et al. Application of semi-direct collocation method for solving pursuit-evasion problems of spacecraft[J].Journal of Astronautics, 2019, 40(6):628-635(in Chinese).
[10] 李龙跃, 刘付显, 史向峰, 等. 导弹追逃博弈微分对策建模与求解[J].系统工程理论与实践, 2016,36(8):2161-2168. LI L Y, LIU F X, SHI X F, et al. Differential game model and solving method for missile pursuit-evasion[J].Systems Engineering-Theory & Practice, 2016, 36(8):2161-2168(in Chinese).
[11] 陈燕妮. 基于微分对策的有限时间自适应动态规划制导研究[D]. 南京:南京航空航天大学, 2019. CHEN Y N. Research on differential games-based finite-time adaptive dynamic programming guidance law[D]. Nanjing:Nanjing University of Aeronautics and Astronautics, 2019(in Chinese).
[12] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J].Nature, 2015, 518(7540):529.
[13] 刘冰雁,叶雄兵,周赤非, 等. 基于改进DQN的复合模式在轨服务资源分配[J].航空学报, 2020, 41(4):323630. LIU B Y, YE X B, ZHOU C F, et al. Composite mode on-orbit service resource allocation based on improved DQN[J].Acta Aeronautica et Astronautica Sinica, 2020, 41(4):323630(in Chinese).
[14] 曹雷. 基于深度强化学习的智能博弈对抗关键技术[J].指挥信息系统与技术, 2019, 10(5):1-7. CAO L. Key technologies of intelligent game confrontation based on deep reinforcement learning[J].Command Information Systemand Technology, 2019, 10(5):1-7(in Chinese).
[15] CHENG Y, SUN Z J, HUANG Y X, et al. Fuzzy categorical deep reinforcement learning of a defensive game for an unmanned surface vessel[J].International Journal of Fuzzy Systems, 2019, 21(2):592-606.
[16] LIU B Y, YE X B, GAO Y, et al. Forward-looking imaginative planning framework combined with prioritized replay double DQN[C]//International Conferenceon Control, Automation and Robotics, 2019:336-341.
[17] 吴晓光,刘绍维,杨磊, 等. 基于深度强化学习的双足机器人斜坡步态控制方法[J/OL]. 自动化学报, 2020:1-13[2020-02-28]. https://doi.org/10.16383/j.aas.c190547. WU X G, LIU S W, YANG L et al. A gait control method for biped robot on slope based on deep reinforcement learning[J].Acta Automatica Sinica, 2020:1-13[2020-02-28]. https://doi.org/10.16383/j.aas.c190547(in Chinese).
[18] 吴其昌,张洪波. 基于生存型微分对策的航天器追逃策略及数值求解[J].控制与信息技术, 2019(4):39-43. WU Q C, ZHANG H B. Spacecraft pursuit strategy and numerical solution based on survival differential strategy[J].Control and Information Technology, 2019(4):39-43(in Chinese).
[19] ENGWERDA J. Algorithms for computing Nash equilibria indeterministic LQ games[J] Computational Management Science, 2007, 4(2):113-140.
[20] 约翰纳什. 博弈论经典[M]. 北京:中国人民大学出版社, 2013. NASH J. Classic in game theory[M]. Beijing:China Renmin University Press, 2013(in Chinese).
[21] SUN S T,ZHANG Q H,LOXTON R, et al. Numerical solution of a pursuit-evasion differential game involving two spacecraft in low earth orbit[J].Journal of Industrial and Management Optimization, 2015, 11(4):1127-1147.
[22] CRANDALL M G, ISHII H, LIONS P L. User's guide to viscosity solutions of second order partial differential equations[J].Bulletin of the American Mathematical Society, 1992, 27(1):1-67.
[23] 孙松涛. 近地轨道上两航天器追逃对策及数值求解方法研究[D]. 哈尔滨:哈尔滨工业大学, 2015. SUN S T. Two spacecraft pursuit-evasion strategies on low earth orbit and numerical solution[D]. Harbin:Harbin Institute of Technology, 2015(in Chinese).
[24] SCHWARTZ H M. Multi-agent machine learning:A reinforcement approach[M]. New York:John Wiley & Sons, Inc., 2014.
[25] WANG L X. A course in fuzzy systems and control[M]. New Jersey:Prentice-Hall, Inc., 1997.
[26] TAKAGI T, SUGENO M. Fuzzy identifcation of systems and its applications to modelling ad control[J].IEEE Transactions on Systems Man and Cyberetics, 1985, 15:116-132.
[27] JANG J S R, SUN C T. Neuro-fuzzy and soft computing:A computational approach to learning and machine intelligence[M]. New Jersey:Prentice-Hall, Inc., 1997.
[28] DAI X, LI C, RAD A. An approach to tune fuzzy contorllers based on reinforcement learning for autonomous vehicle control[J].IEEE Transactions on Intelligent Transportation Systems, 2005, 6(3):285-293.
[29] DESOUKY S, SCHWARTZ H. Q(λ)-learning fuzzy logic controller for a multi-robot system[C]//IEEE International Conference on Systems, Man and Cybernetics. Piscataway:IEEE Press, 2010:4075-4080.
[30] JANG J S R, SUN C T. Neuro-fuzzy and soft computing:A computational approach to learning and machine intelligence[M]. New Jersey:Prentice-Hall, Inc., 1997.
[31] ROSS T J. Fuzzy logic with engineering applications[M]. New York:John Wiley & Sons, Ltd., 2010.
[32] MATIGNON L, LAURENT G J, LE F P. Independent reinforcement learners in cooperative Markov games:A survey regarding coordination problems[J].The Knowledge Engineering Review, 2012, 27(1):1-31.
[33] FRANK EJ, HÄRDLE W K, HAFNER C M. Neural networks and deep learning[M]. Verlag:Springer, 2019.
[34] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]//In International Conference on Learning Representations, 2016.
[35] RICHARD S S, ANDREW G B. 强化学习[M]. 2版. 北京:电子工业出版社, 2019. RICHARD S S, ANDREW G B. Reinforcement learning[M]. 2nd ed. Beijing:Publishing House of Electronics Industry, 2019.
[36] HESSEL M, MODAYIL J, VAN H H, et al. Rainbow:Combining improvements in deep reinforcement learning[J].Association for the Advancement of Artificial Intelligence, 2017, 10(6):3215-3222.
[37] 苏飞,刘静,张耀,等. 航天器面内机动规避最优脉冲分析[J].系统工程与电子技术,2018,40(12):2782-2789. SU F, LIU J, ZHANG Y, et al. Analysis of optimal impulse for in-plane collision avoidance maneuver[J].Systems Engineering and Electronics, 2018, 40(12):2782-2789(in Chinese).

Strategy solution of non-cooperative target pursuit-evasion game based on branching deep reinforcement learning

RichHTML

PDF (PC)

Knowledge

Abstract

Cite this article

share this article

References

Related Articles 15

Recommended Articles

Metrics

Comments

[1]	Honglin ZHANG, Jianjun LUO, Weihua MA. Spacecraft game decision making for threat avoidance of space targets based on machine learning [J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(8): 329136-329136.
[2]	Yunpeng CAI, Dapeng ZHOU, Jiangchuan DING. Intelligent collaborative control of UAV swarms with collision avoidance safety constraints [J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(5): 529683-529683.
[3]	Shengzhe SHAN, Weiwei ZHANG. Air combat intelligent decision-making method based on self-play and deep reinforcement learning [J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(4): 328723-328723.
[4]	Xuejian WANG, Yongming WEN, Xiaorong SHI, Ningning ZHANG, Jiexi LIU. Design of hybrid intelligent decision framework for multi⁃agent and multi⁃coupling tasks [J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(S2): 729770-729770.
[5]	Xizhen GAO, Liang TANG, Huang HUANG. Deep reinforcement learning in autonomous manipulation for celestial bodies exploration: Applications and challenges [J]. ACTA AERONAUTICAET ASTRONAUTICA SINICA, 2023, 44(6): 26762-026762.
[6]	Pan ZHOU, Jiangtao HUANG, Sheng ZHANG, Gang LIU, Bowen SHU, Jigang TANG. Intelligent air combat decision making and simulation based on deep reinforcement learning [J]. ACTA AERONAUTICAET ASTRONAUTICA SINICA, 2023, 44(4): 126731-126731.
[7]	Xiangwei ZHU, Dan SHEN, Kai XIAO, Yuexin MA, Xiang LIAO, Fuqiang GU, Fangwen YU, Kefu GAO, Jingnan LIU. Mechanisms, algorithms, implementation and perspectives of brain⁃inspired navigation [J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(19): 28569-028569.
[8]	Sheng XU, Ming CHU, Shaoqi LIN, Rui CHANG, Hanxu SUN. Dynamic parameter identification without excitation for non-cooperative targets post soft capture [J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(19): 228342-228342.
[9]	Xi CHEN, Di YANG, Kang NIU, Jiaxun LI, Jianqiao YU. Reach-avoid game with time limit and detection range [J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(17): 328215-328215.
[10]	Lei DONG, Hongbing CHEN, Xi CHEN, Changxiao ZHAO. Distributed multi-agent coalition task allocation strategy for single pilot operation mode based on DQN [J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(13): 327895-327895.
[11]	Wenxue CHEN, Changsheng GAO, Wuxing JING. Trust region policy optimization guidance algorithm for intercepting maneuvering target [J]. ACTA AERONAUTICAET ASTRONAUTICA SINICA, 2023, 44(11): 327596-327596.
[12]	Sheng ZHANG, Pan ZHOU, Yang HE, Jiangtao HUANG, Gang LIU, Jigang TANG, Huaizhi JIA, Xin DU. Air combat maneuver decision-making test based on deep reinforcement learning [J]. ACTA AERONAUTICAET ASTRONAUTICA SINICA, 2023, 44(10): 128094-128094.
[13]	HU Yanyan, ZHANG Li, XIA Hui, ZHANG Naiwen, YAN Rongyi. Cooperative capture of maneuvering targets with incomplete information based on differential game [J]. ACTA AERONAUTICAET ASTRONAUTICA SINICA, 2022, 43(S1): 726905-726905.
[14]	FU Xiaowei, WANG Hui, XU Zhe. Cooperative pursuit strategy for multi-UAVs based on DE-MADDPG algorithm [J]. ACTA AERONAUTICAET ASTRONAUTICA SINICA, 2022, 43(5): 325311-325311.
[15]	CHAI Yuan, LUO Jianjun, WANG Mingming. Predictive game control for on-orbit transportation by multiple microsatellites with impulsive thrust [J]. ACTA AERONAUTICAET ASTRONAUTICA SINICA, 2022, 43(12): 326112-326112.