ACTA AERONAUTICA ET ASTRONAUTICA SINICA
Maneuvering decision-making of multi-UAV attack-defence confrontation based on PER-MATD3
Received date: 2022-02-28
Revised date: 2022-03-23
Accepted date: 2022-05-11
Online published: 2022-05-19
Supported by
Aeronautical Science Foundation of China (2020Z023053001)
This paper explores maneuvering decision-making for multi-UAV attack-defence confrontation in a complex environment with randomly distributed obstacles. A motion model and a radar detection model are constructed for both the attacking and defending sides. The Twin Delayed Deep Deterministic policy gradient (TD3) algorithm is extended to the multi-agent setting to address the value-function overestimation problem of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. To improve learning efficiency, a Prioritized Experience Replay Multi-Agent Twin Delayed Deep Deterministic policy gradient (PER-MATD3) algorithm is proposed based on the prioritized experience replay mechanism. Simulation experiments show that the proposed method achieves a good confrontation effect in multi-UAV attack-defence maneuvering decision-making, and comparisons verify the advantages of the PER-MATD3 algorithm over other algorithms in convergence speed and stability.
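The abstract names two mechanisms without giving their formulas: the TD3 remedy for value overestimation and prioritized experience replay. The sketch below illustrates the commonly published forms of both (the clipped double-Q target and proportional prioritization with importance-sampling weights). It is not the paper's implementation: all function names and the hyperparameter values `gamma`, `alpha`, `beta` and `eps` are assumptions chosen for illustration, and the multi-agent critic inputs are reduced to scalar Q-values.

```python
import random

def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of two target-critic
    estimates, which counteracts overestimation of the value function.
    (gamma is an assumed discount factor.)"""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha,
    with priority p_i = |TD error_i| + eps. (alpha and eps are assumed values.)"""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def importance_weights(probs, beta=0.4):
    """Importance-sampling weights w_i = (N * P(i))^(-beta), scaled so the
    largest weight is 1; they correct the bias of non-uniform sampling."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    w_max = max(weights)
    return [w / w_max for w in weights]

def sample_indices(probs, batch_size, rng=random):
    """Draw replay-buffer indices (with replacement) according to priority."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)

# Usage: transitions with larger TD error are replayed more often.
probs = per_probabilities([0.1, 1.0, 2.0])
batch = sample_indices(probs, batch_size=4, rng=random.Random(0))
weights = importance_weights(probs)
```

In a full training loop, each sampled transition's critic loss would be scaled by its importance weight, and its priority refreshed from the new TD error after the update.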
Xiaowei FU, Zhe XU, Jindong ZHU, Nan WANG. Maneuvering decision-making of multi-UAV attack-defence confrontation based on PER-MATD3[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2023, 44(7): 327083-327083. DOI: 10.7527/S1000-6893.2022.27083