Maneuvering decision-making of multi-UAV attack-defence confrontation based on PER-MATD3

doi:10.7527/S1000-6893.2022.27083

Abstract

This paper studies maneuvering decision-making for multi-UAV attack-defence confrontation in a complex environment with randomly distributed obstacles. A motion model and a radar detection model are constructed for both the attacking and defending sides. The Twin Delayed Deep Deterministic policy gradient (TD3) algorithm is extended to the multi-agent setting to address the value-function overestimation of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. To improve learning efficiency, a Prioritized Experience Replay Multi-Agent Twin Delayed Deep Deterministic policy gradient (PER-MATD3) algorithm is proposed, built on the prioritized experience replay mechanism. Simulation experiments show that the proposed method achieves good confrontation performance in multi-UAV attack-defence maneuvering decision-making, and comparisons confirm the advantages of PER-MATD3 over the other algorithms in convergence speed and stability.
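
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the proportional form of prioritized replay is assumed, and the exponents `alpha=0.6` and `beta=0.4` are commonly used defaults rather than values taken from the paper. Only the discount factor 0.95 comes from the hyperparameter table.

```python
import random

def td3_target(reward, done, q1_next, q2_next, gamma=0.95):
    """Clipped double-Q target: take the smaller of the two target-critic
    estimates, which counteracts value-function overestimation."""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization (assumed variant):
    P(i) = p_i^alpha / sum_k p_k^alpha, with p_i = |delta_i| + eps."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def importance_weights(probs, beta=0.4):
    """Importance-sampling weights w_i = (N * P(i))^(-beta),
    normalized by the largest weight so that max(w) == 1."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    w_max = max(weights)
    return [w / w_max for w in weights]

def sample_indices(probs, batch_size, rng=random):
    """Draw a minibatch of buffer indices according to the priorities."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)
```

Transitions with a larger TD error are replayed more often, and the importance-sampling weights scale down their contribution to the critic loss to compensate for the non-uniform sampling.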

Keywords: multi-UAV, multi-agent reinforcement learning, PER-MATD3, attack-defence confrontation, maneuvering decision-making

CLC Number: V279

Xiaowei FU, Zhe XU, Jindong ZHU, Nan WANG. Maneuvering decision-making of multi-UAV attack-defence confrontation based on PER-MATD3[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(7): 327083.

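No.	Parameter	Value
1	Battlefield boundary $[x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}]$ /[km, km]	$[0,100] \times [0,80]$
2	Number of obstacles $N_o$	9
3	Obstacle radius by type $R_o^k$ /km (number of obstacles of each type)	4 (3), 5 (3), 6 (3)
4	Region for random obstacle generation /[km, km]	$[15,85] \times [15,65]$
5	Radar detection range of the attacking UAV $R_a \theta_a$ /(km·(°))	12×120
6	Radar detection range of the defending UAVs $R_d \theta_d$ /(km·(°))	8×120
7	Number of discretized detection states $m$	7
8	Maximum speed of the attacking UAV $v_{a\max}$ /(m·s^-1)	340
9	Maximum speed of the defending UAVs $v_{d\max}$ /(m·s^-1)	300
10	Maximum acceleration of the attacking UAV $a_{a\max}$ /(m·s^-2)	20
11	Maximum acceleration of the defending UAVs $a_{d\max}$ /(m·s^-2)	20
12	Maximum angular velocity of the attacking UAV $\omega_{a\max}$ /s^-1	$\pi / 15.7$
13	Maximum angular velocity of the defending UAVs $\omega_{d\max}$ /s^-1	$\pi / 22.6$
14	Initial position /[km, km] and heading angle /rad of the attacking UAV	[2.5, 2.5], $\pi/4$
15	Initial positions /[km, km] and heading angle /rad of the defending UAVs	[95, 75], [90, 78], [98, 70], $5\pi/4$
16	Center of the target area $(x_{tp}, y_{tp})$ /[km, km] and radius $R_t$ /km	[95, 75], 5
17	Initial fire strike radius of the defending UAVs $R_f$ /km	1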
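
The radar rows of the parameter table give a range and an angle for each side. One plausible reading, sketched below, is a sector test: a target is detected when it lies within the range and within the angular field of view. The function name is hypothetical, and treating the 120° figure as the full field of view centered on the UAV's heading is an assumption, not something stated in the table.

```python
import math

def in_radar_sector(own_xy, heading, target_xy, radar_range_km, fov_deg=120.0):
    """Return True if target_xy lies inside a sector of radius radar_range_km
    and total opening angle fov_deg, centered on `heading` (radians).
    Positions are (x, y) in km. The centered-sector geometry is an assumption."""
    dx = target_xy[0] - own_xy[0]
    dy = target_xy[1] - own_xy[1]
    if math.hypot(dx, dy) > radar_range_km:
        return False
    bearing = math.atan2(dy, dx)
    # Wrap the relative bearing into [-pi, pi) before comparing to the half-angle.
    rel = (bearing - heading + math.pi) % (2.0 * math.pi) - math.pi
    return abs(rel) <= math.radians(fov_deg) / 2.0
```

With the attacking UAV's initial state from the table (position [2.5, 2.5] km, heading π/4, range 12 km), a point 5 km ahead along the heading is inside the sector, while a point directly behind or beyond 12 km is not.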

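Algorithm	Input layer	Hidden layer 1	Hidden layer 2	Output layer
ILDDPG	Own state observation and action; 11+2 nodes (attacker), 13+2 nodes (defender)	128 neurons	64 neurons	Action-value function; 1 node
MADDPG	Global state observation and action; 58 nodes	128 neurons	64 neurons	Action-value function; 1 node
MATD3, PER-MATD3	Both Critic networks take the global state observation and action; 58 nodes	128 neurons	64 neurons	Action-value function; 1 node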
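
The 58-node input of the centralized critics is consistent with concatenating every UAV's observation and action, if one assumes one attacker and three defenders (the number of initial positions listed in the simulation parameters). The short check below makes that arithmetic explicit; the function names and the agent counts are assumptions for illustration.

```python
def local_input_dim(obs_dim, action_dim=2):
    """Input size of a decentralized critic: own observation plus own action."""
    return obs_dim + action_dim

def global_input_dim(n_attackers=1, n_defenders=3,
                     attacker_obs=11, defender_obs=13, action_dim=2):
    """Input size of a centralized critic: observations and actions of all UAVs.
    The agent counts (1 attacker, 3 defenders) are assumed, not stated in the table."""
    return (n_attackers * (attacker_obs + action_dim)
            + n_defenders * (defender_obs + action_dim))
```

Under these assumptions the global input is (11+2) + 3×(13+2) = 13 + 45 = 58, matching the table.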

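Hyperparameter	Symbol	Value
Discount factor	$\gamma$	0.95
Soft update rate	$\tau$	0.01
Replay buffer size	$M$	1×10^5
Batch size	$m$	64
Actor network learning rate	$\alpha_A$	1×10^-5
Critic network learning rate	$\alpha_C$	1×10^-4
Exploration rate	$\varepsilon$	$0.1 \to 0$
Action noise (variance of the normal distribution)	$\sigma$	$3 \to 0$
Number of episodes	MaxEpisode	2000
Maximum time steps per episode	MaxStep	500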
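
As a rough illustration of how two of these hyperparameters are typically used, the sketch below shows a soft (Polyak) target-network update with τ = 0.01 and an annealing schedule for the exploration rate and action noise. The table only gives the start and end values (0.1 → 0 and 3 → 0); the linear shape of the decay and the function names are assumptions.

```python
def soft_update(target_params, online_params, tau=0.01):
    """Soft target update: theta_target <- (1 - tau) * theta_target + tau * theta_online.
    Parameters are plain lists of floats here for simplicity."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

def linear_decay(start, episode, max_episode=2000, end=0.0):
    """Anneal a value from `start` to `end` over `max_episode` episodes.
    The linear schedule is an assumption; the source gives only the endpoints."""
    frac = min(max(episode / max_episode, 0.0), 1.0)
    return start + (end - start) * frac
```

For example, `linear_decay(3.0, 1000)` gives an action-noise value of 1.5 halfway through the 2000 training episodes, and `linear_decay(0.1, 2000)` gives an exploration rate of 0.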

Algorithm	Reward at episode 1000	Mean reward over episodes 1000 to 2000	Peak reward during training
ILDDPG	286.2	524.6	565.7
MADDPG	322.4	571.0	602.9
MATD3	522.4	601.0	612.9
