Intelligent decision-making of airborne terminal infrared composite jamming based on DACTM-PPO

doi:10.7527/S1000-6893.2025.32759

Abstract

As the guidance accuracy and maneuverability of infrared-guided air-to-air missiles continue to improve, combat aircraft find it increasingly difficult to avoid being hit through evasive maneuvering or a single infrared countermeasure alone. Composite infrared countermeasures have therefore become a critical means of ensuring aircraft survivability. To address the decision problem of airborne terminal composite infrared jamming, this study proposes an intelligent decision-making method based on an improved Proximal Policy Optimization (PPO) algorithm. Starting from the airborne terminal confrontation scenario, the decision constraints faced by a combat aircraft under infrared-guided missile attack are analyzed, and models of infrared decoy flares and laser directional jamming are established. An improved PPO algorithm (DACTM-PPO) that combines a dynamic asymmetric clipping mechanism with a fusion of temporal memory and attention mechanisms is proposed to improve convergence efficiency and solution quality. Furthermore, a reward function tailored to the characteristics of each jamming means is designed; it includes overuse and ineffective-use penalty terms to balance jamming effectiveness against resource consumption. Simulation results show that the method coordinates the infrared jamming measures reasonably and performs well across a range of typical aircraft-missile confrontation scenarios. Compared with the original PPO algorithm, the Soft Actor-Critic (SAC) algorithm, and a preset rule-based method, it shows clear advantages in aircraft survivability, missile miss distance, and resource utilization efficiency, demonstrating good application value.

Key words: airborne terminal defense, infrared composite jamming, reinforcement learning, infrared decoy flares, laser directional jamming

CLC Number: V279

Yanlong HAN, An ZHANG, Wenhao BI, Qiucen FAN, Tianle HOU. Intelligent decision-making of airborne terminal infrared composite jamming based on DACTM-PPO[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(7): 332759.

Table 1

Airborne terminal state space

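State name	Symbol	Dimension	Value range
Aircraft-missile relative bearing/rad	$o_t^{\mathrm{di}}$	2	$[-\pi/2, \pi/2] \times [0, 2\pi)$
Maximum angle-measurement error/rad	$o_t^{\mathrm{er}}$	2	$[0, \delta\theta] \times [0, \delta\psi)$
Aircraft-missile distance bin index	$o_t^{\mathrm{dis}}$	1	$\{0, 1, \cdots, D_{\max}/D_{\mathrm{bin}}\}$
Aircraft speed/(m·s^-1)	$o_t^{\mathrm{av}}$	1	$[v_{\min}^{\mathrm{aircraft}}, v_{\max}^{\mathrm{aircraft}}]$
Aircraft altitude/m	$o_t^{\mathrm{h}}$	1	$[0, h_{\max}^{\mathrm{aircraft}}]$
Aircraft pitch and yaw angles/rad	$o_t^{\mathrm{ae}}$	2	$[-\pi/2, \pi/2) \times [-\pi, \pi)$
Aircraft maneuver action	$o_t^{\mathrm{am}}$	1	$\{1, 2, 3, 4\}$
Remaining infrared decoy flares/count	$o_t^{\mathrm{ir}}$	1	$\{0, 1, \cdots, N_{\max}^{\mathrm{infrared}}\}$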
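
As a rough illustration of how the state-space table above could map onto a flat observation vector, the following Python sketch sums the listed dimensions and computes an index range for each component. The field names, the ordering of components, and the `distance_bin` helper (including the bin width used in the usage note) are assumptions made for illustration; the excerpt does not show how the authors actually encode the observation.

```python
# (label, dimension) pairs copied from the state-space table.
# The labels are illustrative names, not identifiers from the paper.
STATE_LAYOUT = [
    ("relative_bearing", 2),   # aircraft-missile relative bearing
    ("max_angle_error", 2),    # maximum angle-measurement error
    ("distance_bin", 1),       # aircraft-missile distance bin index
    ("aircraft_speed", 1),
    ("aircraft_altitude", 1),
    ("pitch_yaw", 2),          # aircraft pitch and yaw angles
    ("maneuver_action", 1),
    ("flares_remaining", 1),   # remaining infrared decoy flares
]


def observation_dim(layout=STATE_LAYOUT):
    """Total length of the flattened observation vector."""
    return sum(dim for _, dim in layout)


def field_slices(layout=STATE_LAYOUT):
    """Map each label to the slice it would occupy in the flat vector."""
    slices, start = {}, 0
    for name, dim in layout:
        slices[name] = slice(start, start + dim)
        start += dim
    return slices


def distance_bin(distance_m, d_bin_m, d_max_m):
    """Discretize a range into an integer index in 0 .. D_max / D_bin.

    Assumption: simple floor binning with clamping to [0, D_max].
    """
    clamped = min(max(distance_m, 0.0), d_max_m)
    return int(clamped // d_bin_m)
```

With this layout the observation has 11 components. For example, with a hypothetical bin width of 500 m and a maximum range of 12 000 m, `distance_bin(3250.0, 500.0, 12000.0)` gives bin 6.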

Table 2

Airborne terminal action space

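Action name	Symbol	Dimension	Value range
Whether to release a group of infrared decoy flares	$a_t^{\mathrm{ir}}$	1	$\{0, 1\}$
Number of flares per group	$a_t^{\mathrm{gs}}$	1	$\{1, 2, \ldots, 6\}$
Release interval between flares within a group	$a_t^{\mathrm{gi}}$	1	$0.02 : 0.02 : 0.10$
Laser directional jamming state	$a_t^{\mathrm{ld}}$	1	$\{0, 1\}$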
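
To get a feel for the size of the action space in the table above, the sketch below enumerates every joint action in Python. It reads `0.02 : 0.02 : 0.10` as start : step : stop, giving five interval values. The second function rests on an assumption that is not stated in this excerpt: when no flare group is released, the group size and interval have no effect, so those combinations can be collapsed.

```python
from itertools import product

RELEASE = (0, 1)                       # release a flare group or not
GROUP_SIZES = tuple(range(1, 7))       # 1 .. 6 flares per group
# 0.02 : 0.02 : 0.10, built from integers to avoid float accumulation error
INTERVALS = tuple(round(0.02 * k, 2) for k in range(1, 6))
LASER = (0, 1)                         # laser directional jamming off / on


def all_joint_actions():
    """Cartesian product of the four action components."""
    return list(product(RELEASE, GROUP_SIZES, INTERVALS, LASER))


def effective_actions():
    """Distinct actions if size and interval are ignored when release == 0.

    This collapsing rule is an illustrative assumption, not the paper's.
    """
    distinct = set()
    for release, size, interval, laser in all_joint_actions():
        if release == 0:
            distinct.add((0, None, None, laser))
        else:
            distinct.add((1, size, interval, laser))
    return distinct
```

The full product contains 2 × 6 × 5 × 2 = 120 joint actions; under the collapsing assumption only 62 of them are behaviorally distinct.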

Table 3

Aircraft parameters and infrared jamming parameters

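Performance parameter	Value or setting
Maximum overload $n_{f\max}$ /g	2.5
Infrared decoy flare initial mass $m_{d0}$ /kg	0.5
Infrared decoy flare mass change rate $\dot{m}$ /(kg·s^-1)	0.01
Infrared decoy flare velocity $v_d$ /(m·s^-1)	50
Infrared decoy flare maximum radiant intensity $I_{\max}$ /(W·sr^-1)	9 000
Infrared decoy flare burn time $t_1$ /s	5
Infrared decoy flare dispensing direction $d_f$	45° rearward and downward in the body coordinate frame
Laser directional jamming output power $P_0$ /W	4 000
Laser beam divergence angle $\theta$ /rad	1×10^-3
Laser wavelength $\lambda$ /μm	10.6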
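
The jamming parameters above can be plugged into simple first-order relations to check orders of magnitude. The Python sketch below is only that: it assumes a constant flare burn rate up to burnout, treats the divergence angle as a full angle, neglects the exit aperture, and ignores atmospheric attenuation. The paper's own decoy and laser models are not reproduced in this excerpt and may differ.

```python
import math

# Values copied from the aircraft / infrared jamming parameter table.
M_D0_KG = 0.5        # flare initial mass
M_DOT_KG_S = 0.01    # flare mass change rate
T_BURN_S = 5.0       # flare burn time
P0_W = 4000.0        # laser output power
THETA_RAD = 1e-3     # laser beam divergence angle


def flare_mass(t_s):
    """Flare mass at time t, ASSUMING a constant burn rate until burnout."""
    t = min(max(t_s, 0.0), T_BURN_S)
    return M_D0_KG - M_DOT_KG_S * t


def laser_spot_diameter(range_m):
    """Far-field spot diameter, ASSUMING theta is the full divergence angle."""
    if range_m <= 0:
        raise ValueError("range must be positive")
    return THETA_RAD * range_m


def mean_irradiance(range_m):
    """Average irradiance over the spot in W/m^2, with no atmospheric loss."""
    radius = laser_spot_diameter(range_m) / 2.0
    return P0_W / (math.pi * radius ** 2)
```

Under these assumptions a flare loses 0.05 kg over its 5 s burn, and at a range of 2 000 m the laser spot is about 2 m across with a mean irradiance of roughly 1 273 W/m^2.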

Table 4

Missile performance parameters

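Missile performance parameter	Value
Maximum overload $n_{m\max}$ /g	50
Missile kill radius $R_{\mathrm{kill}}$ /m	12
Seeker maximum operating range $D_{m\max}$ /m	12 000
Seeker field-of-view angle $A_m$ /(°)	180
Missile maximum angular velocity $\omega_m$ /(rad·s^-1)	15.7
Seeker optical system focal length $f$ /mm	57
Infrared detector pixel size $d_0$ /μm	12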
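
The focal length and pixel size in the table above fix the angular resolution of a single detector pixel through the standard small-angle relation IFOV ≈ d_0 / f. The short Python sketch below evaluates it and the corresponding footprint of one pixel at a given range. How the paper uses these two parameters in its seeker imaging model is not shown in this excerpt, so this is only an illustrative calculation.

```python
# Values copied from the missile performance parameter table, in SI units.
PIXEL_SIZE_M = 12e-6      # infrared detector pixel size d_0
FOCAL_LENGTH_M = 57e-3    # seeker optical system focal length f


def ifov_rad():
    """Instantaneous field of view of one pixel (small-angle approximation)."""
    return PIXEL_SIZE_M / FOCAL_LENGTH_M


def pixel_footprint_m(range_m):
    """Linear extent covered by one pixel at the given range."""
    return ifov_rad() * range_m
```

This gives an IFOV of about 2.1 × 10^-4 rad, so at the seeker's maximum operating range of 12 000 m one pixel spans roughly 2.5 m.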

Table 5

DACTM-PPO algorithm training parameters

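Algorithm parameter	Value
Maximum number of training iterations $s_{\max}$	$1 \times 10^{4}$
PPO clipping coefficient $\epsilon$	0.2
Discount factor $\gamma$	0.95
GAE discount coefficient $\lambda$	0.98
Entropy regularization coefficient $E$	$1 \times 10^{-3}$
Update epochs per training round $n_{\mathrm{epoch}}$	4
LSTM hidden dimension $d_{\mathrm{LS}}$	128
Number of LSTM layers $n_{\mathrm{LS}}$	1
Attention key/query dimension $d_{kv}$	128
Attention dropout probability $p_{\mathrm{attn}}$	0.2
Actor/Critic network structure $A_{\mathrm{dim}}$	[256, 128, 64, 32]
Actor/Critic learning rate $l_{AC}$	$1 \times 10^{-3}$
Training batch size $B_{\mathrm{size}}$	128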
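
To make the roles of $\epsilon$, $\gamma$ and $\lambda$ in the table above concrete, here is a minimal Python sketch of two standard building blocks: generalized advantage estimation (GAE) and a per-sample clipped surrogate whose lower and upper clip bounds can differ. The separate `eps_low` / `eps_high` arguments are an assumption about what "asymmetric clipping" could look like. The excerpt does not give the paper's actual rule for setting or adapting the bounds, so this should not be read as the DACTM-PPO objective itself.

```python
def gae_advantages(rewards, values, gamma=0.95, lam=0.98):
    """Generalized advantage estimation over one trajectory segment.

    `values` must have len(rewards) + 1 entries; the last one is the
    bootstrap value of the state after the final step. Episode
    termination inside the segment is not handled in this sketch.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.2):
    """PPO surrogate for one sample with possibly unequal clip bounds.

    ratio is pi_new(a|s) / pi_old(a|s). With eps_low == eps_high this is
    the standard PPO clipped objective; unequal values are a hypothetical
    stand-in for an asymmetric clipping range.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped_ratio * advantage)
```

For instance, with a ratio of 1.5 and an advantage of 1.0 the symmetric setting caps the surrogate at 1.2, while widening only the upper bound to `eps_high=0.3` raises the cap to 1.3 and leaves the treatment of negative advantages unchanged. A "dynamic" variant would pass different bounds at each update.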

Table 6

SAC algorithm training parameters

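Parameter	Value
Maximum number of training iterations $s_{\max}$	$1 \times 10^{4}$
Discount factor $\gamma$	0.95
Actor/Critic learning rate $l_{AC}$	$1 \times 10^{-3}$
Batch size $B_{\mathrm{size}}$	128
Soft update coefficient $\tau_c$	$5 \times 10^{-3}$
Initial temperature $a_0$	0.01
Temperature learning rate $l_a$	$1 \times 10^{-3}$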
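
The soft update coefficient in the table above controls how quickly SAC's target critic tracks the online critic. The Python sketch below shows the usual Polyak averaging rule on plain lists of parameters, plus a helper that estimates how many updates it takes to halve the gap to a fixed online value. The function names are illustrative, and the baseline implementation used in the paper is not shown in this excerpt.

```python
import math

TAU_C = 5e-3  # soft update coefficient from the SAC parameter table


def soft_update(target_params, online_params, tau=TAU_C):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [
        tau * online + (1.0 - tau) * target
        for target, online in zip(target_params, online_params)
    ]


def updates_to_halve_gap(tau=TAU_C):
    """Smallest n with (1 - tau)^n <= 0.5, for a fixed online parameter."""
    return math.ceil(math.log(0.5) / math.log(1.0 - tau))
```

With $\tau_c = 5 \times 10^{-3}$ a single update moves the target 0.5% of the way toward the online value, and about 139 updates are needed to close half of the remaining gap.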

Table 7

Table 8


Algorithm	Aircraft survival rate/%	Average training time/s
DACTM-PPO	94.6	5 589.39
PPO	81.2	5 083.36
SAC	79.6	24 108.40
Preset rules	49.2

Algorithm	Mean missile miss distance/m	Median missile miss distance/m
DACTM-PPO	299.8	274.9
PPO	267.6	222.8
SAC	255.4	230.9
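
The two result tables above can be turned into direct comparisons with a few lines of arithmetic. The Python sketch below copies the reported figures and computes the survival-rate gain of DACTM-PPO in percentage points and the relative change in miss distance and training time against each baseline. The dictionary layout and function names are illustrative only.

```python
# Figures copied from the two result tables above.
RESULTS = {
    "DACTM-PPO": {"survival_pct": 94.6, "train_s": 5589.39,
                  "miss_mean_m": 299.8, "miss_median_m": 274.9},
    "PPO": {"survival_pct": 81.2, "train_s": 5083.36,
            "miss_mean_m": 267.6, "miss_median_m": 222.8},
    "SAC": {"survival_pct": 79.6, "train_s": 24108.40,
            "miss_mean_m": 255.4, "miss_median_m": 230.9},
    "Preset rules": {"survival_pct": 49.2},  # no other metrics reported
}


def survival_gain_pp(baseline, method="DACTM-PPO"):
    """Survival-rate difference in percentage points."""
    return RESULTS[method]["survival_pct"] - RESULTS[baseline]["survival_pct"]


def relative_change_pct(metric, baseline, method="DACTM-PPO"):
    """Relative change of `metric` for `method` versus `baseline`, in %."""
    base = RESULTS[baseline][metric]
    return 100.0 * (RESULTS[method][metric] - base) / base


if __name__ == "__main__":
    for name in ("PPO", "SAC", "Preset rules"):
        print(name, round(survival_gain_pp(name), 1), "pp survival gain")
    for name in ("PPO", "SAC"):
        print(name, round(relative_change_pct("miss_mean_m", name), 1),
              "% mean miss distance change")
```

On these figures DACTM-PPO improves survival by 13.4, 15.0 and 45.4 percentage points over PPO, SAC and the preset rules, and increases the mean miss distance by about 12% over PPO and 17% over SAC. Its average training time is about 10% longer than PPO's and roughly a quarter of SAC's.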
