Intelligent decision-making of airborne terminal infrared composite jamming based on DACTM-PPO

doi:10.7527/S1000-6893.2025.32759

Electronics, Electrical Engineering and Control

Yanlong HAN¹, An ZHANG¹,², Wenhao BI¹,², Qiucen FAN¹, Tianle HOU¹

1. School of Aeronautics, Northwestern Polytechnical University, Xi'an 710072, China
2. National Key Laboratory of Aircraft Configuration Design, Xi'an 710072, China

Received: 2025-09-08  Revised: 2025-10-16  Accepted: 2025-11-28  Online: 2025-12-09  Published: 2025-12-08
Contact: Wenhao BI
Supported by: National Natural Science Foundation of China (62073267)

Abstract:

As the guidance accuracy and maneuverability of infrared-guided air-to-air missiles continue to improve, combat aircraft can no longer reliably evade them through maneuvering or a single infrared countermeasure alone, and composite infrared jamming has become an important means of ensuring aircraft survivability. To address airborne terminal infrared composite jamming, this study proposes an intelligent decision-making method based on an improved Proximal Policy Optimization (PPO) algorithm. Starting from the airborne terminal confrontation scenario, the decision constraints faced by a combat aircraft under infrared-guided missile attack are analyzed, and models of infrared decoy flares and laser directional jamming are established. An improved PPO algorithm that combines a dynamic asymmetric clipping mechanism with temporal memory and attention mechanisms is proposed to improve convergence efficiency and solution quality. A reward function that reflects the characteristics of each jamming means is designed, with penalty terms for overuse and ineffective use of resources, to balance jamming effectiveness against resource consumption. Simulation results show that the method coordinates the infrared jamming means in a reasonable manner and performs well across a range of typical aircraft-missile confrontation scenarios. Compared with the original PPO algorithm, the Soft Actor-Critic (SAC) algorithm, and a preset rule-based method, it shows significant advantages in aircraft survival rate, missile miss distance, and resource utilization efficiency, indicating good application value.

Key words: airborne terminal defense, infrared composite jamming, reinforcement learning, infrared decoy flares, laser directional jamming
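The abstract names a dynamic asymmetric clipping mechanism but does not give its formula. The sketch below is only one plausible reading, not the authors' formulation: it assumes the PPO ratio is clipped with independent lower and upper bounds. The names `eps_low` and `eps_high` and their values are hypothetical; with both set to 0.2 the function reduces to the standard PPO clipped surrogate.

```python
def clipped_surrogate(ratios, advantages, eps_low=0.2, eps_high=0.2):
    """Mean PPO clipped surrogate with separate lower/upper clip bounds.

    ratios:     pi_new(a|s) / pi_old(a|s), one value per sample
    advantages: advantage estimates, one value per sample
    eps_low, eps_high: hypothetical parameters; equal values give standard PPO
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the probability ratio to [1 - eps_low, 1 + eps_high].
        r_clipped = min(max(r, 1.0 - eps_low), 1.0 + eps_high)
        # Pessimistic choice between the unclipped and clipped terms.
        total += min(r * a, r_clipped * a)
    return total / len(ratios)


# Symmetric bounds: the standard PPO objective.
symmetric = clipped_surrogate([1.5, 0.5], [1.0, -1.0])
# A wider upper bound lets positive-advantage samples move the policy further.
asymmetric = clipped_surrogate([1.5, 0.5], [1.0, -1.0], eps_low=0.2, eps_high=0.4)
```

A "dynamic" variant would additionally change `eps_low` and `eps_high` during training; how the paper schedules them is not stated in this page.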

CLC number: V279

Cite this article: Yanlong HAN, An ZHANG, Wenhao BI, Qiucen FAN, Tianle HOU. Intelligent decision-making of airborne terminal infrared composite jamming based on DACTM-PPO[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(7): 332759.

Figures and tables (26)

Figs. 1-6

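Table 1  Airborne terminal state space

State	Symbol	Dimension	Value range
Aircraft-missile relative bearing/rad	$o_t^{\mathrm{di}}$	2	$[-\pi/2, \pi/2] \times [0, 2\pi)$
Maximum angle-measurement error/rad	$o_t^{\mathrm{er}}$	2	$[0, \delta_\theta] \times [0, \delta_\psi)$
Aircraft-missile distance index	$o_t^{\mathrm{dis}}$	1	$\{0, 1, \cdots, D_{\max}/D_{\mathrm{bin}}\}$
Aircraft speed/(m·s^-1)	$o_t^{\mathrm{av}}$	1	$[v_{\min}^{\mathrm{aircraft}}, v_{\max}^{\mathrm{aircraft}}]$
Aircraft altitude/m	$o_t^{\mathrm{h}}$	1	$[0, h_{\max}^{\mathrm{aircraft}}]$
Aircraft pitch and yaw angles/rad	$o_t^{\mathrm{ae}}$	2	$[-\pi/2, \pi/2) \times [-\pi, \pi)$
Aircraft maneuver action	$o_t^{\mathrm{am}}$	1	$\{1, 2, 3, 4\}$
Remaining infrared decoy flares/rounds	$o_t^{\mathrm{ir}}$	1	$\{0, 1, \cdots, N_{\max}^{\mathrm{infrared}}\}$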
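The eight state components in Table 1 add up to an 11-dimensional observation. The following sketch shows one way to assemble that vector and to compute the distance index. The field names, the (elevation, azimuth) ordering, the floor-based binning rule, and the example values of `d_max` and `d_bin` are all assumptions made for illustration; the page does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    bearing: tuple        # relative bearing in rad, assumed (elevation, azimuth)
    angle_error: tuple    # maximum angle-measurement errors in rad
    distance_bin: int     # aircraft-missile distance index
    speed: float          # aircraft speed in m/s
    altitude: float       # aircraft altitude in m
    attitude: tuple       # (pitch, yaw) in rad
    maneuver: int         # maneuver action, 1..4
    flares_left: int      # remaining infrared decoy flares

    def to_vector(self):
        """Flatten the eight components into one 11-element list."""
        return [*self.bearing, *self.angle_error, float(self.distance_bin),
                self.speed, self.altitude, *self.attitude,
                float(self.maneuver), float(self.flares_left)]


def distance_bin(distance_m, d_max, d_bin):
    """Map a range to an index in {0, ..., d_max/d_bin} (floor binning assumed)."""
    clamped = min(max(distance_m, 0.0), d_max)
    return int(clamped // d_bin)


# Hypothetical example values, chosen only to exercise the code.
obs = Observation((0.1, 3.0), (0.01, 0.01), distance_bin(5300.0, 12000.0, 1000.0),
                  250.0, 8000.0, (0.05, -1.2), 2, 30)
vec = obs.to_vector()
```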

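Table 2  Airborne terminal action space

Action	Symbol	Dimension	Value range
Whether to release a group of infrared decoy flares	$a_t^{\mathrm{ir}}$	1	$\{0, 1\}$
Number of flares per group	$a_t^{\mathrm{gs}}$	1	$\{1, 2, \cdots, 6\}$
Interval between flares within a group	$a_t^{\mathrm{gi}}$	1	$0.02 : 0.02 : 0.10$
Laser directional jamming state	$a_t^{\mathrm{ld}}$	1	$\{0, 1\}$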
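Each action in Table 2 is discrete, so the joint action space can be enumerated. The sketch below assumes the four components are combined into one flat index, which is an illustrative choice; the paper may equally use separate output heads per component. The names `ACTIONS` and `decode` are hypothetical.

```python
from itertools import product

RELEASE = (0, 1)                                            # release a flare group or not
GROUP_SIZE = tuple(range(1, 7))                             # 1..6 flares per group
INTERVAL = tuple(round(0.02 * k, 2) for k in range(1, 6))   # 0.02:0.02:0.10
LASER = (0, 1)                                              # laser jammer off/on

# Cartesian product of the four components: 2 * 6 * 5 * 2 joint actions.
ACTIONS = list(product(RELEASE, GROUP_SIZE, INTERVAL, LASER))


def decode(index):
    """Turn a flat action index into named components."""
    release, size, interval, laser = ACTIONS[index]
    return {"release": release, "group_size": size,
            "interval": interval, "laser_on": laser}
```

When `release` is 0 the group size and interval have no effect, so a flat enumeration contains redundant entries; masking or a hierarchical action head would avoid that.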

Fig. 7

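Table 3  Aircraft and infrared jamming parameters

Parameter	Value or setting
Maximum load factor $n_{f\max}$/g	2.5
Initial mass of infrared decoy flare $m_{d0}$/kg	0.5
Mass change rate of infrared decoy flare $\dot{m}$/(kg·s^-1)	0.01
Infrared decoy flare velocity $v_d$/(m·s^-1)	50
Maximum radiant intensity of infrared decoy flare $I_{\max}$/(W·sr^-1)	9000
Infrared decoy flare burn time $t_1$/s	5
Infrared decoy flare release direction $d_f$	45° rearward and downward in the body frame
Laser directional jamming output power $P_0$/W	4000
Laser beam divergence angle $\theta$/rad	1×10^-3
Laser wavelength $\lambda$/μm	10.6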
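Two quantities follow directly from the Table 3 values under simple assumptions, and the sketch below works them out. It assumes the flare loses mass at the constant tabulated rate until burnout, and that the tabulated divergence is a full angle so the spot diameter grows as range times divergence with a negligible exit aperture and no atmospheric loss. These are illustrative simplifications, not the models used in the paper.

```python
import math

M_D0 = 0.5       # initial flare mass, kg (Table 3)
M_DOT = 0.01     # flare mass change rate, kg/s (Table 3)
T_BURN = 5.0     # flare burn time, s (Table 3)
P0 = 4000.0      # laser output power, W (Table 3)
THETA = 1e-3     # laser beam divergence, rad (Table 3; full angle assumed)


def flare_mass(t):
    """Remaining flare mass in kg at time t (s), assuming a constant burn rate."""
    t = min(max(t, 0.0), T_BURN)
    return M_D0 - M_DOT * t


def laser_spot_diameter(range_m):
    """Spot diameter in m at a given range (small-angle approximation)."""
    return THETA * range_m


def mean_irradiance(range_m):
    """Average irradiance in W/m^2 over the spot, ignoring atmospheric loss."""
    radius = laser_spot_diameter(range_m) / 2.0
    return P0 / (math.pi * radius ** 2)
```

Under these assumptions a flare still weighs 0.45 kg at burnout, and the beam is 1 m wide at a range of 1 km.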

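Table 4  Missile performance parameters

Parameter	Value
Maximum load factor $n_{m\max}$/g	50
Missile kill radius $R_{\mathrm{kill}}$/m	12
Maximum seeker range $D_{m\max}$/m	12000
Seeker field of view $A_m$/(°)	180
Maximum missile angular rate $\omega_m$/(rad·s^-1)	15.7
Seeker optical focal length $f$/mm	57
Infrared detector pixel size $d_0$/μm	12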
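The focal length and pixel size in Table 4 fix the seeker's angular resolution. As a worked example, the sketch below uses the standard small-angle relation IFOV ≈ pixel size / focal length and projects one pixel to a given range. The hit test against the kill radius is an assumed criterion added for illustration; the function names are hypothetical.

```python
FOCAL_LENGTH_M = 57e-3    # seeker focal length, m (Table 4)
PIXEL_SIZE_M = 12e-6      # detector pixel size, m (Table 4)
R_KILL_M = 12.0           # missile kill radius, m (Table 4)


def ifov_rad():
    """Instantaneous field of view of one pixel, in rad (small-angle relation)."""
    return PIXEL_SIZE_M / FOCAL_LENGTH_M


def pixel_footprint_m(range_m):
    """Linear size covered by one pixel at the given range, in m."""
    return ifov_rad() * range_m


def is_hit(miss_distance_m):
    """Assumed criterion: the aircraft is hit if the miss distance is within R_kill."""
    return miss_distance_m <= R_KILL_M
```

With these values one pixel subtends about 0.21 mrad, which corresponds to roughly 2.5 m at the 12 km maximum seeker range.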

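Table 5  DACTM-PPO training parameters

Parameter	Value
Maximum number of training iterations $s_{\max}$	$1 \times 10^{4}$
PPO clipping coefficient $\epsilon$	0.2
Discount factor $\gamma$	0.95
GAE discount coefficient $\lambda$	0.98
Entropy regularization coefficient $E$	$1 \times 10^{-3}$
Update epochs per training round $n_{\mathrm{epoch}}$	4
LSTM hidden dimension $d_{\mathrm{LS}}$	128
Number of LSTM layers $n_{\mathrm{LS}}$	1
Attention key/query dimension $d_{kv}$	128
Attention dropout probability $p_{\mathrm{attn}}$	0.2
Actor/Critic network structure $A_{\mathrm{dim}}$	[256, 128, 64, 32]
Actor/Critic learning rate $l_{\mathrm{AC}}$	$1 \times 10^{-3}$
Training batch size $B_{\mathrm{size}}$	128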
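Table 5 lists a discount factor and a GAE coefficient, which suggests that advantages are computed with generalized advantage estimation. The sketch below is the textbook GAE recursion for a single trajectory segment with the tabulated values as defaults; whether the paper modifies this step is not stated here. The function name and argument layout are illustrative.

```python
def gae(rewards, values, last_value, gamma=0.95, lam=0.98):
    """Generalized advantage estimates for one trajectory segment.

    rewards:    r_0 .. r_{T-1}
    values:     V(s_0) .. V(s_{T-1})
    last_value: V(s_T), or 0.0 if s_T is terminal
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Backward pass: A_t = delta_t + gamma * lam * A_{t+1}.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


adv = gae([1.0, 1.0], [0.0, 0.0], last_value=0.0)
```

With γ = 0.95 the effective reward horizon is about 1 / (1 − γ) = 20 decision steps.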

Figs. 8-10

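Table 6  SAC training parameters

Parameter	Value
Maximum number of training iterations $s_{\max}$	$1 \times 10^{4}$
Discount factor $\gamma$	0.95
Actor/Critic learning rate $l_{\mathrm{AC}}$	$1 \times 10^{-3}$
Batch size $B_{\mathrm{size}}$	128
Soft update coefficient $\tau_c$	$5 \times 10^{-3}$
Initial temperature $a_0$	0.01
Temperature learning rate $l_a$	$1 \times 10^{-3}$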
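The soft update coefficient in Table 6 controls how quickly SAC's target critic tracks the online critic. The sketch below shows the usual Polyak averaging rule on plain lists of weights; it is a generic illustration with hypothetical names, not the baseline implementation used in the paper.

```python
def soft_update(target, online, tau=5e-3):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target, online)]


# Track a fixed online weight of 1.0 starting from a target weight of 0.0.
target = [0.0]
for _ in range(200):
    target = soft_update(target, [1.0])
```

After n updates toward a fixed online weight, the remaining gap shrinks by a factor of (1 − τ)^n, so with τ = 5×10^-3 the target covers about 63% of the gap after 200 updates.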

Figs. 11-12

Table 7

Figs. 13-14

Table 8

Figs. 15-18



Algorithm	Aircraft survival rate/%	Mean training time/s
DACTM-PPO	94.6	5589.39
PPO	81.2	5083.36
SAC	79.6	24108.40
Preset rules	49.2	-

Algorithm	Mean missile miss distance/m	Median missile miss distance/m
DACTM-PPO	299.8	274.9
PPO	267.6	222.8
SAC	255.4	230.9
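The two results tables above can be turned into direct comparisons. The short script below computes the survival-rate gain in percentage points and the relative change in mean miss distance and training time, using only the tabulated numbers; the helper names are illustrative.

```python
survival = {"DACTM-PPO": 94.6, "PPO": 81.2, "SAC": 79.6, "Preset rules": 49.2}
train_time = {"DACTM-PPO": 5589.39, "PPO": 5083.36, "SAC": 24108.40}
mean_miss = {"DACTM-PPO": 299.8, "PPO": 267.6, "SAC": 255.4}


def gain_points(metric, baseline, method="DACTM-PPO"):
    """Absolute difference between method and baseline (percentage points)."""
    return metric[method] - metric[baseline]


def relative_change(metric, baseline, method="DACTM-PPO"):
    """Fractional change of method relative to baseline."""
    return (metric[method] - metric[baseline]) / metric[baseline]


for name in ("PPO", "SAC", "Preset rules"):
    print(f"survival vs {name}: {gain_points(survival, name):+.1f} points")
for name in ("PPO", "SAC"):
    print(f"mean miss distance vs {name}: {relative_change(mean_miss, name):+.1%}")
    print(f"training time vs {name}: {relative_change(train_time, name):+.1%}")
```

On these figures DACTM-PPO gains 13.4 points of survival rate over PPO for roughly 10% more training time, and it trains in under a quarter of the time SAC needs.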

