Acta Aeronautica et Astronautica Sinica, 2026, Vol. 47, Issue (7): 332759   doi: 10.7527/S1000-6893.2025.32759

Intelligent decision-making of airborne terminal infrared composite jamming based on DACTM-PPO

Yanlong HAN1, An ZHANG1,2, Wenhao BI1,2, Qiucen FAN1, Tianle HOU1

  1. School of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China
    2. National Key Laboratory of Aircraft Configuration Design, Xi’an 710072, China
  • Received: 2025-09-08  Revised: 2025-10-16  Accepted: 2025-11-28  Online: 2025-12-09  Published: 2025-12-08
  • Contact: Wenhao BI
  • Supported by:
    National Natural Science Foundation of China (62073267)

Abstract:

With the continuous improvement in the guidance accuracy and maneuverability of infrared-guided air-to-air missiles, combat aircraft find it increasingly difficult to avoid being hit by infrared missiles through evasive maneuvering or a single infrared countermeasure alone. Infrared composite jamming has therefore become a critical means of ensuring aircraft survivability. To address the problem of airborne terminal infrared composite jamming, this study proposes an intelligent decision-making method based on an improved Proximal Policy Optimization (PPO) algorithm. Starting from the airborne terminal confrontation scenario, the decision constraints faced by a combat aircraft under attack by an infrared-guided missile are analyzed, and models of infrared decoy flares and laser directed jamming are established. An improved PPO algorithm, which incorporates a dynamic asymmetric clipping mechanism and fuses temporal memory with an attention mechanism, is proposed to improve convergence efficiency and solution quality. A reward function that reflects the characteristics of each jamming means is designed, with resource penalty terms for overuse and ineffective use, to strike a reasonable balance between jamming effectiveness and resource consumption. Simulation results show that the proposed method coordinates the infrared jamming means in a reasonable manner and performs well across various typical aircraft-missile engagement situations. Compared with the original PPO algorithm, the Soft Actor-Critic (SAC) algorithm, and a preset rule-based method, it shows significant advantages in aircraft survival rate, missile miss distance, and resource utilization efficiency, indicating good application value.

Key words: airborne terminal defense, infrared composite jamming, reinforcement learning, infrared decoy flares, laser directed jamming
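The abstract does not give the exact form of the dynamic asymmetric clipping mechanism or of the penalized reward. As a rough illustration only, the Python sketch below shows a generic PPO clipped surrogate with separate lower and upper clip bounds that are annealed over training, together with a reward that subtracts overuse and ineffective-use penalties from a jamming-effectiveness term. The linear schedule, the coefficient values, and all function names (`clip_bounds`, `asymmetric_clipped_surrogate`, `ppo_loss`, `penalized_reward`) are hypothetical assumptions, not the DACTM-PPO formulation. The temporal-memory and attention components are not modeled.

```python
def clip_bounds(progress, eps_low=(0.2, 0.1), eps_high=(0.3, 0.15)):
    """Hypothetical dynamic schedule: linearly interpolate the lower and
    upper clip ranges from their start to end values as training
    progress goes from 0.0 to 1.0 (values outside are clamped)."""
    p = min(max(progress, 0.0), 1.0)
    low = eps_low[0] + p * (eps_low[1] - eps_low[0])
    high = eps_high[0] + p * (eps_high[1] - eps_high[0])
    return low, high


def asymmetric_clipped_surrogate(ratio, advantage, eps_low, eps_high):
    """PPO-style clipped surrogate for one sample.

    ratio = pi_new(a|s) / pi_old(a|s). Unlike standard PPO, the ratio is
    clipped to [1 - eps_low, 1 + eps_high] with independent bounds.
    """
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Taking the minimum keeps the pessimistic (lower) objective, as in PPO.
    return min(ratio * advantage, clipped * advantage)


def ppo_loss(ratios, advantages, progress):
    """Negative mean surrogate over a batch, using the scheduled bounds."""
    low, high = clip_bounds(progress)
    values = [asymmetric_clipped_surrogate(r, a, low, high)
              for r, a in zip(ratios, advantages)]
    return -sum(values) / len(values)


def penalized_reward(effectiveness, used, budget, ineffective_uses,
                     overuse_coef=0.5, ineffective_coef=0.2):
    """Hypothetical reward: a jamming-effectiveness term minus a penalty
    for each resource unit used beyond the budget and a penalty for each
    use that produced no jamming effect."""
    overuse = max(0, used - budget)
    return (effectiveness
            - overuse_coef * overuse
            - ineffective_coef * ineffective_uses)
```

For example, with the initial bounds (0.2, 0.3), a sample with ratio 1.5 and positive advantage 1.0 is capped at 1.3, while a sample with ratio 0.5 and advantage −1.0 keeps the more pessimistic value −0.8. Making the upper bound wider than the lower one permits larger updates on positive-advantage samples. This is a design choice assumed here for illustration; the abstract does not say how the paper sets or adapts its bounds.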

CLC number: