Acta Aeronautica et Astronautica Sinica, 2026, Vol. 47, Issue 8: 332753. doi: 10.7527/S1000-6893.2025.32753

飞机主动防御模式下改进逆强化学习的来袭弹轨迹预测方法

张皓, 刘家宁, 许志, 杨垣鑫

  1. 西北工业大学 航天学院,西安 710129
  • 收稿日期:2025-09-03 修回日期:2025-09-18 接受日期:2025-11-08 出版日期:2025-12-16 发布日期:2025-11-20
  • 通讯作者: 许志 E-mail:xuzhi@nwpu.edu.cn

Trajectory prediction method of incoming missiles based on improved inverse reinforcement learning in aircraft active defense mode

Hao ZHANG, Jianing LIU, Zhi XU, Yuanxin YANG

  1. School of Astronautics, Northwestern Polytechnical University, Xi’an 710129, China
  • Received: 2025-09-03 Revised: 2025-09-18 Accepted: 2025-11-08 Online: 2025-12-16 Published: 2025-11-20
  • Contact: Zhi XU E-mail: xuzhi@nwpu.edu.cn

摘要:

随着飞机火控系统和态势感知能力的提升,防御空空导弹的策略正由干扰、欺骗等被动方式,向拦截弹反来袭弹的主动防御模式演变,然而拦截弹的平均速度低、防御空间小,过载比不足以支撑传统比例导引法的精确碰撞要求,对来袭弹的轨迹预测提出了新的挑战。针对载机、来袭弹和拦截弹三体主动防御场景中拦截弹制导信息的高概率预测问题,提出了基于逆强化学习的来袭弹轨迹预测方法。首先,构建最大因果熵下对来袭弹机动时序特征提取的数学模型,并基于逆强化学习框架建立了来袭弹制导律的行为策略函数库;然后,推导了基于二次型的逆强化学习策略函数计算公式,降低了高维状态下策略函数计算的复杂度;最后,基于滚动窗口测量数据在线计算策略函数的加权系数,实时择优及自适应加权轨迹预测分布,形成来袭弹轨迹的实时预测模型。仿真验证表明,在三体主动防御场景下所提轨迹预测网络算法在“模型集/样本集外”具有强泛化能力,对复杂目标机动的动态适应性好、预测精度高,为防御来袭弹提供了可供制导使用的高概率轨迹预测模型,具有一定的理论应用意义和工程参考价值。

关键词: 三体主动防御场景, 制导导弹, 主动防御, 逆强化学习, 轨迹预测

Abstract:

With advances in aircraft fire control systems and situational awareness, defense against air-to-air missiles is evolving from passive methods such as jamming and deception toward active defense, in which an interceptor missile engages the incoming missile. However, the low average velocity, small defense space, and insufficient overload ratio of the interceptor mean that traditional proportional navigation guidance cannot meet the requirement for a precise collision, which poses new challenges for predicting the trajectory of the incoming missile. To address the high-probability prediction of guidance information for the interceptor in a three-body active defense scenario involving the carrier aircraft, the incoming missile, and the interceptor missile, this paper proposes an incoming missile trajectory prediction method based on inverse reinforcement learning. First, a mathematical model is constructed to extract the temporal maneuvering features of the incoming missile under the maximum causal entropy principle, and a library of behavioral policy functions for the guidance law of the incoming missile is established within the inverse reinforcement learning framework. Then, a quadratic-form formula for computing the inverse reinforcement learning policy function is derived, which reduces the computational complexity of the policy function in high-dimensional state spaces. Finally, the weighting coefficients of the policy functions are computed online from rolling-window measurement data, so that the predicted trajectory distributions are selected and adaptively weighted in real time to form a real-time prediction model of the incoming missile trajectory. Simulation results show that, in the three-body active defense scenario, the proposed trajectory prediction network algorithm generalizes well outside the model set and sample set, adapts well to complex target maneuvers, and achieves high prediction accuracy. The method provides a high-probability trajectory prediction model that can be used for guidance when defending against incoming missiles, and has theoretical significance and engineering reference value.
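
The online weighting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each entry in the policy library is a linear-Gaussian policy (a common form when the learned reward is quadratic), and the names `window_weights` and `predict_action`, the gain matrices, and the toy data are all hypothetical. The weights are a softmax of each candidate policy's log-likelihood over a rolling window of observed state-action pairs.

```python
import numpy as np

def gaussian_logpdf(a, mean, cov):
    """Log-density of a multivariate normal N(mean, cov) evaluated at a."""
    d = a - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (maha + logdet + len(a) * np.log(2.0 * np.pi))

def window_weights(states, actions, gains, covs):
    """Softmax of each candidate policy's log-likelihood over the window."""
    loglik = np.array([
        sum(gaussian_logpdf(a, K @ s, S) for s, a in zip(states, actions))
        for K, S in zip(gains, covs)
    ])
    w = np.exp(loglik - loglik.max())  # subtract the max for numerical stability
    return w / w.sum()

def predict_action(state, weights, gains):
    """Weighted mean of the candidate policies' mean actions."""
    return sum(w * (K @ state) for w, K in zip(weights, gains))

# Hypothetical toy library: two linear-Gaussian policies, 2-D state, 1-D action.
rng = np.random.default_rng(0)
gains = [np.array([[1.0, 0.0]]), np.array([[0.0, -2.0]])]
covs = [np.eye(1) * 0.01, np.eye(1) * 0.01]

# Synthetic measurement window generated by the second policy (assumption).
states = rng.normal(size=(20, 2))
actions = np.array([gains[1] @ s + rng.normal(scale=0.1, size=1) for s in states])

weights = window_weights(states, actions, gains, covs)
next_action = predict_action(states[-1], weights, gains)
```

In this toy setting the weight mass concentrates on the policy that generated the window. Sliding the window forward and recomputing the weights is one way to track a change in the guidance behavior of the target.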

Key words: three-body active defense scenario, guided missiles, active defense, inverse reinforcement learning, trajectory prediction

CLC number: