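An integrated intelligent decision-making algorithm for penetration and strike considering the field-of-view angle constraint

TAN Gaopeng (谭高澎)1, WANG Xiaofang (王晓芳)2, LIN Hai (林海)3

    1. School of Aerospace Science and Technology, Beijing Institute of Technology
    2. Department of Flight Vehicle Engineering, School of Aerospace Engineering, Beijing Institute of Technology
    3. Beijing Institute of Technology
  • Received: 2025-11-05 Revised: 2026-01-20 Online: 2026-01-21 Published: 2026-01-21
  • Corresponding author: WANG Xiaofang
  • Funding:
    Field Fund of the Science and Technology Commission of the Central Military Commission (军科委领域基金)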

Abstract: To address the integrated penetration/strike design problem for a hypersonic reentry glide missile in the terminal guidance phase under constraints such as the detection field-of-view (FOV) angle, this paper proposes a Lagrangian proximal policy optimization (PPO) intelligent penetration decision algorithm based on a constrained Markov decision process (CMDP), together with an adaptive training method for multiple scenarios. The glide missile is assumed to strike the target using a biased proportional navigation guidance law in the terminal guidance phase. The state space consists of the relative motion states between the interceptor and the glide missile and between the glide missile and the target, and the action space is the rate of change of the bias acceleration. The reward function is designed by jointly considering the penetration/strike outcome of the glide missile, the control energy consumption, the degree of constraint satisfaction, and the velocity-vector lead angle of the interceptor. A constraint cost function on the FOV angle is constructed, and the CMDP of the penetration/strike problem is thereby established. The constraint is introduced into the loss function of the policy network through a Lagrange multiplier, a constraint-cost critic network is added to form the penetration network, and the network is trained with the PPO algorithm to obtain the bias acceleration. Rules for grading the complexity of combat scenarios are established, and a scenario-adaptive sampling training method, "progressive learning in the early stage plus extra learning on difficult cases in the later stage", is proposed to improve the convergence speed of the penetration strategy and its generalization across combat scenarios. Simulation results show that the integrated intelligent penetration/strike strategy enables the glide missile to penetrate successfully and hit the target at the specified impact angle while satisfying the FOV angle constraint throughout the flight, and that it generalizes well.

Keywords: hypersonic glide missile, integrated penetration and strike, field-of-view angle constraint, constrained Markov decision process, scenario-adaptive sampling
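
The abstract states that the FOV constraint enters the policy-network loss through a Lagrange multiplier and that a separate constraint-cost critic is used. A minimal sketch of one common way to write such a loss is given below. It is not taken from the paper: the function names, the hinge-style FOV cost, the simple combination `reward_adv - lam * cost_adv`, and the dual-ascent step size are all illustrative assumptions.

```python
def fov_cost(look_angle_rad, fov_limit_rad):
    """Assumed per-step cost: how far the look angle exceeds the FOV limit (0 if inside)."""
    return max(0.0, abs(look_angle_rad) - fov_limit_rad)


def ppo_lagrangian_loss(ratios, reward_advs, cost_advs, lam, clip_eps=0.2):
    """Clipped PPO surrogate loss with a Lagrangian penalty on the cost advantage.

    ratios      : new-policy / old-policy probability ratios, one per sample
    reward_advs : advantages from the reward critic
    cost_advs   : advantages from the constraint-cost critic
    lam         : current Lagrange multiplier (>= 0)
    """
    total = 0.0
    for r, a_r, a_c in zip(ratios, reward_advs, cost_advs):
        adv = a_r - lam * a_c                      # penalized advantage (assumed form)
        r_clip = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(r * adv, r_clip * adv)        # standard PPO clipped objective
    return -total / len(ratios)                    # negate: optimizers minimize


def update_multiplier(lam, mean_episode_cost, cost_limit, lr=0.05):
    """Dual ascent on the multiplier, projected back onto lam >= 0."""
    return max(0.0, lam + lr * (mean_episode_cost - cost_limit))
```

In this sketch the multiplier grows while the average episode cost exceeds `cost_limit` and shrinks toward zero once the constraint is met, so the penalty on FOV violations adapts during training rather than being hand-tuned.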

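
The "progressive learning in the early stage plus extra learning on difficult cases in the later stage" sampling idea from the abstract can be illustrated with a short sketch. The abstract does not give the actual grading rules or sampling probabilities, so everything here is a hypothetical stand-in: scenarios are assumed to be pre-sorted into complexity levels from easy to hard, the switch point `switch_at` and the weight floor `floor` are arbitrary, and `sample_level` is an invented name.

```python
import random


def sample_level(progress, failure_rates, switch_at=0.5, floor=0.05, rng=random):
    """Pick a scenario complexity level for the next training episode.

    progress      : training progress in [0, 1]
    failure_rates : recent failure rate per level, ordered easy -> hard
    switch_at     : progress at which sampling switches from curriculum to
                    difficulty-weighted (assumed value)
    floor         : minimum weight so mastered levels are still revisited
    """
    n = len(failure_rates)
    if progress < switch_at:
        # Early phase: unlock harder levels gradually, sample uniformly among them.
        unlocked = 1 + int((progress / switch_at) * (n - 1))
        return rng.randrange(unlocked)
    # Late phase: levels where the policy fails more often are sampled more often.
    weights = [floor + f for f in failure_rates]
    return rng.choices(range(n), weights=weights, k=1)[0]
```

For example, with three levels and `failure_rates = [0.0, 0.0, 1.0]`, late-phase sampling concentrates on the hardest level while the floor keeps the easier levels from disappearing entirely.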