Stochastic optimal control based powered descent guidance with landing chance constraints (Special Column on Autonomous Guidance and Control Technology for Space Transportation Systems)

doi:10.7527/S1000-6893.2026.33338

Lin-Kun HE¹, Ran ZHANG², Hui-Feng LI², Wei-Min BAO³

1. School of Astronautics, Beihang University
2. Beihang University
3. China Aerospace Science and Technology Corporation

Received: 2026-01-07 Revised: 2026-06-24 Published: 2026-06-26
Corresponding author: Ran ZHANG
Funding:
National Natural Science Foundation of China; National Natural Science Foundation of China; Beijing Natural Science Foundation

Abstract

Stochastic optimal control based powered descent guidance (SOC-PDG) recasts the powered descent guidance problem under uncertainty as a stochastic optimal control problem, and is a key technical route to improving the reliability of reusable rockets. However, in-atmosphere powered descent involves wide ranges of initial states and uncertainty distributions. Existing SOC-PDG methods, which describe constraints through the mean and covariance, cannot handle the resulting long-tailed terminal state distributions, so they struggle to regulate low-probability but catastrophic extreme terminal state errors. To address this, this paper studies the SOC-PDG problem with landing chance constraints (SOC-PDG-LCC), in which threshold constraints on the tail quantiles of the terminal state distribution are added directly to the optimal control problem. The landing chance constraints are highly sensitive to the initial state and uncertainty distribution parameters, costly to evaluate, and non-differentiable with respect to the guidance policy. To handle these difficulties, this paper designs a guidance policy based on neural network parameter tuning, with three key features: 1) An improved physics-informed neural network architecture, which introduces control saturation mitigation and network input augmentation, adapts to the initial state and uncertainty distribution parameters and maintains consistent landing performance across them. 2) A polynomial chaos expansion surrogate model of the terminal state distribution estimates the mean propellant consumption and the quantiles in the landing chance constraints with high accuracy from a small number of sampled trajectories during training. 3) A reinforcement learning training framework built on samples from the surrogate model optimizes the guidance policy without gradients.
Numerical simulation results show that, compared with existing SOC-PDG methods based on mean-covariance constraint descriptions, the proposed method mitigates the long tail of the terminal state distribution across a wide range of initial states and uncertainty parameters, and significantly reduces the probability of rare landing failure events.
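To make the surrogate-based quantile estimate concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single standard-normal disturbance, a hypothetical closed-form function `toy_terminal_error` standing in for a closed-loop descent simulation, a probabilists' Hermite basis fitted by least squares, and an illustrative threshold of 5.0. None of these choices come from the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander


def toy_terminal_error(xi):
    """Hypothetical stand-in for an expensive closed-loop simulation.

    Maps a standard-normal disturbance xi to a scalar terminal error
    whose distribution is skewed with a long right tail.
    """
    return 1.0 + 0.5 * xi + 0.3 * xi**2


def fit_pce(xi_train, y_train, degree):
    """Least-squares fit of probabilists' Hermite PCE coefficients."""
    basis = hermevander(xi_train, degree)  # shape (n_samples, degree + 1)
    coeffs, *_ = np.linalg.lstsq(basis, y_train, rcond=None)
    return coeffs


def surrogate_quantile(coeffs, level, n_samples, rng):
    """Sample the cheap surrogate and return its empirical quantile."""
    xi = rng.standard_normal(n_samples)
    y = hermevander(xi, len(coeffs) - 1) @ coeffs
    return np.quantile(y, level), xi


rng = np.random.default_rng(0)

# A small number of "expensive" trajectories is used to fit the surrogate.
xi_train = rng.standard_normal(30)
coeffs = fit_pce(xi_train, toy_terminal_error(xi_train), degree=3)

# The surrogate is then sampled densely to estimate the tail quantile.
q99, xi_eval = surrogate_quantile(coeffs, 0.99, 100_000, rng)
q99_direct = np.quantile(toy_terminal_error(xi_eval), 0.99)

threshold = 5.0  # assumed allowable terminal error, for illustration only
constraint_satisfied = q99 <= threshold
print(f"surrogate q99={q99:.3f}, direct q99={q99_direct:.3f}, ok={constraint_satisfied}")
```

Because the toy function is itself a quadratic polynomial, the degree-3 surrogate reproduces it almost exactly, so the two quantile estimates agree. A real descent simulation would not be this well behaved. With a Hermite basis under a standard-normal input, the zeroth coefficient `coeffs[0]` is also the surrogate's mean, which is how a mean quantity such as propellant consumption could be read off in the same way.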

Key words: powered descent guidance, chance constraint, physics-informed neural network, polynomial chaos expansion, reinforcement learning

Lin-Kun HE, Ran ZHANG, Hui-Feng LI, Wei-Min BAO. Stochastic optimal control based powered descent guidance with landing chance constraints[J]. Acta Aeronautica et Astronautica Sinica, doi: 10.7527/S1000-6893.2026.33338.


