Integrated guidance and control method based on deep reinforcement learning parameter tuning

Qichao XIE, Chengyu CAO, Yiyun ZHAO, Fanbiao LI

Acta Aeronautica et Astronautica Sinica, 2025, Vol. 46, Issue 24: 632345

DOI: https://doi.org/10.7527/S1000-6893.2025.32345

Special column: Excellent Papers of the 2nd Aerospace Frontiers Conference




School of Automation Engineering, Central South University, Changsha 410083, China

E-mail: fanbiaoli@csu.edu.cn

Received date: 2025-06-03

Revised date: 2025-06-04

Accepted date: 2025-06-05

Online published: 2025-06-20

Supported by

Key Program of the Joint Fund for Basic Research on Large Aircraft of the National Natural Science Foundation of China (U2570207); National Science Fund for Excellent Young Scholars (62222317); Hunan Provincial Key Technology Innovation Program (2021GK1030); Key Research and Development Program of Hunan Province (2023GK2023)



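Cite this article

Qichao XIE, Chengyu CAO, Yiyun ZHAO, Fanbiao LI. Integrated guidance and control method based on deep reinforcement learning parameter tuning[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(24): 632345. DOI: 10.7527/S1000-6893.2025.32345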

Abstract

To address the dynamic optimization of guidance and control parameters for hypersonic flight vehicles, a deep reinforcement learning parameter tuning method based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is proposed. First, the motion model of the hypersonic flight vehicle and the integrated guidance and control model are established, and a backstepping controller is designed whose uniform ultimate boundedness is proved via Lyapunov stability analysis. Then, the controller parameter optimization problem is formulated as a Markov decision process, and data-driven online adaptive optimization of the controller parameters is achieved with the TD3 algorithm. The method builds a parameter optimization mechanism that combines model-based prior knowledge with data-driven learning, which significantly improves the controller's ability to adapt autonomously within the parameter space. Finally, numerical simulations verify the effectiveness and robustness of the proposed method.

Key words: hypersonic flight vehicle; integrated guidance and control; deep reinforcement learning; adaptive parameters; backstepping control
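
The abstract does not give the state, action, or reward definitions used in the paper, so the following is only a minimal sketch under stated assumptions. It illustrates two generic ingredients of TD3-based gain tuning: mapping a bounded policy action to controller gains, and the standard TD3 critic target (target policy smoothing plus the minimum over twin critics). The function names `action_to_gains` and `td3_target`, the gain bounds, and the toy actor and critics are all hypothetical and are not taken from the paper.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def action_to_gains(action, gain_low, gain_high):
    """Affinely map each action component from [-1, 1] to its gain range.

    The gain bounds are hypothetical; the abstract does not list them.
    """
    return [lo + 0.5 * (clip(a, -1.0, 1.0) + 1.0) * (hi - lo)
            for a, lo, hi in zip(action, gain_low, gain_high)]


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, rng=None):
    """Standard TD3 critic target: smoothed target action, min of twin critics."""
    rng = rng or random.Random(0)
    smoothed = []
    for a in target_actor(next_state):
        # Target policy smoothing: add clipped Gaussian noise, then clip the action.
        noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
        smoothed.append(clip(a + noise, -1.0, 1.0))
    # Clipped double-Q learning: use the smaller of the two target critic values.
    q_min = min(target_q1(next_state, smoothed), target_q2(next_state, smoothed))
    # No bootstrapping from terminal transitions.
    return reward + gamma * (0.0 if done else 1.0) * q_min


# Toy stand-ins for the target networks (placeholders, not trained models).
def toy_actor(state):
    return [0.0, 0.5, -1.0]


def toy_q1(state, action):
    return 2.0


def toy_q2(state, action):
    return 1.0


# Three hypothetical backstepping gains, each bounded to [1, 5].
gains = action_to_gains(toy_actor(None), [1.0, 1.0, 1.0], [5.0, 5.0, 5.0])
target = td3_target(1.0, False, None, toy_actor, toy_q1, toy_q2, gamma=0.9)
```

Bounding the action and rescaling it to a fixed gain interval is one common way to keep learned gains inside a range for which a stability analysis holds; whether the paper constrains its gains this way is an assumption here.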

