Integrated guidance and control method based on deep reinforcement learning parameter tuning (Aerospace Frontiers Conference supplement)

  • XIE Qi-Chao,
  • CAO Cheng-Yu,
  • ZHAO Yi-Yun,
  • LI Fan-Biao
  • 1. School of Automation, Central South University
    2. Central South University

Received date: 2025-06-03

  Revised date: 2025-06-17

  Online published: 2025-06-20

Funding

National Science Fund for Excellent Young Scholars; Major Science and Technology Project of Hunan Province; Key Research and Development Program of Hunan Province

Cite this article

XIE Qi-Chao, CAO Cheng-Yu, ZHAO Yi-Yun, LI Fan-Biao. Integrated guidance and control method based on deep reinforcement learning parameter tuning (Aerospace Frontiers Conference supplement)[J]. Acta Aeronautica et Astronautica Sinica, 0: 1-0. DOI: 10.7527/S1000-6893.2025.32345

Abstract

To address the dynamic optimization of guidance and control parameters for hypersonic flight vehicles, a deep reinforcement learning parameter tuning method based on the twin delayed deep deterministic policy gradient (TD3) algorithm is proposed. First, the motion model of the hypersonic flight vehicle and the integrated guidance and control model are established, a controller is designed using the backstepping method, and its uniform ultimate boundedness is proved via Lyapunov stability analysis. Then, the controller parameter optimization problem is formulated as a Markov decision process, and data-driven online adaptive optimization of the controller parameters is achieved with the TD3 algorithm. The method builds a parameter optimization mechanism that combines model-based prior knowledge with data-driven learning, significantly improving the controller's ability to adapt autonomously within the parameter space. Finally, numerical simulations verify the effectiveness and robustness of the proposed method.
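As a rough illustration of the TD3 step mentioned in the abstract, the sketch below computes the standard TD3 critic target (target policy smoothing plus the minimum over two target critics) and maps a normalized action to controller gains. It is not the authors' implementation: the function names, the gain bounds `k_min` and `k_max`, the action range of [-1, 1], and the hyperparameter defaults are all assumptions chosen for illustration, and the actor and critics are passed in as plain callables rather than neural networks.

```python
import numpy as np


def action_to_gains(action, k_min, k_max):
    """Map a normalized action in [-1, 1] to controller gains in [k_min, k_max].

    Hypothetical helper: the abstract does not specify how actions map to gains.
    """
    action = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    k_min = np.asarray(k_min, dtype=float)
    k_max = np.asarray(k_max, dtype=float)
    return k_min + 0.5 * (action + 1.0) * (k_max - k_min)


def td3_target(reward, done, next_state, actor_target, critic1_target,
               critic2_target, rng, gamma=0.99, policy_noise=0.2,
               noise_clip=0.5, action_low=-1.0, action_high=1.0):
    """Compute the TD3 critic target for a batch of transitions.

    actor_target(next_state) -> actions, shape (batch, action_dim)
    criticX_target(next_state, action) -> Q-values, shape (batch,)
    The default hyperparameters are illustrative assumptions.
    """
    next_action = actor_target(next_state)
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = rng.normal(0.0, policy_noise, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, action_low, action_high)
    # Clipped double-Q: take the smaller of the two target critic estimates.
    q_next = np.minimum(critic1_target(next_state, next_action),
                        critic2_target(next_state, next_action))
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * q_next


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(4, 3))           # toy batch of next states
    actor = lambda s: np.tanh(s[:, :2])        # toy actor with 2 gain actions
    q1 = lambda s, a: a.sum(axis=1)            # toy target critics
    q2 = lambda s, a: a.sum(axis=1) + 0.1
    y = td3_target(np.ones(4), np.zeros(4), states, actor, q1, q2, rng)
    print(y)
    print(action_to_gains(actor(states)[0], k_min=[1.0, 1.0], k_max=[5.0, 5.0]))
```

In a gain-tuning setting of this kind, the state would typically hold tracking errors and flight conditions, and the gains returned by `action_to_gains` would be passed to the backstepping controller at each step; those details are assumptions here, since the abstract does not give them.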