Acta Aeronautica et Astronautica Sinica, 2024, Vol. 45, Issue (1): 628890   doi: 10.7527/S1000-6893.2023.28890

Special Column on Fully Actuated System Theory and Its Applications in Aerospace

Reinforcement learning robust optimal control for spacecraft attitude stabilization

Bing XIAO, Haichao ZHANG

  1. School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2023-04-19 Revised: 2023-06-05 Accepted: 2023-10-07 Published: 2024-01-15 Online: 2023-10-08
  • Corresponding author: Bing XIAO E-mail: xiaobing@nwpu.edu.cn

Abstract:

This paper studies optimal attitude stabilization control of rigid spacecraft subject to external disturbance torques and proposes an intelligent robust control method based on online reinforcement learning. The method uses the adaptive dynamic programming (ADP) framework. A single critic neural network (NN) learns, online, the optimal attitude control law of the disturbance-free spacecraft. A new adaptive law estimates the critic NN weights online, which yields near-optimal control performance. A robust control term is then embedded in the learned near-optimal control law to form an intelligent robust controller. Lyapunov theory is used to prove that the closed-loop attitude system is uniformly ultimately bounded and that the critic NN weight estimation error converges. Compared with ADP methods based on an actor-critic NN structure, the proposed method depends less on the persistent excitation condition and has lower computational complexity. The attitude stabilization performance also remains strongly robust to external disturbances. Simulation results verify the superior control performance of the proposed approach.

Key words: spacecraft, attitude control, reinforcement learning, adaptive dynamic programming, external disturbance, robustness
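
The abstract describes a critic-only ADP loop with four steps. First, a critic network approximates the optimal value function. Second, its weights are tuned online from the Hamilton-Jacobi-Bellman (HJB) residual. Third, the control law is read off the critic gradient. Fourth, a robust term is added to reject disturbances. The Python sketch below illustrates this loop on a deliberately simplified case. It is not the authors' algorithm, and every modeling choice in it is a hypothetical assumption:
- a single-axis model θ̈ = u + d with unit inertia replaces the full rigid-body attitude dynamics;
- the cost weights are Q = I and R = 1;
- the critic uses the quadratic basis φ(x) = [θ², θω, ω²];
- a generic normalized gradient descent on the HJB residual over sampled states stands in for the paper's new adaptive law;
- the robust term is a smooth −k·tanh(s/ε) with s = gᵀ∂V/∂x, k = 0.5, and ε = 0.01;
- the disturbance is d = 0.2 sin t + 0.1.

For this linear-quadratic toy case, the Riccati equation gives V* = xᵀPx with P = [[√3, 1], [1, √3]]. The ideal critic weights are therefore [√3, 2, √3], which provides a check on the learned weights.

```python
import math

import numpy as np

# Hypothetical toy model (not the paper's rigid-body model): one rotation axis,
# theta_ddot = u + d with unit inertia, state x = [theta, omega] = [a, b].
# Assumed cost: integral of a^2 + b^2 + R*u^2, i.e. Q = I and R = 1.
R = 1.0


def critic_control(W, a, b):
    """u = -1/2 R^-1 g^T dV/dx, with V = W[0]*a^2 + W[1]*a*b + W[2]*b^2, g = [0, 1]^T."""
    return -0.5 * (W[1] * a + 2.0 * W[2] * b) / R


def train_critic(W0, eta=0.5, iters=8000, batch=64, seed=0):
    """Critic-only learning: normalised gradient descent on the HJB residual.

    A generic residual-gradient rule assumed for illustration; it is not the
    adaptive law proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    W = np.array(W0, dtype=float)
    for _ in range(iters):
        a, b = rng.uniform(-1.0, 1.0, size=(2, batch))  # sampled states
        u = critic_control(W, a, b)
        # sigma = (d phi / dx) x_dot, with x_dot = [b, u] under the current policy
        sigma = np.stack([2 * a * b, b * b + a * u, 2 * b * u], axis=1)
        # HJB residual e = dV/dx . x_dot + x^T Q x + R u^2, zero at the optimum
        e = sigma @ W + a**2 + b**2 + R * u**2
        norm = 1.0 + np.sum(sigma**2, axis=1)
        W -= eta * np.mean(sigma * (e / norm)[:, None], axis=0)
    return W


def simulate(W, k=0.0, eps=0.01, dt=1e-3, T=20.0):
    """Euler simulation from theta = 0.5; returns max |theta| over the last 5 s."""
    a, b = 0.5, 0.0
    tail = []
    for i in range(int(round(T / dt))):
        t = i * dt
        d = 0.2 * math.sin(t) + 0.1  # assumed disturbance, |d| <= 0.3 < k
        s = W[1] * a + 2.0 * W[2] * b  # s = g^T dV/dx
        u = critic_control(W, a, b) - k * math.tanh(s / eps)  # learned + robust
        a, b = a + dt * b, b + dt * (u + d)
        if t >= T - 5.0:
            tail.append(abs(a))
    return max(tail)


W = train_critic([1.0, 1.0, 1.0])  # these initial weights give a stabilising policy
W_riccati = np.array([math.sqrt(3.0), 2.0, math.sqrt(3.0)])
print("learned critic weights:", np.round(W, 3))
print("Riccati weights:       ", np.round(W_riccati, 3))
print("tail max |theta|, learned law only:", round(simulate(W, k=0.0), 4))
print("tail max |theta|, with robust term:", round(simulate(W, k=0.5), 4))
```

In this toy setting, the learned weights should settle near the Riccati values. Because k exceeds the disturbance bound of 0.3, the robust term should confine θ to a thin boundary layer around the surface s = 0. The learned law alone leaves a residual oscillation driven by d. This mirrors, only qualitatively, the uniformly ultimately bounded behaviour stated in the abstract.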
