Special Column: Fully Actuated System Theory and Its Applications in Aerospace

Reinforcement learning robust optimal control for spacecraft attitude stabilization

  • Bing XIAO (肖冰),
  • Haichao ZHANG (张海朝)
  • School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
E-mail: xiaobing@nwpu.edu.cn

Received date: 2023-04-19

  Revised date: 2023-06-05

  Accepted date: 2023-10-07

  Online published: 2023-10-08

Cite this article

XIAO B, ZHANG H C. Reinforcement learning robust optimal control for spacecraft attitude stabilization[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(1): 628890. DOI: 10.7527/S1000-6893.2023.28890

Abstract

This paper investigates optimal attitude stabilization control of a rigid spacecraft subject to external disturbance torques. An online reinforcement learning-based intelligent robust control approach is proposed within the adaptive dynamic programming framework. A critic-only neural network is designed to learn online the optimal attitude control law of the spacecraft without external disturbance, and a new adaptive law is designed to estimate the weights of that network online, so that near-optimal control performance is achieved. A robust control term is then embedded into the learned near-optimal control law to form an intelligent robust controller. Lyapunov theory is used to prove that the closed-loop attitude control system is uniformly ultimately bounded and that the weight estimation error of the critic neural network converges. Compared with adaptive dynamic programming methods based on the actor-critic neural network structure, the proposed approach depends less on the persistent excitation condition, has lower computational complexity, and ensures that the attitude stabilization performance is strongly robust to external disturbances. Simulation results verify the superior control performance of the proposed approach.
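As a rough illustration of the critic-only idea summarized in the abstract, the sketch below tunes a single critic weight so that the Hamilton-Jacobi-Bellman (HJB) residual vanishes, and derives the control directly from the critic without a separate actor network. Everything in it is an assumption made for illustration: the scalar linear plant, the cost weights, the sample states, the gain, and the standard normalized-gradient update. It is not the adaptive law proposed in the paper, it omits the robust control term, and it does not model spacecraft attitude dynamics. The scalar case is chosen only because the learned weight can be checked against the closed-form Riccati solution.

```python
import math

# Hypothetical scalar plant x' = A*x + B*u with running cost Q*x^2 + R*u^2.
# These numbers are illustrative assumptions, not the spacecraft model.
A, B, Q, R = -1.0, 1.0, 1.0, 1.0


def control(w, x):
    """Control u = -(1/2) * R^-1 * B * dV/dx for the critic V(x) = w * x^2."""
    return -(B / R) * w * x


def hjb_residual(w, x):
    """HJB residual for V(x) = w * x^2 under u = control(w, x)."""
    u = control(w, x)
    x_dot = A * x + B * u
    return Q * x * x + R * u * u + 2.0 * w * x * x_dot


def train_critic(w0=0.0, gain=2.0, sweeps=500):
    """Normalized-gradient critic tuning over a fixed set of sample states."""
    samples = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
    w = w0
    for _ in range(sweeps):
        for x in samples:
            u = control(w, x)
            # Regressor: derivative of the residual w.r.t. w with u held fixed.
            sigma = 2.0 * x * (A * x + B * u)
            e = hjb_residual(w, x)
            # Gradient step on e^2 / 2, normalized to keep the step bounded.
            w -= gain * sigma * e / (1.0 + sigma * sigma) ** 2
    return w


def riccati_solution():
    """Positive root of 2*A*p - (B^2 / R)*p^2 + Q = 0, the exact optimal weight."""
    return (A + math.sqrt(A * A + B * B * Q / R)) * R / (B * B)
```

With the values above, `train_critic()` should return a weight close to `riccati_solution()`, which equals sqrt(2) - 1. Only one weight is adapted and the control is computed from it, which is the structural reason a critic-only scheme needs less computation than an actor-critic pair.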
