Reinforcement learning robust optimal control for spacecraft attitude stabilization

doi:10.7527/S1000-6893.2023.28890

Abstract

This paper investigates optimal attitude stabilization control of a rigid spacecraft subject to external disturbances. An online reinforcement learning-based intelligent and robust control approach is developed using the adaptive dynamic programming technique. In this approach, a critic-only neural network learns the optimal control policy of the spacecraft attitude system under external disturbance, and a new estimation law updates the weights of that network online. The learned controller achieves near-optimal control performance. A robust control term is then added to the learned controller to form an intelligent and robust controller. Lyapunov theory is used to prove that the closed-loop attitude system under the proposed controller is uniformly ultimately bounded and that the weight estimation error of the critic neural network converges. Compared with traditional actor-critic neural network-based control schemes, the proposed approach has lower computational complexity, strong robustness to external disturbances, and less dependence on the persistent excitation condition. Simulation results verify the superior control performance of the proposed approach.

Key words: spacecraft, attitude control, reinforcement learning, adaptive dynamic programming, external disturbance, robustness

CLC Number: V249.1

Bing XIAO, Haichao ZHANG. Reinforcement learning robust optimal control for spacecraft attitude stabilization[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(1): 628890.

Figures/Tables 11 (Fig.1 to Fig.11)
