A spacecraft rendezvous and docking method based on inverse reinforcement learning

doi:10.7527/S1000-6893.2023.28420

Abstract

For spacecraft proximity maneuvering and rendezvous, a method is proposed for training neural networks with generative adversarial inverse reinforcement learning, using model predictive control to provide the expert dataset. First, the dynamics of a chaser spacecraft approaching a static target are established, subject to a maximum velocity constraint, a control input saturation constraint, and a spatial cone constraint, and model predictive control is used to drive the chaser to the target. Second, disturbances are added to the nominal trajectory, and the trajectory from each starting position to the target is computed with the same controller. The states and commands along these trajectories at each time step are collected to form the training set. Finally, the network structure and parameters are set, the hyperparameters are tuned, and the network is trained on this dataset with adversarial inverse reinforcement learning. Simulation results show that adversarial inverse reinforcement learning can imitate the behavior of the expert trajectories and successfully trains a neural network that drives the spacecraft from the starting point to the static target.
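The adversarial step described in the abstract can be illustrated with a minimal sketch. It assumes the discriminator form commonly used in adversarial inverse reinforcement learning, D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + π(a|s)); the abstract does not state which form the authors use. The names `discriminator`, `airl_reward`, and `discriminator_loss` are hypothetical, and `f_value` stands for the output of a learned reward network that is not modeled here.

```python
import math


def discriminator(f_value: float, log_pi: float) -> float:
    """D = exp(f) / (exp(f) + pi), computed stably as sigmoid(f - log pi)."""
    z = f_value - log_pi
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def airl_reward(f_value: float, log_pi: float) -> float:
    """Reward passed to the policy: log D - log(1 - D), which equals f - log pi."""
    d = discriminator(f_value, log_pi)
    return math.log(d) - math.log(1.0 - d)


def discriminator_loss(expert, policy):
    """Binary cross-entropy with expert pairs labelled 1 and policy pairs labelled 0.

    Each item is a (f_value, log_pi) tuple for one state-command pair.
    """
    loss = 0.0
    for f_value, log_pi in expert:
        loss -= math.log(discriminator(f_value, log_pi))
    for f_value, log_pi in policy:
        loss -= math.log(1.0 - discriminator(f_value, log_pi))
    return loss / (len(expert) + len(policy))
```

In this sketch the expert pairs would come from the model predictive control trajectories and the policy pairs from rollouts of the network being trained. When the discriminator cannot tell the two apart (D = 0.5 everywhere), the loss equals log 2.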

Key words: model predictive control, generative adversarial inverse reinforcement learning, imitation learning, network training, neural network

CLC Number: V448.234

Chenglei YUE, Xuechuan WANG, Xiaokui YUE, Ting SONG. A spacecraft rendezvous and docking method based on inverse reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(19): 328420-328420.



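Group	State prediction horizon N_p	Control prediction horizon N_c
Parameter set 1	30	10
Parameter set 2	20	10
Parameter set 3	10	10
Parameter set 4	30	20
Parameter set 5	30	5
Parameter set 6	30	2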
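The relationship between the two horizons in the table above can be sketched as follows. This is a minimal illustration, not the authors' controller: it assumes the common model predictive control convention that the optimizer chooses N_c free control moves and the last move is held for the remaining N_p - N_c prediction steps, and it uses a 1-D double integrator as a stand-in for the spacecraft dynamics. The names `expand_controls`, `predict`, and `PARAMETER_SETS` are hypothetical.

```python
# (N_p, N_c) pairs copied from the table.
PARAMETER_SETS = {1: (30, 10), 2: (20, 10), 3: (10, 10),
                  4: (30, 20), 5: (30, 5), 6: (30, 2)}


def expand_controls(moves, n_p):
    """Extend N_c free moves to N_p steps by holding the last move (assumed convention)."""
    n_c = len(moves)
    if not 1 <= n_c <= n_p:
        raise ValueError("need 1 <= N_c <= N_p")
    return list(moves) + [moves[-1]] * (n_p - n_c)


def predict(x0, v0, moves, n_p, dt=1.0):
    """Roll a 1-D double integrator forward over the state prediction horizon."""
    x, v = x0, v0
    trajectory = []
    for u in expand_controls(moves, n_p):
        x += v * dt + 0.5 * u * dt * dt  # position update under constant acceleration u
        v += u * dt                      # velocity update
        trajectory.append((x, v))
    return trajectory
```

Under this convention, a smaller N_c leaves fewer decision variables for the optimizer at each step while N_p still sets how far ahead the state is predicted, which is the trade-off the six parameter sets appear to explore.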

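Hyperparameter	Value
Discount factor	0.95
Entropy scaling factor	10^-2
Trajectories collected per iteration	30
Generator epochs	25
Network learning rate	10^-4
GAE advantage estimator factor	0.95
Value scaling factor	10^-4
Mini-batch size	32
Clipping value for the new-to-old policy ratio	0.2
Discriminator epochs	3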
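Three of the values in the table above (discount factor 0.95, GAE factor 0.95, ratio clipping value 0.2) enter the generator update directly. The sketch below shows how they would be used, assuming the generator is trained with generalized advantage estimation and a clipped policy-ratio objective in the style of proximal policy optimization; the function names `gae` and `clipped_surrogate` are hypothetical, and the rewards would in practice come from the learned discriminator.

```python
GAMMA = 0.95   # discount factor, from the table
LAMBDA = 0.95  # GAE advantage estimator factor, from the table
CLIP = 0.2     # clipping value for the new-to-old policy ratio, from the table


def gae(rewards, values, last_value=0.0, gamma=GAMMA, lam=LAMBDA):
    """Generalized advantage estimates for one trajectory, computed backwards in time."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, clip=CLIP):
    """Per-sample clipped objective: min(r * A, clip(r, 1 - clip, 1 + clip) * A)."""
    clipped = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped * advantage)
```

With a clipping value of 0.2, a sample whose probability ratio has already moved beyond the interval [0.8, 1.2] in the direction favored by its advantage contributes no further gradient, which limits how far each of the 25 generator epochs can move the policy.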
