A spacecraft rendezvous and docking method based on inverse reinforcement learning

doi:10.7527/S1000-6893.2023.28420

Electronics, Electrical Engineering and Control

Chenglei YUE¹,², Xuechuan WANG¹,², Xiaokui YUE¹,², Ting SONG³,⁴

1. National Key Laboratory of Aerospace Flight Dynamics, Northwestern Polytechnical University, Xi'an 710072, China
2. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China
3. Shanghai Aerospace Control Technology Institute, Shanghai 201109, China
4. Shanghai Key Laboratory of Space Intelligent Control Technology, Shanghai 201109, China

Received: 2022-12-22 Revised: 2023-01-18 Accepted: 2023-05-24 Online: 2023-06-02 Published: 2023-10-15
Contact: Xuechuan WANG E-mail: xcwang@nwpu.edu.cn
Supported by: National Natural Science Foundation of China (U2013206)

Abstract:

To train a neural network that guides a chaser spacecraft toward a static target, a method is proposed that uses model predictive control (MPC) to generate the expert dataset and generative adversarial inverse reinforcement learning to train the network. First, the dynamics of the chaser spacecraft approaching a static target are established under a maximum velocity constraint, a control input saturation constraint and a space cone constraint, and MPC is used to drive the chaser to the specified position. Second, disturbances are added to the nominal trajectory, trajectories from each starting position to the target are computed with the same MPC method, and the state and control at every control instant of every trajectory are collected to form a training set of state-control pairs. Finally, the network structure and parameters and the training hyperparameters are set, and the network is trained on this training set with generative adversarial inverse reinforcement learning. Simulation results show that generative adversarial inverse reinforcement learning can imitate the expert trajectories and successfully trains a neural network that drives the spacecraft from its starting point to the target position.

Key words: model predictive control, generative adversarial inverse reinforcement learning, imitation learning, network training, neural network
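The data-collection step described in the abstract (perturb the starting point, roll out the expert controller, record every state and control) can be sketched roughly as follows. This is only an illustration under stated assumptions: the double-integrator dynamics, the saturated PD law standing in for the MPC expert, and all names and numeric values (`step`, `pd_expert`, `DT`, `U_MAX`, the gains, the nominal start) are hypothetical and are not taken from the paper.

```python
import numpy as np

DT = 1.0      # control period in seconds (assumed value)
U_MAX = 0.1   # input saturation limit (assumed value)

def step(x, u):
    """Hypothetical stand-in dynamics: 3-D double integrator, x = [r, v]."""
    r, v = x[:3], x[3:]
    return np.concatenate([r + DT * v + 0.5 * DT**2 * u, v + DT * u])

def pd_expert(x):
    """Saturated PD law used here only as a placeholder for the MPC expert."""
    return np.clip(-0.05 * x[:3] - 0.4 * x[3:], -U_MAX, U_MAX)

def collect_expert_dataset(expert, dynamics, nominal_start,
                           n_traj=20, horizon=50, noise_std=1.0, seed=0):
    """Roll out the expert from perturbed starts and stack (state, control) pairs."""
    rng = np.random.default_rng(seed)
    states, controls = [], []
    for _ in range(n_traj):
        x = nominal_start.copy()
        x[:3] += noise_std * rng.standard_normal(3)   # perturb the start position
        for _ in range(horizon):
            u = expert(x)
            states.append(x.copy())                   # state at this control instant
            controls.append(u)                        # control chosen by the expert
            x = dynamics(x, u)
    return np.array(states), np.array(controls)

nominal_start = np.array([100.0, 0.0, 0.0, 0.0, 0.0, 0.0])
states, controls = collect_expert_dataset(pd_expert, step, nominal_start)
```

With the defaults above, `states` has one row per control instant (20 trajectories of 50 steps each) and `controls` holds the matching expert commands, which is the state-control pairing that an imitation or inverse reinforcement learning method consumes.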

CLC number: V448.234

Chenglei YUE, Xuechuan WANG, Xiaokui YUE, Ting SONG. A spacecraft rendezvous and docking method based on inverse reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(19): 328420-328420.

Figures and tables: 12 (Figures 1 to 10, Tables 1 and 2)

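Group	State prediction horizon N_p (steps)	Control prediction horizon N_c (steps)
Parameter set 1	30	10
Parameter set 2	20	10
Parameter set 3	10	10
Parameter set 4	30	20
Parameter set 5	30	5
Parameter set 6	30	2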
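To make the roles of N_p and N_c concrete, the sketch below solves a small unconstrained linear MPC problem in which the input is free for the first N_c steps and held at its last free value until step N_p. This reading of the two horizons is a common convention and is assumed here, not confirmed by the source. The 1-D double-integrator model, the weights `q` and `r`, the limit `u_max`, and the function names are likewise hypothetical. The clipping in `rollout` is a crude stand-in for input saturation, which a constrained MPC would handle inside the optimization.

```python
import numpy as np

def mpc_first_input(x0, A, B, n_p, n_c, q=1.0, r=0.1):
    """Minimise q*sum||x_k||^2 + r*||U||^2 over n_p steps; return the first input."""
    n, m = B.shape
    powers = [np.eye(n)]
    for _ in range(n_p):
        powers.append(A @ powers[-1])        # powers[k] == A^k
    F = np.vstack(powers[1:])                # free response: x_{k+1} = A^{k+1} x0
    G = np.zeros((n_p * n, n_c * m))         # forced response of the n_c free inputs
    for k in range(n_p):                     # block row k predicts x_{k+1}
        for j in range(k + 1):               # input applied at step j
            col = min(j, n_c - 1)            # steps beyond n_c reuse the last free input
            G[k*n:(k+1)*n, col*m:(col+1)*m] += powers[k - j] @ B
    # Stack the state and input penalties into one least-squares problem.
    lhs = np.vstack([np.sqrt(q) * G, np.sqrt(r) * np.eye(n_c * m)])
    rhs = np.concatenate([-np.sqrt(q) * (F @ x0), np.zeros(n_c * m)])
    U, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return U[:m]

DT = 1.0
A = np.array([[1.0, DT], [0.0, 1.0]])        # hypothetical 1-D double integrator
B = np.array([[0.5 * DT**2], [DT]])
HORIZONS = [(30, 10), (20, 10), (10, 10), (30, 20), (30, 5), (30, 2)]  # (N_p, N_c) rows of the table

def rollout(n_p, n_c, steps=40, u_max=0.5):
    """Receding-horizon loop: apply only the first input, then re-solve."""
    x = np.array([10.0, 0.0])
    for _ in range(steps):
        u = np.clip(mpc_first_input(x, A, B, n_p, n_c), -u_max, u_max)
        x = A @ x + B @ u
    return x
```

Calling `rollout(n_p, n_c)` for each pair in `HORIZONS` gives a quick way to compare how the six parameter sets trade look-ahead length against the number of free decision variables, since the least-squares problem has `n_c` unknown inputs regardless of `n_p`.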

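Hyperparameter	Value
Discount factor	0.95
Entropy scaling factor	10^-2
Trajectories collected per iteration	30
Generator epochs	25
Network learning rate	10^-4
GAE advantage estimator factor	0.95
Value scaling factor	10^-4
Mini-batch size	32
Clipping value for the new/old policy ratio	0.2
Discriminator epochs	3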
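Several entries in this table (the GAE factor and the clipping value for the new/old policy ratio) suggest a PPO-style update for the generator policy, although the table itself does not name the algorithm. Under that assumption, the sketch below shows where three of the listed values would enter: the discount factor and GAE factor in the advantage estimate, and the clipping value in the surrogate objective. The function names are illustrative only.

```python
import numpy as np

GAMMA = 0.95       # discount factor (from the table)
GAE_FACTOR = 0.95  # GAE advantage estimator factor (from the table)
CLIP = 0.2         # clipping value for the new/old policy ratio (from the table)

def gae_advantages(rewards, values, last_value=0.0,
                   gamma=GAMMA, lam=GAE_FACTOR):
    """Generalized advantage estimation over one finite trajectory."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.append(np.asarray(values, dtype=float), last_value)
    adv = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running                 # discounted sum of residuals
        adv[t] = running
    return adv

def clipped_surrogate(ratio, adv, clip=CLIP):
    """PPO clipped objective (to be maximised), averaged over samples."""
    ratio = np.asarray(ratio, dtype=float)
    adv = np.asarray(adv, dtype=float)
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * adv
    return np.mean(np.minimum(ratio * adv, clipped))
```

In a generative adversarial setup the `rewards` passed to `gae_advantages` would come from the discriminator rather than from a hand-written reward function. The remaining table entries (epoch counts, mini-batch size, learning rate, entropy and value scaling) would configure the optimizer loop around these two functions.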
