Acta Aeronautica et Astronautica Sinica ›› 2025, Vol. 46 ›› Issue (3): 630553. DOI: 10.7527/S1000-6893.2024.30553
• Special Topic: Deep Space Optoelectronic Measurement and Intelligent Awareness Technology •
Min YANG, Guanjun LIU, Ziyuan ZHOU
Received: 2024-04-19
Revised: 2024-05-07
Accepted: 2024-07-24
Online: 2024-08-21
Published: 2024-08-20
Contact: Guanjun LIU, E-mail: liuguanjun@tongji.edu.cn
Min YANG, Guanjun LIU, Ziyuan ZHOU. Control of lunar landers based on secure reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(3): 630553.
[1] SMIRNOV N N. Safety in space[J]. Acta Astronautica, 2023, 204: 679-681.
[2] TIPALDI M, IERVOLINO R, MASSENIO P R. Reinforcement learning in spacecraft control applications: Advances, prospects, and challenges[J]. Annual Reviews in Control, 2022, 54: 1-23.
[3] LORENZ R D. Planetary landings with terrain sensing and hazard avoidance: A review[J]. Advances in Space Research, 2023, 71(1): 1-15.
[4] XIA Y Q, CHEN R F, PU F, et al. Active disturbance rejection control for drag tracking in Mars entry guidance[J]. Advances in Space Research, 2014, 53(5): 853-861.
[5] DAI J, XIA Y Q. Mars atmospheric entry guidance for reference trajectory tracking[J]. Aerospace Science and Technology, 2015, 45: 335-345.
[6] LONG J T, ZHU S Y, CUI P Y, et al. Barrier Lyapunov function based sliding mode control for Mars atmospheric entry trajectory tracking with input saturation constraint[J]. Aerospace Science and Technology, 2020, 106: 106213.
[7] SHEN G H, XIA Y Q, ZHANG J H, et al. Adaptive fixed-time trajectory tracking control for Mars entry vehicle[J]. Nonlinear Dynamics, 2020, 102(4): 2687-2698.
[8] DANG Q Q, GUI H C, LIU K, et al. Relaxed-constraint pinpoint lunar landing using geometric mechanics and model predictive control[J]. Journal of Guidance, Control, and Dynamics, 2020, 43(9): 1617-1630.
[9] DENG Y S, XIA Y Q, SUN Z Q, et al. Autonomous trajectory planning method for Mars precise landing in disturbed environment[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(11): 524834 (in Chinese).
[10] KHALID A, JAFFERY M H, JAVED M Y, et al. Performance analysis of Mars-powered descent-based landing in a constrained optimization control framework[J]. Energies, 2021, 14(24): 8493.
[11] YUAN X, ZHU S Y, YU Z S, et al. Hazard avoidance guidance for planetary landing using a dynamic safety margin index[C]∥2018 IEEE Aerospace Conference. Piscataway: IEEE Press, 2018: 1-11.
[12] D'AMBROSIO A, CARBONE A, SPILLER D, et al. PSO-based soft lunar landing with hazard avoidance: Analysis and experimentation[J]. Aerospace, 2021, 8(7): 195.
[13] SHAKYA A K, PILLAI G, CHAKRABARTY S. Reinforcement learning algorithms: A brief survey[J]. Expert Systems with Applications, 2023, 231: 120495.
[14] ZHOU Z Y, LIU G J, TANG Y. Multi-agent reinforcement learning: Methods, applications, visionary prospects, and challenges[DB/OL]. arXiv preprint: 2305.10091, 2023.
[15] GAO X Z, TANG L, HUANG H. Deep reinforcement learning in autonomous manipulation for celestial bodies exploration: Applications and challenges[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(6): 026762 (in Chinese).
[16] MOHOLKAR U R, PATIL D D. Comprehensive survey on agent based deep learning techniques for space landing missions[J]. International Journal of Intelligent Systems and Applications in Engineering, 2024, 12(16S): 188-200.
[17] CHENG L, WANG Z B, JIANG F H. Real-time control for fuel-optimal Moon landing based on an interactive deep reinforcement learning algorithm[J]. Astrodynamics, 2019, 3(4): 375-386.
[18] HARRIS A, VALADE T, TEIL T, et al. Generation of spacecraft operations procedures using deep reinforcement learning[J]. Journal of Spacecraft and Rockets, 2022, 59(2): 611-626.
[19] MALI R, KANDE N, MANDWADE S, et al. Lunar lander using reinforcement learning algorithm[C]∥2023 7th International Conference on Computing, Communication, Control and Automation (ICCUBEA). Piscataway: IEEE Press, 2023: 1-5.
[20] DHARRAO D, GITE S, WALAMBE R. Guided cost learning for lunar lander environment using human demonstrated expert trajectories[C]∥2023 International Conference on Advances in Intelligent Computing and Applications (AICAPS). Piscataway: IEEE Press, 2023: 1-6.
[21] SHEN D L. Comparison of three deep reinforcement learning algorithms for solving the lunar lander problem[M]∥Advances in Intelligent Systems Research. Dordrecht: Atlantis Press International BV, 2024: 187-199.
[22] GU S D, YANG L, DU Y L, et al. A review of safe reinforcement learning: Methods, theory and applications[DB/OL]. arXiv preprint: 2205.10330, 2022.
[23] CHEN W Q, SUBRAMANIAN D, PATERNAIN S. Probabilistic constraint for safety-critical reinforcement learning[J]. IEEE Transactions on Automatic Control, 2024, 69(10): 6789-6804.
[24] SELIM M, ALANWAR A, EL-KHARASHI M W, et al. Safe reinforcement learning using data-driven predictive control[C]∥2022 5th International Conference on Communications, Signal Processing, and their Applications (ICCSPA). Piscataway: IEEE Press, 2022: 1-6.
[25] BRUNKE L, GREEFF M, HALL A W, et al. Safe learning in robotics: From learning-based control to safe reinforcement learning[J]. Annual Review of Control, Robotics, and Autonomous Systems, 2022, 5: 411-444.
[26] JIN P, TIAN J X, ZHI D P, et al. Trainify: A CEGAR-driven training and verification framework for safe deep reinforcement learning[C]∥International Conference on Computer Aided Verification. Cham: Springer, 2022: 193-218.
[27] ZHI D P, WANG P X, CHEN C, et al. Robustness verification of deep reinforcement learning based control systems using reward martingales[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(18): 19992-20000.
[28] TAPPLER M, CÓRDOBA F C, AICHERNIG B K, et al. Search-based testing of reinforcement learning[DB/OL]. arXiv preprint: 2205.04887, 2022.
[29] TAPPLER M, PFERSCHER A, AICHERNIG B K, et al. Learning and repair of deep reinforcement learning policies from fuzz-testing data[C]∥Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. New York: ACM, 2024: 1-13.
[30] WANG H N, LIU N, ZHANG Y Y, et al. Deep reinforcement learning: A survey[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(12): 1726-1744.
[31] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[32] WANG Z Y, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning[C]∥Proceedings of the 33rd International Conference on Machine Learning. New York: ACM, 2016, 48: 1995-2003.
[33] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2016, 30(1): 2094-2100.
[34] BROCKMAN G, CHEUNG V, PETTERSSON L, et al. OpenAI Gym[DB/OL]. arXiv preprint: 1606.01540, 2016.
[35] GUO S Q, YAN Q, SU X, et al. State-temporal compression in reinforcement learning with the reward-restricted geodesic metric[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(9): 5572-5589.
[36] JIN P, WANG Y, ZHANG M. Efficient LTL model checking of deep reinforcement learning systems using policy extraction[C]∥The 34th International Conference on Software Engineering and Knowledge Engineering. San Francisco: KSI Research Inc., 2022: 357-362.
[37] KORKMAZ E. Adversarial robust deep reinforcement learning requires redefining robustness[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(7): 8369-8377.
[1] Kaifang WAN, Zhilin WU, Yunhui WU, Haozhi QIANG, Yibo WU, Bo LI. Cooperative location of multiple UAVs with deep reinforcement learning in GPS-denied environment[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(8): 331024.
[2] Lingfeng JIANG, Xinkai LI, Hai ZHANG, Hanwei LI, Hongli ZHANG. Mapless navigation of UAVs in dynamic environments based on an improved TD3 algorithm[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(8): 331035.
[3] Honglin ZHANG, Jianjun LUO, Weihua MA. Spacecraft game decision making for threat avoidance of space targets based on machine learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(8): 329136.
[4] Yunpeng CAI, Dapeng ZHOU, Jiangchuan DING. Intelligent collaborative control of UAV swarms with collision avoidance safety constraints[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(5): 529683.
[5] Shengzhe SHAN, Weiwei ZHANG. Air combat intelligent decision-making method based on self-play and deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(4): 328723.
[6] Bing GAO, Zhejie ZHANG, Qijie ZOU, Zhiguo LIU, Xiling ZHAO. Multi-agent communication cooperation based on deep reinforcement learning and information theory[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(18): 329862.
[7] Zuolong LI, Jihong ZHU, Minchi KUANG, Jie ZHANG, Jie REN. Hierarchical decision algorithm for air combat with hybrid action based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(17): 530053.
[8] Tiancai WU, Honglun WANG, Bin REN, Yiheng LIU, Xingyu WU, Guocheng YAN. Learning-based integrated fault-tolerant guidance and control for hypersonic vehicles considering avoidance and penetration[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(15): 329607.
[9] Xuejian WANG, Yongming WEN, Xiaorong SHI, Ningning ZHANG, Jiexi LIU. Design of hybrid intelligent decision framework for multi-agent and multi-coupling tasks[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(S2): 729770.
[10] Chao YANG, Kaifu ZHANG. Stress prediction of fuselage tube section based on PSO-BiLSTM neural network[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(7): 426991.
[11] Xizhen GAO, Liang TANG, Huang HUANG. Deep reinforcement learning in autonomous manipulation for celestial bodies exploration: Applications and challenges[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(6): 026762.
[12] Pan ZHOU, Jiangtao HUANG, Sheng ZHANG, Gang LIU, Bowen SHU, Jigang TANG. Intelligent air combat decision making and simulation based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(4): 126731.
[13] Xiangwei ZHU, Dan SHEN, Kai XIAO, Yuexin MA, Xiang LIAO, Fuqiang GU, Fangwen YU, Kefu GAO, Jingnan LIU. Mechanisms, algorithms, implementation and perspectives of brain-inspired navigation[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(19): 028569.
[14] Lei DONG, Hongbing CHEN, Xi CHEN, Changxiao ZHAO. Distributed multi-agent coalition task allocation strategy for single pilot operation mode based on DQN[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(13): 327895.
[15] Wenxue CHEN, Changsheng GAO, Wuxing JING. Trust region policy optimization guidance algorithm for intercepting maneuvering target[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(11): 327596.
Address: No. 238, Baiyan Building, Beisihuan Zhonglu Road, Haidian District, Beijing, China
Postal code: 100083
E-mail: hkxb@buaa.edu.cn
All copyright © editorial office of Chinese Journal of Aeronautics