
Acta Aeronautica et Astronautica Sinica ›› 2025, Vol. 46 ›› Issue (3): 630553. doi: 10.7527/S1000-6893.2024.30553

• Special Topic: Deep Space Optoelectronic Measurement and Intelligent Awareness Technology •

Control of lunar landers based on safe reinforcement learning

Min YANG, Guanjun LIU, Ziyuan ZHOU

  1. Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
  • Received: 2024-04-19 Revised: 2024-05-07 Accepted: 2024-07-24 Online: 2024-08-21 Published: 2024-08-20
  • Contact: Guanjun LIU E-mail: liuguanjun@tongji.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62172299); Space Optoelectronic Measurement and Perception Lab., Beijing Institute of Control Engineering (LabSOMP-2023-03); The Fundamental Research Funds for the Central Universities (2023-4-YB-05); Shanghai Technological Innovation Action Plan (22511105500)

Abstract:

In lunar landing missions, the lander must perform precise operations in extreme environments while often facing communication delays, which severely limit the real-time control capability of ground stations. To address these challenges, this study proposes a safety-enhancement Deep Reinforcement Learning (DRL) framework based on the Semi-Markov Decision Process (SMDP) to improve the operational safety of autonomous spacecraft landing. To compress the state space while preserving the key characteristics of the decision-making process, the framework compresses the Markov Decision Process (MDP) of historical trajectories into an SMDP and constructs an abstract SMDP state transition diagram from the compressed trajectories. It then identifies the key state-action pairs that carry potential risks and implements a real-time monitoring and intervention strategy, effectively improving the safety of the spacecraft's autonomous landing. Furthermore, a reverse breadth-first search is used to locate the state-action pairs that have a decisive impact on mission outcomes, and the model is adjusted in real time through the constructed state-action monitor. Experimental results show that, in a simulated environment, the framework increases the mission success rate of the lunar lander by up to 22% on pre-trained Deep Q-Network (DQN), Dueling DQN, and DDQN models, without adding sensors or significantly changing the existing system configuration. Measured against the preset safety evaluation criteria, the framework improves safety by up to 42%. In addition, simulation results in a virtual environment demonstrate the practical potential of this framework in complex space missions such as lunar landing, where it can effectively improve operational safety and efficiency.
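The abstract's pipeline — compressing MDP trajectories into SMDP segments, building an abstract state transition graph, and running a reverse breadth-first search from failure states to find risky state-action pairs — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the simple binning-based state abstraction, and the fixed search depth are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of the SMDP-abstraction pipeline described in the abstract.
# All names and the discretization scheme are illustrative assumptions.
from collections import defaultdict, deque

def abstract_state(state, bins=4):
    """Coarsen a continuous state into a discrete abstract state (assumed binning)."""
    return tuple(int(x * bins) for x in state)

def compress_to_smdp(trajectory, bins=4):
    """Compress an MDP trajectory into SMDP segments: consecutive steps mapping
    to the same abstract state are merged into one node with a duration."""
    segments = []
    for state, action in trajectory:
        a_state = abstract_state(state, bins)
        if segments and segments[-1][0] == a_state:
            segments[-1][2] += 1          # extend the current segment's duration
        else:
            segments.append([a_state, action, 1])
    return segments

def build_transition_graph(compressed_trajectories):
    """Abstract SMDP state transition graph over all compressed trajectories;
    a reverse-edge map is kept for the backward search."""
    graph = defaultdict(set)              # (abstract state, action) -> next states
    reverse = defaultdict(set)            # next state -> (abstract state, action)
    for segs in compressed_trajectories:
        for (s, a, _), (s2, _, _) in zip(segs, segs[1:]):
            graph[(s, a)].add(s2)
            reverse[s2].add((s, a))
    return graph, reverse

def find_critical_pairs(reverse, failure_states, depth=3):
    """Reverse breadth-first search from failure states, collecting state-action
    pairs that can reach a failure within `depth` abstract transitions."""
    critical, seen = set(), set(failure_states)
    frontier = deque((s, 0) for s in failure_states)
    while frontier:
        s, d = frontier.popleft()
        if d >= depth:
            continue
        for (prev_s, a) in reverse.get(s, ()):
            critical.add((prev_s, a))     # a monitor would intervene on these pairs
            if prev_s not in seen:
                seen.add(prev_s)
                frontier.append((prev_s, d + 1))
    return critical
```

At run time, a state-action monitor built on `find_critical_pairs` would check each proposed action of the pre-trained policy against the critical set and override it when a flagged pair is about to be executed.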

Key words: deep reinforcement learning, autonomous landing, abstract SMDP state transition diagram, safety enhancement, real-time monitoring, reverse breadth-first search
