ACTA AERONAUTICA ET ASTRONAUTICA SINICA ›› 2021, Vol. 42 ›› Issue (1): 524151-524151. doi: 10.7527/S1000-6893.2020.24151

• Dissertation

Trajectory planning of space manipulator based on multi-agent reinforcement learning

ZHAO Yu, GUAN Gongshun, GUO Jifeng, YU Xiaoqiang, YAN Peng   

  1. School of Astronautics, Harbin Institute of Technology, Harbin 150001, China
  • Received: 2020-04-28 Revised: 2020-05-21 Published: 2020-07-10
  • Supported by:
    National Natural Science Foundation of China (61973101); Aeronautical Science Foundation of China (20180577005)

Abstract: An online self-learning trajectory planning method based on deep reinforcement learning is studied for a six-Degree-of-Freedom (DOF) floating space manipulator capturing moving objects. The Denavit-Hartenberg (DH) model of the manipulator is presented, and multi-rigid-body kinematic and dynamic models are established that account for the mechanical coupling characteristics of the combined system. An improved Deep Deterministic Policy Gradient (DDPG) algorithm is then proposed, and a multi-agent self-learning system is established with each joint acting as a decision-making agent. A training model of the space manipulator is built on the principle of "offline centralized learning and online distributed execution", with a reward function constructed from the target relative distance and the total operation time. Simulation results show that the manipulator captures the moving target rapidly, with an average time of 5.4 s. Compared with the traditional planning algorithm based on random sampling, the proposed autonomous decision-making motion planning method exhibits better solution speed and robustness.
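
The abstract names only the two variables of the reward function (target relative distance and total operation time), not its form. As a minimal illustrative sketch, assuming a simple weighted-penalty form, a per-step reward could look like the following. The function name `capture_reward`, the weights `w_dist` and `w_time`, and the `capture_radius` and `capture_bonus` values are all hypothetical and are not taken from the paper.

```python
import math


def capture_reward(effector_pos, target_pos, elapsed_time,
                   w_dist=1.0, w_time=0.1,
                   capture_radius=0.05, capture_bonus=10.0):
    """Hypothetical reward: penalize distance to the target and elapsed time.

    All weights and thresholds are illustrative assumptions, not values
    reported in the paper.
    """
    # Euclidean distance between the end effector and the moving target.
    distance = math.dist(effector_pos, target_pos)

    # Larger distance and longer operation time both lower the reward.
    reward = -w_dist * distance - w_time * elapsed_time

    # Assumed terminal bonus once the end effector is within capture range.
    if distance <= capture_radius:
        reward += capture_bonus
    return reward
```

Under these assumptions, a call such as `capture_reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), 2.0)` penalizes a 5 m separation and 2 s of elapsed time, so an agent maximizing the reward is pushed to close the distance quickly.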

Key words: manipulators, trajectory planning, multi-agent, policy gradient, on-orbit capture

CLC Number: