An online self-learning trajectory planning method based on deep reinforcement learning is studied for a six-degree-of-freedom (DOF) floating space manipulator capturing moving objects. The Denavit-Hartenberg (DH) model of the manipulator is presented, and multi-rigid-body kinematic and dynamic models are established that account for the mechanical coupling of the combined system. An improved deep deterministic policy gradient (DDPG) algorithm is then proposed, and a multi-agent self-learning system is established in which each joint acts as a decision-making agent. A training model of the space manipulator is built on the principle of "offline centralized learning and online distributed execution", with a reward function constructed from the relative distance to the target and the total operation time. Simulation results show that the manipulator captures the moving target rapidly, with an average time of 5.4 s. Compared with traditional planning algorithms based on random sampling, the proposed autonomous decision-making motion planning method exhibits better solution speed and robustness.
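As an illustration of a reward built from the two variables named in the abstract (relative distance to the target and total operation time), the following is a minimal sketch only. The function name `capture_reward`, the linear form, the weights `w_dist` and `w_time`, the `capture_radius` threshold and the terminal `bonus` are all assumptions made for this example; the abstract does not give the actual reward expression used in the paper.

```python
import math


def capture_reward(effector_pos, target_pos, elapsed_s,
                   w_dist=1.0, w_time=0.1,
                   capture_radius=0.05, bonus=10.0):
    """Hypothetical shaped reward for a capture task.

    All weights and thresholds are illustrative assumptions,
    not values taken from the paper.
    """
    # Relative distance between end effector and target (Euclidean, metres).
    distance = math.dist(effector_pos, target_pos)

    # Penalize both remaining distance and elapsed operation time.
    reward = -w_dist * distance - w_time * elapsed_s

    # Assumed terminal bonus once the effector is within the capture radius.
    captured = distance <= capture_radius
    if captured:
        reward += bonus

    return reward, captured


# Example: effector 5 m from the target after 2 s of operation.
reward, captured = capture_reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), 2.0)
print(reward, captured)
```

In a cooperative multi-agent setting such as the one described, where each joint is an agent, one plausible choice is for all agents to share a single scalar reward of this kind, since capture is a joint outcome; whether the paper does so is not stated in the abstract.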
ZHAO Yu, GUAN Gongshun, GUO Jifeng, YU Xiaoqiang, YAN Peng. Trajectory planning of space manipulator based on multi-agent reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(1): 524151-524151.
DOI: 10.7527/S1000-6893.2020.24151