ACTA AERONAUTICA ET ASTRONAUTICA SINICA
Deep reinforcement learning in autonomous manipulation for celestial bodies exploration: Applications and challenges
Received date: 2021-12-07
Revised date: 2022-01-06
Accepted date: 2022-03-24
Online published: 2022-03-30
Supported by
National Key Research and Development Program of China (2018AAA0102700)
Future celestial body exploration missions will place higher requirements on the autonomy of control systems, making intelligent control technology increasingly important. Based on the characteristics of manipulation tasks in celestial body exploration, this paper analyzes and summarizes the technical challenges of autonomous control. Existing Deep Reinforcement Learning (DRL) based autonomous manipulation algorithms are then reviewed. In view of the distinct difficulties that deep-learning-based manipulation faces in celestial body exploration, the achievements of DRL-based manipulation skills in these applications are discussed. Finally, future research directions for intelligent manipulation technologies are outlined.
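To make the class of algorithms surveyed here concrete, the Python sketch below shows a minimal deterministic actor-critic update of the DDPG family (cf. Refs. 22-23), the kind of off-policy DRL method commonly applied to continuous-control manipulation. It is an illustrative sketch only: the observation and action dimensions, network sizes, and hyperparameters are placeholder assumptions, not values taken from the paper or the surveyed works.

import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 12, 4     # hypothetical arm-state / joint-command sizes
GAMMA, TAU = 0.99, 0.005     # discount factor and target-network update rate

def mlp(in_dim, out_dim):
    """Small two-layer network used for both actor and critic."""
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

actor, critic = mlp(OBS_DIM, ACT_DIM), mlp(OBS_DIM + ACT_DIM, 1)
actor_targ, critic_targ = mlp(OBS_DIM, ACT_DIM), mlp(OBS_DIM + ACT_DIM, 1)
actor_targ.load_state_dict(actor.state_dict())
critic_targ.load_state_dict(critic.state_dict())
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(s, a, r, s2, done):
    """One off-policy update from a replay-buffer minibatch (s, a, r, s', done)."""
    with torch.no_grad():  # TD target uses the slowly-moving target networks
        a2 = torch.tanh(actor_targ(s2))        # bounded next action
        y = r + GAMMA * (1 - done) * critic_targ(torch.cat([s2, a2], dim=-1))
    critic_loss = ((critic(torch.cat([s, a], dim=-1)) - y) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor ascends the critic's value of its own (bounded) actions
    actor_loss = -critic(torch.cat([s, torch.tanh(actor(s))], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    with torch.no_grad():  # Polyak averaging of the target networks
        for net, targ in ((actor, actor_targ), (critic, critic_targ)):
            for p, pt in zip(net.parameters(), targ.parameters()):
                pt.mul_(1 - TAU).add_(TAU * p)

if __name__ == "__main__":   # smoke test on random data
    B = 64
    update(torch.randn(B, OBS_DIM), torch.rand(B, ACT_DIM) * 2 - 1,
           torch.randn(B, 1), torch.randn(B, OBS_DIM), torch.zeros(B, 1))

In a sampling-manipulation setting, s would encode quantities such as arm joint states and a target pose, and the reward would score task progress; such details are mission-specific and outside the scope of this sketch.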
Xizhen GAO, Liang TANG, Huang HUANG. Deep reinforcement learning in autonomous manipulation for celestial bodies exploration: Applications and challenges[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2023, 44(6): 026762-026762. DOI: 10.7527/S1000-6893.2022.26762
[1] GE D T, CUI P Y, ZHU S Y. Recent development of autonomous GNC technologies for small celestial body descent and landing[J]. Progress in Aerospace Sciences, 2019, 110: 100551.
[2] FRANCIS R, ESTLIN T, DORAN G, et al. AEGIS autonomous targeting for ChemCam on Mars Science Laboratory: Deployment and results of initial science team use[J]. Science Robotics, 2017, 2(7): eaan4582.
[3] TREBI-OLLENNU A, KIM W, ALI K, et al. InSight Mars lander robotics instrument deployment system[J]. Space Science Reviews, 2018, 214(5): 93.
[4] ZHANG H H, LIANG J, HUANG X Y, et al. Autonomous hazard avoidance control for Chang'e-3 soft landing[J]. Scientia Sinica (Technologica), 2014, 44(6): 559-568 (in Chinese).
[5] REN D P, LI Q, ZHANG Z F, et al. Ground-test validation technologies for Chang'e-5 lunar probe[J]. Scientia Sinica (Technologica), 2021, 51(7): 778-787 (in Chinese).
[6] YU D Y, ZHANG Z, PAN B F, et al. Development and trend of artificial intelligence in deep space exploration[J]. Journal of Deep Space Exploration, 2020, 7(1): 11-23 (in Chinese).
[7] ROTHROCK B, KENNEDY R, CUNNINGHAM C, et al. SPOC: Deep learning-based terrain classification for Mars rover missions: AIAA-2016-5539[R]. Reston: AIAA, 2016.
[8] WU W R, ZHOU J L, WANG B F, et al. Key technologies in the teleoperation of Chang'e-3 "Jade Rabbit" rover[J]. Scientia Sinica (Informationis), 2014, 44(4): 425-440 (in Chinese).
[9] HU H, PEI Z Y, LI C L, et al. Overall design of the unmanned lunar sampling and return project: Chang'e-5 mission[J]. Scientia Sinica (Technologica), 2021, 51(11): 1275-1286 (in Chinese).
[10] TREBI-OLLENNU A, BAUMGARTNER E T, LEGER P C, et al. Robotic arm in situ operations for the Mars Exploration Rovers surface mission[C]∥2005 IEEE International Conference on Systems, Man and Cybernetics. Piscataway: IEEE Press, 2005: 1799-1806.
[11] BAUMGARTNER E T, BONITZ R G, MELKO J P, et al. The Mars Exploration Rover instrument positioning system[C]∥2005 IEEE Aerospace Conference. Piscataway: IEEE Press, 2005: 1-19.
[12] BONITZ R, SHIRAISHI L, ROBINSON M, et al. The Phoenix Mars lander robotic arm[C]∥2009 IEEE Aerospace Conference. Piscataway: IEEE Press, 2009: 1-12.
[13] BILLING P, FLEISCHNER C. Mars Science Laboratory robotic arm[C]∥14th European Space Mechanisms & Tribology Symposium (ESMATS 2011), 2011: 363-370.
[14] MOELLER R C, JANDURA L, ROSETTE K, et al. The Sampling and Caching Subsystem (SCS) for the scientific exploration of Jezero crater by the Mars 2020 Perseverance rover[J]. Space Science Reviews, 2021, 217(1): 1-43.
[15] MA R Q, JIANG S Q, LIU B, et al. Design and verification of a lunar sampling manipulator system[J]. Journal of Astronautics, 2018, 39(12): 1315-1322 (in Chinese).
[16] ROBINSON M, COLLINS C, LEGER P, et al. In-situ operations and planning for the Mars Science Laboratory robotic arm: The first 200 sols[C]∥2013 8th International Conference on System of Systems Engineering. Piscataway: IEEE Press, 2013: 153-158.
[17] CALLAS J L. Mars Exploration Rover Spirit end of mission report[R]. Pasadena: Jet Propulsion Laboratory, National Aeronautics and Space Administration, 2015.
[18] DLR. The InSight mission logbook (February 2019-July 2020)[EB/OL]. (2020-07-07) [2021-09-02].
[19] ABBEY W, ANDERSON R, BEEGLE L, et al. A look back, part II: The drilling campaign of the Curiosity rover during the Mars Science Laboratory's second and third Martian years[J]. Icarus, 2020, 350: 113885.
[20] NASA. Assessing Perseverance's first sample attempt[EB/OL]. (2021-08-11) [2021-09-02].
[21] SUN C Y, MU C X. Important scientific problems of multi-agent deep reinforcement learning[J]. Acta Automatica Sinica, 2020, 46(7): 1301-1312 (in Chinese).
[22] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[23] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]∥4th International Conference on Learning Representations (ICLR 2016). OpenReview.net, 2016: 1-14.
[24] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[DB/OL]. arXiv preprint: 1802.09477, 2018.
[25] POPOV I, HEESS N, LILLICRAP T, et al. Data-efficient deep reinforcement learning for dexterous manipulation[DB/OL]. arXiv preprint: 1704.03073, 2017.
[26] FUJIMOTO S, MEGER D, PRECUP D. Off-policy deep reinforcement learning without exploration[DB/OL]. arXiv preprint: 1812.02900, 2018.
[27] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor[DB/OL]. arXiv preprint: 1801.01290, 2018.
[28] SCHULMAN J. Optimizing expectations: From deep reinforcement learning to stochastic computation graphs[D]. Berkeley: UC Berkeley, 2016.
[29] SCHULMAN J, MORITZ P, LEVINE S, et al. High-dimensional continuous control using generalized advantage estimation[DB/OL]. arXiv preprint: 1506.02438, 2015.
[30] DUAN Y, CHEN X, HOUTHOOFT R, et al. Benchmarking deep reinforcement learning for continuous control[C]∥Proceedings of the 33rd International Conference on Machine Learning-Volume 48. New York: ACM, 2016: 1329-1338.
[31] ANTONOVA R, CRUCIANI S, SMITH C, et al. Reinforcement learning for pivoting task[DB/OL]. arXiv preprint: 1703.00472, 2017.
[32] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[DB/OL]. arXiv preprint: 1707.06347, 2017.
[33] HEESS N, TB D, SRIRAM S, et al. Emergence of locomotion behaviours in rich environments[DB/OL]. arXiv preprint: 1707.02286, 2017.
[34] PENG X B, ABBEEL P, LEVINE S, et al. DeepMimic: Example-guided deep reinforcement learning of physics-based character skills[DB/OL]. arXiv preprint: 1804.02717, 2018.
[35] TANG Y H, AGRAWAL S. Discretizing continuous action space for on-policy optimization[C]∥Proceedings of the AAAI Conference on Artificial Intelligence. Washington, D.C.: AAAI, 2020: 5981-5988.
[36] YIN M Z, YUE Y G, ZHOU M Y. ARSM: Augment-REINFORCE-swap-merge estimator for gradient backpropagation through categorical variables[DB/OL]. arXiv preprint: 1905.01413, 2019.
[37] YUE Y G, TANG Y H, YIN M Z, et al. Discrete action on-policy learning with action-value critic[DB/OL]. arXiv preprint: 2002.03534, 2020.
[38] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]∥Proceedings of the 33rd International Conference on Machine Learning-Volume 48. New York: ACM, 2016: 1928-1937.
[39] DEISENROTH M P, RASMUSSEN C E. PILCO: A model-based and data-efficient approach to policy search[C]∥Proceedings of the 28th International Conference on Machine Learning. New York: ACM, 2011: 465-472.
[40] OH J, GUO X X, LEE H, et al. Action-conditional video prediction using deep networks in Atari games[C]∥Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 2. New York: ACM, 2015: 2863-2871.
[41] WATTER M, SPRINGENBERG J T, BOEDECKER J, et al. Embed to control: A locally linear latent dynamics model for control from raw images[DB/OL]. arXiv preprint: 1506.07365, 2015.
[42] HA D, SCHMIDHUBER J. World models[DB/OL]. arXiv preprint: 1803.10122, 2018.
[43] LEVINE S, FINN C, DARRELL T, et al. End-to-end training of deep visuomotor policies[DB/OL]. arXiv preprint: 1504.00702, 2015.
[44] LIU N J, LU T, CAI Y H, et al. A review of robot manipulation skills learning methods[J]. Acta Automatica Sinica, 2019, 45(3): 458-470 (in Chinese).
[45] ZHAO D B, SHAO K, ZHU Y H, et al. Review of deep reinforcement learning and discussions on the development of computer Go[J]. Control Theory & Applications, 2016, 33(6): 701-717 (in Chinese).
[46] GAUDET B, FURFARO R. Adaptive pinpoint and fuel efficient Mars landing using reinforcement learning[J]. IEEE/CAA Journal of Automatica Sinica, 2014, 1(4): 397-411.
[47] CHENG L, WANG Z B, JIANG F H. Real-time control for fuel-optimal Moon landing based on an interactive deep reinforcement learning algorithm[J]. Astrodynamics, 2019, 3(4): 375-386.
[48] GAUDET B, LINARES R. Integrated guidance and control for pinpoint Mars landing using reinforcement learning: AAS 18-290[R]. Washington, D.C.: AAS, 2018.
[49] JIANG X Q. Integrated guidance for Mars entry and powered descent using reinforcement learning and pseudospectral method[J]. Acta Astronautica, 2019, 163: 114-129.
[50] GAUDET B, LINARES R, FURFARO R. Deep reinforcement learning for six degree-of-freedom planetary powered descent and landing[DB/OL]. arXiv preprint: 1810.08719, 2018.
[51] GAUDET B, LINARES R, FURFARO R. Deep reinforcement learning for six degree-of-freedom planetary landing[J]. Advances in Space Research, 2020, 65(7): 1723-1741.
[52] SHIROBOKOV M. Survey of machine learning techniques in spacecraft control design[J]. Acta Astronautica, 2021, 186: 87-97.
[53] HUANG X X, LI S, YANG B, et al. Spacecraft guidance and control based on artificial intelligence: Review[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(4): 524201 (in Chinese).
[54] FURFARO R, SCORSOGLIO A, LINARES R, et al. Adaptive generalized ZEM-ZEV feedback guidance for planetary landing via a deep reinforcement learning approach[DB/OL]. arXiv preprint: 2003.02182, 2020.
[55] FURFARO R. Adaptive generalized ZEM-ZEV feedback guidance for planetary landing via a deep reinforcement learning approach[J]. Acta Astronautica, 2020, 171: 156-171.
[56] FURFARO R, WIBBEN D R, GAUDET B, et al. Terminal multiple surface sliding guidance for planetary landing: Development, tuning and optimization via reinforcement learning[J]. The Journal of the Astronautical Sciences, 2015, 62(1): 73-99.
[57] IIYAMA K, TOMITA K, JAGATIA B A, et al. Deep reinforcement learning for safe landing site selection with concurrent consideration of divert maneuvers[DB/OL]. arXiv preprint: 2102.12432, 2021.
[58] GAUDET B. Terminal adaptive guidance via reinforcement meta-learning: Applications to autonomous asteroid close-proximity operations[J]. Acta Astronautica, 2020, 171: 1-13.
[59] GAUDET B. Adaptive guidance and integrated navigation with reinforcement meta-learning[J]. Acta Astronautica, 2020, 169: 180-190.
[60] BATTIN R H. An introduction to the mathematics and methods of astrodynamics[M]. Reston: AIAA, 1999: 10-12.
[61] D'SOUZA C. An optimal guidance law for planetary landing: AIAA-1997-3709[R]. Reston: AIAA, 1997.
[62] ZHOU S Y, BAI C C. Research on planetary rover path planning method based on deep reinforcement learning[J]. Unmanned Systems Technology, 2019, 2(4): 38-45 (in Chinese).
[63] SERNA J G, VANEGAS F, GONZALEZ F, et al. A review of current approaches for UAV autonomous mission planning for Mars biosignatures detection[C]∥2020 IEEE Aerospace Conference. Piscataway: IEEE Press, 2020: 1-15.
[64] MCEWEN A S, ELIASON E M, BERGSTROM J W, et al. Mars Reconnaissance Orbiter's High Resolution Imaging Science Experiment (HiRISE)[J]. Journal of Geophysical Research, 2007, 112(E5): E05S02.
[65] TAVALLALI P, KARUMANCHI S, BOWKETT J, et al. A reinforcement learning framework for space missions in unknown environments[C]∥2020 IEEE Aerospace Conference. Piscataway: IEEE Press, 2020: 1-8.
[66] PFLUEGER M, AGHA A, SUKHATME G S. Rover-IRL: Inverse reinforcement learning with soft value iteration networks for planetary rover path planning[J]. IEEE Robotics and Automation Letters, 2019, 4(2): 1387-1394.
[67] HUANG Y X, WU S F, MU Z C, et al. A multi-agent reinforcement learning method for swarm robots in space collaborative exploration[C]∥2020 6th International Conference on Control, Automation and Robotics (ICCAR). Piscataway: IEEE Press, 2020: 139-144.
[68] WACHI A, SUI Y N. Safe reinforcement learning in constrained Markov decision processes[C]∥Proceedings of the 37th International Conference on Machine Learning. New York: ACM, 2020: 9797-9806.
[69] TURCHETTA M, BERKENKAMP F, KRAUSE A. Safe exploration in finite Markov decision processes with Gaussian processes[C]∥Proceedings of the 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 4312-4320.
[70] WACHI A, SUI Y N, YUE Y S, et al. Safe exploration and optimization of constrained MDPs using Gaussian processes[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 52-58.
[71] BERNSTEIN D S, ZILBERSTEIN S. Reinforcement learning for weakly-coupled MDPs and an application to planetary rover control[C]∥6th European Conference on Planning. Washington, D.C.: AAAI, 2014: 240-243.
[72] SONG X G, GAO H B, DING L, et al. Diagonal recurrent neural networks for parameters identification of terrain based on wheel-soil interaction analysis[J]. Neural Computing and Applications, 2017, 28(4): 797-804.
[73] ARREGUIN A L, MONTENEGRO S, DILGER E. Towards in-situ characterization of regolith strength by inverse terramechanics and machine learning: A survey and applications to planetary rovers[J]. Planetary and Space Science, 2021, 204: 105271.
[74] HUANG H, GAO X Z, TANG L, et al. An end-to-end intelligent grasping method for extraterrestrial exploration samples: CN113524173A[P]. 2021-10-22 (in Chinese).
[75] ZHANG H D, WU J H. Optimization of robotic bin packing via pushing based on deep reinforcement learning[J]. Aerospace Control and Application, 2021, 47(6): 52-58 (in Chinese).