[1] BAO W M. Present situation and development tendency of aerospace control techniques[J]. Acta Automatica Sinica, 2013, 39(6): 697-702 (in Chinese). 包为民. 航天飞行器控制技术研究现状与发展趋势[J]. 自动化学报, 2013, 39(6): 697-702.
[2] LU P. Entry guidance: A unified method[J]. Journal of Guidance, Control, and Dynamics, 2014, 37(3): 713-728.
[3] XUE S B, LU P. Constrained predictor-corrector entry guidance[J]. Journal of Guidance, Control, and Dynamics, 2010, 33(4): 1273-1281.
[4] SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. Cambridge: The MIT Press, 2011: 119-138.
[5] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[6] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. (2019-07-05)[2021-06-15]. https://arxiv.org/abs/1509.02971.
[7] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. (2017-08-28)[2021-06-15]. https://arxiv.org/abs/1707.06347.
[8] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor[EB/OL]. (2018-08-08)[2021-06-15]. https://arxiv.org/abs/1801.01290.
[9] CHENG L, JIANG F H, LI J F. A review on the applications of deep learning in aircraft dynamics and control[J]. Mechanics in Engineering, 2020, 42(3): 267-276 (in Chinese). 程林, 蒋方华, 李俊峰. 深度学习在飞行器动力学与控制中的应用研究综述[J]. 力学与实践, 2020, 42(3): 267-276.
[10] YU Y, WANG H L. Deep learning-based reentry predictor-corrector fault-tolerant guidance for hypersonic vehicles[J]. Acta Armamentarii, 2020, 41(4): 656-669 (in Chinese). 余跃, 王宏伦. 基于深度学习的高超声速飞行器再入预测校正容错制导[J]. 兵工学报, 2020, 41(4): 656-669.
[11] SHI Y, WANG Z B. A deep learning-based approach to real-time trajectory optimization for hypersonic vehicles[C]//AIAA Scitech 2020 Forum. Reston: AIAA, 2020.
[12] CHENG L, JIANG F H, WANG Z B, et al. Multiconstrained real-time entry guidance using deep neural networks[J]. IEEE Transactions on Aerospace and Electronic Systems, 2021, 57(1): 325-340.
[13] LI T R, YANG B, WANG R, et al. Reentry vehicle guidance method based on Q-learning algorithm[J]. Tactical Missile Technology, 2019(5): 44-49 (in Chinese). 李天任, 杨奔, 汪韧, 等. 基于Q-Learning算法的再入飞行器制导方法[J]. 战术导弹技术, 2019(5): 44-49.
[14] ZHANG Q H, AO B Q, ZHANG Q X. Reinforcement learning guidance law of Q-learning[J]. Systems Engineering and Electronics, 2020, 42(2): 414-419 (in Chinese). 张秦浩, 敖百强, 张秦雪. Q-learning强化学习制导律[J]. 系统工程与电子技术, 2020, 42(2): 414-419.
[15] GAUDET B, FURFARO R, LINARES R. Reinforcement learning for angle-only intercept guidance of maneuvering targets[J]. Aerospace Science and Technology, 2020, 99: 105746.
[16] HOVELL K, ULRICH S. Deep reinforcement learning for spacecraft proximity operations guidance[J]. Journal of Spacecraft and Rockets, 2021, 58(2): 254-264.
[17] HOVELL K, ULRICH S. On deep reinforcement learning for spacecraft guidance[C]//AIAA Scitech 2020 Forum. Reston: AIAA, 2020.
[18] GAO J S, SHI X M, CHENG Z T, et al. Reentry trajectory optimization based on deep reinforcement learning[C]//2019 Chinese Control and Decision Conference (CCDC). Piscataway: IEEE Press, 2019: 2588-2592.
[19] KOCH W, MANCUSO R, WEST R, et al. Reinforcement learning for UAV attitude control[J]. ACM Transactions on Cyber-Physical Systems, 2019, 3(2): 22.
[20] CHAI R Q, TSOURDOS A, SAVVARIS A, et al. Six-DOF spacecraft optimal trajectory planning and real-time attitude control: A deep neural network-based approach[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(11): 5005-5013.
[21] FANG K, ZHANG Q Z, NI K, et al. Time-coordinated reentry guidance law for hypersonic vehicle[J]. Acta Aeronautica et Astronautica Sinica, 2018, 39(5): 321958 (in Chinese). 方科, 张庆振, 倪昆, 等. 高超声速飞行器时间协同再入制导[J]. 航空学报, 2018, 39(5): 321958.
[22] ZHOU H Y, WANG X G, SHAN Y Z, et al. Synergistic path planning for multiple vehicles based on an improved particle swarm optimization method[J/OL]. Acta Automatica Sinica, (2020-04-07)[2021-06-15]. http://www.aas.net.cn/cn/article/doi/10.16383/j.aas.c190865 (in Chinese). 周宏宇, 王小刚, 单永志, 等. 基于改进粒子群算法的飞行器协同轨迹规划[J/OL]. 自动化学报, (2020-04-07)[2021-06-15]. http://www.aas.net.cn/cn/article/doi/10.16383/j.aas.c190865.
[23] SUTSKEVER I, MARTENS J, DAHL G, et al. On the importance of initialization and momentum in deep learning[C]//Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28. New York: ACM, 2013: III-1139.
[24] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks[C]//Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010: 249-256.
[25] SUTTON R S, MCALLESTER D, SINGH S, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Proceedings of the 12th International Conference on Neural Information Processing Systems. New York: ACM, 1999: 1057-1063.
[26] SCHULMAN J, LEVINE S, MORITZ P, et al. Trust region policy optimization[EB/OL]. (2017-04-20)[2021-06-15]. https://arxiv.org/abs/1502.05477.