Deep deterministic policy gradient algorithm for UAV control

doi:10.7527/S1000-6893.2020.24688


HUANG Xu^1,2, LIU Jiarun^1,2, JIA Chenhui^1,2, WANG Zhaolei^1,2, ZHANG Jun^1,2

1. Beijing Aerospace Automatic Control Institute, Beijing 100854, China;
2. National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing 100854, China

Received: 2020-08-31 Revised: 2020-09-04 Published: 2020-09-17
Corresponding author: LIU Jiarun E-mail: jiarunliu@163.com
Supported by:
National Natural Science Foundation of China (61773341)

Abstract

Abstract: This paper explores the use of the deep deterministic policy gradient (DDPG) algorithm to train an agent to learn the flight control policy of a small unmanned aerial vehicle (UAV). The velocity, position, and attitude angles over multiple data frames serve as the agent's observation state; the rudder deflection angles and the engine thrust command serve as the agent's output actions; and the nonlinear model of the UAV together with its flight environment serves as the agent's learning environment. While interacting with the environment, the agent receives not only dense penalties containing error information but also sparse rewards for achieving specific goals. This design effectively increases the diversity of the flight data samples and improves the agent's learning efficiency. The agent ultimately achieves end-to-end flight control, mapping position, velocity, and attitude angle information directly to control commands. Flight control simulations are further carried out with varied waypoints, model parameter deviations, injected disturbances, and faults. The results show that, besides completing the training task effectively, the agent can handle a variety of flight tasks it did not learn during training, demonstrating excellent generalization ability and robustness. The method therefore has research value and engineering reference value.

Key words: deep deterministic policy gradient, small UAV, flight control, end-to-end, sparse reward
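The observation and reward design summarized in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the frame count, error weights, goal tolerance, or bonus size, so `N_FRAMES`, `STATE_DIM`, `ERROR_WEIGHTS`, `GOAL_TOL`, `SPARSE_BONUS`, and the helper names `FrameStack`, `shaped_reward`, `critic_target`, and `soft_update` are all hypothetical. The last two helpers show the standard DDPG critic target y = r + γQ'(s', μ'(s')) and the soft (Polyak) target-network update from the usual DDPG formulation by Lillicrap et al., with network parameters represented as plain NumPy arrays.

```python
from collections import deque

import numpy as np

# Hypothetical settings for illustration only; the abstract does not give these.
N_FRAMES = 4           # number of stacked data frames in one observation
STATE_DIM = 9          # per frame: position (3), velocity (3), attitude angles (3)
GOAL_TOL = 1.0         # position-error norm below which the goal counts as reached
SPARSE_BONUS = 10.0    # bonus added on steps where the goal condition holds
ERROR_WEIGHTS = np.array([1.0, 0.1, 0.5])  # weights on position/velocity/attitude error norms


class FrameStack:
    """Keeps the most recent n_frames state vectors and concatenates them."""

    def __init__(self, n_frames: int = N_FRAMES):
        self.frames = deque(maxlen=n_frames)
        self.n_frames = n_frames

    def reset(self, state) -> np.ndarray:
        # Fill every slot with the initial state so the observation size is fixed.
        self.frames.clear()
        for _ in range(self.n_frames):
            self.frames.append(np.asarray(state, dtype=float))
        return self.observation()

    def push(self, state) -> np.ndarray:
        # The deque drops the oldest frame automatically once it is full.
        self.frames.append(np.asarray(state, dtype=float))
        return self.observation()

    def observation(self) -> np.ndarray:
        return np.concatenate(list(self.frames))


def shaped_reward(pos_err, vel_err, att_err) -> float:
    """Dense penalty from weighted error norms plus a sparse bonus near the goal."""
    norms = np.array([np.linalg.norm(pos_err),
                      np.linalg.norm(vel_err),
                      np.linalg.norm(att_err)])
    dense_penalty = -float(ERROR_WEIGHTS @ norms)
    sparse_bonus = SPARSE_BONUS if norms[0] < GOAL_TOL else 0.0
    return dense_penalty + sparse_bonus


def critic_target(reward, next_q, done, gamma=0.99):
    """DDPG critic target y = r + gamma * Q'(s', mu'(s')), bootstrapping only if not done.

    next_q stands for Q'(s', mu'(s')) evaluated with the target actor and critic.
    """
    return reward + gamma * (1.0 - done) * next_q


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of each parameter array: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_params, target_params)]


if __name__ == "__main__":
    stack = FrameStack()
    obs = stack.reset(np.zeros(STATE_DIM))
    print(obs.shape)  # (36,)
    print(shaped_reward(np.array([0.5, 0.0, 0.0]), np.zeros(3), np.zeros(3)))  # 9.5
```

With four frames of nine states each, the observation has 36 entries. A position error of 0.5 lies inside the hypothetical 1.0 tolerance, so the dense penalty of -0.5 is outweighed by the sparse bonus. Splitting the reward this way keeps a continuous error signal on every step while still marking goal attainment explicitly, which is the structure the abstract describes.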



Cite this article: HUANG Xu, LIU Jiarun, JIA Chenhui, WANG Zhaolei, ZHANG Jun. Deep deterministic policy gradient algorithm for UAV control[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(11): 524688.

