Deep deterministic policy gradient algorithm for UAV control

doi:10.7527/S1000-6893.2020.24688

Abstract

The deep deterministic policy gradient (DDPG) algorithm is used to train an agent to learn the flight control strategy of a small UAV. The velocity, position and attitude angles from multiple data frames are taken as the agent's observation state, the rudder deflection angles and engine thrust command as its output actions, and the nonlinear model and flight environment of the UAV as its learning environment. During the interaction between the agent and the environment, a dense penalty containing error information is applied at every step, and sparse rewards are additionally given when specific goals are achieved; this effectively increases the diversity of flight data samples and improves the learning efficiency of the agent. The agent finally achieves end-to-end flight control, mapping position, velocity and attitude angles directly to the control variables. In addition, flight control simulations are carried out under varying track points, model parameter deviations, disturbances and faults. Simulation results show that the agent not only completes the training task effectively but also handles a variety of flight tasks it did not encounter during training, demonstrating excellent generalization ability. The method therefore has some research value and engineering reference value.
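The observation and reward design summarized above (multi-frame observations, a dense error penalty plus a sparse goal reward) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the number of stacked frames, the layout of each state vector, the goal radius and all reward weights are hypothetical placeholders, and `FrameStacker` and `shaped_reward` are names introduced here for illustration only.

```python
from collections import deque
import math


class FrameStacker:
    """Keeps the last `n_frames` observation vectors and concatenates them.

    Hypothetical helper: each frame might hold velocity, position and
    attitude angles, but the exact layout is an assumption.
    """

    def __init__(self, n_frames: int):
        self.n_frames = n_frames
        self.frames = deque(maxlen=n_frames)  # oldest frame is dropped automatically

    def reset(self, first_obs):
        # Fill the buffer with copies of the first frame so the stacked
        # observation has a fixed length from the very first step.
        self.frames.clear()
        for _ in range(self.n_frames):
            self.frames.append(list(first_obs))
        return self.observation()

    def step(self, obs):
        self.frames.append(list(obs))
        return self.observation()

    def observation(self):
        # Flatten: oldest frame first, newest frame last.
        return [x for frame in self.frames for x in frame]


def shaped_reward(position, target, goal_radius=5.0, error_weight=0.01, goal_bonus=100.0):
    """Dense penalty proportional to the position error, plus a sparse bonus
    when the UAV comes within `goal_radius` of the target.

    All weights and the radius are hypothetical placeholder values.
    """
    error = math.dist(position, target)
    reward = -error_weight * error  # dense term: given at every step
    done = error < goal_radius
    if done:
        reward += goal_bonus        # sparse term: only when the goal is reached
    return reward, done
```

With `n_frames=3`, the stacked observation is three times the length of a single frame, so the agent sees a short history of its state rather than a single snapshot. The dense term supplies a learning signal at every step, while the sparse bonus marks goal completion and ends the episode.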
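For reference, the core quantities of a standard DDPG update can be sketched in plain Python: critic targets computed with target networks, the deterministic policy gradient for the actor, and soft (Polyak) target updates. This sketch assumes a scalar action and a hypothetical linear actor mu(s) = w . s so that the chain rule stays explicit. The abstract does not specify the network structures or hyperparameters used in the paper; gamma = 0.99 and tau = 0.005 are placeholder defaults, and all function names are illustrative.

```python
from typing import Callable, Dict, List, Sequence

State = Sequence[float]
Params = Dict[str, List[float]]


def td_targets(
    rewards: Sequence[float],
    next_states: Sequence[State],
    dones: Sequence[bool],
    target_actor: Callable[[State], float],
    target_critic: Callable[[State, float], float],
    gamma: float = 0.99,  # placeholder discount factor
) -> List[float]:
    """Critic regression targets y_i = r_i + gamma * (1 - d_i) * Q'(s'_i, mu'(s'_i))."""
    targets = []
    for r, s_next, done in zip(rewards, next_states, dones):
        a_next = target_actor(s_next)           # action from the target actor mu'
        q_next = target_critic(s_next, a_next)  # value from the target critic Q'
        # No bootstrapping from the next state once the episode has ended.
        targets.append(r + gamma * (1.0 - float(done)) * q_next)
    return targets


def linear_actor_gradient(states: Sequence[State], dq_da: Sequence[float]) -> List[float]:
    """Deterministic policy gradient for a hypothetical linear actor mu(s) = w . s:
    grad_w J ~= mean_i( dQ/da(s_i, a)|_{a=mu(s_i)} * grad_w mu(s_i) ), where grad_w mu(s) = s.
    `dq_da` holds the critic's action-gradients already evaluated at a = mu(s_i)."""
    n = len(states)
    grad = [0.0] * len(states[0])
    for s, g in zip(states, dq_da):
        for j, s_j in enumerate(s):
            grad[j] += g * s_j / n
    return grad  # gradient ascent along this direction increases Q


def soft_update(target: Params, online: Params, tau: float = 0.005) -> Params:
    """Polyak averaging theta' <- tau * theta + (1 - tau) * theta'; updates `target` in place."""
    for name, weights in online.items():
        target[name] = [tau * w + (1.0 - tau) * w_t for w, w_t in zip(weights, target[name])]
    return target
```

The small `tau` makes the target networks track the online networks slowly, which keeps the critic targets from shifting abruptly between updates.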

Key words: deep deterministic policy gradient, small UAV, flight control, end-to-end, sparse reward


HUANG Xu, LIU Jiarun, JIA Chenhui, WANG Zhaolei, ZHANG Jun. Deep deterministic policy gradient algorithm for UAV control[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(11): 524688.

