Electronics and Electrical Engineering and Control

Coordination control method for fixed-wing UAV formation through deep reinforcement learning

  • XIANG Xiaojia,
  • YAN Chao,
  • WANG Chang,
  • YIN Dong
  • College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China

Received date: 2020-03-24

Revised date: 2020-05-18

Online published: 2020-07-06

Supported by

National Natural Science Foundation of China (61906203); The Foundation of National Key Laboratory of Science and Technology on UAV, Northwestern Polytechnical University (614230110080817)

Abstract

Due to the complexity of vehicle kinematics and environmental dynamics, controlling a formation of fixed-wing Unmanned Aerial Vehicles (UAVs) remains a challenging problem. Considering the uncertainty of complex and dynamic environments, this paper addresses the coordination control problem of UAV formations with a model-free deep reinforcement learning algorithm. A new action selection strategy, the ε-imitation strategy, is proposed by combining the ε-greedy strategy with an imitation strategy to balance exploration and exploitation. Based on this strategy, the double Q-learning technique, and the dueling architecture, the ID3QN (Imitative Dueling Double Deep Q-Network) algorithm is developed to improve learning efficiency. Hardware-in-the-loop experiments conducted in a high-fidelity semi-physical simulation system demonstrate the adaptability and practicality of the proposed ID3QN coordination control algorithm.
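
The sketch below is not the authors' implementation; it only illustrates, under stated assumptions, the three ingredients named in the abstract: a dueling Q-network, a double Q-learning target, and an ε-imitation action-selection rule. The network sizes, the `p_imitate` split within exploration, and the `expert_policy` hook are hypothetical placeholders, not values from the paper.

```python
# Minimal sketch of the ID3QN ingredients described in the abstract (assumptions:
# PyTorch, a user-supplied expert_policy(), and an illustrative imitate/random split).
import random
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: shared trunk with separate value and advantage heads."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, obs):
        h = self.trunk(obs)
        v, a = self.value(h), self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)  (standard dueling aggregation)
        return v + a - a.mean(dim=1, keepdim=True)

def epsilon_imitation_action(q_net, obs, expert_policy, epsilon, p_imitate=0.5):
    """ε-imitation: explore with probability epsilon, splitting exploration between
    an imitation (expert/heuristic) action and a random action; otherwise act greedily."""
    n_actions = q_net.advantage.out_features
    if random.random() < epsilon:
        if random.random() < p_imitate:         # imitate an expert/heuristic policy
            return expert_policy(obs)
        return random.randrange(n_actions)      # uniform random exploration
    with torch.no_grad():
        q = q_net(obs.unsqueeze(0))
        return int(q.argmax(dim=1).item())      # exploit the learned Q-values

def double_q_target(online_net, target_net, reward, next_obs, done, gamma=0.99):
    """Double Q-learning target: the online network selects the next action,
    the target network evaluates it, reducing overestimation bias."""
    with torch.no_grad():
        best_a = online_net(next_obs).argmax(dim=1, keepdim=True)
        next_q = target_net(next_obs).gather(1, best_a).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```

In this reading, the imitation branch injects prior knowledge (e.g., a rule-based formation controller) into exploration, while the double-Q target and dueling heads are the standard techniques cited in the abstract for stabilizing and speeding up value learning.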

Cite this article

XIANG Xiaojia, YAN Chao, WANG Chang, YIN Dong. Coordination control method for fixed-wing UAV formation through deep reinforcement learning[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2021, 42(4): 524009-524009. DOI: 10.7527/S1000-6893.2020.24009
