[1] 宗群, 王丹丹, 邵士凯, 等. 多无人机协同编队飞行控制研究现状及发展[J]. 哈尔滨工业大学学报, 2017, 49(3): 1-14.
ZONG Q, WANG D D, SHAO S K, et al. Research status and development of multi UAV coordinated formation flight control[J]. Journal of Harbin Institute of Technology, 2017, 49(3): 1-14 (in Chinese).
[2] 樊琼剑, 杨忠, 方挺, 等. 多无人机协同编队飞行控制的研究现状[J]. 航空学报, 2009, 30(4): 683-691.
FAN Q J, YANG Z, FANG T, et al. Research status of coordinated formation flight control for multi-UAVs[J]. Acta Aeronautica et Astronautica Sinica, 2009, 30(4): 683-691 (in Chinese).
[3] 王祥科, 刘志宏, 丛一睿, 等. 小型固定翼无人机集群综述和未来发展[J]. 航空学报, 2020, 41(4): 323732.
WANG X K, LIU Z H, CONG Y R, et al. Miniature fixed-wing UAV swarms: Survey and directions[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(4): 323732 (in Chinese).
[4] 贾永楠, 田似营, 李擎. 无人机集群研究进展综述[J]. 航空学报, 2020, 41(S1): 723738.
JIA Y N, TIAN S Y, LI Q. The development of unmanned aerial vehicle swarms[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(S1): 723738 (in Chinese).
[5] KURIKI Y, NAMERIKAWA T. Formation control with collision avoidance for a multi-UAV system using decentralized MPC and consensus-based control[J]. SICE Journal of Control, Measurement, and System Integration, 2015, 8(4): 285-294.
[6] SAIF O, FANTONI I, ZAVALA-RIO A. Distributed integral control of multiple UAVs: Precise flocking and navigation[J]. IET Control Theory & Applications, 2019, 13(13): 2008-2017.
[7] PHAM H X, LA H M, FEIL-SEIFER D, et al. Cooperative and distributed reinforcement learning of drones for field coverage[EB/OL]. arXiv preprint: 1803.07250, 2018.
[8] HUNG S M, GIVIGI S N. A Q-learning approach to flocking with UAVs in a stochastic environment[J]. IEEE Transactions on Cybernetics, 2017, 47(1): 186-197.
[9] SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. Cambridge: MIT Press, 1998.
[10] 高阳, 陈世福, 陆鑫. 强化学习研究综述[J]. 自动化学报, 2004, 30(1): 86-100.
GAO Y, CHEN S F, LU X. Research on reinforcement learning: A review[J]. Acta Automatica Sinica, 2004, 30(1): 86-100 (in Chinese).
[11] YAN C, XIANG X. A path planning algorithm for UAV based on improved Q-learning[C]//International Conference on Robotics and Automation Sciences (ICRAS). Piscataway: IEEE Press, 2018: 1-5.
[12] EVERETT M, CHEN Y F, HOW J P. Motion planning among dynamic, decision-making agents with deep reinforcement learning[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Piscataway: IEEE Press, 2018: 3052-3059.
[13] TAI L, PAOLO G, LIU M. Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Piscataway: IEEE Press, 2017: 31-36.
[14] LONG P, LIU W, PAN J. Deep-learned collision avoidance policy for distributed multiagent navigation[J]. IEEE Robotics and Automation Letters, 2017, 2(2): 656-663.
[15] TOMIMASU M, MORIHIRO K, NISHIMURA H, et al. A reinforcement learning scheme of adaptive flocking behavior[C]//International Symposium on Artificial Life and Robotics (AROB). Oita: ISAROB, 2005: GS1-4.
[16] MORIHIRO K, ISOKAWA T, NISHIMURA H, et al. Characteristics of flocking behavior model by reinforcement learning scheme[C]//SICE-ICASE International Joint Conference. Piscataway: IEEE Press, 2006: 4551-4556.
[17] LA H M, SHENG W. Distributed sensor fusion for scalar field mapping using mobile sensor networks[J]. IEEE Transactions on Cybernetics, 2013, 43(2): 766-778.
[18] LA H M, LIM R, SHENG W. Multirobot cooperative learning for predator avoidance[J]. IEEE Transactions on Control Systems Technology, 2015, 23(1): 52-63.
[19] WANG C, WANG J, ZHANG X, et al. A deep reinforcement learning approach to flocking and navigation of UAVs in large-scale complex environments[C]//IEEE Global Conference on Signal and Information Processing (GlobalSIP). Piscataway: IEEE Press, 2018: 1228-1232.
[20] HUNG S M, GIVIGI S N, NOURELDIN A. A Dyna-Q(λ) approach to flocking with fixed-wing UAVs in a stochastic environment[C]//IEEE International Conference on Systems, Man, and Cybernetics. Piscataway: IEEE Press, 2015: 1918-1923.
[21] 左家亮, 杨任农, 张滢, 等. 基于启发式强化学习的空战机动智能决策[J]. 航空学报, 2017, 38(10): 321168.
ZUO J L, YANG R N, ZHANG Y, et al. Intelligent decision-making in air combat maneuvering based on heuristic reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2017, 38(10): 321168 (in Chinese).
[22] WATKINS C, DAYAN P. Q-learning[J]. Machine Learning, 1992, 8(3): 279-292.
[23] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[24] VAN HASSELT H. Double Q-learning[C]//Advances in Neural Information Processing Systems. Vancouver: MIT Press, 2010: 2613-2621.
[25] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]//AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2016: 2094-2100.
[26] WANG Z, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning[C]//International Conference on Machine Learning (ICML). Brookline: JMLR, 2016: 1995-2003.
[27] WANG C, YAN C, XIANG X, et al. A continuous actor-critic reinforcement learning approach to flocking with fixed-wing UAVs[C]//Asian Conference on Machine Learning (ACML). Brookline: JMLR, 2019: 64-79.
[28] QUINTERO S A P, COLLINS G E, HESPANHA J P. Flocking with fixed-wing UAVs for distributed sensing: A stochastic optimal control approach[C]//American Control Conference. Piscataway: IEEE Press, 2013: 2025-2031.
[29] YAN C, XIANG X, WANG C. Towards real-time path planning through deep reinforcement learning for a UAV in dynamic environments[J]. Journal of Intelligent & Robotic Systems, 2020, 98: 297-309.
[30] LV L, ZHANG S, DING D, et al. Path planning via an improved DQN-based learning policy[J]. IEEE Access, 2019, 7: 67319-67330.
[31] NAIR V, HINTON G E. Rectified linear units improve restricted Boltzmann machines[C]//International Conference on Machine Learning (ICML). Brookline: JMLR, 2010: 807-814.