Distributed multi-agent coalition task allocation strategy for single pilot operation mode based on DQN
Received date: 2022-08-03
Revised date: 2022-11-30
Accepted date: 2023-02-23
Online published: 2023-03-10
Supported by
National Key Research and Development Program (2021YFB1600600); Tianjin Education Commission Scientific Research Project (2022KJ058); Fundamental Research Funds for the Central Universities, Civil Aviation University of China special fund (3122022044); Graduate Research Innovation Funding Project of Civil Aviation University of China (2021YJS011)
Distributed task decision-making is key to increasing the autonomy of the multi-agent system in the distributed cooperative flight organization architecture of the Single Pilot Operation (SPO) mode. Against the background of multiple agents collaborating to execute complex tasks, we first build a distributed multi-agent coalition task allocation decision model for the SPO mode that accounts for multiple constraints, including task-load resource requirements, agent resource-space limits, and execution time windows. We then design a function approximator for the Q-value network and propose a coalition task allocation method based on the Deep Q-Network (DQN). The method selects effective agents to generate the best execution path for the optimal coalition task allocation result, so that each agent in the coalition can optimize its scheduling more adaptively. Finally, numerical simulations verify the effectiveness and speed of the DQN method in solving the SPO-mode multi-agent coalition task allocation problem under complex constraints.
DONG L, CHEN H B, CHEN X, ZHAO C X. Distributed multi-agent coalition task allocation strategy for single pilot operation mode based on DQN[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(13): 327895 (in Chinese). DOI: 10.7527/S1000-6893.2023.27895
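The abstract does not specify the state, action, or reward design, so the following is only a minimal Python sketch of how the listed constraints could enter a DQN-style allocator, under loudly stated assumptions. We assume that an episode builds a coalition for one task by adding one agent per step. An action is valid only if the agent is not yet in the coalition and its availability window covers the task's execution window. The episode ends once the coalition's pooled resources meet the task's demand. The data model (`Agent`, `Task`), all helper names, the agent labels, and the reward and discount values are hypothetical. Q-values are passed in as plain lists standing in for the outputs of the online and target Q-networks, and network training itself is omitted.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical data model for illustration; not the paper's notation.
@dataclass
class Agent:
    name: str
    resources: Tuple[int, ...]   # resource units the agent can contribute, per type
    window: Tuple[float, float]  # (earliest start, latest finish) of availability

@dataclass
class Task:
    name: str
    demand: Tuple[int, ...]      # resource units the task load requires, per type
    window: Tuple[float, float]  # (start, end) of the execution window


def covers(outer: Tuple[float, float], inner: Tuple[float, float]) -> bool:
    """True if interval `outer` fully contains interval `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def coalition_feasible(coalition: Sequence[Agent], task: Task) -> bool:
    """Time-window check per member, then pooled resources versus task demand."""
    if not all(covers(a.window, task.window) for a in coalition):
        return False
    pooled = [sum(a.resources[k] for a in coalition) for k in range(len(task.demand))]
    return all(p >= d for p, d in zip(pooled, task.demand))


def action_mask(agents: Sequence[Agent], coalition: Sequence[Agent], task: Task) -> List[bool]:
    """Action i means 'add agents[i]'; valid if not yet a member and available for the task."""
    members = {a.name for a in coalition}
    return [a.name not in members and covers(a.window, task.window) for a in agents]


def select_action(q_values: Sequence[float], mask: Sequence[bool],
                  epsilon: float, rng: random.Random) -> Optional[int]:
    """Epsilon-greedy restricted to valid actions; returns None if no action is valid."""
    valid = [i for i, ok in enumerate(mask) if ok]
    if not valid:
        return None
    if rng.random() < epsilon:
        return rng.choice(valid)                  # explore among valid actions
    return max(valid, key=lambda i: q_values[i])  # exploit the online Q-values


def td_target(reward: float, next_q_target: Sequence[float], next_mask: Sequence[bool],
              gamma: float, done: bool) -> float:
    """DQN target y = r + gamma * max over valid a' of Q_target(s', a'); y = r at the end."""
    valid_q = [q for q, ok in zip(next_q_target, next_mask) if ok]
    if done or not valid_q:
        return reward
    return reward + gamma * max(valid_q)


if __name__ == "__main__":
    agents = [
        Agent("pilot", (1, 0), (0, 10)),
        Agent("ground_operator", (0, 2), (0, 10)),
        Agent("onboard_automation", (2, 1), (5, 10)),  # not available for the whole window
    ]
    task = Task("reroute", (1, 2), (2, 6))
    coalition = [agents[0]]
    mask = action_mask(agents, coalition, task)       # [False, True, False]
    a = select_action([0.3, 0.8, 0.9], mask, epsilon=0.0, rng=random.Random(0))
    coalition.append(agents[a])                       # adds ground_operator (index 1)
    done = coalition_feasible(coalition, task)        # pooled (1, 2) meets demand (1, 2)
    next_mask = action_mask(agents, coalition, task)  # [False, False, False]
    y = td_target(1.0, [0.5, 0.2, 0.9], next_mask, gamma=0.9, done=done)
    print(done, y)                                    # True 1.0
```

Masking invalid actions both in the ε-greedy choice and inside the max of the target keeps the learner from assigning value to coalitions that violate the time-window constraint; penalizing such actions through the reward instead is a common alternative, and the sketch does not claim which one the paper uses.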