Distributed multi-agent coalition task allocation strategy for single pilot operation mode based on DQN

doi:10.7527/S1000-6893.2023.27895

Electronics, Electrical Engineering and Control

Lei DONG¹,²,³, Hongbing CHEN²,³, Xi CHEN¹,²,³, Changxiao ZHAO¹,²,³

1. Key Laboratory of Civil Aircraft Airworthiness Technology, Civil Aviation University of China, Tianjin 300300, China
2. Civil Aircraft Airworthiness and Repair Key Laboratory of Tianjin, Civil Aviation University of China, Tianjin 300300, China
3. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China

Received: 2022-08-03  Revised: 2022-11-30  Accepted: 2023-02-23  Online: 2023-03-13  Published: 2023-03-10
Contact: Changxiao ZHAO, E-mail: zhaochangxiao@yeah.net
Supported by:
National Key Research and Development Program (2021YFB1600600); Tianjin Education Commission Scientific Research Project (2022KJ058); Fundamental Research Funds for the Central Universities (3122022044); Graduate Research Innovation Funding Project of Civil Aviation University of China (2021YJS011)

Abstract:

Distributed task decision-making is key to increasing the autonomy of the multi-agent system in the distributed cooperative flight organization architecture of the Single Pilot Operation (SPO) mode. Against the background of multiple agents collaborating to execute complex tasks, we first build a distributed multi-agent coalition task allocation decision model for the SPO mode that accounts for multiple constraints, including task load resource requirements, agent resource space limits, and execution time windows. We then design the function approximator of the Q-value network and propose a coalition task allocation method based on the Deep Q-Network (DQN). The method selects effective agents and generates the best execution path for the optimal coalition task allocation result, allowing each agent in the coalition to optimize its scheduling in a more adaptive manner. Finally, numerical simulation confirms that the DQN method solves the SPO-mode multi-agent coalition task allocation problem under complex constraints effectively and quickly.

Key words: single pilot operation, multi-agent system, task allocation, coalition formation, deep reinforcement learning, neural network
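
For orientation, the training objective of a standard DQN can be written as below. This is the textbook formulation and not necessarily the Q-value network design used in the paper, which this page does not reproduce. The symbols $\theta$ (online network weights), $\theta^{-}$ (target network weights), and $\mathcal{D}$ (experience replay buffer) are assumptions introduced here for illustration.

```latex
y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}), \qquad
L(\theta) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{D}}
            \left[ \bigl( y - Q(s, a; \theta) \bigr)^{2} \right]
```

Here $\gamma$ is the discount factor and $r$ the immediate reward. How states and actions map onto coalition members and tasks is specific to the paper's model and is not assumed here.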

CLC number: V323.11

Lei DONG, Hongbing CHEN, Xi CHEN, Changxiao ZHAO. Distributed multi-agent coalition task allocation strategy for single pilot operation mode based on DQN[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(13): 327895-327895.

Figures/Tables (20)

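Table 2  Task space parameter settings

Task	Type	Task_j	$\bar{r}_{1j}, \bar{r}_{2j}, \bar{r}_{3j}, \bar{r}_{4j}, \bar{r}_{5j}$	$\lambda_j$	numAgent_j	$[\tau_{ij,\mathrm{start}}, \tau_{ij,\mathrm{end}}]$
1	Joint surveillance and sensing of the flight route	20	[0.2, 0.18, 0.18, 0.2, 0.18]	0.4	4	[0, 4] [4, 7] [6, 9] [9, 12]
2	Severe weather identification and confirmation	18	[0.2, 0.22, 0.2, 0.18, 0.18]	0.4	3	[12, 16] [15, 20] [19, 23]
3	Advance planning of an optimized path for adverse weather conditions	19	[0.16, 0.2, 0.2, 0.2, 0.2]	0.4	4	[23, 26] [25, 29] [28, 31] [31, 33]
4	Flight route maneuver adjustment based on 4D trajectory	18	[0.22, 0.2, 0.22, 0.2, 0.2]	0.4	3	[33, 36] [36, 38] [38, 42]
5	Autonomous cruise	12	[0.18, 0.22, 0.22, 0.18, 0.18]	0.4	2	[42, 44] [44, 47]
6	Air-ground linked collaborative decision-making	17	[0.2, 0.16, 0.16, 0.2, 0.2]	0.4	3	[47, 49] [49, 51] [51, 54]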
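
The task space parameters above can be held in a small data structure. The following is a minimal sketch, assuming each task carries one execution window per required agent, which is what the row counts in the table suggest. The class name `TaskSpec`, its field names, and the helper `is_well_formed` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaskSpec:
    """One row of the task space table (field names are illustrative)."""
    task_id: int
    task_j: int                           # value of the Task_j column
    r_bar: Tuple[float, ...]              # the five resource entries
    lambda_j: float                       # value of the lambda_j column
    num_agent: int                        # value of the numAgent_j column
    windows: Tuple[Tuple[int, int], ...]  # (start, end) execution windows


# Values copied from the task space parameter table.
TASKS = (
    TaskSpec(1, 20, (0.20, 0.18, 0.18, 0.20, 0.18), 0.4, 4,
             ((0, 4), (4, 7), (6, 9), (9, 12))),
    TaskSpec(2, 18, (0.20, 0.22, 0.20, 0.18, 0.18), 0.4, 3,
             ((12, 16), (15, 20), (19, 23))),
    TaskSpec(3, 19, (0.16, 0.20, 0.20, 0.20, 0.20), 0.4, 4,
             ((23, 26), (25, 29), (28, 31), (31, 33))),
    TaskSpec(4, 18, (0.22, 0.20, 0.22, 0.20, 0.20), 0.4, 3,
             ((33, 36), (36, 38), (38, 42))),
    TaskSpec(5, 12, (0.18, 0.22, 0.22, 0.18, 0.18), 0.4, 2,
             ((42, 44), (44, 47))),
    TaskSpec(6, 17, (0.20, 0.16, 0.16, 0.20, 0.20), 0.4, 3,
             ((47, 49), (49, 51), (51, 54))),
)


def is_well_formed(task: TaskSpec) -> bool:
    """Check that a row has five resource entries, one window per
    required agent, and a positive duration for every window."""
    return (
        len(task.r_bar) == 5
        and len(task.windows) == task.num_agent
        and all(start < end for start, end in task.windows)
    )


ALL_WELL_FORMED = all(is_well_formed(task) for task in TASKS)
```

Keeping the rows immutable (`frozen=True`) makes it safe to share the task list between agents in a simulation without defensive copies.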

Table 4  Parameter settings for DQN-based coalition task allocation in SPO mode

Task	$\alpha$	$\gamma$	$\varepsilon_0$	$\varepsilon_{\mathrm{decay}}$
1	0.1	0.9	0.9	0.05
2	0.1	0.9	0.9	0.10
3	0.1	0.9	0.9	0.05
4	0.1	0.9	0.9	0.10
5	0.1	0.8	0.9	0.05
6	0.1	0.9	0.9	0.10
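
The initial exploration rate and its decay in the parameters tabulated above suggest an epsilon-greedy policy with a decaying exploration rate. The sketch below shows one way such a policy could look. The exponential schedule in `epsilon_at` is an assumption, since this page does not state how the decay parameter is applied, and both function names are hypothetical.

```python
import math
import random
from typing import Sequence

# Task 1 row of the DQN parameter table.
ALPHA, GAMMA, EPS0, EPS_DECAY = 0.1, 0.9, 0.9, 0.05


def epsilon_at(episode: int, eps0: float = EPS0,
               decay: float = EPS_DECAY) -> float:
    """Exploration rate after `episode` episodes.

    Assumed schedule: eps0 * exp(-decay * episode). The paper may use
    a different (for example linear or multiplicative) schedule.
    """
    return eps0 * math.exp(-decay * episode)


def select_action(q_values: Sequence[float], epsilon: float,
                  rng: random.Random) -> int:
    """Epsilon-greedy choice over a list of Q-values.

    With probability `epsilon` pick a uniformly random action index,
    otherwise pick the index with the largest Q-value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

For example, `select_action([0.1, 0.7, 0.3], epsilon_at(200), random.Random(0))` is almost always greedy, because the assumed schedule has driven the exploration rate close to zero by episode 200.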

Table 6  Parameter settings for Q-Learning-based coalition task allocation in SPO mode

Task	$\alpha$	$\gamma$	$\varepsilon_0$	$\varepsilon_{\mathrm{decay}}$
1	0.08	0.8	0.9	0.10
2	0.10	0.8	0.9	0.10
3	0.08	0.9	0.9	0.10
4	0.10	0.8	0.9	0.05
5	0.08	0.9	0.9	0.20
6	0.08	0.9	0.9	0.05
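
To show where the learning rate and discount factor tabulated above enter, here is a minimal sketch of the standard tabular Q-learning update. It is the generic rule rather than the paper's implementation. The state and action encodings, and the name `q_update`, are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

# Task 1 row of the Q-Learning parameter table.
ALPHA, GAMMA = 0.08, 0.8

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, next_actions: Iterable[Hashable],
             alpha: float = ALPHA, gamma: float = GAMMA) -> float:
    """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    Returns the updated Q(s,a). If `next_actions` is empty (terminal
    state), the bootstrap term is taken as 0.
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: start from an all-zero table and apply one update.
q_table: QTable = defaultdict(float)
first_value = q_update(q_table, "s0", 0, 1.0, "s1", [0, 1])
```

A DQN replaces the table `q` with a neural network, which is what lets the method scale to state spaces too large to enumerate.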

Agent_i	Partition 1	Partition 2	Partition 3	Partition 4	Partition 5	Partition 6
1	5	5	6	6	7	8
2	4	4	4	5	6	7
3	3	3	4	4	5	5
4	5	5	6	6	7	7
5	3	3	3	3	4	4

$\gamma$	Convergence episode	Return value
0.1	1803	1.219
0.2	1851	1.421
0.3	1987	1.569
0.4	2041	1.673
0.5	2156	1.808
0.6	2295	1.974
0.7	2309	2.292
0.8	2497	2.353
0.9	2556	2.409

Task	Execution path $h_j$	Constraints satisfied
1	1(0) → 5(4) → 4(6.5) → 2(9.5)	Yes
2	3(11.5) → 1(14.5) → 2(19)	Yes
3	4(21.5) → 2(24.5) → 3(27) → 1(29.5)	Yes
4	1(32.5) → 3(37) → 4(39.5)	Yes
5	3(43) → 2(46)	Yes
6	5(49.5) → 1(51.5) → 4(55.5)	No

Task	Method	Maximum	Minimum	Standard deviation	Divergence coefficient ($\times 10^{-6}$)
1	DQN	2.710	2.264	0.1365	5.397
1	Q-Learning	2.541	2.051	0.1374	12.385
2	DQN	2.251	1.998	0.0664	1.425
2	Q-Learning	2.163	1.917	0.0792	1.642
3	DQN	2.383	2.119	0.0794	1.450
3	Q-Learning	2.275	2.015	0.0857	1.541
4	DQN	2.334	2.171	0.0575	0.190
4	Q-Learning	2.177	2.008	0.0641	0.253
5	DQN	1.942	1.825	0.0346	0.128
5	Q-Learning	1.865	1.730	0.0363	0.264
6	DQN	2.258	2.075	0.0508	0.425
6	Q-Learning	1.934	1.725	0.0522	1.330
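
The comparison above can be summarized with a few lines of arithmetic. The sketch below copies the tabulated values and computes, per task, the relative difference between the DQN and Q-Learning maxima, and checks whether DQN has the lower standard deviation in every task. The names `RESULTS`, `relative_gain`, `GAINS`, and `DQN_LOWER_STD` are hypothetical, and the snippet makes no claim about what quantity the paper's maxima measure.

```python
# (task, method) -> (maximum, minimum, standard deviation,
#                    divergence coefficient in units of 1e-6)
RESULTS = {
    (1, "DQN"): (2.710, 2.264, 0.1365, 5.397),
    (1, "Q-Learning"): (2.541, 2.051, 0.1374, 12.385),
    (2, "DQN"): (2.251, 1.998, 0.0664, 1.425),
    (2, "Q-Learning"): (2.163, 1.917, 0.0792, 1.642),
    (3, "DQN"): (2.383, 2.119, 0.0794, 1.450),
    (3, "Q-Learning"): (2.275, 2.015, 0.0857, 1.541),
    (4, "DQN"): (2.334, 2.171, 0.0575, 0.190),
    (4, "Q-Learning"): (2.177, 2.008, 0.0641, 0.253),
    (5, "DQN"): (1.942, 1.825, 0.0346, 0.128),
    (5, "Q-Learning"): (1.865, 1.730, 0.0363, 0.264),
    (6, "DQN"): (2.258, 2.075, 0.0508, 0.425),
    (6, "Q-Learning"): (1.934, 1.725, 0.0522, 1.330),
}

TASK_IDS = range(1, 7)


def relative_gain(task: int, column: int = 0) -> float:
    """Relative difference (DQN - Q-Learning) / Q-Learning for one
    column of the table; column 0 is the maximum."""
    dqn = RESULTS[(task, "DQN")][column]
    q_learning = RESULTS[(task, "Q-Learning")][column]
    return (dqn - q_learning) / q_learning


# Relative gain of the DQN maximum over the Q-Learning maximum, per task.
GAINS = {task: relative_gain(task) for task in TASK_IDS}

# True if DQN shows the smaller standard deviation for every task.
DQN_LOWER_STD = all(
    RESULTS[(task, "DQN")][2] < RESULTS[(task, "Q-Learning")][2]
    for task in TASK_IDS
)
```

Printing `GAINS` gives a quick per-task view of how far apart the two methods are, which is easier to scan than the raw rows.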
