Distributed multi-agent coalition task allocation strategy for single pilot operation mode based on DQN

doi:10.7527/S1000-6893.2023.27895

Abstract

Distributed decision-making is essential for increasing the autonomy of multi-agent systems within the distributed, coordinated flight organization structure of the Single Pilot Operation (SPO) mode. Against the background of multi-agent collaboration on complex tasks, a coalition task allocation decision model for distributed multi-agent systems in SPO mode is built, taking into account several constraints such as task load resource requirements, agent resource space, and time windows. A Q-value network is then designed as the function approximator, and a coalition task allocation algorithm based on the Deep Q-Network (DQN) is proposed. The algorithm generates the best execution path for the optimal coalition task allocation result, allowing each agent in the coalition to optimize its scheduling in a more adaptive manner. Numerical simulation confirms the efficiency and speed of the DQN algorithm in solving multi-agent coalition task allocation for the SPO mode under complex constraints.

Key words: single pilot operation, multi-agent system, task allocation, coalition formation, deep reinforcement learning, neural network

CLC Number: V323.11

Lei DONG, Hongbing CHEN, Xi CHEN, Changxiao ZHAO. Distributed multi-agent coalition task allocation strategy for single pilot operation mode based on DQN[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(13): 327895.

Table 2

Initial settings of tasks

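Task	Type	Task_j	$\bar{r}_{1j}, \bar{r}_{2j}, \bar{r}_{3j}, \bar{r}_{4j}, \bar{r}_{5j}$	$\lambda_j$	numAgent_j	$\tau_{ij,\mathrm{start}}, \tau_{ij,\mathrm{end}}$
1	Joint monitoring and perception of the flight route	20	[0.2, 0.18, 0.18, 0.2, 0.18]	0.4	4	[0, 4] [4, 7] [6, 9] [9, 12]
2	Severe weather identification and confirmation	18	[0.2, 0.22, 0.2, 0.18, 0.18]	0.4	3	[12, 16] [15, 20] [19, 23]
3	Advance planning of an optimized route around adverse weather	19	[0.16, 0.2, 0.2, 0.2, 0.2]	0.4	4	[23, 26] [25, 29] [28, 31] [31, 33]
4	Flight route maneuver adjustment based on 4D trajectory	18	[0.22, 0.2, 0.22, 0.2, 0.2]	0.4	3	[33, 36] [36, 38] [38, 42]
5	Autonomous cruise	12	[0.18, 0.22, 0.22, 0.18, 0.18]	0.4	2	[42, 44] [44, 47]
6	Air-ground collaborative decision-making	17	[0.2, 0.16, 0.16, 0.2, 0.2]	0.4	3	[47, 49] [49, 51] [51, 54]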

Table 4

Parameter settings of coalition task allocation method for SPO mode based on DQN

Task	Parameter variables
Task	$\alpha$	$\gamma$	$\varepsilon_0$	$\varepsilon_{\mathrm{decay}}$
1	0.1	0.9	0.9	0.05
2	0.1	0.9	0.9	0.10
3	0.1	0.9	0.9	0.05
4	0.1	0.9	0.9	0.10
5	0.1	0.8	0.9	0.05
6	0.1	0.9	0.9	0.10
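
To make the role of the discount factor $\gamma$ in Table 4 concrete, the following is a minimal sketch of how regression targets are commonly formed in a standard DQN, y = r + γ · max Q_target(s', a'). It is not taken from the paper: the function names `dqn_targets` and `mse_loss`, the use of a separate target network, and the toy numbers are all illustrative assumptions.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.9,
) -> List[float]:
    """Standard DQN targets: y = r + gamma * max_a' Q_target(s', a').

    next_q_values[i] holds the (hypothetical) target-network outputs for
    every action available in the i-th next state. Terminal transitions
    (done=True) do not bootstrap from the next state.
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


def mse_loss(predicted: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared TD error between Q(s, a) predictions and the targets."""
    return sum((p - y) ** 2 for p, y in zip(predicted, targets)) / len(targets)


# Toy batch of two transitions (made-up numbers, gamma = 0.9 as for most tasks in Table 4).
batch_targets = dqn_targets(
    rewards=[1.0, 0.5],
    next_q_values=[[0.2, 0.4], [1.0, 2.0]],
    dones=[False, True],
    gamma=0.9,
)
loss = mse_loss(predicted=[1.0, 0.5], targets=batch_targets)
```

In a full implementation the loss would be minimized with a gradient-based optimizer whose step size plays the role of $\alpha$; how the paper's network is structured and trained is not specified on this page.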

Table 6

Parameter settings of coalition task allocation method for SPO mode based on Q-Learning

Task	Parameter variables
Task	$\alpha$	$\gamma$	$\varepsilon_0$	$\varepsilon_{\mathrm{decay}}$
1	0.08	0.8	0.9	0.10
2	0.10	0.8	0.9	0.10
3	0.08	0.9	0.9	0.10
4	0.10	0.8	0.9	0.05
5	0.08	0.9	0.9	0.20
6	0.08	0.9	0.9	0.05
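
As a rough illustration of how the four parameters in Table 6 enter a tabular Q-Learning baseline, the sketch below implements the textbook update Q(s, a) ← Q(s, a) + α [r + γ · max Q(s', a') − Q(s, a)] with ε-greedy exploration. The state and action encodings, the helper names, and in particular the multiplicative form of the ε decay are assumptions made for this example; the page does not state how $\varepsilon_{\mathrm{decay}}$ is applied.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def epsilon_at(episode: int, eps0: float = 0.9, decay: float = 0.10) -> float:
    """Exploration rate after `episode` episodes.

    ASSUMPTION: epsilon shrinks by a factor (1 - decay) per episode.
    The actual schedule used in the paper is not given on this page.
    """
    return eps0 * (1.0 - decay) ** episode


def choose_action(q: QTable, state: Hashable, actions: Sequence[Hashable],
                  epsilon: float, rng: random.Random) -> Hashable:
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(list(actions))
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, next_actions: Sequence[Hashable],
             alpha: float = 0.08, gamma: float = 0.8) -> float:
    """One tabular Q-Learning step; returns the updated Q(state, action)."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Toy usage with the Task 1 settings from Table 6 (alpha=0.08, gamma=0.8).
# States and actions here are placeholder labels, not the paper's encoding.
q_table: QTable = defaultdict(float)
rng = random.Random(0)
updated = q_update(q_table, "s0", 1, reward=1.0, next_state="s1", next_actions=[1, 2])
greedy = choose_action(q_table, "s0", [1, 2], epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the agent always exploits, which makes the greedy choice easy to check by hand in the toy usage.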

Agent	Agent_i
Agent	Partition 1	Partition 2	Partition 3	Partition 4	Partition 5	Partition 6
1	5	5	6	6	7	8
2	4	4	4	5	6	7
3	3	3	4	4	5	5
4	5	5	6	6	7	7
5	3	3	3	3	4	4

$\gamma$	Convergence episode	Return value
0.1	1 803	1.219
0.2	1 851	1.421
0.3	1 987	1.569
0.4	2 041	1.673
0.5	2 156	1.808
0.6	2 295	1.974
0.7	2 309	2.292
0.8	2 497	2.353
0.9	2 556	2.409

Task	Execution path $h_j$	Constraints satisfied
1	1(0)→5(4)→4(6.5)→2(9.5)	Yes
2	3(11.5)→1(14.5)→2(19)	Yes
3	4(21.5)→2(24.5)→3(27)→1(29.5)	Yes
4	1(32.5)→3(37)→4(39.5)	Yes
5	3(43)→2(46)	Yes
6	5(49.5)→1(51.5)→4(55.5)	No

Task	Method	Maximum	Minimum	Standard deviation	Divergence coefficient/10^-6
1	DQN	2.710	2.264	0.136 5	5.397
1	Q-Learning	2.541	2.051	0.137 4	12.385
2	DQN	2.251	1.998	0.066 4	1.425
2	Q-Learning	2.163	1.917	0.079 2	1.642
3	DQN	2.383	2.119	0.079 4	1.450
3	Q-Learning	2.275	2.015	0.085 7	1.541
4	DQN	2.334	2.171	0.057 5	0.190
4	Q-Learning	2.177	2.008	0.064 1	0.253
5	DQN	1.942	1.825	0.034 6	0.128
5	Q-Learning	1.865	1.730	0.036 3	0.264
6	DQN	2.258	2.075	0.050 8	0.425
6	Q-Learning	1.934	1.725	0.052 2	1.330
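
The gap between the two methods in the comparison table above can be summarized with a simple relative-difference calculation. The snippet below is an illustrative sketch only: the maximum values are copied from the table, while the helper name `relative_gain` and the choice of the maximum column as the quantity to compare are assumptions made here.

```python
# (task, DQN maximum, Q-Learning maximum), copied from the comparison table.
MAX_VALUES = [
    (1, 2.710, 2.541),
    (2, 2.251, 2.163),
    (3, 2.383, 2.275),
    (4, 2.334, 2.177),
    (5, 1.942, 1.865),
    (6, 2.258, 1.934),
]


def relative_gain(dqn: float, q_learning: float) -> float:
    """Percentage by which the DQN value exceeds the Q-Learning value."""
    return (dqn - q_learning) / q_learning * 100.0


gains = {task: relative_gain(dqn, ql) for task, dqn, ql in MAX_VALUES}

for task, gain in gains.items():
    print(f"Task {task}: DQN maximum is {gain:.2f}% above Q-Learning")
```

In the tabulated data the DQN maximum is higher than the Q-Learning maximum for all six tasks, with the largest gap on task 6.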
