ACTA AERONAUTICA ET ASTRONAUTICA SINICA
Multi-agent communication cooperation based on deep reinforcement learning and information theory
Received date: 2023-11-10
Revised date: 2023-12-06
Accepted date: 2024-02-28
Online published: 2024-03-14
Supported by: National Natural Science Foundation of China (61673084); 2021 Liaoning Provincial Department of Education Project (LJKZ1180)
Effective explicit communication among agents in a multi-agent system can increase their capacity for cooperation. However, existing communication strategies typically use the agents' local observations directly as the communication content, and the communication objects are usually fixed by a predefined topology. On the one hand, such strategies adapt poorly to changes in tasks and environments, which introduces uncertainty into the communication process. On the other hand, the communication objects and contents lack focus, which wastes resources and lowers communication effectiveness. To address these issues, this paper proposes an approach that integrates deep reinforcement learning and information theory to realize an adaptive multi-agent communication mechanism. The approach uses a prior network that allows each agent to dynamically choose its communication objects, and then applies mutual information constraints and the information bottleneck theory to filter out redundant information. Finally, each agent summarizes its own information together with the information it receives to extract more effective information. Experiments in cooperative navigation and traffic junction environments show that the proposed method improves the stability and interaction efficiency of multi-agent systems compared with other methods.
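The abstract names three mechanisms: a prior network that picks communication objects, mutual-information and information-bottleneck constraints that filter redundant message content, and a step in which each agent summarizes its own and received information. The pure-Python sketch below illustrates each idea under loudly stated assumptions. The gate (`prior_gate`, a weighted dot product with a 0.5 threshold), the Gaussian message encoder, the variational KL bound used as the bottleneck penalty, and mean-pooling aggregation are all hypothetical stand-ins chosen for illustration, not the paper's networks or loss. The mutual-information constraint appears only through the compression term; any additional mutual-information objective used in the paper is not shown.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def prior_gate(h_i, h_j, w):
    # Hypothetical prior network: a weighted dot product of two agents'
    # feature vectors, squashed to a probability that agent i messages agent j.
    score = sum(wk * a * b for wk, a, b in zip(w, h_i, h_j))
    return sigmoid(score)

def select_receivers(features, i, w, threshold=0.5):
    # Agent i keeps only peers whose gate probability exceeds the threshold,
    # so the communication topology changes with the current features.
    return [j for j, h_j in enumerate(features)
            if j != i and prior_gate(features[i], h_j, w) > threshold]

def gaussian_kl(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ): the variational upper bound
    # on I(message; input) used here as the bottleneck's compression term.
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def ib_loss(task_loss, mu, log_var, beta):
    # Bottleneck objective: task loss plus beta times the compression penalty.
    return task_loss + beta * gaussian_kl(mu, log_var)

def sample_message(mu, log_var, rng):
    # Reparameterized sample m = mu + sigma * eps with eps ~ N(0, 1).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def aggregate(own, received, msg_dim):
    # Summarize: concatenate own features with the mean of received messages,
    # or with a zero vector when no peer sent this agent a message.
    if not received:
        pooled = [0.0] * msg_dim
    else:
        pooled = [sum(col) / len(received) for col in zip(*received)]
    return own + pooled

if __name__ == "__main__":
    rng = random.Random(0)
    features = [[1.0, 0.5], [0.9, 0.4], [-1.0, -0.5]]
    receivers = select_receivers(features, 0, w=[2.0, 2.0])
    print(receivers)  # [1]: agent 2's features point the other way
    mu, log_var = [0.2, -0.1], [-1.0, -1.0]
    msg = sample_message(mu, log_var, rng)
    print(ib_loss(1.0, mu, log_var, beta=0.01))
    print(aggregate(features[1], [msg], msg_dim=2))
```

A larger `beta` pushes message distributions toward the uninformative prior, which is one way to trade message content against task performance; the threshold on the gate plays the analogous role for how many receivers each agent selects.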
Bing GAO, Zhejie ZHANG, Qijie ZOU, Zhiguo LIU, Xiling ZHAO. Multi-agent communication cooperation based on deep reinforcement learning and information theory[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2024, 45(18): 329862. DOI: 10.7527/S1000-6893.2024.29862