Swarm Intelligence and Cooperative Control

Multiagent opponent modeling with incomplete information

  • Youpeng DENG,
  • Jiaxuan FAN,
  • Yan ZHENG,
  • Zhenya WANG,
  • Yongliang LYU,
  • Yuxiao LI
  • 1. Academy of Aerospace Science and Innovation, Beijing 102600, China
    2. College of Intelligence and Computing, Tianjin University, Tianjin 300072, China

Received date: 2023-10-27

Revised date: 2023-11-20

Accepted date: 2023-12-20

Online published: 2024-01-04

Supported by

National Natural Science Foundation of China (62106172); Xiaomi Young Talents Program

Abstract

The goal of opponent modeling is to model the opponent's strategy so as to maximize the payoff of the main agent. Most previous work fails to handle situations where opponent information is limited. To address this problem, we propose an approach for opponent modeling with incomplete information (OMII), which extracts cross-epoch opponent strategy representations solely from the main agent's own observations when opponent information is limited. OMII introduces a novel policy-based data augmentation method that learns opponent strategy representations offline through contrastive learning and uses them as an additional input to train a general responsive policy. During online testing, OMII extracts the opponent strategy representation from recent historical trajectories and, together with the general policy, responds dynamically to the opponent's strategy. Moreover, OMII guarantees a lower bound on the expected payoff by balancing conservatism and exploitation. Experimental results demonstrate that, even with limited opponent information, OMII accurately extracts opponent strategy representations, generalizes to unknown opponent strategies, and outperforms existing opponent modeling algorithms.
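To make the pipeline in the abstract concrete, the sketch below illustrates its two core ideas: contrastive (InfoNCE-style) learning of opponent strategy representations from the agent's own observation trajectories, and a general responsive policy that takes the learned representation as additional input. This is a minimal illustration, not the paper's implementation; every class name, network size, and hyperparameter here is an assumption made for the example.

```python
# Hypothetical sketch of OMII's two components; all names, dimensions,
# and hyperparameters are illustrative assumptions, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryEncoder(nn.Module):
    """Maps a self-observation trajectory to an opponent-strategy embedding."""
    def __init__(self, obs_dim: int, embed_dim: int = 32):
        super().__init__()
        self.gru = nn.GRU(obs_dim, 64, batch_first=True)
        self.head = nn.Linear(64, embed_dim)

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        _, h = self.gru(traj)                      # traj: (batch, T, obs_dim)
        return F.normalize(self.head(h[-1]), dim=-1)

def info_nce_loss(anchor: torch.Tensor, positive: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE: trajectory pairs collected against the same opponent policy
    are positives; the other trajectories in the batch serve as negatives."""
    logits = anchor @ positive.t() / temperature   # (batch, batch) similarities
    labels = torch.arange(anchor.size(0))          # matching pairs on diagonal
    return F.cross_entropy(logits, labels)

class ConditionedPolicy(nn.Module):
    """A general responsive policy that takes the opponent embedding as
    additional input, as the abstract describes."""
    def __init__(self, obs_dim: int, embed_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + embed_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions))

    def forward(self, obs: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, z], dim=-1))   # action logits

if __name__ == "__main__":
    # Offline phase: two augmented views (e.g. trajectory segments gathered
    # under the same opponent strategy) are pulled together by the loss.
    enc = TrajectoryEncoder(obs_dim=8)
    view_a = torch.randn(16, 20, 8)    # 16 trajectories, 20 steps each
    view_b = torch.randn(16, 20, 8)
    loss = info_nce_loss(enc(view_a), enc(view_b))
    loss.backward()

    # Online phase: infer the embedding from recent history and condition
    # the general policy on it.
    pol = ConditionedPolicy(obs_dim=8, embed_dim=32, n_actions=4)
    z = enc(view_a).detach()                       # inferred opponent embedding
    action_logits = pol(torch.randn(16, 8), z)
```

Conditioning a single policy on the embedding, rather than training one best response per opponent, is what lets the trained policy respond dynamically when the inferred representation shifts at test time.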

Cite this article

Youpeng DENG, Jiaxuan FAN, Yan ZHENG, Zhenya WANG, Yongliang LYU, Yuxiao LI. Multiagent opponent modeling with incomplete information[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2023, 44(S2): 729782-729782. DOI: 10.7527/S1000-6893.2023.29782
