Acta Aeronautica et Astronautica Sinica ›› 2023, Vol. 44 ›› Issue (S2): 729782-729782. doi: 10.7527/S1000-6893.2023.29782

• Swarm Intelligence and Cooperative Control

Multiagent opponent modeling with incomplete information

Youpeng DENG1,2, Jiaxuan FAN1, Yan ZHENG2, Zhenya WANG1, Yongliang LYU2, Yuxiao LI2

  1. Academy of Aerospace Science and Innovation, Beijing 102600, China
  2. College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
  • Received: 2023-10-27 Revised: 2023-11-20 Accepted: 2023-12-20 Online: 2023-12-25 Published: 2024-01-04
  • Contact: Yan ZHENG E-mail: yanzheng@tju.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62106172); Xiaomi Young Talents Program

Abstract:

Opponent modeling aims to model the opponent’s strategy so as to maximize the payoff of the main agent. Most previous work fails to handle situations where opponent information is limited. To address this problem, we propose opponent modeling with incomplete information (OMII), an approach that extracts cross-epoch opponent strategy representations using only the agent’s own observations when opponent information is limited. OMII introduces a novel policy-based data augmentation method and uses contrastive learning to learn opponent strategy representations offline; these representations then serve as additional input for training a general responsive policy. During online testing, OMII extracts opponent strategy representations from recent historical trajectory data and, together with the general policy, responds dynamically to opponent strategies. Moreover, OMII ensures a lower bound on the expected payoff by balancing conservatism and exploitation. Experimental results demonstrate that, even with limited opponent information, OMII accurately extracts opponent strategy representations, generalizes to some extent to unknown strategies, and outperforms existing opponent modeling algorithms.
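
As a rough illustration of the contrastive step mentioned above, the following minimal sketch computes an InfoNCE-style loss over paired embeddings. The abstract does not state which contrastive objective OMII uses, so the choice of InfoNCE, the function name `info_nce_loss`, and the `temperature` value are all assumptions made for illustration. Each anchor row is assumed to encode a trajectory segment, and the matching positive row an augmented segment collected against the same opponent strategy.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss for a batch of paired embeddings (illustrative only).

    anchors[i] and positives[i] are assumed to come from the same opponent
    strategy; every other row in the batch acts as a negative.
    """
    # L2-normalise rows so the dot product is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # shape (batch, batch)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matching pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_probs)))
```

Under these assumptions, the loss is small when each anchor is most similar to its own positive and large when the pairing is scrambled, which is the property that pulls representations of the same strategy together.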

Key words: decision-making intelligence, reinforcement learning, opponent modeling, contrastive learning, multiagent system
