[1] FOERSTER J N, CHEN R Y, AL-SHEDIVAT M, et al. Learning with opponent-learning awareness[DB/OL]. arXiv preprint: 1709.04326, 2017.
[2] HE H, BOYD-GRABER J, KWOK K, et al. Opponent modeling in deep reinforcement learning[C]∥ Proceedings of the 33rd International Conference on International Conference on Machine Learning-Volume 48. New York: ACM, 2016: 1804-1813.
[3] HONG Z W, SU S Y, SHANN T Y, et al. A deep policy inference Q-network for multi-agent systems[DB/OL]. arXiv preprint: 1712.07893, 2017.
[4] ZHENG Y, MENG Z, HAO J, et al. A deep Bayesian policy reuse approach against non-stationary agents[C]∥ Advances in Neural Information Processing Systems 31, 2018.
[5] ZHENG Y, MENG Z P, HAO J Y, et al. Weighted double deep multiagent reinforcement learning in stochastic cooperative environments[M]∥ Lecture Notes in Computer Science. Cham: Springer International Publishing, 2018: 421-429.
[6] ZHENG Y, HAO J Y, ZHANG Z Z, et al. Efficient multiagent policy optimization based on weighted estimators in stochastic cooperative environments[J]. Journal of Computer Science and Technology, 2020, 35(2): 268-280.
[7] 郑岩, 郝建业, 章宗长, 等. 一种多智能体合作式环境下基于带权估计的策略优化算法[J]. 计算机科学技术学报, 2020, 35(2): 268-280.
ZHENG Y, HAO J Y, ZHANG Z Z, et al. A strategy optimization algorithm based on weighted estimation in a multi-agent cooperative environment[J]. Journal of Computer Science and Technology, 2020, 35(2): 268-280 (in Chinese).
[8] ZHENG Y, HAO J Y, ZHANG Z Z, et al. Efficient policy detecting and reusing for non-stationarity in Markov games[J]. Autonomous Agents and Multi-Agent Systems, 2020, 35(1): 2.
[9] HAO X T, WANG W X, MAO H Y, et al. API: Boosting multi-agent reinforcement learning via agent-permutation-invariant networks[DB/OL]. arXiv preprint: 2203.05285, 2022.
[10] SUN J W, ZHENG Y, HAO J Y, et al. Continuous multiagent control using collective behavior entropy for large-scale home energy management[DB/OL]. arXiv preprint: 2005.10000, 2020.
[11] LI P Y, TANG H Y, YANG T P, et al. PMIC: Improving multi-agent reinforcement learning with progressive mutual information collaboration[DB/OL]. arXiv preprint: 2203.08553, 2022.
[12] RAILEANU R, DENTON E, SZLAM A, et al. Modeling others using oneself in multi-agent reinforcement learning[DB/OL]. arXiv preprint: 1802.09640, 2018.
[13] ROSMAN B, HAWASLY M, RAMAMOORTHY S. Bayesian policy reuse[J]. Machine Learning, 2016, 104(1): 99-127.
[14] HERNANDEZ-LEAL P, TAYLOR M E, ROSMAN B S, et al. Identifying and tracking switching, non-stationary opponents: A Bayesian approach[C]∥ Workshop on Multiagent Interaction without Prior Coordination (MIPC) at AAAI-16, 2016: 560-566.
[15] YANG T P, HAO J Y, MENG Z P, et al. Towards efficient detection and optimal response against sophisticated opponents[DB/OL]. arXiv preprint: 1809.04240, 2018.
[16] GANZFRIED S, WANG K A, CHISWICK M. Bayesian opponent modeling in multiplayer imperfect-information games[DB/OL]. arXiv preprint: 2212.06027, 2022.
[17] VON KÜGELGEN J, USTYUZHANINOV I, GEHLER P, et al. Towards causal generative scene models via competition of experts[DB/OL]. arXiv preprint: 2004.12906, 2020.
[18] 罗俊仁, 张万鹏, 袁唯淋, 等. 面向多智能体博弈对抗的对手建模框架[J]. 系统仿真学报, 2022, 34(9): 1941-1955.
LUO J R, ZHANG W P, YUAN W L, et al. Research on opponent modeling framework for multi-agent game confrontation[J]. Journal of System Simulation, 2022, 34(9): 1941-1955 (in Chinese).
[19] 吴天栋, 石英. 不完美信息博弈中对手模型的研究[J]. 河南科技大学学报(自然科学版), 2019, 40(1): 54-59, 7.
WU T D, SHI Y. Research on opponent modeling in imperfect information games[J]. Journal of Henan University of Science and Technology (Natural Science), 2019, 40(1): 54-59, 7 (in Chinese).
[20] VAN DEN OORD A, LI Y Z, VINYALS O. Representation learning with contrastive predictive coding[DB/OL]. arXiv preprint: 1807.03748, 2018.
[21] HE K M, FAN H Q, WU Y X, et al. Momentum contrast for unsupervised visual representation learning[C]∥ 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2020: 9726-9735.
[22] CHEN T, KORNBLITH S, NOROUZI M, et al. A simple framework for contrastive learning of visual representations[DB/OL]. arXiv preprint: 2002.05709, 2020.
[23] AUER P, CESA-BIANCHI N, FREUND Y, et al. The nonstochastic multiarmed bandit problem[J]. SIAM Journal on Computing, 2002, 32(1): 48-77.
[24] FU H B, TIAN Y, YU H X, et al. Greedy when sure and conservative when uncertain about the opponents[C]∥ Proceedings of the 39th International Conference on Machine Learning. PMLR, 2022: 6829-6848.