Acta Aeronautica et Astronautica Sinica ›› 2024, Vol. 45 ›› Issue (4): 328723. DOI: 10.7527/S1000-6893.2023.28723
• Electronics and Electrical Engineering and Control •
Shengzhe SHAN1,2, Weiwei ZHANG1
Received: 2023-03-21
Revised: 2023-06-12
Accepted: 2023-08-29
Online: 2024-02-25
Published: 2023-09-01
Contact: Weiwei ZHANG, E-mail: aeroelastic@nwpu.edu.cn
Shengzhe SHAN, Weiwei ZHANG. Air combat intelligent decision-making method based on self-play and deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(4): 328723.
Table 3  Feature extraction of observation

| Category | Feature | Value range | Dimension |
|---|---|---|---|
| Own-aircraft position | East-west coordinate/m | (0, 30 000) | 1 |
| | North-south coordinate/m | (0, 30 000) | 1 |
| | Flight altitude/m | (0, 12 000) | 1 |
| Flight state | True airspeed/(m·s-1) | (0, 500) | 2 |
| | Indicated airspeed/(m·s-1) | (0, 500) | 2 |
| | Mach number | (0, 1.6) | 2 |
| | Longitudinal load factor | (-1, 2) | 2 |
| | Normal load factor | (-4, 9) | 2 |
| | Lateral load factor | (-1, 1) | 2 |
| | Turn rate/((°)·s-1) | (0, 50)* | 2 |
| | Attitude quaternion | (-1, 1) | 8 |
| Geometric situation | Relative position vector/m | (-8 000, 8 000)* | 4 |
| | Relative velocity vector/(m·s-1) | (-500, 500)* | 6 |
| | Relative altitude difference/m | (-1 000, 1 000)* | 1 |
| | Gun aiming coefficient | (0, 1) | 2 |
| | Horizontal off-axis angle/(°) | (-180, 180) | 2 |
| | Off-axis angle/(°) | (-90, 90) | 2 |
| | Radar scan range/(°) | (-20, 20) | 8 |
| | Missile seeker scan range/(°) | (-20, 20) | 8 |
| | Aspect angle/(°) | (0, 180) | 2 |
| | Antenna train angle/(°) | (0, 180) | 2 |
| | Missile maximum range/m | (0, 8 000)* | 2 |
| | Missile minimum range/m | (0, 3 000)* | 2 |
| | Relative distance (scalar)/m | (0, 8 000)* | 1 |
| Total | | | 67 |
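The value ranges in Table 3 define the bounds used to scale each raw feature before it enters the policy network. A minimal sketch of such min-max scaling is shown below; the function name and the clipping behavior are illustrative assumptions, not the paper's implementation (bounds marked * in the table are soft limits, so out-of-range values are simply clipped here).

```python
# Hypothetical sketch: min-max normalization of raw observation features
# using the value bounds listed in Table 3. Values outside the bounds
# (possible for the soft limits marked *) are clipped for illustration.

def normalize(value: float, lo: float, hi: float) -> float:
    """Clip a raw feature to [lo, hi] and scale it to [0, 1]."""
    value = max(lo, min(hi, value))
    return (value - lo) / (hi - lo)

# Example: flight altitude of 6 000 m with bounds (0, 12 000)
alt_norm = normalize(6000.0, 0.0, 12000.0)   # 0.5
# Example: Mach 0.8 with bounds (0, 1.6)
mach_norm = normalize(0.8, 0.0, 1.6)         # 0.5
```

Scaling every feature to a common range keeps dimensions with large raw magnitudes (e.g. coordinates up to 30 000 m) from dominating dimensions such as load factors.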
Table 4  Summary of final reward

| Reward type | Game type | Reward name | Own weight | Opponent weight | Property |
|---|---|---|---|---|---|
| Outcome reward | Zero-sum | Missile kill | 1 | -1 | Sparse |
| | | Gun kill | 1 | -1 | |
| | | Enemy crash into ground | 1 | -1 | |
| | | Flying out of bounds | -1 | 1 | |
| | | Collision / mutual kill | 0 | 0 | |
| Event reward | Zero-sum | Radar illumination | 0.05 | -0.05 | Sparse |
| | | Radar lock | 0.2 | -0.2 | |
| | | Missile lock | 0.3 | -0.3 | |
| | | Gun aiming | 0.5 | -0.5 | |
| | | Launch achieved | 0.55 | -0.55 | |
| Process reward | Zero-sum | Angle advantage | 0.005 | -0.005 | Dense |
| | | Energy advantage | 0.008 | -0.008 | |
| | Shared interest | Distance reward | -0.000 1 | -0.000 1 | |
| | | Altitude-difference reward | -0.000 1 | -0.000 1 | |
| Boundary reward | Zero-sum | Control zone | 0.1 | -0.1 | Continuous |
| | Non-game | Airspace position | 0.8 | 0.8 | |
| | | Flight Mach number | 0.8 | 0.8 | |
| | | Indicated airspeed | 0.8 | 0.8 | |
| | | Two-aircraft distance | 0.8 | 0.8 | |