Acta Aeronautica et Astronautica Sinica, 2024, Vol. 45, Issue (4): 328723-328723. doi: 10.7527/S1000-6893.2023.28723
• Electronics and Electrical Engineering and Control •
Shengzhe SHAN, Weiwei ZHANG
Received: 2023-03-21
Revised: 2023-06-12
Accepted: 2023-08-29
Online: 2024-02-25
Published: 2023-09-01
Contact: Weiwei ZHANG, E-mail: aeroelastic@nwpu.edu.cn
Shengzhe SHAN, Weiwei ZHANG. Air combat intelligent decision-making method based on self-play and deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(4): 328723-328723.
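Table 3
Feature extraction of observation

| Category | Feature | Value bounds | Dimension |
|---|---|---|---|
| Own-aircraft position | East-west coordinate/m | (0, 30 000) | 1 |
| | North-south coordinate/m | (0, 30 000) | 1 |
| | Flight altitude/m | (0, 12 000) | 1 |
| Flight state | Airspeed/(m·s⁻¹) | (0, 500) | 2 |
| | Indicated airspeed/(m·s⁻¹) | (0, 500) | 2 |
| | Mach number | (0, 1.6) | 2 |
| | Longitudinal load factor | (-1, 2) | 2 |
| | Normal load factor | (-4, 9) | 2 |
| | Lateral load factor | (-1, 1) | 2 |
| | Turn rate/((°)·s⁻¹) | (0, 50)* | 2 |
| | Attitude quaternion | (-1, 1) | 8 |
| Geometric situation | Relative position vector/m | (-8 000, 8 000)* | 4 |
| | Relative velocity vector/(m·s⁻¹) | (-500, 500)* | 6 |
| | Relative altitude difference/m | (-1 000, 1 000)* | 1 |
| | Gun aiming coefficient | (0, 1) | 2 |
| | Horizontal off-boresight angle/(°) | (-180, 180) | 2 |
| | Off-boresight angle/(°) | (-90, 90) | 2 |
| | Radar scan range/(°) | (-20, 20) | 8 |
| | Missile scan range/(°) | (-20, 20) | 8 |
| | Aspect angle/(°) | (0, 180) | 2 |
| | Antenna train angle/(°) | (0, 180) | 2 |
| | Missile maximum range/m | (0, 8 000)* | 2 |
| | Missile minimum range/m | (0, 3 000)* | 2 |
| | Relative distance (scalar)/m | (0, 8 000)* | 1 |
| Total | | | 67 |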
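The value bounds in Table 3 suggest that each raw feature is scaled to a fixed range before it reaches the policy network. The sketch below shows one common way to do this, clipping to the listed bounds and mapping linearly to [-1, 1]. The feature names, the choice of [-1, 1] as the target range, and the clipping step are assumptions made for illustration; this excerpt does not state the scaling the authors actually use, nor what the asterisk on some bounds denotes.

```python
# Bounds copied from Table 3 for a few scalar features.
# The dictionary keys are hypothetical names, not identifiers from the paper.
BOUNDS = {
    "east_m": (0.0, 30000.0),
    "north_m": (0.0, 30000.0),
    "altitude_m": (0.0, 12000.0),
    "mach": (0.0, 1.6),
    "normal_load_factor": (-4.0, 9.0),
    "altitude_diff_m": (-1000.0, 1000.0),
}


def normalize(name: str, value: float) -> float:
    """Clip a raw feature to its Table 3 bounds and map it linearly to [-1, 1].

    Assumption: min-max scaling with clipping; the paper may use another scheme.
    """
    lo, hi = BOUNDS[name]
    clipped = min(max(value, lo), hi)
    return 2.0 * (clipped - lo) / (hi - lo) - 1.0


def normalize_all(raw: dict) -> list:
    """Normalize every feature in `raw`, in the fixed order given by BOUNDS."""
    return [normalize(name, raw[name]) for name in BOUNDS if name in raw]


if __name__ == "__main__":
    sample = {"east_m": 15000.0, "mach": 0.8, "altitude_diff_m": 5000.0}
    print(normalize_all(sample))
```

Values outside the bounds, such as the 5 000 m altitude difference in the sample, saturate at the edge of the range instead of producing inputs of arbitrary magnitude.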
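Table 4
Summary of final reward

| Reward type | Game class | Reward name | Weight (own side) | Weight (enemy side) | Reward property |
|---|---|---|---|---|---|
| Outcome reward | Zero-sum game | Missile kill | 1 | -1 | Sparse |
| | | Gun kill | 1 | -1 | |
| | | Enemy aircraft crashes into ground | 1 | -1 | |
| | | Flying out of bounds | -1 | 1 | |
| | | Collision / mutual kill | 0 | 0 | |
| Event reward | Zero-sum game | Radar illumination | 0.05 | -0.05 | Sparse |
| | | Radar lock | 0.2 | -0.2 | |
| | | Missile lock on enemy | 0.3 | -0.3 | |
| | | Gun aiming | 0.5 | -0.5 | |
| | | Launch condition achieved | 0.55 | -0.55 | |
| Process reward | Zero-sum game | Angle advantage | 0.005 | -0.005 | Dense |
| | | Energy advantage | 0.008 | -0.008 | |
| | Common interest | Distance reward | -0.000 1 | -0.000 1 | |
| | | Altitude difference reward | -0.000 1 | -0.000 1 | |
| Boundary reward | Zero-sum game | Control zone | 0.1 | -0.1 | Continuous |
| | Non-game | Airspace coordinates | 0.8 | 0.8 | |
| | | Flight Mach number | 0.8 | 0.8 | |
| | | Indicated airspeed | 0.8 | 0.8 | |
| | | Distance between the two aircraft | 0.8 | 0.8 | |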
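The weights in Table 4 can be read as a per-side lookup: each reward term contributes its own-side weight to one agent and its enemy-side weight to the other, so zero-sum terms cancel while common-interest terms push both agents the same way. The sketch below combines a few terms this way. The key names, the `weighted_reward` function, and the convention that every signal is measured from the own side's perspective are assumptions for illustration; the excerpt gives only the weights, not how the signals themselves are computed.

```python
# (own, enemy) weights copied from Table 4; key names are hypothetical.
WEIGHTS = {
    "missile_kill": (1.0, -1.0),
    "gun_kill": (1.0, -1.0),
    "out_of_bounds": (-1.0, 1.0),
    "radar_illumination": (0.05, -0.05),
    "radar_lock": (0.2, -0.2),
    "missile_lock": (0.3, -0.3),
    "gun_aim": (0.5, -0.5),
    "launch_achieved": (0.55, -0.55),
    "angle_advantage": (0.005, -0.005),
    "energy_advantage": (0.008, -0.008),
    "distance": (-0.0001, -0.0001),            # common interest: same sign for both
    "altitude_difference": (-0.0001, -0.0001),  # common interest: same sign for both
}


def weighted_reward(signals: dict) -> tuple:
    """Return (own_reward, enemy_reward) for one step.

    `signals` maps a reward name to its raw magnitude, e.g. 1.0 for a
    triggered sparse event or a scalar for a dense term (assumed convention).
    """
    own = 0.0
    enemy = 0.0
    for name, value in signals.items():
        w_own, w_enemy = WEIGHTS[name]
        own += w_own * value
        enemy += w_enemy * value
    return own, enemy


if __name__ == "__main__":
    step = {"radar_lock": 1.0, "angle_advantage": 0.5, "distance": 1.0}
    print(weighted_reward(step))
```

In this example the radar-lock and angle-advantage terms cancel between the two sides, while the distance term lowers both rewards by the same amount, which is what distinguishes the common-interest rows from the zero-sum rows.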