1 |
WANG Z A, LI H, WU H L, et al. Improving maneuver strategy in air combat by alternate freeze games with a deep reinforcement learning algorithm[J]. Mathematical Problems in Engineering, 2020, 2020: 1-17.
|
2 |
马文, 李辉, 王壮, 等. 基于深度随机博弈的近距空战机动决策[J]. 系统工程与电子技术, 2021, 43(2): 443-451.
|
|
MA W, LI H, WANG Z, et al. Close air combat maneuver decision based on deep stochastic game[J]. Systems Engineering and Electronics, 2021, 43(2): 443-451 (in Chinese).
|
3 |
李宪港, 李强. 典型智能博弈系统技术分析及指控系统智能化发展展望[J]. 智能科学与技术学报, 2020, 2(1): 36-42.
|
|
LI X G, LI Q. Technical analysis of typical intelligent game system and development prospect of intelligent command and control system[J]. Chinese Journal of Intelligent Science and Technology, 2020, 2(1): 36-42 (in Chinese).
|
4 |
POPE A P, IDE J S, MIĆOVIĆ D, et al. Hierarchical reinforcement learning for air-to-air combat[C]∥2021 International Conference on Unmanned Aircraft Systems (ICUAS). Piscataway: IEEE Press, 2021: 275-284.
|
5 |
SUFIYAN D, WIN L T S, WIN S K H, et al. A reinforcement learning approach for control of a nature-inspired aerial vehicle[C]∥2019 International Conference on Robotics and Automation (ICRA). Piscataway: IEEE Press, 2019: 6030-6036.
|
6 |
ZHEN Y, HAO M R, SUN W D. Deep reinforcement learning attitude control of fixed-wing UAVs[C]∥2020 3rd International Conference on Unmanned Systems (ICUS). Piscataway: IEEE Press, 2020: 239-244.
|
7 |
WANG C, YAN C, XIANG X, et al. A continuous actor-critic reinforcement learning approach to flocking with fixed-wing UAVs[C]∥Asian Conference on Machine Learning. Berlin: Springer, 2020: 239-244.
|
8 |
周攀, 黄江涛, 章胜, 等. 基于深度强化学习的智能空战决策与仿真[J]. 航空学报, 2023, 44(4): 126731.
|
|
ZHOU P, HUANG J T, ZHANG S, et al. Intelligent air combat decision making and simulation based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(4): 126731 (in Chinese).
|
9 |
吴宜珈, 赖俊, 陈希亮, 等. 强化学习算法在超视距空战辅助决策上的应用研究[J]. 航空兵器, 2021, 28(2): 55-61.
|
|
WU Y J, LAI J, CHEN X L, et al. Research on the application of reinforcement learning algorithm in decision support of beyond-visual-range air combat[J]. Aero Weaponry, 2021, 28(2): 55-61 (in Chinese).
|
10 |
王欢, 周旭, 邓亦敏, 等. 分层决策多机空战对抗方法[J]. 中国科学: 信息科学, 2022, 52(12): 2225-2238.
|
|
WANG H, ZHOU X, DENG Y M, et al. A hierarchical decision-making method for multi-aircraft air combat confrontation[J]. Scientia Sinica (Informationis), 2022, 52(12): 2225-2238 (in Chinese).
|
11 |
POMERLEAU D A. ALVINN: An autonomous land vehicle in a neural network[C]∥Conference and Workshop on Neural Information Processing Systems. New York: ACM, 1989: 305-313.
|
12 |
BOJARSKI M, DEL TESTA D, DWORAKOWSKI D, et al. End to end learning for self-driving cars[DB/OL]. arXiv preprint: 1604.07316, 2016.
|
13 |
GIUSTI A, GUZZI J, CIREŞAN D C, et al. A machine learning approach to visual perception of forest trails for mobile robots[J]. IEEE Robotics and Automation Letters, 2016, 1(2): 661-667.
|
14 |
NAKANISHI J, MORIMOTO J, ENDO G, et al. Learning from demonstration and adaptation of biped locomotion[J]. Robotics and Autonomous Systems, 2004, 47(2-3): 79-91.
|
15 |
ROSS S, GORDON G J, BAGNELL J A. A reduction of imitation learning and structured prediction to no-regret online learning[C]∥Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. New York: PMLR, 2011: 627-635.
|
16 |
NG A Y, RUSSELL S J. Algorithms for inverse reinforcement learning[C]∥Proceedings of the Seventeenth International Conference on Machine Learning. New York: ACM, 2000: 663-670.
|
17 |
ZIEBART B D, MAAS A, BAGNELL J A, et al. Maximum entropy inverse reinforcement learning[C]∥Proceedings of the 23rd National Conference on Artificial Intelligence. New York: ACM, 2008: 1433-1438.
|
18 |
FINN C, LEVINE S, ABBEEL P. Guided cost learning: Deep inverse optimal control via policy optimization[C]∥Proceedings of the 33rd International Conference on Machine Learning. New York: ACM, 2016: 49-58.
|
19 |
NAIR A, MCGREW B, ANDRYCHOWICZ M, et al. Overcoming exploration in reinforcement learning with demonstrations[C]∥2018 IEEE International Conference on Robotics and Automation (ICRA). Piscataway: IEEE Press, 2018: 6292-6299.
|
20 |
XU H R, ZHAN X Y, YIN H L, et al. Discriminator-weighted offline imitation learning from suboptimal demonstrations[C]∥Proceedings of the 39th International Conference on Machine Learning. New York: ACM, 2022: 24725-24742.
|
21 |
VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354.
|
22 |
WANG P, LIU D P, CHEN J Y, et al. Decision making for autonomous driving via augmented adversarial inverse reinforcement learning[C]∥2021 IEEE International Conference on Robotics and Automation (ICRA). Piscataway: IEEE Press, 2021: 1036-1042.
|
23 |
俞扬, 詹德川, 周志华, 等. 基于模仿学习和强化学习算法的无人机飞行控制方法: CN112162564B[P]. 2021-09-28.
|
|
YU Y, ZHAN D C, ZHOU Z H, et al. Unmanned aerial vehicle flight control method based on imitation learning and reinforcement learning algorithms: CN112162564B[P]. 2021-09-28 (in Chinese).
|
24 |
ZHU Z D, LIN K X, DAI B, et al. Self-adaptive imitation learning: Learning tasks with delayed rewards from sub-optimal demonstrations[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(8): 9269-9277.
|
25 |
SCHULMAN J, LEVINE S, MORITZ P, et al. Trust region policy optimization[C]∥International Conference on Machine Learning. New York: ACM, 2015: 1889-1897.
|
26 |
SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[DB/OL]. arXiv preprint: 1707.06347, 2017.
|
27 |
李茹杨, 彭慧民, 李仁刚, 等. 强化学习算法与应用综述[J]. 计算机系统应用, 2020, 29(12): 13-25.
|
|
LI R Y, PENG H M, LI R G, et al. Overview on algorithms and applications for reinforcement learning[J]. Computer Systems and Applications, 2020, 29(12): 13-25 (in Chinese).
|
28 |
OH J, GUO Y, SINGH S, et al. Self-imitation learning[C]∥Proceedings of the 35th International Conference on Machine Learning. New York: ACM, 2018: 3878-3887.
|
29 |
HAARNOJA T, TANG H R, ABBEEL P, et al. Reinforcement learning with deep energy-based policies[C]∥Proceedings of the 34th International Conference on Machine Learning-Volume 70. New York: ACM, 2017: 1352-1361.
|
30 |
LI C, WU F G, ZHAO J S. Accelerating self-imitation learning from demonstrations via policy constraints and Q-ensemble[C]∥2023 International Joint Conference on Neural Networks (IJCNN). Piscataway: IEEE Press, 2023: 1-8.
|
31 |
SCHULMAN J, MORITZ P, LEVINE S, et al. High-dimensional continuous control using generalized advantage estimation[DB/OL]. arXiv preprint: 1506.02438, 2015.
|
32 |
KINGMA D P, BA J. Adam: A method for stochastic optimization[C]∥International Conference on Learning Representations (ICLR). San Diego, 2015.
|
33 |
MCGREW J S, HOW J P, WILLIAMS B, et al. Air-combat strategy using approximate dynamic programming[J]. Journal of Guidance, Control, and Dynamics, 2010, 33(5): 1641-1654.
|
34 |
FUJIMOTO S, GU S S. A minimalist approach to offline reinforcement learning[C]∥Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS). New York: ACM, 2021: 20132-20145.
|