Special Column

Value-filter based air-combat maneuvering optimization

  • Yupeng FU,
  • Xiangyang DENG,
  • Ziqiang ZHU,
  • Limin ZHANG
  • 1. School of Aviation Support, Naval Aeronautical University, Yantai 264001, China
    2. Department of Automation, Tsinghua University, Beijing 100084, China

Received date: 2023-04-14

  Revised date: 2023-05-30

  Accepted date: 2023-06-14

  Online published: 2023-06-27

Supported by

National Natural Science Foundation of China (91538201); Foundation Program for National Defense High Level Talent (202220539)

Abstract

To address the low data-utilization efficiency and convergence difficulty of traditional reinforcement learning algorithms in air-combat maneuvering decision optimization with a large state space, the concept and principle of the value filter are proposed and analyzed. A reinforcement learning algorithm named Demonstration Policy Constrain (DPC) is presented based on the value filter, and a maneuvering decision optimization method based on the DPC algorithm is designed. With the value filter, state-value based advantage data are extracted from both the replay buffer and the demonstration buffer to constrain the optimization direction of the policy. Simulation results based on JSBSim's aerodynamic model of the F-16 aircraft show that the convergence efficiency of the algorithm is significantly improved, the sub-optimality of the demonstration policy is mitigated, and the proposed maneuvering decision method exhibits intelligent maneuvering behavior.
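
The following Python sketch illustrates one plausible reading of the value-filter mechanism described in the abstract; the names value_filter, dpc_policy_loss, and lam are hypothetical, and the paper's exact formulation may differ. The filter keeps only transitions whose observed return exceeds the critic's state-value estimate (positive-advantage data) from either buffer, and the retained demonstration data then constrain the policy update through an auxiliary imitation term.

    import numpy as np

    def value_filter(buffer, value_fn):
        # Keep only (state, action, return) tuples with positive advantage,
        # i.e. G - V(s) > 0: the "state-value based advantage data" the
        # abstract describes extracting from the replay and demo buffers.
        return [(s, a, g) for (s, a, g) in buffer if g > value_fn(s)]

    def dpc_policy_loss(rl_loss, log_prob_fn, filtered_demos, lam=0.1):
        # Combine the base RL objective with a behavior-cloning-style term
        # that pulls the policy toward actions in the value-filtered
        # demonstration data (lam is a hypothetical trade-off weight).
        if not filtered_demos:
            return rl_loss
        bc_term = -np.mean([log_prob_fn(s, a) for (s, a, _) in filtered_demos])
        return rl_loss + lam * bc_term

    # Toy usage: a constant critic and a uniform log-probability stand in
    # for learned networks.
    demos = [(np.zeros(3), 0, 1.5), (np.ones(3), 1, -0.2)]
    kept = value_filter(demos, value_fn=lambda s: 0.0)  # keeps the first tuple
    loss = dpc_policy_loss(0.8, lambda s, a: np.log(0.5), kept)

Gating both buffers through the same critic is one natural way to keep the constraint consistent with the policy's current value estimates, which would let sub-optimal demonstrations be progressively discarded as the critic improves.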

Cite this article

Yupeng FU, Xiangyang DENG, Ziqiang ZHU, Limin ZHANG. Value-filter based air-combat maneuvering optimization[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(22): 628871-628871. DOI: 10.7527/S1000-6893.2023.28871
