Hierarchical decision algorithm for air combat with hybrid action based on reinforcement learning

doi:10.7527/S1000-6893.2024.30053

Li Zuolong¹, Zhu Jihong², Kuang Minchi³, Zhang Jie⁴, Ren Jie⁵

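1. Department of Precision Instrument, Tsinghua University
2. Tsinghua University
3. Department of Precision Instrument, Tsinghua University
4. Chengdu Aircraft Design and Research Institute
5. Chengdu Aircraft Design and Research Institute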

Received: 2024-01-02 Revised: 2024-04-20 Online: 2024-04-25 Published: 2024-04-25
Corresponding author: Zhu Jihong


Abstract: Intelligent air combat is a research hotspot for the world's major military powers. To solve the maneuver decision problem in Beyond-Visual-Range (BVR) air combat, we propose a hierarchical decision algorithm based on Deep Reinforcement Learning (DRL). The decision algorithm uses a maneuver set suited to BVR air combat and controls the trajectory and attitude of the aircraft through flight control laws. To expand the model's action space and improve its decision-making ability, we divide the air combat action space into hierarchical levels and model it as a multi-discrete action space. To address the sparse-reward problem in air combat, we design a reward function that accounts for positional advantage, weapon launch, weapon threat, and other factors, guiding the agent to converge toward the optimal policy. We also build a complete digital-twin air combat simulation environment and an air combat expert system. The decision algorithm is trained in this environment and evaluated by engaging the expert system. The experimental results indicate that the proposed algorithm can make autonomous decisions in BVR air combat, exhibits periodic attack behavior, maneuvers flexibly according to the battlefield situation, and holds a certain advantage against the expert system.

Keywords: air combat beyond visual range, intelligent decision, deep reinforcement learning, PPO, maneuver, hierarchical decision
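The abstract says the action space is divided into hierarchical levels and modeled as multi-discrete, and the keywords name PPO as the training algorithm. It gives no sub-action dimensions, network details, or hyperparameters. The sketch below shows one common way such a space is handled under PPO, assuming a factorized policy. Each sub-action dimension gets its own categorical distribution. The joint log-probability is the sum of the per-dimension log-probabilities, and that joint value feeds the clipped surrogate. The dimension names and sizes in `ACTION_DIMS` and the clip value `eps = 0.2` are hypothetical. This is an illustration, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence

# Hypothetical sub-action dimensions: the abstract does not list the real ones.
ACTION_DIMS = {
    "maneuver": 5,      # index into a BVR maneuver set (size assumed)
    "altitude_cmd": 3,  # e.g. climb / hold / descend (assumed)
    "fire": 2,          # 0 = hold fire, 1 = launch missile (assumed)
}


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax for one sub-action's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sample_multi_discrete(all_logits: Sequence[Sequence[float]],
                          rng: random.Random) -> List[int]:
    """Draw one index per sub-action dimension, each from its own categorical."""
    action = []
    for logits in all_logits:
        probs = [math.exp(lp) for lp in log_softmax(logits)]
        action.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return action


def joint_log_prob(all_logits: Sequence[Sequence[float]],
                   action: Sequence[int]) -> float:
    """Factorized policy: log pi(a|s) is the sum of per-dimension log-probs."""
    return sum(log_softmax(logits)[a] for logits, a in zip(all_logits, action))


def joint_entropy(all_logits: Sequence[Sequence[float]]) -> float:
    """Entropy of independent categoricals is the sum of their entropies."""
    total = 0.0
    for logits in all_logits:
        total -= sum(math.exp(lp) * lp for lp in log_softmax(logits))
    return total


def ppo_clip_objective(new_logp: float, old_logp: float,
                       advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate for one sample (to be maximized)."""
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    rng = random.Random(0)
    # Uniform logits stand in for the policy network's per-dimension output heads.
    logits = [[0.0] * n for n in ACTION_DIMS.values()]
    action = sample_multi_discrete(logits, rng)
    logp = joint_log_prob(logits, action)
    print(dict(zip(ACTION_DIMS, action)), logp, joint_entropy(logits))
    # A new policy that doubles the chosen action's probability gets clipped at 1 + eps.
    print(ppo_clip_objective(logp + math.log(2.0), logp, advantage=1.0))
```

Treating the dimensions as conditionally independent given the state keeps the output layer small. Here that is 5 + 3 + 2 = 10 logits instead of 5 × 3 × 2 = 30 joint actions. This is one reason a multi-discrete formulation can enlarge the action space without a matching growth in network output size.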

CLC number: V249.4

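Li Zuolong, Zhu Jihong, Kuang Minchi, Zhang Jie, Ren Jie. Hierarchical decision algorithm for air combat with hybrid action based on reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, doi: 10.7527/S1000-6893.2024.30053.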

E-mail: hkxb@buaa.edu.cn

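Supervised by: China Association for Science and Technology; Sponsored by: Chinese Society of Aeronautics and Astronautics and Beihang University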
