Learning method for autonomous air combat based on experience transfer

doi:10.7527/S1000-6893.2020.24285

ZHOU Kai^1,2, WEI Ruixuan^2, ZHANG Qirui^3, DING Chao^1,2

1. Graduate College, Air Force Engineering University, Xi'an 710051, China;
2. Aeronautics Engineering College, Air Force Engineering University, Xi'an 710038, China;
3. Unit 95561 of PLA, Rikaze City 857000, China

Received:2020-05-15 Revised:2020-05-30 Published:2020-06-18
Corresponding author: ZHOU Kai E-mail:kzhouu@163.com
Supported by:
Science and Technology Innovation 2030-Key Project of "New Generation Artificial Intelligence" (2018AAA0102403); National Natural Science Foundation of China (61573373)

Abstract

Abstract: Most existing machine learning methods learn interactively, so their training relies heavily on data gathered through interaction with the environment. Air combat is a training task with very sparse rewards: for a long period at the start of learning, the agent is still exploring for actions that can obtain rewards. Retraining from scratch for every new task therefore wastes computing resources. To address this, a learning method based on experience transfer is designed that enables a trained agent to share its knowledge with a new agent, improving the new agent's learning efficiency on a new task. First, a learning model based on experience transfer is constructed, drawing on the way humans learn rapidly from experience. Second, taking into account both knowledge sharing and the characteristics of the new task, the meaning of experience is made explicit, and a fusion cognition mode of "knowledge + task → experience" is established. Third, a reference learning method is designed that combines external experience with the task and converts it into the new agent's own knowledge. Finally, with experience applicability as the screening index, the influence of experience applicability on the efficiency of reference learning is analyzed, and the screening boundary for performing reference learning is determined. Through reference learning, the new agent acquires preliminary knowledge of the new task and finds reward-yielding action policies more quickly, which increases its learning speed on the new task.

Key words: air combat, experience transfer, reference learning, knowledge sharing, fusion cognition
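The abstract does not give algorithmic details, so the following is only a toy sketch of the general idea it describes: screen a trained agent's knowledge by an applicability score, then use it to warm-start a new agent on a sparse-reward task. It assumes tabular Q-learning on a 1-D chain. The functions `applicability`, `borrow`, and `q_learning`, the distance-based score, and the threshold of 0.5 are all hypothetical illustrations and are not taken from the paper.

```python
import random


def applicability(source_goal, new_goal, n_states):
    """Hypothetical applicability score in [0, 1]; 1.0 when the goals coincide."""
    return 1.0 - abs(source_goal - new_goal) / (n_states - 1)


def borrow(q_source, score, threshold=0.5):
    """Build the new agent's initial Q-table from a source agent's Q-table.

    At or above the (assumed) screening boundary the source values are
    copied and scaled by the score; below it the new agent starts from zeros.
    """
    if score < threshold:
        return [[0.0, 0.0] for _ in q_source]
    return [[score * q for q in row] for row in q_source]


def q_learning(n_states, goal, q, episodes, rng,
               alpha=0.5, gamma=0.9, eps=0.1, max_steps=200):
    """Tabular Q-learning on a chain; actions: 0 = left, 1 = right.

    The only reward is +1 on reaching `goal` (sparse reward). Every episode
    starts in state 0. Updates `q` in place and returns steps per episode.
    """
    steps_per_episode = []
    for _ in range(episodes):
        s = 0
        for t in range(1, max_steps + 1):
            # Explore with probability eps, or when both actions look equal.
            if rng.random() < eps or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2 = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
            done = s2 == goal
            target = 1.0 if done else gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
        steps_per_episode.append(t)
    return steps_per_episode


if __name__ == "__main__":
    n = 10
    rng = random.Random(0)

    # Train a source agent on the old task (goal at the right end).
    q_src = [[0.0, 0.0] for _ in range(n)]
    q_learning(n, n - 1, q_src, 200, rng)

    # New task: the goal moves one state to the left.
    score = applicability(n - 1, n - 2, n)
    cold = q_learning(n, n - 2, [[0.0, 0.0] for _ in range(n)], 5, rng)
    warm = q_learning(n, n - 2, borrow(q_src, score), 5, rng)
    print("applicability:", round(score, 3))
    print("steps per episode, cold start:", cold)
    print("steps per episode, warm start:", warm)
```

In this sketch the warm-started agent begins with action values that already point toward the rewarding region, so it tends to need fewer exploratory steps in early episodes, while a low applicability score disables the transfer entirely.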

ZHOU Kai, WEI Ruixuan, ZHANG Qirui, DING Chao. Learning method for autonomous air combat based on experience transfer[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(S2): 724285-724285.


E-mail：hkxb@buaa.edu.cn


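Supervised by: China Association for Science and Technology; Sponsored by: Chinese Society of Aeronautics and Astronautics, Beihang University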
