Acta Aeronautica et Astronautica Sinica, 2020, Vol. 41, Issue 10: 324000. doi: 10.7527/S1000-6893.2020.24000

基于DDPG算法的无人机集群追击任务

张耀中1, 许佳林1, 姚康佳1, 刘洁凌2   

  1. 西北工业大学 电子信息学院, 西安 710072;
    2. 西安北方光电科技防务有限公司, 西安 710043
  • 收稿日期:2020-03-21 修回日期:2020-06-15 发布日期:2020-06-12
  • 通讯作者: 张耀中 E-mail:zhang_y_z@nwpu.edu.cn
  • 基金资助:
    航空科学基金(2017ZC53033)

Pursuit missions for UAV swarms based on DDPG algorithm

ZHANG Yaozhong1, XU Jialin1, YAO Kangjia1, LIU Jieling2   

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, China;
    2. Xi'an North Electro-optic Science & Technology Co. Ltd, Xi'an 710043, China
  • Received: 2020-03-21 Revised: 2020-06-15 Published: 2020-06-12
  • Corresponding author: ZHANG Yaozhong E-mail: zhang_y_z@nwpu.edu.cn
  • Supported by:
    Aeronautical Science Foundation of China (2017ZC53033)

摘要: 无人机的集群化应用技术是近年来的研究热点,随着无人机自主智能的不断提高,无人机集群技术必将成为未来无人机发展的主要趋势之一。针对无人机集群协同执行对敌方来袭目标的追击任务,构建了典型的任务场景,基于深度确定性策略梯度网络(DDPG)算法,设计了一种引导型回报函数有效解决了深度强化学习在长周期任务下的稀疏回报问题,通过引入基于滑动平均值的软更新策略减少了DDPG算法中Eval网络和Target网络在训练过程中的参数震荡,提高了算法的训练效率。仿真结果表明,训练完成后的无人机集群能够较好地执行对敌方来袭目标的追击任务,任务成功率达到95%。可以说无人机集群技术作为一种全新概念的作战模式在军事领域具有潜在的应用价值,人工智能算法在无人机集群的自主决策智能化发展方向上具有一定的应用前景。

关键词: DDPG算法, 无人机集群, 任务决策, 深度强化学习, 稀疏回报

Abstract: Unmanned Aerial Vehicle (UAV) swarm technology has been a research hotspot in recent years. As the autonomous intelligence of UAVs continues to improve, swarm technology is bound to become one of the main trends in future UAV development. For the task in which a UAV swarm cooperatively pursues incoming enemy targets, we construct a typical mission scenario and, based on the Deep Deterministic Policy Gradient (DDPG) algorithm, design a guided reward function that effectively solves the sparse reward problem of deep reinforcement learning in long-horizon missions. We also introduce a moving-average-based soft update strategy that reduces parameter oscillation in the Eval and Target networks of DDPG during training, thereby improving training efficiency. Simulation results show that, after training, the UAV swarm performs the pursuit mission against incoming enemy targets well, with a mission success rate of 95%. As a new concept of combat mode, UAV swarm technology has potential application value in the military field, and artificial intelligence algorithms show promise for making the autonomous decision-making of UAV swarms more intelligent.

Key words: DDPG algorithm, UAV swarms, task decision, deep reinforcement learning, sparse rewards
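
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the form of the guided reward or the update coefficient, so the distance-based shaping term and the names `tau`, `capture_radius`, `capture_bonus`, and `scale` are assumptions chosen for illustration only. The soft update shown is the standard moving-average (Polyak) form commonly used with DDPG.

```python
def soft_update(target_params, eval_params, tau=0.01):
    """Moving-average soft update: target <- (1 - tau) * target + tau * eval.

    Parameters are plain lists of floats here; tau is an assumed value.
    """
    return [(1.0 - tau) * t + tau * e for t, e in zip(target_params, eval_params)]


def guided_reward(prev_dist, curr_dist, capture_radius=1.0,
                  capture_bonus=10.0, scale=1.0):
    """Hypothetical guided (shaped) reward for a pursuit task.

    The dense term rewards every step that reduces the distance to the
    target, so the agent gets feedback long before the sparse terminal
    event (capture) occurs. All constants are illustrative assumptions.
    """
    reward = scale * (prev_dist - curr_dist)  # dense guidance term
    if curr_dist <= capture_radius:           # sparse terminal term
        reward += capture_bonus
    return reward


if __name__ == "__main__":
    target = [0.0, 0.0]
    evaluation = [1.0, -1.0]
    for _ in range(3):
        target = soft_update(target, evaluation, tau=0.1)
    print(target)                    # target drifts slowly toward evaluation
    print(guided_reward(5.0, 4.0))   # closing in, no capture yet
    print(guided_reward(1.5, 0.5))   # closing in and inside capture radius
```

A small `tau` makes the Target network trail the Eval network slowly, which is what damps parameter oscillation; the dense distance term is one common way to give feedback before the sparse terminal event.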

CLC number: