Acta Aeronautica et Astronautica Sinica, 2023, Vol. 44, Issue S1: 727487. doi: 10.7527/S1000-6893.2022.27487

Target round-up control for multi-agent systems based on reinforcement learning

Zhilin FAN, Hongyong YANG, Yilin HAN

  1. School of Information and Electrical Engineering, Ludong University, Yantai 264025, China
  • Received: 2022-05-21 Revised: 2022-06-23 Accepted: 2022-07-01 Online: 2022-07-08 Published: 2023-06-25
  • Contact: Hongyong YANG E-mail: hyyang@yeah.net
  • Supported by:
    National Natural Science Foundation of China (61673200); Shandong Province Major Basic Research Project (ZR2018ZC0438)

Abstract:

This paper proposes a reinforcement-learning-based control method for target round-up by multi-agent systems. First, the multi-agent system is modeled as a Markov game, and a potential energy function is designed that drives the system to the desired round-up state while satisfying obstacle-avoidance requirements. Model-based control is combined with reinforcement learning, and the round-up is carried out by an improved multi-agent reinforcement learning algorithm guided by the potential energy model. Second, building on this potential energy model, two round-up strategies are established: tracking round-up and circumnavigation round-up. The former achieves consensus tracking among the agents by designing a velocity potential energy function. The latter adds virtual circumnavigation points and designs a potential energy function for these points to achieve the desired circumnavigation. Finally, simulations verify the effectiveness of the round-up control strategy based on multi-agent reinforcement learning.

Key words: target round-up, reinforcement learning, potential energy function, multi-agent systems, obstacle avoidance
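As a rough illustration of how a potential energy model might guide a reinforcement learning agent, the sketch below combines a hand-written potential with standard potential-based reward shaping. The abstract does not give the paper's actual potential functions or learning algorithm, so everything here is an assumption made for illustration: the function names `potential` and `shaped_reward`, the quadratic attraction to a ring of radius `ring_radius` around the target, the repulsion inside `safe_dist` of an obstacle, and all gain values.

```python
import math


def potential(agent, target, obstacles, ring_radius=2.0, safe_dist=1.0,
              k_att=1.0, k_rep=1.0):
    """Hypothetical potential energy of one agent (not the paper's function).

    It is zero when the agent sits on a ring of radius `ring_radius` around
    the target and is farther than `safe_dist` from every obstacle.
    """
    d_target = math.dist(agent, target)
    # Attractive term: quadratic penalty on deviation from the desired radius.
    u = 0.5 * k_att * (d_target - ring_radius) ** 2
    # Repulsive term: active only inside the safety distance of an obstacle.
    for obs in obstacles:
        d_obs = max(math.dist(agent, obs), 1e-6)  # guard against division by zero
        if d_obs < safe_dist:
            u += 0.5 * k_rep * (1.0 / d_obs - 1.0 / safe_dist) ** 2
    return u


def shaped_reward(env_reward, agent, next_agent, target, obstacles, gamma=0.95):
    """Potential-based reward shaping with Phi = -U (an assumed choice).

    Moves that lower the potential energy receive a positive bonus.
    """
    phi_now = -potential(agent, target, obstacles)
    phi_next = -potential(next_agent, target, obstacles)
    return env_reward + gamma * phi_next - phi_now


if __name__ == "__main__":
    target = (0.0, 0.0)
    obstacles = [(0.0, 5.0)]
    # Stepping from (4, 0) toward the ring is rewarded; stepping away is penalized.
    print(shaped_reward(0.0, (4.0, 0.0), (3.0, 0.0), target, obstacles))
    print(shaped_reward(0.0, (4.0, 0.0), (5.0, 0.0), target, obstacles))
```

In a multi-agent setting, each agent would presumably evaluate such a term on its own state; the velocity and virtual circumnavigation point potentials mentioned in the abstract would be additional terms that this sketch does not attempt to model.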
