Acta Aeronautica et Astronautica Sinica, 2024, Vol. 45, Issue (10): 329166   doi: 10.7527/S1000-6893.2023.29166

切换拓扑下异构集群的强化学习时变编队控制

杨加秀, 李新凯, 张宏立, 王昊

  1. 新疆大学 电气工程学院,乌鲁木齐 830017
  • 收稿日期:2023-06-13 修回日期:2023-06-26 接受日期:2023-08-23 出版日期:2024-05-25 发布日期:2023-09-01
  • 通讯作者: 李新凯 E-mail:lxk@xju.edu.cn
  • 基金资助:
    国家自然科学基金(62263030);新疆维吾尔自治区自然科学基金青年科学基金(2022D01C86)

Time-varying formation control for heterogeneous clusters with switching topologies via reinforcement learning

Jiaxiu YANG, Xinkai LI, Hongli ZHANG, Hao WANG

  1. School of Electrical Engineering, Xinjiang University, Urumqi 830017, China
  • Received: 2023-06-13 Revised: 2023-06-26 Accepted: 2023-08-23 Published: 2024-05-25 Online: 2023-09-01
  • Contact: Xinkai LI, E-mail: lxk@xju.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62263030); Youth Project of Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C86)

摘要:

针对系统模型动态不确定的高阶异构无人集群系统在切换通信拓扑下的时变编队控制问题,提出一种基于积分强化学习的最优分布式分层编队控制方法。结合时变编队切换向量构建多四旋翼无人机系统与多无人车系统的增广系统,将异构集群系统的时变编队控制问题转化为镇定问题。引入带折扣因子的价值函数,将异构集群系统的镇定问题转化为最优控制问题。在不破坏一致性分布式编队控制协议的基础上,仅替换反馈增益参数并对其进行取平均操作,以得到整个异构集群的最优时变编队切换控制协议。利用单网络“动作网络-评价网络”结构,结合积分强化学习算法和分布式控制方法,在线实时更新分布式时变编队切换控制器的反馈增益。通过理论证明和仿真实验验证了所设计控制方案的有效性和优越性。

关键词: 积分强化学习, 异构集群, 时变编队控制, 分布式控制, 切换拓扑, 最优控制

Abstract:

This paper addresses time-varying formation control of high-order heterogeneous unmanned cluster systems with uncertain model dynamics under switching communication topologies, and proposes an optimal distributed hierarchical formation control method based on integral reinforcement learning. An augmented system of the multi-quadrotor unmanned aircraft system (UAS) and the multi-unmanned vehicle system is constructed using the time-varying formation switching vectors, which transforms the time-varying formation control problem of the heterogeneous cluster system into a stabilization problem. A value function with a discount factor is then introduced to recast this stabilization problem as an optimal control problem. Without altering the structure of the consensus-based distributed formation control protocol, only the feedback gain parameters are replaced and averaged, which yields the optimal time-varying formation switching control protocol for the whole heterogeneous cluster. The feedback gains of the distributed time-varying formation switching controller are updated online in real time by combining a single-network “actor network-critic network” structure with the integral reinforcement learning algorithm and the distributed control method. Theoretical proofs and simulation experiments verify the effectiveness and superiority of the proposed control scheme.
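
The abstract names a discounted value function and an integral reinforcement learning update without giving their form. As a rough sketch only, the standard linear-quadratic versions of these ideas are written out below. All symbols here are assumptions for illustration and are not taken from the paper: the augmented state $X$, the input $u$, the weights $Q \succeq 0$ and $R \succ 0$, the discount factor $\gamma > 0$, the reinforcement interval $T$, the kernel matrix $P$, the gain $K$, and the linear augmented dynamics $\dot{X} = AX + Bu$.

```latex
% Discounted value function for the augmented system (assumed quadratic cost)
V\bigl(X(t)\bigr) = \int_{t}^{\infty} e^{-\gamma(\tau - t)}
    \bigl( X^{\top} Q X + u^{\top} R u \bigr)\, \mathrm{d}\tau

% Integral reinforcement learning (IRL) Bellman equation over an interval T
V\bigl(X(t)\bigr) = \int_{t}^{t+T} e^{-\gamma(\tau - t)}
    \bigl( X^{\top} Q X + u^{\top} R u \bigr)\, \mathrm{d}\tau
    + e^{-\gamma T}\, V\bigl(X(t+T)\bigr)

% With V(X) = X^T P X and u = -K X, the improved feedback gain is
K = R^{-1} B^{\top} P

% At convergence, P satisfies the discounted algebraic Riccati equation
A^{\top} P + P A - \gamma P + Q - P B R^{-1} B^{\top} P = 0
```

In this generic form the policy evaluation step uses only measured trajectories of $X$ and $u$ over each interval, so the drift matrix $A$ does not appear in the Bellman equation. This is the usual reason integral reinforcement learning is paired with uncertain model dynamics.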

Key words: integral reinforcement learning, heterogeneous cluster, time-varying formation control, distributed control, switching topology, optimal control

CLC number: