Electronics, Electrical Engineering and Control

Time-varying formation control for heterogeneous clusters with switching topologies via reinforcement learning

  • Jiaxiu YANG,
  • Xinkai LI,
  • Hongli ZHANG,
  • Hao WANG
  • School of Electrical Engineering, Xinjiang University, Urumqi 830017, China
E-mail: lxk@xju.edu.cn

Received date: 2023-06-13

  Revised date: 2023-06-26

  Accepted date: 2023-08-23

  Online published: 2023-09-01

Supported by

National Natural Science Foundation of China (62263030); Youth Project of Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C86)

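Cite this article

Jiaxiu YANG, Xinkai LI, Hongli ZHANG, Hao WANG. Time-varying formation control for heterogeneous clusters with switching topologies via reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(10): 329166. DOI: 10.7527/S1000-6893.2023.29166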

Abstract

This paper addresses time-varying formation control of high-order heterogeneous unmanned cluster systems with uncertain model dynamics under switching communication topologies, and proposes an optimal distributed hierarchical formation control method based on integral reinforcement learning. First, time-varying formation switching vectors are used to construct an augmented system from the multi-quadrotor unmanned aerial vehicle (UAV) system and the multi-unmanned ground vehicle system, which transforms the time-varying formation control problem of the heterogeneous cluster into a stabilization problem. Second, a value function with a discount factor is introduced, which transforms the stabilization problem into an optimal control problem. Third, without altering the structure of the consensus-based distributed formation control protocol, only its feedback gain parameters are replaced and averaged to obtain the optimal time-varying formation switching control protocol for the whole heterogeneous cluster. Finally, the feedback gains of the distributed time-varying formation switching controller are updated online in real time using a single-network actor-critic structure combined with the integral reinforcement learning algorithm and the distributed control method. Theoretical proof and simulation experiments verify the effectiveness and superiority of the proposed control scheme.
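The abstract does not give the update equations, so the following is only a minimal sketch of the generic integral reinforcement learning idea it refers to: evaluating a feedback gain from measured trajectory data under a discounted quadratic cost, then improving the gain, without using the drift dynamics. The scalar plant, all numerical values, and the names `rollout`, `integral_rl`, and `discounted_riccati_gain` are assumptions made for illustration; they are not the authors' multi-agent formation controller, and the gain-averaging step across agents is not modelled.

```python
import math

# Hypothetical scalar plant dx/dt = A_TRUE*x + B*u (all numbers are illustrative).
A_TRUE = 1.0     # drift term: used only by the simulator, never by the learner
B = 1.0          # input gain: assumed known to the learner
Q, R = 1.0, 1.0  # state and control weights of the quadratic cost
GAMMA = 0.1      # discount rate of the value function


def rollout(x0, k, horizon, steps=1000):
    """Simulate u = -k*x for `horizon` seconds with fixed-step RK4.

    Returns the final state and the discounted cost accumulated over the
    interval; these two measurements are all the learner uses.
    """
    def deriv(t, x):
        u = -k * x
        return A_TRUE * x + B * u, math.exp(-GAMMA * t) * (Q * x * x + R * u * u)

    h = horizon / steps
    x, cost, t = x0, 0.0, 0.0
    for _ in range(steps):
        dx1, dc1 = deriv(t, x)
        dx2, dc2 = deriv(t + h / 2, x + h / 2 * dx1)
        dx3, dc3 = deriv(t + h / 2, x + h / 2 * dx2)
        dx4, dc4 = deriv(t + h, x + h * dx3)
        x += h / 6 * (dx1 + 2 * dx2 + 2 * dx3 + dx4)
        cost += h / 6 * (dc1 + 2 * dc2 + 2 * dc3 + dc4)
        t += h
    return x, cost


def integral_rl(k0, x0=1.0, horizon=0.5, iters=10):
    """Policy iteration on the gain k using the interval Bellman equation.

    For V(x) = p*x^2 the discounted interval Bellman equation reads
        p*x0^2 = cost[0, T] + exp(-GAMMA*T) * p*x(T)^2,
    which is solved for p from data (policy evaluation); the gain is then
    updated as k = B*p/R (policy improvement). k0 must be stabilizing.
    """
    k, p = k0, None
    for _ in range(iters):
        x_end, cost = rollout(x0, k, horizon)
        p = cost / (x0 ** 2 - math.exp(-GAMMA * horizon) * x_end ** 2)
        k = B * p / R
    return k, p


def discounted_riccati_gain():
    """Model-based reference: gain from the discounted scalar Riccati equation
    2*(A_TRUE - GAMMA/2)*p + Q - (B*p)^2/R = 0 (positive root)."""
    a_shift = A_TRUE - GAMMA / 2
    p = R * (a_shift + math.sqrt(a_shift ** 2 + B * B * Q / R)) / (B * B)
    return B * p / R


if __name__ == "__main__":
    k_learned, _ = integral_rl(k0=2.0)
    print("learned gain:", k_learned, "Riccati gain:", discounted_riccati_gain())
```

In this toy setting the learned gain can be compared against the model-based Riccati gain as a sanity check. The learner itself only touches `B`, the cost weights, and the measured interval data, which is the sense in which the approach tolerates uncertain model dynamics.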
