Time-varying formation control for heterogeneous clusters with switching topologies via reinforcement learning
Received date: 2023-06-13
Revised date: 2023-06-26
Accepted date: 2023-08-23
Online published: 2023-09-01
Supported by
National Natural Science Foundation of China (62263030); Youth Project of Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C86)
YANG Jiaxiu, LI Xinkai, ZHANG Hongli, WANG Hao. Time-varying formation control for heterogeneous clusters with switching topologies via reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(10): 329166. DOI: 10.7527/S1000-6893.2023.29166
To address time-varying formation control of high-order heterogeneous unmanned cluster systems with uncertain model dynamics under switching communication topologies, an optimal distributed hierarchical formation control method based on integral reinforcement learning is proposed. Time-varying formation switching vectors are used to construct an augmented system of the multi-quadrotor unmanned aerial vehicle system and the multi-unmanned ground vehicle system, which transforms the time-varying formation control problem of the heterogeneous cluster into a stabilization problem. A value function with a discount factor is then introduced to recast this stabilization problem as an optimal control problem. Without altering the structure of the consensus-based distributed formation control protocol, only the feedback gain parameters are replaced and averaged to obtain the optimal time-varying formation switching control protocol for the whole heterogeneous cluster. Using a single-network “actor network-critic network” structure together with the integral reinforcement learning algorithm and the distributed control method, the feedback gain of the distributed time-varying formation switching controller is updated online in real time. Theoretical proof and simulation experiments verify the effectiveness and superiority of the proposed control scheme.
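The step from a discounted value function to an online gain update can be illustrated on a toy problem. The sketch below is not the paper's controller. It assumes a single scalar plant dx/dt = a*x + b*u with feedback u = -k*x, a quadratic value function V(x) = p*x^2, and a discount factor gamma. All names and numerical values (`irl_policy_iteration`, `discounted_riccati_scalar`, `a`, `b`, `q`, `r`, `gamma`, `k0`, `T`) are hypothetical choices made for illustration. The policy evaluation step uses only trajectory data over a window of length T, which is the core idea of integral reinforcement learning. The policy improvement step still needs the input gain `b`. The initial gain `k0` is assumed admissible, meaning gamma - 2*(a - b*k0) > 0.

```python
import numpy as np


def irl_policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, gamma=0.1,
                         k0=3.0, T=0.5, steps=5000, iters=10, x0=1.0):
    """Scalar integral-RL policy iteration for a discounted quadratic cost.

    Plant (illustrative assumption): dx/dt = a*x + b*u with u = -k*x.
    The drift `a` is used only to generate trajectory data; the learning
    update itself reads just the data, `b`, and the cost weights.
    """
    tau = np.linspace(0.0, T, steps + 1)
    dt = tau[1] - tau[0]
    k, p = k0, None
    history = []
    for _ in range(iters):
        # Stand-in for measured data: closed-loop state over one window.
        x = x0 * np.exp((a - b * k) * tau)
        # Discounted running cost, integrated with the trapezoidal rule.
        cost = np.exp(-gamma * tau) * (q + r * k**2) * x**2
        reward = dt * np.sum(0.5 * (cost[1:] + cost[:-1]))
        # Policy evaluation (IRL Bellman equation with V(x) = p*x^2):
        #   p*x(0)^2 = reward + exp(-gamma*T) * p*x(T)^2
        p = reward / (x[0]**2 - np.exp(-gamma * T) * x[-1]**2)
        # Policy improvement: new feedback law u = -(b*p/r)*x.
        k = b * p / r
        history.append((p, k))
    return p, k, history


def discounted_riccati_scalar(a, b, q, r, gamma):
    """Closed-form reference: positive root of
    2*(a - gamma/2)*p + q - (b*p)**2 / r = 0."""
    a_d = a - 0.5 * gamma
    return r * (a_d + np.sqrt(a_d**2 + b**2 * q / r)) / b**2


if __name__ == "__main__":
    p, k, _ = irl_policy_iteration()
    p_ref = discounted_riccati_scalar(1.0, 1.0, 1.0, 1.0, 0.1)
    print(f"learned p = {p:.6f}, reference p = {p_ref:.6f}, gain k = {k:.6f}")
```

Under these assumptions the learned `p` can be compared with the closed-form root of the discounted scalar Riccati equation as a sanity check. In a multi-agent setting of the kind the abstract describes, a gain obtained this way would take the place of the feedback gain in the distributed protocol. How the paper averages gains across agents is not reproduced here.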