Acta Aeronautica et Astronautica Sinica (航空学报) > 2021, Vol. 42 Issue (3): 324307-324307   doi: 10.7527/S1000-6893.2020.24307

失效卫星姿态接管的并行学习合作博弈控制

韩楠1,2, 罗建军1,2, 马卫华1,2   

  1. 西北工业大学 航天学院, 西安 710072;
    2. 西北工业大学 航天飞行动力技术重点实验室, 西安 710072
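  • Received: 2020-05-27 Revised: 2020-07-11 Published: 2020-08-31
  • Corresponding author: MA Weihua (马卫华) E-mail: whma_npu@nwpu.edu.cn
  • Funding:
    National Natural Science Foundation of China (61690210, 61690211); Science Foundation of Science, Technology and Innovation Commission of Shenzhen Municipality (JCYJ20180508151938535); Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University (CX201803)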

Concurrent learning cooperative game control for attitude takeover of failed satellites

HAN Nan1,2, LUO Jianjun1,2, MA Weihua1,2   

  1. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China;
    2. National Key Laboratory of Aerospace Flight Dynamics, Northwestern Polytechnical University, Xi'an 710072, China
  • Received: 2020-05-27 Revised: 2020-07-11 Published: 2020-08-31
  • Supported by:
    National Natural Science Foundation of China (61690210, 61690211); Science Foundation of Science, Technology and Innovation Commission of Shenzhen Municipality (JCYJ20180508151938535); Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University (CX201803)

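Abstract (translated from Chinese): For the problem of multiple microsatellites cooperatively taking over the attitude motion of a failed satellite, this paper studies a multi-satellite cooperative game strategy learning and coordinated control method that accounts for the control constraints of the microsatellites. First, a cooperative game model of the microsatellites is established, and an explicit expression is given for the Pareto optimal strategy of the multi-satellite cooperative game that can handle the microsatellite control constraints. Second, to meet the need for learning the cooperative game strategy, a concurrent learning based policy iteration method is designed that uses past and current data concurrently; this method relaxes the persistent excitation requirement on learning the neural network (NN) weight vector. The condition that the past data must satisfy to guarantee convergence of the NN weight vector estimate is given, and the uniform ultimate boundedness of the NN weight vector estimation error is analyzed with the Lyapunov method. The concurrent learning policy iteration method is then used to approximate the numerical solution of the Pareto optimal strategy of the cooperative game. The resulting cooperative game strategy has a feedback control form; once the NN weight vector has been learned, each microsatellite can compute its cooperative game strategy independently and thereby achieve closed-loop coordinated control during the attitude takeover of the failed satellite. The designed method avoids the torque allocation required by traditional attitude control methods and eliminates the influence of the number of microsatellites on the control computational complexity. Finally, numerical simulations verify the effectiveness of the designed method.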

Key words (translated from Chinese): takeover control, microsatellites, cooperative game, concurrent learning, policy iteration

Abstract: This paper studies cooperative game strategy learning and coordinated control for multiple microsatellites that cooperatively take over the attitude motion of a failed satellite, taking the control constraints of the microsatellites into account. First, a cooperative game model of the microsatellites is established, and an explicit expression is given for the Pareto optimal strategy that can handle the microsatellite control constraints. Second, to learn the Pareto optimal strategy, a concurrent learning based Policy Iteration (PI) method is designed that uses past and current data concurrently, which relaxes the Persistence of Excitation (PE) condition otherwise required for learning the Neural Network (NN) weights. The condition that the recorded past data must satisfy to ensure convergence of the NN weight estimates is given, and the Uniform Ultimate Boundedness (UUB) of the NN weight estimation errors is analyzed using the Lyapunov method. The Pareto optimal strategy of the microsatellites is then approximated numerically using the concurrent learning PI method. The resulting cooperative game strategy has a feedback control form, so once the NN weights have been learned, each microsatellite can compute its own strategy independently and closed-loop coordinated control is achieved during the takeover. The method avoids the torque allocation required by traditional attitude control methods and removes the dependence of the control computation complexity on the number of microsatellites. Finally, numerical simulations validate the effectiveness of the proposed method.

Key words: takeover control, microsatellites, cooperative game, concurrent learning, policy iteration
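
The abstract's central idea, that reusing recorded past data alongside current data relaxes the persistence of excitation requirement, can be illustrated with a small sketch. The code below is not the paper's critic network or policy iteration scheme; it is a generic concurrent-learning gradient update for a two-parameter linear-in-parameters model, and every name in it (`concurrent_learning_step`, `has_full_rank_2d`, `run`, the gain and step count) is a hypothetical choice made for illustration. It assumes noise-free data and a current regressor that never changes, so the current data alone cannot identify both weights; convergence then depends on the recorded regressors spanning the parameter space, which is the kind of condition on past data that the abstract refers to.

```python
# Illustrative concurrent-learning update for y = w_true . phi (2 parameters).
# All names and values here are assumptions, not taken from the paper.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def has_full_rank_2d(stack):
    """Rank check for 2-D regressors: some recorded pair must be linearly independent."""
    return any(abs(p[0] * q[1] - p[1] * q[0]) > 1e-9
               for i, (p, _) in enumerate(stack) for q, _ in stack[i + 1:])

def concurrent_learning_step(w_hat, current, stack, gain):
    """One gradient step driven by the current sample plus every recorded sample."""
    grad = [0.0] * len(w_hat)
    for phi, y in [current] + stack:
        err = dot(w_hat, phi) - y          # prediction error on this sample
        for k in range(len(grad)):
            grad[k] += err * phi[k]
    return [w - gain * g for w, g in zip(w_hat, grad)]

def run(w_true, stack_phis, steps=2000, gain=0.05):
    """Learn w_true from a constant current regressor and a fixed history stack."""
    stack = [(phi, dot(w_true, phi)) for phi in stack_phis]
    phi_now = [1.0, 0.0]                   # constant regressor: not persistently exciting
    current = (phi_now, dot(w_true, phi_now))
    w_hat = [0.0, 0.0]
    for _ in range(steps):
        w_hat = concurrent_learning_step(w_hat, current, stack, gain)
    return w_hat, has_full_rank_2d(stack)
```

Calling `run([1.5, -0.7], [[1.0, 1.0], [0.0, 1.0]])` uses a history stack whose regressors are linearly independent, and the estimate approaches the true weights even though the current regressor is constant. Calling `run([1.5, -0.7], [[2.0, 0.0]])` uses a stack that is collinear with the current regressor, and the second weight is never updated.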

CLC number: