Acta Aeronautica et Astronautica Sinica > 2024, Vol. 45 Issue (8): 329136-329136   doi: 10.7527/S1000-6893.2023.29136

Spacecraft game decision making for threat avoidance of space targets based on machine learning

Honglin ZHANG1,2, Jianjun LUO1,2, Weihua MA1,2

  1. School of Astronautics, Northwestern Polytechnical University, Xi’an 710072, China
  2. Science and Technology on Aerospace Flight Dynamics Laboratory, Xi’an 710072, China
  • Received: 2023-06-06 Revised: 2023-08-22 Accepted: 2023-11-02 Published: 2024-04-25 Online: 2023-11-16
  • Contact: Jianjun LUO E-mail: jjluo@mail.nwpu.edu.cn; jjluo@nwpu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (12072269); Foundation of Science and Technology on Aerospace Flight Dynamics Laboratory (6142210210302)

Abstract:

To address the decision-making problem of a spacecraft avoiding the approach threat of a space target, an intelligent decision-making framework and an autonomous decision-making method based on deep reinforcement learning are proposed. Considering the maneuvering characteristics of space targets and the game-theoretic nature of threat avoidance, an intelligent game decision-making framework for spacecraft threat avoidance is built on the Observe-Orient-Decide-Act (OODA) loop and machine learning methods. Based on this framework and on inference of the space target's motion intention, a deep reinforcement learning-based maneuver decision-making algorithm and its training environment are designed to give the spacecraft's decision and control a game response capability, enabling avoidance responses to the typical motion intentions of space targets. Furthermore, self-play training is used to improve the generalization of the autonomous maneuver decision-making algorithm and its adaptability to uncertain target maneuvers. Finally, simulation examples and their analysis verify the effectiveness of the proposed method.

Key words: spacecraft maneuver, intelligent decision-making, threat avoidance, OODA loop, deep reinforcement learning

CLC number: