航空学报 > 2025, Vol. 46 Issue (24): 331872-331872   doi: 10.7527/S1000-6893.2025.31872

基于PPO-SHAP的IMA系统资源分配可解释决策方法

刘嘉琛1,2, 董磊1,3, 孙紫荆4, 倪晔5, 陈曦1,3, 王鹏1,3

    1.中国民航大学 民航航空器适航审定技术重点实验室,天津 300300
    2.中国民航大学 安全科学与工程学院,天津 300300
    3.中国民航大学 科技创新研究院,天津 300300
    4.航空工业西安航空计算技术研究所,西安 710065
    5.中国商飞上海飞机试飞工程有限公司,上海 200232
  • 收稿日期:2025-02-13 修回日期:2025-04-01 接受日期:2024-06-26 出版日期:2025-07-16 发布日期:2025-07-15
  • 通讯作者: 董磊 E-mail:l-dong@cauc.edu.cn
  • 基金资助:
    部级项目

An explainable decision-making method for resource allocation in IMA system based on PPO-SHAP

Jiachen LIU1,2, Lei DONG1,3, Zijing SUN4, Ye NI5, Xi CHEN1,3, Peng WANG1,3

    1.Key Laboratory of Civil Aircraft Airworthiness Technology,Civil Aviation University of China,Tianjin 300300,China
    2.College of Safety Science and Engineering,Civil Aviation University of China,Tianjin 300300,China
    3.Science and Technology Innovation Research Institute,Civil Aviation University of China,Tianjin 300300,China
    4.AVIC Xi’an Aeronautics Computing Technology Research Institute,Xi’an 710065,China
    5.COMAC Shanghai Aircraft Flight Test Co., Ltd.,Shanghai 200232,China
  • Received:2025-02-13 Revised:2025-04-01 Accepted:2024-06-26 Online:2025-07-16 Published:2025-07-15
  • Contact: Lei DONG E-mail:l-dong@cauc.edu.cn
  • Supported by:
    Ministerial-level Project

摘要:

随着航空人工智能技术的发展,综合模块化航电(IMA)平台上驻留的智能化应用软件面临着资源稀缺和决策难信任的问题。首先在考虑航电系统架构要素的基础上,设计了多约束条件下的IMA系统资源分配决策优化目标,利用近端策略优化(PPO)算法求解序贯决策过程中航电资源的近似最优分配方案。然后建立了面向IMA系统资源分配的决策归因解释框架,通过对训练数据集的聚合和重采样来提取强化学习智能体专家策略,进而采用Shapley加性解释(SHAP)方法实现了全局与局部相结合的IMA系统资源分配决策解释。仿真实验结果表明,相比于贪婪算法和其他基于策略的强化学习算法,所提方法具有良好的收敛速度和学习效果,解决IMA系统资源分配问题时高效性、优越性显著,且能伴随生成定量的、可视化的特征归因解释信息,揭示了强化学习输入特征对决策的影响程度及决策意图,为智能航电系统可解释性方面的适航符合性验证提供了方法指导。

关键词: 可解释性, 综合模块化航电, 近端策略优化, 资源分配, 单一飞行员驾驶, Shapley加性解释

Abstract:

With the development of aviation Artificial Intelligence (AI) technology, intelligent application software residing on the Integrated Modular Avionics (IMA) platform faces two challenges: scarce resources and decisions that are difficult to trust. First, taking the architectural elements of the avionics system into account, an optimization objective for IMA resource allocation decision-making under multiple constraints is designed, and the Proximal Policy Optimization (PPO) algorithm is used to find a near-optimal allocation of avionics resources in the sequential decision-making process. Next, a decision attribution explanation framework for IMA resource allocation is established, in which the expert policy of the reinforcement learning agent is extracted by aggregating and resampling the training dataset. The SHapley Additive exPlanations (SHAP) method is then used to provide combined global and local explanations of IMA resource allocation decisions. Simulation results show that, compared with the greedy algorithm and other policy-based reinforcement learning algorithms, the proposed method converges quickly and learns effectively, and is markedly more efficient and effective in solving the IMA resource allocation problem. In addition, the method generates quantitative, visual feature attribution explanations that reveal how strongly each reinforcement learning input feature influences a decision and clarify the decision intent, thereby providing methodological guidance for airworthiness compliance validation of intelligent avionics systems with respect to explainability.

Key words: explainability, integrated modular avionics, proximal policy optimization, resource allocation, single-pilot operations, SHapley additive exPlanations
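
Illustrative note: the feature attribution idea behind SHAP can be sketched with an exact Shapley computation on a toy scoring function. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function `toy_score`, the three feature names (CPU, memory, and bandwidth demand), and the all-zero baseline are hypothetical and do not come from the paper. The SHAP library approximates these values for larger models, whereas the sketch enumerates every feature subset directly.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley attribution of value_fn(x) - value_fn(baseline).

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi


def toy_score(features):
    """Hypothetical stand-in for a policy's preference for one allocation action."""
    cpu, mem, bw = features
    return 2.0 * cpu + 1.0 * mem + 0.5 * cpu * bw


x = [0.8, 0.5, 0.4]          # hypothetical observed state features
baseline = [0.0, 0.0, 0.0]   # hypothetical reference state
phi = shapley_values(toy_score, x, baseline)
print(dict(zip(["cpu", "mem", "bw"], phi)))
```

The attributions sum to the difference between the score at `x` and the score at the baseline, which is the additivity property that lets local explanations of single decisions be aggregated into a global feature ranking.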

中图分类号: