Acta Aeronautica et Astronautica Sinica, 2026, Vol. 47, Issue 1: 331973. doi: 10.7527/S1000-6893.2025.31973

Safety assessment for airborne intelligent avoidance system based on Bayesian optimization

Zan MA1,2, Jie BAI2, Liqin YAN2,3, Yong CHEN4, Shuguang SUN2,3

  1. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China
  2. Key Laboratory of Civil Aircraft Airworthiness Certification Technology, Civil Aviation University of China, Tianjin 300300, China
  3. College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
  4. COMAC Shanghai Aircraft Design & Research Institute, Shanghai 200216, China
  • Received: 2025-03-12 Revised: 2025-04-29 Accepted: 2025-07-07 Online: 2025-07-28 Published: 2025-07-18
  • Contact: Jie BAI, E-mail: jbai@cauc.edu.cn
  • Supported by:
    National Key Research and Development Program of China (2022YFB3904300); Fundamental Research Funds for the Central Universities (XJ2021004301)

Abstract:

Applying reinforcement learning to UAV intelligent avoidance systems raises airworthiness safety challenges. To address them, this paper proposes a safety assessment method for such systems based on Bayesian optimization theory, within the framework of the SAE ARP4761 standard. First, an intelligent avoidance system model is built from a UAV kinematic model and the Proximal Policy Optimization (PPO) algorithm. Second, the verification task for the system model is combined with Bayesian optimization: a Gaussian surrogate model is trained iteratively using three acquisition functions (uncertainty exploration, boundary refinement, and failure region sampling). With only a small number of samples, this yields safety verification, safety boundary determination, and functional failure probability analysis for the intelligent avoidance system, supporting quantitative safety assessment at the aircraft and system levels. Finally, a case study on a typical intelligent avoidance system design architecture shows that the method can effectively support airworthiness safety assessment and can provide a means of airworthiness compliance and technical assurance for installing intelligent avoidance systems on aircraft. Experiments further show that, with a small number of samples, the Bayesian optimization-based method gives more detailed failure boundary predictions, more precise failure probability estimates, and higher confidence levels for the reinforcement learning module than uniform sampling and Monte Carlo methods.

Key words: reinforcement learning, airborne intelligent avoidance system, proximal policy optimization, Bayesian optimization, airworthiness safety
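
As a rough illustration of the loop the abstract describes, the sketch below trains a Gaussian process surrogate by cycling through three acquisition rules and then estimates a failure probability from the surrogate. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the toy `safety_margin` function stands in for a closed-loop simulation of the PPO-based avoidance system, the two inputs stand in for normalized encounter parameters, the RBF kernel settings are arbitrary, the boundary rule is the well-known "straddle" heuristic, and the failure-region rule (failure probability weighted by posterior standard deviation) is one plausible form among many.

```python
import math
import numpy as np

def safety_margin(x):
    """Hypothetical stand-in for one simulated encounter.
    Negative values mean the avoidance function failed."""
    return x[:, 0] ** 2 + x[:, 1] ** 2 - 0.25

def rbf(a, b, length=0.3):
    """Squared-exponential kernel with unit signal variance (assumed settings)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """Posterior mean and standard deviation of a constant-mean GP at Xs."""
    y_mean = y.mean()
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = y_mean + Ks.T @ np.linalg.solve(K, y - y_mean)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

_erf = np.vectorize(math.erf)

def prob_failure(mu, sigma):
    """Posterior probability that the margin is below zero at each point."""
    return 0.5 * (1.0 + _erf((0.0 - mu) / (sigma * math.sqrt(2.0))))

def run(n_init=10, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    g = np.linspace(0.0, 1.0, 30)
    cands = np.array([[a, b] for a in g for b in g])  # candidate test cases
    X = rng.uniform(0.0, 1.0, size=(n_init, 2))       # initial random samples
    y = safety_margin(X)
    for i in range(n_iter):
        mu, sigma = gp_posterior(X, y, cands)
        if i % 3 == 0:    # uncertainty exploration: largest posterior std
            score = sigma
        elif i % 3 == 1:  # boundary refinement: straddle heuristic around margin = 0
            score = 1.96 * sigma - np.abs(mu)
        else:             # failure region sampling: likely failures, still uncertain
            score = prob_failure(mu, sigma) * sigma
        x_new = cands[np.argmax(score)][None, :]
        X = np.vstack([X, x_new])
        y = np.append(y, safety_margin(x_new))
    mu, _ = gp_posterior(X, y, cands)
    p_hat = float(np.mean(mu < 0.0))                    # surrogate-based estimate
    p_ref = float(np.mean(safety_margin(cands) < 0.0))  # exhaustive grid reference
    return p_hat, p_ref, len(X)

if __name__ == "__main__":
    p_hat, p_ref, n = run()
    print(f"samples used: {n}, surrogate estimate: {p_hat:.3f}, grid reference: {p_ref:.3f}")
```

The estimate here assumes the inputs are uniformly distributed over the candidate grid. A real assessment would weight the failure region by the operational encounter distribution and would replace the toy margin with actual simulation runs.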

CLC number: