Applying reinforcement learning to UAV intelligent avoidance systems raises airworthiness safety challenges. To address them, this paper proposes a safety assessment method for UAV intelligent avoidance systems, based on Bayesian optimization theory and set within the framework of the SAE ARP4761 standard. First, a model of the intelligent avoidance system is built from a UAV kinematic model and the Proximal Policy Optimization (PPO) algorithm. Second, verification of this system model is combined with Bayesian optimization theory: a Gaussian surrogate model is trained iteratively using three acquisition functions (uncertainty exploration, boundary refinement, and failure-region sampling). With only a small number of samples, this yields safety verification, safety boundary determination, and functional failure probability analysis for the intelligent avoidance system, in support of quantitative safety assessment at the aircraft and system levels. Finally, a case study on a typical intelligent sense-and-avoid system architecture shows that the method effectively supports airworthiness safety assessment and provides a means of airworthiness compliance and technical assurance for installing intelligent avoidance systems on aircraft. Experiments further show that, with a small number of samples, the Bayesian optimization-based method gives the reinforcement learning module more detailed failure boundary predictions, more precise failure probability estimates, and higher confidence levels than uniform sampling and Monte Carlo methods.
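The iterative surrogate-training loop described above can be illustrated with a minimal one-dimensional sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy `margin` function stands in for a closed-loop simulation of the avoidance policy (a negative value means a failed encounter), the squared-exponential kernel and its length scale are arbitrary, and the three scoring rules are only plausible forms of uncertainty exploration, boundary refinement, and failure-region sampling, cycled in a fixed order.

```python
import math

def margin(x):
    """Hypothetical safety margin of one simulated encounter; < 0 means failure."""
    return 0.5 - math.exp(-((x - 0.7) / 0.1) ** 2)

def kernel(a, b, length=0.08):
    """Squared-exponential kernel (assumed; unit prior variance)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def backward(L, z):
    """Solve L^T x = z for lower-triangular L."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def posterior(X, y, grid, noise=1e-4):
    """Zero-mean Gaussian process posterior mean and standard deviation on grid."""
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward(L, forward(L, y))
    mean, std = [], []
    for g in grid:
        k = [kernel(g, a) for a in X]
        v = forward(L, k)
        mean.append(sum(ki * ai for ki, ai in zip(k, alpha)))
        std.append(math.sqrt(max(1.0 - sum(vi * vi for vi in v), 1e-12)))
    return mean, std

def p_fail(m, s):
    """Posterior probability that the margin is negative at one point."""
    return 0.5 * (1.0 + math.erf(-m / (s * math.sqrt(2.0))))

# Assumed forms of the three acquisition functions (higher score = sample next).
ACQUISITIONS = [
    lambda m, s: s,                 # uncertainty exploration
    lambda m, s: -abs(m) / s,       # boundary refinement (mean near 0, std large)
    lambda m, s: p_fail(m, s) * s,  # failure-region sampling
]

def run(n_iter=18):
    grid = [i / 100 for i in range(101)]
    sampled = [0, 25, 50, 75, 100]          # initial design (grid indices)
    X = [grid[i] for i in sampled]
    y = [margin(x) for x in X]
    for it in range(n_iter):
        mean, std = posterior(X, y, grid)
        acq = ACQUISITIONS[it % 3]
        candidates = [i for i in range(len(grid)) if i not in sampled]
        best = max(candidates, key=lambda i: acq(mean[i], std[i]))
        sampled.append(best)
        X.append(grid[best])
        y.append(margin(grid[best]))        # one new "simulation" per iteration
    mean, _ = posterior(X, y, grid)
    # Failure probability under a uniform input distribution (assumed).
    estimate = sum(m < 0 for m in mean) / len(grid)
    return X, y, estimate

X, y, estimate = run()
print(f"{len(X)} simulations, estimated failure probability {estimate:.3f}")
```

In this sketch the surrogate is refit after every new sample, so each acquisition step sees the latest estimate of the failure boundary. A realistic study would replace `margin` with the simulated avoidance system, use a multi-dimensional input space, and weight the failure probability by the actual encounter distribution.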