Safety assessment for airborne intelligent avoidance system based on Bayesian optimization

doi:10.7527/S1000-6893.2025.31973

Electronics, Electrical Engineering and Control

Zan MA¹,², Jie BAI²,*, Liqin YAN²,³, Yong CHEN⁴, Shuguang SUN²,³

1. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China
2. Key Laboratory of Civil Aircraft Airworthiness Certification Technology, Civil Aviation University of China, Tianjin 300300, China
3. College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
4. COMAC Shanghai Aircraft Design & Research Institute, Shanghai 200216, China

Received: 2025-03-12 Revised: 2025-04-29 Accepted: 2025-07-07 Online: 2025-07-28 Published: 2025-07-18
* Corresponding author: Jie BAI, E-mail: jbai@cauc.edu.cn
Supported by: National Key Research and Development Program of China (2022YFB3904300); Fundamental Research Funds for the Central Universities (XJ2021004301)

Abstract:

To address the airworthiness safety challenges raised by applying reinforcement learning (RL) in UAV intelligent avoidance systems, this paper proposes a safety assessment method for such systems based on Bayesian optimization theory, within the framework of the SAE ARP4761 standard. First, a model of the intelligent avoidance system is built from the UAV kinematic model and the Proximal Policy Optimization (PPO) algorithm. Second, the verification task for the system model is combined with Bayesian optimization theory: a Gaussian surrogate model is trained iteratively using three acquisition functions (uncertainty exploration, boundary refinement, and failure region sampling). With a small number of samples, this yields safety verification, safety boundary determination, and functional failure probability analysis for the intelligent avoidance system, supporting quantitative safety assessment at the aircraft/system level. Finally, a case study on a typical intelligent sense-and-avoid system architecture shows that the method effectively supports airworthiness safety assessment and can provide the airworthiness means of compliance and technical assurance needed to install intelligent avoidance systems on aircraft. Experiments further show that, with a small number of samples, the Bayesian optimization-based method gives the RL module finer failure boundary predictions, more precise failure probability estimates, and higher confidence levels than uniform sampling and Monte Carlo methods.

Key words: reinforcement learning, airborne intelligent avoidance system, proximal policy optimization, Bayesian optimization, airworthiness safety
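The iterative surrogate training described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it uses a hand-written zero-mean Gaussian process with a squared-exponential kernel, it implements only one acquisition rule (query the candidate with the largest posterior standard deviation, used here as a stand-in for "uncertainty exploration"), and `toy_system` is a hypothetical black box rather than the PPO-based avoidance model. All names, ranges, and hyperparameters are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, length=0.2):
    """Squared-exponential kernel between two sets of row vectors."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    var = 1.0 - np.einsum("ij,ji->i", k_qt, np.linalg.solve(k_tt, k_qt.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_system(x):
    """Hypothetical black box: 1.0 means failure, 0.0 means safe."""
    return float(x[0] + x[1] > 1.5)

# Candidate inputs on a hypothetical normalized 2D grid.
axis = np.linspace(0.0, 1.0, 21)
grid = np.array([[a, b] for a in axis for b in axis])

# A few random initial evaluations of the black box.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(3, 2))
y_train = np.array([toy_system(x) for x in x_train])
_, std = gp_posterior(x_train, y_train, grid)
initial_max_std = float(std.max())

# Iteratively query the candidate where the surrogate is least certain.
for _ in range(30):
    _, std = gp_posterior(x_train, y_train, grid)
    x_next = grid[np.argmax(std)]
    x_train = np.vstack([x_train, x_next])
    y_train = np.append(y_train, toy_system(x_next))

mean, std = gp_posterior(x_train, y_train, grid)
final_max_std = float(std.max())

# Compare the surrogate's predicted failure region with the toy ground truth.
predicted_failure = mean > 0.5
true_failure = np.array([toy_system(x) > 0.5 for x in grid])
agreement = float((predicted_failure == true_failure).mean())
print(f"max std: {initial_max_std:.3f} -> {final_max_std:.3f}, agreement: {agreement:.3f}")
```

A pure uncertainty rule spreads samples over the whole input space; rules that concentrate samples near the predicted boundary or inside the predicted failure region, as the abstract describes, would need additional acquisition terms that are not sketched here.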

CLC number: V244.12

Zan MA, Jie BAI, Liqin YAN, Yong CHEN, Shuguang SUN. Safety assessment for airborne intelligent avoidance system based on Bayesian optimization[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(1): 331973.


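Table 1 Safety assessment process for an intelligent avoidance system containing an RL model

Step	Activity
Step 1: Functional hazard assessment	Perform a system functional hazard analysis under the concept of operations (this paper focuses on the functional hazard of an erroneous "maneuver" involving the RL model)
Step 2: Preliminary system safety assessment	1. Define the safety objectives
	2. Define a preliminary system architecture that meets the safety objectives
	3. Derive safety requirements, including independence requirements, that meet the objectives and support the architecture
	4. Define and validate assumptions
	5. Allocate development assurance levels (DAL)
	6. Train the Gaussian surrogate model by Bayesian optimization, using the uncertainty exploration, boundary refinement, and failure region sampling functions
	7. Estimate and analyze the failure probability of the RL model over the two-dimensional input space $X$ (conflict distance and conflict angle) with distribution function $p(x)$
	8. Derive the operating domain of the RL model (i.e., the safety boundary) within which the requirements are met
	9. Perform a failure modes and effects analysis of the RL unit
Step 3: System safety assessment	Perform the final safety assessment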
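Step 7 of the preliminary system safety assessment amounts to integrating a failure indicator against the operational distribution $p(x)$ over the two-dimensional input space. A minimal grid-based sketch follows. The input ranges, the Gaussian form of `operational_density`, and the `is_failure` rule are all hypothetical placeholders, since this page gives neither the real distribution nor the RL model. Reading $x^*$ as the failure point with the highest density is also an assumption.

```python
import numpy as np

# Hypothetical normalized ranges for conflict distance and conflict angle.
distance = np.linspace(0.0, 1.0, 201)
angle = np.linspace(0.0, 2.0, 201)
D, A = np.meshgrid(distance, angle, indexing="ij")

def operational_density(d, a):
    """Hypothetical unnormalized p(x): independent Gaussians on both inputs."""
    return np.exp(-0.5 * ((d - 0.5) / 0.15) ** 2) * np.exp(-0.5 * ((a - 1.0) / 0.3) ** 2)

def is_failure(d, a):
    """Hypothetical failure indicator standing in for RL-model or surrogate outcomes."""
    return (d > 0.8) & (a > 1.6)

# Discrete version of p(x): one probability mass per grid cell, summing to 1.
weights = operational_density(D, A)
weights /= weights.sum()
failures = is_failure(D, A)

# Failure probability: total probability mass of the failing cells.
p_fail = float(weights[failures].sum())

# Most likely failure: the failing cell with the largest probability mass.
masked = np.where(failures, weights, -np.inf)
i, j = np.unravel_index(np.argmax(masked), masked.shape)
most_likely_failure = (float(D[i, j]), float(A[i, j]))

print(f"estimated failure probability: {p_fail:.3e}")
print(f"most likely failure point: {most_likely_failure}")
```

In practice the indicator would come from rollouts of the RL policy or from the trained surrogate, and the grid resolution trades accuracy of the integral against the number of evaluations.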


Table 2 Four evaluation methods applied to the safety validation task

Evaluation method	$R_{\mathrm{fail}}$	$x^*$	$p(x^*)$	$\hat{P}_{\mathrm{fail}}$	$\Delta P_{\mathrm{fail}}$
True system	20.8%	(0.86793, 1.89579)	1.496×10^-3	8.41072×10^-5
Surrogate model	50.2%	(0.86172, 1.89579)	1.396×10^-3	8.40726×10^-5	4.12×10^-4
Uniform sampling	21.7%	(0.83870, 1.93548)	9.543×10^-4	8.05544×10^-5	4.22×10^-2
Monte Carlo	21.6%	(0.82756, 1.91887)	9.687×10^-4	8.61919×10^-5	2.48×10^-2
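This page does not define $\Delta P_{\mathrm{fail}}$. The short check below assumes it is the relative error of each estimate against the true-system value, and recomputes it from the failure probability estimates in Table 2. Under that assumption the recomputed values match the reported ones to within rounding.

```python
# Failure probability estimates copied from Table 2.
true_p_fail = 8.41072e-5
estimates = {
    # method: (estimated P_fail, reported Delta P_fail)
    "surrogate model": (8.40726e-5, 4.12e-4),
    "uniform sampling": (8.05544e-5, 4.22e-2),
    "Monte Carlo": (8.61919e-5, 2.48e-2),
}

def relative_error(estimate, reference):
    """Absolute deviation from the reference, divided by the reference."""
    return abs(estimate - reference) / reference

recomputed = {
    name: relative_error(estimate, true_p_fail)
    for name, (estimate, _) in estimates.items()
}

for name, (_, reported) in estimates.items():
    print(f"{name}: recomputed {recomputed[name]:.3e}, reported {reported:.2e}")
```

The same formula applied to the rows of Table 3 can be used to compare the individual acquisition-function combinations.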


Table 3 Ablation study of the three acquisition functions on the safety validation task

Acquisition function combination	$R_{\mathrm{fail}}$	$x^*$	$p(x^*)$	$\hat{P}_{\mathrm{fail}}$	$\Delta P_{\mathrm{fail}}$
Uncertainty exploration	21.6%	(0.83367, 1.90782)	1.030×10^-3	8.52294×10^-5	1.33×10^-2
Boundary refinement	35.5%	(0.86172, 1.89980)	1.368×10^-3	8.62898×10^-5	2.59×10^-2
Failure region sampling	89.2%	(0.43688, 1.33066)	3.352×10^-4	5.82744×10^-5	3.07×10^-1
Uncertainty exploration, boundary refinement	34.6%	(0.85772, 1.89579)	1.366×10^-3	8.48522×10^-5	8.86×10^-3
Uncertainty exploration, failure region sampling	49.3%	(0.82164, 1.88777)	1.094×10^-3	7.74955×10^-5	7.86×10^-2
Boundary refinement, failure region sampling	52.8%	(0.85771, 1.89579)	1.361×10^-3	8.45347×10^-5	5.08×10^-3
Surrogate model	50.2%	(0.86172, 1.89579)	1.396×10^-3	8.40726×10^-5	4.12×10^-4


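Table 4 Comparison on the safety validation task in the extended experiment

Evaluation method	$R_{\mathrm{fail}}$	$x^*$	$p(x^*)$	$\hat{P}_{\mathrm{fail}}$
Uniform sampling	21.4%	(0.915, 1.932, 1.254)	1.665×10^-3	8.05164×10^-5
Surrogate model	47.2%	(0.929, 1.939, 1.293)	1.331×10^-3	7.71431×10^-5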




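No.	Fault tree basic event	Failure probability (per flight)
1	ADS-B system function failure	1.0×10^-5
2	GPS system data loss	2.0×10^-4
3	GPS system misleading information	3.0×10^-5
4	RL module provides erroneous "maneuver"	1.0×10^-5
5	RL module hardware failure	3.5×10^-6
6	Flight control function failure	1.0×10^-6
7	UAV on a collision course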
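Basic-event probabilities like these are combined through the gates of a fault tree. The sketch below shows only the gate arithmetic for independent events. The fault tree structure is not given on this page, so the single OR gate over events 1 to 6 is a hypothetical grouping for illustration and its output is not a result of the paper. Event 7 is left out because no probability is listed for it.

```python
from math import prod

# Basic-event probabilities per flight, copied from the table above (events 1 to 6).
basic_events = {
    "ADS-B system function failure": 1.0e-5,
    "GPS system data loss": 2.0e-4,
    "GPS system misleading information": 3.0e-5,
    "RL module provides erroneous maneuver": 1.0e-5,
    "RL module hardware failure": 3.5e-6,
    "Flight control function failure": 1.0e-6,
}

def or_gate(probabilities):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)

def and_gate(probabilities):
    """Probability that all of several independent events occur."""
    return prod(probabilities)

# Hypothetical grouping, for illustration only: one OR gate over all six events.
any_failure = or_gate(basic_events.values())

# Rare-event approximation: for small probabilities an OR gate is close to the sum.
rare_event_sum = sum(basic_events.values())

print(f"OR gate: {any_failure:.6e}, rare-event sum: {rare_event_sum:.6e}")
```

For probabilities this small the exact OR-gate value and the simple sum differ only in the second-order terms, which is why the rare-event approximation is commonly used in quantitative fault tree analysis.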
