Transformer-based intelligent tracking method of aviation structure surface cracks

Jiaxin LI; Shuaishuai LYU; Yezi WANG; Yu YANG; Ziyue LI

doi:10.7527/S1000-6893.2025.32355

Acta Aeronautica et Astronautica Sinica, 2025, Vol. 46, Issue 21: 532355

Special Issue for the 60th Anniversary of the Aircraft Strength Research Institute of China

1. Aircraft Strength Research Institute of China, Xi'an 710065, China
2. National Key Laboratory of Strength and Structural Integrity, Xi'an 710065, China

E-mail: 1056900948@qq.com

Received: 2025-06-03

Revised: 2025-07-03

Accepted: 2025-08-11

Published online: 2025-08-28

Funding

National-level project
Cite this article

Jiaxin LI, Shuaishuai LYU, Yezi WANG, Yu YANG, Ziyue LI. Transformer-based intelligent tracking method of aviation structure surface cracks[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(21): 532355. DOI: 10.7527/S1000-6893.2025.32355

Abstract

Semantic segmentation models based on deep convolutional networks perform well in structural damage detection. In aircraft structural damage detection, however, cracks usually occupy only a small fraction of the image, and repeated convolution and pooling operations lose crack information, seriously reducing segmentation accuracy. This work therefore studies Transformer-based semantic segmentation models and designs a Transformer-based Model for Intelligent Crack Tracking (TICT), a general model for surface damage detection on aeronautical structures that aims at precise segmentation and intelligent tracking of cracks. First, an adaptive dynamic patch partitioning mechanism divides the image into patches of different sizes and degrees of overlap. Second, the patches are fed into a Transformer-based encoder to extract multi-scale features that capture both the contextual information and the local details of the crack image. Third, a lightweight decoder built from a multi-layer perceptron and attention modules generates a crack mask image. Finally, morphological operations correct the connected crack regions in the mask image, and the corrected mask is mapped back onto the original image to obtain the precise crack region. Repeating this procedure on images collected in real time during fatigue tests yields automated, continuous crack tracking. The TICT model is trained and tested on crack image datasets from metal component and full-scale aircraft fatigue tests. It achieves a mean Intersection over Union (mIoU) of 78.31% on test sets of crack images from various metal components and full-scale aircraft structures, which demonstrates that TICT can accurately segment surface cracks in aviation structures with varied structural configurations, complex backgrounds, and tiny features, and that it generalizes well and is robust.
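For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU can be computed for a binary crack mask. It assumes two classes (background = 0, crack = 1) averaged with equal weight; the abstract does not state the exact averaging used for TICT, and the function names `iou` and `mean_iou` and the toy masks are hypothetical.

```python
def iou(pred, truth, cls):
    """Intersection over Union for one class on two same-sized 2D label maps."""
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            in_p, in_t = (p == cls), (t == cls)
            inter += in_p and in_t   # pixel labelled cls in both maps
            union += in_p or in_t    # pixel labelled cls in at least one map
    # None marks a class that is absent from both maps (IoU undefined).
    return inter / union if union else None


def mean_iou(pred, truth, classes=(0, 1)):
    """Unweighted mean of the per-class IoU over the classes that occur."""
    scores = [s for s in (iou(pred, truth, c) for c in classes) if s is not None]
    return sum(scores) / len(scores)


# Toy example (illustrative data only): a 2-pixel crack, half of it mis-located.
truth = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0]]

print(round(iou(pred, truth, 1), 4))    # crack IoU: 1/3 -> 0.3333
print(round(mean_iou(pred, truth), 4))  # (1/3 + 9/11) / 2 -> 0.5758
```

The toy case shows why thin cracks are hard to score well on: a one-pixel offset leaves the background IoU high (9/11) but drops the crack IoU to 1/3.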

Key words: crack tracking; Transformer; computer vision; semantic segmentation; structural health monitoring
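The morphological correction of connected crack regions mentioned in the abstract can be illustrated with a binary closing (dilation followed by erosion). This is only a sketch under stated assumptions: the abstract does not say which morphological operations or structuring elements TICT uses, so the 3x3 square element, the clipped border handling, and the helper names `dilate`, `erode`, `closing`, and `count_components` are all hypothetical.

```python
def _window(mask, r, c):
    """Values in the 3x3 neighbourhood of (r, c), clipped at the image border."""
    h, w = len(mask), len(mask[0])
    return [mask[y][x]
            for y in range(max(r - 1, 0), min(r + 2, h))
            for x in range(max(c - 1, 0), min(c + 2, w))]


def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1."""
    return [[int(any(_window(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask):
    """A pixel stays 1 only if every pixel in its clipped neighbourhood is 1."""
    return [[int(all(_window(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def closing(mask):
    """Binary closing: dilation then erosion, which bridges small gaps."""
    return erode(dilate(mask))


def count_components(mask):
    """Count 8-connected foreground regions with an iterative flood fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny in range(max(y - 1, 0), min(y + 2, h)):
                        for nx in range(max(x - 1, 0), min(x + 2, w)):
                            if mask[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                stack.append((ny, nx))
    return count


# Illustrative mask: one horizontal crack broken by a single missing pixel.
mask = [[0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]]
closed = closing(mask)

print(count_components(mask), count_components(closed))  # 2 1
print(closed[2])  # [1, 1, 1, 1, 1, 1, 1]
```

Closing fills the one-pixel gap without thickening the crack, so the two fragments merge into a single connected region. A larger structuring element would bridge wider gaps at the risk of joining unrelated regions.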

