Transformer-based intelligent tracking method of aviation structure surface cracks

Jiaxin LI; Shuaishuai LYU; Yezi WANG; Yu YANG; Ziyue LI

doi:10.7527/S1000-6893.2025.32355

ACTA AERONAUTICA ET ASTRONAUTICA SINICA

2025, Vol. 46, Issue 21: 532355

Special Issue: 60th Anniversary of Aircraft Strength Research Institute of China

1. Aircraft Strength Research Institute of China, Xi’an 710065, China
2. National Key Laboratory of Strength and Structural Integrity, Xi’an 710065, China

E-mail: 1056900948@qq.com

Received date: 2025-06-03

Revised date: 2025-07-03

Accepted date: 2025-08-11

Online published: 2025-08-28

Supported by

National Level Project

Abstract

Semantic segmentation models based on deep convolutional networks perform well in structural damage detection. In aircraft structures, however, cracks usually occupy only a small fraction of the image, and repeated convolution and pooling operations can discard crack information, which seriously reduces segmentation accuracy. This study therefore investigates Transformer-based semantic segmentation and designs a Transformer-based Model for Intelligent Crack Tracking (TICT) for surface damage detection on aeronautical structures, aiming at precise segmentation and intelligent tracking of cracks. First, an adaptive dynamic patch partitioning mechanism divides the image into patches of different sizes with varying degrees of overlap. Next, these patches are fed into a Transformer-based encoder to extract multi-scale features that capture both the context and the local details of the crack image. A lightweight multi-layer perceptron combined with attention modules then serves as the decoder and generates a crack mask image. Morphological operations are subsequently applied to the mask image to correct the connected regions of the cracks, and the result is mapped back onto the original image to obtain the exact crack areas. Repeating this procedure on images collected in real time during fatigue tests yields automated, continuous crack tracking. The TICT model is trained and tested on datasets of fatigue test images of metal components and entire aircraft. It achieves a mean Intersection over Union (mIoU) of 78.31% on test sets of crack images from various metal components and full-scale aircraft structures, which indicates that the model can accurately segment surface cracks in aviation structures with diverse structural configurations, complex backgrounds, and tiny features, and that it generalizes well and is robust.
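The mask post-processing and the evaluation metric mentioned in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not say which morphological operations are used, so a binary closing with a 3×3 structuring element is assumed here, and mIoU is assumed to be averaged over two classes (background and crack). The names `close_mask` and `mean_iou` are hypothetical.

```python
from typing import List, Sequence

Mask = List[List[int]]  # binary mask: 1 = crack pixel, 0 = background


def close_mask(mask: Mask) -> Mask:
    """Binary closing (dilation, then erosion) with an assumed 3x3 square element."""
    h, w = len(mask), len(mask[0])
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

    def pixel(y: int, x: int) -> int:
        # Pixels outside the image are treated as background.
        return mask[y][x] if 0 <= y < h and 0 <= x < w else 0

    # Dilate on a canvas padded by one pixel, so that the erosion below
    # sees the true dilation near the image border.
    # dilated[y + 1][x + 1] stores the dilation at image position (y, x).
    dilated = [
        [int(any(pixel(y + dy, x + dx) for dy, dx in offsets))
         for x in range(-1, w + 1)]
        for y in range(-1, h + 1)
    ]
    # Erode the dilated canvas and keep only the original image extent.
    return [
        [int(all(dilated[y + 1 + dy][x + 1 + dx] for dy, dx in offsets))
         for x in range(w)]
        for y in range(h)
    ]


def mean_iou(pred: Mask, truth: Mask, classes: Sequence[int] = (0, 1)) -> float:
    """Mean Intersection over Union, averaged over the classes that occur."""
    ious = []
    for c in classes:
        inter = union = 0
        for p_row, t_row in zip(pred, truth):
            for p, t in zip(p_row, t_row):
                inter += (p == c) and (t == c)
                union += (p == c) or (t == c)
        if union:  # skip a class that is absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious)


# A horizontal crack with a one-pixel gap in the predicted mask.
raw = [[0] * 7 for _ in range(5)]
raw[2] = [0, 1, 1, 0, 1, 1, 0]
truth = [[0] * 7 for _ in range(5)]
truth[2] = [0, 1, 1, 1, 1, 1, 0]

closed = close_mask(raw)
print(closed[2])                 # the gap is bridged
print(mean_iou(raw, truth))      # before closing
print(mean_iou(closed, truth))   # after closing
```

In this toy case the closing reconnects the two crack fragments into one region, which is the kind of connected-region correction the abstract describes; a real pipeline would likely rely on an image-processing library and a structuring element tuned to the crack width.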

Keywords: crack tracking; Transformer; computer vision; semantic segmentation; structural health monitoring

Cite this article

Jiaxin LI, Shuaishuai LYU, Yezi WANG, Yu YANG, Ziyue LI. Transformer-based intelligent tracking method of aviation structure surface cracks[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2025, 46(21): 532355. DOI: 10.7527/S1000-6893.2025.32355
