Teaching Segment-Anything-Model Domain-Specific Knowledge for Road Crack Segmentation From On-Board Cameras
Published in: | IEEE transactions on intelligent transportation systems 2024-12, Vol.25 (12), p.20588-20601 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Road crack segmentation from on-board cameras is a highly desirable yet challenging task for road condition inspection and maintenance. However, existing methods trained on small-scale datasets perform poorly from such challenging perspectives because they lack sufficient prior knowledge and generalizability. To address this limitation, this paper incorporates a vision foundation model, the Segment-Anything-Model (SAM), and leverages its rich prior knowledge and strong generalizability for crack segmentation. We also construct a customized crack segmentation dataset captured from on-board cameras. Because applying SAM directly might not segment cracks correctly, lightweight, learnable crack adaptation layers are developed and integrated into SAM's image encoder; they take the patch embeddings of the input image and its high-frequency components within road regions as joint inputs. During training, the parameters of the crack adaptation layers are fine-tuned to acquire domain-specific knowledge, while the parameters of the image encoder remain frozen, preserving SAM's rich prior knowledge. Additionally, a sparse prompt generation method based on the high-frequency components within road regions is proposed, which guides SAM to focus on high-frequency regions that may contain cracks. Experimental results demonstrate that the proposed framework achieves state-of-the-art performance, improving average precision by 8.14% over Mask2Former. Furthermore, the parameter-efficient transfer learning framework significantly reduces the number of parameters requiring fine-tuning, thereby improving efficiency and reducing training costs. The dataset proposed in this paper is available at https://github.com/TRMetaGroup/CrackSeg . |
---|---|
ISSN: | 1524-9050, 1558-0016 |
DOI: | 10.1109/TITS.2024.3475371 |
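
The summary describes sparse prompts derived from high-frequency components within road regions, but it does not say how those components are computed or how points are selected. The sketch below is only a rough illustration of the general idea, under assumptions that are not from the paper: the high-frequency response is approximated by the residual between a pixel and its 3x3 neighbourhood mean, the road region is given as a boolean mask, and the prompts are the k strongest responses inside that mask. The names `high_frequency_map` and `sparse_point_prompts` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]   # grayscale intensities, row-major
Mask = List[List[bool]]     # True where the pixel belongs to the road


def high_frequency_map(img: Image) -> Image:
    """Absolute residual between each pixel and its 3x3 neighbourhood mean.

    Assumption: a box-blur residual stands in for whatever high-pass
    filter the paper actually uses.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the neighbourhood, clipped at the image border.
            vals = [
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = abs(img[y][x] - sum(vals) / len(vals))
    return out


def sparse_point_prompts(img: Image, road_mask: Mask, k: int = 3) -> List[Tuple[int, int]]:
    """Return up to k (x, y) points with the strongest response inside the road mask."""
    hf = high_frequency_map(img)
    candidates = [
        (hf[y][x], x, y)
        for y in range(len(img))
        for x in range(len(img[0]))
        if road_mask[y][x]  # ignore sky, vehicles, roadside, etc.
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(x, y) for _, x, y in candidates[:k]]


# Toy 5x5 image: bright surface with a dark vertical line in column 2.
image = [[0.8] * 5 for _ in range(5)]
for row in image:
    row[2] = 0.1
# Assume only the bottom three rows are road.
road = [[y >= 2 for _ in range(5)] for y in range(5)]

points = sparse_point_prompts(image, road, k=3)
print(sorted(points))
```

In this toy case the selected points all lie on the dark line and inside the road mask. In a real pipeline such points would be passed to a promptable segmenter as point prompts; that step is omitted here because the summary does not describe it.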