
Prompt engineering for zero‐shot and few‐shot defect detection and classification using a visual‐language pretrained model

Bibliographic Details
Published in: Computer-Aided Civil and Infrastructure Engineering, 2023-07, Vol. 38(11), pp. 1536-1554
Main Authors: Yong, Gunwoo; Jeon, Kahyun; Gil, Daeyoung; Lee, Ghang
Format: Article
Language:English
Description
Summary: Zero‐shot learning with vision‐language pretrained (VLP) models is expected to be an alternative to existing deep learning models for defect detection when datasets are insufficient. However, VLP models, including contrastive language‐image pretraining (CLIP), show performance that fluctuates with the prompts (inputs), which has motivated research on prompt engineering, the optimization of prompts to improve performance. This study therefore aims to identify the features of a prompt that yield the best performance in classifying and detecting building defects using the zero‐shot and few‐shot capabilities of CLIP. The results reveal the following: (1) domain‐specific definitions are better than general definitions and images; (2) a complete sentence is better than a set of core terms; and (3) multimodal information is better than single‐modal information. Detection using the proposed prompting method outperformed existing supervised models.
ISSN: 1093-9687, 1467-8667
DOI: 10.1111/mice.12954
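
To make the prompt-based zero-shot classification described in the summary concrete, the following is a minimal sketch using an off-the-shelf CLIP model through the Hugging Face transformers API. It is not the authors' exact pipeline: the defect classes, prompt wordings, model checkpoint, and image path are illustrative assumptions, chosen to reflect the study's finding that complete sentences with domain-specific definitions work best.

```python
# Minimal sketch: zero-shot defect classification with CLIP using
# sentence-style, domain-specific prompts. Class labels, prompt texts,
# and the image path are illustrative assumptions, not the paper's prompts.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Complete sentences with domain-specific definitions, per the study's findings.
prompts = {
    "crack": "a photo of a concrete surface with a crack, a narrow linear fracture",
    "spalling": "a photo of a concrete surface with spalling, where fragments have broken off",
    "no defect": "a photo of a concrete surface in sound condition with no visible defect",
}

image = Image.open("defect_sample.jpg")  # hypothetical input image
inputs = processor(text=list(prompts.values()), images=image,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# CLIP returns image-text similarity logits; softmax turns them into class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1).squeeze(0)
labels = list(prompts.keys())
print({label: round(p.item(), 3) for label, p in zip(labels, probs)})
print("predicted defect class:", labels[int(probs.argmax())])
```

Swapping in few-shot behavior, as studied in the article, would amount to adding a small number of labeled example images or definitions per class; the zero-shot case above relies only on the text prompts.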