Parameter-Efficient Multi-classification Software Defect Detection Method Based on Pre-trained LLMs

Bibliographic Details
Published in: International journal of computational intelligence systems 2024-06, Vol.17 (1), p.1-16, Article 152
Main Authors: Wang, Xuanye; Lu, Lu; Yang, Zhanyu; Tian, Qingyan; Lin, Haisha
Format: Article
Language: English
Description
Summary: Software Defect Detection (SDD) has always been critical to the development life cycle. A stable defect detection system can not only alleviate the workload of software testers but also enhance the overall efficiency of software development. Researchers have recently proposed various artificial intelligence-based SDD methods and achieved significant advancements. However, these methods still exhibit limitations in terms of reliability and usability. Therefore, we introduce MSDD-(IA)³, a novel framework that leverages the pre-trained CodeT5+ and (IA)³ for parameter-efficient multi-classification SDD. The framework builds a detection model on pre-trained CodeT5+ to generate code representations while capturing defect-prone features. Considering the high overhead of pre-trained LLMs, we inject (IA)³ vectors into specific layers and update only these injected parameters, which reduces the training cost. Furthermore, leveraging the properties of the pre-trained CodeT5+, we design a novel feature sequence that enriches the input data by combining source code with Natural Language (NL)-based expert metrics. Our experimental results on 64K real-world Python snippets show that MSDD-(IA)³ outperforms state-of-the-art SDD methods, including PM2-CNN, in terms of F1-weighted, Recall-weighted, Precision-weighted, and Matthews Correlation Coefficient. Notably, the number of trainable parameters in MSDD-(IA)³ is only 0.04% of that of the original CodeT5+. Our experimental data and code are available at https://gitee.com/wxyzjp123/msdd-ia3/.
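Note: The summary states that (IA)³ vectors are injected into specific layers and that only those vectors are trained. The following is a minimal sketch of that general idea, not the authors' implementation: a frozen projection whose output is rescaled elementwise by a small learned vector. All names (`linear`, `ia3_linear`, `trainable_fraction`), the toy 3x4 weight, and the "tuned" scale values are hypothetical; the summary does not say which CodeT5+ layers are targeted.

```python
# Illustrative sketch of (IA)^3-style rescaling on a single frozen layer.
# Assumption: the weight matrix stands in for a pre-trained projection;
# only the scale vector would receive gradient updates during fine-tuning.

def linear(weight, x):
    """Frozen projection: weight is a list of rows, x is a list of floats."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

def ia3_linear(weight, scale, x):
    """Apply the frozen projection, then scale each output element."""
    return [s * y for s, y in zip(scale, linear(weight, x))]

def trainable_fraction(weight, scale):
    """Share of this layer's parameters that would be updated."""
    frozen = sum(len(row) for row in weight)
    return len(scale) / (frozen + len(scale))

# Toy 3x4 frozen weight (hypothetical values).
W = [[0.5, -1.0, 0.0, 2.0],
     [1.0, 1.0, 1.0, 1.0],
     [0.0, 0.25, -0.5, 0.0]]
x = [1.0, 2.0, 3.0, 4.0]

scale_init = [1.0, 1.0, 1.0]   # initialised to ones, so the layer is unchanged
scale_tuned = [2.0, 0.5, 1.0]  # illustrative values after some training

base = linear(W, x)
same = ia3_linear(W, scale_init, x)
tuned = ia3_linear(W, scale_tuned, x)
frac = trainable_fraction(W, scale_init)
```

With the scale vector initialised to ones, the adapted layer reproduces the frozen layer exactly, so training starts from the pre-trained behaviour. The trainable share in this toy layer is large only because the matrix is tiny; for realistic layer widths the vector is a very small fraction of the weights, which is consistent in spirit with the 0.04% figure reported in the summary.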
ISSN: 1875-6883
DOI: 10.1007/s44196-024-00551-3