
Duplicate Bug Report Detection Using an Attention-Based Neural Language Model

Bibliographic Details
Published in:IEEE Transactions on Reliability 2023-06, Vol.72 (2), p.846-858
Main Authors: Messaoud, Montassar Ben, Miladi, Asma, Jenhani, Ilyes, Mkaouer, Mohamed Wiem, Ghadhab, Lobna
Format: Article
Language:English
Description
Summary:Context: Users and developers use bug tracking systems to report errors that occur during the development and testing of software. The manual identification of duplicates is a tedious task, especially for software with large bug repositories. In this context, their automatic detection becomes a necessary task that can help prevent repeatedly fixing the same bug. Objective: In this article, we propose BERT-MLP, a novel pretrained language model using bidirectional encoder representations from transformers (BERT) for duplicate bug report detection (DBRD), with the aim of improving the detection rate compared to existing works. Method: Our approach considers only unstructured data. These data are fed into the BERT model in order to learn the contextual relationships between words. The output is fed into a multilayer perceptron (MLP) classifier, forming our base DBRD model. Results: Our approach was evaluated on three projects: Mozilla Firefox, Eclipse Platform, and Thunderbird. It achieved an accuracy of 92.11%, 94.08%, and 89.03%, respectively, for Mozilla, Eclipse, and Thunderbird. A comparison with a dual-channel convolutional neural network (DC-CNN) model and other pretrained models, including RoBERTa and Sentence-BERT, was conducted. Results showed that BERT-MLP outperformed the second-best performing models (DC-CNN and Sentence-BERT) by 12% in accuracy for Eclipse and by 9% for both Mozilla and Thunderbird.
ISSN:0018-9529
1558-1721
DOI:10.1109/TR.2022.3193645
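
The pipeline the abstract describes (BERT contextual embeddings fed into an MLP classifier that scores a bug-report pair as duplicate or not) can be sketched roughly as follows. This is a minimal illustration only, assuming each report pair has already been encoded into fixed-size 768-dimensional BERT [CLS] vectors; the concatenated-pair feature layout, hidden size, and weight initialization here are hypothetical, not the authors' exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer with ReLU, sigmoid output: P(pair is a duplicate)."""
    h = np.maximum(0.0, x @ W1 + b1)          # hidden layer activations
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid probability

# Hypothetical feature layout: concatenate the two reports'
# 768-d BERT [CLS] embeddings into one 1536-d input vector.
dim_in, dim_hidden = 2 * 768, 128
W1 = rng.normal(0.0, 0.02, (dim_in, dim_hidden))
b1 = np.zeros(dim_hidden)
W2 = rng.normal(0.0, 0.02, (dim_hidden, 1))
b2 = np.zeros(1)

# Stand-in for a real embedded bug-report pair (one row = one pair).
pair = rng.normal(size=(1, dim_in))
prob = mlp_forward(pair, W1, b1, W2, b2)
print(prob.item())  # duplicate probability in (0, 1)
```

In practice the embedding step would come from a pretrained BERT encoder and the MLP weights would be trained on labeled duplicate/non-duplicate pairs; the sketch only shows the shape of the classification head.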