Loading…
On the use of textual feature extraction techniques to support the automated detection of refactoring documentation
Refactoring is the art of improving the internal structure of a program without altering its external behavior, and it is an important task when it comes to software maintainability. While existing studies have focused on the detection of refactoring operations by mining software repositories, littl...
Saved in:
Published in: | Innovations in systems and software engineering 2022-06, Vol.18 (2), p.233-249 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Refactoring is the art of improving the internal structure of a program without altering its external behavior, and it is an important task when it comes to software maintainability. While existing studies have focused on the detection of refactoring operations by mining software repositories, little was done to understand how developers document their refactoring activities. Therefore, there is recent trend trying to detect developers documentation of refactoring, by manually analyzing their internal and external software documentation. However, these techniques are limited by their manual process, which hinders their scalability. Hence, in this study, we tackle the detection of refactoring documentation as binary classification problem. We focus on the automatic detection of refactoring activities in commit messages by relying on text-mining, natural language preprocessing, and supervised machine learning techniques. We design our tool to overcome the limitation of the manual process, previously proposed by existing studies, through exploring the transformation of commit messages into features that are used to train various models. For our evaluation, we use and compare five different binary classification algorithms, and we test the effectiveness of these models using an existing dataset of manually curated messages that are known to be documenting refactoring activities in the source code. The experiments are carried out with different data sizes and number of bits. As per our results, the combination of Chi-Squared with Bayes point machine and Fisher score with Bayes point machine could be the most efficient when it comes to automatically identifying refactoring text patterns in commit messages, with an accuracy of 0.96, and an
F
-score of 0.96. |
---|---|
ISSN: | 1614-5046 1614-5054 |
DOI: | 10.1007/s11334-021-00388-5 |