On the use of textual feature extraction techniques to support the automated detection of refactoring documentation

Refactoring is the art of improving the internal structure of a program without altering its external behavior, and it is an important task when it comes to software maintainability. While existing studies have focused on the detection of refactoring operations by mining software repositories, littl...

Bibliographic Details
Published in: Innovations in systems and software engineering 2022-06, Vol.18 (2), p.233-249
Main Authors: Marmolejos, Licelot; AlOmar, Eman Abdullah; Mkaouer, Mohamed Wiem; Newman, Christian; Ouni, Ali
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c319t-2e2d1167247ccdf7b28ea414b93c9e220fde26574816817ac3f4e6973631554c3
cites cdi_FETCH-LOGICAL-c319t-2e2d1167247ccdf7b28ea414b93c9e220fde26574816817ac3f4e6973631554c3
container_end_page 249
container_issue 2
container_start_page 233
container_title Innovations in systems and software engineering
container_volume 18
creator Marmolejos, Licelot
AlOmar, Eman Abdullah
Mkaouer, Mohamed Wiem
Newman, Christian
Ouni, Ali
description Refactoring is the art of improving the internal structure of a program without altering its external behavior, and it is an important task when it comes to software maintainability. While existing studies have focused on the detection of refactoring operations by mining software repositories, little was done to understand how developers document their refactoring activities. Therefore, there is a recent trend of trying to detect developers' documentation of refactoring by manually analyzing their internal and external software documentation. However, these techniques are limited by their manual process, which hinders their scalability. Hence, in this study, we tackle the detection of refactoring documentation as a binary classification problem. We focus on the automatic detection of refactoring activities in commit messages by relying on text-mining, natural language preprocessing, and supervised machine learning techniques. We design our tool to overcome the limitation of the manual process, previously proposed by existing studies, through exploring the transformation of commit messages into features that are used to train various models. For our evaluation, we use and compare five different binary classification algorithms, and we test the effectiveness of these models using an existing dataset of manually curated messages that are known to be documenting refactoring activities in the source code. The experiments are carried out with different data sizes and number of bits. As per our results, the combination of Chi-Squared with Bayes point machine and Fisher score with Bayes point machine could be the most efficient when it comes to automatically identifying refactoring text patterns in commit messages, with an accuracy of 0.96 and an F-score of 0.96.
doi_str_mv 10.1007/s11334-021-00388-5
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2681495749</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2681495749</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-2e2d1167247ccdf7b28ea414b93c9e220fde26574816817ac3f4e6973631554c3</originalsourceid><addsrcrecordid>eNp9kEtLxDAYRYMoOI7-AVcB19W82y5l8AUDs9F1yKRfnQ7TpuYB-u9Np6I7V0nIOTdfLkLXlNxSQsq7QCnnoiCMFoTwqirkCVpQRUUhiRSnv3uhztFFCHtCpJKKL1DYDDjuAKcA2LU4wmdM5oBbMDF5wPnojY2dyxTY3dB9JAg4OhzSODofj65J0fUmQoMbyNSRzlke2qw63w3vuHE29TBEM11eorPWHAJc_axL9Pb48Lp6Ltabp5fV_bqwnNaxYMAaSlXJRGlt05ZbVoERVGxrbmtgjLQNMCVLUVFV0dJY3gpQdckVp1IKy5foZs4dvZvmjnrvkh_yk5plQ9TZrTPFZsp6F0IeWo--643_0pToqVw9l6tzufpYrpZZ4rMUxul_4P-i_7G-ATlbfmY</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2681495749</pqid></control><display><type>article</type><title>On the use of textual feature extraction techniques to support the automated detection of refactoring documentation</title><source>Springer Nature</source><creator>Marmolejos, Licelot ; AlOmar, Eman Abdullah ; Mkaouer, Mohamed Wiem ; Newman, Christian ; Ouni, Ali</creator><creatorcontrib>Marmolejos, Licelot ; AlOmar, Eman Abdullah ; Mkaouer, Mohamed Wiem ; Newman, Christian ; Ouni, Ali</creatorcontrib><description>Refactoring is the art of improving the internal structure of a program without altering its external behavior, and it is an important task when it comes to software maintainability. While existing studies have focused on the detection of refactoring operations by mining software repositories, little was done to understand how developers document their refactoring activities. Therefore, there is recent trend trying to detect developers documentation of refactoring, by manually analyzing their internal and external software documentation. 
However, these techniques are limited by their manual process, which hinders their scalability. Hence, in this study, we tackle the detection of refactoring documentation as binary classification problem. We focus on the automatic detection of refactoring activities in commit messages by relying on text-mining, natural language preprocessing, and supervised machine learning techniques. We design our tool to overcome the limitation of the manual process, previously proposed by existing studies, through exploring the transformation of commit messages into features that are used to train various models. For our evaluation, we use and compare five different binary classification algorithms, and we test the effectiveness of these models using an existing dataset of manually curated messages that are known to be documenting refactoring activities in the source code. The experiments are carried out with different data sizes and number of bits. As per our results, the combination of Chi-Squared with Bayes point machine and Fisher score with Bayes point machine could be the most efficient when it comes to automatically identifying refactoring text patterns in commit messages, with an accuracy of 0.96, and an F -score of 0.96.</description><identifier>ISSN: 1614-5046</identifier><identifier>EISSN: 1614-5054</identifier><identifier>DOI: 10.1007/s11334-021-00388-5</identifier><language>eng</language><publisher>London: Springer London</publisher><subject>Algorithms ; Artificial Intelligence ; Classification ; Computer Applications ; Computer Science ; Data mining ; Documentation ; Feature extraction ; Machine learning ; Maintainability ; Messages ; Natural language processing ; S.i. : Acitsep ; Software Engineering ; Source code</subject><ispartof>Innovations in systems and software engineering, 2022-06, Vol.18 (2), p.233-249</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag London Ltd. 
part of Springer Nature 2021</rights><rights>The Author(s), under exclusive licence to Springer-Verlag London Ltd. part of Springer Nature 2021.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-2e2d1167247ccdf7b28ea414b93c9e220fde26574816817ac3f4e6973631554c3</citedby><cites>FETCH-LOGICAL-c319t-2e2d1167247ccdf7b28ea414b93c9e220fde26574816817ac3f4e6973631554c3</cites><orcidid>0000-0001-6010-7561</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Marmolejos, Licelot</creatorcontrib><creatorcontrib>AlOmar, Eman Abdullah</creatorcontrib><creatorcontrib>Mkaouer, Mohamed Wiem</creatorcontrib><creatorcontrib>Newman, Christian</creatorcontrib><creatorcontrib>Ouni, Ali</creatorcontrib><title>On the use of textual feature extraction techniques to support the automated detection of refactoring documentation</title><title>Innovations in systems and software engineering</title><addtitle>Innovations Syst Softw Eng</addtitle><description>Refactoring is the art of improving the internal structure of a program without altering its external behavior, and it is an important task when it comes to software maintainability. While existing studies have focused on the detection of refactoring operations by mining software repositories, little was done to understand how developers document their refactoring activities. Therefore, there is recent trend trying to detect developers documentation of refactoring, by manually analyzing their internal and external software documentation. However, these techniques are limited by their manual process, which hinders their scalability. Hence, in this study, we tackle the detection of refactoring documentation as binary classification problem. 
We focus on the automatic detection of refactoring activities in commit messages by relying on text-mining, natural language preprocessing, and supervised machine learning techniques. We design our tool to overcome the limitation of the manual process, previously proposed by existing studies, through exploring the transformation of commit messages into features that are used to train various models. For our evaluation, we use and compare five different binary classification algorithms, and we test the effectiveness of these models using an existing dataset of manually curated messages that are known to be documenting refactoring activities in the source code. The experiments are carried out with different data sizes and number of bits. As per our results, the combination of Chi-Squared with Bayes point machine and Fisher score with Bayes point machine could be the most efficient when it comes to automatically identifying refactoring text patterns in commit messages, with an accuracy of 0.96, and an F -score of 0.96.</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Classification</subject><subject>Computer Applications</subject><subject>Computer Science</subject><subject>Data mining</subject><subject>Documentation</subject><subject>Feature extraction</subject><subject>Machine learning</subject><subject>Maintainability</subject><subject>Messages</subject><subject>Natural language processing</subject><subject>S.i. 
: Acitsep</subject><subject>Software Engineering</subject><subject>Source code</subject><issn>1614-5046</issn><issn>1614-5054</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kEtLxDAYRYMoOI7-AVcB19W82y5l8AUDs9F1yKRfnQ7TpuYB-u9Np6I7V0nIOTdfLkLXlNxSQsq7QCnnoiCMFoTwqirkCVpQRUUhiRSnv3uhztFFCHtCpJKKL1DYDDjuAKcA2LU4wmdM5oBbMDF5wPnojY2dyxTY3dB9JAg4OhzSODofj65J0fUmQoMbyNSRzlke2qw63w3vuHE29TBEM11eorPWHAJc_axL9Pb48Lp6Ltabp5fV_bqwnNaxYMAaSlXJRGlt05ZbVoERVGxrbmtgjLQNMCVLUVFV0dJY3gpQdckVp1IKy5foZs4dvZvmjnrvkh_yk5plQ9TZrTPFZsp6F0IeWo--643_0pToqVw9l6tzufpYrpZZ4rMUxul_4P-i_7G-ATlbfmY</recordid><startdate>20220601</startdate><enddate>20220601</enddate><creator>Marmolejos, Licelot</creator><creator>AlOmar, Eman Abdullah</creator><creator>Mkaouer, Mohamed Wiem</creator><creator>Newman, Christian</creator><creator>Ouni, Ali</creator><general>Springer London</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-6010-7561</orcidid></search><sort><creationdate>20220601</creationdate><title>On the use of textual feature extraction techniques to support the automated detection of refactoring documentation</title><author>Marmolejos, Licelot ; AlOmar, Eman Abdullah ; Mkaouer, Mohamed Wiem ; Newman, Christian ; Ouni, Ali</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-2e2d1167247ccdf7b28ea414b93c9e220fde26574816817ac3f4e6973631554c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Classification</topic><topic>Computer Applications</topic><topic>Computer Science</topic><topic>Data mining</topic><topic>Documentation</topic><topic>Feature extraction</topic><topic>Machine learning</topic><topic>Maintainability</topic><topic>Messages</topic><topic>Natural 
language processing</topic><topic>S.i. : Acitsep</topic><topic>Software Engineering</topic><topic>Source code</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Marmolejos, Licelot</creatorcontrib><creatorcontrib>AlOmar, Eman Abdullah</creatorcontrib><creatorcontrib>Mkaouer, Mohamed Wiem</creatorcontrib><creatorcontrib>Newman, Christian</creatorcontrib><creatorcontrib>Ouni, Ali</creatorcontrib><collection>CrossRef</collection><jtitle>Innovations in systems and software engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Marmolejos, Licelot</au><au>AlOmar, Eman Abdullah</au><au>Mkaouer, Mohamed Wiem</au><au>Newman, Christian</au><au>Ouni, Ali</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>On the use of textual feature extraction techniques to support the automated detection of refactoring documentation</atitle><jtitle>Innovations in systems and software engineering</jtitle><stitle>Innovations Syst Softw Eng</stitle><date>2022-06-01</date><risdate>2022</risdate><volume>18</volume><issue>2</issue><spage>233</spage><epage>249</epage><pages>233-249</pages><issn>1614-5046</issn><eissn>1614-5054</eissn><abstract>Refactoring is the art of improving the internal structure of a program without altering its external behavior, and it is an important task when it comes to software maintainability. While existing studies have focused on the detection of refactoring operations by mining software repositories, little was done to understand how developers document their refactoring activities. Therefore, there is recent trend trying to detect developers documentation of refactoring, by manually analyzing their internal and external software documentation. However, these techniques are limited by their manual process, which hinders their scalability. 
Hence, in this study, we tackle the detection of refactoring documentation as binary classification problem. We focus on the automatic detection of refactoring activities in commit messages by relying on text-mining, natural language preprocessing, and supervised machine learning techniques. We design our tool to overcome the limitation of the manual process, previously proposed by existing studies, through exploring the transformation of commit messages into features that are used to train various models. For our evaluation, we use and compare five different binary classification algorithms, and we test the effectiveness of these models using an existing dataset of manually curated messages that are known to be documenting refactoring activities in the source code. The experiments are carried out with different data sizes and number of bits. As per our results, the combination of Chi-Squared with Bayes point machine and Fisher score with Bayes point machine could be the most efficient when it comes to automatically identifying refactoring text patterns in commit messages, with an accuracy of 0.96, and an F -score of 0.96.</abstract><cop>London</cop><pub>Springer London</pub><doi>10.1007/s11334-021-00388-5</doi><tpages>17</tpages><orcidid>https://orcid.org/0000-0001-6010-7561</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1614-5046
ispartof Innovations in systems and software engineering, 2022-06, Vol.18 (2), p.233-249
issn 1614-5046
1614-5054
language eng
recordid cdi_proquest_journals_2681495749
source Springer Nature
subjects Algorithms
Artificial Intelligence
Classification
Computer Applications
Computer Science
Data mining
Documentation
Feature extraction
Machine learning
Maintainability
Messages
Natural language processing
S.i. : Acitsep
Software Engineering
Source code
title On the use of textual feature extraction techniques to support the automated detection of refactoring documentation
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T17%3A15%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20the%20use%20of%20textual%20feature%20extraction%20techniques%20to%20support%20the%20automated%20detection%20of%20refactoring%20documentation&rft.jtitle=Innovations%20in%20systems%20and%20software%20engineering&rft.au=Marmolejos,%20Licelot&rft.date=2022-06-01&rft.volume=18&rft.issue=2&rft.spage=233&rft.epage=249&rft.pages=233-249&rft.issn=1614-5046&rft.eissn=1614-5054&rft_id=info:doi/10.1007/s11334-021-00388-5&rft_dat=%3Cproquest_cross%3E2681495749%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c319t-2e2d1167247ccdf7b28ea414b93c9e220fde26574816817ac3f4e6973631554c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2681495749&rft_id=info:pmid/&rfr_iscdi=true
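
Illustrative note: the description above mentions ranking commit-message terms with Chi-Squared before training a binary classifier. The record does not say how the authors implemented this, so the following is only a minimal sketch of the standard 2x2 Chi-Squared term score, not the paper's tooling. The commit messages, labels, and the function names `tokenize` and `chi2_scores` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: 1 = message documents a refactoring, 0 = it does not.
MESSAGES = [
    "refactor payment module",
    "refactor login code",
    "extract method refactor",
    "fix login bug",
    "add payment tests",
    "update readme",
]
LABELS = [1, 1, 1, 0, 0, 0]


def tokenize(message):
    """Lowercase and split on whitespace; return the set of distinct terms."""
    return set(message.lower().split())


def chi2_scores(messages, labels):
    """Score each term with the 2x2 Chi-Squared statistic against the label."""
    n = len(messages)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pos_df, neg_df = Counter(), Counter()  # document frequency per class
    for msg, label in zip(messages, labels):
        for term in tokenize(msg):
            (pos_df if label else neg_df)[term] += 1

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # positive messages containing the term
        b = neg_df[term]   # negative messages containing the term
        c = n_pos - a      # positive messages without the term
        d = n_neg - b      # negative messages without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A zero denominator means the term (or a class) carries no contrast.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


scores = chi2_scores(MESSAGES, LABELS)
for term, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:3]:
    print(f"{term}: {score:.2f}")
```

In this toy corpus the term `refactor` appears in every positive message and no negative one, so it receives the highest score, while terms spread evenly across both classes, such as `login`, score zero. A feature selector would keep the top-scoring terms and pass them to whichever classifier is being evaluated.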