On the use of textual feature extraction techniques to support the automated detection of refactoring documentation
Refactoring is the art of improving the internal structure of a program without altering its external behavior, and it is an important task when it comes to software maintainability. While existing studies have focused on the detection of refactoring operations by mining software repositories, littl...
Published in: | Innovations in systems and software engineering 2022-06, Vol.18 (2), p.233-249 |
---|---|
Main Authors: | Marmolejos, Licelot ; AlOmar, Eman Abdullah ; Mkaouer, Mohamed Wiem ; Newman, Christian ; Ouni, Ali |
Format: | Article |
Language: | English |
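Subjects: | Algorithms ; Artificial Intelligence ; Classification ; Computer Applications ; Computer Science ; Data mining ; Documentation ; Feature extraction ; Machine learning ; Maintainability ; Messages ; Natural language processing ; S.i. : Acitsep ; Software Engineering ; Source code |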
cited_by | cdi_FETCH-LOGICAL-c319t-2e2d1167247ccdf7b28ea414b93c9e220fde26574816817ac3f4e6973631554c3 |
---|---|
cites | cdi_FETCH-LOGICAL-c319t-2e2d1167247ccdf7b28ea414b93c9e220fde26574816817ac3f4e6973631554c3 |
container_end_page | 249 |
container_issue | 2 |
container_start_page | 233 |
container_title | Innovations in systems and software engineering |
container_volume | 18 |
creator | Marmolejos, Licelot AlOmar, Eman Abdullah Mkaouer, Mohamed Wiem Newman, Christian Ouni, Ali |
description | Refactoring is the art of improving the internal structure of a program without altering its external behavior, and it is an important task when it comes to software maintainability. While existing studies have focused on the detection of refactoring operations by mining software repositories, little has been done to understand how developers document their refactoring activities. Therefore, there is a recent trend of trying to detect developers' documentation of refactoring by manually analyzing their internal and external software documentation. However, these techniques are limited by their manual process, which hinders their scalability. Hence, in this study, we tackle the detection of refactoring documentation as a binary classification problem. We focus on the automatic detection of refactoring activities in commit messages by relying on text-mining, natural language preprocessing, and supervised machine learning techniques. We design our tool to overcome the limitation of the manual process previously proposed by existing studies, by exploring the transformation of commit messages into features that are used to train various models. For our evaluation, we use and compare five different binary classification algorithms, and we test the effectiveness of these models using an existing dataset of manually curated messages that are known to document refactoring activities in the source code. The experiments are carried out with different data sizes and numbers of bits. As per our results, the combination of Chi-Squared with Bayes point machine and Fisher score with Bayes point machine could be the most efficient when it comes to automatically identifying refactoring text patterns in commit messages, with an accuracy of 0.96 and an F-score of 0.96. |
doi_str_mv | 10.1007/s11334-021-00388-5 |
format | article |
publisher | London: Springer London |
rights | The Author(s), under exclusive licence to Springer-Verlag London Ltd. part of Springer Nature 2021 |
orcid | https://orcid.org/0000-0001-6010-7561 |
addtitle | Innovations Syst Softw Eng |
date | 2022-06-01 |
tpages | 17 |
toplevel | peer_reviewed online_resources |
fulltext | fulltext |
identifier | ISSN: 1614-5046 |
ispartof | Innovations in systems and software engineering, 2022-06, Vol.18 (2), p.233-249 |
issn | 1614-5046 1614-5054 |
language | eng |
recordid | cdi_proquest_journals_2681495749 |
source | Springer Nature |
subjects | Algorithms Artificial Intelligence Classification Computer Applications Computer Science Data mining Documentation Feature extraction Machine learning Maintainability Messages Natural language processing S.i. : Acitsep Software Engineering Source code |
title | On the use of textual feature extraction techniques to support the automated detection of refactoring documentation |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T17%3A15%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20the%20use%20of%20textual%20feature%20extraction%20techniques%20to%20support%20the%20automated%20detection%20of%20refactoring%20documentation&rft.jtitle=Innovations%20in%20systems%20and%20software%20engineering&rft.au=Marmolejos,%20Licelot&rft.date=2022-06-01&rft.volume=18&rft.issue=2&rft.spage=233&rft.epage=249&rft.pages=233-249&rft.issn=1614-5046&rft.eissn=1614-5054&rft_id=info:doi/10.1007/s11334-021-00388-5&rft_dat=%3Cproquest_cross%3E2681495749%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c319t-2e2d1167247ccdf7b28ea414b93c9e220fde26574816817ac3f4e6973631554c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2681495749&rft_id=info:pmid/&rfr_iscdi=true |
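
The description field above outlines a pipeline: commit messages are turned into textual features, a feature selection score such as Chi-Squared ranks them, and a binary classifier is evaluated by accuracy and F-score. The following is a minimal sketch of that idea, not the authors' implementation. The commit messages are invented toy data, all function names are hypothetical, and a simple keyword rule stands in for the Bayes point machine, which this sketch does not implement.

```python
# Toy, invented commit messages: label 1 = documents a refactoring, 0 = does not.
COMMITS = [
    ("refactor parser into smaller methods", 1),
    ("extract method from login handler", 1),
    ("rename class and refactor imports", 1),
    ("move helper and extract interface", 1),
    ("fix crash on empty input", 0),
    ("add unit tests for parser", 0),
    ("update readme and changelog", 0),
    ("fix typo in error message", 0),
]


def tokenize(message):
    """Lowercase whitespace tokenization; each message becomes a set of terms."""
    return set(message.lower().split())


def contingency(term, docs):
    """Counts for the 2x2 table: term present/absent versus label 1/0."""
    a = sum(1 for toks, y in docs if term in toks and y == 1)
    b = sum(1 for toks, y in docs if term in toks and y == 0)
    c = sum(1 for toks, y in docs if term not in toks and y == 1)
    d = sum(1 for toks, y in docs if term not in toks and y == 0)
    return a, b, c, d


def chi_squared(term, docs):
    """Chi-squared statistic of the 2x2 term/label contingency table."""
    a, b, c, d = contingency(term, docs)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_terms(docs, k):
    """Top-k terms by chi-squared, keeping only terms associated with label 1."""
    vocab = set().union(*(toks for toks, _ in docs))
    positive = []
    for term in vocab:
        a, b, c, d = contingency(term, docs)
        if a * d > b * c:  # term is more frequent in refactoring messages
            positive.append(term)
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(positive, key=lambda t: (-chi_squared(t, docs), t))
    return ranked[:k]


def predict(message, terms):
    """Stand-in classifier (assumption): flag a message containing any selected term."""
    return 1 if tokenize(message) & set(terms) else 0


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F-score (F1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    f1 = 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1


DOCS = [(tokenize(message), label) for message, label in COMMITS]
TERMS = select_terms(DOCS, k=2)
PREDICTIONS = [predict(message, TERMS) for message, _ in COMMITS]
ACCURACY, F1 = accuracy_and_f1([label for _, label in COMMITS], PREDICTIONS)
```

The scores here are computed on the same toy messages used to pick the terms, so they only illustrate the mechanics and say nothing about the 0.96 figures reported in the description, which come from the authors' curated dataset and classifiers.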