Using Coding-Based Ensemble Learning to Improve Software Defect Prediction
Using classification methods to predict software defect proneness with static code attributes has attracted a great deal of attention. The class-imbalance characteristic of software defect data makes the prediction much more difficult; thus, a number of methods have been employed to address this problem....
Published in: | IEEE transactions on human-machine systems 2012-11, Vol.42 (6), p.1806-1817 |
---|---|
Main Authors: | Sun, Zhongbin; Song, Qinbao; Zhu, Xiaoyan |
Format: | Article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c358t-febbb81136e4f4dacbc9480b38e6a40b49c60fb98d62786a55279d6fe5f8e1d43 |
---|---|
cites | cdi_FETCH-LOGICAL-c358t-febbb81136e4f4dacbc9480b38e6a40b49c60fb98d62786a55279d6fe5f8e1d43 |
container_end_page | 1817 |
container_issue | 6 |
container_start_page | 1806 |
container_title | IEEE transactions on human-machine systems |
container_volume | 42 |
creator | Sun, Zhongbin Song, Qinbao Zhu, Xiaoyan |
description | Using classification methods to predict software defect proneness with static code attributes has attracted a great deal of attention. The class-imbalance characteristic of software defect data makes the prediction much more difficult; thus, a number of methods have been employed to address this problem. However, these conventional methods, such as sampling, cost-sensitive learning, Bagging, and Boosting, could suffer from the loss of important information, unexpected mistakes, and overfitting because they alter the original data distribution. This paper presents a novel method that first converts the imbalanced binary-class data into balanced multiclass data and then builds a defect predictor on the multiclass data with a specific coding scheme. A thorough experiment with four different types of classification algorithms, three data coding schemes, and six conventional imbalance data-handling methods was conducted over the 14 NASA datasets. The experimental results show that the proposed method with a one-against-one coding scheme is, on average, superior to the conventional methods. |
doi_str_mv | 10.1109/TSMCC.2012.2226152 |
format | article |
publisher | New York, NY: IEEE |
coden | ITCRFH |
rights | 2014 INIST-CNRS; Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Nov 2012 |
tpages | 12 |
toplevel | peer_reviewed online_resources |
fulltext | fulltext |
identifier | ISSN: 1094-6977 |
ispartof | IEEE transactions on human-machine systems, 2012-11, Vol.42 (6), p.1806-1817 |
issn | 1094-6977 2168-2291 1558-2442 2168-2305 |
language | eng |
recordid | cdi_pascalfrancis_primary_26818943 |
source | IEEE Electronic Library (IEL) Journals |
subjects | Algorithmics. Computability. Computer arithmetics ; Algorithms ; Applied sciences ; Artificial intelligence ; Boosting ; Class-imbalance data ; Classification ; Coding ; Computer programs ; Computer science; control theory; systems ; Computer systems performance. Reliability ; Data processing. List processing. Character string processing ; Defects ; Encoding ; Exact sciences and technology ; Learning ; Memory organisation. Data processing ; meta learning ; multiclassifier ; Prediction algorithms ; Predictive models ; Sampling ; Software ; Software algorithms ; software defect prediction ; Software defects ; Studies ; Theoretical computing |
title | Using Coding-Based Ensemble Learning to Improve Software Defect Prediction |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T04%3A44%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Using%20Coding-Based%20Ensemble%20Learning%20to%20Improve%20Software%20Defect%20Prediction&rft.jtitle=IEEE%20transactions%20on%20human-machine%20systems&rft.au=Sun,%20Zhongbin&rft.date=2012-11-01&rft.volume=42&rft.issue=6&rft.spage=1806&rft.epage=1817&rft.pages=1806-1817&rft.issn=1094-6977&rft.eissn=1558-2442&rft.coden=ITCRFH&rft_id=info:doi/10.1109/TSMCC.2012.2226152&rft_dat=%3Cproquest_pasca%3E2851366401%3C/proquest_pasca%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c358t-febbb81136e4f4dacbc9480b38e6a40b49c60fb98d62786a55279d6fe5f8e1d43%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1244648104&rft_id=info:pmid/&rft_ieee_id=6392473&rfr_iscdi=true |