Scalable feature subset selection for big data using parallel hybrid evolutionary algorithm based wrapper under apache spark environment
Extant sequential wrapper-based feature subset selection (FSS) algorithms are not scalable and yield poor performance when applied to big datasets. Hence, to circumvent these challenges, we propose parallel and distributed hybrid evolutionary algorithms (EAs) based wrappers under Apache Spark. We pr...
Published in: | Cluster computing 2023-06, Vol.26 (3), p.1949-1983 |
---|---|
Main Authors: | Vivek, Yelleti ; Ravi, Vadlamani ; Krishna, P. Radha |
Format: | Article |
Language: | English |
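Subjects: | Ablation ; Adaptive algorithms ; Big Data ; Computer Communication Networks ; Computer Science ; Data mining ; Datasets ; Design ; Evolutionary algorithms ; Evolutionary computation ; Genetic algorithms ; Iterative methods ; Operating Systems ; Operators (mathematics) ; Optimization ; Parallel processing ; Permutations ; Processor Architectures |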
cited_by | cdi_FETCH-LOGICAL-c474t-a254556051449e1a726d08fda7a386b5f687c277e871991f8ea745f53d28290a3 |
---|---|
cites | cdi_FETCH-LOGICAL-c474t-a254556051449e1a726d08fda7a386b5f687c277e871991f8ea745f53d28290a3 |
container_end_page | 1983 |
container_issue | 3 |
container_start_page | 1949 |
container_title | Cluster computing |
container_volume | 26 |
creator | Vivek, Yelleti Ravi, Vadlamani Krishna, P. Radha |
description | Extant sequential wrapper-based feature subset selection (FSS) algorithms are not scalable and yield poor performance when applied to big datasets. Hence, to circumvent these challenges, we propose parallel and distributed hybrid evolutionary algorithms (EAs) based wrappers under Apache Spark. We propose two hybrid EAs based on the Binary Differential Evolution (BDE), and Binary Threshold Accepting (BTA), namely, (i) Parallel Binary Differential Evolution and Threshold Accepting (PB-DETA), where BDE and BTA work in tandem in every iteration, and (ii) its ablation variant, Parallel Binary Threshold Accepting and Differential Evolution (PB-TADE). Here, BTA is invoked to enhance the search capability and avoid premature convergence of BDE. For comparison purposes, we also parallelized two state-of-the-art algorithms: adaptive DE (ADE) and permutation based DE (DE-FSPM), and named them PB-ADE and P-DE-FSPM, respectively. Throughout, logistic regression (LR) is employed to compute the fitness function, namely, area under the receiver operating characteristic curve (AUC). The effectiveness of the proposed algorithms is tested over five big datasets of varying dimensions. It is noteworthy that PB-TADE turned out to be statistically significantly better than the rest. All the algorithms have shown the repeatability property. The proposed parallel model attained a speedup of 2.2–2.9. We also reported the feature subset with high AUC and least cardinality. |
doi_str_mv | 10.1007/s10586-022-03725-w |
format | article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9463682</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2714657251</sourcerecordid><originalsourceid>FETCH-LOGICAL-c474t-a254556051449e1a726d08fda7a386b5f687c277e871991f8ea745f53d28290a3</originalsourceid><addsrcrecordid>eNp9kctu1TAQhiMEoqXwAiyQJTZsAr472SChqlykSiyAtTVJJjkujh3s5Bz1DXhsXE4plwUbj6X55p_59VfVU0ZfMkrNq8yoanRNOa-pMFzVh3vVKVNG1EZJcb_8RWmbRpmT6lHOV5TS1vD2YXUidBnVsj2tvn_qwUPnkYwI65aQ5K3LuJKMHvvVxUDGmEjnJjLACmTLLkxkgQTeoye76y65geA--u0GhnRNwE8xuXU3kw4yDuSQYFkwkS0M5YUF-l3ZUiS-Egx7l2KYMayPqwcj-IxPbutZ9eXtxefz9_Xlx3cfzt9c1r00cq2BK6mUpopJ2SIDw_VAm3EAA8Vsp0bdmJ4bg41hbcvGBsFINSox8Ia3FMRZ9fqou2zdjENfVhcvdkluLsfbCM7-3QluZ6e4t63UQje8CLy4FUjx24Z5tbPLPXoPAeOWLTdMalXiYAV9_g96FbcUij3LW9ZwrRXTheJHqk8x54Tj3TGM2pug7TFoW4K2P4O2hzL07E8bdyO_ki2AOAK5tMKE6ffu_8j-AMCOt3c</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918266516</pqid></control><display><type>article</type><title>Scalable feature subset selection for big data using parallel hybrid evolutionary algorithm based wrapper under apache spark environment</title><source>Springer Nature</source><creator>Vivek, Yelleti ; Ravi, Vadlamani ; Krishna, P. Radha</creator><creatorcontrib>Vivek, Yelleti ; Ravi, Vadlamani ; Krishna, P. Radha</creatorcontrib><description>Extant sequential wrapper-based feature subset selection (FSS) algorithms are not scalable and yield poor performance when applied to big datasets. Hence, to circumvent these challenges, we propose parallel and distributed hybrid evolutionary algorithms (EAs) based wrappers under Apache Spark. 
We propose two hybrid EAs based on the Binary Differential Evolution (BDE), and Binary Threshold Accepting (BTA), namely, (i) Parallel Binary Differential Evolution and Threshold Accepting (PB-DETA), where BDE and BTA work in tandem in every iteration, and (ii) its ablation variant, Parallel Binary Threshold Accepting and Differential Evolution (PB-TADE). Here, BTA is invoked to enhance the search capability and avoid premature convergence of BDE. For comparison purposes, we also parallelized two state-of-the-art algorithms: adaptive DE (ADE) and permutation based DE (DE-FSPM), and named them PB-ADE and P-DE-FSPM respectively. Throughout, logistic regression (LR) is employed to compute the fitness function, namely, area under the receiver operator characteristic curve (AUC). The effectiveness of the proposed algorithms is tested over the five big datasets of varying dimensions. It is noteworthy that the PB-TADE turned out to be statistically significant than the rest. All the algorithms have shown the repeatability property. The proposed parallel model attained a speedup of 2.2–2.9. We also reported feature subset with high AUC and least cardinality.</description><identifier>ISSN: 1386-7857</identifier><identifier>EISSN: 1573-7543</identifier><identifier>DOI: 10.1007/s10586-022-03725-w</identifier><identifier>PMID: 36105649</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Ablation ; Adaptive algorithms ; Big Data ; Computer Communication Networks ; Computer Science ; Data mining ; Datasets ; Design ; Evolutionary algorithms ; Evolutionary computation ; Genetic algorithms ; Iterative methods ; Operating Systems ; Operators (mathematics) ; Optimization ; Parallel processing ; Permutations ; Processor Architectures</subject><ispartof>Cluster computing, 2023-06, Vol.26 (3), p.1949-1983</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. 
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c474t-a254556051449e1a726d08fda7a386b5f687c277e871991f8ea745f53d28290a3</citedby><cites>FETCH-LOGICAL-c474t-a254556051449e1a726d08fda7a386b5f687c277e871991f8ea745f53d28290a3</cites><orcidid>0000-0003-0082-6227</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/36105649$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Vivek, Yelleti</creatorcontrib><creatorcontrib>Ravi, Vadlamani</creatorcontrib><creatorcontrib>Krishna, P. 
Radha</creatorcontrib><title>Scalable feature subset selection for big data using parallel hybrid evolutionary algorithm based wrapper under apache spark environment</title><title>Cluster computing</title><addtitle>Cluster Comput</addtitle><addtitle>Cluster Comput</addtitle><description>Extant sequential wrapper-based feature subset selection (FSS) algorithms are not scalable and yield poor performance when applied to big datasets. Hence, to circumvent these challenges, we propose parallel and distributed hybrid evolutionary algorithms (EAs) based wrappers under Apache Spark. We propose two hybrid EAs based on the Binary Differential Evolution (BDE), and Binary Threshold Accepting (BTA), namely, (i) Parallel Binary Differential Evolution and Threshold Accepting (PB-DETA), where BDE and BTA work in tandem in every iteration, and (ii) its ablation variant, Parallel Binary Threshold Accepting and Differential Evolution (PB-TADE). Here, BTA is invoked to enhance the search capability and avoid premature convergence of BDE. For comparison purposes, we also parallelized two state-of-the-art algorithms: adaptive DE (ADE) and permutation based DE (DE-FSPM), and named them PB-ADE and P-DE-FSPM respectively. Throughout, logistic regression (LR) is employed to compute the fitness function, namely, area under the receiver operator characteristic curve (AUC). The effectiveness of the proposed algorithms is tested over the five big datasets of varying dimensions. It is noteworthy that the PB-TADE turned out to be statistically significant than the rest. All the algorithms have shown the repeatability property. The proposed parallel model attained a speedup of 2.2–2.9. We also reported feature subset with high AUC and least cardinality.</description><subject>Ablation</subject><subject>Adaptive algorithms</subject><subject>Big Data</subject><subject>Computer Communication Networks</subject><subject>Computer Science</subject><subject>Data mining</subject><subject>Datasets</subject><subject>Design</subject><subject>Evolutionary algorithms</subject><subject>Evolutionary computation</subject><subject>Genetic algorithms</subject><subject>Iterative methods</subject><subject>Operating Systems</subject><subject>Operators (mathematics)</subject><subject>Optimization</subject><subject>Parallel processing</subject><subject>Permutations</subject><subject>Processor Architectures</subject><issn>1386-7857</issn><issn>1573-7543</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp9kctu1TAQhiMEoqXwAiyQJTZsAr472SChqlykSiyAtTVJJjkujh3s5Bz1DXhsXE4plwUbj6X55p_59VfVU0ZfMkrNq8yoanRNOa-pMFzVh3vVKVNG1EZJcb_8RWmbRpmT6lHOV5TS1vD2YXUidBnVsj2tvn_qwUPnkYwI65aQ5K3LuJKMHvvVxUDGmEjnJjLACmTLLkxkgQTeoye76y65geA--u0GhnRNwE8xuXU3kw4yDuSQYFkwkS0M5YUF-l3ZUiS-Egx7l2KYMayPqwcj-IxPbutZ9eXtxefz9_Xlx3cfzt9c1r00cq2BK6mUpopJ2SIDw_VAm3EAA8Vsp0bdmJ4bg41hbcvGBsFINSox8Ia3FMRZ9fqou2zdjENfVhcvdkluLsfbCM7-3QluZ6e4t63UQje8CLy4FUjx24Z5tbPLPXoPAeOWLTdMalXiYAV9_g96FbcUij3LW9ZwrRXTheJHqk8x54Tj3TGM2pug7TFoW4K2P4O2hzL07E8bdyO_ki2AOAK5tMKE6ffu_8j-AMCOt3c</recordid><startdate>20230601</startdate><enddate>20230601</enddate><creator>Vivek, 
Yelleti</creator><creator>Ravi, Vadlamani</creator><creator>Krishna, P. Radha</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-0082-6227</orcidid></search><sort><creationdate>20230601</creationdate><title>Scalable feature subset selection for big data using parallel hybrid evolutionary algorithm based wrapper under apache spark environment</title><author>Vivek, Yelleti ; Ravi, Vadlamani ; Krishna, P. Radha</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c474t-a254556051449e1a726d08fda7a386b5f687c277e871991f8ea745f53d28290a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Ablation</topic><topic>Adaptive algorithms</topic><topic>Big Data</topic><topic>Computer Communication Networks</topic><topic>Computer Science</topic><topic>Data mining</topic><topic>Datasets</topic><topic>Design</topic><topic>Evolutionary algorithms</topic><topic>Evolutionary computation</topic><topic>Genetic algorithms</topic><topic>Iterative methods</topic><topic>Operating Systems</topic><topic>Operators (mathematics)</topic><topic>Optimization</topic><topic>Parallel processing</topic><topic>Permutations</topic><topic>Processor Architectures</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vivek, Yelleti</creatorcontrib><creatorcontrib>Ravi, Vadlamani</creatorcontrib><creatorcontrib>Krishna, P. 
Radha</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection (Proquest) (PQ_SDU_P3)</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Cluster computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vivek, Yelleti</au><au>Ravi, Vadlamani</au><au>Krishna, P. 
Radha</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Scalable feature subset selection for big data using parallel hybrid evolutionary algorithm based wrapper under apache spark environment</atitle><jtitle>Cluster computing</jtitle><stitle>Cluster Comput</stitle><addtitle>Cluster Comput</addtitle><date>2023-06-01</date><risdate>2023</risdate><volume>26</volume><issue>3</issue><spage>1949</spage><epage>1983</epage><pages>1949-1983</pages><issn>1386-7857</issn><eissn>1573-7543</eissn><abstract>Extant sequential wrapper-based feature subset selection (FSS) algorithms are not scalable and yield poor performance when applied to big datasets. Hence, to circumvent these challenges, we propose parallel and distributed hybrid evolutionary algorithms (EAs) based wrappers under Apache Spark. We propose two hybrid EAs based on the Binary Differential Evolution (BDE), and Binary Threshold Accepting (BTA), namely, (i) Parallel Binary Differential Evolution and Threshold Accepting (PB-DETA), where BDE and BTA work in tandem in every iteration, and (ii) its ablation variant, Parallel Binary Threshold Accepting and Differential Evolution (PB-TADE). Here, BTA is invoked to enhance the search capability and avoid premature convergence of BDE. For comparison purposes, we also parallelized two state-of-the-art algorithms: adaptive DE (ADE) and permutation based DE (DE-FSPM), and named them PB-ADE and P-DE-FSPM respectively. Throughout, logistic regression (LR) is employed to compute the fitness function, namely, area under the receiver operator characteristic curve (AUC). The effectiveness of the proposed algorithms is tested over the five big datasets of varying dimensions. It is noteworthy that the PB-TADE turned out to be statistically significant than the rest. All the algorithms have shown the repeatability property. The proposed parallel model attained a speedup of 2.2–2.9. We also reported feature subset with high AUC and least cardinality.</abstract><cop>New York</cop><pub>Springer US</pub><pmid>36105649</pmid><doi>10.1007/s10586-022-03725-w</doi><tpages>35</tpages><orcidid>https://orcid.org/0000-0003-0082-6227</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1386-7857 |
ispartof | Cluster computing, 2023-06, Vol.26 (3), p.1949-1983 |
issn | 1386-7857 1573-7543 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9463682 |
source | Springer Nature |
subjects | Ablation Adaptive algorithms Big Data Computer Communication Networks Computer Science Data mining Datasets Design Evolutionary algorithms Evolutionary computation Genetic algorithms Iterative methods Operating Systems Operators (mathematics) Optimization Parallel processing Permutations Processor Architectures |
title | Scalable feature subset selection for big data using parallel hybrid evolutionary algorithm based wrapper under apache spark environment |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T04%3A37%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Scalable%20feature%20subset%20selection%20for%20big%20data%20using%20parallel%20hybrid%20evolutionary%20algorithm%20based%20wrapper%20under%20apache%20spark%20environment&rft.jtitle=Cluster%20computing&rft.au=Vivek,%20Yelleti&rft.date=2023-06-01&rft.volume=26&rft.issue=3&rft.spage=1949&rft.epage=1983&rft.pages=1949-1983&rft.issn=1386-7857&rft.eissn=1573-7543&rft_id=info:doi/10.1007/s10586-022-03725-w&rft_dat=%3Cproquest_pubme%3E2714657251%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c474t-a254556051449e1a726d08fda7a386b5f687c277e871991f8ea745f53d28290a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2918266516&rft_id=info:pmid/36105649&rfr_iscdi=true |