
High-Performance Machine Learning for Large-Scale Data Classification considering Class Imbalance

Currently, data classification is one of the most important ways to analyze data. However, with the development of data collection, transmission, and storage technologies, the scale of data has increased sharply. In addition, because datasets contain multiple classes with imbalanced distributions, the class imbalance issue has become increasingly prominent. Traditional machine learning algorithms lack the ability to handle these issues, so classification efficiency and precision may be significantly degraded. This paper therefore presents an improved artificial neural network that enables high-performance classification of imbalanced, large-volume data. First, the Borderline-SMOTE (synthetic minority oversampling technique) algorithm is employed to balance the training dataset, which aims to improve the training of the backpropagation neural network (BPNN); then zero-mean normalization, batch normalization, and the rectified linear unit (ReLU) are employed to optimize the input and hidden layers of the BPNN. Finally, ensemble-learning-based parallelization of the improved BPNN is implemented using the Hadoop framework. The experimental results support several positive conclusions. Borderline-SMOTE balances the imbalanced training dataset, which improves training performance and classification accuracy. The improvements to the input and hidden layers also enhance convergence during training. Parallelization and ensemble learning enable the BPNN to perform high-performance, large-scale data classification. The experimental results demonstrate the effectiveness of the presented classification algorithm.
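
The abstract outlines a concrete pipeline: balance the training data with Borderline-SMOTE, normalize the inputs, train a ReLU-activated BPNN, and parallelize it as an ensemble on Hadoop. The following is a minimal, single-machine sketch of that pipeline using scikit-learn and imbalanced-learn; the synthetic dataset, network sizes, and bagging ensemble are illustrative assumptions rather than the authors' Hadoop-based implementation or their reported settings, and batch normalization inside the hidden layers is omitted because scikit-learn's MLP does not support it.

```python
# Minimal, single-machine sketch of the pipeline described in the abstract.
# All data and hyperparameters below are illustrative assumptions, not values
# taken from the paper.
from imblearn.over_sampling import BorderlineSMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced multi-class data (stands in for the paper's large-scale datasets).
X, y = make_classification(n_samples=20_000, n_features=30, n_informative=15,
                           n_classes=4, weights=[0.7, 0.15, 0.1, 0.05],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=42)

# Zero-mean (and unit-variance) normalization of the input features.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Borderline-SMOTE balances only the training set by synthesizing minority-class
# samples near the class borders.
X_train_bal, y_train_bal = BorderlineSMOTE(kind="borderline-1",
                                           random_state=42).fit_resample(X_train, y_train)

# ReLU-activated multilayer perceptron as a stand-in for the improved BPNN;
# a bagging ensemble of several such networks loosely mirrors the
# ensemble-learning parallelization (here over local processes, not Hadoop).
base_net = MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                         solver="adam", max_iter=200, random_state=42)
ensemble = BaggingClassifier(base_net, n_estimators=5, n_jobs=-1, random_state=42)
ensemble.fit(X_train_bal, y_train_bal)

print(classification_report(y_test, ensemble.predict(X_test)))
```

In the paper's setting, each ensemble member would presumably be trained on a data split distributed across the Hadoop cluster and the members' predictions combined, rather than trained as local processes as in this sketch.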

Bibliographic Details
Published in: Scientific Programming, 2020, Vol. 2020 (2020), p. 1-16
Main Authors: Wang, Xi; Chen, Xianbang; Li, Xiang; Liu, Yang; Li, Huaqiang
Format: Article
Language: English
Subjects: Accuracy; Algorithms; Artificial neural networks; Classification; Clustering; Data collection; Datasets; Efficiency; Machine learning; Neural networks; Oversampling; Parallel processing; Support vector machines; Training
DOI: 10.1155/2020/1953461
ISSN: 1058-9244
EISSN: 1875-919X
Publisher: Hindawi Publishing Corporation, Cairo, Egypt