A Comparative Study of Protein Sequences Classification-Based Machine Learning Methods for COVID-19 Virus against HIV-1

The effective spread of COVID-19 cases in several countries produces more protein sequences that are released in genomic public sources. It provides some awareness and indications for virus classification of COVID-19 and HIV-1 that are essential for drug discovery of COVID-19. This paper reveals the...

Bibliographic Details
Published in: Applied artificial intelligence, 2021-12, Vol. 35 (15), p. 1733-1745
Main Authors: Afify, Heba M., Zanaty, Muhammad S.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c451t-3d6e980100dfa5ab2fb6f4b9dc1008b27cab04acef53c6682d8e9423d5afc8343
cites cdi_FETCH-LOGICAL-c451t-3d6e980100dfa5ab2fb6f4b9dc1008b27cab04acef53c6682d8e9423d5afc8343
container_end_page 1745
container_issue 15
container_start_page 1733
container_title Applied artificial intelligence
container_volume 35
creator Afify, Heba M.
Zanaty, Muhammad S.
description The spread of COVID-19 cases in several countries has led to more protein sequences being released in public genomic sources. These sequences give indications for classifying COVID-19 and HIV-1 viruses, which is essential for COVID-19 drug discovery. This paper shows the importance of machine learning algorithms for distinguishing the two viruses. A total of 18,476 protein sequences, 9238 for each of COVID-19 and HIV-1, are applied to the proposed model, which is based on feature extraction, data labeling, and six classifiers. Amino acid classification according to dipoles and volumes is employed as a feature extraction tool, creating eight features from the twenty amino acids with the conjoint triad (CT) method. Data labeling is employed as a coding tool using binary numbers, with zero referring to COVID-19 and one to HIV-1. The random forest (RF) model achieved the highest classification accuracy: 99.89% for eight features and 97.80% for two features. The experimental results confirmed that eight features required more computational time than two features, while the accuracy was nearly the same in the two cases. This classification strategy for COVID-19 and HIV-1 is expected to aid the prediction of protein sequences of a new virus.
doi_str_mv 10.1080/08839514.2021.1991136
format article
fullrecord <record><control><sourceid>proquest_infor</sourceid><recordid>TN_cdi_proquest_journals_2644778318</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_3d7b47db551842dfac06485ed17bf15d</doaj_id><sourcerecordid>2644778318</sourcerecordid><originalsourceid>FETCH-LOGICAL-c451t-3d6e980100dfa5ab2fb6f4b9dc1008b27cab04acef53c6682d8e9423d5afc8343</originalsourceid><addsrcrecordid>eNp9kVtvEzEQhS0EEqHwE5As8bzBXttr7xtluTRSqiIV8mrN-pI62tjB3lDl3-OQwiNPI5355sxoDkJvKVlSosh7ohTrBeXLlrR0SfueUtY9Q4valE0nuHiOFmemOUMv0atSdoQQKiVdoMdrPKT9ATLM4ZfD9_PRnnDy-FtOswsR37ufRxeNK3iYoJTgg6lkis1HKM7iWzAPITq8dpBjiFt86-aHZAv2KePhbrP61NAeb0I-FgxbCLHM-Ga1aehr9MLDVNybp3qFfnz5_H24adZ3X1fD9boxXNC5YbZzvSKUEOtBwNj6sfN87K2pkhpbaWAkHIzzgpmuU61VructswK8UYyzK7S6-NoEO33IYQ_5pBME_UdIeashz8FMTjMrRy7tKARVvK37DOm4Es5SOXoqbPV6d_E65FS_Uma9S8cc6_m67TiXUjGqKiUulMmplOz8v62U6HNe-m9e-pyXfsqrzn24zIVYn7eHx5Qnq2c4TSn7DNGEotn_LX4D2K2bqw</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2644778318</pqid></control><display><type>article</type><title>A Comparative Study of Protein Sequences Classification-Based Machine Learning Methods for COVID-19 Virus against HIV-1</title><source>EBSCOhost Business Source Ultimate</source><creator>Afify, Heba M. ; Zanaty, Muhammad S.</creator><creatorcontrib>Afify, Heba M. ; Zanaty, Muhammad S.</creatorcontrib><description>The effective spread of COVID-19 cases in several countries produces more protein sequences that are released in genomic public sources. It provides some awareness and indications for virus classification of COVID-19 and HIV-1 that are essential for drug discovery of COVID-19. This paper reveals the importance of machine learning algorithms to handle the recognition of two different viruses. 
Therefore, 18,476 protein sequences for both COVID-19 and HIV-1 and 9238 for each virus are applied to the proposed model based on feature extraction, data labeling, and six classifiers. Amino acid classification according to their dipoles and volumes is employed as a feature extraction tool based on the creation of eight features from twenty amino acids by using the conjoint triad (CT) method. The data labeling is employed as a coding tool by binary numbers refereeing zero for COVID-19 and one for HIV-1. The random forest (RF) model achieved the highest classification accuracy of 99.89% for eight features and 97.80% for two features. The experimental results significantly confirmed that eight features required more computational time than two features, but the accuracy rate was nearly similar in the two cases. This classification strategy of COVID-19 and HIV-1 will prompt the prediction of protein sequences of the new virus.</description><identifier>ISSN: 0883-9514</identifier><identifier>EISSN: 1087-6545</identifier><identifier>DOI: 10.1080/08839514.2021.1991136</identifier><language>eng</language><publisher>Philadelphia: Taylor &amp; Francis</publisher><subject>Accuracy ; Algorithms ; Amino acids ; Classification ; Comparative studies ; Computing time ; Coronaviruses ; COVID-19 ; Dipoles ; Feature extraction ; Labeling ; Machine learning ; Proteins ; Viral diseases ; Viruses</subject><ispartof>Applied artificial intelligence, 2021-12, Vol.35 (15), p.1733-1745</ispartof><rights>2021 Taylor &amp; Francis 2021</rights><rights>2021 Taylor &amp; 
Francis</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c451t-3d6e980100dfa5ab2fb6f4b9dc1008b27cab04acef53c6682d8e9423d5afc8343</citedby><cites>FETCH-LOGICAL-c451t-3d6e980100dfa5ab2fb6f4b9dc1008b27cab04acef53c6682d8e9423d5afc8343</cites><orcidid>0000-0002-6279-0883</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,777,781,27905,27906</link.rule.ids></links><search><creatorcontrib>Afify, Heba M.</creatorcontrib><creatorcontrib>Zanaty, Muhammad S.</creatorcontrib><title>A Comparative Study of Protein Sequences Classification-Based Machine Learning Methods for COVID-19 Virus against HIV-1</title><title>Applied artificial intelligence</title><description>The effective spread of COVID-19 cases in several countries produces more protein sequences that are released in genomic public sources. It provides some awareness and indications for virus classification of COVID-19 and HIV-1 that are essential for drug discovery of COVID-19. This paper reveals the importance of machine learning algorithms to handle the recognition of two different viruses. Therefore, 18,476 protein sequences for both COVID-19 and HIV-1 and 9238 for each virus are applied to the proposed model based on feature extraction, data labeling, and six classifiers. Amino acid classification according to their dipoles and volumes is employed as a feature extraction tool based on the creation of eight features from twenty amino acids by using the conjoint triad (CT) method. The data labeling is employed as a coding tool by binary numbers refereeing zero for COVID-19 and one for HIV-1. The random forest (RF) model achieved the highest classification accuracy of 99.89% for eight features and 97.80% for two features. 
The experimental results significantly confirmed that eight features required more computational time than two features, but the accuracy rate was nearly similar in the two cases. This classification strategy of COVID-19 and HIV-1 will prompt the prediction of protein sequences of the new virus.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Amino acids</subject><subject>Classification</subject><subject>Comparative studies</subject><subject>Computing time</subject><subject>Coronaviruses</subject><subject>COVID-19</subject><subject>Dipoles</subject><subject>Feature extraction</subject><subject>Labeling</subject><subject>Machine learning</subject><subject>Proteins</subject><subject>Viral diseases</subject><subject>Viruses</subject><issn>0883-9514</issn><issn>1087-6545</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>DOA</sourceid><recordid>eNp9kVtvEzEQhS0EEqHwE5As8bzBXttr7xtluTRSqiIV8mrN-pI62tjB3lDl3-OQwiNPI5355sxoDkJvKVlSosh7ohTrBeXLlrR0SfueUtY9Q4valE0nuHiOFmemOUMv0atSdoQQKiVdoMdrPKT9ATLM4ZfD9_PRnnDy-FtOswsR37ufRxeNK3iYoJTgg6lkis1HKM7iWzAPITq8dpBjiFt86-aHZAv2KePhbrP61NAeb0I-FgxbCLHM-Ga1aehr9MLDVNybp3qFfnz5_H24adZ3X1fD9boxXNC5YbZzvSKUEOtBwNj6sfN87K2pkhpbaWAkHIzzgpmuU61VructswK8UYyzK7S6-NoEO33IYQ_5pBME_UdIeashz8FMTjMrRy7tKARVvK37DOm4Es5SOXoqbPV6d_E65FS_Uma9S8cc6_m67TiXUjGqKiUulMmplOz8v62U6HNe-m9e-pyXfsqrzn24zIVYn7eHx5Qnq2c4TSn7DNGEotn_LX4D2K2bqw</recordid><startdate>20211215</startdate><enddate>20211215</enddate><creator>Afify, Heba M.</creator><creator>Zanaty, Muhammad S.</creator><general>Taylor &amp; Francis</general><general>Taylor &amp; Francis Ltd</general><general>Taylor &amp; Francis 
Group</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-6279-0883</orcidid></search><sort><creationdate>20211215</creationdate><title>A Comparative Study of Protein Sequences Classification-Based Machine Learning Methods for COVID-19 Virus against HIV-1</title><author>Afify, Heba M. ; Zanaty, Muhammad S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c451t-3d6e980100dfa5ab2fb6f4b9dc1008b27cab04acef53c6682d8e9423d5afc8343</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Amino acids</topic><topic>Classification</topic><topic>Comparative studies</topic><topic>Computing time</topic><topic>Coronaviruses</topic><topic>COVID-19</topic><topic>Dipoles</topic><topic>Feature extraction</topic><topic>Labeling</topic><topic>Machine learning</topic><topic>Proteins</topic><topic>Viral diseases</topic><topic>Viruses</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Afify, Heba M.</creatorcontrib><creatorcontrib>Zanaty, Muhammad S.</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>Applied artificial intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Afify, 
Heba M.</au><au>Zanaty, Muhammad S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Comparative Study of Protein Sequences Classification-Based Machine Learning Methods for COVID-19 Virus against HIV-1</atitle><jtitle>Applied artificial intelligence</jtitle><date>2021-12-15</date><risdate>2021</risdate><volume>35</volume><issue>15</issue><spage>1733</spage><epage>1745</epage><pages>1733-1745</pages><issn>0883-9514</issn><eissn>1087-6545</eissn><abstract>The effective spread of COVID-19 cases in several countries produces more protein sequences that are released in genomic public sources. It provides some awareness and indications for virus classification of COVID-19 and HIV-1 that are essential for drug discovery of COVID-19. This paper reveals the importance of machine learning algorithms to handle the recognition of two different viruses. Therefore, 18,476 protein sequences for both COVID-19 and HIV-1 and 9238 for each virus are applied to the proposed model based on feature extraction, data labeling, and six classifiers. Amino acid classification according to their dipoles and volumes is employed as a feature extraction tool based on the creation of eight features from twenty amino acids by using the conjoint triad (CT) method. The data labeling is employed as a coding tool by binary numbers refereeing zero for COVID-19 and one for HIV-1. The random forest (RF) model achieved the highest classification accuracy of 99.89% for eight features and 97.80% for two features. The experimental results significantly confirmed that eight features required more computational time than two features, but the accuracy rate was nearly similar in the two cases. 
This classification strategy of COVID-19 and HIV-1 will prompt the prediction of protein sequences of the new virus.</abstract><cop>Philadelphia</cop><pub>Taylor &amp; Francis</pub><doi>10.1080/08839514.2021.1991136</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0002-6279-0883</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0883-9514
ispartof Applied artificial intelligence, 2021-12, Vol.35 (15), p.1733-1745
issn 0883-9514
1087-6545
language eng
recordid cdi_proquest_journals_2644778318
source EBSCOhost Business Source Ultimate
subjects Accuracy
Algorithms
Amino acids
Classification
Comparative studies
Computing time
Coronaviruses
COVID-19
Dipoles
Feature extraction
Labeling
Machine learning
Proteins
Viral diseases
Viruses
title A Comparative Study of Protein Sequences Classification-Based Machine Learning Methods for COVID-19 Virus against HIV-1
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T22%3A06%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_infor&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Comparative%20Study%20of%20Protein%20Sequences%20Classification-Based%20Machine%20Learning%20Methods%20for%20COVID-19%20Virus%20against%20HIV-1&rft.jtitle=Applied%20artificial%20intelligence&rft.au=Afify,%20Heba%20M.&rft.date=2021-12-15&rft.volume=35&rft.issue=15&rft.spage=1733&rft.epage=1745&rft.pages=1733-1745&rft.issn=0883-9514&rft.eissn=1087-6545&rft_id=info:doi/10.1080/08839514.2021.1991136&rft_dat=%3Cproquest_infor%3E2644778318%3C/proquest_infor%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c451t-3d6e980100dfa5ab2fb6f4b9dc1008b27cab04acef53c6682d8e9423d5afc8343%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2644778318&rft_id=info:pmid/&rfr_iscdi=true
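
The feature extraction and labeling steps summarised in the description field can be illustrated with a minimal Python sketch. It is not the authors' implementation. The seven-class amino acid grouping below is the one commonly cited for the conjoint triad method and is an assumption here, since the record does not state how the paper derives its eight features. The names `GROUPS`, `triad_features`, and `build_dataset` are hypothetical, and the short sequences in the usage note are toy strings, not real viral proteins. Only the label coding (zero for COVID-19, one for HIV-1) is taken from the description.

```python
from collections import Counter
from itertools import product

# Assumed grouping of the twenty amino acids by dipole and side-chain volume
# (the seven classes commonly cited for the conjoint triad method).
# The paper's own eight-feature scheme is not given in this record.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}

# Every ordered triple of classes is one feature: 7 * 7 * 7 = 343 triads.
TRIADS = list(product(range(len(GROUPS)), repeat=3))

# Label coding taken from the description: zero for COVID-19, one for HIV-1.
LABELS = {"COVID-19": 0, "HIV-1": 1}


def triad_features(seq):
    """Return normalised triad frequencies for one protein sequence."""
    # Map residues to class indices, skipping non-standard letters.
    classes = [CLASS_OF[aa] for aa in seq.upper() if aa in CLASS_OF]
    # Count each window of three consecutive class indices.
    counts = Counter(zip(classes, classes[1:], classes[2:]))
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TRIADS]


def build_dataset(records):
    """Turn (virus_name, sequence) pairs into feature rows and binary labels."""
    X = [triad_features(seq) for _, seq in records]
    y = [LABELS[virus] for virus, _ in records]
    return X, y
```

For example, `build_dataset([("COVID-19", "AGVC"), ("HIV-1", "RKDE")])` yields two 343-element feature rows and the labels `[0, 1]`. The resulting `X` and `y` could then be passed to any classifier, such as a random forest, which the description reports as the most accurate of the six classifiers compared.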