Word2vec neural model-based technique to generate protein vectors for combating COVID-19: a machine learning approach

The world was ambushed in 2019 by the COVID-19 virus which affected the health, economy, and lifestyle of individuals worldwide. One way of combating such a public health concern is by using appropriate, rapid, and unbiased diagnostic tools for quick detection of infected people. However, a current...

Bibliographic Details
Published in: International journal of information technology (Singapore. Online) 2022, Vol.14 (7), p.3291-3299
Main Authors: Adjuik, Toby A.; Ananey-Obiri, Daniel
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c3192-441d940cc3284e35fd2107a0823f01dbe9d3dc92f339024d299050f7c52f480f3
cites cdi_FETCH-LOGICAL-c3192-441d940cc3284e35fd2107a0823f01dbe9d3dc92f339024d299050f7c52f480f3
container_end_page 3299
container_issue 7
container_start_page 3291
container_title International journal of information technology (Singapore. Online)
container_volume 14
creator Adjuik, Toby A.
Ananey-Obiri, Daniel
description The world was ambushed in 2019 by the COVID-19 virus, which affected the health, economy, and lifestyle of individuals worldwide. One way of combating such a public health concern is by using appropriate, rapid, and unbiased diagnostic tools for quick detection of infected people. However, a current dearth of bioinformatics tools necessitates modeling studies to help diagnose COVID-19 cases. Molecular-based methods for detecting COVID-19, such as the real-time reverse transcription polymerase chain reaction (rRT-PCR), are time-consuming and prone to contamination. Modern bioinformatics tools have made it possible to create large databases of protein sequences of various diseases, apply data mining techniques, and accurately diagnose diseases. However, the current sequence alignment tools that use these databases are not able to detect novel COVID-19 viral sequences due to high sequence dissimilarity. The objective of this study, therefore, was to develop models that can rapidly and accurately classify COVID-19 viral sequences using protein vectors generated by a neural word embedding technique. Five machine learning models, namely K nearest neighbor regression (KNN), support vector machine (SVM), random forest (RF), linear discriminant analysis (LDA), and logistic regression, were developed using datasets from the National Center for Biotechnology. Our results suggest that the RF model performed better than all other models, with 99% accuracy on the training dataset and 99.5% accuracy on the testing dataset. The implication of this study is that rapid detection of the COVID-19 virus in suspected cases could potentially save lives, as less time will be needed to ascertain the status of a patient.
doi_str_mv 10.1007/s41870-022-00949-2
format article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9119569</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2669505313</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3192-441d940cc3284e35fd2107a0823f01dbe9d3dc92f339024d299050f7c52f480f3</originalsourceid><addsrcrecordid>eNp9UcluFDEQbSEQiUJ-gAPykYvB5aV7zAEJTVgiRcqF5Wi57fKMUbc92N2R-HscJozgwqmseku56nXdc2CvgLHhdZWwGRhlnFPGtNSUP-rOuQKgHIA_Pr2ZPOsua40jE8B7oQZ42p0J1QOAUufd-i0Xz-_QkYRrsROZs8eJjraiJwu6fYo_ViRLJjtMWOyC5FDygjGRJlpyqSTkQlyeR7vEtCPb26_XVxT0G2LJbN0-JiQT2pLuQXto4tZ81j0Jdqp4-VAvui8f3n_efqI3tx-vt-9uqBOgOZUSvJbMOcE3EoUKvu0zWLbhIjDwI2ovvNM8CKEZl55rzRQLg1M8yA0L4qJ7e_Q9rOOM3mFa2o7mUOJsy0-TbTT_IinuzS7fGQ2gVa-bwcsHg5LbHepi5lgdTpNNmNdqeN9rxZQA0aj8SHUl11ownMYAM_eRmWNkpkVmfkdmeBO9-PuDJ8mfgBpBHAm1QWmHxXzPa0ntaP-z_QX17qI3</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2669505313</pqid></control><display><type>article</type><title>Word2vec neural model-based technique to generate protein vectors for combating COVID-19: a machine learning approach</title><source>Springer Nature</source><creator>Adjuik, Toby A. ; Ananey-Obiri, Daniel</creator><creatorcontrib>Adjuik, Toby A. ; Ananey-Obiri, Daniel</creatorcontrib><description>The world was ambushed in 2019 by the COVID-19 virus which affected the health, economy, and lifestyle of individuals worldwide. One way of combating such a public health concern is by using appropriate, rapid, and unbiased diagnostic tools for quick detection of infected people. However, a current dearth of bioinformatics tools necessitates modeling studies to help diagnose COVID-19 cases. Molecular-based methods such as the real-time reverse transcription polymerase chain reaction (rRT-PCR) for detecting COVID-19 is time consuming and prone to contamination. 
Modern bioinformatics tools have made it possible to create large databases of protein sequences of various diseases, apply data mining techniques, and accurately diagnose diseases. However, the current sequence alignment tools that use these databases are not able to detect novel COVID-19 viral sequences due to high sequence dissimilarity. The objective of this study, therefore, was to develop models that can accurately classify COVID-19 viral sequences rapidly using protein vectors generated by neural word embedding technique. Five machine learning models; K nearest neighbor regression (KNN), support vector machine (SVM), random forest (RF), Linear discriminant analysis (LDA), and Logistic regression were developed using datasets from the National Center for Biotechnology. Our results suggest, the RF model performed better than all other models on the training dataset with 99% accuracy score and 99.5% accuracy on the testing dataset. The implication of this study is that, rapid detection of the COVID-19 virus in suspected cases could potentially save lives as less time will be needed to ascertain the status of a patient.</description><identifier>ISSN: 2511-2104</identifier><identifier>EISSN: 2511-2112</identifier><identifier>DOI: 10.1007/s41870-022-00949-2</identifier><identifier>PMID: 35611155</identifier><language>eng</language><publisher>Singapore: Springer Nature Singapore</publisher><subject>Artificial Intelligence ; Computer Imaging ; Computer Science ; Image Processing and Computer Vision ; Machine Learning ; Original Research ; Pattern Recognition and Graphics ; Software Engineering ; Vision</subject><ispartof>International journal of information technology (Singapore. 
Online), 2022, Vol.14 (7), p.3291-3299</ispartof><rights>The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management 2022</rights><rights>The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management 2022.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3192-441d940cc3284e35fd2107a0823f01dbe9d3dc92f339024d299050f7c52f480f3</citedby><cites>FETCH-LOGICAL-c3192-441d940cc3284e35fd2107a0823f01dbe9d3dc92f339024d299050f7c52f480f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35611155$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Adjuik, Toby A.</creatorcontrib><creatorcontrib>Ananey-Obiri, Daniel</creatorcontrib><title>Word2vec neural model-based technique to generate protein vectors for combating COVID-19: a machine learning approach</title><title>International journal of information technology (Singapore. Online)</title><addtitle>Int. j. inf. tecnol</addtitle><addtitle>Int J Inf Technol</addtitle><description>The world was ambushed in 2019 by the COVID-19 virus which affected the health, economy, and lifestyle of individuals worldwide. One way of combating such a public health concern is by using appropriate, rapid, and unbiased diagnostic tools for quick detection of infected people. However, a current dearth of bioinformatics tools necessitates modeling studies to help diagnose COVID-19 cases. Molecular-based methods such as the real-time reverse transcription polymerase chain reaction (rRT-PCR) for detecting COVID-19 is time consuming and prone to contamination. 
Modern bioinformatics tools have made it possible to create large databases of protein sequences of various diseases, apply data mining techniques, and accurately diagnose diseases. However, the current sequence alignment tools that use these databases are not able to detect novel COVID-19 viral sequences due to high sequence dissimilarity. The objective of this study, therefore, was to develop models that can accurately classify COVID-19 viral sequences rapidly using protein vectors generated by neural word embedding technique. Five machine learning models; K nearest neighbor regression (KNN), support vector machine (SVM), random forest (RF), Linear discriminant analysis (LDA), and Logistic regression were developed using datasets from the National Center for Biotechnology. Our results suggest, the RF model performed better than all other models on the training dataset with 99% accuracy score and 99.5% accuracy on the testing dataset. The implication of this study is that, rapid detection of the COVID-19 virus in suspected cases could potentially save lives as less time will be needed to ascertain the status of a patient.</description><subject>Artificial Intelligence</subject><subject>Computer Imaging</subject><subject>Computer Science</subject><subject>Image Processing and Computer Vision</subject><subject>Machine Learning</subject><subject>Original Research</subject><subject>Pattern Recognition and Graphics</subject><subject>Software 
Engineering</subject><subject>Vision</subject><issn>2511-2104</issn><issn>2511-2112</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9UcluFDEQbSEQiUJ-gAPykYvB5aV7zAEJTVgiRcqF5Wi57fKMUbc92N2R-HscJozgwqmseku56nXdc2CvgLHhdZWwGRhlnFPGtNSUP-rOuQKgHIA_Pr2ZPOsua40jE8B7oQZ42p0J1QOAUufd-i0Xz-_QkYRrsROZs8eJjraiJwu6fYo_ViRLJjtMWOyC5FDygjGRJlpyqSTkQlyeR7vEtCPb26_XVxT0G2LJbN0-JiQT2pLuQXto4tZ81j0Jdqp4-VAvui8f3n_efqI3tx-vt-9uqBOgOZUSvJbMOcE3EoUKvu0zWLbhIjDwI2ovvNM8CKEZl55rzRQLg1M8yA0L4qJ7e_Q9rOOM3mFa2o7mUOJsy0-TbTT_IinuzS7fGQ2gVa-bwcsHg5LbHepi5lgdTpNNmNdqeN9rxZQA0aj8SHUl11ownMYAM_eRmWNkpkVmfkdmeBO9-PuDJ8mfgBpBHAm1QWmHxXzPa0ntaP-z_QX17qI3</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Adjuik, Toby A.</creator><creator>Ananey-Obiri, Daniel</creator><general>Springer Nature Singapore</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>2022</creationdate><title>Word2vec neural model-based technique to generate protein vectors for combating COVID-19: a machine learning approach</title><author>Adjuik, Toby A. 
; Ananey-Obiri, Daniel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3192-441d940cc3284e35fd2107a0823f01dbe9d3dc92f339024d299050f7c52f480f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Artificial Intelligence</topic><topic>Computer Imaging</topic><topic>Computer Science</topic><topic>Image Processing and Computer Vision</topic><topic>Machine Learning</topic><topic>Original Research</topic><topic>Pattern Recognition and Graphics</topic><topic>Software Engineering</topic><topic>Vision</topic><toplevel>online_resources</toplevel><creatorcontrib>Adjuik, Toby A.</creatorcontrib><creatorcontrib>Ananey-Obiri, Daniel</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>International journal of information technology (Singapore. Online)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Adjuik, Toby A.</au><au>Ananey-Obiri, Daniel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Word2vec neural model-based technique to generate protein vectors for combating COVID-19: a machine learning approach</atitle><jtitle>International journal of information technology (Singapore. Online)</jtitle><stitle>Int. j. inf. tecnol</stitle><addtitle>Int J Inf Technol</addtitle><date>2022</date><risdate>2022</risdate><volume>14</volume><issue>7</issue><spage>3291</spage><epage>3299</epage><pages>3291-3299</pages><issn>2511-2104</issn><eissn>2511-2112</eissn><abstract>The world was ambushed in 2019 by the COVID-19 virus which affected the health, economy, and lifestyle of individuals worldwide. 
One way of combating such a public health concern is by using appropriate, rapid, and unbiased diagnostic tools for quick detection of infected people. However, a current dearth of bioinformatics tools necessitates modeling studies to help diagnose COVID-19 cases. Molecular-based methods such as the real-time reverse transcription polymerase chain reaction (rRT-PCR) for detecting COVID-19 is time consuming and prone to contamination. Modern bioinformatics tools have made it possible to create large databases of protein sequences of various diseases, apply data mining techniques, and accurately diagnose diseases. However, the current sequence alignment tools that use these databases are not able to detect novel COVID-19 viral sequences due to high sequence dissimilarity. The objective of this study, therefore, was to develop models that can accurately classify COVID-19 viral sequences rapidly using protein vectors generated by neural word embedding technique. Five machine learning models; K nearest neighbor regression (KNN), support vector machine (SVM), random forest (RF), Linear discriminant analysis (LDA), and Logistic regression were developed using datasets from the National Center for Biotechnology. Our results suggest, the RF model performed better than all other models on the training dataset with 99% accuracy score and 99.5% accuracy on the testing dataset. The implication of this study is that, rapid detection of the COVID-19 virus in suspected cases could potentially save lives as less time will be needed to ascertain the status of a patient.</abstract><cop>Singapore</cop><pub>Springer Nature Singapore</pub><pmid>35611155</pmid><doi>10.1007/s41870-022-00949-2</doi><tpages>9</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2511-2104
ispartof International journal of information technology (Singapore. Online), 2022, Vol.14 (7), p.3291-3299
issn 2511-2104
2511-2112
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9119569
source Springer Nature
subjects Artificial Intelligence
Computer Imaging
Computer Science
Image Processing and Computer Vision
Machine Learning
Original Research
Pattern Recognition and Graphics
Software Engineering
Vision
title Word2vec neural model-based technique to generate protein vectors for combating COVID-19: a machine learning approach
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T06%3A13%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Word2vec%20neural%20model-based%20technique%20to%20generate%20protein%20vectors%20for%20combating%20COVID-19:%20a%20machine%20learning%20approach&rft.jtitle=International%20journal%20of%20information%20technology%20(Singapore.%20Online)&rft.au=Adjuik,%20Toby%20A.&rft.date=2022&rft.volume=14&rft.issue=7&rft.spage=3291&rft.epage=3299&rft.pages=3291-3299&rft.issn=2511-2104&rft.eissn=2511-2112&rft_id=info:doi/10.1007/s41870-022-00949-2&rft_dat=%3Cproquest_pubme%3E2669505313%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c3192-441d940cc3284e35fd2107a0823f01dbe9d3dc92f339024d299050f7c52f480f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2669505313&rft_id=info:pmid/35611155&rfr_iscdi=true