
Retrospective analysis of the accuracy of predicting the alert level of COVID-19 in 202 countries using Google Trends and machine learning

Internet search engine data, such as Google Trends, was shown to be correlated with the incidence of COVID-19, but only in several countries. We aim to develop a model from a small number of countries to predict the epidemic alert level in all the countries worldwide. The "interest over time"...

Bibliographic Details
Published in: Journal of global health 2020-12, Vol.10 (2), p.020511-020511
Main Authors: Peng, Yuanyuan ; Li, Cuilian ; Rong, Yibiao ; Chen, Xinjian ; Chen, Haoyu
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c484t-a9a4412745563d55c2d3b7d985d6839401c73be36ef5fbc54f85011de843327a3
cites cdi_FETCH-LOGICAL-c484t-a9a4412745563d55c2d3b7d985d6839401c73be36ef5fbc54f85011de843327a3
container_end_page 020511
container_issue 2
container_start_page 020511
container_title Journal of global health
container_volume 10
creator Peng, Yuanyuan
Li, Cuilian
Rong, Yibiao
Chen, Xinjian
Chen, Haoyu
description Internet search engine data, such as Google Trends, have been shown to correlate with the incidence of COVID-19, but only in a few countries. We aimed to develop a model from a small number of countries to predict the epidemic alert level in all countries worldwide. The "interest over time" and "interest by region" Google Trends data for Coronavirus, pneumonia, and six COVID symptom-related terms were retrieved. The daily incidence of COVID-19 from 10 January to 23 April 2020 in 202 countries was retrieved from the World Health Organization. Three alert levels were defined. Ten weeks' data from 20 countries were used for training with machine learning algorithms. The features were selected according to their correlation and importance. The model was then tested on 2830 samples from 202 countries. Our model performed well in 154 (76.2%) countries, each of which had no more than four misclassified samples. In these 154 countries, the accuracy was 0.8133 and the kappa coefficient was 0.6828. Across all 202 countries, the accuracy was 0.7527 and the kappa coefficient was 0.5841. The proposed algorithm, based on Random Forest Classification and nine features, performed better than other machine learning methods and than models with different numbers of features. Our results suggest that the model developed from 20 countries with Google Trends data and Random Forest Classification can be applied to predict the epidemic alert levels of most countries worldwide.
doi_str_mv 10.7189/jogh.10.020511
format article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7567446</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2455174055</sourcerecordid><originalsourceid>FETCH-LOGICAL-c484t-a9a4412745563d55c2d3b7d985d6839401c73be36ef5fbc54f85011de843327a3</originalsourceid><addsrcrecordid>eNpdkV9rFDEUxYMottS--igBX3yZbf5OMi-CrFoLhYJUX0M2ubObZTZZk5mF_Qp-ajNsXap5SQ73d09uchB6S8lCUd3dbNN6s6iCMCIpfYEuGRGqYZ1uX57PSl-g61K2pC5FOdPta3TBOaVEduIS_f4OY05lD24MB8A22uFYQsGpx-OmauembN1x1vsMPlQsrk-lAfKIBzjAMFeXDz_vPje0wyFiRhh2aYpjDlDwVOaW25TWA-DHDNGXeo_HO-s2IUK1sDlW5A161duhwPXTfoV-fP3yuPzW3D_c3i0_3TdOaDE2trNCUKaElC33Ujrm-Ur5Tkvfat4JQp3iK-At9LJfOSl6LQmlHrTgnCnLr9DHk-9-Wu3AO6hz2sHsc9jZfDTJBvNvJYaNWaeDUbJVQrTV4MOTQU6_Jiij2YXiYBhshDQVw-poVAkiZUXf_4du05TrJ1dKdpViLVGVWpwoV6MoGfrzMJSYOWkzJz2LU9K14d3zJ5zxv7nyP8cppM4</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2595172607</pqid></control><display><type>article</type><title>Retrospective analysis of the accuracy of predicting the alert level of COVID-19 in 202 countries using Google Trends and machine learning</title><source>Publicly Available Content Database</source><source>PubMed Central</source><source>Coronavirus Research Database</source><creator>Peng, Yuanyuan ; Li, Cuilian ; Rong, Yibiao ; Chen, Xinjian ; Chen, Haoyu</creator><creatorcontrib>Peng, Yuanyuan ; Li, Cuilian ; Rong, Yibiao ; Chen, Xinjian ; Chen, Haoyu</creatorcontrib><description>Internet search engine data, such as Google Trends, was shown to be correlated with the incidence of COVID-19, but only in several countries. We aim to develop a model from a small number of countries to predict the epidemic alert level in all the countries worldwide. The "interest over time" and "interest by region" Google Trends data of Coronavirus, pneumonia, and six COVID symptom-related terms were searched. 
The daily incidence of COVID-19 from 10 January to 23 April 2020 of 202 countries was retrieved from the World Health Organization. Three alert levels were defined. Ten weeks' data from 20 countries were used for training with machine learning algorithms. The features were selected according to the correlation and importance. The model was then tested on 2830 samples of 202 countries. Our model performed well in 154 (76.2%) countries, of which each had no more than four misclassified samples. In these 154 countries, the accuracy was 0.8133, and the kappa coefficient was 0.6828. While in all 202 countries, the accuracy was 0.7527, and the kappa coefficient was 0.5841. The proposed algorithm based on Random Forest Classification and nine features performed better compared to other machine learning methods and the models with different numbers of features. Our result suggested that the model developed from 20 countries with Google Trends data and Random Forest Classification can be applied to predict the epidemic alert levels of most countries worldwide.</description><identifier>ISSN: 2047-2978</identifier><identifier>EISSN: 2047-2986</identifier><identifier>DOI: 10.7189/jogh.10.020511</identifier><identifier>PMID: 33110594</identifier><language>eng</language><publisher>Scotland: Edinburgh University Global Health Society</publisher><subject>Accuracy ; Algorithms ; Betacoronavirus ; Coronavirus Infections - epidemiology ; Coronaviruses ; Correlation analysis ; COVID-19 ; Data Accuracy ; Deep learning ; Diarrhea ; Disease transmission ; Epidemics ; Fever ; Global health ; Global Health - statistics &amp; numerical data ; Humans ; Incidence ; Infectious diseases ; Machine learning ; Machine Learning - statistics &amp; numerical data ; Models, Statistical ; Pandemics ; Pneumonia ; Pneumonia, Viral - epidemiology ; Research Theme 1: COVID-19 Pandemic ; Retrospective Studies ; SARS-CoV-2 ; Search Engine - statistics &amp; numerical data ; Trends</subject><ispartof>Journal 
of global health, 2020-12, Vol.10 (2), p.020511-020511</ispartof><rights>Copyright © 2020 by the Journal of Global Health. All rights reserved.</rights><rights>Copyright © 2020 by the Journal of Global Health. All rights reserved. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>Copyright © 2020 by the Journal of Global Health. All rights reserved. 2020</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c484t-a9a4412745563d55c2d3b7d985d6839401c73be36ef5fbc54f85011de843327a3</citedby><cites>FETCH-LOGICAL-c484t-a9a4412745563d55c2d3b7d985d6839401c73be36ef5fbc54f85011de843327a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2595172607/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2595172607?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,25732,27903,27904,36991,36992,38495,43874,44569,53769,53771,74158,74872</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33110594$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Peng, Yuanyuan</creatorcontrib><creatorcontrib>Li, Cuilian</creatorcontrib><creatorcontrib>Rong, Yibiao</creatorcontrib><creatorcontrib>Chen, Xinjian</creatorcontrib><creatorcontrib>Chen, Haoyu</creatorcontrib><title>Retrospective analysis of the accuracy of predicting the alert level of COVID-19 in 202 countries using Google Trends and machine learning</title><title>Journal of global health</title><addtitle>J Glob 
Health</addtitle><description>Internet search engine data, such as Google Trends, was shown to be correlated with the incidence of COVID-19, but only in several countries. We aim to develop a model from a small number of countries to predict the epidemic alert level in all the countries worldwide. The "interest over time" and "interest by region" Google Trends data of Coronavirus, pneumonia, and six COVID symptom-related terms were searched. The daily incidence of COVID-19 from 10 January to 23 April 2020 of 202 countries was retrieved from the World Health Organization. Three alert levels were defined. Ten weeks' data from 20 countries were used for training with machine learning algorithms. The features were selected according to the correlation and importance. The model was then tested on 2830 samples of 202 countries. Our model performed well in 154 (76.2%) countries, of which each had no more than four misclassified samples. In these 154 countries, the accuracy was 0.8133, and the kappa coefficient was 0.6828. While in all 202 countries, the accuracy was 0.7527, and the kappa coefficient was 0.5841. The proposed algorithm based on Random Forest Classification and nine features performed better compared to other machine learning methods and the models with different numbers of features. 
Our result suggested that the model developed from 20 countries with Google Trends data and Random Forest Classification can be applied to predict the epidemic alert levels of most countries worldwide.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Betacoronavirus</subject><subject>Coronavirus Infections - epidemiology</subject><subject>Coronaviruses</subject><subject>Correlation analysis</subject><subject>COVID-19</subject><subject>Data Accuracy</subject><subject>Deep learning</subject><subject>Diarrhea</subject><subject>Disease transmission</subject><subject>Epidemics</subject><subject>Fever</subject><subject>Global health</subject><subject>Global Health - statistics &amp; numerical data</subject><subject>Humans</subject><subject>Incidence</subject><subject>Infectious diseases</subject><subject>Machine learning</subject><subject>Machine Learning - statistics &amp; numerical data</subject><subject>Models, Statistical</subject><subject>Pandemics</subject><subject>Pneumonia</subject><subject>Pneumonia, Viral - epidemiology</subject><subject>Research Theme 1: COVID-19 Pandemic</subject><subject>Retrospective Studies</subject><subject>SARS-CoV-2</subject><subject>Search Engine - statistics &amp; numerical 
data</subject><subject>Trends</subject><issn>2047-2978</issn><issn>2047-2986</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>COVID</sourceid><sourceid>PIMPY</sourceid><recordid>eNpdkV9rFDEUxYMottS--igBX3yZbf5OMi-CrFoLhYJUX0M2ubObZTZZk5mF_Qp-ajNsXap5SQ73d09uchB6S8lCUd3dbNN6s6iCMCIpfYEuGRGqYZ1uX57PSl-g61K2pC5FOdPta3TBOaVEduIS_f4OY05lD24MB8A22uFYQsGpx-OmauembN1x1vsMPlQsrk-lAfKIBzjAMFeXDz_vPje0wyFiRhh2aYpjDlDwVOaW25TWA-DHDNGXeo_HO-s2IUK1sDlW5A161duhwPXTfoV-fP3yuPzW3D_c3i0_3TdOaDE2trNCUKaElC33Ujrm-Ur5Tkvfat4JQp3iK-At9LJfOSl6LQmlHrTgnCnLr9DHk-9-Wu3AO6hz2sHsc9jZfDTJBvNvJYaNWaeDUbJVQrTV4MOTQU6_Jiij2YXiYBhshDQVw-poVAkiZUXf_4du05TrJ1dKdpViLVGVWpwoV6MoGfrzMJSYOWkzJz2LU9K14d3zJ5zxv7nyP8cppM4</recordid><startdate>20201201</startdate><enddate>20201201</enddate><creator>Peng, Yuanyuan</creator><creator>Li, Cuilian</creator><creator>Rong, Yibiao</creator><creator>Chen, Xinjian</creator><creator>Chen, Haoyu</creator><general>Edinburgh University Global Health Society</general><general>International Society of Global Health</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>CCPQU</scope><scope>COVID</scope><scope>DWQXO</scope><scope>EHMNL</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>K9.</scope><scope>M0S</scope><scope>M1P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20201201</creationdate><title>Retrospective analysis of the accuracy of predicting the alert level of COVID-19 in 202 countries using Google Trends and machine 
learning</title><author>Peng, Yuanyuan ; Li, Cuilian ; Rong, Yibiao ; Chen, Xinjian ; Chen, Haoyu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c484t-a9a4412745563d55c2d3b7d985d6839401c73be36ef5fbc54f85011de843327a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Betacoronavirus</topic><topic>Coronavirus Infections - epidemiology</topic><topic>Coronaviruses</topic><topic>Correlation analysis</topic><topic>COVID-19</topic><topic>Data Accuracy</topic><topic>Deep learning</topic><topic>Diarrhea</topic><topic>Disease transmission</topic><topic>Epidemics</topic><topic>Fever</topic><topic>Global health</topic><topic>Global Health - statistics &amp; numerical data</topic><topic>Humans</topic><topic>Incidence</topic><topic>Infectious diseases</topic><topic>Machine learning</topic><topic>Machine Learning - statistics &amp; numerical data</topic><topic>Models, Statistical</topic><topic>Pandemics</topic><topic>Pneumonia</topic><topic>Pneumonia, Viral - epidemiology</topic><topic>Research Theme 1: COVID-19 Pandemic</topic><topic>Retrospective Studies</topic><topic>SARS-CoV-2</topic><topic>Search Engine - statistics &amp; numerical data</topic><topic>Trends</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Peng, Yuanyuan</creatorcontrib><creatorcontrib>Li, Cuilian</creatorcontrib><creatorcontrib>Rong, Yibiao</creatorcontrib><creatorcontrib>Chen, Xinjian</creatorcontrib><creatorcontrib>Chen, Haoyu</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest 
Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest One Community College</collection><collection>Coronavirus Research Database</collection><collection>ProQuest Central</collection><collection>UK &amp; Ireland Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of global health</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Peng, Yuanyuan</au><au>Li, Cuilian</au><au>Rong, Yibiao</au><au>Chen, Xinjian</au><au>Chen, Haoyu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Retrospective analysis of the accuracy of predicting the alert level of COVID-19 in 202 countries using Google Trends and machine learning</atitle><jtitle>Journal of global health</jtitle><addtitle>J Glob 
Health</addtitle><date>2020-12-01</date><risdate>2020</risdate><volume>10</volume><issue>2</issue><spage>020511</spage><epage>020511</epage><pages>020511-020511</pages><issn>2047-2978</issn><eissn>2047-2986</eissn><abstract>Internet search engine data, such as Google Trends, was shown to be correlated with the incidence of COVID-19, but only in several countries. We aim to develop a model from a small number of countries to predict the epidemic alert level in all the countries worldwide. The "interest over time" and "interest by region" Google Trends data of Coronavirus, pneumonia, and six COVID symptom-related terms were searched. The daily incidence of COVID-19 from 10 January to 23 April 2020 of 202 countries was retrieved from the World Health Organization. Three alert levels were defined. Ten weeks' data from 20 countries were used for training with machine learning algorithms. The features were selected according to the correlation and importance. The model was then tested on 2830 samples of 202 countries. Our model performed well in 154 (76.2%) countries, of which each had no more than four misclassified samples. In these 154 countries, the accuracy was 0.8133, and the kappa coefficient was 0.6828. While in all 202 countries, the accuracy was 0.7527, and the kappa coefficient was 0.5841. The proposed algorithm based on Random Forest Classification and nine features performed better compared to other machine learning methods and the models with different numbers of features. Our result suggested that the model developed from 20 countries with Google Trends data and Random Forest Classification can be applied to predict the epidemic alert levels of most countries worldwide.</abstract><cop>Scotland</cop><pub>Edinburgh University Global Health Society</pub><pmid>33110594</pmid><doi>10.7189/jogh.10.020511</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2047-2978
ispartof Journal of global health, 2020-12, Vol.10 (2), p.020511-020511
issn 2047-2978
2047-2986
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7567446
source Publicly Available Content Database; PubMed Central; Coronavirus Research Database
subjects Accuracy
Algorithms
Betacoronavirus
Coronavirus Infections - epidemiology
Coronaviruses
Correlation analysis
COVID-19
Data Accuracy
Deep learning
Diarrhea
Disease transmission
Epidemics
Fever
Global health
Global Health - statistics & numerical data
Humans
Incidence
Infectious diseases
Machine learning
Machine Learning - statistics & numerical data
Models, Statistical
Pandemics
Pneumonia
Pneumonia, Viral - epidemiology
Research Theme 1: COVID-19 Pandemic
Retrospective Studies
SARS-CoV-2
Search Engine - statistics & numerical data
Trends
title Retrospective analysis of the accuracy of predicting the alert level of COVID-19 in 202 countries using Google Trends and machine learning
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T02%3A52%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Retrospective%20analysis%20of%20the%20accuracy%20of%20predicting%20the%20alert%20level%20of%20COVID-19%20in%20202%20countries%20using%20Google%20Trends%20and%20machine%20learning&rft.jtitle=Journal%20of%20global%20health&rft.au=Peng,%20Yuanyuan&rft.date=2020-12-01&rft.volume=10&rft.issue=2&rft.spage=020511&rft.epage=020511&rft.pages=020511-020511&rft.issn=2047-2978&rft.eissn=2047-2986&rft_id=info:doi/10.7189/jogh.10.020511&rft_dat=%3Cproquest_pubme%3E2455174055%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c484t-a9a4412745563d55c2d3b7d985d6839401c73be36ef5fbc54f85011de843327a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2595172607&rft_id=info:pmid/33110594&rfr_iscdi=true