A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses

Bibliographic Details
Published in: BMC biology 2021-01, Vol.19 (1), p.12-12, Article 12
Main Authors: Waagmeester, Andra, Willighagen, Egon L, Su, Andrew I, Kutmon, Martina, Gayo, Jose Emilio Labra, Fernández-Álvarez, Daniel, Groom, Quentin, Schaap, Peter J, Verhagen, Lisa M, Koehorst, Jasper J
Format: Article
Language: English
container_end_page 12
container_issue 1
container_start_page 12
container_title BMC biology
container_volume 19
creator Waagmeester, Andra
Willighagen, Egon L
Su, Andrew I
Kutmon, Martina
Gayo, Jose Emilio Labra
Fernández-Álvarez, Daniel
Groom, Quentin
Schaap, Peter J
Verhagen, Lisa M
Koehorst, Jasper J
description Pandemics, even more than other medical problems, require swift integration of knowledge. When caused by a new virus, understanding the underlying biology may help in finding solutions. In a setting where there are a large number of loosely related projects and initiatives, we need common ground, also known as a "commons." Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions. As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses, as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. How this model can be used to make data from various resources interoperable is demonstrated by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications, or bots, was written to regularly update these sources in Wikidata and added to a platform that runs these updates automatically. Although this workflow was developed and applied in the context of the COVID-19 pandemic, it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC43) to demonstrate its broader applicability.
doi_str_mv 10.1186/s12915-020-00940-y
format article
fullrecord publisher: BioMed Central Ltd, England; date: 2021-01-22; PMID: 33482803; rights: COPYRIGHT 2021 BioMed Central Ltd. This work is licensed under http://creativecommons.org/licenses/by/4.0/. The Author(s) 2021; peer reviewed; ORCID iDs: 0000-0002-7699-8191, 0000-0002-0596-5376, 0000-0001-7542-0286, 0000-0002-9859-4104, 0000-0001-8172-8981, 0000-0001-8907-5348, 0000-0002-4346-6084, 0000-0001-9773-4008, 0000-0002-8666-7660, 0000-0002-4130-580X
identifier ISSN: 1741-7007
ispartof BMC biology, 2021-01, Vol.19 (1), p.12-12, Article 12
issn 1741-7007
1741-7007
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_3bffc3df9828436bb991c4ad57f7be4c
source Publicly Available Content Database; PubMed Central; Coronavirus Research Database
subjects Automation
Biotechnology
Coronaviridae
Coronavirus - genetics
Coronavirus - physiology
Coronavirus Infections - metabolism
Coronavirus Infections - pathology
Coronavirus Infections - virology
Coronaviruses
COVID-19
COVID-19 - metabolism
COVID-19 - pathology
COVID-19 - virology
Data models
Disease
Genes
Genome, Viral
Genomes
Genomics - methods
Humans
Information management
Integration
Internet
Knowledge
Knowledge Bases
Knowledge bases (artificial intelligence)
Knowledge representation
Linked Data
Medical literature
Medical research
Methodology
Online databases
Ontology
Open Science
Pandemics
Proteins
Proteomics - methods
Resource Description Framework-RDF
Respiratory diseases
SARS-CoV-2 - genetics
SARS-CoV-2 - physiology
Semantic web
Semantics
Severe acute respiratory syndrome
Severe acute respiratory syndrome coronavirus 2
ShEx
Taxonomy
Technology application
Viral diseases
Viral Proteins - genetics
Viral Proteins - metabolism
Viruses
Web Ontology Language-OWL
Wikidata
Workflow
title A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T17%3A39%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20protocol%20for%20adding%20knowledge%20to%20Wikidata:%20aligning%20resources%20on%20human%20coronaviruses&rft.jtitle=BMC%20biology&rft.au=Waagmeester,%20Andra&rft.date=2021-01-22&rft.volume=19&rft.issue=1&rft.spage=12&rft.epage=12&rft.pages=12-12&rft.artnum=12&rft.issn=1741-7007&rft.eissn=1741-7007&rft_id=info:doi/10.1186/s12915-020-00940-y&rft_dat=%3Cgale_doaj_%3EA650496495%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c631t-2c32ed18a7dbae482ca323282a2668118dcc1ce79784f14c3ccea3ad820764ef3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2490883788&rft_id=info:pmid/33482803&rft_galeid=A650496495&rfr_iscdi=true
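The description field above says the data model is expressed as entity schemas written in Shape Expressions (ShEx). As a rough illustration of what such a schema enforces, here is a minimal sketch in plain Python. It is not ShEx and not a ShEx validator; real Wikidata entity schemas are written in ShEx syntax and checked with ShEx tools. The property and item IDs used are real Wikidata identifiers (P31 instance of, P703 found in taxon, P351 Entrez Gene ID, Q7187 gene, Q8054 protein). The `Constraint` class, the `VIRUS_GENE_SHAPE` shape, the `validate` function, the flattened item format, and all example values are hypothetical simplifications, not the schema published with the paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


# Hypothetical, simplified stand-in for one triple constraint in a Shape
# Expression: a property, how many values it may have, and (optionally)
# which values are allowed. This is NOT a ShEx parser or validator.
@dataclass(frozen=True)
class Constraint:
    prop: str                                 # Wikidata property ID, e.g. "P31"
    min_count: int = 1                        # minimum number of values
    max_count: Optional[int] = None           # None means no upper bound
    allowed: Optional[FrozenSet[str]] = None  # None means any value is allowed


# Illustrative "virus gene" shape; an assumption for this sketch, not the
# entity schema published with the paper.
VIRUS_GENE_SHAPE: List[Constraint] = [
    Constraint("P31", allowed=frozenset({"Q7187"})),  # instance of: gene
    Constraint("P703", min_count=1, max_count=1),     # found in taxon
    Constraint("P351", min_count=1, max_count=1),     # Entrez Gene ID
]


def validate(item: Dict[str, List[str]], shape: List[Constraint]) -> List[str]:
    """Return a list of violations; an empty list means the item conforms.

    `item` is a hypothetical flattened view of a Wikidata item that maps
    property IDs to their values (qualifiers and references are ignored).
    """
    errors: List[str] = []
    for c in shape:
        values = item.get(c.prop, [])
        if len(values) < c.min_count:
            errors.append(f"{c.prop}: expected at least {c.min_count} value(s), found {len(values)}")
        if c.max_count is not None and len(values) > c.max_count:
            errors.append(f"{c.prop}: expected at most {c.max_count} value(s), found {len(values)}")
        if c.allowed is not None:
            errors.extend(f"{c.prop}: value {v} is not allowed" for v in values if v not in c.allowed)
    return errors


if __name__ == "__main__":
    # Placeholder values, not real Wikidata or NCBI identifiers.
    conforming = {"P31": ["Q7187"], "P703": ["Q_TAXON"], "P351": ["GENE_ID"]}
    broken = {"P31": ["Q8054"], "P351": ["GENE_ID_1", "GENE_ID_2"]}
    print(validate(conforming, VIRUS_GENE_SHAPE))  # []
    for problem in validate(broken, VIRUS_GENE_SHAPE):
        print(problem)
```

In this toy, the `broken` item fails on three counts: it is typed as a protein rather than a gene, it has no taxon, and it carries two gene IDs where the shape allows one. A bot following a schema-first workflow would be expected to catch such gaps before writing to Wikidata, though how the paper's bots do this is not described in this record.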