A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses
Pandemics, even more than other medical problems, require swift integration of knowledge. When a pandemic is caused by a new virus, understanding the underlying biology may help in finding solutions. In a setting with a large number of loosely related projects and initiatives, we need common ground, also...
Published in: | BMC biology 2021-01, Vol.19 (1), p.12-12, Article 12 |
---|---|
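Main Authors: | Waagmeester, Andra; Willighagen, Egon L; Su, Andrew I; Kutmon, Martina; Gayo, Jose Emilio Labra; Fernández-Álvarez, Daniel; Groom, Quentin; Schaap, Peter J; Verhagen, Lisa M; Koehorst, Jasper J |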
Format: | Article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c631t-2c32ed18a7dbae482ca323282a2668118dcc1ce79784f14c3ccea3ad820764ef3 |
---|---|
cites | cdi_FETCH-LOGICAL-c631t-2c32ed18a7dbae482ca323282a2668118dcc1ce79784f14c3ccea3ad820764ef3 |
container_end_page | 12 |
container_issue | 1 |
container_start_page | 12 |
container_title | BMC biology |
container_volume | 19 |
creator | Waagmeester, Andra; Willighagen, Egon L; Su, Andrew I; Kutmon, Martina; Gayo, Jose Emilio Labra; Fernández-Álvarez, Daniel; Groom, Quentin; Schaap, Peter J; Verhagen, Lisa M; Koehorst, Jasper J |
description | Pandemics, even more than other medical problems, require swift integration of knowledge. When a pandemic is caused by a new virus, understanding the underlying biology may help in finding solutions. In a setting with a large number of loosely related projects and initiatives, we need common ground, also known as a "commons." Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions.
As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses, as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. We demonstrate how this model can be used to make data from various resources interoperable by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications, or bots, was written for regular updates of these sources in Wikidata and added to a platform for automatically running these updates.
Although this workflow was developed and applied in the context of the COVID-19 pandemic, it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC43) to demonstrate its broader applicability. |
doi_str_mv | 10.1186/s12915-020-00940-y |
format | article |
fullrecord | <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_3bffc3df9828436bb991c4ad57f7be4c</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A650496495</galeid><doaj_id>oai_doaj_org_article_3bffc3df9828436bb991c4ad57f7be4c</doaj_id><sourcerecordid>A650496495</sourcerecordid><originalsourceid>FETCH-LOGICAL-c631t-2c32ed18a7dbae482ca323282a2668118dcc1ce79784f14c3ccea3ad820764ef3</originalsourceid><addsrcrecordid>eNqNkltrFDEUxwdRbLv6BXyQAV_0YWpuM8n4ICzFy0Kh4KU-hjO5TLOdTdZkpna_vdluXbrig4SQcPI7_5xz-BfFC4xOMRbN24RJi-sKEVQh1DJUbR4Vx5gzXHGE-OMH96PiJKUlQqTmnD4tjihlgghEj4vLebmOYQwqDKUNsQStne_Lax9-DUb3phxD-cNdOw0jvCthcL3fvkeTwhSVSWXw5dW0Al-qEIOHGxenZNKz4omFIZnn9-es-P7xw7ezz9X5xafF2fy8Ug3FY0UUJUZjAVx3YHJNCiihRBAgTSNyk1oprAxvuWAWM0WVMkBBC4J4w4yls2Kx09UBlnId3QriRgZw8i4QYi8hjk4NRtLOWkW1bXPnjDZd17ZYMdA1t7wzWXtWvN9praduZbQyfowwHIgevnh3JftwI3kup6ZtFnh9LxDDz8mkUa5cUmYYwJswJUmYQCTDrM7oq7_QZZ6nz6PKVIuEoDzvPdVDbsB5G_K_aisq502NWNuwdqt1-g8qL21WTgVvrMvxg4Q3BwmZGc3t2MOUklx8_fL_7MXlIUt2rIohpWjsfnYYya1j5c6xMjtW3jlWbnLSy4dT36f8sSj9DQNH5S8</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2490883788</pqid></control><display><type>article</type><title>A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses</title><source>Publicly Available Content Database</source><source>PubMed Central</source><source>Coronavirus Research Database</source><creator>Waagmeester, Andra ; Willighagen, Egon L ; Su, Andrew I ; Kutmon, Martina ; Gayo, Jose Emilio Labra ; Fernández-Álvarez, Daniel ; Groom, Quentin ; Schaap, Peter J ; Verhagen, Lisa M ; Koehorst, Jasper J</creator><creatorcontrib>Waagmeester, Andra ; Willighagen, Egon L ; Su, Andrew I ; Kutmon, Martina ; Gayo, Jose Emilio Labra ; Fernández-Álvarez, Daniel ; Groom, Quentin ; Schaap, Peter J ; Verhagen, Lisa M ; Koehorst, Jasper J</creatorcontrib><description>Pandemics, even more 
than other medical problems, require swift integration of knowledge. When caused by a new virus, understanding the underlying biology may help finding solutions. In a setting where there are a large number of loosely related projects and initiatives, we need common ground, also known as a "commons." Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions.
As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. How this model can be used to make data between various resources interoperable is demonstrated by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications or bots were written for regular updates of these sources in Wikidata and added to a platform for automatically running these updates.
Although this workflow is developed and applied in the context of the COVID-19 pandemic, to demonstrate its broader applicability it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC4).</description><identifier>ISSN: 1741-7007</identifier><identifier>EISSN: 1741-7007</identifier><identifier>DOI: 10.1186/s12915-020-00940-y</identifier><identifier>PMID: 33482803</identifier><language>eng</language><publisher>England: BioMed Central Ltd</publisher><subject>Automation ; Biotechnology ; Coronaviridae ; Coronavirus - genetics ; Coronavirus - physiology ; Coronavirus Infections - metabolism ; Coronavirus Infections - pathology ; Coronavirus Infections - virology ; Coronaviruses ; COVID-19 ; COVID-19 - metabolism ; COVID-19 - pathology ; COVID-19 - virology ; Data models ; Disease ; Genes ; Genome, Viral ; Genomes ; Genomics - methods ; Humans ; Information management ; Integration ; Internet ; Knowledge ; Knowledge Bases ; Knowledge bases (artificial intelligence) ; Knowledge representation ; Linked Data ; Medical literature ; Medical research ; Methodology ; Online databases ; Ontology ; Open Science ; Pandemics ; Proteins ; Proteomics - methods ; Resource Description Framework-RDF ; Respiratory diseases ; SARS-CoV-2 - genetics ; SARS-CoV-2 - physiology ; Semantic web ; Semantics ; Severe acute respiratory syndrome ; Severe acute respiratory syndrome coronavirus 2 ; ShEx ; Taxonomy ; Technology application ; Viral diseases ; Viral Proteins - genetics ; Viral Proteins - metabolism ; Viruses ; Web Ontology Language-OWL ; Wikidata ; Workflow</subject><ispartof>BMC biology, 2021-01, Vol.19 (1), p.12-12, Article 12</ispartof><rights>COPYRIGHT 2021 BioMed Central Ltd.</rights><rights>2021. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>The Author(s) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c631t-2c32ed18a7dbae482ca323282a2668118dcc1ce79784f14c3ccea3ad820764ef3</citedby><cites>FETCH-LOGICAL-c631t-2c32ed18a7dbae482ca323282a2668118dcc1ce79784f14c3ccea3ad820764ef3</cites><orcidid>0000-0002-7699-8191 ; 0000-0002-0596-5376 ; 0000-0001-7542-0286 ; 0000-0002-9859-4104 ; 0000-0001-8172-8981 ; 0000-0001-8907-5348 ; 0000-0002-4346-6084 ; 0000-0001-9773-4008 ; 0000-0002-8666-7660 ; 0000-0002-4130-580X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2490883788?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2490883788?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25753,27924,27925,37012,37013,38516,43895,44590,53791,53793,74412,75126</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33482803$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Waagmeester, Andra</creatorcontrib><creatorcontrib>Willighagen, Egon L</creatorcontrib><creatorcontrib>Su, Andrew I</creatorcontrib><creatorcontrib>Kutmon, Martina</creatorcontrib><creatorcontrib>Gayo, Jose Emilio Labra</creatorcontrib><creatorcontrib>Fernández-Álvarez, Daniel</creatorcontrib><creatorcontrib>Groom, Quentin</creatorcontrib><creatorcontrib>Schaap, Peter J</creatorcontrib><creatorcontrib>Verhagen, Lisa M</creatorcontrib><creatorcontrib>Koehorst, Jasper J</creatorcontrib><title>A protocol for adding knowledge to Wikidata: aligning resources on human 
coronaviruses</title><title>BMC biology</title><addtitle>BMC Biol</addtitle><description>Pandemics, even more than other medical problems, require swift integration of knowledge. When caused by a new virus, understanding the underlying biology may help finding solutions. In a setting where there are a large number of loosely related projects and initiatives, we need common ground, also known as a "commons." Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions.
As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. How this model can be used to make data between various resources interoperable is demonstrated by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications or bots were written for regular updates of these sources in Wikidata and added to a platform for automatically running these updates.
Although this workflow is developed and applied in the context of the COVID-19 pandemic, to demonstrate its broader applicability it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC4).</description><subject>Automation</subject><subject>Biotechnology</subject><subject>Coronaviridae</subject><subject>Coronavirus - genetics</subject><subject>Coronavirus - physiology</subject><subject>Coronavirus Infections - metabolism</subject><subject>Coronavirus Infections - pathology</subject><subject>Coronavirus Infections - virology</subject><subject>Coronaviruses</subject><subject>COVID-19</subject><subject>COVID-19 - metabolism</subject><subject>COVID-19 - pathology</subject><subject>COVID-19 - virology</subject><subject>Data models</subject><subject>Disease</subject><subject>Genes</subject><subject>Genome, Viral</subject><subject>Genomes</subject><subject>Genomics - methods</subject><subject>Humans</subject><subject>Information management</subject><subject>Integration</subject><subject>Internet</subject><subject>Knowledge</subject><subject>Knowledge Bases</subject><subject>Knowledge bases (artificial intelligence)</subject><subject>Knowledge representation</subject><subject>Linked Data</subject><subject>Medical literature</subject><subject>Medical research</subject><subject>Methodology</subject><subject>Online databases</subject><subject>Ontology</subject><subject>Open Science</subject><subject>Pandemics</subject><subject>Proteins</subject><subject>Proteomics - methods</subject><subject>Resource Description Framework-RDF</subject><subject>Respiratory diseases</subject><subject>SARS-CoV-2 - genetics</subject><subject>SARS-CoV-2 - physiology</subject><subject>Semantic web</subject><subject>Semantics</subject><subject>Severe acute respiratory syndrome</subject><subject>Severe acute respiratory syndrome coronavirus 
2</subject><subject>ShEx</subject><subject>Taxonomy</subject><subject>Technology application</subject><subject>Viral diseases</subject><subject>Viral Proteins - genetics</subject><subject>Viral Proteins - metabolism</subject><subject>Viruses</subject><subject>Web Ontology Language-OWL</subject><subject>Wikidata</subject><subject>Workflow</subject><issn>1741-7007</issn><issn>1741-7007</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>COVID</sourceid><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNqNkltrFDEUxwdRbLv6BXyQAV_0YWpuM8n4ICzFy0Kh4KU-hjO5TLOdTdZkpna_vdluXbrig4SQcPI7_5xz-BfFC4xOMRbN24RJi-sKEVQh1DJUbR4Vx5gzXHGE-OMH96PiJKUlQqTmnD4tjihlgghEj4vLebmOYQwqDKUNsQStne_Lax9-DUb3phxD-cNdOw0jvCthcL3fvkeTwhSVSWXw5dW0Al-qEIOHGxenZNKz4omFIZnn9-es-P7xw7ezz9X5xafF2fy8Ug3FY0UUJUZjAVx3YHJNCiihRBAgTSNyk1oprAxvuWAWM0WVMkBBC4J4w4yls2Kx09UBlnId3QriRgZw8i4QYi8hjk4NRtLOWkW1bXPnjDZd17ZYMdA1t7wzWXtWvN9praduZbQyfowwHIgevnh3JftwI3kup6ZtFnh9LxDDz8mkUa5cUmYYwJswJUmYQCTDrM7oq7_QZZ6nz6PKVIuEoDzvPdVDbsB5G_K_aisq502NWNuwdqt1-g8qL21WTgVvrMvxg4Q3BwmZGc3t2MOUklx8_fL_7MXlIUt2rIohpWjsfnYYya1j5c6xMjtW3jlWbnLSy4dT36f8sSj9DQNH5S8</recordid><startdate>20210122</startdate><enddate>20210122</enddate><creator>Waagmeester, Andra</creator><creator>Willighagen, Egon L</creator><creator>Su, Andrew I</creator><creator>Kutmon, Martina</creator><creator>Gayo, Jose Emilio Labra</creator><creator>Fernández-Álvarez, Daniel</creator><creator>Groom, Quentin</creator><creator>Schaap, Peter J</creator><creator>Verhagen, Lisa M</creator><creator>Koehorst, Jasper J</creator><general>BioMed Central Ltd</general><general>BioMed 
Central</general><general>BMC</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>4U-</scope><scope>7QG</scope><scope>7QP</scope><scope>7QR</scope><scope>7SN</scope><scope>7SS</scope><scope>7TK</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8FD</scope><scope>8FE</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>COVID</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M2O</scope><scope>M7P</scope><scope>MBDVC</scope><scope>P64</scope><scope>PADUT</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-7699-8191</orcidid><orcidid>https://orcid.org/0000-0002-0596-5376</orcidid><orcidid>https://orcid.org/0000-0001-7542-0286</orcidid><orcidid>https://orcid.org/0000-0002-9859-4104</orcidid><orcidid>https://orcid.org/0000-0001-8172-8981</orcidid><orcidid>https://orcid.org/0000-0001-8907-5348</orcidid><orcidid>https://orcid.org/0000-0002-4346-6084</orcidid><orcidid>https://orcid.org/0000-0001-9773-4008</orcidid><orcidid>https://orcid.org/0000-0002-8666-7660</orcidid><orcidid>https://orcid.org/0000-0002-4130-580X</orcidid></search><sort><creationdate>20210122</creationdate><title>A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses</title><author>Waagmeester, Andra ; Willighagen, Egon L ; Su, Andrew I ; Kutmon, Martina 
; Gayo, Jose Emilio Labra ; Fernández-Álvarez, Daniel ; Groom, Quentin ; Schaap, Peter J ; Verhagen, Lisa M ; Koehorst, Jasper J</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c631t-2c32ed18a7dbae482ca323282a2668118dcc1ce79784f14c3ccea3ad820764ef3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Automation</topic><topic>Biotechnology</topic><topic>Coronaviridae</topic><topic>Coronavirus - genetics</topic><topic>Coronavirus - physiology</topic><topic>Coronavirus Infections - metabolism</topic><topic>Coronavirus Infections - pathology</topic><topic>Coronavirus Infections - virology</topic><topic>Coronaviruses</topic><topic>COVID-19</topic><topic>COVID-19 - metabolism</topic><topic>COVID-19 - pathology</topic><topic>COVID-19 - virology</topic><topic>Data models</topic><topic>Disease</topic><topic>Genes</topic><topic>Genome, Viral</topic><topic>Genomes</topic><topic>Genomics - methods</topic><topic>Humans</topic><topic>Information management</topic><topic>Integration</topic><topic>Internet</topic><topic>Knowledge</topic><topic>Knowledge Bases</topic><topic>Knowledge bases (artificial intelligence)</topic><topic>Knowledge representation</topic><topic>Linked Data</topic><topic>Medical literature</topic><topic>Medical research</topic><topic>Methodology</topic><topic>Online databases</topic><topic>Ontology</topic><topic>Open Science</topic><topic>Pandemics</topic><topic>Proteins</topic><topic>Proteomics - methods</topic><topic>Resource Description Framework-RDF</topic><topic>Respiratory diseases</topic><topic>SARS-CoV-2 - genetics</topic><topic>SARS-CoV-2 - physiology</topic><topic>Semantic web</topic><topic>Semantics</topic><topic>Severe acute respiratory syndrome</topic><topic>Severe acute respiratory syndrome coronavirus 2</topic><topic>ShEx</topic><topic>Taxonomy</topic><topic>Technology application</topic><topic>Viral diseases</topic><topic>Viral 
Proteins - genetics</topic><topic>Viral Proteins - metabolism</topic><topic>Viruses</topic><topic>Web Ontology Language-OWL</topic><topic>Wikidata</topic><topic>Workflow</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Waagmeester, Andra</creatorcontrib><creatorcontrib>Willighagen, Egon L</creatorcontrib><creatorcontrib>Su, Andrew I</creatorcontrib><creatorcontrib>Kutmon, Martina</creatorcontrib><creatorcontrib>Gayo, Jose Emilio Labra</creatorcontrib><creatorcontrib>Fernández-Álvarez, Daniel</creatorcontrib><creatorcontrib>Groom, Quentin</creatorcontrib><creatorcontrib>Schaap, Peter J</creatorcontrib><creatorcontrib>Verhagen, Lisa M</creatorcontrib><creatorcontrib>Koehorst, Jasper J</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale in Context : Opposing Viewpoints</collection><collection>Science in Context</collection><collection>ProQuest Central (Corporate)</collection><collection>University Readers</collection><collection>Animal Behavior Abstracts</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) 
(purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>Coronavirus Research Database</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Biological Sciences</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>ProQuest research library</collection><collection>Biological Science Database</collection><collection>Research Library (Corporate)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Research Library China</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>Directory of Open Access Journals</collection><jtitle>BMC 
biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Waagmeester, Andra</au><au>Willighagen, Egon L</au><au>Su, Andrew I</au><au>Kutmon, Martina</au><au>Gayo, Jose Emilio Labra</au><au>Fernández-Álvarez, Daniel</au><au>Groom, Quentin</au><au>Schaap, Peter J</au><au>Verhagen, Lisa M</au><au>Koehorst, Jasper J</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses</atitle><jtitle>BMC biology</jtitle><addtitle>BMC Biol</addtitle><date>2021-01-22</date><risdate>2021</risdate><volume>19</volume><issue>1</issue><spage>12</spage><epage>12</epage><pages>12-12</pages><artnum>12</artnum><issn>1741-7007</issn><eissn>1741-7007</eissn><abstract>Pandemics, even more than other medical problems, require swift integration of knowledge. When caused by a new virus, understanding the underlying biology may help finding solutions. In a setting where there are a large number of loosely related projects and initiatives, we need common ground, also known as a "commons." Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions.
As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. How this model can be used to make data between various resources interoperable is demonstrated by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications or bots were written for regular updates of these sources in Wikidata and added to a platform for automatically running these updates.
Although this workflow is developed and applied in the context of the COVID-19 pandemic, to demonstrate its broader applicability it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC4).</abstract><cop>England</cop><pub>BioMed Central Ltd</pub><pmid>33482803</pmid><doi>10.1186/s12915-020-00940-y</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-7699-8191</orcidid><orcidid>https://orcid.org/0000-0002-0596-5376</orcidid><orcidid>https://orcid.org/0000-0001-7542-0286</orcidid><orcidid>https://orcid.org/0000-0002-9859-4104</orcidid><orcidid>https://orcid.org/0000-0001-8172-8981</orcidid><orcidid>https://orcid.org/0000-0001-8907-5348</orcidid><orcidid>https://orcid.org/0000-0002-4346-6084</orcidid><orcidid>https://orcid.org/0000-0001-9773-4008</orcidid><orcidid>https://orcid.org/0000-0002-8666-7660</orcidid><orcidid>https://orcid.org/0000-0002-4130-580X</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1741-7007 |
ispartof | BMC biology, 2021-01, Vol.19 (1), p.12-12, Article 12 |
issn | 1741-7007 1741-7007 |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_3bffc3df9828436bb991c4ad57f7be4c |
source | Publicly Available Content Database; PubMed Central; Coronavirus Research Database |
subjects | Automation; Biotechnology; Coronaviridae; Coronavirus - genetics; Coronavirus - physiology; Coronavirus Infections - metabolism; Coronavirus Infections - pathology; Coronavirus Infections - virology; Coronaviruses; COVID-19; COVID-19 - metabolism; COVID-19 - pathology; COVID-19 - virology; Data models; Disease; Genes; Genome, Viral; Genomes; Genomics - methods; Humans; Information management; Integration; Internet; Knowledge; Knowledge Bases; Knowledge bases (artificial intelligence); Knowledge representation; Linked Data; Medical literature; Medical research; Methodology; Online databases; Ontology; Open Science; Pandemics; Proteins; Proteomics - methods; Resource Description Framework-RDF; Respiratory diseases; SARS-CoV-2 - genetics; SARS-CoV-2 - physiology; Semantic web; Semantics; Severe acute respiratory syndrome; Severe acute respiratory syndrome coronavirus 2; ShEx; Taxonomy; Technology application; Viral diseases; Viral Proteins - genetics; Viral Proteins - metabolism; Viruses; Web Ontology Language-OWL; Wikidata; Workflow |
title | A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T17%3A39%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20protocol%20for%20adding%20knowledge%20to%20Wikidata:%20aligning%20resources%20on%20human%20coronaviruses&rft.jtitle=BMC%20biology&rft.au=Waagmeester,%20Andra&rft.date=2021-01-22&rft.volume=19&rft.issue=1&rft.spage=12&rft.epage=12&rft.pages=12-12&rft.artnum=12&rft.issn=1741-7007&rft.eissn=1741-7007&rft_id=info:doi/10.1186/s12915-020-00940-y&rft_dat=%3Cgale_doaj_%3EA650496495%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c631t-2c32ed18a7dbae482ca323282a2668118dcc1ce79784f14c3ccea3ad820764ef3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2490883788&rft_id=info:pmid/33482803&rft_galeid=A650496495&rfr_iscdi=true |