Computational graph pangenomics: a tutorial on data structures and their applications

Computational pangenomics is an emerging research field that is changing the way computer scientists are facing challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora...

Bibliographic Details
Published in: Natural computing 2022-03, Vol.21 (1), p.81-108
Main Authors: Baaijens, Jasmijn A., Bonizzoni, Paola, Boucher, Christina, Della Vedova, Gianluca, Pirola, Yuri, Rizzi, Raffaella, Sirén, Jouni
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c419t-d944d8770595534778759f9b3a70bdae3a7416aaff2d5af5d32c83b338e74b823
cites cdi_FETCH-LOGICAL-c419t-d944d8770595534778759f9b3a70bdae3a7416aaff2d5af5d32c83b338e74b823
container_end_page 108
container_issue 1
container_start_page 81
container_title Natural computing
container_volume 21
creator Baaijens, Jasmijn A.
Bonizzoni, Paola
Boucher, Christina
Della Vedova, Gianluca
Pirola, Yuri
Rizzi, Raffaella
Sirén, Jouni
description Computational pangenomics is an emerging research field that is changing the way computer scientists are facing challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations—thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes as well as the specificity of an individual genome in a personalized approach to medicine is rapidly pushing the abandonment of the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or a graph pangenome, is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store, and access information from genome representations. Properly addressing these challenges is crucial to face the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1M individuals (Stark et al. 2019). This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. We discuss efficient representations of haplotypes and the variability of genotypes in graph pangenomes, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.
doi_str_mv 10.1007/s11047-022-09882-6
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2791705325</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2640562097</sourcerecordid><originalsourceid>FETCH-LOGICAL-c419t-d944d8770595534778759f9b3a70bdae3a7416aaff2d5af5d32c83b338e74b823</originalsourceid><addsrcrecordid>eNp9kLlOxDAQhi0E4lh4AQpkiYYm4CO-6NCKS0KiYWtrkjgQlMTBdgreHrPLIVFQzUjzzT-aD6FjSs4pIeoiUkpKVRDGCmK0ZoXcQvtUKFYYZeT2Zy9VoTTVe-ggxldCGBWC7qI9Lo00iqt9tFr6YZoTpM6P0OPnANMLnmB8dqMfujpeYsBpTj50eepH3EACHFOY6zQHFzGMDU4vrgsYpqnv6nVQPEQ7LfTRHX3VBVrdXD8t74qHx9v75dVDUZfUpKIxZdlopYgwQvBSKa2EaU3FQZGqAZdrSSVA27JGQCsazmrNK861U2WlGV-gs03uFPzb7GKyQxdr1_cwOj9Hy5ShOZ0zkdHTP-irn0P-OVOyJEIyko0sENtQdfAxBtfaKXQDhHdLif2UbjfSbZZu19KtzEsnX9FzNbjmZ-Xbcgb4Boh5lNWG39v_xH4AOcOMuA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2640562097</pqid></control><display><type>article</type><title>Computational graph pangenomics: a tutorial on data structures and their applications</title><source>Springer Nature</source><creator>Baaijens, Jasmijn A. ; Bonizzoni, Paola ; Boucher, Christina ; Della Vedova, Gianluca ; Pirola, Yuri ; Rizzi, Raffaella ; Sirén, Jouni</creator><creatorcontrib>Baaijens, Jasmijn A. ; Bonizzoni, Paola ; Boucher, Christina ; Della Vedova, Gianluca ; Pirola, Yuri ; Rizzi, Raffaella ; Sirén, Jouni</creatorcontrib><description>Computational pangenomics is an emerging research field that is changing the way computer scientists are facing challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. 
A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations—thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes as well as the specificity of an individual genome in a personalized approach to medicine is rapidly pushing the abandonment of the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or a graph pangenome , is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store, and access information from genome representations. Properly addressing these challenges is crucial to face the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1M individuals (Stark et al. 2019 ). This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. 
We discuss efficient representations of haplotypes and the variability of genotypes in graph pangenomes, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.</description><identifier>ISSN: 1567-7818</identifier><identifier>EISSN: 1572-9796</identifier><identifier>DOI: 10.1007/s11047-022-09882-6</identifier><identifier>PMID: 36969737</identifier><language>eng</language><publisher>Dordrecht: Springer Netherlands</publisher><subject>Artificial Intelligence ; Combinatorial analysis ; Complex Systems ; Computer Science ; Data structures ; Evolutionary Biology ; Fields (mathematics) ; Genomes ; Graph theory ; Graphical representations ; Microorganisms ; Processor Architectures ; Software ; Software development tools ; Theory of Computation</subject><ispartof>Natural computing, 2022-03, Vol.21 (1), p.81-108</ispartof><rights>The Author(s) 2022. corrected publication 2022</rights><rights>The Author(s) 2022. corrected publication 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c419t-d944d8770595534778759f9b3a70bdae3a7416aaff2d5af5d32c83b338e74b823</citedby><cites>FETCH-LOGICAL-c419t-d944d8770595534778759f9b3a70bdae3a7416aaff2d5af5d32c83b338e74b823</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/36969737$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Baaijens, Jasmijn A.</creatorcontrib><creatorcontrib>Bonizzoni, Paola</creatorcontrib><creatorcontrib>Boucher, Christina</creatorcontrib><creatorcontrib>Della Vedova, Gianluca</creatorcontrib><creatorcontrib>Pirola, Yuri</creatorcontrib><creatorcontrib>Rizzi, Raffaella</creatorcontrib><creatorcontrib>Sirén, Jouni</creatorcontrib><title>Computational graph pangenomics: a tutorial on data structures and their applications</title><title>Natural computing</title><addtitle>Nat Comput</addtitle><addtitle>Nat Comput</addtitle><description>Computational pangenomics is an emerging research field that is changing the way computer scientists are facing challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. 
A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations—thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes as well as the specificity of an individual genome in a personalized approach to medicine is rapidly pushing the abandonment of the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or a graph pangenome , is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store, and access information from genome representations. Properly addressing these challenges is crucial to face the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1M individuals (Stark et al. 2019 ). This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. 
We discuss efficient representations of haplotypes and the variability of genotypes in graph pangenomes, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.</description><subject>Artificial Intelligence</subject><subject>Combinatorial analysis</subject><subject>Complex Systems</subject><subject>Computer Science</subject><subject>Data structures</subject><subject>Evolutionary Biology</subject><subject>Fields (mathematics)</subject><subject>Genomes</subject><subject>Graph theory</subject><subject>Graphical representations</subject><subject>Microorganisms</subject><subject>Processor Architectures</subject><subject>Software</subject><subject>Software development tools</subject><subject>Theory of Computation</subject><issn>1567-7818</issn><issn>1572-9796</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kLlOxDAQhi0E4lh4AQpkiYYm4CO-6NCKS0KiYWtrkjgQlMTBdgreHrPLIVFQzUjzzT-aD6FjSs4pIeoiUkpKVRDGCmK0ZoXcQvtUKFYYZeT2Zy9VoTTVe-ggxldCGBWC7qI9Lo00iqt9tFr6YZoTpM6P0OPnANMLnmB8dqMfujpeYsBpTj50eepH3EACHFOY6zQHFzGMDU4vrgsYpqnv6nVQPEQ7LfTRHX3VBVrdXD8t74qHx9v75dVDUZfUpKIxZdlopYgwQvBSKa2EaU3FQZGqAZdrSSVA27JGQCsazmrNK861U2WlGV-gs03uFPzb7GKyQxdr1_cwOj9Hy5ShOZ0zkdHTP-irn0P-OVOyJEIyko0sENtQdfAxBtfaKXQDhHdLif2UbjfSbZZu19KtzEsnX9FzNbjmZ-Xbcgb4Boh5lNWG39v_xH4AOcOMuA</recordid><startdate>202203</startdate><enddate>202203</enddate><creator>Baaijens, Jasmijn A.</creator><creator>Bonizzoni, Paola</creator><creator>Boucher, Christina</creator><creator>Della Vedova, Gianluca</creator><creator>Pirola, Yuri</creator><creator>Rizzi, Raffaella</creator><creator>Sirén, Jouni</creator><general>Springer Netherlands</general><general>Springer Nature 
B.V</general><scope>C6C</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7XB</scope><scope>88I</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M2P</scope><scope>P5Z</scope><scope>P62</scope><scope>PHGZM</scope><scope>PHGZT</scope><scope>PKEHL</scope><scope>PQEST</scope><scope>PQGLB</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope></search><sort><creationdate>202203</creationdate><title>Computational graph pangenomics: a tutorial on data structures and their applications</title><author>Baaijens, Jasmijn A. 
; Bonizzoni, Paola ; Boucher, Christina ; Della Vedova, Gianluca ; Pirola, Yuri ; Rizzi, Raffaella ; Sirén, Jouni</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c419t-d944d8770595534778759f9b3a70bdae3a7416aaff2d5af5d32c83b338e74b823</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Artificial Intelligence</topic><topic>Combinatorial analysis</topic><topic>Complex Systems</topic><topic>Computer Science</topic><topic>Data structures</topic><topic>Evolutionary Biology</topic><topic>Fields (mathematics)</topic><topic>Genomes</topic><topic>Graph theory</topic><topic>Graphical representations</topic><topic>Microorganisms</topic><topic>Processor Architectures</topic><topic>Software</topic><topic>Software development tools</topic><topic>Theory of Computation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Baaijens, Jasmijn A.</creatorcontrib><creatorcontrib>Bonizzoni, Paola</creatorcontrib><creatorcontrib>Boucher, Christina</creatorcontrib><creatorcontrib>Della Vedova, Gianluca</creatorcontrib><creatorcontrib>Pirola, Yuri</creatorcontrib><creatorcontrib>Rizzi, Raffaella</creatorcontrib><creatorcontrib>Sirén, Jouni</creatorcontrib><collection>Springer Nature OA Free Journals</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Science Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 
2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central (New)</collection><collection>ProQuest One Academic (New)</collection><collection>ProQuest One Academic Middle East (New)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Applied &amp; Life Sciences</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><jtitle>Natural computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Baaijens, Jasmijn A.</au><au>Bonizzoni, Paola</au><au>Boucher, Christina</au><au>Della Vedova, Gianluca</au><au>Pirola, Yuri</au><au>Rizzi, Raffaella</au><au>Sirén, 
Jouni</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Computational graph pangenomics: a tutorial on data structures and their applications</atitle><jtitle>Natural computing</jtitle><stitle>Nat Comput</stitle><addtitle>Nat Comput</addtitle><date>2022-03</date><risdate>2022</risdate><volume>21</volume><issue>1</issue><spage>81</spage><epage>108</epage><pages>81-108</pages><issn>1567-7818</issn><eissn>1572-9796</eissn><abstract>Computational pangenomics is an emerging research field that is changing the way computer scientists are facing challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations—thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes as well as the specificity of an individual genome in a personalized approach to medicine is rapidly pushing the abandonment of the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or a graph pangenome , is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store, and access information from genome representations. Properly addressing these challenges is crucial to face the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1M individuals (Stark et al. 2019 ). 
This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. We discuss efficient representations of haplotypes and the variability of genotypes in graph pangenomes, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.</abstract><cop>Dordrecht</cop><pub>Springer Netherlands</pub><pmid>36969737</pmid><doi>10.1007/s11047-022-09882-6</doi><tpages>28</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1567-7818
ispartof Natural computing, 2022-03, Vol.21 (1), p.81-108
issn 1567-7818
1572-9796
language eng
recordid cdi_proquest_miscellaneous_2791705325
source Springer Nature
subjects Artificial Intelligence
Combinatorial analysis
Complex Systems
Computer Science
Data structures
Evolutionary Biology
Fields (mathematics)
Genomes
Graph theory
Graphical representations
Microorganisms
Processor Architectures
Software
Software development tools
Theory of Computation
title Computational graph pangenomics: a tutorial on data structures and their applications
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-03-10T01%3A00%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Computational%20graph%20pangenomics:%20a%20tutorial%20on%20data%20structures%20and%20their%20applications&rft.jtitle=Natural%20computing&rft.au=Baaijens,%20Jasmijn%20A.&rft.date=2022-03&rft.volume=21&rft.issue=1&rft.spage=81&rft.epage=108&rft.pages=81-108&rft.issn=1567-7818&rft.eissn=1572-9796&rft_id=info:doi/10.1007/s11047-022-09882-6&rft_dat=%3Cproquest_cross%3E2640562097%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c419t-d944d8770595534778759f9b3a70bdae3a7416aaff2d5af5d32c83b338e74b823%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2640562097&rft_id=info:pmid/36969737&rfr_iscdi=true