ngsJulia: population genetic analysis of next-generation DNA sequencing data with Julia language [version 1; peer review: 1 approved with reservations]

A sound analysis of DNA sequencing data is important to extract meaningful information and infer quantities of interest. Sequencing and mapping errors coupled with low and variable coverage hamper the identification of genotypes and variants and the estimation of population genetic parameters. Metho...

Bibliographic Details
Published in: F1000 research 2022, Vol.11, p.126
Main Authors: Mas-Sandoval, Alex; Jin, Chenyu; Fracassetti, Marco; Fumagalli, Matteo
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c1491-864927baafef2aefbd0042032401270f0af10cba1d0bc7c1e413370942cf88933
container_end_page
container_issue
container_start_page 126
container_title F1000 research
container_volume 11
creator Mas-Sandoval, Alex
Jin, Chenyu
Fracassetti, Marco
Fumagalli, Matteo
description A sound analysis of DNA sequencing data is important to extract meaningful information and infer quantities of interest. Sequencing and mapping errors, coupled with low and variable coverage, hamper the identification of genotypes and variants and the estimation of population genetic parameters. Currently available methods and implementations for estimating population genetic parameters from sequencing data either are suitable only for the analysis of genomes from model organisms, require moderate sequencing coverage, or are not easily adaptable to specific applications. To address these issues, we introduce ngsJulia, a collection of templates and functions in the Julia language to process short-read sequencing data for population genetic analysis. We further describe two implementations, ngsPool and ngsPloidy, for the analysis of pooled sequencing data and polyploid genomes, respectively. Through simulations, we illustrate the performance of these implementations in estimating various population genetic parameters with both established and novel statistical methods. These results inform optimal experimental design and demonstrate the applicability of the methods in ngsJulia to estimate parameters of interest even from low-coverage sequencing data. ngsJulia provides users with a flexible and efficient framework for ad hoc analysis of sequencing data. ngsJulia is available from: https://github.com/mfumagalli/ngsJulia
doi_str_mv 10.12688/f1000research.104368.1
format article
eissn 2046-1402
orcidid https://orcid.org/0000-0002-4084-2953
https://orcid.org/0000-0002-2962-2669
https://orcid.org/0000-0002-1712-9404
rights Copyright: © 2022 Mas-Sandoval A et al.
fulltext fulltext
identifier ISSN: 2046-1402
ispartof F1000 research, 2022, Vol.11, p.126
issn 2046-1402
2046-1402
language eng
recordid cdi_crossref_primary_10_12688_f1000research_104368_1
source Publicly Available Content Database; PubMed Central
title ngsJulia: population genetic analysis of next-generation DNA sequencing data with Julia language [version 1; peer review: 1 approved with reservations]
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T15%3A12%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-faculty1000_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ngsJulia:%20population%20genetic%20analysis%20of%20next-generation%20DNA%20sequencing%20data%20with%20Julia%20language%20%5Bversion%201;%20peer%20review:%201%20approved%20with%20reservations%5D&rft.jtitle=F1000%20research&rft.au=Mas-Sandoval,%20Alex&rft.date=2022&rft.volume=11&rft.spage=126&rft.pages=126-&rft.issn=2046-1402&rft.eissn=2046-1402&rft_id=info:doi/10.12688/f1000research.104368.1&rft_dat=%3Cfaculty1000_cross%3E10_12688_f1000research_104368_1%3C/faculty1000_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c1491-864927baafef2aefbd0042032401270f0af10cba1d0bc7c1e413370942cf88933%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true