
A reference‐free approach to analyse RADseq data using standard next generation sequencing toolkits


Bibliographic Details
Published in: Molecular ecology resources 2021-05, Vol.21 (4), p.1085-1097
Main Authors: Heller, Rasmus, Nursyifa, Casia, Garcia‐Erill, Genís, Salmona, Jordi, Chikhi, Lounes, Meisner, Jonas, Korneliussen, Thorfinn Sand, Albrechtsen, Anders
Format: Article
Language: English
container_end_page 1097
container_issue 4
container_start_page 1085
container_title Molecular ecology resources
container_volume 21
creator Heller, Rasmus
Nursyifa, Casia
Garcia‐Erill, Genís
Salmona, Jordi
Chikhi, Lounes
Meisner, Jonas
Korneliussen, Thorfinn Sand
Albrechtsen, Anders
description Genotyping‐by‐sequencing methods such as RADseq are popular for generating genomic and population‐scale data sets from a diverse range of organisms. Such organisms often lack a usable reference genome, which restricts users to RADseq‐specific software for processing. However, such software has limitations compared to generic next generation sequencing (NGS) toolkits. Here, we describe and test a simple pipeline for reference‐free RADseq data processing that blends de novo elements from STACKS with the full suite of state‐of‐the‐art NGS tools. Specifically, we use the de novo RADseq assembly employed by STACKS to create a catalogue of RAD loci that serves as a reference for read mapping, variant calling and site filtering. Using RADseq data from 28 zebra sequenced to ~8x depth‐of‐coverage, we evaluate our approach by comparing the site frequency spectra (SFS) to those from alternative pipelines. Most pipelines yielded similar SFS at 8x depth, but only a genotype‐likelihood‐based pipeline performed similarly at low sequencing depth (2–4x). We compared the RADseq SFS with medium‐depth (~13x) shotgun sequencing of eight overlapping samples, revealing that the RADseq SFS was slightly but persistently skewed towards rare and invariant alleles. Using simulations and human data, we confirm that this is expected when there is allelic dropout (AD) in the RADseq data. AD in the RADseq data caused a heterozygosity deficit of ~16%, which dropped to ~5% after filtering AD. Hence, AD was the most important source of bias in our RADseq data.
doi_str_mv 10.1111/1755-0998.13324
format article
addtitle Mol Ecol Resour
publisher Wiley Subscription Services, Inc (England)
pmid 33434329
rights 2021 John Wiley & Sons Ltd
Distributed under a Creative Commons Attribution 4.0 International License
backlink https://www.ncbi.nlm.nih.gov/pubmed/33434329
https://cnrs.hal.science/hal-04717371
orcidid 0000-0002-9540-6673
0000-0002-7803-9664
0000-0002-1140-0718
0000-0001-6583-6923
0000-0003-3150-1708
0000-0001-7306-031X
0000-0001-7576-5380
0000-0002-1950-5805
identifier ISSN: 1755-098X
ispartof Molecular ecology resources, 2021-05, Vol.21 (4), p.1085-1097
issn 1755-098X
1755-0998
language eng
recordid cdi_hal_primary_oai_HAL_hal_04717371v1
source Wiley-Blackwell Read & Publish Collection
subjects allelic dropout
Data processing
Environmental Sciences
Frequency spectrum
Gene mapping
genetic diversity
genotype calling
genotype likelihood
Genotypes
Genotyping
Heterozygosity
Next-generation sequencing
Pipelines
RADseq
site frequency spectrum
Stacks
Toolkits
title A reference‐free approach to analyse RADseq data using standard next generation sequencing toolkits
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T20%3A00%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20reference%E2%80%90free%20approach%20to%20analyse%20RADseq%20data%20using%20standard%20next%20generation%20sequencing%20toolkits&rft.jtitle=Molecular%20ecology%20resources&rft.au=Heller,%20Rasmus&rft.date=2021-05&rft.volume=21&rft.issue=4&rft.spage=1085&rft.epage=1097&rft.pages=1085-1097&rft.issn=1755-098X&rft.eissn=1755-0998&rft_id=info:doi/10.1111/1755-0998.13324&rft_dat=%3Cproquest_hal_p%3E2511165594%3C/proquest_hal_p%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c4054-47b7b9e458449fff13f279df53923bfb6308d2155041fe5b925ada245675e6cb3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2511165594&rft_id=info:pmid/33434329&rfr_iscdi=true
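The abstract links allelic dropout (AD) to a heterozygosity deficit and to an SFS skewed towards rare and invariant alleles. The minimal Python sketch below illustrates that link under a toy model. It is an assumption and not the authors' simulation: genotypes are coded as derived‐allele counts (0, 1, 2), and each heterozygous call independently loses one allele with probability `rate`. Homozygous calls never become heterozygous, so the expected observed heterozygosity is (1 − rate) times the true value. Under this model, a ~16% deficit would therefore correspond to `rate` ≈ 0.16. All function names and the simulated data are hypothetical.

```python
"""Toy illustration (not the authors' simulation) of how allelic dropout (AD)
lowers observed heterozygosity and reshapes an unfolded site frequency spectrum."""
import random
from typing import List

# Genotypes are derived-allele counts per diploid individual (0, 1 or 2),
# stored as a list of sites, each site a list over individuals.
Genotypes = List[List[int]]


def site_frequency_spectrum(genos: Genotypes) -> List[int]:
    """Unfolded SFS: entry k counts the sites whose derived allele occurs k times
    among the 2n sampled chromosomes; k = 0 and k = 2n are invariant sites."""
    n_ind = len(genos[0])
    sfs = [0] * (2 * n_ind + 1)
    for site in genos:
        sfs[sum(site)] += 1
    return sfs


def heterozygosity(genos: Genotypes) -> float:
    """Fraction of all genotype calls that are heterozygous."""
    calls = [g for site in genos for g in site]
    return sum(1 for g in calls if g == 1) / len(calls)


def apply_allelic_dropout(genos: Genotypes, rate: float,
                          rng: random.Random) -> Genotypes:
    """Assumed AD model: each heterozygous call independently loses one allele
    with probability `rate`; the surviving allele (chosen at random) is then
    called as a homozygote. Homozygous calls are left unchanged."""
    observed = []
    for site in genos:
        new_site = []
        for g in site:
            if g == 1 and rng.random() < rate:
                g = rng.choice((0, 2))
            new_site.append(g)
        observed.append(new_site)
    return observed


def simulate_genotypes(n_sites: int, n_ind: int,
                       rng: random.Random) -> Genotypes:
    """Hypothetical data: a uniform derived-allele frequency per site and
    genotypes drawn under Hardy-Weinberg proportions."""
    genos = []
    for _ in range(n_sites):
        p = rng.random()
        genos.append([int(rng.random() < p) + int(rng.random() < p)
                      for _ in range(n_ind)])
    return genos


if __name__ == "__main__":
    rng = random.Random(1)
    truth = simulate_genotypes(5000, 28, rng)
    observed = apply_allelic_dropout(truth, rate=0.16, rng=rng)
    deficit = 1 - heterozygosity(observed) / heterozygosity(truth)
    print(f"heterozygosity deficit: {deficit:.1%}")  # expected to be near 16% here
    print("SFS before AD:", site_frequency_spectrum(truth))
    print("SFS after AD: ", site_frequency_spectrum(observed))
```

In real RADseq data, AD typically arises from polymorphisms in the restriction site. It is therefore not independent across calls as assumed here, so the sketch only shows the direction of the bias, not its magnitude in any particular data set.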