Timesweeper: accurately identifying selective sweeps using population genomic time series

Bibliographic Details
Published in: Genetics (Austin) 2023-07, Vol.224 (3)
Main Authors: Whitehouse, Logan S; Schrider, Daniel R
Format: Article
Language: English
container_issue 3
container_title Genetics (Austin)
container_volume 224
creator Whitehouse, Logan S
Schrider, Daniel R
description Abstract: Despite decades of research, identifying selective sweeps, the genomic footprints of positive selection, remains a core problem in population genetics. Of the myriad methods that have been developed to tackle this task, few are designed to leverage the potential of genomic time-series data. This is because in most population genetic studies of natural populations, only a single period of time can be sampled. Recent advancements in sequencing technology, including improvements in extracting and sequencing ancient DNA, have made repeated samplings of a population possible, allowing for more direct analysis of recent evolutionary dynamics. Serial sampling of organisms with shorter generation times has also become more feasible due to improvements in the cost and throughput of sequencing. With these advances in mind, here we present Timesweeper, a fast and accurate convolutional neural network-based tool for identifying selective sweeps in data consisting of multiple genomic samplings of a population over time. Timesweeper analyzes population genomic time-series data by first simulating training data under a demographic model appropriate for the data of interest, training a one-dimensional convolutional neural network on said simulations, and inferring which polymorphisms in this serialized data set were the direct target of a completed or ongoing selective sweep. We show that Timesweeper is accurate under multiple simulated demographic and sampling scenarios, identifies selected variants with high resolution, and estimates selection coefficients more accurately than existing methods. In sum, we show that more accurate inferences about natural selection are possible when genomic time-series data are available; such data will continue to proliferate in coming years due to both the sequencing of ancient samples and repeated samplings of extant populations with faster generation times, as well as experimentally evolved populations where time-series data are often generated. Methodological advances such as Timesweeper thus have the potential to help resolve the controversy over the role of positive selection in the genome. We provide Timesweeper as a Python package for use by the community.
Summary: Despite decades of research, detecting genomic loci responsible for adaptation remains challenging; however, improvements in DNA sequencing make it possible to measure genomic variation across multiple timepoints. Such data could aid in detecting selective sweeps, in which an adaptive mutation rapidly increases in frequency, thereby revealing loci responding to natural selection. Whitehouse and Schrider present a machine learning method called Timesweeper that accurately detects sweeps from time-series data, including unphased or un-genotyped data, provided allele frequency estimates are obtainable.
doi_str_mv 10.1093/genetics/iyad084
format article
contributor Coop, G
date 2023-07-06
eissn 1943-2631
oa free_for_read
orcidid https://orcid.org/0000-0001-5249-4151
pmid 37157914
publisher US: Oxford University Press
rights The Author(s) 2023. Published by Oxford University Press on behalf of The Genetics Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
toplevel peer_reviewed
fulltext fulltext
identifier ISSN: 1943-2631
ispartof Genetics (Austin), 2023-07, Vol.224 (3)
issn 1943-2631
0016-6731
1943-2631
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_10324941
source Freely Accessible Science Journals; Oxford Journals Online; Alma/SFX Local Collection
subjects Artificial neural networks
Demographics
Demography
DNA sequencing
Genetics
Genetics, Population
Genomics
Investigation
Metagenomics
Natural populations
Natural selection
Neural networks
Polymorphism, Genetic
Population
Population genetics
Population studies
Populations
Positive selection
Sampling
Selection, Genetic
Simulation
Time Factors
Time series
title Timesweeper: accurately identifying selective sweeps using population genomic time series
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T09%3A53%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Timesweeper:%20accurately%20identifying%20selective%20sweeps%20using%20population%20genomic%20time%20series&rft.jtitle=Genetics%20(Austin)&rft.au=Whitehouse,%20Logan%20S&rft.date=2023-07-06&rft.volume=224&rft.issue=3&rft.issn=1943-2631&rft.eissn=1943-2631&rft_id=info:doi/10.1093/genetics/iyad084&rft_dat=%3Cproquest_pubme%3E3050377257%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c461t-730375d7d9f9465669c2c8564c18d218dc8a2ec2297e510e6b2a1feba9bffb73%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3050377257&rft_id=info:pmid/37157914&rft_oup_id=10.1093/genetics/iyad084&rfr_iscdi=true