
Kermut: Composite kernel regression for protein variant effects

Reliable prediction of protein variant effects is crucial for both protein optimization and for advancing biological understanding. For practical use in protein engineering, it is important that we can also provide reliable uncertainty estimates for our predictions, and while prediction accuracy has...


Bibliographic Details
Published in: arXiv.org 2024-10
Main Authors: Groth, Peter Mørch; Kerrn, Mads Herbert; Olsen, Lars; Salomon, Jesper; Boomsma, Wouter
Format: Article
Language: English
container_title arXiv.org
creator Groth, Peter Mørch
Kerrn, Mads Herbert
Olsen, Lars
Salomon, Jesper
Boomsma, Wouter
description Reliable prediction of protein variant effects is crucial for both protein optimization and for advancing biological understanding. For practical use in protein engineering, it is important that we can also provide reliable uncertainty estimates for our predictions, and while prediction accuracy has seen much progress in recent years, uncertainty metrics are rarely reported. We here provide a Gaussian process regression model, Kermut, with a novel composite kernel for modeling mutation similarity, which obtains state-of-the-art performance for supervised protein variant effect prediction while also offering estimates of uncertainty through its posterior. An analysis of the quality of the uncertainty estimates demonstrates that our model provides meaningful levels of overall calibration, but that instance-specific uncertainty calibration remains more challenging.
format article
publisher Ithaca: Cornell University Library, arXiv.org
date 2024-10-31
rights 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
linktohtml https://www.proquest.com/docview/3074864254?pq-origsite=primo
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-10
issn 2331-8422
language eng
recordid cdi_proquest_journals_3074864254
source Publicly Available Content (ProQuest)
subjects Estimates
Gaussian process
Proteins
Regression models
Uncertainty
title Kermut: Composite kernel regression for protein variant effects
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-14T18%3A42%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Kermut:%20Composite%20kernel%20regression%20for%20protein%20variant%20effects&rft.jtitle=arXiv.org&rft.au=Groth,%20Peter%20M%C3%B8rch&rft.date=2024-10-31&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3074864254%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_30748642543%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3074864254&rft_id=info:pmid/&rfr_iscdi=true