Optimizing strategy for the discovery of compositionally-biased or low-complexity regions in proteins

Bibliographic Details
Published in: Scientific Reports, 2024-01, Vol. 14 (1), Article 680
Main Author: Harrison, Paul M.
Format: Article
Language:English
cites cdi_FETCH-LOGICAL-c436t-7ad4557175837d0124abe98a13c9e108ef6908608ffee03d520e649af034c9493
container_end_page 680
container_issue 1
container_start_page 680
container_title Scientific reports
container_volume 14
creator Harrison, Paul M.
description Proteins can contain tracts dominated by a subset of amino acids and that have a functional significance. These are often termed ‘low-complexity regions’ (LCRs) or ‘compositionally-biased regions’ (CBRs). However, a wide spectrum of compositional bias is possible, and program parameters used to annotate these regions are often arbitrarily chosen. Also, investigators are sometimes interested in longer regions, or sometimes very short ones. Here, two programs for annotating LCRs/CBRs, namely SEG and fLPS, are investigated in detail across the whole expanse of their parameter spaces. In doing so, boundary behaviours are resolved that are used to derive an optimized systematic strategy for annotating LCRs/CBRs. Sets of parameters that progressively annotate or ‘cover’ more of protein sequence space and are optimized for a given target length have been derived. This progressive annotation can be applied to discern the biological relevance of CBRs, e.g., in parsing domains for experimental constructs and in generating hypotheses. It is also useful for picking out candidate regions of interest of a given target length and bias signature, and for assessing the parameter dependence of annotations. This latter application is demonstrated for a set of human intrinsically-disordered proteins associated with cancer.
doi_str_mv 10.1038/s41598-023-50991-8
format article
pmid 38182699
pub Nature Publishing Group UK
cop London
date 2024-01-05
rights The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”).
lds50 peer_reviewed
oa free_for_read
backlink https://www.ncbi.nlm.nih.gov/pubmed/38182699
fulltext fulltext
identifier ISSN: 2045-2322
ispartof Scientific reports, 2024-01, Vol.14 (1), p.680-680, Article 680
issn 2045-2322
2045-2322
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_e62d74214530455089d1ce36a843770a
source Publicly Available Content Database; PubMed Central; Free Full-Text Journals in Chemistry; Springer Nature - nature.com Journals - Fully Open Access
subjects 631/114
631/114/2184
631/114/663
Amino Acid Sequence
Amino Acids
Antifibrinolytic Agents
Bias
Humanities and Social Sciences
Humans
Intrinsically Disordered Proteins
multidisciplinary
Protein Domains
Proteins
Science
Science (multidisciplinary)
title Optimizing strategy for the discovery of compositionally-biased or low-complexity regions in proteins
url https://doi.org/10.1038/s41598-023-50991-8
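The abstract describes parameter sets that progressively annotate, or 'cover', more protein sequence. The sketch below is a hypothetical illustration of that coverage idea only. It assumes a simplified sliding-window Shannon-entropy criterion, similar in spirit to SEG's compositional-complexity measure. It is not the SEG or fLPS implementation, and neither program's extension or probability-scoring steps are modelled. The function names, window length and thresholds are placeholders chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy, in bits, of the residue composition of one window."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_mask(seq: str, window: int, threshold: float) -> list[bool]:
    """Mark each residue covered by at least one window whose entropy is <= threshold.

    Simplified trigger-only scan: no segment extension or optimisation step,
    so this is an illustration, not SEG or fLPS.
    """
    mask = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            for i in range(start, start + window):
                mask[i] = True
    return mask


def coverage(seq: str, window: int, threshold: float) -> float:
    """Fraction of residues flagged as low complexity at this threshold."""
    if not seq:
        return 0.0
    return sum(low_complexity_mask(seq, window, threshold)) / len(seq)


def progressive_coverage(seq: str, window: int, thresholds: list[float]) -> list[tuple[float, float]]:
    """Coverage for increasingly permissive thresholds (never decreases)."""
    return [(t, coverage(seq, window, t)) for t in sorted(thresholds)]


if __name__ == "__main__":
    # Hypothetical sequence: a poly-Q tract followed by 19 distinct residues.
    seq = "QQQQQQQQ" + "ACDEFGHIKLMNPRSTVWY"
    for t, cov in progressive_coverage(seq, window=8, thresholds=[0.5, 1.0, 2.0, 3.0]):
        print(f"threshold {t:.1f} bits -> coverage {cov:.2f}")
```

Any window that passes a stricter entropy threshold also passes a looser one. Coverage therefore can only grow as the threshold rises, which is the monotonic behaviour a progressive annotation scheme depends on.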