Optimizing strategy for the discovery of compositionally-biased or low-complexity regions in proteins
Proteins can contain tracts dominated by a subset of amino acids and that have a functional significance. These are often termed ‘low-complexity regions’ (LCRs) or ‘compositionally-biased regions’ (CBRs). However, a wide spectrum of compositional bias is possible, and program parameters used to anno...
Published in: | Scientific reports 2024-01, Vol.14 (1), p.680-680, Article 680 |
---|---|
Main Author: | Harrison, Paul M. |
Format: | Article |
Language: | English |
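Subjects: | Amino Acid Sequence; Amino Acids; Antifibrinolytic Agents; Bias; Humanities and Social Sciences; Humans; Intrinsically Disordered Proteins; multidisciplinary; Protein Domains; Proteins; Science; Science (multidisciplinary) |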
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c436t-7ad4557175837d0124abe98a13c9e108ef6908608ffee03d520e649af034c9493 |
container_end_page | 680 |
container_issue | 1 |
container_start_page | 680 |
container_title | Scientific reports |
container_volume | 14 |
creator | Harrison, Paul M. |
description | Proteins can contain tracts dominated by a subset of amino acids and that have a functional significance. These are often termed ‘low-complexity regions’ (LCRs) or ‘compositionally-biased regions’ (CBRs). However, a wide spectrum of compositional bias is possible, and program parameters used to annotate these regions are often arbitrarily chosen. Also, investigators are sometimes interested in longer regions, or sometimes very short ones. Here, two programs for annotating LCRs/CBRs, namely SEG and fLPS, are investigated in detail across the whole expanse of their parameter spaces. In doing so, boundary behaviours are resolved that are used to derive an optimized systematic strategy for annotating LCRs/CBRs. Sets of parameters that progressively annotate or ‘cover’ more of protein sequence space and are optimized for a given target length have been derived. This progressive annotation can be applied to discern the biological relevance of CBRs, e.g., in parsing domains for experimental constructs and in generating hypotheses. It is also useful for picking out candidate regions of interest of a given target length and bias signature, and for assessing the parameter dependence of annotations. This latter application is demonstrated for a set of human intrinsically-disordered proteins associated with cancer. |
doi_str_mv | 10.1038/s41598-023-50991-8 |
format | article |
fullrecord | <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_e62d74214530455089d1ce36a843770a</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_e62d74214530455089d1ce36a843770a</doaj_id><sourcerecordid>2910734886</sourcerecordid><originalsourceid>FETCH-LOGICAL-c436t-7ad4557175837d0124abe98a13c9e108ef6908608ffee03d520e649af034c9493</originalsourceid><addsrcrecordid>eNp9kUtv1DAUhaMK1FZt_wALZIkNG8P1K7GXqOJRqVI3dG15kpvgURIPtgcIvx7PpC0VC7yx5fvdc-1zquoVg3cMhH6fJFNGU-CCKjCGUX1SnXOQinLB-Ytn57PqKqUtlKW4kcycVmdCM81rY84rvNtlP_nffh5IytFlHBbSh0jyNySdT234gXEhoSdtmHYh-ezD7MZxoRvvEnakoGP4SQ_VEX_5vJCIQ2ES8TPZxZDRz-myetm7MeHVw35R3X_6-PX6C729-3xz_eGWtlLUmTauk0o1rFFaNB0wLt0GjXZMtAYZaOxrA7oG3feIIDrFAWtpXA9CtkYacVHdrLpdcFu7i35ycbHBeXu8CHGwLmbfjmix5l0jOZNKFKMUaNOxFkXttBRNA65ovV21yie-7zFlOxU7cBzdjGGfLDeMaQkK6oK--Qfdhn0sNh0paITU-kDxlWpjSCli__RABvaQqV0ztSVTe8zU6tL0-kF6v5mwe2p5TLAAYgVSKc0Dxr-z_yP7B-jMq2M</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2910734886</pqid></control><display><type>article</type><title>Optimizing strategy for the discovery of compositionally-biased or low-complexity regions in proteins</title><source>Publicly Available Content Database</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><source>Springer Nature - nature.com Journals - Fully Open Access</source><creator>Harrison, Paul M.</creator><creatorcontrib>Harrison, Paul M.</creatorcontrib><description>Proteins can contain tracts dominated by a subset of amino acids and that have a functional significance. These are often termed ‘low-complexity regions’ (LCRs) or ‘compositionally-biased regions’ (CBRs). However, a wide spectrum of compositional bias is possible, and program parameters used to annotate these regions are often arbitrarily chosen. 
Also, investigators are sometimes interested in longer regions, or sometimes very short ones. Here, two programs for annotating LCRs/CBRs, namely SEG and fLPS, are investigated in detail across the whole expanse of their parameter spaces. In doing so, boundary behaviours are resolved that are used to derive an optimized systematic strategy for annotating LCRs/CBRs. Sets of parameters that progressively annotate or ‘cover’ more of protein sequence space and are optimized for a given target length have been derived. This progressive annotation can be applied to discern the biological relevance of CBRs, e.g., in parsing domains for experimental constructs and in generating hypotheses. It is also useful for picking out candidate regions of interest of a given target length and bias signature, and for assessing the parameter dependence of annotations. This latter application is demonstrated for a set of human intrinsically-disordered proteins associated with cancer.</description><identifier>ISSN: 2045-2322</identifier><identifier>EISSN: 2045-2322</identifier><identifier>DOI: 10.1038/s41598-023-50991-8</identifier><identifier>PMID: 38182699</identifier><language>eng</language><publisher>London: Nature Publishing Group UK</publisher><subject>631/114 ; 631/114/2184 ; 631/114/663 ; Amino Acid Sequence ; Amino Acids ; Antifibrinolytic Agents ; Bias ; Humanities and Social Sciences ; Humans ; Intrinsically Disordered Proteins ; multidisciplinary ; Protein Domains ; Proteins ; Science ; Science (multidisciplinary)</subject><ispartof>Scientific reports, 2024-01, Vol.14 (1), p.680-680, Article 680</ispartof><rights>The Author(s) 2024</rights><rights>2024. The Author(s).</rights><rights>The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c436t-7ad4557175837d0124abe98a13c9e108ef6908608ffee03d520e649af034c9493</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2910734886/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2910734886?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,25731,27901,27902,36989,36990,44566,74869</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38182699$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Harrison, Paul M.</creatorcontrib><title>Optimizing strategy for the discovery of compositionally-biased or low-complexity regions in proteins</title><title>Scientific reports</title><addtitle>Sci Rep</addtitle><addtitle>Sci Rep</addtitle><description>Proteins can contain tracts dominated by a subset of amino acids and that have a functional significance. These are often termed ‘low-complexity regions’ (LCRs) or ‘compositionally-biased regions’ (CBRs). However, a wide spectrum of compositional bias is possible, and program parameters used to annotate these regions are often arbitrarily chosen. Also, investigators are sometimes interested in longer regions, or sometimes very short ones. Here, two programs for annotating LCRs/CBRs, namely SEG and fLPS, are investigated in detail across the whole expanse of their parameter spaces. In doing so, boundary behaviours are resolved that are used to derive an optimized systematic strategy for annotating LCRs/CBRs. 
Sets of parameters that progressively annotate or ‘cover’ more of protein sequence space and are optimized for a given target length have been derived. This progressive annotation can be applied to discern the biological relevance of CBRs, e.g., in parsing domains for experimental constructs and in generating hypotheses. It is also useful for picking out candidate regions of interest of a given target length and bias signature, and for assessing the parameter dependence of annotations. This latter application is demonstrated for a set of human intrinsically-disordered proteins associated with cancer.</description><subject>631/114</subject><subject>631/114/2184</subject><subject>631/114/663</subject><subject>Amino Acid Sequence</subject><subject>Amino Acids</subject><subject>Antifibrinolytic Agents</subject><subject>Bias</subject><subject>Humanities and Social Sciences</subject><subject>Humans</subject><subject>Intrinsically Disordered Proteins</subject><subject>multidisciplinary</subject><subject>Protein Domains</subject><subject>Proteins</subject><subject>Science</subject><subject>Science (multidisciplinary)</subject><issn>2045-2322</issn><issn>2045-2322</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNp9kUtv1DAUhaMK1FZt_wALZIkNG8P1K7GXqOJRqVI3dG15kpvgURIPtgcIvx7PpC0VC7yx5fvdc-1zquoVg3cMhH6fJFNGU-CCKjCGUX1SnXOQinLB-Ytn57PqKqUtlKW4kcycVmdCM81rY84rvNtlP_nffh5IytFlHBbSh0jyNySdT234gXEhoSdtmHYh-ezD7MZxoRvvEnakoGP4SQ_VEX_5vJCIQ2ES8TPZxZDRz-myetm7MeHVw35R3X_6-PX6C729-3xz_eGWtlLUmTauk0o1rFFaNB0wLt0GjXZMtAYZaOxrA7oG3feIIDrFAWtpXA9CtkYacVHdrLpdcFu7i35ycbHBeXu8CHGwLmbfjmix5l0jOZNKFKMUaNOxFkXttBRNA65ovV21yie-7zFlOxU7cBzdjGGfLDeMaQkK6oK--Qfdhn0sNh0paITU-kDxlWpjSCli__RABvaQqV0ztSVTe8zU6tL0-kF6v5mwe2p5TLAAYgVSKc0Dxr-z_yP7B-jMq2M</recordid><startdate>20240105</startdate><enddate>20240105</enddate><creator>Harrison, Paul 
M.</creator><general>Nature Publishing Group UK</general><general>Nature Publishing Group</general><general>Nature Portfolio</general><scope>C6C</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7X7</scope><scope>7XB</scope><scope>88A</scope><scope>88E</scope><scope>88I</scope><scope>8FE</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M2P</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7X8</scope><scope>DOA</scope></search><sort><creationdate>20240105</creationdate><title>Optimizing strategy for the discovery of compositionally-biased or low-complexity regions in proteins</title><author>Harrison, Paul M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c436t-7ad4557175837d0124abe98a13c9e108ef6908608ffee03d520e649af034c9493</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>631/114</topic><topic>631/114/2184</topic><topic>631/114/663</topic><topic>Amino Acid Sequence</topic><topic>Amino Acids</topic><topic>Antifibrinolytic Agents</topic><topic>Bias</topic><topic>Humanities and Social Sciences</topic><topic>Humans</topic><topic>Intrinsically Disordered Proteins</topic><topic>multidisciplinary</topic><topic>Protein Domains</topic><topic>Proteins</topic><topic>Science</topic><topic>Science 
(multidisciplinary)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Harrison, Paul M.</creatorcontrib><collection>Springer Open Access</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Biology Database (Alumni Edition)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Science Database (Alumni Edition)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Biological Sciences</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Science 
Database</collection><collection>Biological Science Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>Directory of Open Access Journals</collection><jtitle>Scientific reports</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Harrison, Paul M.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Optimizing strategy for the discovery of compositionally-biased or low-complexity regions in proteins</atitle><jtitle>Scientific reports</jtitle><stitle>Sci Rep</stitle><addtitle>Sci Rep</addtitle><date>2024-01-05</date><risdate>2024</risdate><volume>14</volume><issue>1</issue><spage>680</spage><epage>680</epage><pages>680-680</pages><artnum>680</artnum><issn>2045-2322</issn><eissn>2045-2322</eissn><abstract>Proteins can contain tracts dominated by a subset of amino acids and that have a functional significance. These are often termed ‘low-complexity regions’ (LCRs) or ‘compositionally-biased regions’ (CBRs). However, a wide spectrum of compositional bias is possible, and program parameters used to annotate these regions are often arbitrarily chosen. Also, investigators are sometimes interested in longer regions, or sometimes very short ones. Here, two programs for annotating LCRs/CBRs, namely SEG and fLPS, are investigated in detail across the whole expanse of their parameter spaces. In doing so, boundary behaviours are resolved that are used to derive an optimized systematic strategy for annotating LCRs/CBRs. Sets of parameters that progressively annotate or ‘cover’ more of protein sequence space and are optimized for a given target length have been derived. 
This progressive annotation can be applied to discern the biological relevance of CBRs, e.g., in parsing domains for experimental constructs and in generating hypotheses. It is also useful for picking out candidate regions of interest of a given target length and bias signature, and for assessing the parameter dependence of annotations. This latter application is demonstrated for a set of human intrinsically-disordered proteins associated with cancer.</abstract><cop>London</cop><pub>Nature Publishing Group UK</pub><pmid>38182699</pmid><doi>10.1038/s41598-023-50991-8</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2045-2322 |
ispartof | Scientific reports, 2024-01, Vol.14 (1), p.680-680, Article 680 |
issn | 2045-2322 2045-2322 |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_e62d74214530455089d1ce36a843770a |
source | Publicly Available Content Database; PubMed Central; Free Full-Text Journals in Chemistry; Springer Nature - nature.com Journals - Fully Open Access |
subjects | 631/114; 631/114/2184; 631/114/663; Amino Acid Sequence; Amino Acids; Antifibrinolytic Agents; Bias; Humanities and Social Sciences; Humans; Intrinsically Disordered Proteins; multidisciplinary; Protein Domains; Proteins; Science; Science (multidisciplinary) |
title | Optimizing strategy for the discovery of compositionally-biased or low-complexity regions in proteins |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T20%3A05%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimizing%20strategy%20for%20the%20discovery%20of%20compositionally-biased%20or%20low-complexity%20regions%20in%20proteins&rft.jtitle=Scientific%20reports&rft.au=Harrison,%20Paul%20M.&rft.date=2024-01-05&rft.volume=14&rft.issue=1&rft.spage=680&rft.epage=680&rft.pages=680-680&rft.artnum=680&rft.issn=2045-2322&rft.eissn=2045-2322&rft_id=info:doi/10.1038/s41598-023-50991-8&rft_dat=%3Cproquest_doaj_%3E2910734886%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c436t-7ad4557175837d0124abe98a13c9e108ef6908608ffee03d520e649af034c9493%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2910734886&rft_id=info:pmid/38182699&rfr_iscdi=true |