Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns
Multiple sequence alignments (MSAs) are widely used approaches in bioinformatics to carry out other tasks such as structure predictions, biological function analyses or phylogenetic modeling. However, current tools usually provide partially optimal alignments, as each one is focused on specific biol...
Published in: | Bioinformatics 2013-09, Vol.29 (17), p.2112-2121 |
---|---|
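Main Authors: | Ortuño, Francisco M; Valenzuela, Olga; Rojas, Fernando; Pomares, Hector; Florido, Javier P; Urquiza, Jose M; Rojas, Ignacio |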
Format: | Article |
Language: | English |
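Subjects: | Algorithms; Databases, Protein; Phylogeny; Protein Conformation; Proteins - classification; Sequence Alignment - methods; Sequence Analysis, Protein |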
cited_by | cdi_FETCH-LOGICAL-c389t-3434998b9ac1816276d3ac2a5ea0fde77f0a09c7d65345c0d6304ac4e99740813 |
---|---|
cites | cdi_FETCH-LOGICAL-c389t-3434998b9ac1816276d3ac2a5ea0fde77f0a09c7d65345c0d6304ac4e99740813 |
container_end_page | 2121 |
container_issue | 17 |
container_start_page | 2112 |
container_title | Bioinformatics |
container_volume | 29 |
creator | Ortuño, Francisco M; Valenzuela, Olga; Rojas, Fernando; Pomares, Hector; Florido, Javier P; Urquiza, Jose M; Rojas, Ignacio |
description | Multiple sequence alignments (MSAs) are widely used in bioinformatics to support tasks such as structure prediction, biological function analysis and phylogenetic modeling. However, current tools usually provide only partially optimal alignments, as each one focuses on specific biological features. Thus, the same set of sequences can produce different alignments, especially when the sequences are less similar. Consequently, researchers and biologists do not agree on the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores that include further biological features. Among them, 3D structures are increasingly being used to evaluate alignments. Because structures are more conserved in proteins than sequences, scores with structural information are better suited to evaluating more distant relationships between sequences.
The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. It was assessed on the BAliBASE benchmark, with statistical significance according to the Kruskal-Wallis test (P < 0.01). The algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P < 0.05), whereas its results are not significantly different from those of 3D-COFFEE (P > 0.05), with the advantage of being able to use fewer structures. Structural information is included in the objective function to evaluate the obtained alignments more accurately.
The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip. |
doi_str_mv | 10.1093/bioinformatics/btt360 |
format | article |
date | 2013-09-01 |
oa | free_for_read |
pmid | 23793754 |
publisher | England |
toplevel | peer_reviewed; online_resources |
tpages | 10 |
fulltext | fulltext |
identifier | ISSN: 1367-4803 |
ispartof | Bioinformatics, 2013-09, Vol.29 (17), p.2112-2121 |
issn | 1367-4803; 1367-4811; 1460-2059 |
language | eng |
recordid | cdi_proquest_miscellaneous_1434021562 |
source | Open Access: Oxford University Press Open Journals; PubMed Central |
subjects | Algorithms; Databases, Protein; Phylogeny; Protein Conformation; Proteins - classification; Sequence Alignment - methods; Sequence Analysis, Protein |
title | Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T23%3A47%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimizing%20multiple%20sequence%20alignments%20using%20a%20genetic%20algorithm%20based%20on%20three%20objectives:%20structural%20information,%20non-gaps%20percentage%20and%20totally%20conserved%20columns&rft.jtitle=Bioinformatics&rft.au=Ortu%C3%B1o,%20Francisco%20M&rft.date=2013-09-01&rft.volume=29&rft.issue=17&rft.spage=2112&rft.epage=2121&rft.pages=2112-2121&rft.issn=1367-4803&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btt360&rft_dat=%3Cproquest_cross%3E1420604888%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c389t-3434998b9ac1816276d3ac2a5ea0fde77f0a09c7d65345c0d6304ac4e99740813%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1420604888&rft_id=info:pmid/23793754&rfr_iscdi=true |
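
Two of the three objectives named in the description, non-gaps percentage and totally conserved columns, can be illustrated on a toy alignment. The Python sketch below is only an illustration and is not taken from the MO-SAStrE source code: the function names, the use of `-` as the gap symbol, and the exact definitions (share of non-gap cells over the whole alignment; columns in which every sequence carries the same non-gap residue) are assumptions, since the abstract does not define them. The STRIKE score is left out because it requires 3D structure data. The `dominates` helper shows the Pareto dominance test on which non-dominated sorting is built, assuming every objective is maximized.

```python
from typing import Sequence

GAP = "-"  # assumed gap symbol


def non_gap_percentage(alignment: Sequence[str]) -> float:
    """Percentage of alignment cells that are not gaps (assumed definition)."""
    total = sum(len(row) for row in alignment)
    if total == 0:
        return 0.0
    gaps = sum(row.count(GAP) for row in alignment)
    return 100.0 * (total - gaps) / total


def totally_conserved_columns(alignment: Sequence[str]) -> int:
    """Number of columns where all rows share one non-gap residue (assumed definition)."""
    if not alignment:
        return 0
    count = 0
    for column in zip(*alignment):  # iterate over columns of the alignment
        first = column[0]
        if first != GAP and all(residue == first for residue in column):
            count += 1
    return count


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for maximization: a is no worse everywhere and better somewhere."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


if __name__ == "__main__":
    toy = ["AC-GT", "ACAGT", "AC-GA"]  # hypothetical three-sequence alignment
    print(non_gap_percentage(toy))
    print(totally_conserved_columns(toy))
```

In a multiobjective setting, each candidate alignment would be summarized by a tuple of objective values, and one alignment is preferred over another only when `dominates` holds; alignments that do not dominate each other remain as alternative trade-offs.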