
Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns

Bibliographic Details
Published in: Bioinformatics 2013-09, Vol.29 (17), p.2112-2121
Main Authors: Ortuño, Francisco M, Valenzuela, Olga, Rojas, Fernando, Pomares, Hector, Florido, Javier P, Urquiza, Jose M, Rojas, Ignacio
Format: Article
Language: English
container_end_page 2121
container_issue 17
container_start_page 2112
container_title Bioinformatics
container_volume 29
creator Ortuño, Francisco M
Valenzuela, Olga
Rojas, Fernando
Pomares, Hector
Florido, Javier P
Urquiza, Jose M
Rojas, Ignacio
description Multiple sequence alignments (MSAs) are widely used in bioinformatics as the basis for other tasks such as structure prediction, biological function analysis or phylogenetic modeling. However, current tools usually provide only partially optimal alignments, as each one is focused on specific biological features. Thus, the same set of sequences can produce different alignments, especially when the sequences are less similar. Consequently, researchers and biologists do not agree on the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores that include further biological features; among them, 3D structures are increasingly used to evaluate alignments. Because structures are more conserved than sequences in proteins, scores that include structural information are better suited to evaluating more distant relationships between sequences. The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. It was evaluated on the BAliBASE benchmark, with significance assessed by the Kruskal-Wallis test (P < 0.01). This algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P < 0.05), whereas its results are not significantly different from those of 3D-COFFEE (P > 0.05), with the advantage of being able to use fewer structures. Structural information is included within the objective function to evaluate the obtained alignments more accurately. The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip.
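Two of the three objectives named in the abstract can be computed directly from an alignment, and the non-dominated sorting it relies on rests on a simple Pareto-dominance test. The Python sketch below illustrates both under stated assumptions: "non-gaps percentage" is taken to be the share of alignment cells holding a residue, and "totally conserved columns" the share of columns in which every sequence carries the identical residue with no gap. These definitions, the "-" gap symbol and all function names are hypothetical choices made for illustration; this is not the MO-SAStrE source code. The STRIKE score needs 3D structures and is left out, and only the first (rank-1) front is extracted, whereas full NSGA-II also ranks later fronts and uses crowding distance.

```python
"""Sketch of two MSA objectives and Pareto dominance (assumed definitions)."""
from typing import Dict, List, Sequence, Tuple

GAP = "-"  # assumed gap symbol


def _check(msa: Sequence[str]) -> int:
    """Return the column count, ensuring all rows are non-empty and equal length."""
    if not msa:
        raise ValueError("empty alignment")
    width = len(msa[0])
    if width == 0 or any(len(row) != width for row in msa):
        raise ValueError("rows must be non-empty and of equal length")
    return width


def non_gaps_percentage(msa: Sequence[str]) -> float:
    """Assumed definition: percentage of alignment cells holding a residue."""
    width = _check(msa)
    residues = sum(1 for row in msa for c in row if c != GAP)
    return 100.0 * residues / (len(msa) * width)


def totally_conserved_columns(msa: Sequence[str]) -> float:
    """Assumed definition: percentage of columns with one identical residue, no gaps."""
    width = _check(msa)
    conserved = 0
    for j in range(width):
        column = {row[j] for row in msa}
        if len(column) == 1 and GAP not in column:
            conserved += 1
    return 100.0 * conserved / width


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b when every objective is maximised."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def first_front(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Names of candidates that no other candidate dominates (rank-1 front)."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]


if __name__ == "__main__":
    # Two toy alignments of the same three sequences; only the gaps move.
    candidates = {
        "A": ["MKV-LA", "MKVILA", "MKT-LA"],
        "B": ["MKVLA-", "MKVILA", "MKTLA-"],
    }
    # A third objective (e.g. a STRIKE score) would be appended to each tuple.
    scores = {
        name: (non_gaps_percentage(msa), totally_conserved_columns(msa))
        for name, msa in candidates.items()
    }
    for name, (ng, tc) in scores.items():
        print(f"{name}: non-gaps={ng:.1f}% conserved={tc:.1f}%")
    print("rank-1 front:", first_front(scores))
```

Both toy alignments contain the same number of gaps (non-gaps 88.9%), but A keeps four of six columns fully conserved against two for B, so A dominates B and is the only member of the first front. The example shows why gap placement, not just gap count, changes the objective values.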
doi_str_mv 10.1093/bioinformatics/btt360
format article
pmid 23793754
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2013-09, Vol.29 (17), p.2112-2121
issn 1367-4803
1367-4811
1460-2059
language eng
source Open Access: Oxford University Press Open Journals; PubMed Central
subjects Algorithms
Databases, Protein
Phylogeny
Protein Conformation
Proteins - classification
Sequence Alignment - methods
Sequence Analysis, Protein
title Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns