
Analysis of E. coli promoter recognition problem in dinucleotide feature space

Bibliographic Details
Published in: Bioinformatics (Oxford, England), 2007-03, Vol.23 (5), p.582-588
Main Authors: Rani, T Sobha, Bhavani, S Durga, Bapi, Raju S
Format: Article
Language: English
Description
Summary: Patterns in the promoter sequences within a species are known to be conserved, but there are many exceptions to this rule, which makes promoter recognition a complex problem. Although many complex feature extraction schemes coupled with several classifiers have been proposed for promoter recognition in the current literature, the problem is still open. This article proposes a dinucleotide global feature extraction method for the recognition of sigma-70 promoters in Escherichia coli. The positive data set consists of sigma-70 promoters with known transcription start points, drawn from the RegulonDB and PromEC databases. Four different negative data sets are considered: two biological sets (Gordon et al., 2003) and two synthetic sets. The results show that a single-layer perceptron using dinucleotide features achieves an accuracy of 80% against a background of biological non-promoters and 96% against random data sets. A scheme for locating the promoter regions in a given genome sequence is also proposed. A deeper analysis shows that the data set splits into two distinct classes, a majority class and a minority class. The majority class, comprising the majority promoter and the majority non-promoter signal, is linearly separable, and so is the minority class. The feature extraction and classification methods are further shown to be generic enough to be applied to the more complex problem of eukaryotic promoter recognition, with Drosophila promoter recognition presented as a case study. http://202.41.85.117/htmfiles/faculty/tsr/tsr.html
ISSN: 1367-4811
DOI: 10.1093/bioinformatics/btl670
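
Illustrative sketch: the summary describes dinucleotide features fed to a single-layer perceptron, but this record does not give the article's exact feature definition, sequence window, or training procedure. The Python below is therefore only a minimal sketch under stated assumptions: features are taken to be the normalized frequencies of the 16 overlapping dinucleotides, the perceptron uses the classic error-driven update rule, and the four toy sequences and all function names are hypothetical, chosen only so the example runs.

```python
from itertools import product

# The 16 possible dinucleotides over the DNA alphabet, in a fixed order.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_features(seq):
    """Return normalized frequencies of overlapping dinucleotides.

    Assumption: this is one plausible reading of "dinucleotide global
    features"; the article may define them differently.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def predict(weights, bias, x):
    """Threshold unit: 1 (promoter-like) if the weighted sum is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=100, lr=0.1):
    """Classic perceptron rule; stops early once an epoch has no errors."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            delta = lr * (y - predict(weights, bias, x))
            if delta != 0:
                weights = [w + delta * xi for w, xi in zip(weights, x)]
                bias += delta
                errors += 1
        if errors == 0:
            break
    return weights, bias


# Hypothetical toy data, NOT taken from the article's data sets:
# AT-rich strings stand in for promoters (1), GC-rich for non-promoters (0).
toy_sequences = ["TATAATTATAAT", "AATTTATAAATT", "GCGCCGGCGCGC", "CCGGGCCGCGGC"]
toy_labels = [1, 1, 0, 0]

toy_features = [dinucleotide_features(s) for s in toy_sequences]
weights, bias = train_perceptron(toy_features, toy_labels)
toy_predictions = [predict(weights, bias, x) for x in toy_features]
```

Because the toy classes share no dinucleotides, they are trivially linearly separable and the perceptron converges in a few passes; this says nothing about the accuracies reported in the summary, which depend on the article's actual data sets.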