
Sequence landscapes

Bibliographic Details
Published in: Nucleic Acids Research, 1986-01, Vol. 14 (1), pp. 141-158
Main Authors: Clift, B.; Haussler, D.; McConnell, R.; Schneider, T. D.; Stormo, G. D.
Format: Article
Language:English
Description
Summary: We describe a method for representing the structure of repeating sequences in nucleic acids, proteins and other texts. A portion of the sequence is presented at the bottom of a CRT screen. Above the sequence is its landscape, which looks like a mountain range. Each mountain corresponds to a subsequence of the sequence. At the peak of every mountain is written the number of times that the subsequence appears. A data structure called a DAWG (directed acyclic word graph), which can be built in time proportional to the length of the sequence, is used to construct the landscape. For the 40,000 bases of bacteriophage T7, the DAWG can be built in 30 seconds. The time to display any portion of the landscape is less than a second. Using sequence landscapes, one can quickly locate significant repeats.
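The summary relies on two properties of the DAWG: it can be built in time proportional to the sequence length, and it can report how many times any subsequence occurs. The sketch below illustrates both using the standard online suffix-automaton construction, which produces the minimal DAWG of a string. It is not the authors' implementation, and it does not draw the landscape itself. The names `State`, `build_dawg` and `occurrences` are hypothetical and chosen for illustration. Each state stores an occurrence count, propagated along suffix links, so that following a subsequence from the root yields the number of times it appears.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    length: int = 0                           # length of the longest subword ending in this state
    link: int = -1                            # suffix link (-1 only for the root)
    next: dict = field(default_factory=dict)  # outgoing edges: character -> state index
    count: int = 0                            # number of end positions (occurrences)


def build_dawg(text):
    """Build a suffix automaton (minimal DAWG) of `text` online, one character at a time."""
    states = [State()]
    last = 0
    for ch in text:
        cur = len(states)
        states.append(State(length=states[last].length + 1, count=1))
        p = last
        # Add edges on ch from every suffix state that lacks one.
        while p != -1 and ch not in states[p].next:
            states[p].next[ch] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].next[ch]
            if states[p].length + 1 == states[q].length:
                states[cur].link = q
            else:
                # Split q by cloning it; the clone gets no end position of its own.
                clone = len(states)
                states.append(State(length=states[p].length + 1,
                                    link=states[q].link,
                                    next=dict(states[q].next)))
                while p != -1 and states[p].next.get(ch) == q:
                    states[p].next[ch] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur

    # Propagate counts up suffix links, longest states first.
    # A bucket sort by length keeps this step linear.
    buckets = [[] for _ in range(states[last].length + 1)]
    for i, st in enumerate(states):
        buckets[st.length].append(i)
    for length in range(len(buckets) - 1, 0, -1):
        for i in buckets[length]:
            states[states[i].link].count += states[i].count
    return states


def occurrences(states, pattern):
    """Return how many times a non-empty `pattern` occurs in the indexed text."""
    s = 0
    for ch in pattern:
        if ch not in states[s].next:
            return 0
        s = states[s].next[ch]
    return states[s].count


if __name__ == "__main__":
    dawg = build_dawg("ACGTACGTAC")
    print(occurrences(dawg, "AC"))   # 3
    print(occurrences(dawg, "ACG"))  # 2
```

Each added character does amortized constant work for a fixed alphabet such as {A, C, G, T}, which is consistent with the linear-time construction cited in the summary. A landscape display could then look up the count for each subsequence it draws.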
ISSN: 0305-1048, 1362-4962
DOI: 10.1093/nar/14.1.141