q‐frame hash comparison based exact string matching algorithms for DNA sequences

Bibliographic Details
Published in: Concurrency and computation, 2022-04, Vol. 34 (9), p. n/a
Main Authors: Karcioglu, Abdullah Ammar; Bulut, Hasan
Format: Article
Language: English
Description
Summary: String matching is important because of its applications in many fields, such as medicine and bioinformatics. Various string matching algorithms have been developed to speed up the search. In particular, hash‐based exact string matching algorithms are among the most time‐efficient. The efficiency of hash‐based approaches depends on the hash function; hence, perfect hashing plays an essential role in hash‐based string matching. In this study, two q‐frame hash comparison‐based exact string matching algorithms, Hq‐QF and HqBM‐QF, are proposed. Both use a collision‐free perfect hash function for DNA sequences. In the first approach, after the hash values match for the last q characters, the character comparisons of the Hash‐q algorithm are replaced with q‐frame hash comparisons. The second approach improves on the first by utilizing the shift size indicated at the (m−1)th entry of the good suffix shift table. Since the number of character comparisons is minimized, the worst‐case time complexity of the proposed algorithms is $O\left(n\,\frac{m - mq}{q}\right)$. In both approaches, q‐frame hash comparisons replace most character comparisons as a trade‐off. The results show that the proposed approaches outperform the Hash‐q algorithm in both runtime and number of character comparisons.
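The idea behind the first approach can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 2-bit base-4 encoding, the function names `qgram_hash` and `qframe_search`, the default `q=3`, and the Hash-q style shift table are all assumptions made for illustration. The sketch assumes the text and pattern contain only the characters A, C, G and T, and it does not cover the good suffix shift used by the second approach.

```python
# Assumed 2-bit encoding of the DNA alphabet (illustrative, not from the paper).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def qgram_hash(s, start, q):
    """Base-4 value of s[start:start+q]; distinct q-grams get distinct values."""
    h = 0
    for ch in s[start:start + q]:
        h = h * 4 + CODE[ch]
    return h


def qframe_search(pattern, text, q=3):
    """Return all start positions of pattern in text (hypothetical sketch)."""
    m, n = len(pattern), len(text)
    if m < q or n < m:
        return []

    # Frame start offsets, right to left; the first frame is the last q characters.
    frame_starts = list(range(m - q, -1, -q))
    pat_hashes = [qgram_hash(pattern, s, q) for s in frame_starts]
    leftover = frame_starts[-1]  # leading characters not covered by any frame

    # Hash-q style shift table indexed by the collision-free q-gram hash.
    shift = [m - q + 1] * (4 ** q)
    for i in range(m - q):
        shift[qgram_hash(pattern, i, q)] = m - q - i
    h_last = pat_hashes[0]
    after_check = shift[h_last]  # shift applied after a verification attempt
    shift[h_last] = 0

    matches = []
    pos = 0
    while pos <= n - m:
        step = shift[qgram_hash(text, pos + m - q, q)]
        if step > 0:
            pos += step
            continue
        # The last q characters match; compare the remaining frames by hash.
        ok = all(
            qgram_hash(text, pos + s, q) == ph
            for s, ph in zip(frame_starts[1:], pat_hashes[1:])
        )
        # Characters left over at the front are compared directly.
        if ok and text[pos:pos + leftover] == pattern[:leftover]:
            matches.append(pos)
        pos += after_check
    return matches
```

Because the hash is a bijection on q-grams over a four-letter alphabet, equal hash values imply equal frames, so a frame comparison never needs a follow-up character check. Only the fewer than q characters left at the front of the pattern are compared one by one.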
ISSN: 1532-0626, 1532-0634
DOI: 10.1002/cpe.6505