Loading…
Minimal Absent Words on Run-Length Encoded Strings
A string \(w\) is called a minimal absent word (MAW) for another string \(T\) if \(w\) does not occur (as a substring) in \(T\) and any proper substring of \(w\) occurs in \(T\). State-of-the-art data structures for reporting the set \(\mathsf{MAW}(T)\) of MAWs from a given string \(T\) of length \(...
Saved in:
Published in: | arXiv.org 2022-04 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | A string \(w\) is called a minimal absent word (MAW) for another string \(T\) if \(w\) does not occur (as a substring) in \(T\) and any proper substring of \(w\) occurs in \(T\). State-of-the-art data structures for reporting the set \(\mathsf{MAW}(T)\) of MAWs from a given string \(T\) of length \(n\) require \(O(n)\) space, can be built in \(O(n)\) time, and can report all MAWs in \(O(|\mathsf{MAW}(T)|)\) time upon a query. This paper initiates the problem of computing MAWs from a compressed representation of a string. In particular, we focus on the most basic compressed representation of a string, run-length encoding (RLE), which represents each maximal run of the same characters \(a\) by \(a^p\) where \(p\) is the length of the run. Let \(m\) be the RLE-size of string \(T\). After categorizing the MAWs into five disjoint sets \(\mathcal{M}_1\), \(\mathcal{M}_2\), \(\mathcal{M}_3\), \(\mathcal{M}_4\), \(\mathcal{M}_5\) using RLE, we present matching upper and lower bounds for the number of MAWs in \(\mathcal{M}_i\) for \(i = 1,2,4,5\) in terms of RLE-size \(m\), except for \(\mathcal{M}_3\) whose size is unbounded by \(m\). We then present a compact \(O(m)\)-space data structure that can report all MAWs in optimal \(O(|\mathsf{MAW}(T)|)\) time. |
---|---|
ISSN: | 2331-8422 |