Loading…

Structure in on-line documents

We present a hierarchical approach for extracting homogeneous regions in on-line documents. The problem of identifying and processing ruled and unruled tables, text and drawings is addressed. The on-line document is first segmented into regions with only text strokes and regions with both text and n...

Full description

Saved in:
Bibliographic Details
Main Authors: Jain, K., Namboodiri, A.M., Subrahmonia, J.
Format: Conference Proceeding
Language:English
Subjects:
Citations: Items that cite this one
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:We present a hierarchical approach for extracting homogeneous regions in on-line documents. The problem of identifying and processing ruled and unruled tables, text and drawings is addressed. The on-line document is first segmented into regions with only text strokes and regions with both text and non-text strokes. The text region is further classified as unruled table or plain text. Stroke clustering is used to segment the non-text regions. Each nontext segment is then classified as drawing, ruled table or underlined keyword using stroke properties. The individual regions are processed and the results are assembled to identify the structure of the on-line document.
DOI:10.1109/ICDAR.2001.953906