Loading…

Incremental Evolution of Fuzzy Grammar Fragments to Enhance Instance Matching and Text Mining

In many applications, it is useful to extract structured data from sections of unstructured text. A common approach is to use pattern matching (e.g., regular expressions) or more general grammar-based techniques. In cases where exact templates or grammar fragments are not known, it is possible to us...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on fuzzy systems 2008-12, Vol.16 (6), p.1425-1438
Main Authors: Martin, T., Yun Shen, Azvine, B.
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:In many applications, it is useful to extract structured data from sections of unstructured text. A common approach is to use pattern matching (e.g., regular expressions) or more general grammar-based techniques. In cases where exact templates or grammar fragments are not known, it is possible to use machine learning approaches, based on words or n-grams, to identify the structured data. This is generally a two-stage (train/use) process that cannot easily cope with incremental extensions of the training set. In this paper, we combine a fuzzy grammar-based approach with incremental learning. This enables a set of grammar fragments to evolve incrementally, each time a new example is given, while guaranteeing that it can parse previously seen examples. We propose a novel measure of overlap between fuzzy grammar fragments that can also be used to determine the degree to which a string is parsed by a grammar fragment. This measure of overlap allows us to compare the range of two fuzzy grammar fragments (i.e., to estimate and compare the sets of strings that fuzzily conform to each grammar) without explicitly parsing any strings. A simple application shows the method's validity.
ISSN:1063-6706
1941-0034
DOI:10.1109/TFUZZ.2008.925920