A Pattern-Based Approach Using Compound Unit Recognition and Its Hybridization with Rule-Based Translation
Published in: | Computational intelligence 1999-05, Vol.15 (2), p.114-127 |
---|---|
Format: | Article |
Language: | English |
Summary: | This paper describes a compound unit (CU) recognizer as a pattern‐based approach and its hybridization with rule‐based translation. A compound unit is a combined concept that includes collocations, idioms, and compound nouns. CU recognition reduces part‐of‐speech ambiguities by combining several words into a single unit, which in turn lessens the parsing load. It also provides pretranslated natural equivalents. Our focus in this paper is to obtain flexibility and efficiency from pattern‐based machine translation, and high‐quality translation through hybridization. A modified trie, our search index structure using a “method” strategy, is used to manage the heterogeneous properties of the constituents. Syntactic verification is integrated to obtain precise CU recognition by pruning wrongly recognized units that are caused by improper variable hypotheses. The experimental result with verification shows that the precision of CU recognition increases to 99.69% with 31 CFG rules on the cyclic trie structure for 1,268 Wall Street Journal articles of the Penn Treebank. Another experiment with CU recognition shows that it also raises the understandability of translations of Web documents. |
---|---|
ISSN: | 0824-7935, 1467-8640 |
DOI: | 10.1111/0824-7935.00087 |
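
The summary mentions a trie used as a search index for multi-word patterns whose constituents may include variables. As a rough illustration only, the sketch below shows a plain token-level trie with longest-match lookup and a single-token wildcard slot. It does not reproduce the paper's modified or cyclic trie, its "method" strategy, or its syntactic verification step; the class names, the `*` wildcard convention, and the labels are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

VAR = "*"  # hypothetical wildcard: matches exactly one arbitrary token


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.label: Optional[str] = None  # set when a pattern ends at this node


class CUTrie:
    """Token-level trie for multi-word patterns (illustrative sketch)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, pattern: str, label: str) -> None:
        """Insert a whitespace-separated pattern; '*' marks a variable slot."""
        node = self.root
        for tok in pattern.lower().split():
            node = node.children.setdefault(tok, TrieNode())
        node.label = label

    def _longest(self, tokens: List[str], start: int) -> Optional[Tuple[int, str]]:
        """Return (end, label) of the longest pattern starting at `start`."""
        best: Optional[Tuple[int, str]] = None
        stack = [(self.root, start)]
        while stack:
            node, i = stack.pop()
            if node.label is not None and i > start:
                if best is None or i > best[0]:
                    best = (i, node.label)
            if i < len(tokens):
                # Try the literal token first, then the wildcard slot.
                for key in {tokens[i].lower(), VAR}:
                    child = node.children.get(key)
                    if child is not None:
                        stack.append((child, i + 1))
        return best

    def recognize(self, tokens: List[str]) -> List[Tuple[int, int, str]]:
        """Scan left to right, emitting non-overlapping (start, end, label) spans."""
        spans: List[Tuple[int, int, str]] = []
        i = 0
        while i < len(tokens):
            hit = self._longest(tokens, i)
            if hit is None:
                i += 1
            else:
                end, label = hit
                spans.append((i, end, label))
                i = end
        return spans


trie = CUTrie()
trie.add("wall street", "compound_noun")
trie.add("wall street journal", "compound_noun")
trie.add("take * into account", "collocation")

tokens = "The Wall Street Journal will take inflation into account".split()
spans = trie.recognize(tokens)
print(spans)  # [(1, 4, 'compound_noun'), (5, 9, 'collocation')]
```

Because the wildcard accepts any token, a sketch like this can over-match; the summary describes syntactic verification with CFG rules as the mechanism for pruning such wrongly recognized units, which is not modeled here.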