A phonetic similarity model for automatic extraction of transliteration pairs
Published in: | ACM Transactions on Asian Language Information Processing, 2007-09, Vol. 6 (2), p. 6 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | This article proposes an approach for the automatic extraction of transliteration pairs from Chinese Web corpora. In this approach, we formulate the machine transliteration process using a syllable-based phonetic similarity model which consists of phonetic confusion matrices and a Chinese character n-gram language model. With the phonetic similarity model, the extraction of transliteration pairs becomes a two-step process of recognition followed by validation: first, in the recognition process, we identify the most probable transliteration in the k-neighborhood of a recognized English word. Then, in the validation process, we qualify the transliteration pair candidates with a hypothesis test. We carry out an analytical study on the statistics of several key factors in English-Chinese transliteration to help formulate phonetic similarity modeling. We then conduct both supervised and unsupervised learning of a phonetic similarity model on a development database. The experimental results validate the effectiveness of the phonetic similarity model by achieving an F-measure of 0.739 in supervised learning. The unsupervised learning approach works almost as well as the supervised one, thus allowing us to deploy automatic extraction of transliteration pairs in the Web space. |
ISSN: | 1530-0226, 1558-3430 |
DOI: | 10.1145/1282080.1282081 |
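The two-step recognition-then-validation procedure described in the summary can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the authors' implementation: the tiny probability tables, the syllabification of "Clinton", the one-character-per-syllable alignment, the floor and background probabilities, and the simple log-likelihood-ratio test that stands in for the article's hypothesis test. Recognition scores every candidate string inside the k-character window on each side of the English word by adding a phonetic confusion term to a character bigram language-model term, and keeps the best one. Validation then accepts that candidate only if its phonetic evidence beats a hypothesis in which the characters are unrelated to the English word.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy tables; a real model would estimate these from data.
# Character -> pinyin syllable.
PINYIN: Dict[str, str] = {"克": "ke", "林": "lin", "顿": "dun", "总": "zong",
                          "统": "tong", "说": "shuo", "美": "mei", "国": "guo"}
# Phonetic confusion matrix: P(Chinese syllable | English syllable).
CONFUSION: Dict[Tuple[str, str], float] = {
    ("c", "ke"): 0.5, ("lin", "lin"): 0.7, ("ton", "dun"): 0.6, ("ton", "tong"): 0.2,
}
# Character bigram model: P(character | previous character); "<s>" marks the start.
BIGRAM: Dict[Tuple[str, str], float] = {("<s>", "克"): 0.05, ("克", "林"): 0.3, ("林", "顿"): 0.4}
FLOOR = 1e-4       # probability for unseen table entries (assumption)
BACKGROUND = 0.01  # per-syllable probability if the characters are unrelated (assumption)


def phonetic_logprob(eng: List[str], cand: str) -> float:
    """Sum of log P(pinyin of character | English syllable), one character per syllable."""
    return sum(math.log(CONFUSION.get((e, PINYIN.get(c, "?")), FLOOR))
               for e, c in zip(eng, cand))


def lm_logprob(cand: str) -> float:
    """Log-probability of the candidate under the character bigram model."""
    prev, total = "<s>", 0.0
    for c in cand:
        total += math.log(BIGRAM.get((prev, c), FLOOR))
        prev = c
    return total


def recognize(eng: List[str], left: str, right: str, k: int) -> Optional[str]:
    """Return the best-scoring candidate within k characters on either side of the word."""
    n = len(eng)
    windows = [left[max(0, len(left) - k):], right[:k]]
    best, best_score = None, -math.inf
    for window in windows:
        for i in range(len(window) - n + 1):
            cand = window[i:i + n]
            score = phonetic_logprob(eng, cand) + lm_logprob(cand)
            if score > best_score:
                best, best_score = cand, score
    return best


def validate(eng: List[str], cand: str, threshold: float = 0.0) -> bool:
    """Log-likelihood ratio: transliteration hypothesis vs. unrelated-characters hypothesis."""
    llr = phonetic_logprob(eng, cand) - len(eng) * math.log(BACKGROUND)
    return llr > threshold


if __name__ == "__main__":
    eng = ["c", "lin", "ton"]  # hypothetical syllabification of "Clinton"
    # Context around the English word in "美国总统克林顿(Clinton)说".
    cand = recognize(eng, left="美国总统克林顿", right="说", k=5)
    if cand is not None:
        print(cand, validate(eng, cand))  # prints: 克林顿 True
```

In this toy setup the language-model term helps rank candidates during recognition but cancels out of the validation ratio, because both hypotheses are assumed to share the same language model; the article's actual test may be formulated differently.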