
A phonetic similarity model for automatic extraction of transliteration pairs

This article proposes an approach for the automatic extraction of transliteration pairs from Chinese Web corpora. In this approach, we formulate the machine transliteration process using a syllable-based phonetic similarity model which consists of phonetic confusion matrices and a Chinese character...

Bibliographic Details
Published in: ACM transactions on Asian language information processing 2007-09, Vol.6 (2), p.6
Main Authors: Kuo, Jin-Shea, Li, Haizhou, Yang, Ying-Kuei
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c187t-a1bca398502dca65aafbbdaff1c362f366d704514ac9034dc35b88b3c833e35e3
cites cdi_FETCH-LOGICAL-c187t-a1bca398502dca65aafbbdaff1c362f366d704514ac9034dc35b88b3c833e35e3
container_end_page
container_issue 2
container_start_page 6
container_title ACM transactions on Asian language information processing
container_volume 6
creator Kuo, Jin-Shea
Li, Haizhou
Yang, Ying-Kuei
description This article proposes an approach for the automatic extraction of transliteration pairs from Chinese Web corpora. In this approach, we formulate the machine transliteration process using a syllable-based phonetic similarity model which consists of phonetic confusion matrices and a Chinese character n-gram language model. With the phonetic similarity model, the extraction of transliteration pairs becomes a two-step process of recognition followed by validation: First, in the recognition process, we identify the most probable transliteration in the k-neighborhood of a recognized English word. Then, in the validation process, we qualify the transliteration pair candidates with a hypothesis test. We carry out an analytical study on the statistics of several key factors in English-Chinese transliteration to help formulate phonetic similarity modeling. We then conduct both supervised and unsupervised learning of a phonetic similarity model on a development database. The experimental results validate the effectiveness of the phonetic similarity model by achieving an F-measure of 0.739 in supervised learning. The unsupervised learning approach works almost as well as the supervised one, thus allowing us to deploy automatic extraction of transliteration pairs in the Web space.
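The two-step process in the description (recognition, then validation) can be illustrated with a toy sketch. This is not the authors' implementation: the syllables, the probabilities, the `FLOOR` smoothing value, and the function names `log_score`, `recognize`, and `validate` are all assumptions made for illustration. In particular, a simple threshold on a length-normalized log score stands in for the hypothesis test, whose exact form the abstract does not give.

```python
import math

# Toy values only: every syllable and probability here is an illustrative assumption.
CONFUSION = {  # P(chinese_syllable | english_syllable), a tiny "confusion matrix"
    "ke": {"ke": 0.8, "ge": 0.2},
    "lin": {"lin": 0.9, "ling": 0.1},
}
BIGRAM = {  # P(current | previous) for a bigram language model; "<s>" marks the start
    ("<s>", "ke"): 0.5, ("<s>", "ge"): 0.5,
    ("ke", "lin"): 0.6, ("ke", "ling"): 0.4,
    ("ge", "lin"): 0.3, ("ge", "ling"): 0.7,
}
FLOOR = 1e-6  # assumed probability for unseen events


def log_score(english, chinese):
    """Sum of log confusion and log bigram probabilities over aligned syllables."""
    if len(english) != len(chinese):
        return float("-inf")
    total, prev = 0.0, "<s>"
    for e, c in zip(english, chinese):
        total += math.log(CONFUSION.get(e, {}).get(c, FLOOR))
        total += math.log(BIGRAM.get((prev, c), FLOOR))
        prev = c
    return total


def recognize(english, candidates):
    """Recognition step: pick the highest-scoring candidate in the neighborhood."""
    return max(candidates, key=lambda c: log_score(english, c))


def validate(english, chinese, threshold):
    """Validation step (stand-in): accept if the per-syllable log score clears a threshold."""
    return log_score(english, chinese) / len(english) >= threshold


english = ["ke", "lin"]
candidates = [["ke", "lin"], ["ge", "ling"], ["ke", "ling"]]
best = recognize(english, candidates)
accepted = validate(english, best, threshold=-1.0)
```

Working in log space keeps the product of many small probabilities numerically stable, and normalizing by syllable count keeps the threshold comparable across words of different lengths.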
doi_str_mv 10.1145/1282080.1282081
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_30970956</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>30970956</sourcerecordid><originalsourceid>FETCH-LOGICAL-c187t-a1bca398502dca65aafbbdaff1c362f366d704514ac9034dc35b88b3c833e35e3</originalsourceid><addsrcrecordid>eNotkEtPwzAQhC0EEqVw5poTt1CvN3adY1Xxkoq4wDnaOLYwSuJguxL99_R1-mZ2R3MYxu6BPwJUcgFCC6735ki4YDOQUpdYIb88aOQlF0Jds5uUfjgHqUDM2PuqmL7DaLM3RfKD7yn6vCuG0Nm-cCEWtM1hoMPb_uVIJvswFsEVez2m3mcb6XiayMd0y64c9cnenTlnX89Pn-vXcvPx8rZebUoDeplLgtYQ1lpy0RlSksi1bUfOgUElHCrVLXkloSJTc6w6g7LVukWjES1Ki3P2cOqdYvjd2pSbwSdj-55GG7apQV4veS3VPrg4BU0MKUXrmin6geKuAd4cZmvOs50J-A-Ws2EA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>30970956</pqid></control><display><type>article</type><title>A phonetic similarity model for automatic extraction of transliteration pairs</title><source>Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)</source><creator>Kuo, Jin-Shea ; Li, Haizhou ; Yang, Ying-Kuei</creator><creatorcontrib>Kuo, Jin-Shea ; Li, Haizhou ; Yang, Ying-Kuei</creatorcontrib><description>This article proposes an approach for the automatic extraction of transliteration pairs from Chinese Web corpora. In this approach, we formulate the machine transliteration process using a syllable-based phonetic similarity model which consists of phonetic confusion matrices and a Chinese character n -gram language model. With the phonetic similarity model, the extraction of transliteration pairs becomes a two-step process of recognition followed by validation: First, in the recognition process, we identify the most probable transliteration in the k -neighborhood of a recognized English word. Then, in the validation process, we qualify the transliteration pair candidates with a hypothesis test. 
We carry out an analytical study on the statistics of several key factors in English-Chinese transliteration to help formulate phonetic similarity modeling. We then conduct both supervised and unsupervised learning of a phonetic similarity model on a development database. The experimental results validate the effectiveness of the phonetic similarity model by achieving an F -measure of 0.739 in supervised learning. The unsupervised learning approach works almost as well as the supervised one, thus allowing us to deploy automatic extraction of transliteration pairs in the Web space.</description><identifier>ISSN: 1530-0226</identifier><identifier>EISSN: 1558-3430</identifier><identifier>DOI: 10.1145/1282080.1282081</identifier><language>eng</language><ispartof>ACM transactions on Asian language information processing, 2007-09, Vol.6 (2), p.6</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c187t-a1bca398502dca65aafbbdaff1c362f366d704514ac9034dc35b88b3c833e35e3</citedby><cites>FETCH-LOGICAL-c187t-a1bca398502dca65aafbbdaff1c362f366d704514ac9034dc35b88b3c833e35e3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Kuo, Jin-Shea</creatorcontrib><creatorcontrib>Li, Haizhou</creatorcontrib><creatorcontrib>Yang, Ying-Kuei</creatorcontrib><title>A phonetic similarity model for automatic extraction of transliteration pairs</title><title>ACM transactions on Asian language information processing</title><description>This article proposes an approach for the automatic extraction of transliteration pairs from Chinese Web corpora. In this approach, we formulate the machine transliteration process using a syllable-based phonetic similarity model which consists of phonetic confusion matrices and a Chinese character n -gram language model. 
With the phonetic similarity model, the extraction of transliteration pairs becomes a two-step process of recognition followed by validation: First, in the recognition process, we identify the most probable transliteration in the k -neighborhood of a recognized English word. Then, in the validation process, we qualify the transliteration pair candidates with a hypothesis test. We carry out an analytical study on the statistics of several key factors in English-Chinese transliteration to help formulate phonetic similarity modeling. We then conduct both supervised and unsupervised learning of a phonetic similarity model on a development database. The experimental results validate the effectiveness of the phonetic similarity model by achieving an F -measure of 0.739 in supervised learning. The unsupervised learning approach works almost as well as the supervised one, thus allowing us to deploy automatic extraction of transliteration pairs in the Web space.</description><issn>1530-0226</issn><issn>1558-3430</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2007</creationdate><recordtype>article</recordtype><recordid>eNotkEtPwzAQhC0EEqVw5poTt1CvN3adY1Xxkoq4wDnaOLYwSuJguxL99_R1-mZ2R3MYxu6BPwJUcgFCC6735ki4YDOQUpdYIb88aOQlF0Jds5uUfjgHqUDM2PuqmL7DaLM3RfKD7yn6vCuG0Nm-cCEWtM1hoMPb_uVIJvswFsEVez2m3mcb6XiayMd0y64c9cnenTlnX89Pn-vXcvPx8rZebUoDeplLgtYQ1lpy0RlSksi1bUfOgUElHCrVLXkloSJTc6w6g7LVukWjES1Ki3P2cOqdYvjd2pSbwSdj-55GG7apQV4veS3VPrg4BU0MKUXrmin6geKuAd4cZmvOs50J-A-Ws2EA</recordid><startdate>200709</startdate><enddate>200709</enddate><creator>Kuo, Jin-Shea</creator><creator>Li, Haizhou</creator><creator>Yang, Ying-Kuei</creator><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>200709</creationdate><title>A phonetic similarity model for automatic extraction of transliteration pairs</title><author>Kuo, Jin-Shea ; Li, Haizhou ; 
Yang, Ying-Kuei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c187t-a1bca398502dca65aafbbdaff1c362f366d704514ac9034dc35b88b3c833e35e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2007</creationdate><toplevel>online_resources</toplevel><creatorcontrib>Kuo, Jin-Shea</creatorcontrib><creatorcontrib>Li, Haizhou</creatorcontrib><creatorcontrib>Yang, Ying-Kuei</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>ACM transactions on Asian language information processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kuo, Jin-Shea</au><au>Li, Haizhou</au><au>Yang, Ying-Kuei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A phonetic similarity model for automatic extraction of transliteration pairs</atitle><jtitle>ACM transactions on Asian language information processing</jtitle><date>2007-09</date><risdate>2007</risdate><volume>6</volume><issue>2</issue><spage>6</spage><pages>6-</pages><issn>1530-0226</issn><eissn>1558-3430</eissn><abstract>This article proposes an approach for the automatic extraction of transliteration pairs from Chinese Web corpora. In this approach, we formulate the machine transliteration process using a syllable-based phonetic similarity model which consists of phonetic confusion matrices and a Chinese character n -gram language model. 
With the phonetic similarity model, the extraction of transliteration pairs becomes a two-step process of recognition followed by validation: First, in the recognition process, we identify the most probable transliteration in the k -neighborhood of a recognized English word. Then, in the validation process, we qualify the transliteration pair candidates with a hypothesis test. We carry out an analytical study on the statistics of several key factors in English-Chinese transliteration to help formulate phonetic similarity modeling. We then conduct both supervised and unsupervised learning of a phonetic similarity model on a development database. The experimental results validate the effectiveness of the phonetic similarity model by achieving an F -measure of 0.739 in supervised learning. The unsupervised learning approach works almost as well as the supervised one, thus allowing us to deploy automatic extraction of transliteration pairs in the Web space.</abstract><doi>10.1145/1282080.1282081</doi></addata></record>
fulltext fulltext
identifier ISSN: 1530-0226
ispartof ACM transactions on Asian language information processing, 2007-09, Vol.6 (2), p.6
issn 1530-0226
1558-3430
language eng
recordid cdi_proquest_miscellaneous_30970956
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)
title A phonetic similarity model for automatic extraction of transliteration pairs
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T21%3A35%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20phonetic%20similarity%20model%20for%20automatic%20extraction%20of%20transliteration%20pairs&rft.jtitle=ACM%20transactions%20on%20Asian%20language%20information%20processing&rft.au=Kuo,%20Jin-Shea&rft.date=2007-09&rft.volume=6&rft.issue=2&rft.spage=6&rft.pages=6-&rft.issn=1530-0226&rft.eissn=1558-3430&rft_id=info:doi/10.1145/1282080.1282081&rft_dat=%3Cproquest_cross%3E30970956%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c187t-a1bca398502dca65aafbbdaff1c362f366d704514ac9034dc35b88b3c833e35e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=30970956&rft_id=info:pmid/&rfr_iscdi=true