
Neural Machine Translation for Low-Resourced Indian Languages

A large number of significant assets are available online in English and are frequently translated into native languages to ease information sharing among local people who are not very familiar with English. However, manual translation is a tedious, costly, and time-consuming process. To thi...

Bibliographic Details
Published in: arXiv.org 2020-04
Main Authors: Choudhary, Himanshu; Rao, Shivansh; Rohilla, Rajesh
Format: Article
Language: English
Subjects: Languages; Machine translation; Morphology
container_title arXiv.org
creator Choudhary, Himanshu
Rao, Shivansh
Rohilla, Rajesh
description A large number of significant assets are available online in English and are frequently translated into native languages to ease information sharing among local people who are not very familiar with English. However, manual translation is a tedious, costly, and time-consuming process. To this end, machine translation is an effective approach to convert text to a different language without any human involvement. Neural machine translation (NMT) is one of the most proficient translation techniques among existing machine translation systems. In this paper, we apply NMT to two language pairs involving highly morphologically rich Indian languages: English-Tamil and English-Malayalam. We propose a novel NMT model using multi-head self-attention along with pre-trained Byte-Pair-Encoded (BPE) and MultiBPE embeddings to develop an efficient translation system that overcomes the out-of-vocabulary (OOV) problem for low-resourced, morphologically rich Indian languages for which little translated text is available online. We also collected corpora from different sources, addressed issues with these publicly available data, and refined them for further use. We used the BLEU score to evaluate our system's performance. Experimental results and a survey confirmed that our proposed translator (BLEU scores of 24.34 and 9.78) outperforms Google translator (BLEU scores of 9.40 and 5.94) on the two language pairs, respectively.
format article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2396462464</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2396462464</sourcerecordid><originalsourceid>FETCH-proquest_journals_23964624643</originalsourceid><addsrcrecordid>eNqNi7EKwjAUAIMgWLT_EHAu1Jc06uAkikJ1kO7l0b7WlpJoXoO_bwc_wOmGu5uJCJTaJDsNsBAxc5-mKZgtZJmKxOFOweMgb1g9O0uy8Gh5wLFzVjbOy9x9kgexC76iWl5t3aGVOdo2YEu8EvMGB6b4x6VYn0_F8ZK8vHsH4rHsp9NOqgS1N9qANlr9V30BYZ03xQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2396462464</pqid></control><display><type>article</type><title>Neural Machine Translation for Low-Resourced Indian Languages</title><source>Publicly Available Content Database</source><creator>Choudhary, Himanshu ; Rao, Shivansh ; Rohilla, Rajesh</creator><creatorcontrib>Choudhary, Himanshu ; Rao, Shivansh ; Rohilla, Rajesh</creatorcontrib><description>A large number of significant assets are available online in English, which is frequently translated into native languages to ease the information sharing among local people who are not much familiar with English. However, manual translation is a very tedious, costly, and time-taking process. To this end, machine translation is an effective approach to convert text to a different language without any human involvement. Neural machine translation (NMT) is one of the most proficient translation techniques amongst all existing machine translation systems. In this paper, we have applied NMT on two of the most morphological rich Indian languages, i.e. English-Tamil and English-Malayalam. 
We proposed a novel NMT model using Multihead self-attention along with pre-trained Byte-Pair-Encoded (BPE) and MultiBPE embeddings to develop an efficient translation system that overcomes the OOV (Out Of Vocabulary) problem for low resourced morphological rich Indian languages which do not have much translation available online. We also collected corpus from different sources, addressed the issues with these publicly available data and refined them for further uses. We used the BLEU score for evaluating our system performance. Experimental results and survey confirmed that our proposed translator (24.34 and 9.78 BLEU score) outperforms Google translator (9.40 and 5.94 BLEU score) respectively.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Languages ; Machine translation ; Morphology</subject><ispartof>arXiv.org, 2020-04</ispartof><rights>2020. This work is published under http://creativecommons.org/publicdomain/zero/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2396462464?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>776,780,25731,36989,44566</link.rule.ids></links><search><creatorcontrib>Choudhary, Himanshu</creatorcontrib><creatorcontrib>Rao, Shivansh</creatorcontrib><creatorcontrib>Rohilla, Rajesh</creatorcontrib><title>Neural Machine Translation for Low-Resourced Indian Languages</title><title>arXiv.org</title><description>A large number of significant assets are available online in English, which is frequently translated into native languages to ease the information sharing among local people who are not much familiar with English. However, manual translation is a very tedious, costly, and time-taking process. To this end, machine translation is an effective approach to convert text to a different language without any human involvement. Neural machine translation (NMT) is one of the most proficient translation techniques amongst all existing machine translation systems. In this paper, we have applied NMT on two of the most morphological rich Indian languages, i.e. English-Tamil and English-Malayalam. We proposed a novel NMT model using Multihead self-attention along with pre-trained Byte-Pair-Encoded (BPE) and MultiBPE embeddings to develop an efficient translation system that overcomes the OOV (Out Of Vocabulary) problem for low resourced morphological rich Indian languages which do not have much translation available online. We also collected corpus from different sources, addressed the issues with these publicly available data and refined them for further uses. 
We used the BLEU score for evaluating our system performance. Experimental results and survey confirmed that our proposed translator (24.34 and 9.78 BLEU score) outperforms Google translator (9.40 and 5.94 BLEU score) respectively.</description><subject>Languages</subject><subject>Machine translation</subject><subject>Morphology</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNqNi7EKwjAUAIMgWLT_EHAu1Jc06uAkikJ1kO7l0b7WlpJoXoO_bwc_wOmGu5uJCJTaJDsNsBAxc5-mKZgtZJmKxOFOweMgb1g9O0uy8Gh5wLFzVjbOy9x9kgexC76iWl5t3aGVOdo2YEu8EvMGB6b4x6VYn0_F8ZK8vHsH4rHsp9NOqgS1N9qANlr9V30BYZ03xQ</recordid><startdate>20200419</startdate><enddate>20200419</enddate><creator>Choudhary, Himanshu</creator><creator>Rao, Shivansh</creator><creator>Rohilla, Rajesh</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20200419</creationdate><title>Neural Machine Translation for Low-Resourced Indian Languages</title><author>Choudhary, Himanshu ; Rao, Shivansh ; Rohilla, Rajesh</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_23964624643</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Languages</topic><topic>Machine translation</topic><topic>Morphology</topic><toplevel>online_resources</toplevel><creatorcontrib>Choudhary, Himanshu</creatorcontrib><creatorcontrib>Rao, Shivansh</creatorcontrib><creatorcontrib>Rohilla, 
Rajesh</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Choudhary, Himanshu</au><au>Rao, Shivansh</au><au>Rohilla, Rajesh</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Neural Machine Translation for Low-Resourced Indian Languages</atitle><jtitle>arXiv.org</jtitle><date>2020-04-19</date><risdate>2020</risdate><eissn>2331-8422</eissn><abstract>A large number of significant assets are available online in English, which is frequently translated into native languages to ease the information sharing among local people who are not much familiar with English. However, manual translation is a very tedious, costly, and time-taking process. To this end, machine translation is an effective approach to convert text to a different language without any human involvement. 
Neural machine translation (NMT) is one of the most proficient translation techniques amongst all existing machine translation systems. In this paper, we have applied NMT on two of the most morphological rich Indian languages, i.e. English-Tamil and English-Malayalam. We proposed a novel NMT model using Multihead self-attention along with pre-trained Byte-Pair-Encoded (BPE) and MultiBPE embeddings to develop an efficient translation system that overcomes the OOV (Out Of Vocabulary) problem for low resourced morphological rich Indian languages which do not have much translation available online. We also collected corpus from different sources, addressed the issues with these publicly available data and refined them for further uses. We used the BLEU score for evaluating our system performance. Experimental results and survey confirmed that our proposed translator (24.34 and 9.78 BLEU score) outperforms Google translator (9.40 and 5.94 BLEU score) respectively.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2020-04
issn 2331-8422
language eng
recordid cdi_proquest_journals_2396462464
source Publicly Available Content Database
subjects Languages
Machine translation
Morphology
title Neural Machine Translation for Low-Resourced Indian Languages
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T07%3A32%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Neural%20Machine%20Translation%20for%20Low-Resourced%20Indian%20Languages&rft.jtitle=arXiv.org&rft.au=Choudhary,%20Himanshu&rft.date=2020-04-19&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2396462464%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_23964624643%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2396462464&rft_id=info:pmid/&rfr_iscdi=true