Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation
A spell and grammar checker is profoundly essential for diverse publications especially for Bangla language in particular as it is spoken by millions of native speakers around the world. Considering the lack of research efforts, we demonstrate the development of a comprehensive Bangla spell and gram...
Published in: | IEEE access 2021, Vol.9, p.141079-141097 |
---|---|
Main Authors: | Hossain, Nahid ; Islam, Salekul ; Huda, Mohammad Nurul |
Format: | Article |
Language: | English |
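Subjects: | Algorithms ; Bangla ; Checkers ; corpus ; Datasets ; Grammar ; grammar checker ; lexicon ; Manuals ; Measurement uncertainty ; Neural networks ; Numerical models ; Quality assurance ; spell checker ; Words (language) |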
cited_by | cdi_FETCH-LOGICAL-c408t-e3f37a0b40a2b6704b2431d3adee77f1a5ab7a6721c27a29aaabc9c225a016fc3 |
---|---|
cites | cdi_FETCH-LOGICAL-c408t-e3f37a0b40a2b6704b2431d3adee77f1a5ab7a6721c27a29aaabc9c225a016fc3 |
container_end_page | 141097 |
container_issue | |
container_start_page | 141079 |
container_title | IEEE access |
container_volume | 9 |
creator | Hossain, Nahid ; Islam, Salekul ; Huda, Mohammad Nurul |
description | A spell and grammar checker is profoundly essential for diverse publications especially for Bangla language in particular as it is spoken by millions of native speakers around the world. Considering the lack of research efforts, we demonstrate the development of a comprehensive Bangla spell and grammar checker with necessary resources. At first, a full-fledged and generalised Bangla monolingual corpus comprising over 100 million words has been built by scraping reputed, diversified online sources and then an extensive Bangla lexicon consisting of over 1 million unique words has been extracted from that corpus. Based on these corpus and lexicon, we have developed a combined spell and grammar checker application that simultaneously detects distinct spelling and grammatical mistakes and provides appropriate suggestions for both as well. The spell checker uses the Double Metaphone algorithm and Edit distance based on the distributed lexicons and numerical suffix dataset to detect all types of Bangla spelling mistakes with an accuracy rate of 97.21% individually. The grammar checker detects errors based on language model probability i.e. combination of bigram and trigram, and generates suggestions based on the Cosine similarity measure with the accuracy rate of 94.29% individually. The datasets and codes used in this work are freely available at https://git.io/JzJ4w . |
doi_str_mv | 10.1109/ACCESS.2021.3119627 |
format | article |
fullrecord | <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_71c47823a27947af8713cc719e63d0ac</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9568876</ieee_id><doaj_id>oai_doaj_org_article_71c47823a27947af8713cc719e63d0ac</doaj_id><sourcerecordid>2583639508</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-e3f37a0b40a2b6704b2431d3adee77f1a5ab7a6721c27a29aaabc9c225a016fc3</originalsourceid><addsrcrecordid>eNpNUdtOwkAQbYwmEuQLeGnic3Ev7V58w4pIQmIC-ryZbqdYLF3cFhL_3kKJcV7mknPOzOQEwZiSCaVEP0zTdLZeTxhhdMIp1YLJq2DAqNART7i4_lffBqOm2ZIuVDdK5CBYPeMRK7ffYd2GrgifoN5UEK73WFUh1Hk497DbgQ_TT7Rf6JvHcIWNO3iLYeoR2tLVZ9zsCNXh3N4FNwVUDY4ueRh8vMze09do-TZfpNNlZGOi2gh5wSWQLCbAMiFJnLGY05xDjihlQSGBTIKQjFomgWkAyKy2jCVAqCgsHwaLXjd3sDV7X3Zn_hgHpTkPnN8Y8G1pKzSS2lgqxoFJHUsolKTcWkk1Cp4TOGnd91p7774P2LRm2_1Yd-cbliguuE6I6lC8R1nvmsZj8beVEnPywvRemJMX5uJFxxr3rBIR_xg6EUpJwX8BA2iERQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2583639508</pqid></control><display><type>article</type><title>Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation</title><source>IEEE Open Access Journals</source><creator>Hossain, Nahid ; Islam, Salekul ; Huda, Mohammad Nurul</creator><creatorcontrib>Hossain, Nahid ; Islam, Salekul ; Huda, Mohammad Nurul</creatorcontrib><description>A spell and grammar checker is profoundly essential for diverse publications especially for Bangla language in particular as it is spoken by millions of native speakers around the world. Considering the lack of research efforts, we demonstrate the development of a comprehensive Bangla spell and grammar checker with necessary resources. 
At first, a full-fledged and generalised Bangla monolingual corpus comprising over 100 million words has been built by scraping reputed, diversified online sources and then an extensive Bangla lexicon consisting of over 1 million unique words has been extracted from that corpus. Based on these corpus and lexicon, we have developed a combined spell and grammar checker application that simultaneously detects distinct spelling and grammatical mistakes and provides appropriate suggestions for both as well. The spell checker uses the Double Metaphone algorithm and Edit distance based on the distributed lexicons and numerical suffix dataset to detect all types of Bangla spelling mistakes with an accuracy rate of 97.21% individually. The grammar checker detects errors based on language model probability i.e. combination of bigram and trigram, and generates suggestions based on the Cosine similarity measure with the accuracy rate of 94.29% individually. The datasets and codes used in this work are freely available at https://git.io/JzJ4w .</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3119627</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Bangla ; Checkers ; corpus ; Datasets ; Grammar ; grammar checker ; lexicon ; Manuals ; Measurement uncertainty ; Neural networks ; Numerical models ; Quality assurance ; spell checker ; Words (language)</subject><ispartof>IEEE access, 2021, Vol.9, p.141079-141097</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-e3f37a0b40a2b6704b2431d3adee77f1a5ab7a6721c27a29aaabc9c225a016fc3</citedby><cites>FETCH-LOGICAL-c408t-e3f37a0b40a2b6704b2431d3adee77f1a5ab7a6721c27a29aaabc9c225a016fc3</cites><orcidid>0000-0002-7262-0060 ; 0000-0002-1325-8209</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9568876$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Hossain, Nahid</creatorcontrib><creatorcontrib>Islam, Salekul</creatorcontrib><creatorcontrib>Huda, Mohammad Nurul</creatorcontrib><title>Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation</title><title>IEEE access</title><addtitle>Access</addtitle><description>A spell and grammar checker is profoundly essential for diverse publications especially for Bangla language in particular as it is spoken by millions of native speakers around the world. Considering the lack of research efforts, we demonstrate the development of a comprehensive Bangla spell and grammar checker with necessary resources. At first, a full-fledged and generalised Bangla monolingual corpus comprising over 100 million words has been built by scraping reputed, diversified online sources and then an extensive Bangla lexicon consisting of over 1 million unique words has been extracted from that corpus. Based on these corpus and lexicon, we have developed a combined spell and grammar checker application that simultaneously detects distinct spelling and grammatical mistakes and provides appropriate suggestions for both as well. 
The spell checker uses the Double Metaphone algorithm and Edit distance based on the distributed lexicons and numerical suffix dataset to detect all types of Bangla spelling mistakes with an accuracy rate of 97.21% individually. The grammar checker detects errors based on language model probability i.e. combination of bigram and trigram, and generates suggestions based on the Cosine similarity measure with the accuracy rate of 94.29% individually. The datasets and codes used in this work are freely available at https://git.io/JzJ4w .</description><subject>Algorithms</subject><subject>Bangla</subject><subject>Checkers</subject><subject>corpus</subject><subject>Datasets</subject><subject>Grammar</subject><subject>grammar checker</subject><subject>lexicon</subject><subject>Manuals</subject><subject>Measurement uncertainty</subject><subject>Neural networks</subject><subject>Numerical models</subject><subject>Quality assurance</subject><subject>spell checker</subject><subject>Words (language)</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>DOA</sourceid><recordid>eNpNUdtOwkAQbYwmEuQLeGnic3Ev7V58w4pIQmIC-ryZbqdYLF3cFhL_3kKJcV7mknPOzOQEwZiSCaVEP0zTdLZeTxhhdMIp1YLJq2DAqNART7i4_lffBqOm2ZIuVDdK5CBYPeMRK7ffYd2GrgifoN5UEK73WFUh1Hk497DbgQ_TT7Rf6JvHcIWNO3iLYeoR2tLVZ9zsCNXh3N4FNwVUDY4ueRh8vMze09do-TZfpNNlZGOi2gh5wSWQLCbAMiFJnLGY05xDjihlQSGBTIKQjFomgWkAyKy2jCVAqCgsHwaLXjd3sDV7X3Zn_hgHpTkPnN8Y8G1pKzSS2lgqxoFJHUsolKTcWkk1Cp4TOGnd91p7774P2LRm2_1Yd-cbliguuE6I6lC8R1nvmsZj8beVEnPywvRemJMX5uJFxxr3rBIR_xg6EUpJwX8BA2iERQ</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Hossain, Nahid</creator><creator>Islam, Salekul</creator><creator>Huda, Mohammad Nurul</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-7262-0060</orcidid><orcidid>https://orcid.org/0000-0002-1325-8209</orcidid></search><sort><creationdate>2021</creationdate><title>Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation</title><author>Hossain, Nahid ; Islam, Salekul ; Huda, Mohammad Nurul</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-e3f37a0b40a2b6704b2431d3adee77f1a5ab7a6721c27a29aaabc9c225a016fc3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Bangla</topic><topic>Checkers</topic><topic>corpus</topic><topic>Datasets</topic><topic>Grammar</topic><topic>grammar checker</topic><topic>lexicon</topic><topic>Manuals</topic><topic>Measurement uncertainty</topic><topic>Neural networks</topic><topic>Numerical models</topic><topic>Quality assurance</topic><topic>spell checker</topic><topic>Words (language)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hossain, Nahid</creatorcontrib><creatorcontrib>Islam, Salekul</creatorcontrib><creatorcontrib>Huda, Mohammad Nurul</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) Online</collection><collection>IEEE</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials 
Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hossain, Nahid</au><au>Islam, Salekul</au><au>Huda, Mohammad Nurul</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>141079</spage><epage>141097</epage><pages>141079-141097</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>A spell and grammar checker is profoundly essential for diverse publications especially for Bangla language in particular as it is spoken by millions of native speakers around the world. Considering the lack of research efforts, we demonstrate the development of a comprehensive Bangla spell and grammar checker with necessary resources. At first, a full-fledged and generalised Bangla monolingual corpus comprising over 100 million words has been built by scraping reputed, diversified online sources and then an extensive Bangla lexicon consisting of over 1 million unique words has been extracted from that corpus. Based on these corpus and lexicon, we have developed a combined spell and grammar checker application that simultaneously detects distinct spelling and grammatical mistakes and provides appropriate suggestions for both as well. 
The spell checker uses the Double Metaphone algorithm and Edit distance based on the distributed lexicons and numerical suffix dataset to detect all types of Bangla spelling mistakes with an accuracy rate of 97.21% individually. The grammar checker detects errors based on language model probability i.e. combination of bigram and trigram, and generates suggestions based on the Cosine similarity measure with the accuracy rate of 94.29% individually. The datasets and codes used in this work are freely available at https://git.io/JzJ4w .</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3119627</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0002-7262-0060</orcidid><orcidid>https://orcid.org/0000-0002-1325-8209</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2021, Vol.9, p.141079-141097 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_71c47823a27947af8713cc719e63d0ac |
source | IEEE Open Access Journals |
subjects | Algorithms ; Bangla ; Checkers ; corpus ; Datasets ; Grammar ; grammar checker ; lexicon ; Manuals ; Measurement uncertainty ; Neural networks ; Numerical models ; Quality assurance ; spell checker ; Words (language) |
title | Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T13%3A14%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Development%20of%20Bangla%20Spell%20and%20Grammar%20Checkers:%20Resource%20Creation%20and%20Evaluation&rft.jtitle=IEEE%20access&rft.au=Hossain,%20Nahid&rft.date=2021&rft.volume=9&rft.spage=141079&rft.epage=141097&rft.pages=141079-141097&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3119627&rft_dat=%3Cproquest_doaj_%3E2583639508%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c408t-e3f37a0b40a2b6704b2431d3adee77f1a5ab7a6721c27a29aaabc9c225a016fc3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2583639508&rft_id=info:pmid/&rft_ieee_id=9568876&rfr_iscdi=true |
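
The description in this record says the spell checker ranks candidates using edit distance against a lexicon. The following is only a minimal sketch of that one idea, not the authors' implementation: the three-word `LEXICON`, the function names `edit_distance` and `suggest`, and the `max_distance` threshold are all hypothetical, and the Double Metaphone step and the numerical suffix dataset mentioned in the description are omitted entirely.

```python
# Hypothetical three-word lexicon; the lexicon described in the record
# holds over 1 million unique words.
LEXICON = ["বাংলা", "ভাষা", "বানান"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over Unicode code points (two-row dynamic programme)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_distance=1):
    """Return lexicon entries within max_distance of word, closest first.

    A word already present in the lexicon is treated as correctly spelled,
    so no suggestions are returned for it.
    """
    if word in lexicon:
        return []
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    return [entry for distance, entry in scored if distance <= max_distance]


if __name__ == "__main__":
    # "বাংল" is "বাংলা" with the final vowel sign missing.
    print(suggest("বাংল", LEXICON))
```

Distances here are counted per Unicode code point, so a dropped vowel sign counts as one edit. A production checker for Bangla would probably need to decide how to normalise text and treat conjuncts before comparing, which this sketch does not attempt.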