Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation

Bibliographic Details
Published in: IEEE Access, 2021, Vol. 9, pp. 141079-141097
Main Authors: Hossain, Nahid; Islam, Salekul; Huda, Mohammad Nurul
Format: Article
Language:English
Online Access: https://ieeexplore.ieee.org/document/9568876 (free to read)
container_end_page 141097
container_issue
container_start_page 141079
container_title IEEE access
container_volume 9
creator Hossain, Nahid
Islam, Salekul
Huda, Mohammad Nurul
description A spell and grammar checker is essential for many kinds of publications, particularly in Bangla, which is spoken by millions of native speakers around the world. Given how little research has addressed this problem, we present the development of a comprehensive Bangla spell and grammar checker together with the resources it requires. First, a general-purpose Bangla monolingual corpus of over 100 million words was built by scraping reputable and diverse online sources, and an extensive Bangla lexicon of over 1 million unique words was then extracted from that corpus. Using this corpus and lexicon, we developed a combined spell and grammar checker application that detects spelling and grammatical mistakes simultaneously and provides suggestions for both. The spell checker combines the Double Metaphone algorithm with edit distance, drawing on the distributed lexicons and a numerical suffix dataset, to detect all types of Bangla spelling mistakes, with an individual accuracy of 97.21%. The grammar checker detects errors from language model probabilities (a combination of bigram and trigram models) and generates suggestions using the cosine similarity measure, with an individual accuracy of 94.29%. The datasets and code used in this work are freely available at https://git.io/JzJ4w.
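The spell-checking step in the description can be illustrated with a minimal Python sketch. A token is accepted if it is in the lexicon, or if it is a Bangla numeral followed by a known suffix. Otherwise, lexicon words within a small edit distance are offered as suggestions. This is not the authors' implementation, which is in the repository linked above. The toy lexicon, the sample suffixes, and the names `check_word`, `levenshtein`, and `is_numeric_with_suffix` are hypothetical. The sketch also omits the Double Metaphone phonetic matching that the paper combines with edit distance.

```python
from typing import Iterable, List, Optional

# Hypothetical toy data: the paper's lexicon (over 1 million words) and its
# numerical-suffix dataset are not reproduced here.
LEXICON = {"আমি", "ভাত", "খাই", "বাংলা", "ভাষা"}
NUMERIC_SUFFIXES = {"টি", "টা", "জন"}  # assumed sample suffixes, not the paper's list
BANGLA_DIGITS = set("০১২৩৪৫৬৭৮৯")


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) over Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def is_numeric_with_suffix(token: str) -> bool:
    """True for tokens such as '১০টি': Bangla digits plus an optional known suffix."""
    i = 0
    while i < len(token) and token[i] in BANGLA_DIGITS:
        i += 1
    rest = token[i:]
    return i > 0 and (rest == "" or rest in NUMERIC_SUFFIXES)


def check_word(token: str, lexicon: Iterable[str] = LEXICON,
               max_dist: int = 2, top_k: int = 3) -> Optional[List[str]]:
    """Return None if the token is accepted, else up to top_k suggestions
    (lexicon words within max_dist edits, closest first)."""
    words = set(lexicon)
    if token in words or is_numeric_with_suffix(token):
        return None
    scored = sorted((levenshtein(token, w), w) for w in words)
    return [w for dist, w in scored if dist <= max_dist][:top_k]


if __name__ == "__main__":
    print(check_word("আমি"))    # None: in the lexicon
    print(check_word("১০টি"))   # None: numeral + known suffix
    print(check_word("বাংলো"))  # ['বাংলা']: one vowel sign away
```

A linear scan over a lexicon of over 1 million words would be slow. A practical checker would first narrow the candidates, for example by grouping words under a phonetic key. Generating candidates this way is a common use of phonetic encodings such as Double Metaphone.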
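The grammar-checking step can be sketched the same way. The example below interpolates maximum-likelihood trigram and bigram estimates and flags any position whose probability in context falls below a threshold. It then ranks replacement words by character-level cosine similarity, keeping only those the model accepts. Several details are assumptions for illustration: the interpolation weight `lam`, the `threshold`, the character-count vectors, and the four-sentence corpus. The class and function names are hypothetical. This record does not give the paper's exact smoothing, thresholds, or similarity features.

```python
import math
from collections import Counter
from typing import Iterable, List

BOS, EOS = "<s>", "</s>"


class TrigramBigramChecker:
    """Toy detector: flags a position when the interpolated trigram/bigram
    probability of its token, given the two preceding tokens, is too low."""

    def __init__(self, sentences: Iterable[str], lam: float = 0.7,
                 threshold: float = 0.2):
        self.lam, self.threshold = lam, threshold  # hypothetical settings
        self.tri, self.tri_ctx = Counter(), Counter()
        self.bi, self.bi_ctx = Counter(), Counter()
        self.vocab = set()
        for sent in sentences:
            words = sent.split()
            self.vocab.update(words)
            for a, b, c in self._trigrams(words):
                self.tri[(a, b, c)] += 1
                self.tri_ctx[(a, b)] += 1
                self.bi[(b, c)] += 1  # bigram = last two tokens of each window
                self.bi_ctx[b] += 1

    @staticmethod
    def _trigrams(words: List[str]):
        padded = [BOS, BOS] + list(words) + [EOS]
        return [tuple(padded[i - 2:i + 1]) for i in range(2, len(padded))]

    def prob(self, a: str, b: str, c: str) -> float:
        """lam * P_ML(c | a, b) + (1 - lam) * P_ML(c | b); unseen contexts give 0."""
        ctx3, ctx2 = self.tri_ctx[(a, b)], self.bi_ctx[b]
        p3 = self.tri[(a, b, c)] / ctx3 if ctx3 else 0.0
        p2 = self.bi[(b, c)] / ctx2 if ctx2 else 0.0
        return self.lam * p3 + (1 - self.lam) * p2

    def flag(self, words: List[str]) -> List[int]:
        """Indices of suspicious positions; index len(words) is the sentence end."""
        return [i for i, (a, b, c) in enumerate(self._trigrams(words))
                if self.prob(a, b, c) < self.threshold]


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def suggest(checker: TrigramBigramChecker, words: List[str], idx: int,
            top_k: int = 3) -> List[str]:
    """Replacements for words[idx], ranked by character-count cosine similarity
    and kept only if the corrected sentence is no longer flagged."""
    if idx >= len(words):
        return []  # the sentence end was flagged; no single word to replace
    target = Counter(words[idx])
    scored = sorted(((cosine(target, Counter(w)), w)
                     for w in checker.vocab if w != words[idx]),
                    key=lambda t: (-t[0], t[1]))
    out = []
    for sim, w in scored:
        if sim == 0 or len(out) == top_k:
            break
        if not checker.flag(words[:idx] + [w] + words[idx + 1:]):
            out.append(w)
    return out


if __name__ == "__main__":
    corpus = ["আমি ভাত খাই", "তুমি ভাত খাও", "আমি গান গাই", "তুমি গান গাও"]
    checker = TrigramBigramChecker(corpus)
    words = "আমি ভাত খাও".split()
    print(checker.flag(words))           # [2]
    print(suggest(checker, words, 2))    # ['খাই']
```

Here the verb form খাও, which agrees with তুমি, is flagged after আমি, and খাই is the only candidate the model accepts. Unsmoothed maximum-likelihood counts like these give zero probability to every unseen n-gram. A checker trained on a real corpus would therefore need smoothing to avoid flagging rare but correct phrases.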
doi_str_mv 10.1109/ACCESS.2021.3119627
format article
publisher IEEE, Piscataway
coden IAECCG
pages 19
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
review_status peer reviewed
open_access free to read
orcid 0000-0002-7262-0060
0000-0002-1325-8209
link https://ieeexplore.ieee.org/document/9568876
identifier ISSN: 2169-3536
ispartof IEEE access, 2021, Vol.9, p.141079-141097
issn 2169-3536
2169-3536
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_71c47823a27947af8713cc719e63d0ac
source IEEE Open Access Journals
subjects Algorithms
Bangla
Checkers
corpus
Datasets
Grammar
grammar checker
lexicon
Manuals
Measurement uncertainty
Neural networks
Numerical models
Quality assurance
spell checker
Words (language)
title Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T13%3A14%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Development%20of%20Bangla%20Spell%20and%20Grammar%20Checkers:%20Resource%20Creation%20and%20Evaluation&rft.jtitle=IEEE%20access&rft.au=Hossain,%20Nahid&rft.date=2021&rft.volume=9&rft.spage=141079&rft.epage=141097&rft.pages=141079-141097&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3119627&rft_dat=%3Cproquest_doaj_%3E2583639508%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c408t-e3f37a0b40a2b6704b2431d3adee77f1a5ab7a6721c27a29aaabc9c225a016fc3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2583639508&rft_id=info:pmid/&rft_ieee_id=9568876&rfr_iscdi=true