Classification of toxic comments unified through diverse internet forums
Format: | Conference Proceeding |
Language: | English |
Summary: | In the last half-decade, India has seen exponential growth in Internet and social media use. This growth has improved communication among friends and families and has allowed information, content, opinions, and ideas to spread freely. Some users misuse this freedom and make social media platforms intolerable. The volume of harmful content online, such as toxic comments, is too large for humans to moderate manually. This study creates a homogeneous dataset by manually labelling comments collected from social platforms and combining them with publicly available datasets. The comments are classified into two categories: toxic and non-toxic. This work presents the unified dataset, which covers a wide spectrum of comments, and an approach to classifying Hinglish comments using the BERT transformer model. The study also trains baseline models and reports their performance on selected evaluation criteria. The BERT model outperformed the baseline and other models trained on the unified dataset. This study focuses on Hinglish comments and provides an implementation for classifying them, with the aim of making internet platforms safer and friendlier for regional-language users. |
ISSN: | 0094-243X, 1551-7616 |
DOI: | 10.1063/5.0169608 |
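The summary describes two data-side steps: merging manually labelled comments with public datasets into one binary (toxic / non-toxic) dataset, and comparing models on evaluation criteria it does not name. The following is a minimal Python sketch of both steps, not the authors' code. It assumes the source label names in LABEL_MAP, case and whitespace normalisation for deduplication, and accuracy, precision, recall and F1 as the metrics. All of these choices, and every function name here, are hypothetical. The sketch does not reproduce the BERT model.

```python
from dataclasses import dataclass

# Hypothetical label scheme: each source dataset uses its own labels,
# which are collapsed into the two categories named in the summary.
TOXIC = 1
NON_TOXIC = 0

# Assumed source labels; a real merge would list each dataset's actual labels.
LABEL_MAP = {
    "toxic": TOXIC,
    "offensive": TOXIC,
    "hate": TOXIC,
    "abusive": TOXIC,
    "non-toxic": NON_TOXIC,
    "neutral": NON_TOXIC,
    "normal": NON_TOXIC,
}


@dataclass(frozen=True)
class Comment:
    text: str
    label: int  # TOXIC or NON_TOXIC


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants deduplicate."""
    return " ".join(text.lower().split())


def unify(sources: list[list[tuple[str, str]]]) -> list[Comment]:
    """Merge (text, raw_label) pairs from several sources into one binary dataset.

    Rows with an unmapped label or empty text are skipped; for duplicate
    texts, the first occurrence (and its label) is kept.
    """
    seen: set[str] = set()
    merged: list[Comment] = []
    for source in sources:
        for text, raw_label in source:
            label = LABEL_MAP.get(raw_label.strip().lower())
            key = normalize(text)
            if label is None or not key or key in seen:
                continue
            seen.add(key)
            merged.append(Comment(text=text.strip(), label=label))
    return merged


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1, treating TOXIC (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == TOXIC and p == TOXIC)
    fp = sum(1 for t, p in pairs if t == NON_TOXIC and p == TOXIC)
    fn = sum(1 for t, p in pairs if t == TOXIC and p == NON_TOXIC)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy, made-up rows purely for illustration.
    manual = [("chup kar, bewakoof", "toxic"), ("Nice video bhai", "non-toxic")]
    public = [("nice  video BHAI", "normal"), ("Have a good day", "neutral"), ("???", "spam")]
    data = unify([manual, public])
    print([(c.text, c.label) for c in data])
    print(binary_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

In the toy run, the second "nice video bhai" row is dropped as a duplicate of the first despite different casing and spacing, and the "spam" row is dropped because its label has no mapping. In practice, the mapping and deduplication rules would have to follow whatever the combined datasets actually use.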