
An Approach Based on the Visualization Model for the Ukrainian Web Content Classification


Bibliographic Details
Main Authors: Slobodzian, Vitalii; Molchanova, Maryna; Kovalchuk, Oleksii; Sobko, Olena; Mazurets, Olexander; Barmak, Olexander; Krak, Iurii
Format: Conference Proceeding
Language: English
Description
Summary: The work established that TextRank is the best vectorization method for modern everyday Ukrainian. An SVM-based approach to building a classifier for web content written in everyday Ukrainian is proposed. The effectiveness of the approach was evaluated using precision, recall, F1 score, and the confusion matrix. The classifier performed well on all metrics, with an accuracy of over 99%. The research was conducted on Ukrainian-language texts in two categories, each containing more than 200 texts of about 500 words. The results make it possible to use the proposed approach to classify modern everyday Ukrainian for textual content analysis and classification by various features. The study results can be used to address socially significant issues: detecting bullying in social networks, detecting negative emotional content, and warning users about possible harmful content.
ISSN: 2770-5218
DOI: 10.1109/ACIT54803.2022.9913162
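
Note on the metrics named in the summary: the following is a minimal sketch, not the authors' code, of how precision, recall, F1 score, and accuracy can be derived from a two-class confusion matrix. The function names and the toy labels are hypothetical and chosen only for illustration; they are not data from the paper.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for a two-class problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (illustrative only): 1 = one text category, 0 = the other.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"confusion matrix: tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```

In a two-category setting such as the one described in the summary, the same counts would be taken from the classifier's predictions on held-out texts.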