TextGraph - A lexicon based framework for concept extraction and visualization

Bibliographic Details
Published in: Journal of Intelligent & Fuzzy Systems, 2022-01, Vol. 43 (2), p. 2035-2044
Main Authors: Shaikh, Anoud; Mahoto, Naeem Ahmed; Unar, Mukhtiar Ali
Format: Article
Language: English
Description
Summary: The shift in news consumption from traditional newspapers to online news has led media analysts and researchers to apply powerful text mining techniques to the vast amount of news data. News has a profound influence on the public: it informs people about the events happening around them that may affect them, keeps them connected, and allows them to engage in the decision-making process. The words used in news language are sometimes taken from regional languages to express a new phenomenon, event, or idea. In this paper, we propose a lexicon-based framework named TextGraph that automatically extracts concepts from Dawn news using the Term Frequency–Inverse Document Frequency (TF-IDF) weighting factor and visualizes them in a formal way. To gain value-added insights, we developed a Pakistani English corpus and used it along with other existing dictionaries. The corpus incorporates the Pakistani English words used in Dawn news stories, which are annotated and validated by a human expert. Experimental results show that our concept extraction method performs better and yields more specific concepts. Our research suggests that the proposed framework and corpus open multiple directions for promising future research in this domain.
ISSN: 1064-1246, 1875-8967
DOI: 10.3233/JIFS-219303
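
The summary describes extracting concepts by TF-IDF weighting restricted to a lexicon. The record does not give the paper's actual pipeline, so the following is only a minimal sketch of that general idea, assuming a plain term-frequency times log inverse-document-frequency score. The function names, the sample sentences, and the tiny lexicon (including the word "jirga") are hypothetical illustrations, not material from the TextGraph corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_top_terms(documents, lexicon, top_k=3):
    """Rank lexicon terms in each document by TF-IDF.

    Assumption: tf = count / document length, idf = ln(N / document frequency).
    Only tokens present in `lexicon` are scored (the "lexicon-based" filter).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each token.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for tokens in tokenized:
        counts = Counter(t for t in tokens if t in lexicon)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append(ranked[:top_k])
    return results


# Hypothetical sample data, not taken from Dawn news or the paper's corpus.
sample_docs = [
    "The jirga met in the village",
    "The court met in the city",
    "The jirga announced a verdict",
]
sample_lexicon = {"jirga", "court", "verdict", "village", "city"}

for doc, terms in zip(sample_docs, tfidf_top_terms(sample_docs, sample_lexicon)):
    print(doc, "->", [term for term, _ in terms])
```

Terms that occur in fewer documents receive a higher weight, so in this sketch a word shared by several stories ranks below one that is specific to a single story, which matches the summary's emphasis on more specific concepts.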