Rhetorical Sentence Categorization for Scientific Paper Using Word2Vec Semantic Representation
Published in: | Journal of physics. Conference series 2017-01, Vol.801 (1), p.12070 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | One way to summarize scientific papers is to use the rhetorical structure of their sentences. Determining the rhetorical category of a sentence is itself a text categorization task. To achieve good performance, some work in text categorization has exploited semantically similar words. This paper therefore presents rhetorical sentence categorization for scientific papers using selected features, the previous label as an added feature, and Word2Vec to capture semantically similar words. It also reports the results of resampling to balance the number of instances per class, and of combining resampling with the Word2Vec representation. Every experiment is run with two classifiers, IBk and the J48 tree. The results show that using the previous label, Word2Vec (Skip-Gram), and resampling improves performance. Across all experiments, evaluated with 10-fold cross-validation, the highest F-measure, 84.97%, is achieved by combining Word2Vec (Skip-Gram), all features, and resampling. |
---|---|
ISSN: | 1742-6588 1742-6596 |
DOI: | 10.1088/1742-6596/801/1/012070 |
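
The summary describes a pipeline of word-vector sentence features, class resampling, and a nearest-neighbour classifier (IBk is the name of the k-nearest-neighbour classifier in Weka). The following is a minimal illustrative sketch of that idea in plain Python, not the authors' implementation. Everything in it is an assumption made for illustration: the two-dimensional toy embeddings stand in for trained Word2Vec (Skip-Gram) vectors, the labels `AIM` and `RESULT` are hypothetical category names, averaging word vectors is one common way to build a sentence vector and may differ from the paper's feature construction, and random oversampling is only one possible resampling scheme.

```python
import random
from collections import Counter, defaultdict

def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k training points closest in squared Euclidean distance."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
embeddings = {
    "propose": [1.0, 0.0],
    "present": [0.75, 0.25],
    "results": [0.0, 1.0],
    "show": [0.25, 0.75],
}

# Imbalanced toy training set: three RESULT sentences, one AIM sentence.
train = [
    (["results", "show"], "RESULT"),
    (["show"], "RESULT"),
    (["results"], "RESULT"),
    (["propose"], "AIM"),
]
X = [sentence_vector(tokens, embeddings, 2) for tokens, _ in train]
y = [label for _, label in train]

# "present" never occurs in the training sentences, but its vector lies near "propose".
query = sentence_vector(["we", "present"], embeddings, 2)

bal_X, bal_y = oversample(X, y)
print("before resampling:", knn_predict(X, y, query, k=3))
print("after resampling: ", knn_predict(bal_X, bal_y, query, k=3))
```

In this toy setup the query's nearest neighbour is the single `AIM` sentence, yet with k=3 the majority class outvotes it until the classes are balanced. That is the kind of effect resampling is meant to counter, though the sketch says nothing about how large the effect was in the paper's experiments.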