Automatic Detection of Machine Generated Texts: Need More Tokens
Main Authors: | , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Current advances in text generation using neural approaches make it possible to create texts that are hardly distinguishable from human-written ones. A survey to improve the efficiency of automatic discriminators to detect machine-generated text could be useful in revealing features directly affecting the quality of detection. Recently, many works have appeared in the natural language processing (NLP) and machine learning (ML) communities to create accurate detectors for the English language. Despite the importance of this problem, all existing works for Russian rely only on short sequences. In this work, we argue that context length matters. First, we present a novel open dataset for the Russian language with long texts for the task of machine-generated text detection. We describe the collection, selection of generative models, and sampling process in detail and present an exploratory analysis of the quality of various discriminators. Second, we conduct a set of learning experiments to build accurate machine-generated text detectors for both the English and Russian languages. In addition, we conduct a comparative analysis of the quality of discriminators when training a multi-task model. |
---|---|
ISSN: | 2831-5847 |
DOI: | 10.1109/IVMEM57067.2022.9983964 |