Loading…

El análisis estilométrico aplicado a la literatura española: las novelas policiacas e históricas

This paper demonstrates that a computer can determine the authorship of a text. To this end we created a corpus of 122 contemporary novels written in Spanish (69 historical novels, 50 crime novels, and 3 westerns). The corpus was then studied using stylo, a stylometric analysis package written in th...

Full description

Saved in:
Bibliographic Details
Published in:Caracteres (Salamanca) 2016, Vol.5 (2), p.196-245
Main Author: Fradejas Rueda, José Manuel
Format: Article
Language:Spanish
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This paper demonstrates that a computer can determine the authorship of a text. To this end we created a corpus of 122 contemporary novels written in Spanish (69 historical novels, 50 crime novels, and 3 westerns). The corpus was then studied using stylo, a stylometric analysis package written in the programming language R. We chose to apply the simplest of the multiple types of analysis offered by this package: cluster analysis. The results are very interesting: by taking into account just the 100 most frequently used words (MFW), the computer was able to group the different works of each author as well as assigning those published under a pseudonym to the true author without incurring in any errors. En este artículo se trata de mostrar si un ordenador es capaz de determinar la autoría de un texto. Para ello se ha creado un corpus de 122 novelas contemporáneas (69 de tema histórico, 50 policiacas y 3 del oeste) y se han analizado con el paquete de análisis estilométrico stylo. De todos los análisis que ofrece este paquete, escrito en R, se ha utilizado el más sencillo: el análisis de grupos. Los resultados han sido muy interesantes ya que con un mínimo de 100 palabras (las más frecuentes) el ordenador ha sido capaz de agrupar, sin error alguno, las distintas obras de cada autor y ha sabido asignar al autor real aquellas que se publicaron bajo seudónimo.
ISSN:2254-4496
2254-4496