The B-Subtle framework: tailoring subtitles to your needs

Bibliographic Details
Published in: Language resources and evaluation 2020-12, Vol. 54 (4), p. 1143-1159
Main Authors: Ventura, Miguel; Veiga, Jessica; Coheur, Luisa; Gama, Sandra
Format: Article
Language: English
Description
Summary: Large amounts of subtitles from movies and TV shows can easily be found on the web, for free, in almost every language. Several corpora built from subtitles, with different annotations and purposes, are currently available. Because new sets of subtitles are constantly being released, we propose B-Subtle, an open-source framework that automatically creates corpora consisting of sequential pairs of dialogue turns gathered from subtitles. Through a configuration file, B-Subtle lets users enrich subtitles and dialogue turns with extra information (such as movie genre or the polarity of an utterance) and apply different types of filtering to both subtitle files and dialogue turns. With B-Subtle, users can therefore create their own corpus, tailored to their needs. Moreover, to replicate the process in a future experiment, the user only needs to save the configuration file. In this paper, we describe B-Subtle and demonstrate how to build different corpora with it.
ISSN: 1574-020X, 1574-0218
DOI: 10.1007/s10579-020-09507-3
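
The idea in the summary, a configuration that drives how subtitle lines become filtered and enriched pairs of sequential dialogue turns, can be illustrated with a minimal Python sketch. This is not B-Subtle's code or configuration schema: the keys in CONFIG, the helper names (parse_srt, build_pairs), the SubRip-style sample, and the specific filters (a time gap and a minimum word count) are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical configuration: these keys are illustrative only and are
# NOT taken from B-Subtle's real configuration format.
CONFIG = {
    "max_gap_seconds": 3.0,            # pair two turns only if they are close in time
    "min_words": 2,                    # drop pairs containing a shorter turn
    "metadata": {"genre": "Comedy"},   # extra information attached to every pair
}

@dataclass
class Turn:
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str

TIMESTAMP = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")

def to_seconds(stamp):
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds."""
    h, m, s, ms = (int(g) for g in TIMESTAMP.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000.0

def parse_srt(text):
    """Parse SubRip-style blocks (index, time range, text lines) into turns."""
    turns = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = (to_seconds(part) for part in lines[1].split("-->"))
        turns.append(Turn(start, end, " ".join(l.strip() for l in lines[2:])))
    return turns

def build_pairs(turns, config):
    """Pair each turn with the next one, then apply the configured filters."""
    pairs = []
    for first, second in zip(turns, turns[1:]):
        if second.start - first.end > config["max_gap_seconds"]:
            continue  # too far apart to be treated as the same exchange
        if min(len(first.text.split()), len(second.text.split())) < config["min_words"]:
            continue  # one of the turns is too short
        pairs.append({"trigger": first.text, "answer": second.text, **config["metadata"]})
    return pairs

SAMPLE = """1
00:00:01,000 --> 00:00:02,500
How are you today?

2
00:00:03,000 --> 00:00:04,000
I am fine, thanks.

3
00:00:30,000 --> 00:00:31,000
Where is the car?

4
00:00:31,500 --> 00:00:32,000
Outside.
"""

pairs = build_pairs(parse_srt(SAMPLE), CONFIG)
print(pairs)
```

With this sample, only the first two turns form a pair: the second and third are too far apart in time, and the fourth turn is shorter than the configured minimum. Keeping every choice in CONFIG mirrors the replication point made in the summary, since rerunning the same process only requires saving that one structure.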