The B-Subtle framework: tailoring subtitles to your needs
Published in: | Language resources and evaluation 2020-12, Vol.54 (4), p.1143-1159 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Large amounts of subtitles from movies and TV shows can easily be found on the web, for free, in almost every language. Several corpora built from subtitles, with different annotations and purposes, are currently available. Considering that new sets of subtitles are constantly being released, we propose B-Subtle, an open source framework that allows the automatic creation of corpora consisting of sequential pairs of dialogue turns gathered from subtitles. By means of a configuration file, the B-Subtle framework makes it possible to enrich subtitles and dialogue turns with extra information (such as movie genre or the polarity of an utterance); in addition, it allows different types of filtering to be applied to both subtitle files and dialogue turns. Therefore, with B-Subtle, each user can create their own corpus, tailored to their needs. Moreover, to replicate the process in a future experiment, the user only needs to save the configuration file. In this paper, we describe B-Subtle and demonstrate how to build different corpora with it. |
---|---|
ISSN: | 1574-020X, 1574-0218 |
DOI: | 10.1007/s10579-020-09507-3 |