Domain bias in distinguishing Flemish and Dutch subtitles

Bibliographic Details
Published in: Natural Language Engineering, 2020-09, Vol. 26 (5), pp. 493-510
Main Author: van Halteren, Hans
Format: Article
Language: English
Description
Summary: This paper describes experiments in which I tried to distinguish between Flemish and Netherlandic Dutch subtitles, as originally proposed in the VarDial 2018 Dutch–Flemish Subtitle task. However, rather than using all data as a monolithic block, I divided them into two non-overlapping domains and then investigated how the relation between training and test domains influences the recognition quality. I show that the best estimate of the level of recognizability of the language varieties is derived when training on one domain and testing on another. Apart from the quantitative results, I also present a qualitative analysis, by investigating in detail the most distinguishing features in the various scenarios. Here too, it is with the out-of-domain recognition that some genuine differences between Flemish and Netherlandic Dutch can be found.
ISSN: 1351-3249, 1469-8110
DOI: 10.1017/S1351324919000445