On classification trees and random forests in corpus linguistics: Some words of caution and suggestions for improvement

Bibliographic Details
Published in: Corpus Linguistics and Linguistic Theory 2020-12, Vol. 16 (3), p. 617-647
Main Author: Gries, Stefan Th.
Format: Article
Language: English
Description
Summary: This paper is a discussion of methodological problems that (can) arise in the analysis of multifactorial data analyzed with tree-based or forest-based classifiers in (corpus) linguistics. I showcase a data set that highlights where such methods can fail at providing optimal results and then discuss solutions to this problem as well as the interpretation of random forests more generally.
ISSN: 1613-7027, 1613-7035
DOI: 10.1515/cllt-2018-0078