On classification trees and random forests in corpus linguistics: Some words of caution and suggestions for improvement
Published in: | Corpus linguistics and linguistic theory 2020-12, Vol.16 (3), p.617-647 |
---|---|
Main Author: | |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | This paper is a discussion of methodological problems that (can) arise in the analysis of multifactorial data analyzed with tree-based or forest-based classifiers in (corpus) linguistics. I showcase a data set that highlights where such methods can fail at providing optimal results and then discuss solutions to this problem as well as the interpretation of random forests more generally. |
ISSN: | 1613-7027, 1613-7035 |
DOI: | 10.1515/cllt-2018-0078 |