Are your random effects normal? A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression

With mixed-effects regression models becoming a mainstream tool for every psycholinguist, there has become an increasing need to understand them more fully. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure to m...

Bibliographic Details
Published in: Behavior research methods 2024-09, Vol.56 (6), p.5557-5587
Main Authors: Houghton, Zachary N., Kapatsinski, Vsevolod
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c326t-62f5a5a9d145706259bce4643e5ac350db6d4c1c6708a7226b5a3af4276b4d453
container_end_page 5587
container_issue 6
container_start_page 5557
container_title Behavior research methods
container_volume 56
creator Houghton, Zachary N.
Kapatsinski, Vsevolod
description With mixed-effects regression models becoming a mainstream tool for every psycholinguist, there has become an increasing need to understand them more fully. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure to minimize error in evaluating the statistical significance of fixed-effects predictors. The present study examines a potential misspecification of random effects that has not been discussed in psycholinguistics: violation of the single-subject-population assumption, in the context of logistic regression. Estimated random-effects distributions in real studies often appear to be bi- or multimodal. However, there is no established way to estimate whether a random-effects distribution corresponds to more than one underlying population, especially in the more common case of a multivariate distribution of random effects. We show that violations of the single-subject-population assumption can usually be detected by assessing the (multivariate) normality of the inferred random-effects structure, unless the data show quasi-separability, i.e., many subjects or items show near-categorical behavior. In the absence of quasi-separability, several clustering methods are successful in determining which group each participant belongs to. The BIC difference between a two-cluster and a one-cluster solution can be used to determine that subjects (or items) do not come from a single population. This then allows the researcher to define and justify a new post hoc variable specifying the groups to which participants or items belong, which can be incorporated into regression analysis.
doi_str_mv 10.3758/s13428-023-02287-y
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2895259227</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3095151173</sourcerecordid><originalsourceid>FETCH-LOGICAL-c326t-62f5a5a9d145706259bce4643e5ac350db6d4c1c6708a7226b5a3af4276b4d453</originalsourceid><addsrcrecordid>eNp9Uctu1DAUjRCIlpYfYIGuxIZNwM8ks0KjipdUiQ1dW05yM-NRbA92Ipqf5Vu4nelA1UUXlq17XtY9RfGGsw-y1s3HzKUSTcmEpCOaulyeFedca1VKLZrnD95nxaucd4zJRnD1sjiTDeO1YOq8-LNOCEucEyQb-ugBhwG7KUOIydvxE6whOz-PdnIxQJ7mfoE4gMdpG_sMQ0yAeXKe8LCB31uaY4I8t7uDC8FuQp-hix5hSBTgIyVOWxsgBoR93J_M2wXw1noX7pzIBnqXp-Ta-YBS6KMfugDe3WJfngZj3JDCdZBwkzBnkl0WLwY7Znx9f18UN18-_7z6Vl7_-Pr9an1ddlJUU1mJQVttVz1XumaV0Ku2Q1Upidp2UrO-rXrV8a6qWWNrIapWW2kHJeqqVb3S8qJ4f_Tdp_hrpo0Y73KH42gDxjkb0aw0uQpRE_XdI-qO1h_od0ayleaa81oSSxxZXYo5JxzMPtGW02I4M3ftm2P7hto3h_bNQqK399Zz67H_JznVTQR5JGSCwgbT_-wnbP8Cp8vAwQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3095151173</pqid></control><display><type>article</type><title>Are your random effects normal? A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression</title><source>Springer Link</source><creator>Houghton, Zachary N. ; Kapatsinski, Vsevolod</creator><creatorcontrib>Houghton, Zachary N. ; Kapatsinski, Vsevolod</creatorcontrib><description>With mixed-effects regression models becoming a mainstream tool for every psycholinguist, there has become an increasing need to understand them more fully. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure to minimize error in evaluating the statistical significance of fixed-effects predictors. 
The present study examines a potential misspecification of random effects that has not been discussed in psycholinguistics: violation of the single-subject-population assumption, in the context of logistic regression. Estimated random-effects distributions in real studies often appear to be bi- or multimodal. However, there is no established way to estimate whether a random-effects distribution corresponds to more than one underlying population, especially in the more common case of a multivariate distribution of random effects. We show that violations of the single-subject-population assumption can usually be detected by assessing the (multivariate) normality of the inferred random-effects structure, unless the data show quasi-separability, i.e., many subjects or items show near-categorical behavior. In the absence of quasi-separability, several clustering methods are successful in determining which group each participant belongs to. The BIC difference between a two-cluster and a one-cluster solution can be used to determine that subjects (or items) do not come from a single population. This then allows the researcher to define and justify a new post hoc variable specifying the groups to which participants or items belong, which can be incorporated into regression analysis.</description><identifier>ISSN: 1554-3528</identifier><identifier>EISSN: 1554-3528</identifier><identifier>DOI: 10.3758/s13428-023-02287-y</identifier><identifier>PMID: 38017204</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Behavioral Science and Psychology ; Cognitive Psychology ; Computer Simulation ; Data Interpretation, Statistical ; Humans ; Logistic Models ; Models, Statistical ; Original Manuscript ; Population studies ; Psycholinguistics - methods ; Psychology ; Regression analysis ; Statistical models</subject><ispartof>Behavior research methods, 2024-09, Vol.56 (6), p.5557-5587</ispartof><rights>The Psychonomic Society, Inc. 2023. 
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><rights>2023. The Psychonomic Society, Inc.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c326t-62f5a5a9d145706259bce4643e5ac350db6d4c1c6708a7226b5a3af4276b4d453</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38017204$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Houghton, Zachary N.</creatorcontrib><creatorcontrib>Kapatsinski, Vsevolod</creatorcontrib><title>Are your random effects normal? A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression</title><title>Behavior research methods</title><addtitle>Behav Res</addtitle><addtitle>Behav Res Methods</addtitle><description>With mixed-effects regression models becoming a mainstream tool for every psycholinguist, there has become an increasing need to understand them more fully. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure to minimize error in evaluating the statistical significance of fixed-effects predictors. 
The present study examines a potential misspecification of random effects that has not been discussed in psycholinguistics: violation of the single-subject-population assumption, in the context of logistic regression. Estimated random-effects distributions in real studies often appear to be bi- or multimodal. However, there is no established way to estimate whether a random-effects distribution corresponds to more than one underlying population, especially in the more common case of a multivariate distribution of random effects. We show that violations of the single-subject-population assumption can usually be detected by assessing the (multivariate) normality of the inferred random-effects structure, unless the data show quasi-separability, i.e., many subjects or items show near-categorical behavior. In the absence of quasi-separability, several clustering methods are successful in determining which group each participant belongs to. The BIC difference between a two-cluster and a one-cluster solution can be used to determine that subjects (or items) do not come from a single population. 
This then allows the researcher to define and justify a new post hoc variable specifying the groups to which participants or items belong, which can be incorporated into regression analysis.</description><subject>Behavioral Science and Psychology</subject><subject>Cognitive Psychology</subject><subject>Computer Simulation</subject><subject>Data Interpretation, Statistical</subject><subject>Humans</subject><subject>Logistic Models</subject><subject>Models, Statistical</subject><subject>Original Manuscript</subject><subject>Population studies</subject><subject>Psycholinguistics - methods</subject><subject>Psychology</subject><subject>Regression analysis</subject><subject>Statistical models</subject><issn>1554-3528</issn><issn>1554-3528</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9Uctu1DAUjRCIlpYfYIGuxIZNwM8ks0KjipdUiQ1dW05yM-NRbA92Ipqf5Vu4nelA1UUXlq17XtY9RfGGsw-y1s3HzKUSTcmEpCOaulyeFedca1VKLZrnD95nxaucd4zJRnD1sjiTDeO1YOq8-LNOCEucEyQb-ugBhwG7KUOIydvxE6whOz-PdnIxQJ7mfoE4gMdpG_sMQ0yAeXKe8LCB31uaY4I8t7uDC8FuQp-hix5hSBTgIyVOWxsgBoR93J_M2wXw1noX7pzIBnqXp-Ta-YBS6KMfugDe3WJfngZj3JDCdZBwkzBnkl0WLwY7Znx9f18UN18-_7z6Vl7_-Pr9an1ddlJUU1mJQVttVz1XumaV0Ku2Q1Upidp2UrO-rXrV8a6qWWNrIapWW2kHJeqqVb3S8qJ4f_Tdp_hrpo0Y73KH42gDxjkb0aw0uQpRE_XdI-qO1h_od0ayleaa81oSSxxZXYo5JxzMPtGW02I4M3ftm2P7hto3h_bNQqK399Zz67H_JznVTQR5JGSCwgbT_-wnbP8Cp8vAwQ</recordid><startdate>20240901</startdate><enddate>20240901</enddate><creator>Houghton, Zachary N.</creator><creator>Kapatsinski, Vsevolod</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>4T-</scope><scope>7TK</scope><scope>K9.</scope><scope>7X8</scope></search><sort><creationdate>20240901</creationdate><title>Are your random effects normal? 
A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression</title><author>Houghton, Zachary N. ; Kapatsinski, Vsevolod</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c326t-62f5a5a9d145706259bce4643e5ac350db6d4c1c6708a7226b5a3af4276b4d453</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Behavioral Science and Psychology</topic><topic>Cognitive Psychology</topic><topic>Computer Simulation</topic><topic>Data Interpretation, Statistical</topic><topic>Humans</topic><topic>Logistic Models</topic><topic>Models, Statistical</topic><topic>Original Manuscript</topic><topic>Population studies</topic><topic>Psycholinguistics - methods</topic><topic>Psychology</topic><topic>Regression analysis</topic><topic>Statistical models</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Houghton, Zachary N.</creatorcontrib><creatorcontrib>Kapatsinski, Vsevolod</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Docstoc</collection><collection>Neurosciences Abstracts</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>MEDLINE - Academic</collection><jtitle>Behavior research methods</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Houghton, Zachary N.</au><au>Kapatsinski, Vsevolod</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Are your random effects normal? 
A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression</atitle><jtitle>Behavior research methods</jtitle><stitle>Behav Res</stitle><addtitle>Behav Res Methods</addtitle><date>2024-09-01</date><risdate>2024</risdate><volume>56</volume><issue>6</issue><spage>5557</spage><epage>5587</epage><pages>5557-5587</pages><issn>1554-3528</issn><eissn>1554-3528</eissn><abstract>With mixed-effects regression models becoming a mainstream tool for every psycholinguist, there has become an increasing need to understand them more fully. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure to minimize error in evaluating the statistical significance of fixed-effects predictors. The present study examines a potential misspecification of random effects that has not been discussed in psycholinguistics: violation of the single-subject-population assumption, in the context of logistic regression. Estimated random-effects distributions in real studies often appear to be bi- or multimodal. However, there is no established way to estimate whether a random-effects distribution corresponds to more than one underlying population, especially in the more common case of a multivariate distribution of random effects. We show that violations of the single-subject-population assumption can usually be detected by assessing the (multivariate) normality of the inferred random-effects structure, unless the data show quasi-separability, i.e., many subjects or items show near-categorical behavior. In the absence of quasi-separability, several clustering methods are successful in determining which group each participant belongs to. The BIC difference between a two-cluster and a one-cluster solution can be used to determine that subjects (or items) do not come from a single population. 
This then allows the researcher to define and justify a new post hoc variable specifying the groups to which participants or items belong, which can be incorporated into regression analysis.</abstract><cop>New York</cop><pub>Springer US</pub><pmid>38017204</pmid><doi>10.3758/s13428-023-02287-y</doi><tpages>31</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1554-3528
ispartof Behavior research methods, 2024-09, Vol.56 (6), p.5557-5587
issn 1554-3528
1554-3528
language eng
recordid cdi_proquest_miscellaneous_2895259227
source Springer Link
subjects Behavioral Science and Psychology
Cognitive Psychology
Computer Simulation
Data Interpretation, Statistical
Humans
Logistic Models
Models, Statistical
Original Manuscript
Population studies
Psycholinguistics - methods
Psychology
Regression analysis
Statistical models
title Are your random effects normal? A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T02%3A22%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Are%20your%20random%20effects%20normal?%20A%20simulation%20study%20of%20methods%20for%20estimating%20whether%20subjects%20or%20items%20come%20from%20more%20than%20one%20population%20by%20examining%20the%20distribution%20of%20random%20effects%20in%20mixed-effects%20logistic%20regression&rft.jtitle=Behavior%20research%20methods&rft.au=Houghton,%20Zachary%20N.&rft.date=2024-09-01&rft.volume=56&rft.issue=6&rft.spage=5557&rft.epage=5587&rft.pages=5557-5587&rft.issn=1554-3528&rft.eissn=1554-3528&rft_id=info:doi/10.3758/s13428-023-02287-y&rft_dat=%3Cproquest_cross%3E3095151173%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c326t-62f5a5a9d145706259bce4643e5ac350db6d4c1c6708a7226b5a3af4276b4d453%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3095151173&rft_id=info:pmid/38017204&rfr_iscdi=true