Multivariate generalized linear mixed models for underdispersed count data

Researchers are often interested in understanding the relationship between a set of covariates and a set of response variables. To achieve this goal, the use of regression analysis, either linear or generalized linear models, is largely applied. However, such models only allow users to model one res...

Bibliographic Details
Published in: Journal of statistical computation and simulation 2023-09, Vol.93 (14), p.2410-2427
Main Authors: da Silva, Guilherme Parreira, Laureano, Henrique Aparecido, Petterle, Ricardo Rasmussen, Ribeiro Jr, Paulo Justiniano, Bonat, Wagner Hugo
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c286t-debd158c680128622ce999bbc6b991ba81b1bc70e6789815f56fb3692570a33
container_end_page 2427
container_issue 14
container_start_page 2410
container_title Journal of statistical computation and simulation
container_volume 93
creator da Silva, Guilherme Parreira
Laureano, Henrique Aparecido
Petterle, Ricardo Rasmussen
Ribeiro Jr, Paulo Justiniano
Bonat, Wagner Hugo
description Researchers are often interested in understanding the relationship between a set of covariates and a set of response variables. To this end, regression analysis, with either linear or generalized linear models, is widely applied. However, such models allow users to model only one response variable at a time. Moreover, a correlation measure between the response variables cannot be calculated directly from the regression model. In this article, we employed the Multivariate Generalized Linear Mixed Models framework, which allows the specification of a set of response variables and estimates the correlation between them through a random effect structure that follows a multivariate normal distribution. We used the maximum likelihood framework to estimate all model parameters, using the Laplace approximation to integrate out the random effects. The derivatives are provided by automatic differentiation. The outer maximization was carried out with general-purpose algorithms such as PORT and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. We delimited this problem by studying count response variables with the following distributions: Poisson, negative binomial, Conway-Maxwell-Poisson (COM-Poisson), and double Poisson. While the first distribution can model only equidispersed data, the second models equi- and overdispersed data, and the third and fourth model all types of dispersion (i.e. including underdispersion). The models were implemented in R with the TMB package, which is based on C++ templates. Besides the full specification, we considered models with simpler covariance matrix structures (fixed and common variance, and ρ set to 0) and with fixed dispersion. These models were applied to a dataset from the National Health and Nutrition Examination Survey in which the response variables, two underdispersed and one that can be considered equidispersed, were measured on 1281 subjects. The double Poisson full model specification outperformed the other three competitors on three goodness-of-fit measures: Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and maximized log-likelihood. Consequently, it estimated parameters with smaller standard errors and a greater number of significant correlation coefficients. Therefore, the proposed model can deal with multivariate count responses and measure the correlation between them, taking into account the effects of the covariates.
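The abstract's statement that the double Poisson can model all types of dispersion can be illustrated with a minimal Python sketch. It assumes Efron's parameterization of the double Poisson, with mean parameter mu and dispersion parameter theta, which may differ from the parameterization used in the article; the function names are hypothetical, and the normalizing constant is obtained by summing over a truncated support rather than by any method described in the record.

```python
import math


def double_poisson_log_kernel(y, mu, theta):
    """Log of the unnormalized double Poisson mass at count y.

    Assumed form (Efron's parameterization):
    theta^(1/2) * exp(-theta*mu) * (exp(-y) * y^y / y!) * (e*mu/y)^(theta*y)
    """
    if y == 0:
        # y*log(y) and theta*y*log(mu/y) both tend to 0 as y -> 0.
        return 0.5 * math.log(theta) - theta * mu
    return (
        0.5 * math.log(theta)
        - theta * mu
        - y + y * math.log(y) - math.lgamma(y + 1)
        + theta * y * (1.0 + math.log(mu) - math.log(y))
    )


def double_poisson_pmf(mu, theta, y_max=200):
    """Normalized probabilities for y = 0..y_max (truncated support)."""
    logs = [double_poisson_log_kernel(y, mu, theta) for y in range(y_max + 1)]
    peak = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_variance(pmf):
    """Mean and variance of a distribution on 0, 1, 2, ..."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var


if __name__ == "__main__":
    # theta = 1 recovers the Poisson; theta > 1 gives variance below the mean.
    for theta in (1.0, 2.0):
        mean, var = mean_and_variance(double_poisson_pmf(5.0, theta))
        print(f"mu=5.0, theta={theta}: mean={mean:.3f}, variance={var:.3f}")
```

Under this parameterization, theta = 1 reduces to the ordinary Poisson (equidispersion), theta > 1 yields a variance smaller than the mean (underdispersion), and theta < 1 yields overdispersion.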
doi_str_mv 10.1080/00949655.2023.2184474
format article
fullrecord <record><control><sourceid>proquest_infor</sourceid><recordid>TN_cdi_proquest_journals_2850409497</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2850409497</sourcerecordid><originalsourceid>FETCH-LOGICAL-c286t-debd158c680128622ce999bbc6b991ba81b1bc70e6789815f56fb3692570a33</originalsourceid><addsrcrecordid>eNp9kM1OwzAQhC0EEqXwCEiROKfYTuzYN1AFBVTEAe6W_4JcJXFZJ0B5ehK1XDmtdndmVvshdEnwgmCBrzGWpeSMLSimxYISUZZVeYRmhPEiZ4QXx2g2afJJdIrOUtpgjAlhdIaenoemD58agu599u47D7oJP95lTei8hqwN32PTRueblNURsqFzHlxIWw9p3Ng4dH3mdK_P0Umtm-QvDnWOXu_v3pYP-fpl9bi8XeeWCt7nzhtHmLBcYDIOKLVeSmmM5UZKYrQghhhbYc8rIQVhNeO1KbikrMK6KOboap-6hfgx-NSrTRygGw8qKhguJxbVqGJ7lYWYEvhabSG0GnaKYDVBU3_Q1ARNHaCNvpu9L3Tjr63-itA41etdE6EG3dmQVPF_xC-EBXN3</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2850409497</pqid></control><display><type>article</type><title>Multivariate generalized linear mixed models for underdispersed count data</title><source>Taylor and Francis Science and Technology Collection</source><creator>da Silva, Guilherme Parreira ; Laureano, Henrique Aparecido ; Petterle, Ricardo Rasmussen ; Ribeiro Jr, Paulo Justiniano ; Bonat, Wagner Hugo</creator><creatorcontrib>da Silva, Guilherme Parreira ; Laureano, Henrique Aparecido ; Petterle, Ricardo Rasmussen ; Ribeiro Jr, Paulo Justiniano ; Bonat, Wagner Hugo</creatorcontrib><description>Researchers are often interested in understanding the relationship between a set of covariates and a set of response variables. To achieve this goal, the use of regression analysis, either linear or generalized linear models, is largely applied. However, such models only allow users to model one response variable at a time. Moreover, it is not possible to directly calculate from the regression model a correlation measure between the response variables. 
In this article, we employed the Multivariate Generalized Linear Mixed Models framework, which allows the specification of a set of response variables and calculates the correlation between them through a random effect structure that follows a multivariate normal distribution. We used the maximum likelihood estimation framework to estimate all model parameters using Laplace approximation to integrate out the random effects. The derivatives are provided by automatic differentiation. The outer maximization was made using a general-purpose algorithm such as PORT and Broyden-Fletcher-Goldfarb-Shanno algorithm ( BFGS ). We delimited this problem by studying count response variables with the following distributions: Poisson, negative binomial, Conway-Maxwell-Poisson (COM-Poisson), and double Poisson. While the first distribution can model only equidispersed data, the second models equi and overdispersed, and the third and fourth models all types of dispersion (i.e. including underdispersion). The models were implemented on software R with package TMB , based on C++ templates. Besides the full specification, models with simpler structures in the covariance matrix were considered (fixed and common variance, and ρ set to 0) and fixed dispersion. These models were applied to a dataset from the National Health and Nutrition Examination Survey, where two response variables are underdispersed and one can be considered equidispersed that were measured at 1281 subjects. The double Poisson full model specification overcame the other three competitors considering three goodness-of-fit measures: Akaike Information Criteria (AIC), Bayesian Information Criteria (BIC), and maximized log-likelihood. Consequently, it estimated parameters with smaller standard error and a greater number of significant correlation coefficients. 
Therefore, the proposed model can deal with multivariate count responses and measures the correlation between them taking into account the effects of the covariates.</description><identifier>ISSN: 0094-9655</identifier><identifier>EISSN: 1563-5163</identifier><identifier>DOI: 10.1080/00949655.2023.2184474</identifier><language>eng</language><publisher>Abingdon: Taylor &amp; Francis</publisher><subject>Algorithms ; automatic differentiation ; Correlation coefficients ; Covariance matrix ; Criteria ; Dispersion ; Generalized linear models ; Goodness of fit ; Laplace approximation ; Maximum likelihood estimation ; Multivariate analysis ; multivariate models ; Normal distribution ; optimization ; Parameter estimation ; Regression analysis ; Regression models ; Specifications ; Standard error ; Statistical analysis ; Statistical models ; template model builder ; Variables</subject><ispartof>Journal of statistical computation and simulation, 2023-09, Vol.93 (14), p.2410-2427</ispartof><rights>2023 Informa UK Limited, trading as Taylor &amp; Francis Group 2023</rights><rights>2023 Informa UK Limited, trading as Taylor &amp; Francis Group</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c286t-debd158c680128622ce999bbc6b991ba81b1bc70e6789815f56fb3692570a33</cites><orcidid>0000-0001-5302-9446 ; 0000-0001-6040-6465 ; 0000-0001-7735-1077 ; 0000-0002-0349-7054 ; 0000-0003-1654-8356</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>da Silva, Guilherme Parreira</creatorcontrib><creatorcontrib>Laureano, Henrique Aparecido</creatorcontrib><creatorcontrib>Petterle, Ricardo Rasmussen</creatorcontrib><creatorcontrib>Ribeiro Jr, Paulo Justiniano</creatorcontrib><creatorcontrib>Bonat, Wagner 
Hugo</creatorcontrib><title>Multivariate generalized linear mixed models for underdispersed count data</title><title>Journal of statistical computation and simulation</title><description>Researchers are often interested in understanding the relationship between a set of covariates and a set of response variables. To achieve this goal, the use of regression analysis, either linear or generalized linear models, is largely applied. However, such models only allow users to model one response variable at a time. Moreover, it is not possible to directly calculate from the regression model a correlation measure between the response variables. In this article, we employed the Multivariate Generalized Linear Mixed Models framework, which allows the specification of a set of response variables and calculates the correlation between them through a random effect structure that follows a multivariate normal distribution. We used the maximum likelihood estimation framework to estimate all model parameters using Laplace approximation to integrate out the random effects. The derivatives are provided by automatic differentiation. The outer maximization was made using a general-purpose algorithm such as PORT and Broyden-Fletcher-Goldfarb-Shanno algorithm ( BFGS ). We delimited this problem by studying count response variables with the following distributions: Poisson, negative binomial, Conway-Maxwell-Poisson (COM-Poisson), and double Poisson. While the first distribution can model only equidispersed data, the second models equi and overdispersed, and the third and fourth models all types of dispersion (i.e. including underdispersion). The models were implemented on software R with package TMB , based on C++ templates. Besides the full specification, models with simpler structures in the covariance matrix were considered (fixed and common variance, and ρ set to 0) and fixed dispersion. 
These models were applied to a dataset from the National Health and Nutrition Examination Survey, where two response variables are underdispersed and one can be considered equidispersed that were measured at 1281 subjects. The double Poisson full model specification overcame the other three competitors considering three goodness-of-fit measures: Akaike Information Criteria (AIC), Bayesian Information Criteria (BIC), and maximized log-likelihood. Consequently, it estimated parameters with smaller standard error and a greater number of significant correlation coefficients. Therefore, the proposed model can deal with multivariate count responses and measures the correlation between them taking into account the effects of the covariates.</description><subject>Algorithms</subject><subject>automatic differentiation</subject><subject>Correlation coefficients</subject><subject>Covariance matrix</subject><subject>Criteria</subject><subject>Dispersion</subject><subject>Generalized linear models</subject><subject>Goodness of fit</subject><subject>Laplace approximation</subject><subject>Maximum likelihood estimation</subject><subject>Multivariate analysis</subject><subject>multivariate models</subject><subject>Normal distribution</subject><subject>optimization</subject><subject>Parameter estimation</subject><subject>Regression analysis</subject><subject>Regression models</subject><subject>Specifications</subject><subject>Standard error</subject><subject>Statistical analysis</subject><subject>Statistical models</subject><subject>template model 
builder</subject><subject>Variables</subject><issn>0094-9655</issn><issn>1563-5163</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp9kM1OwzAQhC0EEqXwCEiROKfYTuzYN1AFBVTEAe6W_4JcJXFZJ0B5ehK1XDmtdndmVvshdEnwgmCBrzGWpeSMLSimxYISUZZVeYRmhPEiZ4QXx2g2afJJdIrOUtpgjAlhdIaenoemD58agu599u47D7oJP95lTei8hqwN32PTRueblNURsqFzHlxIWw9p3Ng4dH3mdK_P0Umtm-QvDnWOXu_v3pYP-fpl9bi8XeeWCt7nzhtHmLBcYDIOKLVeSmmM5UZKYrQghhhbYc8rIQVhNeO1KbikrMK6KOboap-6hfgx-NSrTRygGw8qKhguJxbVqGJ7lYWYEvhabSG0GnaKYDVBU3_Q1ARNHaCNvpu9L3Tjr63-itA41etdE6EG3dmQVPF_xC-EBXN3</recordid><startdate>20230922</startdate><enddate>20230922</enddate><creator>da Silva, Guilherme Parreira</creator><creator>Laureano, Henrique Aparecido</creator><creator>Petterle, Ricardo Rasmussen</creator><creator>Ribeiro Jr, Paulo Justiniano</creator><creator>Bonat, Wagner Hugo</creator><general>Taylor &amp; Francis</general><general>Taylor &amp; Francis Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-5302-9446</orcidid><orcidid>https://orcid.org/0000-0001-6040-6465</orcidid><orcidid>https://orcid.org/0000-0001-7735-1077</orcidid><orcidid>https://orcid.org/0000-0002-0349-7054</orcidid><orcidid>https://orcid.org/0000-0003-1654-8356</orcidid></search><sort><creationdate>20230922</creationdate><title>Multivariate generalized linear mixed models for underdispersed count data</title><author>da Silva, Guilherme Parreira ; Laureano, Henrique Aparecido ; Petterle, Ricardo Rasmussen ; Ribeiro Jr, Paulo Justiniano ; Bonat, Wagner 
Hugo</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c286t-debd158c680128622ce999bbc6b991ba81b1bc70e6789815f56fb3692570a33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>automatic differentiation</topic><topic>Correlation coefficients</topic><topic>Covariance matrix</topic><topic>Criteria</topic><topic>Dispersion</topic><topic>Generalized linear models</topic><topic>Goodness of fit</topic><topic>Laplace approximation</topic><topic>Maximum likelihood estimation</topic><topic>Multivariate analysis</topic><topic>multivariate models</topic><topic>Normal distribution</topic><topic>optimization</topic><topic>Parameter estimation</topic><topic>Regression analysis</topic><topic>Regression models</topic><topic>Specifications</topic><topic>Standard error</topic><topic>Statistical analysis</topic><topic>Statistical models</topic><topic>template model builder</topic><topic>Variables</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>da Silva, Guilherme Parreira</creatorcontrib><creatorcontrib>Laureano, Henrique Aparecido</creatorcontrib><creatorcontrib>Petterle, Ricardo Rasmussen</creatorcontrib><creatorcontrib>Ribeiro Jr, Paulo Justiniano</creatorcontrib><creatorcontrib>Bonat, Wagner Hugo</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of statistical computation and simulation</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>da Silva, Guilherme Parreira</au><au>Laureano, Henrique Aparecido</au><au>Petterle, Ricardo Rasmussen</au><au>Ribeiro Jr, Paulo Justiniano</au><au>Bonat, Wagner Hugo</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Multivariate generalized linear mixed models for underdispersed count data</atitle><jtitle>Journal of statistical computation and simulation</jtitle><date>2023-09-22</date><risdate>2023</risdate><volume>93</volume><issue>14</issue><spage>2410</spage><epage>2427</epage><pages>2410-2427</pages><issn>0094-9655</issn><eissn>1563-5163</eissn><abstract>Researchers are often interested in understanding the relationship between a set of covariates and a set of response variables. To achieve this goal, the use of regression analysis, either linear or generalized linear models, is largely applied. However, such models only allow users to model one response variable at a time. Moreover, it is not possible to directly calculate from the regression model a correlation measure between the response variables. In this article, we employed the Multivariate Generalized Linear Mixed Models framework, which allows the specification of a set of response variables and calculates the correlation between them through a random effect structure that follows a multivariate normal distribution. We used the maximum likelihood estimation framework to estimate all model parameters using Laplace approximation to integrate out the random effects. The derivatives are provided by automatic differentiation. The outer maximization was made using a general-purpose algorithm such as PORT and Broyden-Fletcher-Goldfarb-Shanno algorithm ( BFGS ). We delimited this problem by studying count response variables with the following distributions: Poisson, negative binomial, Conway-Maxwell-Poisson (COM-Poisson), and double Poisson. 
While the first distribution can model only equidispersed data, the second models equi and overdispersed, and the third and fourth models all types of dispersion (i.e. including underdispersion). The models were implemented on software R with package TMB , based on C++ templates. Besides the full specification, models with simpler structures in the covariance matrix were considered (fixed and common variance, and ρ set to 0) and fixed dispersion. These models were applied to a dataset from the National Health and Nutrition Examination Survey, where two response variables are underdispersed and one can be considered equidispersed that were measured at 1281 subjects. The double Poisson full model specification overcame the other three competitors considering three goodness-of-fit measures: Akaike Information Criteria (AIC), Bayesian Information Criteria (BIC), and maximized log-likelihood. Consequently, it estimated parameters with smaller standard error and a greater number of significant correlation coefficients. Therefore, the proposed model can deal with multivariate count responses and measures the correlation between them taking into account the effects of the covariates.</abstract><cop>Abingdon</cop><pub>Taylor &amp; Francis</pub><doi>10.1080/00949655.2023.2184474</doi><tpages>18</tpages><orcidid>https://orcid.org/0000-0001-5302-9446</orcidid><orcidid>https://orcid.org/0000-0001-6040-6465</orcidid><orcidid>https://orcid.org/0000-0001-7735-1077</orcidid><orcidid>https://orcid.org/0000-0002-0349-7054</orcidid><orcidid>https://orcid.org/0000-0003-1654-8356</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0094-9655
ispartof Journal of statistical computation and simulation, 2023-09, Vol.93 (14), p.2410-2427
issn 0094-9655
1563-5163
language eng
recordid cdi_proquest_journals_2850409497
source Taylor and Francis Science and Technology Collection
subjects Algorithms
automatic differentiation
Correlation coefficients
Covariance matrix
Criteria
Dispersion
Generalized linear models
Goodness of fit
Laplace approximation
Maximum likelihood estimation
Multivariate analysis
multivariate models
Normal distribution
optimization
Parameter estimation
Regression analysis
Regression models
Specifications
Standard error
Statistical analysis
Statistical models
template model builder
Variables
title Multivariate generalized linear mixed models for underdispersed count data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T08%3A53%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_infor&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multivariate%20generalized%20linear%20mixed%20models%20for%20underdispersed%20count%20data&rft.jtitle=Journal%20of%20statistical%20computation%20and%20simulation&rft.au=da%20Silva,%20Guilherme%20Parreira&rft.date=2023-09-22&rft.volume=93&rft.issue=14&rft.spage=2410&rft.epage=2427&rft.pages=2410-2427&rft.issn=0094-9655&rft.eissn=1563-5163&rft_id=info:doi/10.1080/00949655.2023.2184474&rft_dat=%3Cproquest_infor%3E2850409497%3C/proquest_infor%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c286t-debd158c680128622ce999bbc6b991ba81b1bc70e6789815f56fb3692570a33%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2850409497&rft_id=info:pmid/&rfr_iscdi=true