Demystifying Softmax Gating Function in Gaussian Mixture of Experts
Published in: | arXiv.org 2023-10 |
---|---|
Main Authors: | Nguyen, Huy ; Nguyen, TrungTin ; Ho, Nhat |
Format: | Article |
Language: | English |
Subjects: | Maximum likelihood estimators ; Mixtures ; Normal distribution ; Parameter estimation ; Parameter identification ; Partial differential equations ; Polynomials |
container_title | arXiv.org |
---|---|
creator | Nguyen, Huy ; Nguyen, TrungTin ; Ho, Nhat |
description | Understanding the parameter estimation of softmax gating Gaussian mixture of experts has remained a long-standing open problem in the literature. This is mainly due to three fundamental theoretical challenges associated with the softmax gating function: (i) identifiability only up to translation of the parameters; (ii) the intrinsic interaction, via partial differential equations, between the softmax gating and the expert functions in the Gaussian density; (iii) the complex dependence between the numerator and denominator of the conditional density of softmax gating Gaussian mixture of experts. We resolve these challenges by proposing novel Voronoi loss functions among parameters and establishing the convergence rates of the maximum likelihood estimator (MLE) for parameter estimation in these models. When the true number of experts is unknown and over-specified, our findings show a connection between the convergence rate of the MLE and the solvability of a system of polynomial equations. |
format | article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2811057778</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2811057778</sourcerecordid><originalsourceid>FETCH-proquest_journals_28110577783</originalsourceid><addsrcrecordid>eNqNikEKwjAUBYMgWLR3CLgupIkx3ddWN650X4IkkmKTmv8D7e2t4AFcPWbmrUjGhSiL6sD5huQAPWOMHxWXUmSkPplhBnR2dv5Jb8HioCd61vjFNvkHuuCp84tKAE57enUTpmhosLSZRhMRdmRt9QtM_tst2bfNvb4UYwzvZAC7PqTol9TxqiyZVEpV4r_XBxgKOmI</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2811057778</pqid></control><display><type>article</type><title>Demystifying Softmax Gating Function in Gaussian Mixture of Experts</title><source>Publicly Available Content Database</source><creator>Nguyen, Huy ; Nguyen, TrungTin ; Ho, Nhat</creator><creatorcontrib>Nguyen, Huy ; Nguyen, TrungTin ; Ho, Nhat</creatorcontrib><description>Understanding the parameter estimation of softmax gating Gaussian mixture of experts has remained a long-standing open problem in the literature. It is mainly due to three fundamental theoretical challenges associated with the softmax gating function: (i) the identifiability only up to the translation of parameters; (ii) the intrinsic interaction via partial differential equations between the softmax gating and the expert functions in the Gaussian density; (iii) the complex dependence between the numerator and denominator of the conditional density of softmax gating Gaussian mixture of experts. We resolve these challenges by proposing novel Voronoi loss functions among parameters and establishing the convergence rates of maximum likelihood estimator (MLE) for solving parameter estimation in these models. 
When the true number of experts is unknown and over-specified, our findings show a connection between the convergence rate of the MLE and a solvability problem of a system of polynomial equations.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Maximum likelihood estimators ; Mixtures ; Normal distribution ; Parameter estimation ; Parameter identification ; Partial differential equations ; Polynomials</subject><ispartof>arXiv.org, 2023-10</ispartof><rights>2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2811057778?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>780,784,25753,37012,44590</link.rule.ids></links><search><creatorcontrib>Nguyen, Huy</creatorcontrib><creatorcontrib>Nguyen, TrungTin</creatorcontrib><creatorcontrib>Ho, Nhat</creatorcontrib><title>Demystifying Softmax Gating Function in Gaussian Mixture of Experts</title><title>arXiv.org</title><description>Understanding the parameter estimation of softmax gating Gaussian mixture of experts has remained a long-standing open problem in the literature. 
It is mainly due to three fundamental theoretical challenges associated with the softmax gating function: (i) the identifiability only up to the translation of parameters; (ii) the intrinsic interaction via partial differential equations between the softmax gating and the expert functions in the Gaussian density; (iii) the complex dependence between the numerator and denominator of the conditional density of softmax gating Gaussian mixture of experts. We resolve these challenges by proposing novel Voronoi loss functions among parameters and establishing the convergence rates of maximum likelihood estimator (MLE) for solving parameter estimation in these models. When the true number of experts is unknown and over-specified, our findings show a connection between the convergence rate of the MLE and a solvability problem of a system of polynomial equations.</description><subject>Maximum likelihood estimators</subject><subject>Mixtures</subject><subject>Normal distribution</subject><subject>Parameter estimation</subject><subject>Parameter identification</subject><subject>Partial differential equations</subject><subject>Polynomials</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNqNikEKwjAUBYMgWLR3CLgupIkx3ddWN650X4IkkmKTmv8D7e2t4AFcPWbmrUjGhSiL6sD5huQAPWOMHxWXUmSkPplhBnR2dv5Jb8HioCd61vjFNvkHuuCp84tKAE57enUTpmhosLSZRhMRdmRt9QtM_tst2bfNvb4UYwzvZAC7PqTol9TxqiyZVEpV4r_XBxgKOmI</recordid><startdate>20231030</startdate><enddate>20231030</enddate><creator>Nguyen, Huy</creator><creator>Nguyen, TrungTin</creator><creator>Ho, Nhat</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20231030</creationdate><title>Demystifying Softmax Gating Function in Gaussian Mixture of Experts</title><author>Nguyen, Huy ; Nguyen, TrungTin ; Ho, Nhat</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_28110577783</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Maximum likelihood estimators</topic><topic>Mixtures</topic><topic>Normal distribution</topic><topic>Parameter estimation</topic><topic>Parameter identification</topic><topic>Partial differential equations</topic><topic>Polynomials</topic><toplevel>online_resources</toplevel><creatorcontrib>Nguyen, Huy</creatorcontrib><creatorcontrib>Nguyen, TrungTin</creatorcontrib><creatorcontrib>Ho, Nhat</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Databases</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT 
USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Nguyen, Huy</au><au>Nguyen, TrungTin</au><au>Ho, Nhat</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Demystifying Softmax Gating Function in Gaussian Mixture of Experts</atitle><jtitle>arXiv.org</jtitle><date>2023-10-30</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>Understanding the parameter estimation of softmax gating Gaussian mixture of experts has remained a long-standing open problem in the literature. It is mainly due to three fundamental theoretical challenges associated with the softmax gating function: (i) the identifiability only up to the translation of parameters; (ii) the intrinsic interaction via partial differential equations between the softmax gating and the expert functions in the Gaussian density; (iii) the complex dependence between the numerator and denominator of the conditional density of softmax gating Gaussian mixture of experts. We resolve these challenges by proposing novel Voronoi loss functions among parameters and establishing the convergence rates of maximum likelihood estimator (MLE) for solving parameter estimation in these models. When the true number of experts is unknown and over-specified, our findings show a connection between the convergence rate of the MLE and a solvability problem of a system of polynomial equations.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2811057778 |
source | Publicly Available Content Database |
subjects | Maximum likelihood estimators ; Mixtures ; Normal distribution ; Parameter estimation ; Parameter identification ; Partial differential equations ; Polynomials |
title | Demystifying Softmax Gating Function in Gaussian Mixture of Experts |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T18%3A28%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Demystifying%20Softmax%20Gating%20Function%20in%20Gaussian%20Mixture%20of%20Experts&rft.jtitle=arXiv.org&rft.au=Nguyen,%20Huy&rft.date=2023-10-30&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2811057778%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_28110577783%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2811057778&rft_id=info:pmid/&rfr_iscdi=true |
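The first challenge named in the description, identifiability only up to translation of the gating parameters, can be illustrated with a minimal Python sketch. It assumes a one-dimensional covariate x (the general model uses a vector), linear Gaussian experts with mean a_k x + b_k and variance var_k, and hypothetical function names chosen for illustration; it is not the authors' code and does not implement their Voronoi loss functions or MLE analysis.

```python
import math

def softmax_gates(x, beta1, beta0):
    """Softmax gating weights for scalar x (hypothetical 1-D simplification)."""
    logits = [b1 * x + b0 for b1, b0 in zip(beta1, beta0)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_pdf(y, mean, var):
    """Univariate normal density N(y; mean, var)."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def moe_conditional_density(y, x, beta1, beta0, a, b, var):
    """p(y | x) = sum_k softmax_k(beta1_k x + beta0_k) * N(y; a_k x + b_k, var_k),
    assuming linear Gaussian experts."""
    gates = softmax_gates(x, beta1, beta0)
    return sum(g * gaussian_pdf(y, ak * x + bk, vk)
               for g, ak, bk, vk in zip(gates, a, b, var))

# Illustrative (made-up) parameters for a two-expert model.
beta1, beta0 = [1.0, -0.5], [0.2, 0.0]
a, b, var = [2.0, -1.0], [0.0, 1.0], [1.0, 0.5]

# Shift every expert's gating parameters by the same (t1, t0).
t1, t0 = 3.0, -7.0
shifted1 = [v + t1 for v in beta1]
shifted0 = [v + t0 for v in beta0]

p = moe_conditional_density(0.3, 1.5, beta1, beta0, a, b, var)
q = moe_conditional_density(0.3, 1.5, shifted1, shifted0, a, b, var)
print(abs(p - q) < 1e-12)  # True: the density cannot tell the two parameter sets apart
```

Adding the same term t1 x + t0 to every logit cancels in the softmax ratio, so two distinct gating parameter sets yield the same conditional density. This is why parameters can only be recovered up to such a translation, and why a loss between parameters has to account for it.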