Demystifying Softmax Gating Function in Gaussian Mixture of Experts

Bibliographic Details
Published in: arXiv.org, 2023-10
Main Authors: Nguyen, Huy; Nguyen, TrungTin; Ho, Nhat
Format: Article
Language: English
container_title arXiv.org
creator Nguyen, Huy
Nguyen, TrungTin
Ho, Nhat
description Understanding parameter estimation in the softmax gating Gaussian mixture of experts has remained a long-standing open problem in the literature. This is mainly due to three fundamental theoretical challenges associated with the softmax gating function: (i) identifiability only up to translation of the parameters; (ii) the intrinsic interaction, via partial differential equations, between the softmax gating and the expert functions in the Gaussian density; (iii) the complex dependence between the numerator and denominator of the conditional density of the softmax gating Gaussian mixture of experts. We resolve these challenges by proposing novel Voronoi loss functions among parameters and establishing the convergence rates of the maximum likelihood estimator (MLE) for parameter estimation in these models. When the true number of experts is unknown and over-specified, our findings show a connection between the convergence rate of the MLE and the solvability of a system of polynomial equations.
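Challenge (i) in the description can be illustrated numerically. The sketch below is not taken from the paper: it assumes a scalar covariate x, gating scores of the form beta1 * x + beta0, and Gaussian experts with mean a * x + b, and all function and variable names are hypothetical. Under those assumptions, adding the same shift (t1, t0) to every expert's gating parameters leaves the conditional density unchanged, which is what identifiability only up to translation means here.

```python
import math

def softmax(scores):
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def moe_density(y, x, experts):
    # experts: list of (beta1, beta0, a, b, var) tuples (assumed parameterization).
    weights = softmax([beta1 * x + beta0 for beta1, beta0, *_ in experts])
    return sum(
        w * gaussian_pdf(y, a * x + b, var)
        for w, (_, _, a, b, var) in zip(weights, experts)
    )

def shift_gating(experts, t1, t0):
    # Translate every expert's gating parameters by the same (t1, t0).
    return [(beta1 + t1, beta0 + t0, a, b, var) for beta1, beta0, a, b, var in experts]

# Illustrative parameter values only; they carry no meaning beyond the demo.
experts = [(0.5, -1.0, 2.0, 0.0, 1.0), (-1.5, 0.3, -1.0, 1.0, 0.5)]
shifted = shift_gating(experts, t1=3.0, t0=-2.0)

# Largest difference between the two densities over a small grid of (x, y).
max_gap = max(
    abs(moe_density(y, x, experts) - moe_density(y, x, shifted))
    for x in (-1.0, 0.0, 2.0)
    for y in (-2.0, 0.5, 3.0)
)
print(max_gap)
```

The printed gap should be at the level of floating-point rounding, since the common shift cancels between the numerator and denominator of the softmax.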
format article
publisher Ithaca: Cornell University Library, arXiv.org
date 2023-10-30
rights 2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-10
issn 2331-8422
language eng
recordid cdi_proquest_journals_2811057778
source Publicly Available Content Database
subjects Maximum likelihood estimators
Mixtures
Normal distribution
Parameter estimation
Parameter identification
Partial differential equations
Polynomials
title Demystifying Softmax Gating Function in Gaussian Mixture of Experts
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T18%3A28%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Demystifying%20Softmax%20Gating%20Function%20in%20Gaussian%20Mixture%20of%20Experts&rft.jtitle=arXiv.org&rft.au=Nguyen,%20Huy&rft.date=2023-10-30&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2811057778%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_28110577783%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2811057778&rft_id=info:pmid/&rfr_iscdi=true