Loss aware post-training quantization
Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. We show that the structure is flat and separable for mild quantization, enabling straightforward post-training quantization methods to achieve good results. We show that with more aggressive quantization, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging. Armed with this understanding, we design a method that quantizes the layer parameters jointly, enabling significant accuracy improvement over current post-training quantization methods. Reference implementation is available at https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq.
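To make the idea concrete, below is a minimal, hypothetical sketch of uniform post-training quantization with a per-layer clipping value, and of choosing those clipping values jointly by minimizing a proxy loss on a calibration batch. It is not the authors' LAPQ implementation (see the repository linked above); the toy two-layer network, the `quantize` and `calib_loss` helpers, and the coordinate-descent grid search are all illustrative assumptions.

```python
# Illustrative sketch only (not the paper's LAPQ code): uniform quantization of
# weights with a per-layer clipping value, plus a naive joint search over those
# clipping values that minimizes a proxy loss on a calibration batch.
import numpy as np

def quantize(x, clip, n_bits=4):
    """Uniformly quantize x onto a (2**n_bits)-level grid inside [-clip, clip]."""
    scale = 2.0 * clip / (2 ** n_bits - 1)        # step size of the uniform grid
    x_clipped = np.clip(x, -clip, clip)           # saturate to the clipping range
    return np.round(x_clipped / scale) * scale    # round to the nearest grid point

def calib_loss(weights, clips, x, y_ref, n_bits=4):
    """Proxy loss: MSE between quantized and full-precision outputs of a toy ReLU net."""
    h = x
    for w, c in zip(weights, clips):
        h = np.maximum(h @ quantize(w, c, n_bits), 0.0)
    return float(np.mean((h - y_ref) ** 2))

rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 32)), rng.normal(size=(32, 8))]   # toy 2-layer network
x = rng.normal(size=(64, 16))                                      # calibration batch

# Full-precision reference outputs for the calibration batch.
y_ref = np.maximum(np.maximum(x @ weights[0], 0.0) @ weights[1], 0.0)

# Joint selection of the per-layer clipping values by coordinate descent: each
# layer's clip is re-chosen while the others are held fixed, so inter-layer
# interactions enter through the shared network loss instead of being ignored.
clips = [float(np.abs(w).max()) for w in weights]
for _ in range(3):                                # a few sweeps suffice for the toy model
    for i in range(len(clips)):
        candidates = np.linspace(0.3, 1.0, 15) * np.abs(weights[i]).max()
        losses = [calib_loss(weights, clips[:i] + [c] + clips[i + 1:], x, y_ref)
                  for c in candidates]
        clips[i] = float(candidates[int(np.argmin(losses))])

print("chosen clipping values:", clips)
print("calibration loss after joint search:", calib_loss(weights, clips, x, y_ref))
```

The point of the joint search in this sketch is that each layer's clipping value is evaluated through the loss of the whole quantized network rather than through a per-layer reconstruction error, which matters once the loss landscape is no longer separable.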
Published in: Machine learning, 2021-12, Vol.110 (11-12), p.3245-3262
Main Authors: Nahshan, Yury; Chmiel, Brian; Baskin, Chaim; Zheltonozhskii, Evgenii; Banner, Ron; Bronstein, Alex M.; Mendelson, Avi
Format: Article
Language: English
Subjects: Accuracy; Artificial Intelligence; Computer Science; Control; Machine Learning; Measurement; Mechatronics; Natural Language Processing (NLP); Neural networks; Parameters; Robotics; Simulation and Modeling; Training
container_end_page | 3262 |
container_issue | 11-12 |
container_start_page | 3245 |
container_title | Machine learning |
container_volume | 110 |
creator | Nahshan, Yury; Chmiel, Brian; Baskin, Chaim; Zheltonozhskii, Evgenii; Banner, Ron; Bronstein, Alex M.; Mendelson, Avi
description | Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. We show that the structure is flat and separable for mild quantization, enabling straightforward post-training quantization methods to achieve good results. We show that with more aggressive quantization, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging. Armed with this understanding, we design a method that quantizes the layer parameters jointly, enabling significant accuracy improvement over current post-training quantization methods. Reference implementation is available at https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq.
doi_str_mv | 10.1007/s10994-021-06053-z |
format | article |
publisher | New York: Springer US
identifier | ISSN: 0885-6125 |
ispartof | Machine learning, 2021-12, Vol.110 (11-12), p.3245-3262 |
issn | 0885-6125; 1573-0565
language | eng |
recordid | cdi_proquest_journals_2601154875 |
source | Springer Nature |
subjects | Accuracy Artificial Intelligence Computer Science Control Machine Learning Measurement Mechatronics Natural Language Processing (NLP) Neural networks Parameters Robotics Simulation and Modeling Training |
title | Loss aware post-training quantization |