Scaling Laws for Precision
Low precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this. In this work, we devise "precision-aware" scaling laws for both training and inference. We propose that training in lower precision reduces the model's "effective parameter count," allowing us to predict the additional loss incurred from training in low precision and post-train quantization. For inference, we find that the degradation introduced by post-training quantization increases as models are trained on more data, eventually making additional pretraining data actively harmful. For training, our scaling laws allow us to predict the loss of a model with different parts in different precisions, and suggest that training larger models in lower precision may be compute optimal. We unify the scaling laws for post and pretraining quantization to arrive at a single functional form that predicts degradation from training and inference in varied precisions. We fit on over 465 pretraining runs and validate our predictions on model sizes up to 1.7B parameters trained on up to 26B tokens.
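The quantities in the abstract lend themselves to a small numerical sketch. The code below is an illustrative, hypothetical rendering of the idea only: it plugs an assumed precision-dependent "effective parameter count" and an assumed post-training-quantization penalty into a Chinchilla-style loss. The function names (`effective_params`, `ptq_penalty`, `predicted_loss`), the functional forms, and the constants `A`, `B`, `E`, `ALPHA`, `BETA`, `GAMMA_*`, and `C_PTQ` are placeholders chosen for illustration, not the fitted forms or values reported in the paper.

```python
import numpy as np

# Illustrative sketch (not the paper's fitted law): a Chinchilla-style loss
#   L(N, D) = A / N**ALPHA + B / D**BETA + E
# where the raw parameter count N is replaced by an assumed precision-dependent
# "effective parameter count" N_eff, and an assumed post-training-quantization
# penalty grows with the data-to-parameter ratio and shrinks with inference
# precision. All constants below are placeholders.

A, B, E = 406.4, 410.7, 1.69   # placeholder Chinchilla-style constants
ALPHA, BETA = 0.34, 0.28
GAMMA_TRAIN = 2.0              # assumed decay scale for training precision (bits)
GAMMA_POST = 2.0               # assumed decay scale for inference precision (bits)
C_PTQ = 1e-3                   # assumed magnitude of the quantization penalty


def effective_params(n_params: float, train_bits: float) -> float:
    """Assumed form: fewer effective parameters at lower training precision,
    saturating at the full count N as precision grows."""
    return n_params * (1.0 - np.exp(-train_bits / GAMMA_TRAIN))


def ptq_penalty(n_params: float, n_tokens: float, post_bits: float) -> float:
    """Assumed form: degradation from post-training quantization grows with the
    data-to-parameter ratio D/N and decays exponentially in inference bits."""
    return C_PTQ * (n_tokens / n_params) * np.exp(-post_bits / GAMMA_POST)


def predicted_loss(n_params: float, n_tokens: float,
                   train_bits: float, post_bits: float) -> float:
    """Unified form: pretraining loss evaluated at N_eff plus the PTQ penalty."""
    n_eff = effective_params(n_params, train_bits)
    return A / n_eff**ALPHA + B / n_tokens**BETA + E + ptq_penalty(n_params, n_tokens, post_bits)


if __name__ == "__main__":
    # Example: a 1B-parameter model on 20B tokens, trained in 16 vs 4 bits,
    # then served with 8-bit vs 4-bit post-training quantization.
    for train_bits in (16, 4):
        for post_bits in (8, 4):
            loss = predicted_loss(1e9, 20e9, train_bits, post_bits)
            print(f"train={train_bits}b, ptq={post_bits}b -> loss {loss:.4f}")
```

With placeholder constants the sketch only reproduces the qualitative trends described above (lower training precision raises loss through a smaller N_eff; more tokens per parameter make post-training quantization costlier); reproducing the paper's quantitative predictions would require its fitted coefficients.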
Published in: | arXiv.org 2024-11 |
---|---|
Main Authors: | Kumar, Tanishq; Ankner, Zachary; Spector, Benjamin F; Bordelon, Blake; Muennighoff, Niklas; Paul, Mansheej; Pehlevan, Cengiz; Ré, Christopher; Raghunathan, Aditi |
Format: | Article |
Language: | English |
Subjects: | Degradation; Inference; Parameters; Scaling laws |
Online Access: | Get full text |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Kumar, Tanishq; Ankner, Zachary; Spector, Benjamin F; Bordelon, Blake; Muennighoff, Niklas; Paul, Mansheej; Pehlevan, Cengiz; Ré, Christopher; Raghunathan, Aditi |
description | Low precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this. In this work, we devise "precision-aware" scaling laws for both training and inference. We propose that training in lower precision reduces the model's "effective parameter count," allowing us to predict the additional loss incurred from training in low precision and post-train quantization. For inference, we find that the degradation introduced by post-training quantization increases as models are trained on more data, eventually making additional pretraining data actively harmful. For training, our scaling laws allow us to predict the loss of a model with different parts in different precisions, and suggest that training larger models in lower precision may be compute optimal. We unify the scaling laws for post and pretraining quantization to arrive at a single functional form that predicts degradation from training and inference in varied precisions. We fit on over 465 pretraining runs and validate our predictions on model sizes up to 1.7B parameters trained on up to 26B tokens. |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-11 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3126159854 |
source | Publicly Available Content Database |
subjects | Degradation Inference Parameters Scaling laws |
title | Scaling Laws for Precision |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T20%3A41%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Scaling%20Laws%20for%20Precision&rft.jtitle=arXiv.org&rft.au=Kumar,%20Tanishq&rft.date=2024-11-30&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3126159854%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_31261598543%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3126159854&rft_id=info:pmid/&rfr_iscdi=true |