
Scaling Laws for Precision

Low precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this. In this work, we devise "precision-aware" scaling laws for both training and inference. We propose that training in lower precision reduces the model's "effective parameter count," allowing us to predict the additional loss incurred from training in low precision and post-train quantization. For inference, we find that the degradation introduced by post-training quantization increases as models are trained on more data, eventually making additional pretraining data actively harmful. For training, our scaling laws allow us to predict the loss of a model with different parts in different precisions, and suggest that training larger models in lower precision may be compute optimal. We unify the scaling laws for post and pretraining quantization to arrive at a single functional form that predicts degradation from training and inference in varied precisions. We fit on over 465 pretraining runs and validate our predictions on model sizes up to 1.7B parameters trained on up to 26B tokens.
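The abstract describes a loss law in which training at lower precision shrinks the model's "effective parameter count." As a rough illustration only, the Python sketch below plugs a precision-adjusted parameter count into a Chinchilla-style loss formula; the saturating form of effective_params and every constant (A, alpha, B, beta, E, gamma) are hypothetical placeholders, not the functional form or fitted values from the paper.

import math

def effective_params(n_params: float, precision_bits: float, gamma: float = 2.0) -> float:
    # Assumed saturating form: high precision recovers nearly all parameters,
    # very low precision shrinks the effective count toward zero.
    return n_params * (1.0 - math.exp(-precision_bits / gamma))

def predicted_loss(n_params: float, n_tokens: float, precision_bits: float,
                   A: float = 400.0, alpha: float = 0.34,
                   B: float = 2000.0, beta: float = 0.28, E: float = 1.7) -> float:
    # Chinchilla-style loss with N replaced by the precision-adjusted effective count.
    n_eff = effective_params(n_params, precision_bits)
    return A * n_eff ** -alpha + B * n_tokens ** -beta + E

# Example: a 1.7B-parameter model trained on 26B tokens (the largest scale
# quoted in the abstract), comparing 16-bit and 4-bit training precision.
for bits in (16, 4):
    print(f"{bits}-bit: predicted loss ~ {predicted_loss(1.7e9, 26e9, bits):.4f}")

Under any law of this shape, lowering the training precision reduces the effective parameter count and raises the predicted loss, which is the qualitative trade-off the paper's precision-aware scaling laws quantify.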

Bibliographic Details
Published in: arXiv.org, 2024-11
Main Authors: Kumar, Tanishq; Ankner, Zachary; Spector, Benjamin F; Bordelon, Blake; Muennighoff, Niklas; Paul, Mansheej; Pehlevan, Cengiz; Ré, Christopher; Raghunathan, Aditi
Format: Article
Language: English
Subjects: Degradation; Inference; Parameters; Scaling laws
EISSN: 2331-8422
Online Access: Get full text