100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Pre-training is notoriously compute-intensive and academic researchers are notoriously under-resourced. It is, therefore, commonly assumed that academics can't pre-train models. In this paper, we seek to clarify this assumption. We first survey academic researchers to learn about their available compute and then empirically measure the time to replicate models on such resources. We introduce a benchmark to measure the time to pre-train models on given GPUs and also identify ideal settings for maximizing training speed. We run our benchmark on a range of models and academic GPUs, spending 2,000 GPU-hours on our experiments. Our results reveal a brighter picture for academic pre-training: for example, although Pythia-1B was originally trained on 64 GPUs for 3 days, we find it is also possible to replicate this model (with the same hyper-parameters) in 3x fewer GPU-days: i.e. on 4 GPUs in 18 days. We conclude with a cost-benefit analysis to help clarify the trade-offs between price and pre-training time. We believe our benchmark will help academic researchers conduct experiments that require training larger models on more data. We fully release our codebase at: https://github.com/apoorvkh/academic-pretraining.
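As a rough sanity check of the GPU-days figures quoted in the abstract, the sketch below works through the arithmetic: 64 GPUs for 3 days versus 4 GPUs for 18 days. The `gpu_days` helper is hypothetical (not from the paper's codebase) and the numbers are only those stated above.

```python
# Back-of-the-envelope comparison of the GPU-days figures quoted in the abstract.

def gpu_days(num_gpus: int, days: float) -> float:
    """Total GPU-days consumed by a training run (GPUs x wall-clock days)."""
    return num_gpus * days

original = gpu_days(num_gpus=64, days=3)   # Pythia-1B as originally trained
academic = gpu_days(num_gpus=4, days=18)   # replication setting reported in the abstract

print(f"original run: {original:.0f} GPU-days")              # 192 GPU-days
print(f"academic run: {academic:.0f} GPU-days")              # 72 GPU-days
print(f"reduction: {original / academic:.1f}x fewer GPU-days")  # ~2.7x, i.e. roughly 3x
```

The ratio comes out to about 2.7x, which the abstract rounds to "3x fewer GPU-days".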
Published in: | arXiv.org 2024-10 |
---|---|
Main Authors: | Khandelwal, Apoorv; Tian Yun; Nayak, Nihal V; Merullo, Jack; Bach, Stephen H; Chen, Sun; Pavlick, Ellie |
Format: | Article |
Language: | English |
Subjects: | Benchmarks; Cost benefit analysis; Time measurement; Tradeoffs |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Khandelwal, Apoorv; Tian Yun; Nayak, Nihal V; Merullo, Jack; Bach, Stephen H; Chen, Sun; Pavlick, Ellie |
description | Pre-training is notoriously compute-intensive and academic researchers are notoriously under-resourced. It is, therefore, commonly assumed that academics can't pre-train models. In this paper, we seek to clarify this assumption. We first survey academic researchers to learn about their available compute and then empirically measure the time to replicate models on such resources. We introduce a benchmark to measure the time to pre-train models on given GPUs and also identify ideal settings for maximizing training speed. We run our benchmark on a range of models and academic GPUs, spending 2,000 GPU-hours on our experiments. Our results reveal a brighter picture for academic pre-training: for example, although Pythia-1B was originally trained on 64 GPUs for 3 days, we find it is also possible to replicate this model (with the same hyper-parameters) in 3x fewer GPU-days: i.e. on 4 GPUs in 18 days. We conclude with a cost-benefit analysis to help clarify the trade-offs between price and pre-training time. We believe our benchmark will help academic researchers conduct experiments that require training larger models on more data. We fully release our codebase at: https://github.com/apoorvkh/academic-pretraining. |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3122752027 |
source | Publicly Available Content Database |
subjects | Benchmarks; Cost benefit analysis; Time measurement; Tradeoffs |
title | 100K or 100 Days: Trade-offs when Pre-Training with Academic Resources |