Search Results - Naman, Luke
6. On the Role of Bidirectionality in Language Model Pre-Training (Article, arXiv.org)
8. BASE Layers: Simplifying Training of Large, Sparse Models (Article, arXiv.org)
9. Scaling Laws for Generative Mixed-Modal Language Models (Article, arXiv.org)
10. VeLO: Training Versatile Learned Optimizers by Scaling Up (Article, arXiv.org)
11. Better Fine-Tuning by Reducing Representational Collapse (Article, arXiv.org)
13. Multilingual Denoising Pre-training for Neural Machine Translation (Article, arXiv.org)
17. Unsupervised Cross-lingual Representation Learning at Scale (Article, arXiv.org)
19. Efficient Large Scale Language Modeling with Mixtures of Experts (Article, arXiv.org)
20. RoBERTa: A Robustly Optimized BERT Pretraining Approach (Article, arXiv.org)