
Twitter Thread

Thermodynamics of Prediction

6 December 2023·72 words·1 min

Loss Scaling in FP16 Training

15 October 2023·179 words·1 min

Phi-1: The Tiny Model Outsmarting Giants

12 September 2023·255 words·2 mins

Phi-nomenal Code Gen: Phi-1-Base Beats August Models

3 September 2023·172 words·1 min

Tiny Weights, Mighty Models: The Mystery of Weight Decay

29 August 2023·308 words·2 mins

Learning Rates Without Fortune Telling

6 August 2023·415 words·2 mins

LLAMA's Math Leap: From 11% to 49% with Scaling

6 August 2023·424 words·2 mins

When Math Gets Abstract: The 6th Grade Shift

19 July 2023·269 words·2 mins

Understanding "Understanding": Quantifying AI Comprehension

27 June 2023·896 words·5 mins

Web Giants: Users Do the Heavy Lifting

11 June 2023·84 words·1 min
