Shital Shah’s Chain of Thought
Home
Blog
About
Twitter Thread
Thermodynamics of Prediction
6 December 2023 · 72 words · 1 min
Loss Scaling in FP16 Training
15 October 2023 · 179 words · 1 min
Phi-1: The Tiny Model Outsmarting Giants
12 September 2023 · 255 words · 2 mins
Phi-nomenal Code Gen: Phi-1-Base Beats August Models
3 September 2023 · 172 words · 1 min
Tiny Weights, Mighty Models: The Mystery of Weight Decay
29 August 2023 · 308 words · 2 mins
Learning Rates Without Fortune Telling
6 August 2023 · 415 words · 2 mins
LLAMA's Math Leap: From 11% to 49% with Scaling
6 August 2023 · 424 words · 2 mins
When Math Gets Abstract: The 6th Grade Shift
19 July 2023 · 269 words · 2 mins
Understanding "Understanding": Quantifying AI Comprehension
27 June 2023 · 896 words · 5 mins
Web Giants: Users Do the Heavy Lifting
11 June 2023 · 84 words · 1 min