Shital Shah’s Chain of Thought
Home
Blog
About
Blog
Linear Thinking: Simple Nets Tackle Language and Math
17 September 2023 · 42 words · 1 min
Phi-1: The Tiny Model Outsmarting Giants
12 September 2023 · 255 words · 2 mins
Reasoning AWOL? Paper Claims Only ICL Emerged
7 September 2023 · 50 words · 1 min
Weight Averaging: Have Your Cake and Save Your Model
4 September 2023 · 44 words · 1 min
Phi-nomenal Code Gen: Phi-1-Base Beats August Models
3 September 2023 · 172 words · 1 min
Weaving Longer Contexts: Yarn Scaling for LLMs
2 September 2023 · 25 words · 1 min
Tensor Shape-Shifting: See Shapes Before Values
1 September 2023 · 33 words · 1 min
AI Watches YouTube, Masters Minecraft—Harder Than AlphaGo?
31 August 2023 · 50 words · 1 min
Tiny Weights, Mighty Models: The Mystery of Weight Decay
29 August 2023 · 308 words · 2 mins
Conquering FP16 Frustrations: One-Liner to FP8 Gold on H100s
24 August 2023 · 54 words · 1 min