GPT-4's New Fitness Plan: From 25K to 2K GPUs
If you were to train GPT-4, a 1.8T-parameter model:
On A100s, it would take 25k GPUs and 3-5 months.
On H100s, it would take 8k GPUs and ~3 months.
On B100s, it would take 2k GPUs and ~3 months.
- Jensen at GTC.
One big consequence here is that you no longer have to span multiple colos to train GPT-4-class models. This significantly reduces complexity and puts this class of models right into the hands of many startups.
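As a sanity check, a quick back-of-envelope estimate roughly reproduces these figures. The sketch below leans on rumored, unconfirmed numbers: GPT-4 as a mixture-of-experts with ~280B active parameters per token trained on ~13T tokens, the standard FLOPs ≈ 6ND approximation, and ~35% MFU. The A100/H100 peaks are published dense BF16 figures; the B100 figure is a guess, since training-precision throughput varies.

```python
# Back-of-envelope check of Jensen's numbers.
# All model figures are rumored/assumed, not confirmed.

SECONDS_PER_DAY = 86_400

# Rumored GPT-4 shape: ~1.8T total params as an MoE, ~280B
# *active* params per token, ~13T training tokens. Training
# compute scales with active params: FLOPs ~= 6 * N_active * D.
ACTIVE_PARAMS = 280e9
TOKENS = 13e12
TOTAL_FLOPS = 6 * ACTIVE_PARAMS * TOKENS  # ~2.2e25 FLOPs

MFU = 0.35  # assumed model-FLOPs utilization for a large run

# (GPU count from the keynote, assumed peak FLOPs/s per GPU).
# A100/H100: dense BF16 spec-sheet peaks; B100: assumed placeholder.
CLUSTERS = {
    "A100": (25_000, 312e12),
    "H100": (8_000, 989e12),
    "B100": (2_000, 3.5e15),
}

for gpu, (count, peak) in CLUSTERS.items():
    effective = count * peak * MFU  # sustained cluster FLOPs/s
    days = TOTAL_FLOPS / effective / SECONDS_PER_DAY
    print(f"{gpu}: {count:>6,} GPUs -> ~{days:.0f} days (~{days/30:.1f} months)")
```

Under these assumptions, all three clusters land around 90-105 days, i.e. roughly the ~3 months quoted on stage.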