GPT-4's New Fitness Plan: From 25K to 2K GPUs
If you were to train GPT-4, a 1.8T-parameter model:
On A100s, it would take 25k GPUs and 3-5 months.
On H100s, it would take 8k GPUs and ~3 months.
On B100s, it would take 2k GPUs and ~3 months.
- Jensen at GTC.
One big consequence here is that you no longer have to span multiple colos to train GPT-4-class models. This significantly reduces complexity and puts this class of models right into the hands of many startups.
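As a sanity check, a quick back-of-envelope estimate roughly reproduces these figures. The sketch below leans on rumored, unconfirmed numbers: GPT-4 as a mixture-of-experts with ~280B active parameters per token trained on ~13T tokens, the standard FLOPs ≈ 6ND approximation, and ~35% MFU. The A100/H100 peaks are published dense BF16 figures; the B100 figure is a guess, since training-precision throughput varies.

```python
# Back-of-envelope check of Jensen's numbers.
# All model figures are rumored/assumed, not confirmed.

SECONDS_PER_DAY = 86_400

# Rumored GPT-4 shape: ~1.8T total params as an MoE, ~280B
# *active* params per token, ~13T training tokens. Training
# compute scales with active params: FLOPs ~= 6 * N_active * D.
ACTIVE_PARAMS = 280e9
TOKENS = 13e12
TOTAL_FLOPS = 6 * ACTIVE_PARAMS * TOKENS  # ~2.2e25 FLOPs

MFU = 0.35  # assumed model-FLOPs utilization for a large run

# (GPU count from the keynote, assumed peak FLOPs/s per GPU).
# A100/H100: dense BF16 spec-sheet peaks; B100: assumed placeholder.
CLUSTERS = {
    "A100": (25_000, 312e12),
    "H100": (8_000, 989e12),
    "B100": (2_000, 3.5e15),
}

for gpu, (count, peak) in CLUSTERS.items():
    effective = count * peak * MFU  # sustained cluster FLOPs/s
    days = TOTAL_FLOPS / effective / SECONDS_PER_DAY
    print(f"{gpu}: {count:>6,} GPUs -> ~{days:.0f} days (~{days/30:.1f} months)")
```

Under these assumptions, all three clusters land around 90-105 days, i.e. roughly the ~3 months quoted on stage.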