GPT-3's $3.6M Training Regimen
The GPT-3 175B model required 3.14E23 FLOPs of compute for training. Even at the V100's theoretical 28 TFLOPS and the lowest reserved Azure pricing, a single training run would take 355 GPU-years and cost $3.6M!
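The arithmetic behind these figures can be checked with a short sketch. It assumes a 365-day year and perfect utilization of the theoretical 28 TFLOPS, and the function names are illustrative. The hourly rate is not quoted in the text; the code only back-derives the rate implied by the $3.6M total.

```python
# Back-of-the-envelope check of the GPT-3 training estimate.
# Assumptions: 365-day year, 100% utilization of peak throughput.

SECONDS_PER_YEAR = 365 * 24 * 3600
HOURS_PER_YEAR = 365 * 24


def gpu_years(total_flops: float, flops_per_second: float) -> float:
    """Years a single GPU needs to perform total_flops at a sustained rate."""
    return total_flops / flops_per_second / SECONDS_PER_YEAR


def implied_price_per_gpu_hour(total_cost_usd: float, years: float) -> float:
    """Hourly rate implied by a total cost spread over the given GPU-years."""
    return total_cost_usd / (years * HOURS_PER_YEAR)


TRAINING_FLOPS = 3.14e23   # total training compute, from the text
V100_PEAK = 28e12          # 28 TFLOPS theoretical, from the text
TOTAL_COST_USD = 3.6e6     # $3.6M, from the text

years = gpu_years(TRAINING_FLOPS, V100_PEAK)
price = implied_price_per_gpu_hour(TOTAL_COST_USD, years)

print(f"{years:.1f} GPU-years")
print(f"${price:.2f} per GPU-hour (implied, not a quoted price)")
```

This gives about 355 GPU-years, and the $3.6M total works out to a little under $1.20 per GPU-hour. Real utilization falls below the theoretical peak, so an actual run on this hardware would take longer and cost more.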