When GPUs Say No: Taming Big Batches with Gradient Accumulation

Nice trick: deep learning jobs on cheap cloud GPUs like the K80 often fail with out-of-memory errors for ImageNet batch sizes > 64, because the GPU memory is too small to hold the activations. Solution (PyTorch): use a simple inner loop that splits the batch into smaller micro-batches, calls .backward() on each one so the gradients accumulate, and then calls step() once the full batch has been processed.
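
A minimal sketch of what that loop can look like; the model, optimizer, and data here are stand-ins, and the micro-batch size and accumulation count are illustrative:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Sketch: an effective batch of 64 processed as 4 micro-batches of 16.
accum_steps = 4
micro_batch = 16

# Stand-in model and data; substitute your own network and dataset.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
data = TensorDataset(torch.randn(640, 10), torch.randint(0, 2, (640,)))
loader = DataLoader(data, batch_size=micro_batch)

optimizer.zero_grad()
for i, (inputs, targets) in enumerate(loader):
    outputs = model(inputs)
    # Scale the loss so the accumulated gradient equals the mean over
    # the full effective batch, not the sum of micro-batch means.
    loss = criterion(outputs, targets) / accum_steps
    loss.backward()  # gradients accumulate into the parameters' .grad buffers
    if (i + 1) % accum_steps == 0:
        optimizer.step()       # one update per effective batch of 64
        optimizer.zero_grad()  # clear accumulated gradients
```

Only one micro-batch's activations live on the GPU at a time, so peak memory is set by the micro-batch size while the optimizer still sees gradients for the full batch.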