When GPUs Say No: Taming Big Batches with Gradient Accumulation
Nice trick: Deep learning jobs in the cloud on cheap GPUs like the K80 often error out on ImageNet at batch sizes > 64 because the card has limited memory. Solution (PyTorch): add a simple inner loop that splits the batch into smaller micro-batches, calls .backward() on each one to accumulate gradients, and then calls optimizer.step() once.
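
A minimal sketch of the idea, using a toy linear model and random tensors in place of an ImageNet loader; the model, batch sizes, and `accum_steps` are illustrative assumptions, not from the original setup:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 2).to(device)           # stand-in for a real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Micro-batches of 16; with accum_steps = 4 this behaves like batch size 64.
data = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=16)
accum_steps = 4

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    outputs = model(inputs.to(device))
    # Scale the loss so the summed gradients average over the effective batch.
    loss = criterion(outputs, targets.to(device)) / accum_steps
    loss.backward()  # gradients accumulate in each parameter's .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()       # update weights with the accumulated gradient
        optimizer.zero_grad()  # reset gradients for the next effective batch
```

Only the micro-batch ever lives on the GPU, so memory use stays flat while the optimizer sees a gradient equivalent to the full batch.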