Bit by Bit: Scaling Laws and Quantization Band-Aids
This is a great paper! It makes sense to have scaling laws in terms of model bits as opposed to model parameters.
This also immediately forces us to view quantization as something of a band-aid for our inability to find a direct training procedure for arbitrarily sized parameters.… https://x.com/finbarrtimbers/status/1642563284409917440
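
To make "model bits" concrete, here is a minimal sketch, assuming the bit count is simply the parameter count times the bits stored per parameter. It ignores any quantization overhead such as scales or zero-points. The function name and the two configurations are hypothetical, picked only for illustration, and are not taken from the paper.

```python
def model_bits(num_params: int, bits_per_param: int) -> int:
    """Total weight storage in bits: parameter count times precision."""
    return num_params * bits_per_param


# Hypothetical configurations, chosen only so the bit budgets match.
configs = {
    "7B params @ 16-bit": model_bits(7_000_000_000, 16),
    "28B params @ 4-bit": model_bits(28_000_000_000, 4),
}

for name, bits in configs.items():
    # Convert bits to gigabytes (1 GB = 1e9 bytes) for readability.
    print(f"{name}: {bits:,} bits ({bits / 8 / 1e9:.0f} GB)")
```

Both configurations come out to the same number of bits, so a scaling law written in bits would treat them as the same budget, while one written in parameters would treat them as a 4x difference.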