Bit by Bit: Scaling Laws and Quantization Band-Aids

This is a great paper! It makes sense to have scaling laws in terms of model bits as opposed to model parameters.
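To make the "bits, not parameters" framing concrete, here is a minimal sketch of the bookkeeping only. The function names and the 7B/16-bit starting point are hypothetical, chosen for illustration and not taken from the paper. It shows how many parameters fit in a fixed bit budget at different precisions; whether models with equal bit budgets actually perform equally is an empirical question that this arithmetic does not answer.

```python
def model_bits(n_params: int, bits_per_param: int) -> int:
    """Total size of the weights in bits (parameters times precision)."""
    return n_params * bits_per_param


def params_for_budget(total_bits: int, bits_per_param: int) -> int:
    """Largest parameter count that fits in a fixed bit budget."""
    return total_bits // bits_per_param


# Hypothetical reference point: a 7B-parameter model stored at 16 bits.
budget = model_bits(7_000_000_000, 16)

for bits in (16, 8, 4):
    n = params_for_budget(budget, bits)
    print(f"{bits:>2}-bit weights: {n:,} parameters in {budget:,} bits")
```

Under this accounting, a 7B model at 16 bits and a 28B model at 4 bits sit at the same point on a bits axis, even though they are far apart on a parameters axis.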

This also immediately forces us to view quantization as more of a band-aid for our inability to find a direct training procedure for parameters of arbitrary size.… https://x.com/finbarrtimbers/status/1642563284409917440
