Training Tidbits: Behind Stability's New 3B Model with WandB
Love this training report for Stability’s new 3B model, created in WandB. It includes the whole config, a full dataset breakdown, and various other cool tidbits, such as adjusting the attention mask to ignore irrelevant documents in the larger 4K context.
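
To make the attention-mask tidbit concrete, here is a minimal sketch of one way such a mask could be built. It assumes several documents are concatenated into one context and separated by an end-of-sequence token; the function name `document_causal_mask` and the `eos_id` parameter are hypothetical, and this is not taken from the report's actual code.

```python
from typing import List


def document_causal_mask(token_ids: List[int], eos_id: int) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if position i may attend to position j.

    Assumption (not from the report): documents are concatenated and each one
    ends with an EOS token, which is counted as part of the document it closes.
    """
    # Label every position with the index of the document it belongs to.
    doc_ids = []
    current_doc = 0
    for tok in token_ids:
        doc_ids.append(current_doc)
        if tok == eos_id:
            current_doc += 1  # the next token starts a new document

    n = len(token_ids)
    # Causal (j <= i) AND same document: tokens never look into earlier documents.
    return [
        [j <= i and doc_ids[i] == doc_ids[j] for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Two toy documents, [5, 6, EOS] and [7, 8], sharing one context (EOS = 0).
    mask = document_causal_mask([5, 6, 0, 7, 8], eos_id=0)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
```

With a plain causal mask, the tokens of the second document would also attend to the first one. Here the mask is block-diagonal, so each document only sees itself even though both sit in the same context.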