Training Tidbits: Behind Stability's New 3B Model with WandB

Love this training report for Stability’s new 3B model, created in WandB. It includes the whole config, a full dataset breakdown, and various other cool tidbits, such as adjusting the attention mask to ignore irrelevant documents in the larger 4K context:

https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo
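For anyone wondering what the attention mask tidbit looks like in practice, here is a minimal sketch in NumPy. It is not taken from the report: the function name `document_causal_mask`, the use of an EOS token as the document separator, and the toy token ids are all assumptions made for illustration. The idea is that when several documents are packed into one long context, each token should only attend to earlier tokens from its own document.

```python
import numpy as np

def document_causal_mask(token_ids, eos_id):
    """Return a boolean [T, T] mask; True means query i may attend to key j.

    Hypothetical helper: assumes documents in a packed sequence are
    separated by an EOS token (an assumption, not the report's code).
    """
    token_ids = np.asarray(token_ids)
    is_eos = token_ids == eos_id
    # Document index per position: count the EOS tokens strictly before it,
    # so each EOS stays with the document it terminates.
    doc_ids = np.cumsum(is_eos) - is_eos
    # Tokens may only see tokens from the same document...
    same_doc = doc_ids[:, None] == doc_ids[None, :]
    # ...and only at the same or an earlier position (causal).
    seq_len = len(token_ids)
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    return same_doc & causal

# Toy packed sequence: three documents, with 0 standing in for EOS.
packed = [5, 6, 0, 7, 8, 9, 0, 4]
mask = document_causal_mask(packed, eos_id=0)
print(mask.astype(int))
```

The result is a block-diagonal lower-triangular mask, so the second document never sees tokens from the first even though they share one context window.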