6B Params and a Dream: GPT-JT Challenges GPT-3

·40 words·1 min

GPT-JT is a fine-tuned version of GPT-J, trained on ~1B tokens of instruction/prompt datasets plus ~2B Pile tokens using the UL2 objective. The result is a 6B-parameter model that might be competitive with GPT-3 (no study on contamination/overfitting, as usual).

https://huggingface.co/togethercomputer/GPT-JT-6B-v1
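Since the checkpoint is a standard causal LM on the Hub, it should load with the usual `transformers` API. A minimal sketch (assumes `torch` and `accelerate` are installed; the prompt and generation settings are illustrative, not from the model card):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "togethercomputer/GPT-JT-6B-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" spreads the 6B params across available GPUs/CPU via accelerate
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

# GPT-JT is instruction-tuned, so a plain task prompt works reasonably well
prompt = "Classify the sentiment of this review as positive or negative:\n'The food was great.'\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```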