6B Params and a Dream: GPT-JT Challenges GPT-3
·40 words·1 min
GPT-JT is a fine-tuned version of GPT-J, trained on ~1B tokens of instruction/prompt datasets and ~2B tokens from the Pile using the UL2 objective. The result is a 6B-parameter model that might be competitive with GPT-3 (no study of benchmark contamination/overfitting, as usual).
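For context, UL2 swaps the plain next-token objective for a mixture of denoising tasks (short-span, long-span/heavy-corruption, and prefix-LM denoisers). Below is a minimal sketch of that corruption scheme on token lists; all names and constants (`SENTINEL`, `corrupt_spans`, `DENOISERS`) are illustrative assumptions, not GPT-JT's actual training code.

```python
import random

# Hypothetical sketch of UL2-style "mixture-of-denoisers" input corruption,
# assuming T5-style sentinel tokens. Not GPT-JT's real pipeline.

SENTINEL = "<extra_id_{}>"  # hypothetical sentinel-token format

def corrupt_spans(tokens, span_len, corruption_rate, rng):
    """Mask random spans; return (corrupted input, target of masked spans)."""
    n = len(tokens)
    n_corrupt = max(1, int(n * corruption_rate))
    n_spans = max(1, n_corrupt // span_len)
    starts = sorted(rng.sample(range(n), n_spans))
    inp, tgt, i, sid = [], [], 0, 0
    for s in starts:
        if s < i:          # skip spans that overlap the previous one
            continue
        e = min(n, s + span_len)
        inp += tokens[i:s] + [SENTINEL.format(sid)]
        tgt += [SENTINEL.format(sid)] + tokens[s:e]
        i, sid = e, sid + 1
    inp += tokens[i:]
    return inp, tgt

# The R/S/X denoisers differ only in span length and corruption rate;
# the values here roughly follow the UL2 paper's defaults.
DENOISERS = [
    ("R", 3, 0.15),   # regular: short spans, light corruption
    ("X", 32, 0.50),  # extreme: long spans / heavy corruption
    ("S", 0, 0.25),   # sequential: prefix-LM style (handled below)
]

def make_example(tokens, rng=random):
    mode, span, rate = rng.choice(DENOISERS)
    if mode == "S":  # prefix-LM: condition on a prefix, predict the rest
        cut = rng.randint(1, len(tokens) - 1)
        return tokens[:cut], tokens[cut:]
    return corrupt_spans(tokens, span, rate, rng)

if __name__ == "__main__":
    toks = "the quick brown fox jumps over the lazy dog".split()
    inp, tgt = make_example(toks, random.Random(0))
    print("input: ", inp)
    print("target:", tgt)
```

Sampling one denoiser per example like this is the gist of the mixture: the same model sees span-infilling and prefix-continuation data, which is part of why a UL2-tuned decoder can do well in both fill-in and generative settings.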