Small Wonder: 20B Encoder-Decoder Beats 70B Chinchilla
Very interesting… A 20B encoder-decoder model beats an almost 8X larger decoder-only model on a CLM (causal language modeling) task! This is a significant improvement over Chinchilla’s 70B model, with the bonus of also excelling at the tasks where seq-2-seq models typically excel. https://x.com/SalehSoltan/status/1554588857835966464