LLM Judges Prefer Fluff—Just Like Humans!
An LLM-as-judge will often give a higher score to an answer that throws in a lot of relevant-sounding information without actually answering the question than to a concise answer that does answer it :).
Come to think of it, that often works with humans as well! https://x.com/corbtt/status/1814056457626862035
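If you want a quick sanity check on your own judge, one rough option is to see whether its scores track answer length. The sketch below is only an illustration, not anything from the linked thread: the helper names (`word_count`, `pearson`, `length_score_correlation`) are made up for this post, and it assumes you already have a list of answers and the scores your judge gave them.

```python
from math import sqrt


def word_count(text: str) -> int:
    """Count whitespace-separated words in an answer."""
    return len(text.split())


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def length_score_correlation(answers: list[str], scores: list[float]) -> float:
    """Correlation between answer length (in words) and judge score.

    `answers` and `scores` are hypothetical inputs: the texts you sent to
    the judge and the numeric scores it returned, in the same order.
    """
    lengths = [float(word_count(a)) for a in answers]
    return pearson(lengths, scores)
```

A strongly positive value is a hint, not proof: longer answers can be genuinely better. It is more telling when the long answers are ones you know do not answer the question.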