MathVista Unveils GPT-4V's Visual Weaknesses
The new benchmark, MathVista, looks amazing! It has ~6k diverse examples, nicely categorized for measuring visual mathematical reasoning.
Best part? It comes with a human baseline!
Punch line: GPT-4V still trails humans by a huge gap.
MathVista does a great job of separating models! https://x.com/lupantech/status/1714060074396913699