MathVista Benchmark

The new benchmark, MathVista, looks amazing! It has ~6k diverse, nicely categorized examples for measuring visual mathematical reasoning.

Best part? It comes with a human baseline!

Punch line: there is a huge gap between GPT-4V and humans.