AGI's Pop Quiz: TheoremQA Stumps AI

How a model performs on a new, unseen, out-of-distribution benchmark is a core distinguishing feature on the path toward AGI, and a fundamental differentiator from classical ML, where we only cared about a fixed set of specific benchmarks or in-distribution test sets. The new work, TheoremQA,… https://x.com/WenhuChen/status/1710827254408679807

https://pbs.twimg.com/media/F8CO33SaYAAV2Ck.jpg
