Minor Tweaks, Major Leaps: Glimpsing Frontier Models
We are getting a first glimpse of what even minor improvements in frontier models would mean. https://x.com/KevinAFischer/status/1764892031233765421
More surprises piling up: https://x.com/GillVerd/status/1764901418664882327
Another example… I will keep this thread going.
https://x.com/hahahahohohe/status/1765088860592394250
Thread continues… So, this model natively talks in ASCII and reveals things that would make us go OMG!
https://x.com/DimitrisPapail/status/1765115754024751107?s=20
To get a perfect SAT score, you have to be in the top 0.07% of test takers. Claude 3 is currently the only model to achieve this (for the reading section, which IMO is harder).
https://x.com/wangzjeff/status/1764850689258451096
“it’s the most intensely I’ve been shocked yet” 🤯
https://x.com/BenBlaiszik/status/1765208130177077321
In-context “fine-tuning” is becoming pretty real with large context windows! https://x.com/skirano/status/1765461051896721439
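The idea behind in-context “fine-tuning” is that with a large enough context window, you don’t update any weights at all: you just pack labeled examples directly into the prompt and let the model infer the pattern. A minimal sketch (the function name and format are illustrative, not any particular API):

```python
# Sketch of in-context "fine-tuning": pack (input, output) example pairs
# into the prompt itself, then append the new query. The model is expected
# to continue the pattern. No weights are updated.

def build_few_shot_prompt(examples, query):
    """Format example pairs and the final query as one prompt string."""
    parts = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    # Leave the final Output: blank for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

examples = [("cat", "animal"), ("rose", "plant")]
prompt = build_few_shot_prompt(examples, "oak")
print(prompt)
```

With a million-token context, “examples” here can be entire documents or codebases rather than one-liners, which is what makes this start to feel like real fine-tuning.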
Long context + a better model pushes productivity to a true “co-worker” level. Here, Claude 3 did the same work end-to-end, zero-shot, that a highly experienced engineer would do in an hour! https://x.com/moyix/status/1765967602982027550
This is the kind of reasoning I haven’t seen in previous frontier models. https://x.com/sytelus/status/1765557725365502177
Claude can fix fairly complex bugs! https://x.com/cognitivecompai/status/1766702300620554292