Segment Anything vs. The Torture Test: Zero-Shot Showdown
Just tried my "torture test" image on Meta's new Segment Anything. I am happy to report that it works great, at human level, perhaps even better.
Zero-shot segmentation of previously unseen objects at this level is a massive advancement. https://segment-anything.com/
The model is trained on a new dataset roughly 10× larger than ImageNet. The checkpoint and inference code are open source!
https://github.com/facebookresearch/segment-anything
We started with a revolution in CV, followed by one in NLP. I have little doubt that the next 2 years will be the years of "big unification".
Language, vision, audio and other data are different ways to express facets of reality and possibilities. There is no reason to treat them separately.