Visual Riddles: A Commonsense and World Knowledge Challenge for Large Vision and Language Models
Imagine observing someone scratching their arm; to understand why, additional context would be necessary.