Google wowed the internet with a demo video showing off the multimodal capabilities of its latest large language model, Gemini. In one scene, a rubber duck is placed on a paper atlas and Gemini identifies where the object has been put. The model appears to do all sorts of things: identifying objects, tracking where items have been hidden and switched under cups, and more.
In reality, the video was not recorded in real time: Google prompted the model with still image frames and text, then cherry-picked its best outputs and narrated them alongside the footage to make it seem as if the model could respond flawlessly in real time.
"For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity," the description for the video on YouTube reads.
Oriol Vinyals, VP of research and deep learning lead at Google DeepMind, who helped lead the Gemini project, admitted in a post on X that the video demonstrates "what the multimodal user experiences built with Gemini could look like":
https://twitter.com/OriolVinyalsML/status/1732885990291775553