Google wowed the internet with a demo video showing the multimodal capabilities of its latest large language model, Gemini. In one scene, a rubber duck is placed on a paper atlas and Gemini identifies where the object has been placed. The model appears to do all sorts of things – identifying objects, finding where things have been hidden and switched under cups, and more.
But in reality, the model was not prompted using audio and its responses were only text-based. They were not generated in real time either.
The person speaking in the demo was actually reading out text prompts that were passed to the model, and the robot voice given to Gemini was reading out responses it had generated as text. Still images taken from the video – like the rock-paper-scissors hand gestures – were fed to the model, and it was asked to guess the game.
Oriol Vinyals, VP of research and deep learning lead at Google DeepMind, who helped lead the Gemini project, admitted that the video demonstrates what "the multimodal user experiences built with Gemini could look like".
https://twitter.com/OriolVinyalsML/status/1732885990291775553
So…
Don't be fooled: Google faked its Gemini AI voice demo
#AI