Google wowed the internet with a demo video showing the multimodal capabilities of its latest large language model, Gemini.

youtu.be/UIZAiXYceBI

In the video, a rubber duck is placed on a paper atlas and Gemini identifies where the object has been put. The model appears to do all sorts of things: identifying objects, tracking items hidden and switched under cups, and more.

But in reality, the model was not prompted using audio, and its responses were text-based only. Nor were they generated in real time.

The person speaking in the demo was actually reading out some of the text prompts that were passed to the model, and the robot voice given to Gemini was reading out responses the model had generated as text. Still images taken from the footage, such as the hand gestures in rock, paper, scissors, were fed to the model, which was then asked to guess the game.


Google then cherry-picked its best outputs and narrated them alongside the footage to make it seem as if the model could respond flawlessly in real time.

"For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity," the description for the video on YouTube reads.

Oriol Vinyals, VP of research and deep learning lead at Google DeepMind, who helped lead the Gemini project, admitted that the video demonstrates "what the multimodal user experiences built with Gemini could look like".

twitter.com/OriolVinyalsML/sta

So.....

Don't be fooled: Google faked its Gemini AI voice demo
