That research, published in the journal Science, claimed the model could connect words and phrases uttered by the subject to the experiences captured in the video frames. Presented with a word or phrase, the model could recall relevant images.
https://www.science.org/doi/10.1126/science.adi1374#con2
What's with AI boffins strapping GoPros to toddlers? We take a closer look
https://www.theregister.com/2024/05/12/boffins_hope_to_make_ai/