Google DeepMind's Project Astra -- an AI assistant prototype with eyes, ears and a voice, being built for smart glasses, phones and computers -- can see what the user does.
Greg Wayne, director of research at Google DeepMind, calls the technology "a parrot on your shoulder" that "can see what you're doing and talk with you about it."
He contrasted it with the older Google AI assistant, which acted like command-and-control technology: you ask it to play a song or run a search.
Project Astra, which Wayne referred to as a universal assistant, can see and hear the world around it to determine how to interact with the user and provide feedback.
The demonstration took place during a podcast hosted by professor Hannah Fry, who explores the world of technology.
The Project Astra prototype is powered by Gemini 2.0 and uses an Android app or prototype glasses to record the world as the person sees it.
Astra can summarize what it sees and answer questions pulling from Google services such as Search, Maps, Lens and Gemini.
Wayne says it is important to share what Google DeepMind developers are working on, even though the technology is not ready for general use.
"If we're going to make this as a helpful thing for humanity, people need to use it and tell us how they feel about it," he said. "We've had trusted testers."
People use it to get fashion advice, such as which types of clothes go together and match. "It's like you're having a conversation with a smarter version of yourself," he said.
Wayne believes there is a major benefit in supporting people who can see something but don't understand it, as well as those who are blind and cannot see at all.
The technology can remember the past 10 minutes photographically, and it can also recall what someone said to it at any time in the past. Offline, it summarizes the session and keeps a record of the past interaction.
Wayne did not talk about how the technology could be used for ad targeting or what that would mean for privacy. But if Project Astra can remember past conversations, it would likely also retain a user's likes and dislikes across websites, publisher content and purchases. Because the user would need to opt in and allow it to remember all types of interactions, the technology could potentially end up being the next generation of privacy-focused technology. It also understands about 20 languages, even within the same conversation.
There are glitches in the technology, Wayne says. Sometimes it cannot see something, even when the user insists that it should be able to. It can also struggle in noisy environments and have trouble distinguishing one voice from another.
Watching the video podcast provides insight into what is possible in 2025.