Google DeepMind showcases robot navigation with Gemini AI
Google’s DeepMind Robotics team has achieved a significant breakthrough in robot navigation using its Gemini 1.5 Pro AI.
In a recent paper titled “Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs,” the team showcases how robots can now respond to complex commands and navigate office environments.
The project marks a leap forward in integrating natural language interactions with advanced AI capabilities.
Videos released by DeepMind demonstrate the robots responding to verbal prompts like “OK, Robot” followed by tasks such as guiding humans to specific locations within their 9,000-square-foot office space.
Before executing tasks, the robots are trained through Multimodal Instruction Navigation with demonstration Tours (MINT): they are physically guided around the office while key landmarks are verbally identified along the way.
The system’s hierarchical Vision-Language-Action (VLA) framework then builds on this tour, combining environmental perception with reasoning so the robots can interpret commands in context.
DeepMind reports an impressive success rate of approximately 90% across more than 50 interactions with employees.
This breakthrough highlights the potential of generative AI not only in robot navigation but also in enhancing human-robot interactions and expanding applications in office automation and beyond.