Google DeepMind showcases robot navigation with Gemini AI

Google’s DeepMind Robotics team has demonstrated a significant advance in robot navigation using its Gemini 1.5 Pro model.

In a recent paper titled “Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs,” the team showcases how robots can now respond to complex commands and navigate office environments.

The project marks a leap forward in integrating natural language interactions with advanced AI capabilities.

Videos released by DeepMind demonstrate the robots responding to verbal prompts like “OK, Robot” followed by tasks such as guiding humans to specific locations within their 9,000-square-foot office space.

Before executing tasks, the robots undergo training through Multimodal Instruction Navigation with demonstration Tours (MINT).

This involves physically guiding the robots around the office while verbally identifying key landmarks.

The system’s hierarchical Vision-Language-Action (VLA) framework pairs a long-context vision-language model, which reasons over the instruction and the recorded tour, with a topological graph of the environment that guides low-level navigation.
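The two-level idea described above can be illustrated with a toy sketch: a topological graph whose nodes are waypoints from a demonstration tour, a stand-in for the high-level model that maps an instruction to a goal waypoint, and a breadth-first search that plans the low-level route. All names and the keyword-matching "VLM" below are illustrative assumptions, not DeepMind's implementation.

```python
from collections import deque

# Toy topological graph: nodes are tour waypoints, edges connect waypoints
# the robot traversed consecutively during the demonstration tour (MINT).
TOUR_GRAPH = {
    "entrance": ["hallway"],
    "hallway": ["entrance", "kitchen", "desk_area"],
    "kitchen": ["hallway"],
    "desk_area": ["hallway", "whiteboard"],
    "whiteboard": ["desk_area"],
}

def high_level_goal(instruction, landmarks):
    """Stand-in for the long-context VLM: pick a goal waypoint by naive
    keyword matching (the real system reasons over tour video frames)."""
    for name in landmarks:
        if name.replace("_", " ") in instruction.lower():
            return name
    return None

def low_level_plan(graph, start, goal):
    """Breadth-first search over the topological graph for a waypoint path."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def navigate(instruction, start):
    """High-level goal selection followed by low-level graph planning."""
    goal = high_level_goal(instruction, TOUR_GRAPH)
    return low_level_plan(TOUR_GRAPH, start, goal) if goal else None

print(navigate("OK Robot, take me to the whiteboard", "entrance"))
# ['entrance', 'hallway', 'desk_area', 'whiteboard']
```

The split mirrors the paper's design: the expensive multimodal reasoning happens once per instruction to select a goal, while navigation itself runs on a cheap graph search.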

DeepMind reports a success rate of approximately 90% across more than 50 navigation requests from employees.

This breakthrough highlights the potential of generative AI not only in robot navigation but also in enhancing human-robot interactions and expanding applications in office automation and beyond.


Lawrence Agbo, a tech journalist for over four years, excels in crafting SEO-driven content that boosts business success. He also serves as an AI tutor, sharing his knowledge to educate others. His work has been cited on Wikipedia and various online media platforms.