AI models in gaming have a rich history, often excelling at specific games while adhering to a single-minded goal: winning.
However, Google DeepMind is breaking new ground with its latest creation, SIMA (Scalable Instructable Multiworld Agent), a model designed not just to play multiple 3D games but to comprehend and act upon verbal instructions, mimicking human-like interaction.
Unlike conventional AI or computer characters within games, which are usually governed by formal in-game commands, SIMA takes a unique approach. It lacks access to a game’s internal code or rules but is instead trained on extensive hours of human gameplay videos. The model learns to associate visual representations of actions, objects, and interactions from the data and annotations provided by human labelers.
Learning from video games
Additionally, the model is exposed to videos of players instructing one another in-game.
For instance, SIMA may learn from the visual patterns on the screen that represent “moving forward,” or recognize a character’s interaction with a door-like object as “opening a door.” These associations go beyond pressing a key or identifying a single object, giving the model a more comprehensive understanding of gameplay.
The training videos span multiple games, including Valheim and Goat Simulator 3, with the consent of the developers involved. The primary objective was to evaluate whether training an AI to play one set of games enables it to generalize and perform well in others it hasn’t encountered.
The results are promising, demonstrating that AI agents trained on multiple games outperformed those exposed to a single game. However, the researchers acknowledge that specific mechanics or terms unique to certain games can pose challenges for even well-prepared AI.
The key lies in providing sufficient training data to enable the model to adapt and learn new elements. SIMA’s recognition map, featuring several dozen primitives, reflects the variety of actions the agent can currently identify, showcasing its adaptability and learning capabilities.
The ultimate ambition of the researchers behind SIMA is to revolutionize agent-based AI and create a more natural gaming companion. Unlike rigid, hard-coded AI opponents, SIMA aims to be a cooperative presence that users can instruct during gameplay.
The model’s reliance on pixels from the game screen mirrors human learning processes, fostering adaptability and the emergence of novel behaviors.
SIMA also stands apart from traditional simulator-based approaches. Where those rely on reinforcement learning driven by an explicit reward signal, SIMA’s imitation-based training lets it adapt and excel across a spectrum of 3D games.
“In the games that we use, such as the commercial games from our partners, we do not have access to such a reward signal,” one of the researchers explained. “Moreover, we are interested in agents that can do a wide variety of tasks described in open-ended text – it’s not feasible for each game to evaluate a ‘reward’ signal for each possible goal. Instead, we train agents using imitation learning from human behavior, given goals in text.”
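The recipe the researcher describes, imitation learning from human behavior conditioned on text goals, is essentially behavioral cloning: fit a policy to predict the human’s action from the observation and the goal, with no reward term anywhere. Here is a minimal sketch under toy assumptions (the linear policy, feature dimensions, and synthetic demonstrations are all hypothetical illustrations, not SIMA’s actual architecture):

```python
import numpy as np

# Toy behavioral-cloning sketch: train a policy purely from "human"
# demonstrations (observation + goal features, action taken), with no
# reward signal anywhere. Everything here is a hypothetical illustration,
# not SIMA's actual setup.

rng = np.random.default_rng(0)
N_OBS, N_GOAL, N_ACTIONS = 4, 3, 5

# Synthetic demonstrator: a hidden linear rule picks its actions.
W_human = rng.normal(size=(N_OBS + N_GOAL, N_ACTIONS))
demos = []
for _ in range(200):
    x = rng.normal(size=N_OBS + N_GOAL)   # screen features + text-goal features
    demos.append((x, int(np.argmax(x @ W_human))))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Behavioral cloning: minimize cross-entropy between the policy's action
# distribution and the action the demonstrator actually took.
W = np.zeros((N_OBS + N_GOAL, N_ACTIONS))
lr = 0.1
for _ in range(50):                       # epochs over the demo set
    for x, action in demos:
        p = softmax(x @ W)
        grad = np.outer(x, p)             # gradient of cross-entropy wrt W ...
        grad[:, action] -= x              # ... minus the one-hot target term
        W -= lr * grad

def act(x):
    """Imitate: pick the action the policy thinks the human would take."""
    return int(np.argmax(x @ W))

accuracy = float(np.mean([act(x) == a for x, a in demos]))
```

The point of the sketch is the loss: the update only ever compares the policy’s action distribution to what the human did, which is why no per-game reward function needs to exist.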
Other companies are exploring this kind of open-ended collaboration and creation as well; NPC conversations, for instance, are being eyed as a natural place to put an LLM-style chatbot to work. And simple improvised actions and interactions are being simulated and tracked by AI in some genuinely interesting agent research.