AI models in gaming have a rich history, often excelling at specific games while adhering to a single-minded goal: winning.
However, Google DeepMind is breaking new ground with its latest creation, SIMA (Scalable Instructable Multiworld Agent), a model designed not just to play multiple 3D games but to comprehend and act upon verbal instructions, mimicking human-like interaction.
Unlike conventional AI or computer characters within games, which are usually governed by formal in-game commands, SIMA takes a unique approach. It lacks access to a game’s internal code or rules but is instead trained on extensive hours of human gameplay videos. The model learns to associate visual representations of actions, objects, and interactions from the data and annotations provided by human labelers.
Learning from video games
Additionally, the model is exposed to videos of players instructing one another in-game.
For instance, SIMA may learn from the visual patterns on the screen that represent “moving forward,” or recognize a character’s interaction with a door-like object as “opening a door.” These associations go beyond pressing a key or identifying a single object, giving the model a more comprehensive understanding of gameplay.
The training videos span multiple games, including Valheim and Goat Simulator 3, with the consent of the developers involved. The primary objective was to evaluate whether training an AI to play one set of games enables it to generalize and perform well in others it hasn’t encountered.
The results are promising, demonstrating that AI agents trained on multiple games outperformed those exposed to a single game. However, the researchers acknowledge that specific mechanics or terms unique to certain games can pose challenges for even well-prepared AI.
The key lies in providing sufficient training data to enable the model to adapt and learn new elements. SIMA’s recognition map, featuring several dozen primitives, reflects the variety of actions the agent can currently identify, showcasing its adaptability and learning capabilities.
The ultimate ambition of the researchers behind SIMA is to revolutionize agent-based AI and create a more natural gaming companion. Unlike rigid, hard-coded AI opponents, SIMA aims to be a cooperative presence that users can instruct during gameplay.
The model’s reliance on pixels from the game screen mirrors human learning processes, fostering adaptability and the emergence of novel behaviors.
SIMA also stands apart from traditional simulator-based approaches. Where those rely on reinforcement learning driven by an explicit reward signal, SIMA’s imitation-based training lets it adapt and excel across a spectrum of 3D games.
“In the games that we use, such as the commercial games from our partners, we do not have access to such a reward signal,” one of the researchers explained. “Moreover, we are interested in agents that can do a wide variety of tasks described in open-ended text – it’s not feasible for each game to evaluate a ‘reward’ signal for each possible goal. Instead, we train agents using imitation learning from human behavior, given goals in text.”
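The recipe the researcher describes, imitation learning from human behavior conditioned on text goals, is essentially behavioral cloning: fit a policy to predict the human’s action from the observation and the goal, with no reward term anywhere. Here is a minimal sketch under toy assumptions (the linear policy, feature dimensions, and synthetic demonstrations are all hypothetical illustrations, not SIMA’s actual architecture):

```python
import numpy as np

# Toy behavioral-cloning sketch: train a policy purely from "human"
# demonstrations (observation + goal features, action taken), with no
# reward signal anywhere. Everything here is a hypothetical illustration,
# not SIMA's actual setup.

rng = np.random.default_rng(0)
N_OBS, N_GOAL, N_ACTIONS = 4, 3, 5

# Synthetic demonstrator: a hidden linear rule picks its actions.
W_human = rng.normal(size=(N_OBS + N_GOAL, N_ACTIONS))
demos = []
for _ in range(200):
    x = rng.normal(size=N_OBS + N_GOAL)   # screen features + text-goal features
    demos.append((x, int(np.argmax(x @ W_human))))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Behavioral cloning: minimize cross-entropy between the policy's action
# distribution and the action the demonstrator actually took.
W = np.zeros((N_OBS + N_GOAL, N_ACTIONS))
lr = 0.1
for _ in range(50):                       # epochs over the demo set
    for x, action in demos:
        p = softmax(x @ W)
        grad = np.outer(x, p)             # gradient of cross-entropy wrt W ...
        grad[:, action] -= x              # ... minus the one-hot target term
        W -= lr * grad

def act(x):
    """Imitate: pick the action the policy thinks the human would take."""
    return int(np.argmax(x @ W))

accuracy = float(np.mean([act(x) == a for x, a in demos]))
```

The point of the sketch is the loss: the update only ever compares the policy’s action distribution to what the human did, which is why no per-game reward function needs to exist.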
Other companies are exploring this kind of open-ended collaboration and creation as well; NPC conversations, for instance, are being eyed as a natural place to put an LLM-style chatbot to work. And simple improvised actions and interactions are being simulated and tracked by AI in some genuinely interesting agent research.