Game-playing AI trained on 40,000 hours of human gameplay
Nvidia has introduced NitroGen, a new AI model designed to play video games directly from raw video footage, without relying on rewards, game objectives, or internal game data.
NitroGen takes in video frames from gameplay and outputs controller actions, such as joystick movements and button presses. The model is trained entirely through large-scale imitation learning, meaning it learns by observing how humans play rather than being guided by explicit rules or scores.
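To make the idea of imitation learning concrete, here is a minimal sketch of behavior cloning, its simplest form: a policy is fit to recorded pairs of observations and human actions, with no reward signal involved. It is a toy and not Nvidia's training code. The single feature (distance to an obstacle), the demonstration data, and the function names are all hypothetical, and a tiny logistic-regression model stands in for the real network.

```python
import math

def train_behavior_cloning(demos, lr=0.5, epochs=200):
    """Fit a logistic-regression policy to (features, button_pressed) pairs.

    The only training signal is what the human did in each recorded state.
    """
    n_features = len(demos[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, pressed in demos:
            z = sum(w * x for w, x in zip(weights, features)) + bias
            prob = 1.0 / (1.0 + math.exp(-z))
            error = prob - pressed  # gradient of the log loss w.r.t. z
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

def policy(weights, bias, features):
    """Return 1 (press the button) or 0 (do not press) for an observation."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if z > 0 else 0

# Hypothetical demonstrations: the human presses "jump" when an obstacle is close.
# Each entry is ([distance_to_obstacle], jump_pressed).
demos = [([0.1], 1), ([0.2], 1), ([0.3], 1),
         ([0.7], 0), ([0.8], 0), ([0.9], 0)]

weights, bias = train_behavior_cloning(demos)
print(policy(weights, bias, [0.15]), policy(weights, bias, [0.85]))
```

The learned policy reproduces the human's habit of jumping near obstacles even though it was never told that jumping is useful. That is the sense in which NitroGen is described as learning without explicit rules or scores.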
NitroGen was trained on around 40,000 hours of publicly available gameplay videos covering more than 1,000 games. These videos include on-screen controller overlays, allowing the system to automatically extract player inputs on a frame-by-frame basis. According to Nvidia, the dataset spans a wide range of genres, with a strong focus on action, platformer, and action role-playing games.
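The article does not say how Nvidia reads inputs from the overlays. As an illustration only, the sketch below assumes the simplest possible scheme: each button occupies a known rectangle in the frame and lights up when pressed, so a brightness threshold is enough to recover the input. The region coordinates, the threshold, and the function names are hypothetical.

```python
# Hypothetical overlay layout: button name -> (top, left, bottom, right) in pixels.
BUTTON_REGIONS = {
    "A": (0, 0, 2, 2),
    "B": (0, 2, 2, 4),
}

def mean_brightness(frame, region):
    """Average grayscale value (0-255) of the pixels inside a rectangle."""
    top, left, bottom, right = region
    pixels = [frame[r][c] for r in range(top, bottom) for c in range(left, right)]
    return sum(pixels) / len(pixels)

def extract_inputs(frame, regions=BUTTON_REGIONS, threshold=128):
    """Treat a button as pressed when its overlay region is brighter than the threshold."""
    return {name: mean_brightness(frame, region) > threshold
            for name, region in regions.items()}

def label_video(frames):
    """Produce one action label per frame, i.e. a frame-by-frame input track."""
    return [extract_inputs(frame) for frame in frames]

# Two tiny 2x4 grayscale frames: first with "A" lit, then with "B" lit.
frames = [
    [[255, 255, 0, 0], [255, 255, 0, 0]],
    [[0, 0, 255, 255], [0, 0, 255, 255]],
]
labels = label_video(frames)
print(labels)
```

Pairing each frame with the label extracted from it yields the observation and action pairs that imitation learning needs, without anyone annotating the footage by hand.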
At its core, NitroGen uses a vision transformer to process video frames, followed by a diffusion-based model that generates gamepad actions over time. The system contains just under 500 million parameters and works best with games designed for controllers. It is less effective for titles that depend heavily on mouse and keyboard inputs, such as real-time strategy or multiplayer online battle arena games.
In testing, the model demonstrated the ability to perform basic tasks across a variety of unfamiliar games without prior fine-tuning. When adapted to new titles using limited data, Nvidia reported that NitroGen achieved task success rates up to 52% higher than models trained from scratch under similar conditions.
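A figure such as "52% higher" is most naturally read as a relative improvement over the baseline rather than a gain of 52 percentage points, although the article does not spell this out. The sketch below shows the arithmetic under that assumption. The two success rates are hypothetical numbers chosen to produce a 52% relative gain; they are not results reported by Nvidia.

```python
def relative_improvement(baseline_rate, new_rate):
    """Relative gain of new_rate over baseline_rate, as a fraction of the baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (new_rate - baseline_rate) / baseline_rate

# Hypothetical success rates, for illustration only.
from_scratch = 0.25   # model trained from scratch on the new game
fine_tuned = 0.38     # pretrained model adapted to the same game

gain = relative_improvement(from_scratch, fine_tuned)
print(f"{gain:.0%} relative improvement")
```

Under this reading, a low baseline can produce a large relative gain from a modest absolute change: here the success rate rises by only 13 percentage points.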
Nvidia says the project is intended for research and development and is not a commercial gaming product. Potential uses include automated game testing, experimental game AI, and broader research into general-purpose embodied artificial intelligence, where systems learn to act in complex environments based solely on visual input.

