A game world model in which the NPC follows high-level strategies rather than appearing as mere background pixels, and whose strategy module transfers zero-shot to a new game.
ReactiveGWM decouples player control from NPC behavior: player buttons enter the diffusion backbone as a lightweight additive bias, while NPC intents (Offense / Defense / Control) are grounded through cross-attention. Trained on one game, the cross-attention modules plug directly into an unannotated world model of a different game, unlocking steerable NPCs without retraining.
Method
Two non-interfering pathways inside the DiT block: an additive bias for fine-grained player buttons, and cross-attention for high-level NPC strategy. Self-attention and FFN keep modeling the game's native dynamics.
Each clip is paired with an NPC-only structured prompt: a strategy (Offense / Defense / Control) plus active and passive behaviors, kept separate from player actions and scene captions.
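As a rough illustration of such an annotation, the sketch below models the NPC-only prompt as a small data structure and serializes it to text. The class names, field names, serialization format, and the example behavior strings are all hypothetical; the page does not specify the actual annotation schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Strategy(Enum):
    """The three NPC strategies named on this page."""
    OFFENSE = "Offense"
    DEFENSE = "Defense"
    CONTROL = "Control"


@dataclass
class NPCPrompt:
    """Hypothetical NPC-only prompt: a strategy plus active/passive behaviors."""
    strategy: Strategy
    active: list = field(default_factory=list)   # behaviors the NPC initiates
    passive: list = field(default_factory=list)  # behaviors in response to the player

    def to_text(self) -> str:
        # Assumed serialization; the real prompt template is not given.
        parts = [f"Strategy: {self.strategy.value}"]
        if self.active:
            parts.append("Active: " + ", ".join(self.active))
        if self.passive:
            parts.append("Passive: " + ", ".join(self.passive))
        return ". ".join(parts) + "."


@dataclass
class ClipAnnotation:
    """Keeps the NPC prompt separate from player actions and the scene caption."""
    npc_prompt: NPCPrompt
    player_buttons: list          # per-frame multi-hot button vectors
    scene_caption: str = ""


# Example with made-up behavior strings.
prompt = NPCPrompt(Strategy.DEFENSE, active=["blocks low"], passive=["keeps distance"])
clip = ClipAnnotation(npc_prompt=prompt, player_buttons=[[1, 0], [0, 1]])
print(clip.npc_prompt.to_text())
```

Keeping the three fields separate means the text that reaches cross-attention describes only the NPC, so player inputs cannot leak into the strategy prompt.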
Player buttons are pooled to the latent frame rate and added as a residual bias. NPC strategy is encoded as text and injected via cross-attention. Self-attention and FFN are left intact.
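The two pathways can be sketched in plain Python on nested lists. This is a toy illustration under several assumptions, not the authors' implementation: average pooling is assumed for the buttons, the temporal compression factor and the projection matrix `W` are made up, and the cross-attention uses a single head with identity projections. All function names are hypothetical.

```python
import math


def pool_buttons(frames, factor):
    """Average per-frame multi-hot button vectors over windows of `factor`
    video frames, giving one vector per latent frame (assumed pooling)."""
    pooled = []
    for start in range(0, len(frames), factor):
        window = frames[start:start + factor]
        pooled.append([sum(col) / len(window) for col in zip(*window)])
    return pooled


def add_button_bias(tokens, pooled, W):
    """Pathway 1: project pooled buttons with W ([D][B]) and add the result
    as a residual bias to every token of the matching latent frame.
    tokens: [T][N][D], pooled: [T][B]."""
    out = []
    for frame_tokens, buttons in zip(tokens, pooled):
        bias = [sum(w * b for w, b in zip(row, buttons)) for row in W]
        out.append([[h + b for h, b in zip(tok, bias)] for tok in frame_tokens])
    return out


def cross_attend(queries, text):
    """Pathway 2: residual single-head cross-attention from latent tokens
    ([N][D]) to encoded strategy text ([M][D]); identity projections."""
    d = len(text[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in text]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # stable softmax
        total = sum(weights)
        ctx = [sum(w / total * v[j] for w, v in zip(weights, text)) for j in range(d)]
        out.append([qi + ci for qi, ci in zip(q, ctx)])
    return out


# Four video frames, two buttons, assumed temporal compression factor of 2.
pooled = pool_buttons([[1, 0], [1, 0], [0, 1], [0, 1]], factor=2)
tokens = [[[0.0, 0.0]], [[0.0, 0.0]]]          # 2 latent frames, 1 token each
biased = add_button_bias(tokens, pooled, W=[[2.0, 0.0], [0.0, 3.0]])
steered = [cross_attend(frame, text=[[1.0, 2.0], [3.0, 4.0]]) for frame in biased]
```

Because the button signal is only an additive bias and the strategy signal only enters through cross-attention, neither function touches the self-attention or FFN weights.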
ReactiveGWMbase: full fine-tuning on a source game with strategy annotations. ReactiveGWMtransfer: reuse the target game's vanilla backbone and plug in our trained cross-attention, yielding steerable NPCs without any retraining on the new game.
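One way to picture the transfer step is as a merge of parameter dictionaries: keep every weight of the target game's backbone and copy in only the cross-attention weights trained on the source game. The sketch below uses plain dicts as stand-ins for checkpoints; the parameter names and the `cross_attn` marker are hypothetical, and the real checkpoints may be organized differently.

```python
def graft_cross_attention(target_backbone, source_model, marker="cross_attn"):
    """Return a copy of `target_backbone` with every parameter whose name
    contains `marker` taken from `source_model`, plus the grafted names."""
    merged = dict(target_backbone)
    grafted = []
    for name, weight in source_model.items():
        if marker in name:
            merged[name] = weight
            grafted.append(name)
    return merged, grafted


# Toy checkpoints: strings stand in for weight tensors.
source = {
    "block0.self_attn.w": "src-sa",
    "block0.cross_attn.w": "src-ca",   # trained with strategy annotations
    "block0.ffn.w": "src-ffn",
}
target = {
    "block0.self_attn.w": "tgt-sa",    # target game's native dynamics
    "block0.ffn.w": "tgt-ffn",
}

merged, grafted = graft_cross_attention(target, source)
print(grafted)  # ['block0.cross_attn.w']
```

The target's self-attention and FFN entries are left as they were, which matches the claim that no retraining happens on the new game.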
Configurations
The player is controlled via low-level buttons; the NPC is steered via a high-level strategy.
Player: controlled by buttons. NPC: steered by strategy.
Demo · Street Fighter 2
Compare the vanilla backbone, ReactiveGWMbase, and ReactiveGWMtransfer under the same player input but different NPC strategies (Offense / Defense / Control).
SF2 Button Mapping
Demo · Street Fighter 3
ReactiveGWMtransfer reuses the strategy modules trained on SF2 on top of an unannotated SF3 backbone, so steerable NPCs emerge without any retraining on this game.
SF3 Button Mapping