ReactiveGWM: Steering NPC in Reactive Game World Models

Zeqing Wang1,2, Danze Chen1,2, Zhaohu Xing1,4, Zizhao Tong1,5, Yinhan Zhang1,6, Xingyi Yang3, Yeying Jin1

1 Tencent, 2 National University of Singapore, 3 The Hong Kong Polytechnic University, 4 The Hong Kong University of Science and Technology (Guangzhou), 5 University of Chinese Academy of Sciences, 6 The Hong Kong University of Science and Technology

TL;DR

A game world model where the NPC follows high-level strategies instead of merely appearing as background pixels, and where the strategy module transfers zero-shot to a new game.

ReactiveGWM decouples player control from NPC behavior: player buttons enter the diffusion backbone as a lightweight additive bias, while NPC intents (Offense / Defense / Control) are grounded through cross-attention. Trained on one game, the cross-attention modules plug directly into an unannotated world model of a different game, unlocking steerable NPCs without retraining.

Method

Decoupling player control and NPC strategy

Two non-interfering pathways inside the DiT block: an additive bias for fine-grained player buttons, and cross-attention for high-level NPC strategy. Self-attention and FFN keep modeling the game's native dynamics.
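
The two pathways can be illustrated with a toy block in plain Python. This is a minimal sketch, not the authors' implementation: the function names, the single-head attention without learned projections, and the order of operations inside the block are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention, no learned projections."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs

def toy_block(video_tokens, action_bias, strategy_tokens):
    """Hypothetical stand-in for one block with the two conditioning pathways."""
    # Pathway 1: player buttons arrive as an additive bias on each token.
    hidden = [[x + b for x, b in zip(tok, bias)]
              for tok, bias in zip(video_tokens, action_bias)]
    # (Self-attention over the video tokens would run here, untouched.)
    # Pathway 2: NPC strategy text is read through cross-attention (residual).
    attended = cross_attention(hidden, strategy_tokens, strategy_tokens)
    hidden = [[x + a for x, a in zip(row, att)]
              for row, att in zip(hidden, attended)]
    # (The feed-forward network would run here, untouched.)
    return hidden
```

Because the bias is purely additive and the strategy enters only through the cross-attention residual, zeroing either input leaves the other pathway's contribution unchanged in this toy version.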

  1. Strategy-aligned data

    Each clip is paired with an NPC-only structured prompt (a strategy, Offense / Defense / Control, plus active & passive behaviors) that is kept separate from player actions and scene captions.

    Construction of strategy-aligned data: every clip is annotated with player actions and an NPC-only structured prompt.
  2. Two pathways inside the DiT block

    Player buttons are pooled to the latent frame rate and added as a residual bias. NPC strategy is encoded as text and injected via cross-attention. Self-attention and FFN are left intact.

    DiT block with action module (additive bias) and strategy cross-attention.
  3. Train once, transfer zero-shot

    ReactiveGWMbase: full fine-tuning on a source game with strategy annotations. ReactiveGWMtransfer: reuse the target game's vanilla backbone and plug in our trained cross-attention, giving steerable NPCs without any retraining on the new game.

    Overview of ReactiveGWM training and training-free transfer to a different game.
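
The training-free transfer in step 3 amounts to merging two sets of weights. The sketch below shows one way this could look with plain dictionaries standing in for checkpoints. The parameter names, the `cross_attn` marker, and the helper `plug_in_strategy_modules` are hypothetical and not taken from the ReactiveGWM code.

```python
def plug_in_strategy_modules(target_backbone, source_model, marker="cross_attn"):
    """Return the target weights plus the source model's cross-attention weights."""
    merged = dict(target_backbone)      # keep every target-game weight as is
    for name, weight in source_model.items():
        if marker in name:              # copy only the strategy pathway
            merged[name] = weight       # overrides a same-named target entry, if any
    return merged

# Toy checkpoints: strings stand in for tensors, names are made up.
sf2_trained = {
    "blocks.0.self_attn.w": "sf2-self",
    "blocks.0.cross_attn.w": "sf2-strategy",
    "blocks.0.ffn.w": "sf2-ffn",
}
sf3_vanilla = {
    "blocks.0.self_attn.w": "sf3-self",
    "blocks.0.ffn.w": "sf3-ffn",
}
sf3_transfer = plug_in_strategy_modules(sf3_vanilla, sf2_trained)
```

In this toy merge the self-attention and FFN entries still come from the target game, and only the strategy cross-attention entry comes from the source game.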

Configurations

Control interface

The player is controlled via low-level buttons; the NPC is steered via a high-level strategy.

Player: controlled by buttons
NPC: steered by strategy
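
The two control channels can be represented as a small data structure. This is a sketch under assumptions: the class names, the field names, and the text layout produced by `to_text` are illustrative, and the behavior strings in the usage example are invented placeholders rather than prompts from the dataset.

```python
from dataclasses import dataclass
from typing import List

STRATEGIES = ("Offense", "Defense", "Control")

@dataclass
class NPCPrompt:
    """High-level NPC conditioning: one strategy plus two behavior descriptions."""
    strategy: str
    active: str
    passive: str

    def __post_init__(self):
        # Reject anything outside the three strategies named on this page.
        if self.strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {self.strategy}")

    def to_text(self) -> str:
        # Hypothetical text layout; the real prompt format is not given here.
        return (f"Strategy: {self.strategy}. "
                f"Active: {self.active}. Passive: {self.passive}.")

@dataclass
class ControlInput:
    """Low-level player buttons per frame, plus the NPC prompt for the clip."""
    player_buttons: List[List[str]]
    npc_prompt: NPCPrompt

# Usage with placeholder behaviors (not taken from the dataset).
example = ControlInput(
    player_buttons=[["X"], [], ["C"]],
    npc_prompt=NPCPrompt("Defense", "blocks incoming attacks", "keeps distance"),
)
```

Keeping the NPC prompt in its own object mirrors the separation described above: the button stream never mentions the NPC, and the prompt never mentions player actions.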

Frame from Street Fighter showing both characters with the player highlighted in blue and the NPC highlighted in red

Demo · Street Fighter 2

Same buttons, different strategies

Compare the vanilla backbone, ReactiveGWMbase, and ReactiveGWMtransfer under the same player input but different NPC strategies (Offense / Defense / Control).

SF2 Button Mapping

X: Light Punch (LP)
Y: Medium Punch (MP)
Z: Heavy Punch (HP)
A: Light Kick (LK)
B: Medium Kick (MK)
C: Heavy Kick (HK)
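
As an illustration of how such button presses could be turned into the per-latent-frame signal that feeds the additive bias, the sketch below encodes each frame as a multi-hot vector and average-pools groups of frames. The pooling factor of 4, the use of mean pooling, and the helper names are assumptions. Directional inputs are omitted because they are not listed in the mapping.

```python
SF2_BUTTONS = ["X", "Y", "Z", "A", "B", "C"]  # order follows the mapping above

def multi_hot(pressed):
    """Encode the set of pressed buttons for one frame as a 0/1 vector."""
    return [1.0 if button in pressed else 0.0 for button in SF2_BUTTONS]

def pool_to_latent_rate(frames, factor):
    """Average-pool per-frame button vectors in groups of `factor` frames."""
    pooled = []
    for start in range(0, len(frames), factor):
        group = frames[start:start + factor]
        pooled.append([sum(column) / len(group) for column in zip(*group)])
    return pooled

# Four video frames collapse into one latent frame (assumed factor of 4).
frames = [multi_hot({"X"}), multi_hot({"X"}), multi_hot(set()), multi_hot({"C"})]
latent_actions = pool_to_latent_rate(frames, factor=4)
```

With mean pooling, a button held for two of the four frames contributes 0.5 at that latent step, so hold duration survives the temporal compression.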

Demo · Street Fighter 3

Cross-game strategy transfer

ReactiveGWMtransfer reuses the strategy modules trained on SF2 on top of an unannotated SF3 backbone; steerable NPCs emerge without any retraining on this game.

SF3 Button Mapping

X: Heavy Punch (HP)
Y: Medium Punch (MP)
Z: Heavy Kick (HK)
A: Light Punch (LP)
B: Light Kick (LK)
C: Medium Kick (MK)
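
The SF2 and SF3 mappings assign different moves to most buttons, so a raw button label is game-specific. The lookup below is a small sketch for comparing the two tables as listed on this page. The helper names are hypothetical, and nothing here describes how ReactiveGWM itself handles the difference.

```python
SF2 = {"X": "LP", "Y": "MP", "Z": "HP", "A": "LK", "B": "MK", "C": "HK"}
SF3 = {"X": "HP", "Y": "MP", "Z": "HK", "A": "LP", "B": "LK", "C": "MK"}

def remapped_buttons(src, dst):
    """List the buttons whose move differs between two games."""
    return [button for button in src if src[button] != dst[button]]

def button_for_move(mapping, move):
    """Find which button triggers a given move in one game."""
    inverse = {m: b for b, m in mapping.items()}
    return inverse[move]

changed = remapped_buttons(SF2, SF3)
```

Only Y (Medium Punch) keeps the same move in both tables. A Light Punch is X in SF2 but A in SF3, which `button_for_move` makes easy to check.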
