👋 Hi there! My name is Zeqing Wang (王则清).
I am currently pursuing a Master of Science in Computer Engineering at the National University of Singapore. Previously, I obtained my bachelor degree in computer science from Xidian University.
Currently, my research interests include:
-
Efficient AI: model compression, model acceleration, distributed computing.
-
Diffusion Models: World Models, Video Diffusion Models, Diffusion Language Models.
-
Continual Learning and Transfer Learning.
🚀 On the Job Market: I am actively seeking full-time opportunities! If your team is working on Efficient AI, Diffusion Models, or World Models, I would love to connect.
🔥 News
- 2026.02: 🥳 One paper was accepted to CVPR’26.
- 2026.01: 🎉 SparseD was accepted by ICLR’26.
- 2025.11: 🌟 Three papers (DualParal, SAMCL and Jump-teaching) were accepted by AAAI’26.
- 2025.08: ⛵ Start my Msc. journey in NUS!
- 2025.06: 🎉 Got my Bachelor’s degree from Xidian University! Thanks to my supervisor and all my friends in XDU!
📝 Publications
Arxiv 2026

ReactiveGWM: Steering NPC in Reactive Game World Models 
Zeqing Wang, Danze Chen, Zhaohu Xing, Zizhao Tong, Yinhan Zhang, Xingyi Yang, Yeying Jin
[project page]
[paper]
[code]
[abstract]
Current game world models simulate environments from a subjective, player-centric perspective. However, by treating the Non-Player Character (NPC) merely as background pixels, these models cannot capture interactions between the player and NPC. In that sense, they act as passive video renderers rather than real simulation engines, lacking the physical understanding needed to model action-induced NPC reactivities. We introduce ReactiveGWM, a reactive game world model that synthesizes dynamic interactions between the player and NPC. Instead of entangling all interaction dynamics, ReactiveGWM explicitly decouples player controls from NPC behaviors. Player actions are injected into the diffusion backbone via a lightweight additive bias, while high-level NPC responses (e.g., Offense, Control, Defense) are grounded through cross-attention modules. Crucially, these modules learn a game-agnostic representation of interactive logic. This enables zero-shot strategy transfer: our learned modules can be plugged directly into off-the-shelf, unannotated world models of different games. This instantly unlocks steerable NPC interactions without any domain-specific retraining. Evaluated on two Street Fighter games, ReactiveGWM maintains fine-grain player controllability while achieving robust, prompt-aligned NPC strategy adherence, paving the way for scalable, strategy-rich interaction with the NPC.
ICLR 2026

SparseD: Sparse Attention for Diffusion Language Models 
Zeqing Wang, Gongfan Fang, Xinyin Ma, Xingyi Yang, Xinchao Wang
[paper]
[code]
[abstract]
While diffusion language models (DLMs) offer a promising alternative to autoregressive models (ARs), existing open-source DLMs suffer from high inference latency. This bottleneck is mainly due to the attention's quadratic complexity with respect to context length in computing all query-key pairs. Intuitively, to reduce this complexity, a natural strategy is to restrict attention to sparse patterns that retain only the most relevant connections. Such approaches are well-established in ARs, where attention follows fixed and clearly defined sparse patterns. However, in DLMs, we observe distinct sparsity behaviors: (1) attention patterns vary across heads, (2) attention patterns in each head remain highly similar across denoising steps, and (3) early denoising steps are critical for generation. These findings render sparse attention methods designed for ARs largely incompatible with DLMs, as they fail to capture head-specific structures and risk degrading generation when applied in early denoising steps. To address these challenges, we propose SparseD, a novel sparse attention method for DLMs. Leveraging the observations, SparseD only requires pre-computing head-specific sparse patterns one time, and reuses them across all steps. This prevents recomputing sparse patterns at each denoising step. Meanwhile, SparseD uses full attention in the early steps, then switches to sparse attention later to maintain generation quality. Together, these establish SparseD as a practical and efficient solution for deploying DLMs in long-context applications. Experimental results demonstrate that SparseD achieves lossless acceleration, delivering up to 1.50x speedup over FlashAttention at a 64k context length with 1,024 denoising steps.
AAAI 2026

Minute-Long Videos with Dual Parallelisms 
Zeqing Wang, Bowen Zheng, Xingyi Yang, Zhenxiong Tan, Yuecong Xu, Xinchao Wang
[project page]
[paper]
[code]
[abstract]
Diffusion Transformer (DiT)-based video diffusion models generate high-quality videos at scale but incur prohibitive processing latency and memory costs for long videos. To address this, we propose a novel distributed inference strategy, termed DualParal. The core idea is that, instead of generating an entire video on a single GPU, we parallelize both temporal frames and model layers across GPUs. However, a naive implementation of this division faces a key limitation: since diffusion models require synchronized noise levels across frames, this implementation leads to the serialization of original parallelisms. We leverage a block-wise denoising scheme to handle this. Namely, we process a sequence of frame blocks through the pipeline with progressively decreasing noise levels. Each GPU handles a specific block and layer subset while passing previous results to the next GPU, enabling asynchronous computation and communication. To further optimize performance, we incorporate two key enhancements. Firstly, a feature cache is implemented on each GPU to store and reuse features from the prior block as context, minimizing inter-GPU communication and redundant computation. Secondly, we employ a coordinated noise initialization strategy, ensuring globally consistent temporal dynamics by sharing initial noise patterns across GPUs without extra resource costs. Together, these enable fast, artifact-free, and infinitely long video generation. Applied to the latest diffusion transformer video generator, our method efficiently produces 1,025-frame videos with up to 6.54x lower latency and 1.48x lower memory cost on 8xRTX 4090 GPUs.
AAAI 2026 Oral

SAMCL: Empowering SAM to Continually Learn from Dynamic Domains with Extreme Storage Efficiencys 
Zeqing Wang, Kangye Ji, Di Wang, Haibin Zhang, Fei Cheng
[paper]
[code]
[abstract]
Segment Anything Model (SAM) struggles in open-world scenarios with diverse domains. In such settings, naive fine-tuning with a well-designed learning module is inadequate and often causes catastrophic forgetting issue when learning incrementally. To address this issue, we propose a novel continual learning (CL) method for SAM, termed SAMCL. Rather than relying on a fixed learning module, our method decomposes incremental knowledge into separate modules and trains a selector to choose the appropriate one during inference. However, this intuitive design introduces two key challenges: ensuring effective module learning and selection, and managing storage as tasks accumulate.
To tackle these, we introduce two components: AugModule and Module Selector. AugModule reduces the storage of the popular LoRA learning module by sharing parameters across layers while maintaining accuracy. It also employs heatmaps—generated from point prompts—to further enhance domain adaptation with minimal additional cost. Module Selector leverages the observation that SAM’s embeddings can effectively distinguish domains, enabling high selection accuracy by training on low-consumed embeddings instead of raw images.
Experiments show that SAMCL outperforms state-of-the-art methods, achieving only 0.19% forgetting and at least 2.5% gain on unseen domains. Each AugModule requires just 0.233 MB, reducing storage by at least 24.3% over other fine-tuning approaches. The buffer storage for Module Selector is further reduced by up to 256x.
ASP-DAC 2025

LightCL: Compact Continual Learning with Low Memory Footprint For Edge Device 
Zeqing Wang, Fei Cheng, Kangye Ji, Bohu Huang
[paper]
[code]
[abstract]
Continual learning (CL) is a technique that enables neural networks to constantly adapt to their dynamic surroundings. Despite being overlooked for a long time, this technology can considerably address the customized needs of users in edge devices. Actually, most CL methods require huge resource consumption by the training behavior to acquire generalizability among all tasks for delaying forgetting regardless of edge scenarios. Therefore, this paper proposes a compact algorithm called LightCL, which evaluates and compresses the redundancy of already generalized components in structures of the neural network. Specifically, we consider two factors of generalizability, learning plasticity and memory stability, and design metrics of both to quantitatively assess generalizability of neural networks during CL. This evaluation shows that generalizability of different layers in a neural network exhibits a significant variation. Thus, we Maintain Generalizability by freezing generalized parts without the resource-intensive training process and Memorize Feature Patterns by stabilizing feature extracting of previous tasks to enhance generalizability for less-generalized parts with a little extra memory, which is far less than the reduction by freezing. Experiments illustrate that LightCL outperforms other state-of-the-art methods and reduces at most 6.16x memory footprint. We also verify the effectiveness of LightCL on the edge device.
-
SpotEdit: Selective Region Editing in Diffusion Transformers. CVPR 2026.
[paper]
[code]
[abstract]
Diffusion Transformer (DiT)-based models have significantly advanced image editing by encoding conditional images and integrating them into transformer layers. However, most edits involve modifying only small regions, while current methods uniformly process and denoise all tokens at every timestep, causing redundant computation and potentially degrading unchanged areas. This raises a fundamental question: Is it truly necessary to regenerate every region during editing? To address this, we propose SpotEdit, a training-free diffusion editing framework that selectively updates only the modified regions. SpotEdit comprises two key components: SpotSelector identifies stable regions via perceptual similarity and skips their computation by reusing conditional image features; SpotFusion adaptively blends these features with edited tokens through a dynamic fusion mechanism, preserving contextual coherence and editing quality. By reducing unnecessary computation and maintaining high fidelity in unmodified areas, SpotEdit achieves efficient and precise image editing.
Zhibin Qin, Zhenxiong Tan, Zeqing Wang, Songhua Liu, Xinchao Wang.
-
Jump-teaching: Combating Sample Selection Bias via Temporal Disagreement. AAAI 2026.
[paper]
[code]
[abstract]
Sample selection is a straightforward technique to combat noisy labels, aiming to prevent mislabeled samples from degrading the robustness of neural networks. However, existing methods mitigate compounding selection bias either by leveraging dual-network disagreement or additional forward propagations, leading to multiplied training overhead. To address this challenge, we introduce Jump-teaching, an efficient sample selection framework for debiased model update and simplified selection criterion. Based on a key observation that a neural network exhibits significant disagreement across different training iterations, Jump-teaching proposes a jump-manner model update strategy to enable self-correction of selection bias by harnessing temporal disagreement, eliminating the need for multi-network or multi-round training. Furthermore, we employ a sample-wise selection criterion building on the intra variance of a decomposed single loss for a fine-grained selection without relying on batch-wise ranking or dataset-wise modeling. Extensive experiments demonstrate that Jump-teaching outperforms state-of-the-art counterparts while achieving a nearly overhead-free selection procedure, which boosts training speed by up to 4.47x and reduces peak memory footprint by 54%
Kangye Ji, Fei Cheng, Zeqing Wang, Qichang Zhang, Bohu Huang.
🎖 Honors and Awards
- 2021-2025(B.Eng.): National Scholarship of 2021-2022, National Scholarship of 2022-2023, 2025 CCF Elite Collegiate Award, Duan Baoyan Innovation Fund (Only 4 undergraduate students at Xidian University)
📖 Educations
- 2025.08 - (now), Msc. Student in College of Design and Engineering, National University of Singapore
- 2021.09 - 2025.06, B.Eng. in Computer Science and Technology, Xidian University