SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation

Jiaming Zhang · Shengming Cao · Rui Li · Xiaotong Zhao · Yutao Cui
Xinglin Hou · Gangshan Wu · Haolan Chen · Yu Xu · Limin Wang · Kai Ma

Paper PDF Project Page
Multimedia Computing Group, Nanjing University   |   Platform and Content Group (PCG), Tencent

This repository hosts a checkpoint for the paper "SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation". SteadyDancer is a human image animation framework built on the Image-to-Video paradigm, which ensures robust first-frame preservation. Prior Reference-to-Video approaches often suffer from identity drift caused by the spatio-temporal misalignments common in real-world applications. In contrast, SteadyDancer generates high-fidelity, temporally coherent human animations, outperforming existing methods in visual quality and control while requiring significantly fewer training resources.


Notice

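  • This is a diffusers GGUF, not a ComfyUI GGUF. The model was quantized from the diffusers pipeline. To load it natively in ComfyUI, either quantize with city96's method or rename the keys at load time so they match ComfyUI's model structure.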
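
The key renaming mentioned above can be sketched roughly as follows. This is only an illustration of the general idea, not the actual conversion: `rename_keys` and `EXAMPLE_PREFIX_MAP` are hypothetical names, and the prefix pair shown is a placeholder, not the real diffusers-to-ComfyUI mapping, which has to be taken from the two model definitions.

```python
from typing import Dict, Mapping, TypeVar

T = TypeVar("T")

# HYPOTHETICAL placeholder pair. The real diffusers -> ComfyUI key names
# are not given in this model card and must be looked up separately.
EXAMPLE_PREFIX_MAP = {
    "diffusers_prefix.": "comfyui_prefix.",
}


def rename_keys(state_dict: Mapping[str, T], prefix_map: Mapping[str, str]) -> Dict[str, T]:
    """Return a copy of state_dict with matching key prefixes replaced.

    Keys that match no prefix in prefix_map are copied unchanged.
    """
    renamed: Dict[str, T] = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        if new_key in renamed:
            # Two source keys mapping to one target would silently drop a tensor.
            raise KeyError(f"duplicate key after renaming: {new_key}")
        renamed[new_key] = value
    return renamed
```

In practice the values would be the tensors read from the GGUF file; the function only touches the key names, so the quantized data is passed through untouched.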

pipeline

import torch  # needed for torch.bfloat16 below
from diffusers import GGUFQuantizationConfig, WanTransformer3DModel, WanVideoToVideoPipeline
from diffusers.models import AutoencoderKLWan
from transformers import UMT5EncoderModel

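# GGUF checkpoint of the quantized transformer and the base repo that supplies configs
gguf_path = "https://huggingface.co/smthem/SteadyDancer-14B-gguf/blob/main/SteadyDancer-14B-Q8_0.gguf"
model_id = "Wan-AI/Wan2.1-I2V-14B-720P-Diffusers"

# Load the quantized transformer; its config is read from the base repo.
# The subfolder arguments assume the usual diffusers repo layout
# (transformer/, vae/, text_encoder/).
transformer = WanTransformer3DModel.from_single_file(
    gguf_path,
    config=model_id,
    subfolder="transformer",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# The VAE and text encoder come unquantized from the base repo
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.bfloat16)
text_encoder = UMT5EncoderModel.from_pretrained(model_id, subfolder="text_encoder", torch_dtype=torch.bfloat16)

# Assemble the pipeline around the pre-loaded components
pipe = WanVideoToVideoPipeline.from_pretrained(
    model_id,
    vae=vae,
    transformer=transformer,
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16,
)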

# run infer
...

📚 Citation

If you find our paper or this codebase useful for your research, please cite us.

@misc{zhang2025steadydancer,
      title={SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation}, 
      author={Jiaming Zhang and Shengming Cao and Rui Li and Xiaotong Zhao and Yutao Cui and Xinglin Hou and Gangshan Wu and Haolan Chen and Yu Xu and Limin Wang and Kai Ma},
      year={2025},
      eprint={2511.19320},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.19320}, 
}