RAE: Diffusion Transformers with Representation Autoencoders

This repository contains the official PyTorch checkpoints for Representation Autoencoders.

Representation Autoencoders (RAE) are a class of autoencoders that pair pretrained, frozen representation encoders such as DINOv2 and SigLIP2 with trained ViT decoders. RAE can be used in a two-stage training pipeline for high-fidelity image synthesis: Stage 1 trains the decoder to reconstruct images from the frozen encoder's features, and Stage 2 trains a diffusion transformer in the latent space of the pretrained RAE to generate images.
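The sketch below illustrates the Stage 1 setup under stated assumptions: DINOv2 is loaded through `torch.hub` (a real entry point), while `ViTDecoder` is a hypothetical stand-in for the trained decoder, not a class from this repository. It is an illustration of the frozen-encoder/trained-decoder split, not the actual implementation (see the Code link below).

```python
# Minimal sketch of the RAE Stage 1 idea (illustrative only).
# Assumptions: DINOv2 via torch.hub; ViTDecoder is a hypothetical placeholder.
import torch
import torch.nn as nn


class ViTDecoder(nn.Module):
    """Hypothetical decoder mapping patch-token latents back to pixels."""

    def __init__(self, latent_dim=768, patch_size=14, image_size=224):
        super().__init__()
        self.patch_size = patch_size
        self.image_size = image_size
        # Single linear layer standing in for a full ViT decoder.
        self.to_pixels = nn.Linear(latent_dim, patch_size * patch_size * 3)

    def forward(self, latents):  # latents: (B, N, latent_dim)
        b, n, _ = latents.shape
        grid = self.image_size // self.patch_size
        x = self.to_pixels(latents)  # (B, N, p*p*3)
        x = x.view(b, grid, grid, self.patch_size, self.patch_size, 3)
        x = x.permute(0, 5, 1, 3, 2, 4).reshape(b, 3, self.image_size, self.image_size)
        return x


# Stage 1: frozen representation encoder + trained ViT decoder.
encoder = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
encoder.eval().requires_grad_(False)  # encoder stays frozen
decoder = ViTDecoder(latent_dim=768)  # trained to reconstruct pixels

images = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    latents = encoder.forward_features(images)["x_norm_patchtokens"]  # (1, 256, 768)
recon = decoder(latents)

# Stage 2 (not shown): a diffusion transformer is trained to generate samples
# in this latent space; generated latents are decoded to images by the decoder.
```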

Website: https://rae-dit.github.io/

Code: https://github.com/bytetriper/RAE

Paper: https://huggingface.co/papers/2510.11690
