Instructions to use doge1516/MS-Diffusion with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - Diffusers
How to use doge1516/MS-Diffusion with Diffusers:
```
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for Apple devices
pipe = DiffusionPipeline.from_pretrained(
    "doge1516/MS-Diffusion",
    dtype=torch.bfloat16,
    device_map="cuda",
)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
- Notebooks
  - Google Colab
  - Kaggle
- Local Apps
  - Draw Things
  - DiffusionBee
Possibility of replacing base pretrained models for inference
Hello!
I was reading the documentation for this model.
Under the hood, it uses https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0 and https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k.
I was wondering: is it possible to replace them with smaller models during inference? For example, https://huggingface.co/segmind/Segmind-Vega and https://huggingface.co/openai/clip-vit-large-patch14.
MS-Diffusion's trainable adapters are built on SDXL and CLIP-G: they transform the CLIP image features into SDXL cross-attention tokens. A distilled SDXL can be used as long as it keeps the same cross-attention layers. However, since CLIP-L and CLIP-G output image features of different shapes, CLIP-G cannot be replaced by CLIP-L.
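One quick way to see why the swap can work on the UNet side but not on the encoder side is to compare the widths reported by the public model configs. The sketch below is hypothetical and not part of MS-Diffusion's API; which tensors the adapter actually consumes depends on its implementation, but the shape mismatch between CLIP-G and CLIP-L shows up either way:

```python
# Rough compatibility check (not MS-Diffusion code): compare the feature
# widths the adapters were trained against with the proposed replacements.
from transformers import AutoConfig
from diffusers import UNet2DConditionModel

# Image-encoder side: CLIP-G and CLIP-L report different feature widths,
# so the adapter's input layers cannot accept CLIP-L features as-is.
clip_g = AutoConfig.from_pretrained("laion/CLIP-ViT-bigG-14-laion2B-39B-b160k")
clip_l = AutoConfig.from_pretrained("openai/clip-vit-large-patch14")
print(clip_g.projection_dim, clip_g.vision_config.hidden_size)  # 1280 1664
print(clip_l.projection_dim, clip_l.vision_config.hidden_size)  # 768 1024

# UNet side: a distilled SDXL is a candidate drop-in only if it keeps
# SDXL-base's cross-attention width (2048).
sdxl = UNet2DConditionModel.load_config(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)
vega = UNet2DConditionModel.load_config("segmind/Segmind-Vega", subfolder="unet")
print(sdxl["cross_attention_dim"], vega["cross_attention_dim"])
```

If the cross-attention widths match, the remaining question is whether the distilled UNet still exposes attention layers at the places the adapter hooks into; a distillation that removes blocks can change that even when the widths agree.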