How to use QWW/EditCLIP-IP2P with Diffusers:

    pip install -U diffusers transformers accelerate

    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import load_image

    # switch to "mps" for apple devices
    pipe = DiffusionPipeline.from_pretrained("QWW/EditCLIP-IP2P", dtype=torch.bfloat16, device_map="cuda")

    prompt = "Turn this cat into a dog"
    input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")

    image = pipe(image=input_image, prompt=prompt).images[0]
license: mit
datasets:
- timbrooks/instructpix2pix-clip-filtered
- Aleksandar/Top-Bench-X
language:
- en
base_model:
- stable-diffusion-v1-5/stable-diffusion-v1-5
pipeline_tag: image-to-image
library_name: diffusers
EditCLIP: Representation Learning for Image Editing
Abstract
We introduce EditCLIP, a novel representation-learning approach for image editing. Our method learns a unified representation of edits by jointly encoding an input image and its edited counterpart, effectively capturing their transformation. To evaluate its effectiveness, we employ EditCLIP to solve two tasks: exemplar-based image editing and automated edit evaluation. In exemplar-based image editing, we replace text-based instructions in InstructPix2Pix with EditCLIP embeddings computed from a reference exemplar image pair. Experiments demonstrate that our approach outperforms state-of-the-art methods while being more efficient and versatile. For automated evaluation, EditCLIP assesses image edits by measuring the similarity between the EditCLIP embedding of a given image pair and either a textual editing instruction or the EditCLIP embedding of another reference image pair. Experiments show that EditCLIP aligns more closely with human judgments than existing CLIP-based metrics, providing a reliable measure of edit quality and structural preservation.
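The automated-evaluation idea in the abstract, scoring an edit by how similar its embedding is to a reference embedding, can be illustrated with a minimal sketch. This is not the authors' implementation: the vectors below are hand-written placeholders rather than real EditCLIP embeddings, the helper names `cosine_similarity` and `rank_edits` are hypothetical, and cosine similarity is assumed here as the similarity measure because it is the usual choice for CLIP-style embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def rank_edits(reference_embedding, candidate_embeddings):
    """Sort candidate edits by similarity to the reference, best first.

    `reference_embedding` stands in for the embedding of a text instruction
    or of a reference image pair; `candidate_embeddings` maps a candidate
    name to the embedding of its (input, edited) image pair.
    """
    scores = {
        name: cosine_similarity(reference_embedding, embedding)
        for name, embedding in candidate_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder vectors (assumption): real embeddings would come from the model.
reference = [1.0, 0.0, 1.0]
candidates = {
    "edit_a": [2.0, 0.0, 2.0],  # same direction as the reference
    "edit_b": [0.0, 1.0, 0.0],  # orthogonal to the reference
}
ranking = rank_edits(reference, candidates)
```

In this toy setup `edit_a` ranks first because its vector points in the same direction as the reference, while `edit_b` is orthogonal to it and scores zero.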
Benchmark
We evaluate EditCLIP using Top-Bench-X, a benchmark for image editing evaluation:
- Dataset: Top-Bench-X
- Link: https://huggingface.co/datasets/Aleksandar/Top-Bench-X
Citation
@inproceedings{wang2025editclip,
title={EditCLIP: Representation Learning for Image Editing},
author={Wang, Qian and Cveji{\'c}, Aleksandar and Eldesokey, Abdelrahman and Wonka, Peter},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={15960--15970},
year={2025}
}