---
license: apache-2.0
language:
- en
- zh
pipeline_tag: image-to-image
library_name: diffusers
---

## 🔥🔥🔥 News!!
Nov 26, 2025: 👋 We release [Step1X-Edit-v1p2](https://huggingface.co/stepfun-ai/Step1X-Edit-v1p2) (referred to as **ReasonEdit-S** in the paper), a native reasoning editing model with stronger performance on KRIS-Bench and GEdit-Bench. The technical report is available [here](https://arxiv.org/abs/2511.22625).
<table>
<thead>
<tr>
  <th rowspan="2">Models</th>
  <th colspan="3"> <div align="center">GEdit-Bench</div> </th>
  <th colspan="4"> <div align="center">Kris-Bench</div> </th>
</tr>
<tr>
  <th>G_SC⬆️</th> <th>G_PQ⬆️ </th> <th>G_O⬆️</th> <th>FK⬆️</th> <th>CK⬆️</th> <th>PK⬆️ </th> <th>Overall⬆️</th>
</tr>
</thead>
<tbody>
<tr>  
  <td>Flux-Kontext-dev </td> <td>7.16</td> <td>7.37</td> <td>6.51</td> <td>53.28</td> <td>50.36</td> <td>42.53</td> <td>49.54</td>
</tr>
<tr>   
  <td>Qwen-Image-Edit-2509 </td> <td>8.00</td> <td>7.86</td> <td>7.56</td> <td>61.47</td> <td>56.79</td> <td>47.07</td> <td>56.15</td>
</tr>
<tr>
  <td>Step1X-Edit v1.1 </td> <td>7.66</td> <td>7.35</td> <td>6.97</td> <td>53.05</td> <td>54.34</td> <td>44.66</td> <td>51.59</td>
</tr>
<tr>
  <td>Step1X-Edit-v1p2-preview </td> <td>8.14</td> <td>7.55</td> <td>7.42</td> <td>60.49</td> <td>58.81</td> <td>41.77</td> <td>52.51</td>
</tr>
<tr>
  <td>Step1X-Edit-v1p2 (base) </td> <td>7.77</td> <td>7.65</td> <td>7.24</td> <td>58.23</td> <td>60.55</td> <td>46.21</td> <td>56.33</td>
</tr>
<tr>
  <td>Step1X-Edit-v1p2 (thinking) </td> <td>8.02</td> <td>7.64</td> <td>7.36</td> <td>59.79</td> <td>62.76</td> <td>49.78</td> <td>58.64</td>
</tr>
<tr>
  <td>Step1X-Edit-v1p2 (thinking + reflection) </td> <td>8.18</td> <td>7.85</td> <td>7.58</td> <td>62.44</td> <td>65.72</td> <td>50.42</td> <td>60.93</td>
</tr>
</tbody>
</table>


## ⚡️ Model Usages
Make sure you have `transformers==4.55.0` installed (the version we tested with).

Install the `diffusers` package with the following commands:
```bash
git clone -b step1xedit_v1p2 https://github.com/Peyton-Chen/diffusers.git
cd diffusers
pip install -e .

pip install RegionE # optional, for faster inference
```
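
Optionally, you can run a quick sanity check that the environment matches the notes above. This is a minimal sketch; the version pin and the pipeline class name are taken from this section, and nothing else is assumed:
```python
import transformers

# The examples in this card were tested with transformers==4.55.0 (see above).
print("transformers version:", transformers.__version__)

# Step1XEditPipelineV1P2 only exists in the step1xedit_v1p2 branch of the
# forked diffusers; this import fails on a stock diffusers install.
from diffusers import Step1XEditPipelineV1P2  # noqa: F401
print("Step1XEditPipelineV1P2 is available")
```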
Here is an example of using the `Step1X-Edit-v1p2` model to edit an image:
```python
import torch
from diffusers import Step1XEditPipelineV1P2
from diffusers.utils import load_image
from RegionE import RegionEHelper

pipe = Step1XEditPipelineV1P2.from_pretrained("stepfun-ai/Step1X-Edit-v1p2", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Optional: wrap the pipeline with the RegionE helper for faster inference
regionehelper = RegionEHelper(pipe)
regionehelper.set_params()   # use the default hyperparameters
regionehelper.enable()

print("=== processing image ===")
image = load_image("examples/0000.jpg").convert("RGB")
prompt = "add a ruby pendant on the girl's neck."
enable_thinking_mode = True    # let the MLLM reinterpret the instruction before editing
enable_reflection_mode = True  # review the edited outputs and decide when to stop
pipe_output = pipe(
    image=image,
    prompt=prompt,
    num_inference_steps=50,
    true_cfg_scale=6,
    generator=torch.Generator().manual_seed(42),
    enable_thinking_mode=enable_thinking_mode,
    enable_reflection_mode=enable_reflection_mode,
)
if enable_thinking_mode:
    print("Reformat Prompt:", pipe_output.reformat_prompt)
for image_idx in range(len(pipe_output.images)):
    pipe_output.images[image_idx].save(f"0001-{image_idx}.jpg", lossless=True)
    if enable_reflection_mode:
        print(pipe_output.think_info[image_idx])
        print(pipe_output.best_info[image_idx])
pipe_output.final_images[0].save("0001-final.jpg", lossless=True)

regionehelper.disable()
```
The results look like this:
<div align="center">
<img width="1080" alt="results" src="assets/v1p2_vis.jpeg">
</div>
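
To reproduce the base setting from the benchmark table above (no thinking, no reflection), the same call can be made with both flags turned off. This is a minimal sketch that continues from the example above and assumes the output object exposes the same `images` field:
```python
# Single-pass edit: skip the thinking and reflection stages
# (corresponds to the "base" row in the benchmark table above).
pipe_output = pipe(
    image=image,
    prompt=prompt,
    num_inference_steps=50,
    true_cfg_scale=6,
    generator=torch.Generator().manual_seed(42),
    enable_thinking_mode=False,
    enable_reflection_mode=False,
)
pipe_output.images[0].save("0001-base.jpg")
```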


## 📖 Introduction
Step1X-Edit-v1p2 represents a step towards reasoning-enhanced image editing models. We show that unlocking the reasoning capabilities of MLLMs can further expand the limits of instruction-based editing. Specifically, we introduce two complementary reasoning mechanisms, thinking and reflection, to improve instruction comprehension and editing accuracy. Building on these mechanisms, our framework performs editing in a thinking–editing–reflection loop: **the thinking stage** leverages MLLM world knowledge to interpret abstract instructions, while **the reflection stage** reviews the edited outputs, corrects unintended changes, and determines when to stop. For more details, please refer to our technical report.
<div align="center">
<img width="1080" alt="results" src="assets/ReasonEdit_intro.jpg">
</div>
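
For intuition, the following is a minimal, self-contained sketch of that loop. The `think`, `edit`, and `reflect` helpers and the `Verdict` class are illustrative placeholders rather than the released API; in practice the whole loop runs inside `Step1XEditPipelineV1P2` and is controlled by the `enable_thinking_mode` and `enable_reflection_mode` flags shown in the usage example.
```python
from dataclasses import dataclass

@dataclass
class Verdict:
    accepted: bool          # did the edit satisfy the instruction?
    corrected_prompt: str   # refined prompt for the next round, if not

def think(image, instruction: str) -> str:
    # Thinking stage (placeholder): the MLLM rewrites an abstract
    # instruction into a concrete edit prompt using world knowledge.
    return f"(reformatted) {instruction}"

def edit(image, prompt: str):
    # Editing stage (placeholder): the diffusion model produces a
    # candidate edited image; here we just return the input unchanged.
    return image

def reflect(original, edited, instruction: str) -> Verdict:
    # Reflection stage (placeholder): the MLLM reviews the edit,
    # flags unintended changes, and decides whether to stop.
    return Verdict(accepted=True, corrected_prompt=instruction)

def reason_edit(image, instruction: str, max_rounds: int = 3):
    prompt = think(image, instruction)                 # thinking
    edited = image
    for _ in range(max_rounds):
        edited = edit(image, prompt)                   # editing
        verdict = reflect(image, edited, instruction)  # reflection
        if verdict.accepted:                           # reflection decides when to stop
            break
        prompt = verdict.corrected_prompt              # otherwise refine and retry
    return edited
```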

## Citation
```
@article{yin2025reasonedit,
  title={ReasonEdit: Towards Reasoning-Enhanced Image Editing Models}, 
  author={Fukun Yin and Shiyu Liu and Yucheng Han and Zhibo Wang and Peng Xing and Rui Wang and Wei Cheng and Yingming Wang and Aojie Li and Zixin Yin and Pengtao Chen and Xiangyu Zhang and Daxin Jiang and Xianfang Zeng and Gang Yu},
  journal={arXiv preprint arXiv:2511.22625},
  year={2025}
}

@article{liu2025step1x-edit,
  title={Step1X-Edit: A Practical Framework for General Image Editing}, 
  author={Shiyu Liu and Yucheng Han and Peng Xing and Fukun Yin and Rui Wang and Wei Cheng and Jiaqi Liao and Yingming Wang and Honghao Fu and Chunrui Han and Guopeng Li and Yuang Peng and Quan Sun and Jingwei Wu and Yan Cai and Zheng Ge and Ranchen Ming and Lei Xia and Xianfang Zeng and Yibo Zhu and Binxing Jiao and Xiangyu Zhang and Gang Yu and Daxin Jiang},
  journal={arXiv preprint arXiv:2504.17761},
  year={2025}
}
```