Update README.md
README.md (changed)
@@ -29,7 +29,8 @@ Segment Anything Model (SAM) has shown impressive zero-shot transfer performance

## Installation
```bash
-
# download pretrained checkpoint
mkdir weights && cd weights
wget https://github.com/THU-MIG/RepViT/releases/download/v1.0/repvit_sam.pt
@@ -42,11 +43,11 @@ python app/app.py
```

## CoreML export
-Please refer to [coreml_example.ipynb](


## Latency comparisons
-Comparison between RepViT-SAM and others in terms of latency. The latency (ms) is measured with the standard resolution of 1024

<table class="tg">
<thead>
@@ -74,195 +75,6 @@ Comparison between RepViT-SAM and others in terms of latency. The latency (ms) i
</tbody>
</table>

-
-## Zero-shot edge detection
-
-Comparison results on BSDS500.
-
-<table class="tg">
-<thead>
-<tr>
-<th class="tg-c3ow" rowspan="2">Model</th>
-<th class="tg-c3ow" colspan="3">zero-shot edge detection</th>
-</tr>
-<tr>
-<th class="tg-c3ow">ODS</th>
-<th class="tg-c3ow">OIS</th>
-<th class="tg-c3ow">AP</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tg-c3ow">ViT-H-SAM</td>
-<td class="tg-c3ow"><b>.768</b></td>
-<td class="tg-c3ow"><b>.786</b></td>
-<td class="tg-c3ow"><b>.794</b></td>
-</tr>
-<tr>
-<td class="tg-c3ow">ViT-B-SAM</td>
-<td class="tg-c3ow">.743</td>
-<td class="tg-c3ow">.764</td>
-<td class="tg-c3ow">.726</td>
-</tr>
-<tr>
-<td class="tg-c3ow">MobileSAM</td>
-<td class="tg-c3ow">.756</td>
-<td class="tg-c3ow">.768</td>
-<td class="tg-c3ow">.746</td>
-</tr>
-<tr>
-<td class="tg-c3ow">RepViT-SAM</td>
-<td class="tg-c3ow"><ins>.764</ins></td>
-<td class="tg-c3ow"><ins>.786</ins></td>
-<td class="tg-c3ow"><ins>.773</ins></td>
-</tr>
-</tbody>
-</table>
-
-
-## Zero-shot instance segmentation and SegInW
-Comparison results on COCO and SegInW.
-
-<table class="tg">
-<thead>
-<tr>
-<th class="tg-c3ow" rowspan="2">Model</th>
-<th class="tg-c3ow" colspan="4">zero-shot instance segmentation</th>
-<th class="tg-c3ow">SegInW</th>
-</tr>
-<tr>
-<th class="tg-c3ow">AP</th>
-<th class="tg-c3ow">$AP^{S}$</th>
-<th class="tg-c3ow">$AP^{M}$</th>
-<th class="tg-c3ow">$AP^{L}$</th>
-<th class="tg-c3ow">Mean AP</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tg-c3ow">ViT-H-SAM</td>
-<td class="tg-c3ow"><b>46.8</b></td>
-<td class="tg-c3ow"><b>31.8</b></td>
-<td class="tg-c3ow"><b>51.0</b></td>
-<td class="tg-c3ow"><b>63.6</b></td>
-<td class="tg-c3ow"><b>48.7</b></td>
-</tr>
-<tr>
-<td class="tg-c3ow">ViT-B-SAM</td>
-<td class="tg-c3ow">42.5</td>
-<td class="tg-c3ow"><ins>29.8</ins></td>
-<td class="tg-c3ow">47.0</td>
-<td class="tg-c3ow">56.8</td>
-<td class="tg-c3ow">44.8</td>
-</tr>
-<tr>
-<td class="tg-c3ow">MobileSAM</td>
-<td class="tg-c3ow">42.7</td>
-<td class="tg-c3ow">27.0</td>
-<td class="tg-c3ow">46.5</td>
-<td class="tg-c3ow">61.1</td>
-<td class="tg-c3ow">43.9</td>
-</tr>
-<tr>
-<td class="tg-c3ow">RepViT-SAM</td>
-<td class="tg-c3ow"><ins>44.4</ins></td>
-<td class="tg-c3ow">29.1</td>
-<td class="tg-c3ow"><ins>48.6</ins></td>
-<td class="tg-c3ow"><ins>61.4</ins></td>
-<td class="tg-c3ow"><ins>46.1</ins></td>
-</tr>
-</tbody>
-</table>
-
-## Zero-shot video object/instance segmentation
-Comparison results on DAVIS 2017 and UVO.
-
-<table class="tg">
-<thead>
-<tr>
-<th class="tg-c3ow" rowspan="2">Model</th>
-<th class="tg-c3ow" colspan="3">z.s. VOS</th>
-<th class="tg-c3ow">z.s. VIS</th>
-</tr>
-<tr>
-<th class="tg-c3ow">$\mathcal{J\&F}$</th>
-<th class="tg-c3ow">$\mathcal{J}$</th>
-<th class="tg-c3ow">$\mathcal{F}$</th>
-<th class="tg-c3ow">AR100</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tg-c3ow">ViT-H-SAM</td>
-<td class="tg-c3ow"><b>77.4</b></td>
-<td class="tg-c3ow"><b>74.6</b></td>
-<td class="tg-c3ow"><b>80.2</b></td>
-<td class="tg-c3ow"><b>28.8</b></td>
-</tr>
-<tr>
-<td class="tg-c3ow">ViT-B-SAM</td>
-<td class="tg-c3ow">71.3</td>
-<td class="tg-c3ow">68.5</td>
-<td class="tg-c3ow">74.1</td>
-<td class="tg-c3ow">19.1</td>
-</tr>
-<tr>
-<td class="tg-c3ow">MobileSAM</td>
-<td class="tg-c3ow">71.1</td>
-<td class="tg-c3ow">68.6</td>
-<td class="tg-c3ow">73.6</td>
-<td class="tg-c3ow">22.7</td>
-</tr>
-<tr>
-<td class="tg-c3ow">RepViT-SAM</td>
-<td class="tg-c3ow"><ins>73.5</ins></td>
-<td class="tg-c3ow"><ins>71.0</ins></td>
-<td class="tg-c3ow"><ins>76.1</ins></td>
-<td class="tg-c3ow"><ins>25.3</ins></td>
-</tr>
-</tbody>
-</table>
-
-## Zero-shot salient object segmentation
-Comparison results on DUTS.
-## Zero-shot anomaly detection
-Comparison results on MVTec.
-<table class="tg">
-<thead>
-<tr>
-<th class="tg-c3ow" rowspan="2">Model</th>
-<th class="tg-c3ow">z.s. s.o.s.</th>
-<th class="tg-c3ow">z.s. a.d.</th>
-</tr>
-<tr>
-<th class="tg-c3ow">$\mathcal{M}$ $\downarrow$</th>
-<th class="tg-c3ow">$\mathcal{F}_{p}$</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tg-c3ow">ViT-H-SAM</td>
-<td class="tg-c3ow"><b>0.046</b></td>
-<td class="tg-c3ow"><ins>37.65</ins></td>
-</tr>
-<tr>
-<td class="tg-c3ow">ViT-B-SAM</td>
-<td class="tg-c3ow">0.121</td>
-<td class="tg-c3ow">36.62</td>
-</tr>
-<tr>
-<td class="tg-c3ow">MobileSAM</td>
-<td class="tg-c3ow">0.147</td>
-<td class="tg-c3ow">36.44</td>
-</tr>
-<tr>
-<td class="tg-c3ow">RepViT-SAM</td>
-<td class="tg-c3ow"><ins>0.066</ins></td>
-<td class="tg-c3ow"><b>37.96</b></td>
-</tr>
-</tbody>
-</table>
-
## Acknowledgement

The code base is partly built with [SAM](https://github.com/facebookresearch/segment-anything) and [MobileSAM](https://github.com/ChaoningZhang/MobileSAM).

@@ -29,7 +29,8 @@ Segment Anything Model (SAM) has shown impressive zero-shot transfer performance

## Installation
```bash
+git clone https://github.com/THU-MIG/RepViT
+cd sam && pip install -e .
# download pretrained checkpoint
mkdir weights && cd weights
wget https://github.com/THU-MIG/RepViT/releases/download/v1.0/repvit_sam.pt
@@ -42,11 +43,11 @@ python app/app.py
```

## CoreML export
+Please refer to [coreml_example.ipynb](https://github.com/THU-MIG/RepViT/blob/main/sam/notebooks/coreml_example.ipynb)


## Latency comparisons
+Comparison between RepViT-SAM and others in terms of latency. The latency (ms) is measured with the standard resolution of 1024 x 1024 on iPhone 12 and Macbook M1 Pro by Core ML Tools. OOM means out of memory.

<table class="tg">
<thead>
@@ -74,195 +75,6 @@ Comparison between RepViT-SAM and others in terms of latency. The latency (ms) i
</tbody>
</table>

## Acknowledgement

The code base is partly built with [SAM](https://github.com/facebookresearch/segment-anything) and [MobileSAM](https://github.com/ChaoningZhang/MobileSAM).
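
In the updated installation steps, `cd sam` comes directly after `git clone`. `git clone https://github.com/THU-MIG/RepViT` creates a `RepViT/` directory by default, and the CoreML notebook link shows that `sam/` sits at the repository root. As a result, that `cd` only succeeds if you first enter the clone. The minimal sketch below runs the same steps with the extra directory change. It assumes `git`, `pip`, and `wget` are on `PATH`. The `run` helper, the `DRY_RUN` switch, and the `install_repvit_sam` function are hypothetical names added for illustration and are not part of the repository.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: with DRY_RUN=1, print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the README's installation steps.
# The body runs in a subshell ( ... ), so the caller's working directory is unchanged.
install_repvit_sam() (
  run git clone https://github.com/THU-MIG/RepViT
  # git clone creates ./RepViT by default; sam/ lives at the repository root
  run cd RepViT/sam
  run pip install -e .
  # download pretrained checkpoint into sam/weights
  run mkdir weights
  run cd weights
  run wget https://github.com/THU-MIG/RepViT/releases/download/v1.0/repvit_sam.pt
)
```

`DRY_RUN=1 install_repvit_sam` prints the six commands, each prefixed with `+`, without touching the network. Calling `install_repvit_sam` on its own executes them. Because the function body is a subshell, the directory changes apply only inside it.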