---
license: other
license_name: adelaidet-non-commercial
license_link: https://github.com/zbwxp/SegVit/blob/master/LICENSE
---
# Official PyTorch Implementation of SegViT [[code]](https://github.com/zbwxp/SegVit)

### SegViT: Semantic Segmentation with Plain Vision Transformers

Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan.

NeurIPS 2022. [[paper]](https://arxiv.org/abs/2210.05844)

### SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

Bowen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen and Yifan Liu.

IJCV 2023. [[paper]](https://arxiv.org/abs/2306.06289) [we are refactoring the code for release ...]

This repository contains the official PyTorch implementation of the training & evaluation code and the pretrained models for SegViT and its extended version, SegViT v2.
## Highlights

* **Simple Decoder:** The Attention-to-Mask (ATM) decoder provides a simple segmentation head for plain Vision Transformers and is easy to extend to other downstream tasks.
* **Light Structure:** We propose the *Shrunk* structure, which saves up to **40%** of the computational cost of a ViT-backbone model.
* **Stronger Performance:** We achieve state-of-the-art performance of mIoU **55.2%** on ADE20K, mIoU **50.3%** on COCOStuff10K, and mIoU **65.3%** on PASCAL-Context, with the lowest computational cost among counterparts using a ViT backbone.
* **Scalability:** SegViT v2 employs more powerful backbones (BEiT-v2) and obtains state-of-the-art performance of mIoU **58.2%** (MS) on ADE20K, mIoU **53.5%** (MS) on COCOStuff10K, and mIoU **67.14%** (MS) on PASCAL-Context, showcasing strong scalability.
* **Continual Learning:** We adapt SegViT v2 for continual semantic segmentation, demonstrating nearly zero forgetting of previously learned knowledge.

As shown in the figures below, the similarity between the class queries and the image features is transferred to the segmentation mask.

<img src="./resources/v2_figure_1.png">
<img src="./resources/teaser-01.png">
<img src="resources/atm_arch-1.png">

## Getting started

1. Install the [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) library and the required packages:

```bash
pip install mmcv-full==1.4.4 mmsegmentation==0.24.0
pip install scipy timm
```
## Training
```bash
bash tools/dist_train.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {num_gpus}
```
## Evaluation
```bash
bash tools/dist_test.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {path_to_ckpt} {num_gpus} --eval mIoU
```
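Evaluation reports mIoU, the mean over classes of intersection-over-union computed from a confusion matrix. As a reference for what that number means, here is a minimal generic sketch (not the mmsegmentation implementation, which also handles ignore labels and classes absent from the ground truth):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU from flat integer label arrays (no ignore-label handling)."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, g in zip(pred.ravel(), gt.ravel()):
        conf[g, p] += 1                      # rows: ground truth, cols: prediction
    inter = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - np.diag(conf)
    ious = inter / np.maximum(union, 1)      # guard against empty classes
    return ious.mean()

pred = np.array([0, 0, 1, 1])
gt = np.array([0, 1, 1, 1])
score = mean_iou(pred, gt, num_classes=2)    # per-class IoUs 1/2 and 2/3 -> mean 7/12
```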

## Datasets
Please follow the [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) data preparation instructions.
## Results

| Model backbone | Dataset | mIoU | mIoU (ms) | GFLOPs | ckpt |
| --- | --- | --- | --- | --- | --- |
| ViT-Base | ADE20K | 51.3 | 53.0 | 120.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_51.3.pth) |
| ViT-Large (Shrunk) | ADE20K | 53.9 | 55.1 | 373.5 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_shrunk_53.9.pth) |
| ViT-Large | ADE20K | 54.6 | 55.2 | 637.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_54.6.pth) |
| ViT-Large (Shrunk) | COCOStuff10K | 49.1 | 49.4 | 224.8 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/COCOstuff10k_shrunk_49.1.pth) |
| ViT-Large | COCOStuff10K | 49.9 | 50.3 | 383.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/COCOstuff_49.9.pth) |
| ViT-Large (Shrunk) | PASCAL-Context (59 cls) | 62.3 | 63.7 | 186.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/PC59cls_62.3.pth) |
| ViT-Large | PASCAL-Context (59 cls) | 64.1 | 65.3 | 321.6 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/PC59cls_64.1.pth) |

## License
For academic use, this project is licensed under the 2-clause BSD License - see the LICENSE file for details. For commercial use, please contact the authors.
## Citation
```bibtex
@article{zhang2022segvit,
  title={SegViT: Semantic Segmentation with Plain Vision Transformers},
  author={Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan},
  journal={NeurIPS},
  year={2022}
}

@article{zhang2023segvitv2,
  title={SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers},
  author={Zhang, Bowen and Liu, Liyang and Phan, Minh Hieu and Tian, Zhi and Shen, Chunhua and Liu, Yifan},
  journal={IJCV},
  year={2023}
}
```