---
license: other
license_name: adelaidet-non-commercial
license_link: https://github.com/zbwxp/SegVit/blob/master/LICENSE
---
# Official PyTorch Implementation of SegViT [[code]](https://github.com/zbwxp/SegVit)

### SegViT: Semantic Segmentation with Plain Vision Transformers

Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan.

NeurIPS 2022. [[paper]](https://arxiv.org/abs/2210.05844)

### SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

Bowen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen and Yifan Liu.

IJCV 2023. [[paper]](https://arxiv.org/abs/2306.06289) [we are refactoring the code for release ...]

This repository contains the official PyTorch implementation of the training & evaluation code and the pretrained models for SegViT and its extended version, SegViT v2.
## Highlights

* **Simple Decoder:** The Attention-to-Mask (ATM) decoder provides a simple segmentation head for plain Vision Transformers and is easy to extend to other downstream tasks.
* **Light Structure:** We propose the *Shrunk* structure, which saves up to **40%** of the computational cost of a ViT-backbone model.
* **Stronger Performance:** We achieve state-of-the-art performance of mIoU **55.2%** on ADE20K, mIoU **50.3%** on COCOStuff10K, and mIoU **65.3%** on PASCAL-Context, with the lowest computational cost among counterparts using a ViT backbone.
* **Scalability:** SegViT v2 employs more powerful backbones (BEiT-v2) and obtains state-of-the-art performance of mIoU **58.2%** (MS) on ADE20K, mIoU **53.5%** (MS) on COCOStuff10K, and mIoU **67.14%** (MS) on PASCAL-Context, showcasing strong scalability.
* **Continual Learning:** We adapt SegViT v2 for continual semantic segmentation, demonstrating nearly zero forgetting of previously learned knowledge.

As shown in the figures below, the similarity between the class queries and the image features is transferred to the segmentation mask.

<img src="./resources/v2_figure_1.png">
<img src="./resources/teaser-01.png">
<img src="resources/atm_arch-1.png">

## Getting started

1. Install the [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) library and the required packages:

```bash
pip install mmcv-full==1.4.4 mmsegmentation==0.24.0
pip install scipy timm
```
## Training
```bash
bash tools/dist_train.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {num_gpus}
```
## Evaluation
```bash
bash tools/dist_test.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {path_to_ckpt} {num_gpus} --eval mIoU
```
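Evaluation reports mIoU, the mean over classes of intersection-over-union computed from a confusion matrix. As a reference for what that number means, here is a minimal generic sketch (not the mmsegmentation implementation, which also handles ignore labels and classes absent from the ground truth):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU from flat integer label arrays (no ignore-label handling)."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, g in zip(pred.ravel(), gt.ravel()):
        conf[g, p] += 1                      # rows: ground truth, cols: prediction
    inter = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - np.diag(conf)
    ious = inter / np.maximum(union, 1)      # guard against empty classes
    return ious.mean()

pred = np.array([0, 0, 1, 1])
gt = np.array([0, 1, 1, 1])
score = mean_iou(pred, gt, num_classes=2)    # per-class IoUs 1/2 and 2/3 -> mean 7/12
```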

## Datasets
Please follow the [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) data preparation instructions.
## Results

| Model backbone | Dataset | mIoU | mIoU (ms) | GFLOPs | ckpt |
| --- | --- | --- | --- | --- | --- |
| ViT-Base | ADE20K | 51.3 | 53.0 | 120.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_51.3.pth) |
| ViT-Large (Shrunk) | ADE20K | 53.9 | 55.1 | 373.5 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_shrunk_53.9.pth) |
| ViT-Large | ADE20K | 54.6 | 55.2 | 637.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/ade_54.6.pth) |
| ViT-Large (Shrunk) | COCOStuff10K | 49.1 | 49.4 | 224.8 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/COCOstuff10k_shrunk_49.1.pth) |
| ViT-Large | COCOStuff10K | 49.9 | 50.3 | 383.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/COCOstuff_49.9.pth) |
| ViT-Large (Shrunk) | PASCAL-Context (59 cls) | 62.3 | 63.7 | 186.9 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/PC59cls_62.3.pth) |
| ViT-Large | PASCAL-Context (59 cls) | 64.1 | 65.3 | 321.6 | [model](https://huggingface.co/Akide/SegViTv1/blob/main/PC59cls_64.1.pth) |

## License
For academic use, this project is licensed under the 2-clause BSD License - see the LICENSE file for details. For commercial use, please contact the authors.
## Citation
```bibtex
@article{zhang2022segvit,
  title={SegViT: Semantic Segmentation with Plain Vision Transformers},
  author={Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan},
  journal={NeurIPS},
  year={2022}
}

@article{zhang2023segvitv2,
  title={SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers},
  author={Zhang, Bowen and Liu, Liyang and Phan, Minh Hieu and Tian, Zhi and Shen, Chunhua and Liu, Yifan},
  journal={IJCV},
  year={2023}
}
```