Update README.md
README.md CHANGED
@@ -12,14 +12,14 @@ base_model: Qwen/Qwen2.5-VL-72B-Instruct
 library_name: transformers
 ---
 
-# Qwen2.5-VL-72B-Instruct-quantized-
+# Qwen2.5-VL-72B-Instruct-quantized-w4a16
 
 ## Model Overview
 - **Model Architecture:** Qwen/Qwen2.5-VL-72B-Instruct
   - **Input:** Vision-Text
   - **Output:** Text
 - **Model Optimizations:**
-  - **Weight quantization:**
+  - **Weight quantization:** INT4
   - **Activation quantization:** FP16
 - **Release Date:** 2/24/2025
 - **Version:** 1.0
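The bullets filled in by this hunk pin down the w4a16 scheme: weights stored as INT4 while activations stay in FP16. As a rough, illustrative estimate of why that matters (assuming a round figure of ~72B parameters and ignoring quantization scales, zero-points, and any layers left unquantized, such as the vision tower), the weight footprint shrinks by about 4x:

```
# Back-of-the-envelope weight-memory estimate; numbers are illustrative only.
params = 72e9                       # ~72B parameters (assumed round figure)

fp16_gib = params * 2.0 / 2**30     # 2 bytes per weight in FP16
int4_gib = params * 0.5 / 2**30     # 0.5 bytes per weight in INT4

print(f"FP16 weights: ~{fp16_gib:.0f} GiB")  # ~134 GiB
print(f"INT4 weights: ~{int4_gib:.0f} GiB")  # ~34 GiB
```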
@@ -29,7 +29,7 @@ Quantized version of [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/
 
 ### Model Optimizations
 
-This model was obtained by quantizing the weights of [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) to
+This model was obtained by quantizing the weights of [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) to INT4 data type, ready for inference with vLLM >= 0.5.2.
 
 ## Deployment
 
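This hunk completes the optimizations sentence: the weights are quantized to INT4 and the checkpoint targets vLLM. As a minimal sketch of the kind of offline inference the Deployment section covers (not copied from this README; it assumes a recent vLLM build whose `LLM.chat` API accepts OpenAI-style multimodal messages, and a hypothetical image URL), loading the model might look like:

```
from vllm import LLM, SamplingParams

# Model name and engine limits mirror the `vllm serve` command further down;
# everything else in this sketch is an assumption, not taken from the README.
llm = LLM(
    model="RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w4a16",
    max_model_len=25000,
    tensor_parallel_size=1,
    limit_mm_per_prompt={"image": 7},
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},  # hypothetical image
        {"type": "text", "text": "Describe this chart."},
    ],
}]

outputs = llm.chat(messages, SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```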
@@ -203,10 +203,10 @@ The model was evaluated using [mistral-evals](https://github.com/neuralmagic/mis
 - chartqa
 
 ```
-vllm serve
+vllm serve RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w4a16 --tensor_parallel_size 1 --max_model_len 25000 --trust_remote_code --max_num_seqs 8 --gpu_memory_utilization 0.9 --dtype float16 --limit_mm_per_prompt image=7
 
 python -m eval.run eval_vllm \
-  --model_name
+  --model_name RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w4a16 \
   --url http://0.0.0.0:8000 \
   --output_dir ~/tmp \
   --eval_name <vision_task_name>
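The filled-in commands in this hunk start an OpenAI-compatible vLLM server and point mistral-evals at it; `<vision_task_name>` is a placeholder for one of the listed evaluation tasks, such as chartqa. Once that server is up it can also be queried directly. The sketch below assumes the standard `/v1` route on port 8000 and the `openai` Python client, neither of which is spelled out in the README itself:

```
from openai import OpenAI

# Talk to the server started by `vllm serve` above (default OpenAI-compatible /v1 route).
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w4a16",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},  # hypothetical image
            {"type": "text", "text": "What does this chart show?"},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```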