This model keeps producing occasional nonsense sequences
Thank you for the information. Would you mind sharing the serving command and the evaluation prompts, so we can use them to check model quality when producing a new quantized version?
The issue is tracked here: https://github.com/intel/auto-round/issues/1480
/root/miniconda3/envs/vllm-glm-int4/bin/python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_ID \
    --served-model-name claude-opus-4-6 \
    --port 80 \
    --trust-remote-code \
    --max-model-len 202752 \
    --tensor-parallel-size 8 \
    --gpu-memory-utilization 0.85 \
    --tool-call-parser glm47 \
    --reasoning-parser glm45 \
    --enable-auto-tool-choice \
    --max-num-seqs 16
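For the evaluation side, a minimal sketch of a client payload for this OpenAI-compatible endpoint might look like the following. The helper name and the prompt text are illustrative assumptions, not the actual evaluation prompts:

```python
import json

def build_chat_request(prompt: str, model: str = "claude-opus-4-6",
                       max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload for the
    vLLM server launched above. The prompt text is only an example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        # Deterministic decoding makes quality regressions easier to compare
        # across quantized versions.
        "temperature": 0.0,
    }

payload = build_chat_request("Summarize the benefits of INT4 quantization.")
print(json.dumps(payload, indent=2))
# POST this to http://localhost:80/v1/chat/completions with any HTTP client.
```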
Could you share some text inputs to reproduce this issue?
It is difficult to reproduce; I use the model through Claude Code, but cases like this show up often.
Please expect a delayed fix, since our server is currently very busy and I typically do not have enough resources to verify such a large model.
ព is one of the letters of the Khmer alphabet.
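One cheap screen for this failure mode is to flag generated characters outside the expected scripts. A minimal sketch (the helper name is an assumption) that checks for code points in the Khmer Unicode block, U+1780..U+17FF:

```python
def contains_khmer(text: str) -> bool:
    """Return True if any character falls in the Khmer Unicode block
    (U+1780..U+17FF), which should not appear in English output."""
    return any(0x1780 <= ord(ch) <= 0x17FF for ch in text)

print(contains_khmer("hello world"))  # False
print(contains_khmer("ព"))            # True
```

A script-based check like this can be run over sampled completions to count how often the quantized model drifts into unexpected alphabets.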
We’re working on a fix for this issue. An updated model will be uploaded within about one day. Since the model is too large for thorough testing, we’ve adjusted two factors to mitigate the problem:
1. Changed the model dtype from FP16 to BF16. FP16 can overflow, but it was previously the only option during quantization.
2. Reduced the group size from 128 to 64.
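To illustrate why a smaller group size can help, here is a minimal sketch of group-wise symmetric round-to-nearest INT4 quantization (illustrative only, not auto-round's actual algorithm): each group of weights shares one scale, so halving the group size doubles the number of scales and usually lowers the reconstruction error.

```python
def quantize_group(values, num_bits=4):
    """Quantize one group of weights with a single shared scale."""
    qmax = 2 ** (num_bits - 1) - 1          # 7 for signed int4
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) for v in values], scale

def dequantize(weights, group_size=64, num_bits=4):
    """Quantize then dequantize a flat weight list group by group,
    so the round-trip error can be measured against the original."""
    out = []
    for i in range(0, len(weights), group_size):
        q, scale = quantize_group(weights[i:i + group_size], num_bits)
        out.extend(v * scale for v in q)
    return out

weights = [0.01 * i for i in range(128)]
err = lambda gs: max(abs(a - b) for a, b in zip(weights, dequantize(weights, gs)))
print(err(64), err(128))  # smaller group size gives error no worse than larger
```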

