HyperCLOVAX-SEED-Think-32B AWQ (W4A16)

This is the naver-hyperclovax/HyperCLOVAX-SEED-Think-32B model quantized to 4-bit with AWQ. It was produced with compressed-tensors==0.13.0.

Model Details

Attribute            Value
-------------------  --------------------------------------------
Base Model           naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
Architecture         HCXVisionForCausalLM (VLM)
Quantization         AWQ (W4A16)
Bits                 4-bit weights, 16-bit activations
Calibration Dataset  ChuGyouk/Asan-AMC-Healthinfo
Quantization Tool    llmcompressor

Quantization Config

from llmcompressor.modifiers.awq import AWQMapping, AWQModifier

# Custom AWQ mappings matching the HyperCLOVAX VLM module names
hyperclovax_mappings = [
    AWQMapping(
        smooth_layer="re:.*language_model.*layers\\.\\d+\\.input_layernorm$",
        balance_layers=[
            "re:.*language_model.*layers\\.\\d+\\.self_attn\\.q_proj$",
            "re:.*language_model.*layers\\.\\d+\\.self_attn\\.k_proj$",
            "re:.*language_model.*layers\\.\\d+\\.self_attn\\.v_proj$",
        ],
    ),
    AWQMapping(
        smooth_layer="re:.*language_model.*layers\\.\\d+\\.post_attention_layernorm$",
        balance_layers=[
            "re:.*language_model.*layers\\.\\d+\\.mlp\\.gate_proj$",
            "re:.*language_model.*layers\\.\\d+\\.mlp\\.up_proj$",
        ],
    ),
    AWQMapping(
        smooth_layer="re:.*language_model.*layers\\.\\d+\\.mlp\\.up_proj$",
        balance_layers=["re:.*language_model.*layers\\.\\d+\\.mlp\\.down_proj$"],
    ),
]

recipe = AWQModifier(
    ignore=["lm_head", "re:.*vision_model.*", "re:.*visual.*"],
    scheme="W4A16",
    targets=["Linear"],
    mappings=hyperclovax_mappings,
)
  • lm_head: the output (logits) layer is excluded from quantization
  • vision_model, visual: the vision tower is excluded from quantization
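
For reference, a recipe like this would be applied to the base model through llmcompressor's oneshot entry point (pip install llmcompressor). The sketch below is an illustration under assumptions, not the exact script used to produce this checkpoint: it assumes a recent llmcompressor that exports oneshot at the top level, and the sequence length, sample count, and dataset preprocessing (elided here) are assumed values not stated on this card.

from datasets import load_dataset
from transformers import AutoModelForCausalLM
from llmcompressor import oneshot

# Load the base VLM in 16-bit before quantization.
model = AutoModelForCausalLM.from_pretrained(
    "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Calibration data listed on this card (text preprocessing elided).
ds = load_dataset("ChuGyouk/Asan-AMC-Healthinfo", split="train")

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,                # the AWQModifier defined above
    max_seq_length=2048,          # assumption; not stated on this card
    num_calibration_samples=256,  # assumption; not stated on this card
    output_dir="HyperCLOVAX-SEED-Think-32B-awq-w4a16",
)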

Installation

pip install compressed-tensors==0.13.0

Installing the pinned version above is recommended for compatibility.

Usage

With vLLM (Recommended)

from vllm import LLM, SamplingParams

model = LLM(
    model="NotoriousH2/HyperCLOVAX-SEED-Think-32B-awq-w4a16",
    trust_remote_code=True,
)
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)

prompt = "고혈압 환자의 식이요법에 대해 설명해주세요."  # "Please explain dietary therapy for hypertension patients."
output = model.generate([prompt], sampling_params)
print(output[0].outputs[0].text)
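
The quantized checkpoint can also be exposed over vLLM's OpenAI-compatible API with the standard vllm serve CLI; a minimal sketch, leaving the port and other server settings at their defaults:

vllm serve NotoriousH2/HyperCLOVAX-SEED-Think-32B-awq-w4a16 --trust-remote-code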

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "NotoriousH2/HyperCLOVAX-SEED-Think-32B-awq-w4a16",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "NotoriousH2/HyperCLOVAX-SEED-Think-32B-awq-w4a16",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "고혈압 환자의 식이요법에 대해 설명해주세요."}]  # "Please explain dietary therapy for hypertension patients."
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
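
For interactive use, output can be streamed token by token with Transformers' TextStreamer, reusing the model, tokenizer, and input_ids from above; a minimal sketch:

from transformers import TextStreamer

# Prints tokens to stdout as they are generated, skipping the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(input_ids, max_new_tokens=512, streamer=streamer)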

License

This model inherits the license from the base model. Please refer to naver-hyperclovax/HyperCLOVAX-SEED-Think-32B for license details.
