Ruslan's picture

Ruslan

uzvisa

·

AI & ML interests

None yet

Recent Activity

reacted to eaddario's post with 👍 about 11 hours ago

Experimental global target bits‑per‑weight quantization of Qwen/Qwen3.6-27B and Qwen/Qwen3.6-35B-A3B. Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters the most, and produces high quality models that meet a precise global file size target. Key Advantages: - VRAM Maximization: Can generate high quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24GB VRAM). - Data-Driven Precision: Quantization mix is determined by actual weight error sensitivity rather than hardcoded rules, often yielding better PPL/KLD size trade-offs. Full benchmarks (PPL, KLD, ARC, GPQA, MMLU, etc.) and methodology in the models' cards. https://huggingface.co/eaddario/Qwen3.6-27B-GGUF https://huggingface.co/eaddario/Qwen3.6-35B-A3B-GGUF

reacted to eaddario's post with 🔥 about 11 hours ago

Experimental global target bits‑per‑weight quantization of Qwen/Qwen3.6-27B and Qwen/Qwen3.6-35B-A3B. Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters the most, and produces high quality models that meet a precise global file size target. Key Advantages: - VRAM Maximization: Can generate high quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24GB VRAM). - Data-Driven Precision: Quantization mix is determined by actual weight error sensitivity rather than hardcoded rules, often yielding better PPL/KLD size trade-offs. Full benchmarks (PPL, KLD, ARC, GPQA, MMLU, etc.) and methodology in the models' cards. https://huggingface.co/eaddario/Qwen3.6-27B-GGUF https://huggingface.co/eaddario/Qwen3.6-35B-A3B-GGUF

updated a collection 24 days ago

View all activity

Organizations

None yet

uzvisa 's datasets

None public yet