danielhanchen 
posted an update 20 days ago

Fits almost perfectly into an A6000!

·

Hopefully it runs fast for you! :)

I run it on a Threadripper 3970X with 256 GB of system RAM, offloading compute layers to a GTX 1660 with 6 GB VRAM. Using llama.cpp with -nkvo -kvu and all MoE layers on CPU. Amazing generation speed of 14 tok/s using q8_0. I'm amazed
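The setup described above (KV cache kept in system RAM, MoE expert tensors pinned to CPU, only the small dense layers on the 6 GB GPU) might look roughly like the following llama.cpp invocation. This is a hedged sketch, not the commenter's exact command: the model filename is a placeholder, and the `-ot` regex and flag availability depend on your llama.cpp build.

```shell
# Sketch of a llama.cpp run matching the commenter's description.
# -nkvo  (--no-kv-offload): keep the KV cache in system RAM, not VRAM
# -kvu   (--kv-unified):    use a unified KV cache buffer
# -ngl 99:                  offload as many dense layers as fit on the GPU
# -ot ".ffn_.*_exps.=CPU":  pin all MoE expert tensors to the CPU
# model-q8_0.gguf is a placeholder path for the q8_0 quant.
./llama-cli \
  -m model-q8_0.gguf \
  -nkvo \
  -kvu \
  -ngl 99 \
  -ot ".ffn_.*_exps.=CPU" \
  -p "Hello"
```

Keeping the experts on CPU is what makes this work on a 6 GB card: in a MoE model the expert weights dominate the parameter count but only a few experts are active per token, so CPU RAM bandwidth becomes the bottleneck rather than VRAM capacity.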

·

Awesome to hear, thanks for trying them out!

Awesome!

Too bad that I can't use it. With 8 GB VRAM and 24 GB DDR5 RAM, I might be able to run it at an unusable speed with Q1, but at that point I'm better off using GLM 4.7 Flash.

Excited for a Qwen 3.5 coding or general MoE model in the 30-40B range.