John Leimgruber III PRO

ubergarm

https://blog.aifoundry.org/p/adventures-in-model-quantization

AI & ML interests

Open LLMs and Astrophotography image processing.

Recent Activity

New activity in ubergarm/GLM-4.7-GGUF about 17 hours ago

Stable run on 2x RTX 5090 and 2 Xeon E5 2696 V4 and DDR4 with ik_llama.cpp - 6.1 t/s on IQ4_K and 5.1 t/s on IQ5_K, opencode works with this

👍 1

#5 opened 3 months ago by

martossien

updated a model about 17 hours ago

ubergarm/Qwen3.5-397B-A17B-GGUF

Text Generation • 396B • Updated about 17 hours ago • 5.12k • 33

New activity in tarruda/Qwen3.5-397B-A17B-GGUF about 17 hours ago

Great job on this one!

#1 opened about 17 hours ago by

ubergarm

liked 2 models about 17 hours ago

tarruda/Qwen3.5-397B-A17B-GGUF

Text Generation • 396B • Updated 3 days ago • 536 • 1

eousphoros/kappa-20b-131k-GGUF

Text Generation • 21B • Updated Mar 1 • 233 • 6

liked a model 2 days ago

bartowski/google_gemma-3-4b-it-GGUF

Image-Text-to-Text • 4B • Updated Mar 22, 2025 • 28.1k • 34

New activity in ubergarm/GLM-5-GGUF 6 days ago

Unreleased

#6 opened 8 days ago by

jpsequeira

New activity in ubergarm/Qwen3-Coder-Next-GGUF 6 days ago

Improving Qwen3 Coder Next 80b performance on ik_llama vs llama.cpp

👍👀 2

#6 opened 27 days ago by

sabotage3d

New activity in sokann/Qwen3.5-27B-GGUF-4.915bpw 9 days ago

Nice work thanks for more ik_llama.cpp quants!

#1 opened 17 days ago by

ubergarm

liked a model 10 days ago

rodrigomt/s2-pro-gguf

Text-to-Speech • 5B • Updated 11 days ago • 5.43k • 27

New activity in rodrigomt/s2-pro-gguf 10 days ago

I created an API server version of s2.cpp

👍 1

#4 opened 11 days ago by

mach9243

New activity in AesSedai/Qwen3.5-397B-A17B-GGUF 10 days ago

IQ2_XS?

🔥 2

#6 opened 20 days ago by

tarruda

New activity in ubergarm/Qwen3.5-27B-GGUF 10 days ago

Insight into the "weird" data.

130

#3 opened about 1 month ago by

espen96

New activity in ubergarm/Qwen3.5-397B-A17B-GGUF 10 days ago

Qwen3.5-397B-A17B-IQ4_KSS on 8 RTX 3090 context 161K tokens load by ik_llama.cpp , test with opencode

🔥 1

#12 opened 11 days ago by

martossien

New activity in tarruda/Qwen3.5-397B-A17B-heretic-smol-IQ2_XS-GGUF 10 days ago

Any chance of IQ2_XXS? IQ2_XS is just slightly too big for Strix Halo.

#2 opened 11 days ago by

Cortex0833

New activity in ubergarm/MiniMax-M2.5-GGUF 10 days ago

ik_llama.cpp version

#11 opened about 2 months ago by

geveent

New activity in ubergarm/Qwen3.5-27B-GGUF 10 days ago

Appraisal

🔥 1

#6 opened 16 days ago by

wonderfuldestruction

New activity in AesSedai/Mistral-Small-4-119B-2603-GGUF 11 days ago

Mistral-Small-4-119B-2603-Q5_K_M on 8 RTX 3090 with ik_llama.cpp ( compil 21 march 2026 )

❤️🔥 3

#1 opened 12 days ago by

martossien

liked a model 11 days ago

fishaudio/s2-pro

Text-to-Speech • 5B • Updated 22 days ago • 25k • 798

New activity in ubergarm/Qwen3.5-122B-A10B-GGUF 12 days ago

How to split this model between 2 (3) GPUs and CPU/RAM ?

#12 opened 15 days ago by

mancub
