I'm unemployed, I have a gaming GPU, and I just published a German LLM.

qwen3-0.6b-german: Qwen3-0.6B fine-tuned in ~40 h on an RTX 4070 Ti, using the exact same instruct datasets as the LLäMmlein paper (ACL 2025).

HellaSwag-DE: 0.3111 → 0.3193 ✅
ARC-DE: 0.2352 → 0.2575 ✅
MMLU-DE: 0.3600 → 0.2475 🔻 (alignment tax, a known trade-off)

Instruction fine-tuning trades some factual breadth for better reasoning and format following. The model is more useful, even if not better on every metric.

Weights, LoRA adapter, full training script, and logs are all public: philipp-zettl/qwen3-0.6b-german

It ain't much, but it's honest work.
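The "better, even if not on every metric" claim rests on the three reported deltas above. A quick sketch that recomputes them from the posted numbers (the scores are from the post; the script itself is my illustration, not part of the published training code):

```python
# Recompute the benchmark deltas reported in the post.
# Scores are copied from the post; two metrics improve, MMLU-DE drops
# (the "alignment tax" the author mentions).
base = {"HellaSwag-DE": 0.3111, "ARC-DE": 0.2352, "MMLU-DE": 0.3600}
tuned = {"HellaSwag-DE": 0.3193, "ARC-DE": 0.2575, "MMLU-DE": 0.2475}

for name in base:
    delta = tuned[name] - base[name]
    print(f"{name}: {base[name]:.4f} -> {tuned[name]:.4f} ({delta:+.4f})")
```

The net picture: small gains on the two reasoning benchmarks, a larger drop on the knowledge-heavy one, which is the usual shape of an instruction-tuning trade-off.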
CV datasets
LSIbabnikz/lfw • Viewer • Updated Dec 10, 2025 • 13.2k • 735 • 1
SaffalPoosh/casia_web_face • Viewer • Updated May 15, 2025 • 491k • 214 • 1
Diffusion Language Models
Experimental diffusion-style MLM built on top of ModernBERT. Inspired by https://nathan.rs/posts/roberta-diffusion/
philipp-zettl/modernbert-diffusion-instruct • Fill-Mask • 0.1B • Updated Feb 6
philipp-zettl/modernbert-diffusion-code • Fill-Mask • 0.1B • Updated Feb 7
philipp-zettl/modernbert-diffusion-universal • Fill-Mask • 0.1B • Updated Feb 16
philipp-zettl/modernbert-diffusion-alpaca-ft • Fill-Mask • 0.1B • Updated Feb 11