Featured: Distilling 100B+ Models 40x Faster with TRL 📝 TRL distillation for 100B+ teachers, 40x faster • 84
Marco-MoE Collection: A suite of multilingual MoE models with highly sparse architectures • 5 items • Updated Apr 8 • 17
Article: Welcome Gemma 4: Frontier multimodal intelligence on device • merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift +5 • Apr 2 • 895