Models - MoE
Paper
• 2401.04088
• Published
• 160
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper
• 2401.15947
• Published
• 53
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper
• 2401.04081
• Published
• 74
EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
Paper
• 2308.14352
• Published
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
Paper
• 2308.06512
• Published
• 2
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Paper
• 2308.06093
• Published
• 2
ConstitutionalExperts: Training a Mixture of Principle-based Prompts
Paper
• 2403.04894
• Published
• 2
Video Relationship Detection Using Mixture of Experts
Paper
• 2403.03994
• Published
• 2
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
Paper
• 2402.02526
• Published
• 3
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Paper
• 2006.16668
• Published
• 4
Scaling Vision with Sparse Mixture of Experts
Paper
• 2106.05974
• Published
• 4
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper
• 2402.01739
• Published
• 28
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Paper
• 2202.08906
• Published
• 3
LocMoE: A Low-overhead MoE for Large Language Model Training
Paper
• 2401.13920
• Published
• 2
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Paper
• 2201.05596
• Published
• 2
Scaling Laws for Fine-Grained Mixture of Experts
Paper
• 2402.07871
• Published
• 13
Text Generation
• Updated
• 145
• 2.39k
AetherResearch/Cerebrum-1.0-8x7b
Text Generation
• 47B • Updated
• 7
• 77
Text Generation
• Updated
• 865
• 1.19k
Text Generation
• 69.5M • Updated
• 56
• 29
LoneStriker/Mixtral_7Bx5_MoE_30B-8.0bpw-h8-exl2
Text Generation
• Updated
• 1
• 1
LoneStriker/laser-dolphin-mixtral-2x7b-dpo-8.0bpw-h8-exl2
Text Generation
• Updated
• 2
• 4
LoneStriker/Mixtral_7Bx5_MoE_30B-6.0bpw-h6-exl2
Text Generation
• Updated
• 3
• 1
Text Generation
• 9B • Updated
• 3.02k
• 251
mistralai/Mixtral-8x22B-Instruct-v0.1
141B • Updated
• 14.1k
• 746