# theprint-MoE-8x3-0126
An 18B-parameter Mixture-of-Experts (MoE) model combining eight specialized 3B experts, with two experts activated per token by default (configurable up to four at inference time).
## Architecture
- Base model: theprint/GeneralChat-Llama3.2-3B (provides shared attention layers)
- Total parameters: ~18B
- Active parameters: ~5B (2 experts) or ~9B (4 experts)
- Gate mode: Hidden (prompt-based router initialization)
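The card does not state the shared/per-expert split directly, but the rounded figures above pin it down approximately. A quick back-of-envelope (our own arithmetic from the listed totals, not from the card):

```python
# Back-of-envelope: split the card's rounded totals into shared vs.
# per-expert parameters. Inputs are approximate, so results are rough.
total_params = 18.0   # ~18B total (shared layers + 8 expert MLP stacks)
active_2 = 5.0        # ~5B active with 2 experts
n_experts = 8

# Solve the pair:  shared + 8*expert = 18  and  shared + 2*expert = 5
per_expert = (total_params - active_2) / (n_experts - 2)  # ~2.17B unique per expert
shared = active_2 - 2 * per_expert                        # ~0.67B shared (attention, embeddings)

active_4 = shared + 4 * per_expert                        # ~9.3B, consistent with the ~9B figure
print(f"per-expert ~{per_expert:.2f}B, shared ~{shared:.2f}B, 4-expert active ~{active_4:.2f}B")
```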
## Experts
| Expert | Specialization |
|---|---|
| LLM-Data-Science-Llama3.2-3B | Machine learning, neural networks, fine-tuning |
| CreativeWriter-Llama3.2-3B | Fiction writing, story structure, scene development |
| Pythonified-Llama-3.2-3B-Instruct | Python coding, debugging, implementation |
| CogBeTh-Llama3.2-3B | Mental health support, anxiety, stress, self-care |
| ReWiz-Llama-3.2-3B | Step-by-step reasoning, careful analysis |
| ReasonableMath-Llama-3.2-3B-Instruct | Calculation, equations, arithmetic |
| TextSynth-3B | Summarization, text analysis, rewriting |
| PersonalFinance-Llama3.2-3B | Budgeting, investing, financial planning |
## Details
All experts are fine-tuned from Llama 3.2 3B Instruct, ensuring architectural compatibility across the MoE. The router was initialized using hidden state representations from domain-specific prompts. Built with mergekit.
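A mergekit-moe configuration of this shape might look like the sketch below. This is illustrative only: the actual config was not published, so the repo paths and prompts here are guesses based on the expert table above.

```yaml
base_model: theprint/GeneralChat-Llama3.2-3B
gate_mode: hidden   # router initialized from hidden states of the positive prompts
dtype: bfloat16
experts:
  - source_model: theprint/Pythonified-Llama-3.2-3B-Instruct  # path assumed
    positive_prompts:
      - "Write a Python function to"
      - "Debug this code"
  - source_model: theprint/CreativeWriter-Llama3.2-3B  # path assumed
    positive_prompts:
      - "Write a short story about"
  # ...remaining six experts follow the same pattern
```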
## Usage
Available quants in the GGUF repo: f16, q8_0, q6_k, q5_k_m, q4_k_m, q4_0, q4_1, iq4_nl, iq4_xs, q3_k_l, q3_k_m, q3_k_s, q2_k
For inference with more than 2 active experts, adjust `num_experts_per_tok` in your inference backend.
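What `num_experts_per_tok` controls can be sketched with a toy top-k router (a minimal NumPy illustration of standard top-k softmax gating, not this model's actual code):

```python
import numpy as np

def moe_route(router_logits, num_experts_per_tok=2):
    """Toy top-k MoE routing for one token: pick the k highest-scoring
    experts and renormalize their softmax weights. Illustrative only."""
    probs = np.exp(router_logits - router_logits.max())
    probs /= probs.sum()
    topk = np.argsort(probs)[-num_experts_per_tok:][::-1]  # expert indices, best first
    weights = probs[topk] / probs[topk].sum()              # renormalized mixing weights
    return topk, weights

logits = np.array([2.0, 0.5, 1.5, -1.0, 0.0, 0.3, 1.0, -0.5])  # one token, 8 experts
experts, w = moe_route(logits, num_experts_per_tok=2)
print(experts, w)            # top-2 routing: only two experts contribute
experts4, w4 = moe_route(logits, num_experts_per_tok=4)
print(experts4)              # raising k activates more experts per token
```

Raising `num_experts_per_tok` trades throughput for quality: more expert MLPs run per token, so active parameters (and compute) grow roughly linearly with k.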