theprint-MoE-8x3-0126

An 18B-parameter Mixture of Experts (MoE) model combining 8 specialized 3B experts, with 2 experts activated per token by default (configurable up to 4 at inference).

Architecture

  • Base model: theprint/GeneralChat-Llama3.2-3B (provides shared attention layers)
  • Total parameters: ~18B
  • Active parameters: ~5B (2 experts) or ~9B (4 experts)
  • Gate mode: Hidden (prompt-based router initialization)
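
As an illustrative sketch only: mergekit MoE merges of this kind typically expose a Mixtral-style configuration, in which case the expert counts above can be read directly from the model config with transformers. The repo id and config field names below are assumptions and may differ.

```python
# Sketch: inspect the MoE configuration with Hugging Face transformers.
# Assumes Mixtral-style config fields (num_local_experts, num_experts_per_tok);
# the exact field names depend on how the merge was exported.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("theprint/theprint-MoE-8x3-0126")
print(config.num_local_experts)    # expected: 8 experts in total
print(config.num_experts_per_tok)  # expected: 2 experts active per token
```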

Experts

Each expert is listed with its area of specialization:

  • LLM-Data-Science-Llama3.2-3B: Machine learning, neural networks, fine-tuning
  • CreativeWriter-Llama3.2-3B: Fiction writing, story structure, scene development
  • Pythonified-Llama-3.2-3B-Instruct: Python coding, debugging, implementation
  • CogBeTh-Llama3.2-3B: Mental health support, anxiety, stress, self-care
  • ReWiz-Llama-3.2-3B: Step-by-step reasoning, careful analysis
  • ReasonableMath-Llama-3.2-3B-Instruct: Calculation, equations, arithmetic
  • TextSynth-3B: Summarization, text analysis, rewriting
  • PersonalFinance-Llama3.2-3B: Budgeting, investing, financial planning

Details

All experts are fine-tuned from Llama 3.2 3B Instruct, ensuring architectural compatibility across the MoE. The router was initialized using hidden-state representations from domain-specific prompts. Built with mergekit. The merged weights are distributed as BF16 safetensors.

Usage

Available quants in the GGUF repo: f16, q8_0, q6_k, q5_k_m, q4_k_m, q4_0, q4_1, iq4_nl, iq4_xs, q3_k_l, q3_k_m, q3_k_s, q2_k
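
For a quick local test of one of the GGUF files, here is a minimal sketch using the llama-cpp-python bindings. The filename is hypothetical; substitute whichever quant you downloaded.

```python
# Sketch: run a quantized GGUF file locally via llama-cpp-python.
# The model_path below is an assumed local filename, not an official artifact name.
from llama_cpp import Llama

llm = Llama(
    model_path="theprint-MoE-8x3-0126.q4_k_m.gguf",  # assumed filename
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm("Explain what a Mixture of Experts model is.", max_tokens=256)
print(out["choices"][0]["text"])
```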

For inference with more than 2 active experts, adjust num_experts_per_tok in your inference backend.
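
A minimal sketch with transformers, assuming the merge uses a Mixtral-style num_experts_per_tok config attribute as the note above suggests; passing the override to from_pretrained applies it at load time.

```python
# Sketch: load the full-precision model with 4 experts active per token.
# Assumes a Mixtral-style architecture where num_experts_per_tok is a config
# attribute; the keyword override updates the config when the model is loaded.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "theprint/theprint-MoE-8x3-0126"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    num_experts_per_tok=4,  # default is 2; the card states up to 4 is supported
)

inputs = tokenizer("Write a haiku about routing tokens.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```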
