theprint-MoE-8x3-0126

An 18B-parameter Mixture of Experts (MoE) model combining 8 specialized 3B experts, with 2 experts activated per token by default (configurable up to 4 at inference).

Architecture

  • Base model: theprint/GeneralChat-Llama3.2-3B (provides shared attention layers)
  • Total parameters: ~18B
  • Active parameters: ~5B (2 experts) or ~9B (4 experts)
  • Gate mode: Hidden (prompt-based router initialization)
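
As an illustrative sketch only: mergekit MoE merges of this kind typically expose a Mixtral-style configuration, in which case the expert counts above can be read directly from the model config with transformers. The repo id and config field names below are assumptions and may differ.

```python
# Sketch: inspect the MoE configuration with Hugging Face transformers.
# Assumes Mixtral-style config fields (num_local_experts, num_experts_per_tok);
# the exact field names depend on how the merge was exported.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("theprint/theprint-MoE-8x3-0126")
print(config.num_local_experts)    # expected: 8 experts in total
print(config.num_experts_per_tok)  # expected: 2 experts active per token
```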

Experts

Each expert is listed with its area of specialization:

  • LLM-Data-Science-Llama3.2-3B: Machine learning, neural networks, fine-tuning
  • CreativeWriter-Llama3.2-3B: Fiction writing, story structure, scene development
  • Pythonified-Llama-3.2-3B-Instruct: Python coding, debugging, implementation
  • CogBeTh-Llama3.2-3B: Mental health support, anxiety, stress, self-care
  • ReWiz-Llama-3.2-3B: Step-by-step reasoning, careful analysis
  • ReasonableMath-Llama-3.2-3B-Instruct: Calculation, equations, arithmetic
  • TextSynth-3B: Summarization, text analysis, rewriting
  • PersonalFinance-Llama3.2-3B: Budgeting, investing, financial planning

Details

All experts are fine-tuned from Llama 3.2 3B Instruct, ensuring architectural compatibility across the MoE. The router was initialized using hidden-state representations from domain-specific prompts. Built with mergekit. The merged weights are distributed as BF16 safetensors.

Usage

Available quants in the GGUF repo: f16, q8_0, q6_k, q5_k_m, q4_k_m, q4_0, q4_1, iq4_nl, iq4_xs, q3_k_l, q3_k_m, q3_k_s, q2_k
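
For a quick local test of one of the GGUF files, here is a minimal sketch using the llama-cpp-python bindings. The filename is hypothetical; substitute whichever quant you downloaded.

```python
# Sketch: run a quantized GGUF file locally via llama-cpp-python.
# The model_path below is an assumed local filename, not an official artifact name.
from llama_cpp import Llama

llm = Llama(
    model_path="theprint-MoE-8x3-0126.q4_k_m.gguf",  # assumed filename
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm("Explain what a Mixture of Experts model is.", max_tokens=256)
print(out["choices"][0]["text"])
```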

For inference with more than 2 active experts, adjust num_experts_per_tok in your inference backend.
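
A minimal sketch with transformers, assuming the merge uses a Mixtral-style num_experts_per_tok config attribute as the note above suggests; passing the override to from_pretrained applies it at load time.

```python
# Sketch: load the full-precision model with 4 experts active per token.
# Assumes a Mixtral-style architecture where num_experts_per_tok is a config
# attribute; the keyword override updates the config when the model is loaded.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "theprint/theprint-MoE-8x3-0126"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    num_experts_per_tok=4,  # default is 2; the card states up to 4 is supported
)

inputs = tokenizer("Write a haiku about routing tokens.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```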
