Small-scale faithful replicas of the DeepSeek-V4 architecture for ablation and weight-transfer research.
- kshitijthakkar/deepseek-v4-mini-300M-init
  Text Generation • 0.3B • Updated • 39
- kshitijthakkar/deepseek-v4-mini-1B-init
  Text Generation • 1B • Updated • 21
- kshitijthakkar/deepseek-v4-mini-3B-init
  Text Generation • 3B • Updated • 10
- kshitijthakkar/deepseek-v4-mini-6B-init
  Text Generation • 8B • Updated • 45