# Shunya-0.5B-Instruct
Shunya-0.5B-Instruct is an instruction-tuned version of Shunya-0.5B-Base, trained with supervised fine-tuning followed by ORPO on a curated mix of chat and preference datasets. It supports multi-turn chat, system prompts, and tool calling via a custom chat template.
## Model Details
| Field | Value |
|---|---|
| Architecture | LlamaForCausalLM |
| Parameters | ~503M |
| Hidden size | 1,280 |
| Intermediate size | 4,864 |
| Layers | 20 |
| Attention heads | 10 |
| KV heads (GQA) | 2 |
| Head dim | 128 |
| Vocab size | 40,008 |
| Context window | 32,768 tokens |
| Positional encoding | RoPE (θ=10,000, 4× linear scaling) |
| Normalization | RMSNorm (ε=1e-6) |
| Activation | SiLU |
| Dtype | bfloat16 |
| Tied embeddings | Yes |
Special tokens: `<|system|>`, `<|user|>`, `<|assistant|>`, `<|endofturn|>`, `<tool_call>`, `</tool_call>`, `<tool_response>`, `</tool_response>`
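As a sanity check, the ~503M parameter count follows directly from the architecture numbers in the table (GQA attention projections, a SiLU-gated MLP, and tied input/output embeddings). A quick back-of-the-envelope recomputation:

```python
# Recompute the ~503M figure from the config values in the table above.
vocab, hidden, inter = 40_008, 1_280, 4_864
layers, heads, kv_heads, head_dim = 20, 10, 2, 128

embed = vocab * hidden                      # shared with the LM head (tied embeddings)
attn = (hidden * heads * head_dim) * 2      # q_proj and o_proj
attn += (hidden * kv_heads * head_dim) * 2  # k_proj and v_proj (GQA: 2 KV heads)
mlp = (hidden * inter) * 3                  # gate, up, and down projections
norms = 2 * hidden                          # two RMSNorms per layer
total = embed + layers * (attn + mlp + norms) + hidden  # "+ hidden" = final norm

print(f"{total / 1e6:.0f}M")  # prints "503M"
```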
## Training
- Base model: vivekmarakana/shunya-0.5b-base
- Method: SFT (Supervised Fine-Tuning) followed by ORPO (Odds Ratio Preference Optimization)
- Fine-tuning datasets:
  - HuggingFaceTB/smoltalk2
  - NousResearch/hermes-function-calling-v1
  - lmsys/lmsys-chat-1m
  - argilla/ultrafeedback-multi-binarized-preferences-cleaned
  - mlabonne/orpo-dpo-mix-40k
- License: MIT
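For intuition on the preference stage: ORPO adds an odds-ratio penalty on top of the ordinary SFT loss, rewarding the model when it assigns higher (length-normalized) likelihood to the chosen response than to the rejected one. A minimal sketch of that penalty term (illustrative only, not the actual training code; `odds_ratio_loss` is a name chosen here):

```python
import math

def odds_ratio_loss(chosen_logprob_mean: float, rejected_logprob_mean: float) -> float:
    """ORPO's odds-ratio term for one preference pair.

    Inputs are mean per-token log-probabilities of the chosen and
    rejected responses under the model being trained.
    """
    def log_odds(logp: float) -> float:
        p = math.exp(logp)          # length-normalized sequence probability
        return math.log(p / (1.0 - p))

    ratio = log_odds(chosen_logprob_mean) - log_odds(rejected_logprob_mean)
    # -log sigmoid(ratio): small when the model already prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-ratio)))
```

When the model prefers the chosen response (e.g. mean log-probs of -0.5 vs. -2.0) the penalty is small; with the preference reversed it grows, pushing probability mass toward the chosen response during training.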
## Usage

### Chat
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("vivekmarakana/shunya-0.5b-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "vivekmarakana/shunya-0.5b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain the difference between supervised and unsupervised learning."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
### Tool Calling

The model supports structured tool calls using `<tool_call>` / `<tool_response>` tags. Pass a list of tools to `apply_chat_template`:
```python
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }
]

messages = [{"role": "user", "content": "What's the weather in Mumbai?"}]
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Keep special tokens so the <tool_call> tags are visible in the output
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=False))
```
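A generated tool call arrives as a JSON payload between `<tool_call>` tags, so the caller needs to extract and parse it before dispatching to the actual function. One minimal way to do that (the helper name and the sample string below are illustrative, not part of the model card):

```python
import json
import re

def extract_tool_calls(text: str) -> list[dict]:
    """Pull JSON payloads out of <tool_call>...</tool_call> spans."""
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(m) for m in pattern.findall(text)]

# Illustrative output shape, not an actual generation:
sample = '<tool_call>{"name": "get_weather", "arguments": {"location": "Mumbai"}}</tool_call>'
calls = extract_tool_calls(sample)
# calls[0]["name"] == "get_weather"; calls[0]["arguments"]["location"] == "Mumbai"
```

The tool's result would then be fed back to the model in a `<tool_response>` turn for the final answer.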
## Benchmarks

All evaluations were run with `lm-evaluation-harness`. Results use normalized accuracy (`acc_norm`) for completion tasks (ARC, PIQA, HellaSwag) and `acc` for classification tasks.
| Benchmark | Shots | Metric | Shunya-0.5B-Instruct | Qwen3-0.6B-Instruct | Gemma3-1B-IT |
|---|---|---|---|---|---|
| ARC-Challenge | 25 | acc_norm | 25.17 | 30.12 | 38.23 |
| ARC-Easy | 0 | acc_norm | 40.49 | 34.68 | 47.60 |
| HellaSwag | 10 | acc_norm | 34.94 | 38.42 | 41.22 |
| MMLU | 5 | acc | 24.38 | 22.95 | 29.08 |
| WinoGrande | 0 | acc | 52.49 | 53.51 | 55.25 |
| BoolQ | 0 | acc | 48.04 | 37.83 | 74.19 |
| PIQA | 0 | acc_norm | 64.85 | 64.96 | 68.88 |
| Social IQA | 0 | acc | 37.46 | 37.21 | 42.43 |
| GPQA Main | 5 | acc | 24.33 | 21.43 | 25.45 |
| GPQA Diamond | 5 | acc | 28.79 | 20.20 | 26.77 |
| AGIEval EN | 5 | acc | 16.80 | 17.68 | 18.43 |
Qwen3-0.6B-Instruct and Gemma3-1B-IT are included as reference points; both have more parameters and/or larger training budgets.
Notable results:
- On GPQA Diamond (graduate-level science), Shunya-0.5B-Instruct (28.79%) outperforms both Qwen3-0.6B-Instruct (20.20%) and Gemma3-1B-IT (26.77%).
- On ARC-Easy and BoolQ, Shunya-0.5B-Instruct outperforms Qwen3-0.6B-Instruct.
## License

MIT