# Shivik 1.7B - Phase 1 (General Knowledge)

## Model Description
Shivik Phase 1 is a 1.7B parameter language model trained for reasoning and chain-of-thought (CoT) generation.
- Base Architecture: Llama 3.2 1B
- Parameters: ~1.7B
- Training: Phase 1 - General knowledge foundation (80K samples)
- Capabilities: General reasoning, basic CoT structure, math, code, and language understanding
## Training Details

### Phase 1 Training (This Model)
- Samples: 80,000
- Data Mix:
  - 50% Web & General Knowledge (Cosmopedia, Tulu-3, PersonaHub, General-Knowledge)
  - 20% Textbooks & Education (TextbookReasoning, GPTscience)
  - 10% Medical & Health (medical-o1-reasoning, medical-QA)
  - 10% Code (Magicoder-OSS, Magicoder-Evol)
  - 5% STEM & Engineering (Electrical-engineering, OpenMathInstruct)
  - 5% Reasoning Basics (reasoning-base-20k, thinker)
- Epochs: 1
- Max Length: 1024 tokens
- Training Method: LoRA fine-tuning (rank 64)
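The LoRA setup above can be sized with quick arithmetic. Below is a sketch of the trainable-parameter count at rank 64, assuming the adapters target the four attention projections (q/k/v/o) of every layer; the card does not list the target modules, so that choice is illustrative only.

```python
# Rough count of trainable LoRA parameters at rank 64, assuming (not
# stated on the card) that adapters attach to q/k/v/o projections.
hidden = 2048
kv_dim = 8 * (2048 // 32)  # 8 KV heads * head_dim 64 = 512
r = 64
layers = 16

def lora_params(in_dim, out_dim, r):
    # Each LoRA pair adds A (in_dim x r) plus B (r x out_dim) weights.
    return in_dim * r + r * out_dim

per_layer = (
    lora_params(hidden, hidden, r)    # q_proj
    + lora_params(hidden, kv_dim, r)  # k_proj
    + lora_params(hidden, kv_dim, r)  # v_proj
    + lora_params(hidden, hidden, r)  # o_proj
)
total = layers * per_layer
print(f"{total / 1e6:.1f}M trainable parameters")  # 13.6M
```

Under these assumptions only about 13.6M of the ~1.7B parameters are updated, which is what makes a single-epoch fine-tune like this cheap to run.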
## Architecture
- Hidden Size: 2048
- Layers: 16
- Attention Heads: 32 (8 KV heads)
- Vocabulary: 128,262 tokens (extended with reasoning tokens)
- Context Length: 131,072 tokens
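One practical consequence of the grouped-query attention above (32 query heads sharing 8 KV heads) is a smaller KV cache. A back-of-the-envelope estimate at the full context length, assuming a bf16 (2-byte) cache, which the card does not specify:

```python
# KV-cache size at the full 131,072-token context, from the
# architecture numbers above; bf16 cache is an assumption.
layers = 16
kv_heads = 8
head_dim = 2048 // 32  # hidden size / attention heads = 64
seq_len = 131_072
bytes_per_value = 2    # bf16

# 2 tensors (K and V) per layer, each [kv_heads, seq_len, head_dim]
cache_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value
print(f"{cache_bytes / 2**30:.0f} GiB")  # 4 GiB
```

With all 32 heads cached instead of 8, the same context would need 4x that, so GQA is what keeps long-context inference feasible at this size.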
## Model Performance

### Evaluation Results
- Format Score: 6/9
- Has `<think>` tags: ✅ Yes
- Has `<answer>` tags: ✅ Yes
- Correct answers: ✅ Yes (tested on math problems)
- Content generation: ✅ 1500+ chars average
- Status: ⚠️ Missing `<step>` tags (can be added with better prompting)
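A check like the format score above can be reproduced with a few lines of string matching. This is a hypothetical helper, not the project's actual evaluation script; it only tests whether each expected tag pair appears in a response.

```python
# Hypothetical tag-presence checker mirroring the format score above.
EXPECTED_TAGS = ["think", "step", "answer"]

def format_score(text):
    # A tag counts only if both its opening and closing forms appear.
    return {t: (f"<{t}>" in text and f"</{t}>" in text) for t in EXPECTED_TAGS}

out = "<think>...</think><answer>360</answer>"
print(format_score(out))  # {'think': True, 'step': False, 'answer': True}
```

Run against a typical Phase 1 response it reports exactly the gap noted above: `<think>` and `<answer>` present, `<step>` missing.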
## Comparison
- vs Phase 2/3: Phase 1 is the ONLY working model (Phase 2/3 broken)
- vs Base Model: Significant improvement in reasoning structure
- Use Case: Best for general Q&A with reasoning, not yet perfect CoT format
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model
model_id = "abhishek-0122/Shivik-1.7B-Phase1-General"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Format prompt (Llama 3 chat template)
prompt = '''<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are Shivik, an advanced reasoning AI. Show your thinking using <think> tags. Break down your reasoning into steps. Provide answers in <answer> tags.
<|eot_id|><|start_header_id|>user<|end_header_id|>
What is 15 × 24?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
'''

# Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
## Expected Output Format
```
<think>
15 × 24 can be broken down step by step.
First, let me use the distributive property:
15 × 24 = 15 × (20 + 4)
= (15 × 20) + (15 × 4)
= 300 + 60
= 360
</think>
<answer>
360
</answer>
```
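Since the model wraps its reasoning and answer in tags, downstream code can extract both with a simple regex. A minimal sketch (the `parse_cot` helper is illustrative, not part of the model's tooling):

```python
import re

# Minimal parser for the <think>/<answer> response format shown above.
def parse_cot(text):
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )

sample = "<think>\n15 × 24 = 360\n</think>\n<answer>\n360\n</answer>"
reasoning, answer = parse_cot(sample)
print(answer)  # 360
```

Returning `None` for a missing tag makes it easy to fall back to the raw text when the model drops the format, which Phase 1 sometimes does.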
## Recommended Generation Parameters
```python
generation_config = {
    "max_new_tokens": 1024,     # adjust based on task complexity
    "temperature": 0.7,         # lower (0.3-0.5) for math, higher (0.7-0.9) for creative
    "top_p": 0.9,
    "repetition_penalty": 1.2,  # discourages repetition
    "do_sample": True,
}
```
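The temperature advice above follows from how sampling works: logits are divided by the temperature before the softmax, so low values sharpen the distribution toward the top token (good for math, where there is one right answer) and high values flatten it (good for creative text). A self-contained illustration with made-up logits:

```python
import math

# Temperature-scaled softmax over made-up next-token logits.
def softmax(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [4.0, 3.0, 1.0]
print(softmax(logits, 0.3))  # near-deterministic: top token dominates
print(softmax(logits, 0.9))  # flatter: more diversity in sampling
```

The same spread of logits gives the top token far more probability mass at temperature 0.3 than at 0.9, which is why the lower range is recommended for math.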
## Limitations
- ⚠️ Incomplete CoT format: has `<think>` and `<answer>` tags, but is missing `<step>` tags
- ⚠️ Not production-ready: this is Phase 1; more training is needed for a fully consistent CoT format
- ⚠️ Better with prompting: needs explicit instructions to use step-by-step reasoning
- ⚠️ 1.7B size: smaller than models such as Qwen-3B, so it may have less knowledge
## Recommended Use Cases

✅ Good for:
- General Q&A with reasoning structure
- Math problems with explanation
- Code explanation
- Educational content
- Experimenting with CoT prompting
❌ Not recommended for:
- Production CoT applications (wait for Phase 2 distilled)
- Tasks requiring perfect multi-step format
- Safety-critical applications
## Model Family
This is part of the Shivik model series:
- Phase 1 (This Model): General knowledge foundation - WORKING
- Phase 2: Long-form CoT training - BROKEN (only outputs tags)
- Phase 3: Format refinement - BROKEN (built on broken Phase 2)
- Phase 2 Distilled (Upcoming): Fixed with teacher distillation
## Future Plans
- Phase 2 Distilled: training with teacher models (DeepSeek-R1, Qwen-Math, Qwen-Coder)
- Phase 3 Refined: full CoT format with `<step>` and `<verify>` tags
- Larger Models: 2.5B and 3.5B variants
- GNN Memory: graph neural network for persistent memory
## Citation
```bibtex
@misc{shivik-phase1-2025,
  title={Shivik 1.7B Phase 1: General Knowledge Foundation},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/abhishek-0122/Shivik-1.7B-Phase1-General}
}
```
## License
Apache 2.0
## Contact
- Creator: [Your Name/Handle]
- Project: Shivik - Reasoning-capable small language models
Note: This is an experimental model from an active research project. Phase 1 works but is not production-ready. A distilled version with proper CoT format is in development.