Shivik 1.7B - Phase 1 (General Knowledge)

Model Description

Shivik Phase 1 is a 1.7B parameter language model trained for reasoning and chain-of-thought (CoT) generation.

  • Base Architecture: Llama 3.2 1B
  • Parameters: ~1.7B
  • Training: Phase 1 - General knowledge foundation (80K samples)
  • Capabilities: General reasoning, basic CoT structure, math, code, and language understanding

Training Details

Phase 1 Training (This Model)

  • Samples: 80,000
  • Data Mix:
    • 50% Web & General Knowledge (Cosmopedia, Tulu-3, PersonaHub, General-Knowledge)
    • 20% Textbooks & Education (TextbookReasoning, GPTscience)
    • 10% Medical & Health (medical-o1-reasoning, medical-QA)
    • 10% Code (Magicoder-OSS, Magicoder-Evol)
    • 5% STEM & Engineering (Electrical-engineering, OpenMathInstruct)
    • 5% Reasoning Basics (reasoning-base-20k, thinker)
  • Epochs: 1
  • Max Length: 1024 tokens
  • Training Method: LoRA fine-tuning (rank 64)
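The trainable-parameter budget implied by rank-64 LoRA can be estimated from the architecture numbers below. This is a back-of-the-envelope sketch: the card does not list LoRA target modules, so the assumption that adapters cover only the attention projections (q, k, v, o) is illustrative.

```python
# Rough estimate of trainable LoRA parameters for rank-64 adapters.
# ASSUMPTION: adapters on the attention projections (q, k, v, o) only;
# the card does not list target modules, so this is illustrative.
hidden = 2048                    # hidden size (from the Architecture section)
n_layers = 16
n_heads, n_kv_heads = 32, 8
head_dim = hidden // n_heads     # 64
kv_dim = n_kv_heads * head_dim   # 512 (grouped-query attention)
rank = 64

def lora_params(d_in, d_out, r):
    """LoRA adds two low-rank factors per weight: A (d_in x r) and B (r x d_out)."""
    return r * (d_in + d_out)

per_layer = (
    lora_params(hidden, hidden, rank)    # q_proj
    + lora_params(hidden, kv_dim, rank)  # k_proj
    + lora_params(hidden, kv_dim, rank)  # v_proj
    + lora_params(hidden, hidden, rank)  # o_proj
)
total = per_layer * n_layers
print(f"~{total / 1e6:.1f}M trainable parameters")  # ~13.6M
```

Under these assumptions, only about 13.6M of the ~1.7B parameters are updated, which is what makes a single-epoch pass over 80K samples tractable.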

Architecture

  • Hidden Size: 2048
  • Layers: 16
  • Attention Heads: 32 (8 KV heads)
  • Vocabulary: 128,262 tokens (extended with reasoning tokens)
  • Context Length: 131,072 tokens
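The vocabulary figure above is consistent with the Llama 3 family's 128,256-token base tokenizer plus six added reasoning tags. The exact set of added tokens is an inference from the tags discussed elsewhere on this card, not something the card states directly:

```python
# Sanity check (ASSUMPTION): the extended vocabulary equals Llama 3's base
# tokenizer plus the six CoT tags; the card does not list the added tokens.
BASE_VOCAB = 128_256  # Llama 3 family tokenizer size
reasoning_tags = ["<think>", "</think>", "<step>", "</step>", "<answer>", "</answer>"]
extended = BASE_VOCAB + len(reasoning_tags)
print(extended)  # 128262, matching the card
```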

Model Performance

Evaluation Results

  • Format Score: 6/9
  • Has <think> tags: ✅ Yes
  • Has <answer> tags: ✅ Yes
  • Correct answers: ✅ Yes (tested on math problems)
  • Content generation: ✅ 1,500+ characters on average
  • Status: ⚠️ Missing <step> tags (can be added with better prompting)

Comparison

  • vs Phase 2/3: Phase 1 is the ONLY working model (Phase 2/3 broken)
  • vs Base Model: Significant improvement in reasoning structure
  • Use Case: Best for general Q&A with reasoning, not yet perfect CoT format

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model
model_id = "abhishek-0122/Shivik-1.7B-Phase1-General"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Format prompt
prompt = '''<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are Shivik, an advanced reasoning AI. Show your thinking using <think> tags. Break down your reasoning into steps. Provide answers in <answer> tags.
<|eot_id|><|start_header_id|>user<|end_header_id|>

What is 15 × 24?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>

'''

# Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Expected Output Format

<think>

15 × 24 can be broken down step by step.

First, let me use the distributive property:
15 × 24 = 15 × (20 + 4)
        = (15 × 20) + (15 × 4)
        = 300 + 60
        = 360

</think>
<answer>
360
</answer>
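Generations in this format can be split into their reasoning and answer parts with a small regex helper. This utility is illustrative, not part of the model's release; it returns empty strings for missing tag pairs, which is useful given that Phase 1 does not always emit the complete format:

```python
import re

def parse_cot(text: str) -> dict:
    """Split a generation into its <think> reasoning and <answer> payload.

    Missing tag pairs yield empty strings rather than raising, since
    Phase 1 outputs do not always contain every expected tag.
    """
    def grab(tag: str) -> str:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return m.group(1).strip() if m else ""
    return {"think": grab("think"), "answer": grab("answer")}

out = parse_cot("<think>\n15 x 24 = 360\n</think>\n<answer>\n360\n</answer>")
print(out["answer"])  # 360
```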

Recommended Generation Parameters

generation_config = {
    "max_new_tokens": 1024,      # Adjust based on task complexity
    "temperature": 0.7,           # Lower (0.3-0.5) for math, higher (0.7-0.9) for creative
    "top_p": 0.9,
    "repetition_penalty": 1.2,    # Prevents repetition
    "do_sample": True,
}

Limitations

  • โš ๏ธ Incomplete CoT format: Has <think> and <answer> tags, but missing <step> tags
  • โš ๏ธ Not production-ready: This is Phase 1, more training needed for perfect CoT
  • โš ๏ธ Better with prompting: Needs explicit instructions to use step-by-step reasoning
  • โš ๏ธ 1.7B size: Smaller than models like Qwen-3B, may have less knowledge

Recommended Use Cases

✅ Good for:

  • General Q&A with reasoning structure
  • Math problems with explanation
  • Code explanation
  • Educational content
  • Experimenting with CoT prompting

โŒ Not recommended for:

  • Production CoT applications (wait for Phase 2 distilled)
  • Tasks requiring perfect multi-step format
  • Safety-critical applications

Model Family

This is part of the Shivik model series:

  1. Phase 1 (This Model): General knowledge foundation - WORKING
  2. Phase 2: Long-form CoT training - BROKEN (only outputs tags)
  3. Phase 3: Format refinement - BROKEN (built on broken Phase 2)
  4. Phase 2 Distilled (Upcoming): Fixed with teacher distillation

Future Plans

  • 🔄 Phase 2 Distilled: Training with teacher models (DeepSeek-R1, Qwen-Math, Qwen-Coder)
  • ✨ Phase 3 Refined: Perfect CoT format with <step> and <verify> tags
  • 📈 Larger Models: 2.5B and 3.5B variants
  • 🧠 GNN Memory: Graph neural network for persistent memory

Citation

@misc{shivik-phase1-2025,
  title={Shivik 1.7B Phase 1: General Knowledge Foundation},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/abhishek-0122/Shivik-1.7B-Phase1-General}
}

License

Apache 2.0

Contact

  • Creator: [Your Name/Handle]
  • Project: Shivik - Reasoning-capable small language models

Note: This is an experimental model from an active research project. Phase 1 works but is not production-ready. A distilled version with proper CoT format is in development.
