AstraGPT-7B 🚀

A 7-Billion-Parameter Language Model, Built From Scratch

Custom Architecture · Custom BPE Tokenizer · Reasoning Fine-Tuned on Dual RTX 4090


Built by Aditya Wakharkar | Tantra AI Labs


🧠 What is AstraGPT-7B?

AstraGPT-7B is a 7-billion parameter decoder-only language model designed for coding and chain-of-thought reasoning.

Unlike most open-source fine-tunes, every core component of AstraGPT was designed and implemented from scratch in PyTorch, including the transformer architecture, the BPE tokenizer, and the supervised fine-tuning pipeline.

The model was then fine-tuned on a reasoning dataset using LoRA on a private VPS equipped with dual NVIDIA RTX 4090 GPUs, giving it native support for <think>...</think> style reasoning output.

"Most people fine-tune models. We built one."


๐Ÿ—๏ธ Built From Scratch โ€” Architecture Overview

Every layer of AstraGPT-7B was implemented from first principles in PyTorch. No AutoModel, no copy-paste: pure custom code.

Input Token IDs
      │
      ▼
Token Embedding  [64,000 → 4,096]
      │
      ▼  ×32 Transformer Blocks
┌─────────────────────────────────────┐
│           AstraGPT Block            │
│                                     │
│  RMSNorm (Pre-norm)                 │
│  → Grouped Query Attention (GQA)    │
│    · 32 Query Heads                 │
│    · 8 Key-Value Heads              │
│    · RoPE (θ = 1,000,000)           │
│    · KV Cache for inference         │
│  → Residual Add                     │
│                                     │
│  RMSNorm (Pre-norm)                 │
│  → SwiGLU Feed-Forward Network      │
│    · gate_proj, up_proj, down_proj  │
│    · intermediate_size = 11,008     │
│  → Residual Add                     │
└─────────────────────────────────────┘
      │
      ▼
Final RMSNorm
      │
      ▼
LM Head  [4,096 → 64,000]
      │
      ▼
Logits → Next Token
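
A condensed PyTorch sketch of how one such block is wired, to make the diagram concrete (illustrative only: class and attribute names are assumptions, and the attention/FFN internals are sketched separately after the highlights table below):

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Normalize by RMS only: no bias, no mean subtraction
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class AstraGPTBlock(nn.Module):
    def __init__(self, dim=4096, attn=None, ffn=None):
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        self.ffn_norm = RMSNorm(dim)
        self.attn = attn   # GQA module (32Q / 8KV heads, RoPE, KV cache)
        self.ffn = ffn     # SwiGLU feed-forward network

    def forward(self, x):
        # Pre-norm attention with residual add
        x = x + self.attn(self.attn_norm(x))
        # Pre-norm SwiGLU FFN with residual add
        x = x + self.ffn(self.ffn_norm(x))
        return x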

Architecture Highlights

Component | Implementation | Why
Grouped Query Attention (GQA) | 32Q / 8KV heads, built from scratch | 4× less KV memory than MHA; same design used in LLaMA-3 and Mistral
Rotary Position Embeddings (RoPE) | Full RoPE math from scratch, θ = 1M | Better long-context behavior than learned embeddings
SwiGLU FFN | gate × SiLU(up) through down_proj | Outperforms GELU/ReLU on LM benchmarks
RMSNorm | Pre-norm, no bias, no mean subtraction | ~30% faster than LayerNorm
Flash Attention | PyTorch 2.0 scaled_dot_product_attention | Memory-efficient attention with O(n) space
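
To make the table concrete, here is a minimal sketch of the GQA and SwiGLU components (shapes follow the table; RoPE and the KV cache are omitted, and all names are illustrative rather than the repository's actual code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    def __init__(self, dim=4096, n_heads=32, n_kv_heads=8):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Each KV head is shared by 4 query heads (32Q / 8KV)
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        # Flash/memory-efficient attention via PyTorch 2.0 SDPA
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

class SwiGLU(nn.Module):
    def __init__(self, dim=4096, hidden=11008):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden, bias=False)
        self.up_proj = nn.Linear(dim, hidden, bias=False)
        self.down_proj = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # gate × SiLU(up), projected back down through down_proj (as in the table)
        return self.down_proj(self.gate_proj(x) * F.silu(self.up_proj(x)))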

Parameter Count (~7B)

Component | Parameters
Token Embedding (64K × 4096) | ~262M
Attention × 32 layers | ~2.15B
SwiGLU FFN × 32 layers | ~4.32B
RMSNorm × 65 | ~267K
LM Head | ~262M
Total | ~7.0B

🔤 Custom BPE Tokenizer, From Scratch

AstraGPT uses a custom Byte Pair Encoding tokenizer built entirely from scratch: no SentencePiece, no HuggingFace tokenizers library.

# Built from scratch
from tokenizer import BPETokenizer

tok = BPETokenizer(vocab_size=64_000)
tok.train(open("corpus.txt"), num_merges=60_000)
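
Once trained, the tokenizer round-trips text through its encode/decode methods (illustrative usage; the method names follow the codebase listing further below):

ids = tok.encode("def reverse_list(head):")
text = tok.decode(ids)   # byte-level BPE decodes back to the original string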

Tokenizer features:

  • Byte-level base vocabulary: 256 raw bytes, handles any Unicode
  • GPT-4 style pre-tokenization regex: smart word-boundary splitting
  • 64,000 vocab size: 60K BPE merges on top of the byte base
  • Built-in special tokens: <think>, </think>, <|im_start|>, <|im_end|>, BOS, EOS, PAD
  • apply_chat_template(): custom chat format support
  • Save/load: JSON-serializable merge rules
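
For reference, the heart of a byte-level BPE trainer is a short greedy merge loop. The sketch below illustrates the algorithm only; it is not the repository's implementation, which adds the pre-tokenization regex and special-token handling on top:

from collections import Counter

def train_bpe(text: bytes, num_merges: int):
    # Start from the 256-byte base vocabulary
    ids = list(text)
    merges = {}   # (id_a, id_b) -> new token id
    next_id = 256
    for _ in range(num_merges):
        # Count adjacent pairs and pick the most frequent one
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges[best] = next_id
        # Replace every occurrence of the best pair with the new id
        merged, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == best:
                merged.append(next_id)
                i += 2
            else:
                merged.append(ids[i])
                i += 1
        ids = merged
        next_id += 1
    return merges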

⚡ Training: Dual RTX 4090 on a Private VPS

Fine-tuning was performed on a private Linux VPS with 2× NVIDIA RTX 4090 GPUs (48 GB VRAM total).

Hardware Setup

Spec | Value
GPUs | 2× NVIDIA RTX 4090 (24 GB VRAM each)
Total VRAM | 48 GB
CPU | High-core-count server CPU
Infrastructure | Private VPS (bare metal)
OS | Ubuntu 22.04 LTS
CUDA | 12.x

Training Pipeline: Also Built From Scratch

The SFT (Supervised Fine-Tuning) training loop was implemented from scratch with production-grade features:

# Full custom training loop
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    dataset=dataset,
    # Dual GPU via DDP
    use_bf16=True,
    grad_accumulation=8,
    learning_rate=2e-4,
    use_wandb=True,
)
trainer.train()

Training loop features:

  • ✅ Gradient accumulation: effective large-batch training
  • ✅ Mixed precision (BF16): full RTX 4090 tensor-core utilization
  • ✅ Cosine LR schedule with warmup: smooth convergence
  • ✅ Gradient clipping: stable training
  • ✅ W&B logging: real-time loss/LR tracking
  • ✅ Checkpoint saving: best-model tracking by loss
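
As a rough illustration of how those features fit together, here is a minimal sketch of the inner loop with gradient accumulation, BF16 autocast, gradient clipping, and a cosine schedule with warmup (variable names, the loss interface, and the scheduler math are assumptions, not the repository's exact code; W&B logging and checkpointing are omitted):

import math
import torch

def train_steps(model, optimizer, dataloader, total_steps,
                grad_accum=8, max_lr=2e-4, warmup_frac=0.05, clip=1.0):
    device = next(model.parameters()).device
    step = 0
    for i, batch in enumerate(dataloader):
        # BF16 autocast for the forward/backward pass
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            loss = model(batch["input_ids"].to(device),
                         labels=batch["labels"].to(device)).loss
        (loss / grad_accum).backward()

        if (i + 1) % grad_accum == 0:
            # Cosine LR schedule with linear warmup
            warmup = int(warmup_frac * total_steps)
            if step < warmup:
                lr = max_lr * (step + 1) / warmup
            else:
                progress = (step - warmup) / max(1, total_steps - warmup)
                lr = 0.5 * max_lr * (1 + math.cos(math.pi * progress))
            for group in optimizer.param_groups:
                group["lr"] = lr

            # Gradient clipping, then the optimizer step
            torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)
            step += 1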

Fine-Tuning Hyperparameters

Parameter | Value
Method | LoRA (PEFT) via Unsloth
LoRA Rank | 16
LoRA Alpha | 32
Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Max Sequence Length | 2,048 tokens
Effective Batch Size | 16 (2 × grad_accum 8)
Learning Rate | 2e-4
LR Scheduler | Cosine with warmup
Warmup Ratio | 5%
Epochs | 3
Precision | BF16 mixed precision
Optimizer | AdamW 8-bit
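
For anyone reproducing a similar setup with plain PEFT instead of Unsloth, an equivalent LoRA configuration would look roughly like this (a sketch using the standard peft API, not the exact training script):

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                      # LoRA rank
    lora_alpha=32,             # LoRA scaling alpha
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,          # assumption: dropout is not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()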

Post-Training

After fine-tuning, the LoRA adapter was merged back into the base model weights, resulting in a single, self-contained model with no external adapter dependency.
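
With PEFT-style adapters, that merge is typically a one-liner (a sketch, assuming the adapter is loaded as a PeftModel):

# Fold the LoRA deltas into the base weights and drop the adapter wrappers
merged_model = model.merge_and_unload()
merged_model.save_pretrained("AstraGPT-7B-merged")
tokenizer.save_pretrained("AstraGPT-7B-merged")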


🤔 Thinking / Reasoning Support

AstraGPT-7B natively generates <think>-tagged reasoning when triggered. This behavior was learned from the fine-tuning dataset, which used structured chain-of-thought formatting.

Example:

Input:

What is 15 * 47?

Output:

<think>
The multiplication involves multiplying 15 by 47.
  15 × 47 = 15 × 40 + 15 × 7
          = 600 + 105
          = 705
</think>
705

Trigger thinking mode:

# Append this to your prompt to force reasoning
prompt = tokenizer.apply_chat_template(messages, ...) + "<think>\n"
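
Because the reasoning appears between literal <think> and </think> tags, it is easy to split it from the final answer after generation. A small illustrative helper (assumes the tags survive decoding, e.g. with skip_special_tokens=False):

def split_reasoning(text: str):
    """Split a response into (reasoning, answer) around the </think> tag."""
    if "</think>" in text:
        reasoning, _, answer = text.partition("</think>")
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()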

⚡ Quick Start

Install

pip install transformers torch bitsandbytes accelerate

Basic Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "adityawakharkar/AstraGPT-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

messages = [
    {
        "role": "system",
        "content": "You are AstraGPT, a helpful coding AI built by Tantra AI Labs. Think carefully using <think>...</think> tags before answering."
    },
    {
        "role": "user",
        "content": "Write a Python function to reverse a linked list."
    }
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
) + "<think>\n"   # ← triggers reasoning

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.3,
        do_sample=True,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(response)

4-bit Quantized (Runs on ~6GB VRAM)

from transformers import BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "adityawakharkar/AstraGPT-7B",
    quantization_config=bnb,
    device_map="auto"
)
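
Generation then works exactly as in the basic inference example above: reuse the same tokenizer, chat template, and model.generate() call with the quantized model.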

📁 Codebase

The full from-scratch implementation is open-source:

AstraGPT-7B-scratch/
├── model/
│   ├── config.py              ← AstraGPTConfig (7B hyperparams, 1B/3B presets)
│   ├── rotary_embedding.py    ← RoPE from scratch (precompute + apply)
│   ├── attention.py           ← GQA from scratch (32Q / 8KV + KV cache)
│   ├── feedforward.py         ← SwiGLU + RMSNorm + TransformerBlock
│   └── transformer.py         ← Full model + generate() + save/load
├── tokenizer/
│   ├── bpe_tokenizer.py       ← Full BPE tokenizer (train, encode, decode)
│   └── train_tokenizer.py     ← Train on any text corpus
└── training/
    └── sft_trainer.py         ← Complete SFT loop (grad accum, bf16, cosine LR)
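
Putting the pieces together looks roughly like the following. This is a hypothetical sketch: the module paths and class names come from the tree above, but the constructor signatures, the AstraGPTModel class name, and the tokenizer load path are assumptions, so check the repository for the real API.

import torch
from model.config import AstraGPTConfig            # name from the tree above
from model.transformer import AstraGPTModel        # hypothetical class name
from tokenizer.bpe_tokenizer import BPETokenizer

config = AstraGPTConfig()                           # assumed 7B default preset
model = AstraGPTModel(config)

tok = BPETokenizer.load("tokenizer.json")           # hypothetical load path
ids = torch.tensor([tok.encode("def fib(n):")])

with torch.no_grad():
    out = model.generate(ids, max_new_tokens=64)    # generate() per transformer.py
print(tok.decode(out[0].tolist()))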

Bias, Risks, and Limitations

  • Hallucination: Can produce confident but incorrect answers; always verify
  • Math limits: Complex multi-step math may fail; 7B is a small model
  • English-primary: Best performance in English
  • Reasoning trigger: <think> tags work most reliably with an explicit <think>\n prefix in the prompt

Environmental Impact

  • Hardware: 2× NVIDIA RTX 4090 (48 GB combined VRAM)
  • Infrastructure: Private bare-metal VPS
  • Training Duration: ~3–4 hours
  • Carbon Emitted: Estimated ~2–3 kg CO2eq

Citation

@misc{astragpt7b2026,
  author       = {Aditya Wakharkar},
  title        = {AstraGPT-7B: A 7B LLM Built From Scratch with Chain-of-Thought Reasoning},
  year         = {2026},
  publisher    = {HuggingFace},
  organization = {Tantra AI Labs},
  url          = {https://huggingface.co/adityawakharkar/AstraGPT-7B},
  note         = {Custom architecture, custom BPE tokenizer, trained on 2ร— RTX 4090}
}

Model Card Authors

Aditya Wakharkar โ€” @adityawakharkar | GitHub @codewith-aditya

Contact


Built from scratch with ❤️ by Tantra AI Labs
Every layer. Every weight. Every line of code.