
My GPT: Text Generation from Scratch

A 30M-parameter GPT-style transformer built from scratch in PyTorch, trained on Shakespeare + Alpaca + OpenWebText, with a Flask streaming chat interface.

Project Structure

ai-model-by-me/
├── model.py          # GPT architecture (multi-head attention, transformer blocks)
├── tokenizer.py      # BPE tokenizer (GPT-2/tiktoken) + char-level fallback
├── train.py          # Training script (Apple M1/MPS optimized, checkpoint resume)
├── data_loader.py    # Dataset loaders (Shakespeare, Alpaca, OpenWebText, custom)
├── generate.py       # CLI text generation
├── app.py            # Flask streaming chat interface
└── upload_to_hf.py   # Upload to Hugging Face Hub
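The character-level fallback in tokenizer.py (used when tiktoken is unavailable) presumably works along these lines. This is an illustrative sketch, not the project's actual code:

```python
class CharTokenizer:
    """Minimal character-level tokenizer: one id per unique character.

    Illustrative stand-in for the char-level fallback in tokenizer.py;
    the real implementation may differ.
    """

    def __init__(self, text: str):
        chars = sorted(set(text))                         # fixed vocabulary from the corpus
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
        self.itos = {i: ch for ch, i in self.stoi.items()} # id -> char
        self.vocab_size = len(chars)

    def encode(self, s: str) -> list[int]:
        return [self.stoi[ch] for ch in s]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)
```

Round-tripping `decode(encode(s))` returns the original string for any text drawn from the training corpus's character set.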

Setup

conda create -n slm-env python=3.11
conda activate slm-env
pip install torch numpy flask tiktoken datasets huggingface_hub

Step 1: Train

python train.py --datasets shakespeare,alpaca,openwebtext \
  --max_iters 15000 --batch_size 16 --n_layer 6 --n_head 6 --n_embd 384

Resume from a checkpoint:

python train.py --datasets shakespeare,alpaca,openwebtext \
  --max_iters 15000 --lr 1e-4 --resume
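A quick back-of-envelope for how much data these flags cover: each step processes batch_size × context_length tokens (context length 256, per the architecture table below):

```python
# Training-scale arithmetic for the flags used above.
batch_size = 16      # --batch_size
max_iters = 15_000   # --max_iters
context_len = 256    # model context length in tokens

tokens_per_step = batch_size * context_len
total_tokens = tokens_per_step * max_iters
print(tokens_per_step, total_tokens)  # 4096 tokens/step, 61,440,000 tokens total
```

So a full 15K-iteration run sees roughly 61M tokens (counting repeats, since batches are sampled from the mixed corpus).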

Training saves the best checkpoint to checkpoints/best_model.pt.

Step 2: Generate Text (CLI)

python generate.py --prompt "To be or not to be" --max_new_tokens 300
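Under the hood, generation is autoregressive: the model emits a distribution over the vocabulary and one token is sampled per step. A minimal temperature-sampling step in pure Python (illustrative only; generate.py presumably works on PyTorch tensors):

```python
import math
import random

def sample_next(logits: list[float], temperature: float = 1.0) -> int:
    """Sample one token id from raw logits with temperature scaling.

    Hypothetical sketch of a single decoding step, not the project's code.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]              # softmax
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):                  # inverse-CDF sampling
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

Lower temperatures sharpen the distribution toward the argmax; higher ones flatten it and increase diversity.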

Alpaca instruction-style:

python generate.py --instruction "Write a poem about the sea"
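Instruction mode presumably wraps the input in the standard Alpaca prompt template before generation; the exact formatting inside generate.py may differ, but the canonical template looks like this:

```python
# Canonical Stanford Alpaca prompt template (no-input variant).
# generate.py's actual formatting may differ slightly.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    return ALPACA_TEMPLATE.format(instruction=instruction)

prompt = build_prompt("Write a poem about the sea")
```

The model then continues the text after "### Response:", which is why instruction-tuned sampling stays on task better than raw free-text prompting.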

Step 3: Run Chat Interface

python app.py

Open http://127.0.0.1:5000 in your browser (use a private/incognito window if your browser blocks localhost).

Model Architecture

| Parameter       | Value |
|-----------------|-------|
| Type            | GPT (decoder-only transformer) |
| Tokenizer       | BPE, GPT-2 encoding (50,257-token vocab) |
| Layers          | 6 transformer blocks |
| Attention heads | 6 |
| Embedding dim   | 384 |
| Context length  | 256 tokens |
| Parameters      | ~30M |
| Training data   | Shakespeare + Alpaca 52K + OpenWebText sample |
| Best val loss   | 3.4163 |
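The ~30M figure checks out against these hyperparameters. A quick count, assuming a GPT-2-style block (biases, 4× MLP) and weight tying between the token embedding and output head (assumptions; model.py may differ slightly):

```python
# Parameter count from the hyperparameters in the table above.
vocab, n_layer, n_embd, ctx = 50_257, 6, 384, 256

tok_emb = vocab * n_embd   # token embedding (assumed tied with the LM head)
pos_emb = ctx * n_embd     # learned positional embedding
per_block = (
    2 * (2 * n_embd)                      # two LayerNorms (weight + bias)
    + n_embd * 3 * n_embd + 3 * n_embd    # fused QKV projection
    + n_embd * n_embd + n_embd            # attention output projection
    + n_embd * 4 * n_embd + 4 * n_embd    # MLP up-projection
    + 4 * n_embd * n_embd + n_embd        # MLP down-projection
)
final_ln = 2 * n_embd

total = tok_emb + pos_emb + n_layer * per_block + final_ln
print(f"{total / 1e6:.1f}M parameters")
```

This lands at roughly 30.0M, matching the table; note that nearly two-thirds of the parameters sit in the token embedding because of the 50,257-token GPT-2 vocabulary.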

Hardware

Optimized for Apple M1 via PyTorch MPS backend. Falls back to CUDA or CPU automatically.
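The fallback order described above can be expressed as follows (a sketch; train.py's actual selection logic may differ):

```python
import torch

def pick_device() -> torch.device:
    """Prefer Apple's Metal backend, then CUDA, then plain CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
```

Everything (model, batches) is then moved to `device` with `.to(device)`, so the same script runs unchanged on an M1 laptop or a CUDA box.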

Upload to Hugging Face

export HF_TOKEN=your_token_here
python upload_to_hf.py --username YOUR_HF_USERNAME --repo_name my-gpt-from-scratch