
My GPT: Text Generation from Scratch

A 30M-parameter GPT-style transformer built from scratch in PyTorch, trained on Shakespeare + Alpaca + OpenWebText, with a Flask streaming chat interface.

Project Structure

ai-model-by-me/
├── model.py          # GPT architecture (multi-head attention, transformer blocks)
├── tokenizer.py      # BPE tokenizer (GPT-2/tiktoken) + char-level fallback
├── train.py          # Training script (Apple M1/MPS optimized, checkpoint resume)
├── data_loader.py    # Dataset loaders (Shakespeare, Alpaca, OpenWebText, custom)
├── generate.py       # CLI text generation
├── app.py            # Flask streaming chat interface
└── upload_to_hf.py   # Upload to Hugging Face Hub
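The character-level fallback in tokenizer.py (used when tiktoken is unavailable) presumably works along these lines. This is an illustrative sketch, not the project's actual code:

```python
class CharTokenizer:
    """Minimal character-level tokenizer: one id per unique character.

    Illustrative stand-in for the char-level fallback in tokenizer.py;
    the real implementation may differ.
    """

    def __init__(self, text: str):
        chars = sorted(set(text))                         # fixed vocabulary from the corpus
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
        self.itos = {i: ch for ch, i in self.stoi.items()} # id -> char
        self.vocab_size = len(chars)

    def encode(self, s: str) -> list[int]:
        return [self.stoi[ch] for ch in s]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)
```

Round-tripping `decode(encode(s))` returns the original string for any text drawn from the training corpus's character set.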

Setup

conda create -n slm-env python=3.11
conda activate slm-env
pip install torch numpy flask tiktoken datasets huggingface_hub

Step 1: Train

python train.py --datasets shakespeare,alpaca,openwebtext \
  --max_iters 15000 --batch_size 16 --n_layer 6 --n_head 6 --n_embd 384

Resume from a checkpoint:

python train.py --datasets shakespeare,alpaca,openwebtext \
  --max_iters 15000 --lr 1e-4 --resume
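A quick back-of-envelope for how much data these flags cover: each step processes batch_size × context_length tokens (context length 256, per the architecture table below):

```python
# Training-scale arithmetic for the flags used above.
batch_size = 16      # --batch_size
max_iters = 15_000   # --max_iters
context_len = 256    # model context length in tokens

tokens_per_step = batch_size * context_len
total_tokens = tokens_per_step * max_iters
print(tokens_per_step, total_tokens)  # 4096 tokens/step, 61,440,000 tokens total
```

So a full 15K-iteration run sees roughly 61M tokens (counting repeats, since batches are sampled from the mixed corpus).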

Training saves the best checkpoint to checkpoints/best_model.pt.

Step 2: Generate Text (CLI)

python generate.py --prompt "To be or not to be" --max_new_tokens 300
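Under the hood, generation is autoregressive: the model emits a distribution over the vocabulary and one token is sampled per step. A minimal temperature-sampling step in pure Python (illustrative only; generate.py presumably works on PyTorch tensors):

```python
import math
import random

def sample_next(logits: list[float], temperature: float = 1.0) -> int:
    """Sample one token id from raw logits with temperature scaling.

    Hypothetical sketch of a single decoding step, not the project's code.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]              # softmax
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):                  # inverse-CDF sampling
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

Lower temperatures sharpen the distribution toward the argmax; higher ones flatten it and increase diversity.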

Alpaca instruction-style:

python generate.py --instruction "Write a poem about the sea"
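Instruction mode presumably wraps the input in the standard Alpaca prompt template before generation; the exact formatting inside generate.py may differ, but the canonical template looks like this:

```python
# Canonical Stanford Alpaca prompt template (no-input variant).
# generate.py's actual formatting may differ slightly.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    return ALPACA_TEMPLATE.format(instruction=instruction)

prompt = build_prompt("Write a poem about the sea")
```

The model then continues the text after "### Response:", which is why instruction-tuned sampling stays on task better than raw free-text prompting.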

Step 3: Run Chat Interface

python app.py

Open http://127.0.0.1:5000 in your browser (use a private/incognito window if your browser blocks localhost).

Model Architecture

| Parameter       | Value |
|-----------------|-------|
| Type            | GPT (decoder-only transformer) |
| Tokenizer       | BPE, GPT-2 encoding (50,257-token vocab) |
| Layers          | 6 transformer blocks |
| Attention heads | 6 |
| Embedding dim   | 384 |
| Context length  | 256 tokens |
| Parameters      | ~30M |
| Training data   | Shakespeare + Alpaca 52K + OpenWebText sample |
| Best val loss   | 3.4163 |
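The ~30M figure checks out against these hyperparameters. A quick count, assuming a GPT-2-style block (biases, 4× MLP) and weight tying between the token embedding and output head (assumptions; model.py may differ slightly):

```python
# Parameter count from the hyperparameters in the table above.
vocab, n_layer, n_embd, ctx = 50_257, 6, 384, 256

tok_emb = vocab * n_embd   # token embedding (assumed tied with the LM head)
pos_emb = ctx * n_embd     # learned positional embedding
per_block = (
    2 * (2 * n_embd)                      # two LayerNorms (weight + bias)
    + n_embd * 3 * n_embd + 3 * n_embd    # fused QKV projection
    + n_embd * n_embd + n_embd            # attention output projection
    + n_embd * 4 * n_embd + 4 * n_embd    # MLP up-projection
    + 4 * n_embd * n_embd + n_embd        # MLP down-projection
)
final_ln = 2 * n_embd

total = tok_emb + pos_emb + n_layer * per_block + final_ln
print(f"{total / 1e6:.1f}M parameters")
```

This lands at roughly 30.0M, matching the table; note that nearly two-thirds of the parameters sit in the token embedding because of the 50,257-token GPT-2 vocabulary.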

Hardware

Optimized for Apple M1 via PyTorch MPS backend. Falls back to CUDA or CPU automatically.
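The fallback order described above can be expressed as follows (a sketch; train.py's actual selection logic may differ):

```python
import torch

def pick_device() -> torch.device:
    """Prefer Apple's Metal backend, then CUDA, then plain CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
```

Everything (model, batches) is then moved to `device` with `.to(device)`, so the same script runs unchanged on an M1 laptop or a CUDA box.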

Upload to Hugging Face

export HF_TOKEN=your_token_here
python upload_to_hf.py --username YOUR_HF_USERNAME --repo_name my-gpt-from-scratch