# My GPT: Text Generation from Scratch
A 30M-parameter GPT-style transformer built from scratch in PyTorch, trained on Shakespeare + Alpaca + OpenWebText, with a Flask streaming chat interface.
## Project Structure
```
ai-model-by-me/
├── model.py         # GPT architecture (multi-head attention, transformer blocks)
├── tokenizer.py     # BPE tokenizer (GPT-2/tiktoken) + char-level fallback
├── train.py         # Training script (Apple M1/MPS optimized, checkpoint resume)
├── data_loader.py   # Dataset loaders (Shakespeare, Alpaca, OpenWebText, custom)
├── generate.py      # CLI text generation
├── app.py           # Flask streaming chat interface
└── upload_to_hf.py  # Upload to Hugging Face Hub
```
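For reference, the transformer block in `model.py` follows the standard pre-norm GPT design. A minimal sketch of one such block, using the hyperparameters from the training command below (class and attribute names are illustrative, not the repo's exact code):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm transformer block: attention and MLP, each with a residual."""
    def __init__(self, n_embd: int = 384, n_head: int = 6, dropout: float = 0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: position i may only attend to positions <= i.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x
```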
## Setup
```bash
conda create -n slm-env python=3.11
conda activate slm-env
pip install torch numpy flask tiktoken datasets huggingface_hub
```
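A quick sanity check that PyTorch sees the Apple GPU and the GPT-2 tokenizer loads, using only the packages installed above:

```python
import torch
import tiktoken

print(torch.__version__, "| MPS available:", torch.backends.mps.is_available())

enc = tiktoken.get_encoding("gpt2")      # the same 50,257-token vocabulary the model uses
print(enc.encode("To be or not to be"))  # prints a list of token ids
```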
## Step 1: Train
```bash
python train.py --datasets shakespeare,alpaca,openwebtext \
    --max_iters 15000 --batch_size 16 --n_layer 6 --n_head 6 --n_embd 384
```
Resume from a checkpoint:
```bash
python train.py --datasets shakespeare,alpaca,openwebtext \
    --max_iters 15000 --lr 1e-4 --resume
```
The best checkpoint (lowest validation loss) is saved to `checkpoints/best_model.pt`.
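The exact checkpoint layout is defined in `train.py`; as a sketch, a resumable checkpoint typically bundles model and optimizer state together (the dictionary keys below are assumptions, not necessarily the repo's format):

```python
import torch

CKPT_PATH = "checkpoints/best_model.pt"

def save_checkpoint(model, optimizer, iter_num, best_val_loss):
    # Persist everything needed to continue training where it stopped.
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "iter_num": iter_num,
        "best_val_loss": best_val_loss,
    }, CKPT_PATH)

def load_checkpoint(model, optimizer, device):
    ckpt = torch.load(CKPT_PATH, map_location=device)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["iter_num"], ckpt["best_val_loss"]
```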
## Step 2: Generate Text (CLI)
```bash
python generate.py --prompt "To be or not to be" --max_new_tokens 300
```
Alpaca instruction-style:
```bash
python generate.py --instruction "Write a poem about the sea"
```
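Because the model saw Alpaca data during training, `--instruction` presumably wraps the text in the Alpaca prompt template before generation. A sketch of that formatting (the function name is illustrative; the template text is the published standard Alpaca format):

```python
def format_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Wrap a raw instruction in the standard Alpaca prompt template."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```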
## Step 3: Run Chat Interface
```bash
python app.py
```
Open http://127.0.0.1:5000 in your browser (use an incognito window if your browser blocks localhost).
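Streaming in Flask is usually done by returning a generator wrapped in `Response`, so each chunk is flushed to the browser as soon as it is produced. A minimal sketch of the pattern (route and helper names are illustrative, not `app.py`'s exact code):

```python
from flask import Flask, Response, request

app = Flask(__name__)

def generate_tokens(prompt: str):
    # Stand-in for the model's incremental decoding loop:
    # yield each decoded token as soon as it is sampled.
    for token in prompt.split():
        yield token + " "

@app.route("/chat")
def chat():
    prompt = request.args.get("prompt", "")
    # Returning a generator keeps the connection open and streams chunks.
    return Response(generate_tokens(prompt), mimetype="text/plain")

if __name__ == "__main__":
    app.run(port=5000)
```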
## Model Architecture
| Parameter | Value |
|---|---|
| Type | GPT (decoder-only transformer) |
| Tokenizer | BPE (GPT-2 encoding, 50,257-token vocab) |
| Layers | 6 transformer blocks |
| Attention heads | 6 |
| Embedding dim | 384 |
| Context length | 256 tokens |
| Parameters | ~30M |
| Training data | Shakespeare + Alpaca 52K + OpenWebText sample |
| Best val loss | 3.4163 |
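These values can be collected into a small config, and a back-of-the-envelope count confirms the ~30M figure (class name illustrative; assumes the output head is weight-tied to the token embedding):

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    vocab_size: int = 50257  # GPT-2 BPE vocabulary
    block_size: int = 256    # context length in tokens
    n_layer: int = 6
    n_head: int = 6
    n_embd: int = 384

cfg = GPTConfig()

# Rough parameter count: per block, attention is ~4*d^2 and the MLP ~8*d^2.
emb = cfg.vocab_size * cfg.n_embd          # ~19.3M token embedding
pos = cfg.block_size * cfg.n_embd          # ~0.1M positional embedding
blocks = cfg.n_layer * 12 * cfg.n_embd**2  # ~10.6M across the 6 blocks
print(f"~{(emb + pos + blocks) / 1e6:.1f}M parameters")  # ~30.0M
```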
## Hardware
Optimized for Apple M1 via the PyTorch MPS backend; falls back to CUDA or CPU automatically.
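The fallback is typically a three-way check; a sketch of the standard PyTorch pattern (not necessarily `train.py`'s exact code):

```python
import torch

def pick_device() -> torch.device:
    # Prefer Apple's Metal backend, then NVIDIA CUDA, then plain CPU.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")
```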
## Upload to Hugging Face
```bash
export HF_TOKEN=your_token_here
python upload_to_hf.py --username YOUR_HF_USERNAME --repo_name my-gpt-from-scratch
```
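Internally, `upload_to_hf.py` presumably calls the `huggingface_hub` client; the core of such a script looks roughly like this (the repo id and uploaded file are examples):

```python
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
repo_id = "YOUR_HF_USERNAME/my-gpt-from-scratch"

api.create_repo(repo_id, exist_ok=True)  # create the model repo if it doesn't exist
api.upload_file(
    path_or_fileobj="checkpoints/best_model.pt",
    path_in_repo="best_model.pt",
    repo_id=repo_id,
)
```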