# micro-gpt-anomaly-detector

A zero-dependency, pure-Python GPT transformer trained to detect anomalous filenames via character-level language modeling. Built on Andrej Karpathy's micrograd / makemore philosophy: the entire algorithm in a single file, no PyTorch, no NumPy.
## How It Works

The model learns the "grammar" of valid filenames from a training set (`filename.txt`). At inference time, it scores any filename by computing its negative log-likelihood (NLL): how "surprised" the model is by each character given the preceding context.

- Low NLL = the filename fits the learned naming convention (normal)
- High NLL = the filename deviates from the pattern (anomalous)
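Concretely, the NLL of a string is the sum of per-character surprisals, `-log p(char | preceding context)`. A toy sketch with made-up probabilities (the real model produces these from its softmax output, as shown later in this README):

```python
import math

# Hypothetical probabilities the model might assign to each successive
# character of a filename given its context (numbers are illustrative only).
char_probs = [0.30, 0.55, 0.40]

# NLL is the sum of surprisals; rare/unexpected characters contribute more.
nll = sum(-math.log(p) for p in char_probs)
print(round(nll, 2))
```

A single very unlikely character (say p = 0.001, contributing ~6.9 nats) can dominate the whole score, which is exactly why out-of-convention names stand out.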
## Architecture

| Component | Details |
|---|---|
| Model | GPT-2 style (RMSNorm, no biases, ReLU) |
| Embedding dim | 32 |
| Attention heads | 4 |
| Layers | 2 |
| Max sequence length | 96 characters |
| Tokenizer | Character-level |
| Autograd | Scalar-valued (each `Value` wraps a single float) |
| Training | Adam optimizer with early stopping |
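The scalar autograd row deserves a word: every arithmetic operation during training builds a node in a computation graph of single floats. The actual engine lives in `filename_anomaly_detector.py`; the following is only a minimal micrograd-style sketch of the idea, not the shipped class:

```python
class Value:
    """Minimal micrograd-style scalar: wraps one float plus its gradient."""
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children        # Values this node depends on
        self._local_grads = local_grads  # d(self)/d(child) for each child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for child, lg in zip(v._children, v._local_grads):
                child.grad += lg * v.grad

a, b = Value(2.0), Value(3.0)
c = a * b + a   # c = a*b + a
c.backward()
print(a.grad, b.grad)  # dc/da = b + 1 = 4.0, dc/db = a = 2.0
```

Because every multiply and add allocates one of these nodes, training cost scales with the raw operation count; that is the source of the slowness noted in the Retraining section.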
## Files

| File | Description |
|---|---|
| `filename_anomaly_detector.py` | Training script: autograd engine, model, training loop with early stopping, anomaly scoring |
| `test_model.py` | Inference-only test script: loads `model.json` and scores filenames |
| `model.json` | Pre-trained weights (can be regenerated by running the training script) |
| `filename.txt` | Training data: 88 filenames following a naming convention |
| `training_loss.svg` | Train vs. validation loss curve from the last training run |
| `main.py` | Original Karpathy GPT (trains on a baby-names dataset, for reference) |
| `input.txt` | Names dataset used by `main.py` |
## Quick Start: Using the Pre-trained Model

No dependencies required: just Python 3.6+.

### Run the included tests

```bash
python test_model.py
```

Expected output:
```
Model loaded: 32d, 4h, 2L, vocab=45

--- Normal filenames (should have LOW NLL) ---
NLL  16.93 | acr_banner_spring25_enUS_v01.png
NLL  18.76 | acr_email_bf24_enGB_v02.jpg
NLL  18.65 | acr_video_demo_enUS_v01.mp4
NLL  23.63 | acr_logo_primary_enUS_v03.svg
NLL  53.87 | acr_report_fy24q4_enUS_v01.pdf

--- Anomalous filenames (should have HIGH NLL) ---
NLL 101.55 | DELETE_THIS_NOW.exe
NLL 118.09 | ..hidden_config.bat
NLL 164.49 | photo_2024_vacation_IMG_3847.HEIC
NLL 168.67 | meeting notes final FINAL v2 (1).docx
NLL  68.93 | acr banner spring enUS v01.png
```
### Use in your own code

```python
import json, math

# 1. Load the model
with open('model.json') as f:
    payload = json.load(f)

hp = payload['hyperparams']
n_embd, n_head, n_layer = hp['n_embd'], hp['n_head'], hp['n_layer']
block_size, head_dim = hp['block_size'], hp['head_dim']
uchars = payload['vocab']
vocab_size = payload['vocab_size']
weights = payload['weights']
BOS = vocab_size - 1
stoi = {ch: i for i, ch in enumerate(uchars)}

# 2. Define the forward pass (float-only, no autograd needed)
def linear(x, w):
    return [sum(wi * xi for wi, xi in zip(wo, x)) for wo in w]

def rmsnorm(x):
    ms = sum(xi * xi for xi in x) / len(x)
    return [xi * (ms + 1e-5) ** -0.5 for xi in x]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def gpt_forward(token_id, pos_id, keys, values):
    x = [t + p for t, p in zip(weights['wte'][token_id], weights['wpe'][pos_id])]
    x = rmsnorm(x)
    for li in range(n_layer):
        x_res = x
        x = rmsnorm(x)
        q = linear(x, weights[f'layer{li}.attn_wq'])
        k = linear(x, weights[f'layer{li}.attn_wk'])
        v = linear(x, weights[f'layer{li}.attn_wv'])
        keys[li].append(k); values[li].append(v)
        x_attn = []
        for h in range(n_head):
            hs = h * head_dim
            q_h = q[hs:hs+head_dim]
            k_h = [ki[hs:hs+head_dim] for ki in keys[li]]
            v_h = [vi[hs:hs+head_dim] for vi in values[li]]
            attn = [sum(q_h[j]*k_h[t][j] for j in range(head_dim)) / head_dim**0.5
                    for t in range(len(k_h))]
            aw = softmax(attn)
            x_attn.extend([sum(aw[t]*v_h[t][j] for t in range(len(v_h)))
                           for j in range(head_dim)])
        x = linear(x_attn, weights[f'layer{li}.attn_wo'])
        x = [a + b for a, b in zip(x, x_res)]
        x_res = x
        x = rmsnorm(x)
        x = [max(0, xi) for xi in linear(x, weights[f'layer{li}.mlp_fc1'])]
        x = linear(x, weights[f'layer{li}.mlp_fc2'])
        x = [a + b for a, b in zip(x, x_res)]
    return linear(x, weights['lm_head'])

# 3. Score a filename
def score_filename(name):
    """Return NLL (lower = more normal, higher = more anomalous)."""
    # Characters outside the training vocabulary are silently skipped.
    toks = [BOS] + [stoi[c] for c in name if c in stoi] + [BOS]
    keys = [[] for _ in range(n_layer)]
    vals = [[] for _ in range(n_layer)]
    nll = 0.0
    for pos in range(len(toks) - 1):
        probs = softmax(gpt_forward(toks[pos], pos, keys, vals))
        p = probs[toks[pos + 1]]
        nll += -math.log(p) if p > 0 else 1e6
    return nll

# 4. Use it
nll = score_filename("acr_banner_spring25_enUS_v01.png")
print(f"NLL: {nll:.2f}")  # low NLL = normal
nll = score_filename("DELETE_THIS_NOW.exe")
print(f"NLL: {nll:.2f}")  # high NLL = anomalous
```
### Setting a threshold

A simple approach: score your known-good filenames and use the 95th-percentile NLL as the threshold.

```python
with open('filename.txt') as f:
    known_good = [line.strip() for line in f if line.strip()]

scores = sorted(score_filename(fn) for fn in known_good)
threshold = scores[int(len(scores) * 0.95)]
print(f"Threshold: {threshold:.2f}")

# Flag anomalies
test_file = "suspicious_file.exe"
nll = score_filename(test_file)
print(f"{'ANOMALY' if nll > threshold else 'NORMAL'}: {test_file} (NLL={nll:.2f})")
```
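One caveat worth knowing: raw NLL is a sum over characters, so it grows with filename length, and long-but-perfectly-normal names drift toward the threshold. A possible variation (not part of the shipped scripts) is to threshold the per-character NLL instead. A self-contained sketch, reusing the NLL values from the expected output above:

```python
# Hypothetical length normalization: divide total NLL by the number of
# predictions made, so long normal names aren't penalized for length alone.
def per_char_nll(raw_nll, name):
    return raw_nll / (len(name) + 1)  # +1 for the end-of-sequence prediction

normal = per_char_nll(16.93, "acr_banner_spring25_enUS_v01.png")
anomaly = per_char_nll(101.55, "DELETE_THIS_NOW.exe")
print(round(normal, 2), round(anomaly, 2))
```

In this sketch, `per_char_nll` is a hypothetical helper; the gap between normal and anomalous names survives normalization, but the threshold would need to be recomputed on the per-character scale.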
## Retraining

To train on your own filename convention:

- Replace `filename.txt` with your filenames (one per line)
- Delete `model.json` (so the script trains from scratch)
- Run: `python filename_anomaly_detector.py`

Training uses early stopping (patience=5, checked every 50 steps) and saves the best model weights automatically. A live `training_loss.svg` plot is updated during training.
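The early-stopping schedule described above can be sketched as follows. The real loop lives in `filename_anomaly_detector.py`; the `train_step` and `eval_loss` callables here are hypothetical stand-ins for its internals:

```python
# Sketch: evaluate validation loss every 50 steps; stop after 5 consecutive
# evaluations without improvement, keeping the best result seen so far.
def train(train_step, eval_loss, max_steps=5000, eval_every=50, patience=5):
    best_loss, best_step, bad_evals = float('inf'), 0, 0
    for step in range(1, max_steps + 1):
        train_step(step)
        if step % eval_every == 0:
            loss = eval_loss()
            if loss < best_loss:
                best_loss, best_step, bad_evals = loss, step, 0
                # (the real script checkpoints the best weights here)
            else:
                bad_evals += 1
                if bad_evals >= patience:
                    break  # validation loss has plateaued
    return best_loss, best_step

# Toy run: validation loss improves for three evals, then degrades.
losses = iter([3.0, 2.0, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1])
best, step = train(lambda s: None, lambda: next(losses))
print(best, step)  # best loss 1.5, reached at step 150
```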
Note: Training uses scalar autograd (every multiply/add creates a `Value` node), so it's slow by design; this is an educational implementation. For production use, port the forward pass to NumPy/PyTorch.
## Credits

Based on Andrej Karpathy's minimal GPT implementation.