# kn1ght-bullet
A 4.3M parameter GPT trained to play chess by next-token prediction on PGN notation. Intended for use in chess tutoring applications via constrained decoding at inference time.
`bullet` refers to the model's size tier — small and fast, in the same spirit as chess time controls.
## Model Details
| Property | Value |
|---|---|
| Architecture | GPT (4 layers, 4 heads, 256 embedding dim) |
| Parameters | 4.3M |
| Context length | 256 tokens |
| Vocabulary | 4,096 BPE tokens (chess PGN–specific) |
| Training format | PGN text (`[g_start]1.e4 e5 2.Nf3 ...`) |
The tokenizer is a BPE vocabulary built specifically for chess PGN notation, where most
moves (`e4`, `Nf3`, `O-O`, `cxd5`) encode as single tokens. This keeps inference fast and
makes constrained decoding straightforward — legal-move masking is a one-step operation
for the large majority of positions.
## Training Pipeline
Training proceeded in three phases. Pre-training used
`InterwebAlchemy/pgn-dataset-including-special-tokens`
(~3.5M games, average Elo ~2240, spanning 1783–2006), which is derived from the base
`InterwebAlchemy/pgn-dataset`
by adding `[g_start]` / `[g_end]` game boundary tokens.
### Phase 1 — Pre-training

200,000 steps on 100,000 games. The model learns PGN structure and develops opening pattern recognition across a wide range of named lines.
### Phase 2 — Legality-Filtered SFT (5 rounds × 5,000 steps)

A self-improvement loop: generate continuations from named opening prompts, filter to legal games, mix with HuggingFace anchor games to prevent forgetting, and fine-tune. Repeating this five times grew the legal training set from 67 games (9.1% pass rate) to 796 games (67.5% pass rate).
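The shape of one round of this loop can be sketched as follows. Everything here is a hypothetical stand-in: `generate_games`, `is_legal_game`, and the commented-out `fine_tune` call substitute for the real sampling, replay-based legality validation, and training code.

```python
# Hypothetical sketch of one Phase 2 self-improvement round.
# All function bodies are stand-ins, not the real pipeline.
import random


def is_legal_game(game: str) -> bool:
    """Stand-in legality check; the real filter replays every move."""
    return "??" not in game  # placeholder criterion


def generate_games(n: int) -> list[str]:
    """Stand-in for sampling continuations from named opening prompts."""
    return [f"1.e4 e5 2.Nf3 Nc6 (game {i})" for i in range(n)]


def run_sft_round(anchor_games: list[str], n_samples: int = 100) -> list[str]:
    sampled = generate_games(n_samples)
    legal = [g for g in sampled if is_legal_game(g)]
    # Mix in anchor games from the original dataset to prevent forgetting
    train_set = legal + random.sample(anchor_games, min(20, len(anchor_games)))
    # fine_tune(model, train_set)  # 5,000 steps per round in the real run
    return train_set
```

Each round grows the pool of legal self-generated games, so later rounds fine-tune on progressively more of the model's own (filtered) output.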
### Phase 3 — DPO (300 steps)

Stockfish-generated preference pairs (771 chosen/rejected pairs from 783 positions) rank legal moves by quality. Validation reward accuracy: 0.885. SFT loss remains stable throughout, confirming the model retains PGN structure.
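For reference, the standard DPO objective and the reward-accuracy metric can be sketched in a few lines. The `beta` value is illustrative, not the run's actual hyperparameter.

```python
# Minimal sketch of the standard DPO loss and reward accuracy.
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


def reward_accuracy(pairs: list[tuple[float, float, float, float]]) -> float:
    """Fraction of pairs where the implied reward prefers the chosen move."""
    correct = sum(1 for pc, pr, rc, rr in pairs if (pc - rc) > (pr - rr))
    return correct / len(pairs)
```

Reward accuracy of 0.885 on validation means the implied reward ranks the Stockfish-preferred move above the rejected one for ~88.5% of held-out pairs.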
## Evaluation
Evaluated against chess-specialist models and frontier LLMs on three tasks.

- **kn1ght models** use the custom 4,096-token chess BPE tokenizer with a `[g_start]` game-start prefix.
- **HuggingFace specialist models** (`chessgpt-base-v1`, `chesspythia-70m`) use their own model-specific tokenizers, loaded automatically via the HuggingFace pipeline. Input is raw PGN text with no special prefix.
- **Frontier LLMs** receive raw PGN prompts via the OpenRouter API; completion models (`gpt-3.5-turbo-instruct`, `gpt-oss-20b`) get a bare PGN string, while chat models get a short system prompt ("You play chess. Reply with only the next move in SAN notation.").
### Phase B — Opening play (50 positions × 10 generations)
Centipawn loss (CPL) measures how much worse a model's move is compared to Stockfish's best move at depth 15. Lower is better.
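Concretely, CPL is the evaluation drop between the engine's best move and the move played. A minimal sketch with stand-in evaluation values (not real Stockfish output):

```python
# Hypothetical sketch of the CPL metric with stand-in evaluations.

def centipawn_loss(best_eval_cp: int, played_eval_cp: int) -> int:
    """Evaluation drop (in centipawns) from the engine's best move to the
    move actually played, from the side to move's perspective.
    Clamped at zero to absorb evaluation noise."""
    return max(0, best_eval_cp - played_eval_cp)


def mean_cpl(evals: list[tuple[int, int]]) -> float:
    """Mean CPL over (best, played) evaluation pairs."""
    return sum(centipawn_loss(b, p) for b, p in evals) / len(evals)


# e.g. best move scores +35 cp, the model's move scores +28 cp -> CPL of 7
print(centipawn_loss(35, 28))  # 7
```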
| Model | Params | Mean CPL ↓ | Legality | Blunder % |
|---|---|---|---|---|
| Gemini 3.1 Flash Lite | ~8B | 2.58 | 100% | 0.0% |
| chessgpt-base-v1 | ~85M | 4.92 | 99.6% | 0.2% |
| gpt-3.5-turbo-instruct | ~175B | 5.79 | 99.4% | 0.0% |
| kn1ght-bullet (this model) | 4.3M | 5.83 | 99.8% | 0.0% |
| DeepSeek V3 | ~685B | 8.18 | 86.0% | 0.4% |
kn1ght-bullet matches gpt-3.5-turbo-instruct (a ~175B parameter frontier model) in mean CPL while being 40,000× smaller. Performance is strongest in Sicilian and Ruy Lopez variations well-represented in the training data, and weaker in less-common openings such as the Benoni and Colle System.
### Phase C' — PGN puzzle accuracy (20 puzzles × 10 generations)
Puzzles are drawn from the Lichess Open Puzzle Database (ratings 1201–1895, mean 1551), presented as full PGN game context up to the puzzle position.
| Model | Top-1 Accuracy | Legality |
|---|---|---|
| Gemini 3.1 Flash Lite | 49% | 98% |
| chessgpt-base-v1 | 34% | 97% |
| gpt-3.5-turbo-instruct | 26% | 63% |
| kn1ght-bullet | 10% | 58% |
| DeepSeek V3 | 12% | 62% |
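One plausible reading of the Top-1 numbers above is accuracy averaged over all generations per puzzle. A sketch with hypothetical moves:

```python
# Hypothetical sketch of Top-1 accuracy over repeated generations.

def top1_accuracy(results: list[list[str]], solutions: list[str]) -> float:
    """Fraction of sampled moves that match the puzzle solution,
    pooled across all puzzles and all generations."""
    hits = sum(
        move == solution
        for attempts, solution in zip(results, solutions)
        for move in attempts
    )
    total = sum(len(attempts) for attempts in results)
    return hits / total


# Two puzzles, two generations each: 3 of 4 samples match -> 0.75
print(top1_accuracy([["Qh5", "Nf3"], ["e4", "e4"]], ["Nf3", "e4"]))  # 0.75
```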
Tactical puzzle accuracy is constrained by model capacity at this scale. With constrained decoding at inference time, the model selects the highest-ranked legal move — puzzle accuracy is less relevant to the tutoring use case than opening-play CPL.
### Phase C — FEN puzzle accuracy
kn1ght-bullet scores 0% on FEN-format puzzles, as expected. FEN notation was never present in the training data; feeding FEN to the model produces arbitrary output. This is a known and intentional limitation of PGN-only training.
## Usage
### With transformers.js (browser / Node.js)
The primary intended runtime. Use `onnx/model_quantized.onnx` (5.7 MB) for browser
delivery; `onnx/model.onnx` (21.6 MB) for full-precision inference.
```javascript
import { pipeline } from "@xenova/transformers";

const generator = await pipeline("text-generation", "InterwebAlchemy/kn1ght-bullet");

const result = await generator("[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5", {
  max_new_tokens: 10,
  do_sample: true,
  temperature: 0.8,
  top_k: 40,
});
```
Constrained decoding is strongly recommended in production. At each move step,
mask the logits to only the token IDs of legal moves (from chess.js) before
sampling. This guarantees legal play and lets the model's probability distribution
over legal moves act as an opening-quality signal.
```javascript
// Build the per-position allowlist once, not inside the generation loop.
// Note: tokenizer.encode() returns a plain array of token ids in transformers.js.
const legalMoves = chess.moves();
const allowedIds = new Set(legalMoves.flatMap((san) => tokenizer.encode(san)));

function maskLogits(logits) {
  for (let i = 0; i < logits.length; i++) {
    if (!allowedIds.has(i)) logits[i] = -Infinity;
  }
  return logits;
}
```
### With Python (ONNX Runtime)
```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load the tokenizer
tokenizer = Tokenizer.from_pretrained("InterwebAlchemy/kn1ght-bullet")

# Load via ONNX (recommended); onnxruntime expects numpy inputs
session = ort.InferenceSession("onnx/model.onnx")

pgn = "[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5"
input_ids = np.array([tokenizer.encode(pgn).ids], dtype=np.int64)

logits = session.run(["logits"], {"input_ids": input_ids})[0]
next_token = logits[0, -1].argmax()
print(tokenizer.decode([int(next_token)]))
```
## Limitations
- PGN-only: Cannot parse FEN notation. Positions must be provided as PGN move sequences.
- Opening-focused: Training data emphasises the opening phase. Middlegame and endgame play degrades without constrained decoding.
- 256-token context: Long games approaching move 60+ may exceed the context window.
- Not a chess engine: Does not perform search or lookahead. Move quality reflects learned opening patterns, not calculation.
## Files
| File | Description |
|---|---|
| `onnx/model.onnx` | Full-precision ONNX (21.6 MB) |
| `onnx/model_quantized.onnx` | Int8 quantized ONNX (5.7 MB) — recommended for browser |
| `tokenizer.json` | BPE tokenizer, loadable by transformers.js and HF tokenizers |
| `config.json` | Model architecture |
| `generation_config.json` | Default generation parameters |
## Citation
```bibtex
@misc{kn1ght-bullet,
  author = {InterwebAlchemy},
  title = {kn1ght-bullet: A 4.3M Parameter Chess Language Model},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/InterwebAlchemy/kn1ght-bullet}
}
```