
kn1ght-bullet

A 4.3M parameter GPT trained to play chess by next-token prediction on PGN notation. Intended for use in chess tutoring applications via constrained decoding at inference time.

The name "bullet" refers to the model's size tier: small and fast, in the same spirit as chess time controls.


Model Details

  • Architecture: GPT (4 layers, 4 heads, 256 embedding dim)
  • Parameters: 4.3M
  • Context length: 256 tokens
  • Vocabulary: 4,096 BPE tokens (chess PGN-specific)
  • Training format: PGN text ([g_start]1.e4 e5 2.Nf3 ...)

The tokenizer is a BPE vocabulary built specifically for chess PGN notation, where most moves (e4, Nf3, O-O, cxd5) encode as single tokens. This keeps inference fast and makes constrained decoding straightforward — legal move masking is a one-step operation for the large majority of positions.
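Because most moves encode as single tokens, the legality mask reduces to a set-membership check over token ids. A minimal sketch of the mechanics, using a toy stand-in vocabulary rather than the real 4,096-token BPE vocab:

```python
# Toy illustration of one-step legal-move masking. `toy_vocab` is a
# stand-in for the real chess BPE vocabulary, where most SAN moves
# map to a single token id.
import math

toy_vocab = {"e4": 0, "e5": 1, "Nf3": 2, "Nc6": 3, "Bb5": 4, "d4": 5}

def mask_to_legal(logits, legal_moves, vocab):
    """Set every token that is not a legal move to -inf before sampling."""
    allowed = {vocab[m] for m in legal_moves if m in vocab}
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

logits = [0.1, 2.0, 1.5, -0.3, 0.0, 0.7]
masked = mask_to_legal(logits, legal_moves=["e4", "d4", "Nf3"], vocab=toy_vocab)
# Sampling from `masked` can now only ever pick e4, d4, or Nf3.
```

For the minority of positions where a move spans multiple tokens, the mask would need to be applied stepwise per token rather than once per move.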


Training Pipeline

Training proceeded in three phases. Pre-training used InterwebAlchemy/pgn-dataset-including-special-tokens (~3.5M games, average Elo ~2240, spanning 1783–2006), derived from the base InterwebAlchemy/pgn-dataset by adding [g_start] / [g_end] game boundary tokens.

Phase 1 — Pre-training (200,000 steps on 100,000 games) The model learns PGN structure and develops opening pattern recognition across a wide range of named lines.

Phase 2 — Legality-Filtered SFT (5 rounds × 5,000 steps) A self-improvement loop: generate continuations from named opening prompts, filter to legally-valid games, mix with HuggingFace anchor games to prevent forgetting, and fine-tune. Repeated five times, growing the legal training set from 67 games (9.1% pass rate) to 796 games (67.5% pass rate).
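One round of this loop can be sketched as follows; `generate` and `is_legal_pgn` are stub placeholders for the model's sampler and a chess-rules validator, not the actual training code:

```python
import random

def self_improvement_round(prompts, generate, is_legal_pgn, anchor_games, seed=0):
    """One round of the legality-filtered SFT loop (sketch): generate
    continuations, keep only legal games, mix in anchor games to
    prevent forgetting, and return the new training set."""
    rng = random.Random(seed)
    candidates = [generate(p) for p in prompts]
    legal = [g for g in candidates if is_legal_pgn(g)]
    pass_rate = len(legal) / len(candidates) if candidates else 0.0
    train_set = legal + anchor_games
    rng.shuffle(train_set)
    return train_set, pass_rate

# Stub components standing in for the real model and chess validator:
fake_generate = lambda p: p + " e5"
fake_is_legal = lambda g: g.startswith("1.e4")
train_set, rate = self_improvement_round(
    ["1.e4", "1.d4", "1.e4 c5"], fake_generate, fake_is_legal,
    anchor_games=["1.Nf3 d5"],
)
```

In the real pipeline the pass rate is the quantity that climbs across rounds (9.1% → 67.5% over five iterations).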

Phase 3 — DPO (300 steps) Stockfish-generated preference pairs (771 chosen/rejected pairs from 783 positions) rank legal moves by quality. Val reward accuracy: 0.885. SFT loss remains stable throughout, confirming the model retains PGN structure.
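For reference, the standard DPO objective on a single chosen/rejected pair looks like this; a sketch of the textbook formula, not the project's training script, and the `beta` value is an assumption:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO objective on one chosen/rejected pair: the policy is
    rewarded for increasing its margin over the reference model."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the Stockfish-chosen move more than the
# reference model does, the margin is positive and the loss falls
# below log(2); at zero margin the loss is exactly log(2).
loss = dpo_loss(-1.0, -3.0, ref_logp_chosen=-2.0, ref_logp_rejected=-2.0)
```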


Evaluation

Evaluated against chess-specialist models and frontier LLMs on three tasks.

  • kn1ght models use the custom 4,096-token chess BPE tokenizer with a [g_start] game-start prefix.
  • HuggingFace specialist models (chessgpt-base-v1, chesspythia-70m) use their own model-specific tokenizers, loaded automatically via the HuggingFace pipeline. Input is raw PGN text with no special prefix.
  • Frontier LLMs receive raw PGN prompts via the OpenRouter API; completion models (gpt-3.5-turbo-instruct, gpt-oss-20b) get a bare PGN string, chat models get a short system prompt ("You play chess. Reply with only the next move in SAN notation.").

Phase B — Opening play (50 positions × 10 generations)

Centipawn loss (CPL) measures how much worse a model's move is compared to Stockfish's best move at depth 15. Lower is better.
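A minimal sketch of the metric, assuming engine evaluations in centipawns from the side to move:

```python
def centipawn_loss(best_eval_cp, move_eval_cp):
    """CPL for one move: how far the evaluation drops versus Stockfish's
    best move, floored at 0 so a move matching or beating the engine
    choice scores zero loss."""
    return max(0, best_eval_cp - move_eval_cp)

def mean_cpl(pairs):
    """Average CPL over (best_eval, move_eval) pairs."""
    return sum(centipawn_loss(b, m) for b, m in pairs) / len(pairs)

# e.g. best move keeps +35 cp, model's move keeps +20 cp -> 15 cp loss
avg = mean_cpl([(35, 20), (10, 10)])
```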

| Model | Params | Mean CPL ↓ | Legality | Blunder % |
|---|---|---|---|---|
| Gemini 3.1 Flash Lite | ~8B | 2.58 | 100% | 0.0% |
| chessgpt-base-v1 | ~85M | 4.92 | 99.6% | 0.2% |
| gpt-3.5-turbo-instruct | ~175B | 5.79 | 99.4% | 0.0% |
| kn1ght-bullet (this model) | 4.3M | 5.83 | 99.8% | 0.0% |
| DeepSeek V3 | ~685B | 8.18 | 86.0% | 0.4% |

kn1ght-bullet matches gpt-3.5-turbo-instruct (a ~175B parameter frontier model) in mean CPL while being 40,000× smaller. Performance is strongest in Sicilian and Ruy Lopez variations well-represented in the training data, and weaker in less-common openings such as the Benoni and Colle System.

Phase C' — PGN puzzle accuracy (20 puzzles × 10 generations)

Puzzles are drawn from the Lichess Open Puzzle Database (ratings 1201–1895, mean 1551), presented as full PGN game context up to the puzzle position.

| Model | Top-1 Accuracy | Legality |
|---|---|---|
| Gemini 3.1 Flash Lite | 49% | 98% |
| chessgpt-base-v1 | 34% | 97% |
| gpt-3.5-turbo-instruct | 26% | 63% |
| kn1ght-bullet | 10% | 58% |
| DeepSeek V3 | 12% | 62% |

Tactical puzzle accuracy is constrained by model capacity at this scale. With constrained decoding at inference time, the model selects the highest-ranked legal move — puzzle accuracy is less relevant to the tutoring use case than opening-play CPL.

Phase C — FEN puzzle accuracy

kn1ght-bullet scores 0% on FEN-format puzzles, as expected. FEN notation was never present in the training data; feeding FEN to the model produces arbitrary output. This is a known and intentional limitation of PGN-only training.


Usage

With transformers.js (browser / Node.js)

The primary intended runtime. Use onnx/model_quantized.onnx (5.7 MB) for browser delivery; onnx/model.onnx (21.6 MB) for full-precision inference.

import { pipeline } from "@xenova/transformers";

const generator = await pipeline("text-generation", "InterwebAlchemy/kn1ght-bullet");
const result = await generator("[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5", {
  max_new_tokens: 10,
  do_sample: true,
  temperature: 0.8,
  top_k: 40,
});

Constrained decoding is strongly recommended in production. At each move step, mask the logits to only the token IDs of legal moves (from chess.js) before sampling. This guarantees legal play and lets the model's probability distribution over legal moves act as an opening-quality signal.

// Build the per-position allowlist once, not inside the generation loop.
// In transformers.js, tokenizer.encode() returns an array of token ids;
// most SAN moves are a single token in this vocabulary.
const legalMoves = chess.moves();
const allowedIds = new Set(legalMoves.flatMap((san) => tokenizer.encode(san)));

function maskLogits(logits) {
  for (let i = 0; i < logits.length; i++) {
    if (!allowedIds.has(i)) logits[i] = -Infinity;
  }
  return logits;
}

With Python (PyTorch)

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load the tokenizer and the ONNX model (recommended runtime)
tokenizer = Tokenizer.from_pretrained("InterwebAlchemy/kn1ght-bullet")
session = ort.InferenceSession("onnx/model.onnx")

pgn = "[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5"
input_ids = np.array([tokenizer.encode(pgn).ids], dtype=np.int64)  # shape (1, seq_len)
logits = session.run(["logits"], {"input_ids": input_ids})[0]
next_token = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token]))
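The same constrained-decoding idea shown for transformers.js can be sketched in Python with NumPy; the toy logits and hard-coded legal token ids below are illustrative stand-ins for real model output and a chess library's move list:

```python
import numpy as np

def pick_legal_move(logits, allowed_ids):
    """Greedy constrained decoding: mask every token outside the
    legal-move id set to -inf, then take the argmax. `logits` is the
    last-position row of the model output."""
    masked = np.full_like(logits, -np.inf)
    idx = np.array(sorted(allowed_ids))
    masked[idx] = logits[idx]
    return int(masked.argmax())

# Toy logits over a 6-token vocab; ids {1, 5} stand in for the legal
# moves returned by a chess library and mapped through the tokenizer.
logits = np.array([3.0, 1.0, 2.5, 0.0, -1.0, 0.5], dtype=np.float32)
move_id = pick_legal_move(logits, {1, 5})
```

Unconstrained argmax would pick id 0 here; the mask forces the choice onto the best-scoring legal move instead.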

Limitations

  • PGN-only: Cannot parse FEN notation. Positions must be provided as PGN move sequences.
  • Opening-focused: Training data emphasises the opening phase. Middlegame and endgame play degrades without constrained decoding.
  • 256-token context: Long games approaching move 60+ may exceed the context window.
  • Not a chess engine: Does not perform search or lookahead. Move quality reflects learned opening patterns, not calculation.

Files

| File | Description |
|---|---|
| onnx/model.onnx | Full-precision ONNX (21.6 MB) |
| onnx/model_quantized.onnx | Int8 quantized ONNX (5.7 MB) — recommended for browser |
| tokenizer.json | BPE tokenizer, loadable by transformers.js and HF tokenizers |
| config.json | Model architecture |
| generation_config.json | Default generation parameters |

Citation

@misc{kn1ght-bullet,
  author       = {InterwebAlchemy},
  title        = {kn1ght-bullet: A 4.3M Parameter Chess Language Model},
  year         = {2026},
  publisher    = {HuggingFace},
  url          = {https://huggingface.co/InterwebAlchemy/kn1ght-bullet}
}