Sparse Autoencoders - BabyLM GPT-2 Medium
Sparse autoencoders (SAEs) trained on the residual stream of IParraMartin/gpt2-medium-bLM100M using SAELens v6.
The base model was trained on BabyLM-2026-Strict (100M tokens).
Training configuration
| Parameter | Value |
|---|---|
| Architecture | BatchTopK (saved as JumpReLU for inference) |
| d_in | 1024 |
| d_sae | 16384 (×16 expansion) |
| k | 64 active features per token |
| Training tokens | 100M |
| Learning rate | 2e-4 with 1000-step warmup |
| context_size | 128 |
| normalize_activations | expected_average_only_in |
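
To illustrate what the `k` row means under BatchTopK, here is a minimal sketch of the selection step in PyTorch. It is not the SAELens implementation: the function name `batch_topk` and the toy shapes are assumptions made for illustration. The idea is that the top `k * n_tokens` activations are kept across the whole batch, so the number of active features is `k` per token on average rather than exactly `k` for every token.

```python
import torch


def batch_topk(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k * n_tokens largest activations across the whole batch.

    pre_acts: [n_tokens, d_sae] encoder pre-activations (hypothetical input).
    Returns a tensor of the same shape with all other entries set to zero.
    """
    n_tokens = pre_acts.shape[0]
    acts = torch.relu(pre_acts)            # activations are non-negative
    flat = acts.flatten()
    n_keep = min(k * n_tokens, flat.numel())
    top = torch.topk(flat, n_keep)         # selection is over the batch, not per token
    out = torch.zeros_like(flat)
    out[top.indices] = top.values
    return out.view_as(acts)


# Toy example: 8 tokens, 256 features, k = 4 active features per token on average.
torch.manual_seed(0)
sparse = batch_topk(torch.randn(8, 256), k=4)
per_token_l0 = (sparse > 0).sum(dim=-1)    # varies per token, averages to k
```

Because the selection depends on which tokens share a batch, it is not well defined for a single token at inference time, which is consistent with the SAEs being saved in JumpReLU form.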
Layers
- layer_02/: residual stream at transformer.h.2
- layer_04/: residual stream at transformer.h.4
- layer_06/: residual stream at transformer.h.6
- layer_08/: residual stream at transformer.h.8
- layer_10/: residual stream at transformer.h.10
- layer_12/: residual stream at transformer.h.12
- layer_16/: residual stream at transformer.h.16
- layer_22/: residual stream at transformer.h.22
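
The naming pattern in the list above can be captured in a small helper. This is a sketch only: `AVAILABLE_LAYERS` and `sae_location` are hypothetical names, not part of this repository or of SAELens.

```python
# Layers for which an SAE is listed in this repository.
AVAILABLE_LAYERS = (2, 4, 6, 8, 10, 12, 16, 22)


def sae_location(layer: int) -> tuple[str, str]:
    """Return (subfolder, module path) for a layer, following the list above."""
    if layer not in AVAILABLE_LAYERS:
        raise ValueError(
            f"No SAE listed for layer {layer}; available: {AVAILABLE_LAYERS}"
        )
    # Subfolders are zero-padded to two digits; module paths are not.
    return f"layer_{layer:02d}", f"transformer.h.{layer}"


subfolder, module_path = sae_location(16)
```

Failing early on an unlisted layer avoids a less informative download error later.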
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sae_lens import SAE
LAYER = 16
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load SAE
sae = SAE.load_from_pretrained(
    "whitepenguin/gpt2-medium-bLM100M-SAE",
    subfolder=f"layer_{LAYER:02d}",
    device=device,
)
sae.eval()
# Load GPT-2 (always required: the SAE runs on top of its activations)
tokenizer = AutoTokenizer.from_pretrained("IParraMartin/gpt2-medium-bLM100M")
model = AutoModelForCausalLM.from_pretrained(
    "IParraMartin/gpt2-medium-bLM100M"
).to(device).eval()
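# Hook the residual stream at the output of block LAYER.
# Depending on the transformers version, a block may return a tuple
# (hidden_states, ...) or a bare tensor, so handle both.
cache = {}
hook = model.transformer.h[LAYER].register_forward_hook(
    lambda m, i, o: cache.update(
        {"resid": (o[0] if isinstance(o, tuple) else o).detach()}
    )
)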
inputs = tokenizer("The child looked at the dog.", return_tensors="pt").to(device)
with torch.no_grad():
    model(**inputs)
hook.remove()
feature_acts = sae.encode(cache["resid"]) # [1, seq_len, 16384]
l0 = (feature_acts > 0).float().sum(-1).mean()
print(f"Mean active features/token: {l0:.1f}") # expect ~64
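
Beyond sparsity, a common sanity check is how well the SAE reconstructs the activations it encodes. The sketch below computes the fraction of variance explained from two tensors. The helper name `explained_variance` is an assumption for illustration, and it assumes the reconstruction is in the same space as the hooked activations.

```python
import torch


def explained_variance(x: torch.Tensor, x_hat: torch.Tensor) -> float:
    """Fraction of the variance of x captured by the reconstruction x_hat.

    Both tensors have shape [..., d_in]; leading dimensions are flattened
    into a single token dimension. 1.0 is a perfect reconstruction, 0.0 is
    no better than predicting the per-dimension mean.
    """
    x = x.reshape(-1, x.shape[-1]).float()
    x_hat = x_hat.reshape(-1, x_hat.shape[-1]).float()
    residual = (x - x_hat).pow(2).sum()
    total = (x - x.mean(dim=0, keepdim=True)).pow(2).sum()
    return (1.0 - residual / total).item()
```

With the objects from the snippet above, this would be called as `explained_variance(cache["resid"], sae.decode(feature_acts))`. A single short sentence gives a noisy estimate, so averaging over many tokens is advisable.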