Dutch ModernBERT (512h-22L) - 1.35M steps

This is a Dutch ModernBERT model: a compact variant with a hidden size of 512 and 22 layers, totaling ~144M parameters.
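
The ~144M figure can be checked directly after loading the checkpoint. A minimal sketch, using the same transformers API as the Usage section below:

from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("yhavinga/dmbert-dutchl-512h-22l-1350000")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # expected to be roughly 144M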

Usage

from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

model = AutoModelForMaskedLM.from_pretrained("yhavinga/dmbert-dutchl-512h-22l-1350000")
tokenizer = AutoTokenizer.from_pretrained("yhavinga/dmbert-dutchl-512h-22l-1350000")

# Example: Fill-mask (note: no space before <mask>)
text = "Amsterdam is de<mask> van Nederland."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Get predictions for mask token
mask_idx = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1][0]
logits = outputs.logits[0, mask_idx]
probs = torch.nn.functional.softmax(logits, dim=-1)
top_k = torch.topk(probs, k=5)

print("Top predictions:")
for prob, idx in zip(top_k.values, top_k.indices):
    print(f"  {tokenizer.decode([idx])}: {prob:.1%}")
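
For quick experiments, the same fill-mask task can also be run through the transformers pipeline API, which wraps the tokenize/forward/top-k steps above:

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="yhavinga/dmbert-dutchl-512h-22l-1350000")
for pred in fill_mask("Amsterdam is de<mask> van Nederland.", top_k=5):
    print(f"  {pred['token_str']}: {pred['score']:.1%}")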

Model Details

Architecture Configuration

{
  "hidden_size": 512,
  "num_hidden_layers": 22,
  "num_attention_heads": 8,
  "intermediate_size": 3072,
  "vocab_size": 32128,
  "max_position_embeddings": 8192,
  "global_attn_every_n_layers": 3,
  "local_attention": 128
}
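
An untrained model with the same shape can be instantiated from this configuration; a sketch assuming the ModernBertConfig / ModernBertForMaskedLM classes available in recent transformers releases (the pretrained checkpoint above already embeds these values):

from transformers import ModernBertConfig, ModernBertForMaskedLM

config = ModernBertConfig(
    hidden_size=512,
    num_hidden_layers=22,
    num_attention_heads=8,
    intermediate_size=3072,
    vocab_size=32128,
    max_position_embeddings=8192,
    global_attn_every_n_layers=3,  # one global-attention layer every 3 layers
    local_attention=128,           # sliding-window size for the local layers
)
model = ModernBertForMaskedLM(config)  # randomly initialized, same architecture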

Training Details

  • Batch Size: 8 per device (256 global on v4-32)
  • Sequence Length: 1024
  • Learning Rate: 1.41e-4 (cosine schedule with 20k warmup; see the schedule sketch below)
  • Weight Decay: 0.01
  • Compute Dtype: bfloat16
  • Max Steps: 2,000,000 (currently at 1,350,000)
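
For illustration, the tokens-per-step arithmetic and the warmup/cosine schedule can be written out as follows (a sketch: linear warmup and a decay floor of 0 are assumptions, not taken from the training code):

import math

PEAK_LR = 1.41e-4
WARMUP_STEPS = 20_000
MAX_STEPS = 2_000_000

def learning_rate(step: int, floor: float = 0.0) -> float:
    """Linear warmup to PEAK_LR, then cosine decay toward `floor` (assumed 0)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return floor + (PEAK_LR - floor) * 0.5 * (1.0 + math.cos(math.pi * progress))

tokens_per_step = 256 * 1024  # global batch x sequence length = 262,144 tokens
print(f"LR at step 1,350,000: {learning_rate(1_350_000):.2e}")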

Tokenizer Details

This model uses the Dutch LLaMA tokenizer with an added <mask> token:

  • Base vocabulary: 32,000 tokens
  • Mask token ID: 32,000
  • Total vocabulary (with padding): 32,128 tokens (padded for TPU efficiency)
  • Note: Use <mask> without a preceding space for best results (see the check below)
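
The spacing note matters because LLaMA-style SentencePiece tokenizers fold a leading space into the following token, so "de<mask>" and "de <mask>" tokenize differently. A quick check, reusing the tokenizer loaded in the Usage section:

print(tokenizer.tokenize("Amsterdam is de<mask> van Nederland."))
print(tokenizer.tokenize("Amsterdam is de <mask> van Nederland."))  # the extra space changes the surrounding tokens

# Sanity check on the vocabulary layout described above:
print(tokenizer.mask_token, tokenizer.mask_token_id)  # expected: <mask> 32000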

Model Card

  • Developed by: Yeb Havinga
  • Model type: Masked Language Model (MLM)
  • License: Apache 2.0
  • Repository: mbert-jax

Acknowledgements

This model was trained on Google Cloud TPU v4-32. The architecture is based on ModernBERT by AnswerDotAI.
