# Dutch ModernBERT (512h-22L) - 1.35M steps
This is a Dutch ModernBERT model: a compact variant with a hidden size of 512 and 22 layers, totaling ~144M parameters.
## Usage
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

model = AutoModelForMaskedLM.from_pretrained("yhavinga/dmbert-dutchl-512h-22l-1350000")
tokenizer = AutoTokenizer.from_pretrained("yhavinga/dmbert-dutchl-512h-22l-1350000")

# Example: fill-mask (note: no space before <mask>)
text = "Amsterdam is de<mask> van Nederland."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Get predictions for the mask token
mask_idx = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1][0]
logits = outputs.logits[0, mask_idx]
probs = torch.nn.functional.softmax(logits, dim=-1)
top_k = torch.topk(probs, k=5)

print("Top predictions:")
for prob, idx in zip(top_k.values, top_k.indices):
    print(f"  {tokenizer.decode([idx])}: {prob:.1%}")
```
## Model Details
- Architecture: ModernBERT (based on AnswerDotAI's ModernBERT)
- Language: Dutch (nl)
- Training Framework: JAX/Flax on TPU v4-32
- Parameters: 143,645,056
- Tokenizer: yhavinga/dutch-llama-tokenizer (32,128 vocab)
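As an encoder-only MLM, the checkpoint can also serve as a backbone for downstream Dutch tasks. A minimal, untested sketch of loading it for sequence classification; the task and `num_labels=2` are illustrative assumptions, not part of this release:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative: attach a fresh classification head for fine-tuning.
# num_labels=2 is an assumption (e.g. binary sentiment); set it for your task.
model = AutoModelForSequenceClassification.from_pretrained(
    "yhavinga/dmbert-dutchl-512h-22l-1350000",
    num_labels=2,
)
tokenizer = AutoTokenizer.from_pretrained("yhavinga/dmbert-dutchl-512h-22l-1350000")
```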
## Architecture Configuration
```json
{
  "hidden_size": 512,
  "num_hidden_layers": 22,
  "num_attention_heads": 8,
  "intermediate_size": 3072,
  "vocab_size": 32128,
  "max_position_embeddings": 8192,
  "global_attn_every_n_layers": 3,
  "local_attention": 128
}
```
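ModernBERT interleaves local sliding-window attention with global attention every `global_attn_every_n_layers` layers. A quick way to confirm these settings from the published config (a sketch; the values should match the block above):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("yhavinga/dmbert-dutchl-512h-22l-1350000")

# Every 3rd layer uses global attention; the others use a 128-token local window.
print(config.hidden_size, config.num_hidden_layers)               # 512 22
print(config.global_attn_every_n_layers, config.local_attention)  # 3 128
```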
## Training Details
- Batch Size: 8 per device (256 global on v4-32)
- Sequence Length: 1024
- Learning Rate: 1.41e-4 (cosine schedule with 20k warmup)
- Weight Decay: 0.01
- Compute Dtype: bfloat16
- Max Steps: 2,000,000 (this checkpoint was saved at step 1,350,000)
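The learning-rate schedule above can be reproduced in optax, matching the JAX/Flax training framework. A minimal sketch assuming a linear warmup to the peak rate followed by cosine decay over the full 2M steps (the exact schedule call used during training is not documented here):

```python
import optax

# Assumption: linear warmup to the peak LR, then cosine decay toward 0 over max_steps.
schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,
    peak_value=1.41e-4,
    warmup_steps=20_000,
    decay_steps=2_000_000,
    end_value=0.0,
)

optimizer = optax.adamw(learning_rate=schedule, weight_decay=0.01)
```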
## Tokenizer Details
This model uses the Dutch LLaMA tokenizer with an added `<mask>` token:
- Base vocabulary: 32,000 tokens
- Mask token ID: 32,000
- Total vocabulary (with padding): 32,128 tokens (padded for TPU efficiency)
- Note: use `<mask>` without a preceding space for best results
## Model Card
- Developed by: Yeb Havinga
- Model type: Masked Language Model (MLM)
- License: Apache 2.0
- Repository: mbert-jax
## Acknowledgements
This model was trained on Google Cloud TPU v4-32. The architecture is based on ModernBERT by AnswerDotAI.