Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling Paper • 2604.28075 • Published Apr 30 • 20
Pre-Training Curriculum for Multi-Token Prediction in Language Models Paper • 2505.22757 • Published May 28, 2025
MastermindEval: A Simple But Scalable Reasoning Benchmark Paper • 2503.05891 • Published Mar 7, 2025 • 1
Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models Paper • 2504.14366 • Published Apr 19, 2025 • 1
Sample-Efficient Language Modeling with Linear Attention and Lightweight Enhancements Paper • 2511.05560 • Published Nov 4, 2025 • 1
Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data Paper • 2412.10121 • Published Dec 13, 2024 • 2