Towards a Unified View of Large Language Model Post-Training • arXiv:2509.04419 • Published Sep 4, 2025
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax • arXiv:2504.20966 • Published Apr 29, 2025
A Refined Analysis of Massive Activations in LLMs • arXiv:2503.22329 • Published Mar 28, 2025
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training • arXiv:2502.11196 • Published Feb 16, 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • arXiv:2501.17161 • Published Jan 28, 2025
Emergent properties with repeated examples • arXiv:2410.07041 • Published Oct 9, 2024
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction • arXiv:2409.17422 • Published Sep 25, 2024
EuroLLM: Multilingual Language Models for Europe • arXiv:2409.16235 • Published Sep 24, 2024
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline • arXiv:2408.15079 • Published Aug 27, 2024
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models • arXiv:2408.15518 • Published Aug 28, 2024
Better Alignment with Instruction Back-and-Forth Translation • arXiv:2408.04614 • Published Aug 8, 2024
Gemma 2: Improving Open Language Models at a Practical Size • arXiv:2408.00118 • Published Jul 31, 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention • arXiv:2407.02490 • Published Jul 2, 2024
Self-Play Preference Optimization for Language Model Alignment • arXiv:2405.00675 • Published May 1, 2024