Black-Box On-Policy Distillation of Large Language Models Paper • 2511.10643 • Published Nov 13, 2025 • 49
Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset Paper • 2511.15186 • Published Nov 19, 2025 • 25
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling Paper • 2510.15346 • Published Oct 17, 2025 • 33
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning Paper • 2510.03259 • Published Sep 26, 2025 • 57
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping Paper • 2509.21880 • Published Sep 26, 2025 • 52
ReviewScore: Misinformed Peer Review Detection with Large Language Models Paper • 2509.21679 • Published Sep 25, 2025 • 63
EAGLE3 Collection The collection of eagle3 series models for Qwen3 and Hunyuan. • 9 items • Updated Nov 3, 2025 • 2
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 211
EXAONE-4.0 Collection EXAONE unified model series of 1.2B and 32B, integrating non-reasoning and reasoning modes. • 20 items • Updated Jul 29, 2025 • 52
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 180
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models Paper • 2505.17225 • Published May 22, 2025 • 64
MedGemma Release Collection Collection of Gemma 3 variants for performance on medical text and image comprehension to accelerate building healthcare-based AI applications. • 7 items • Updated Jul 11, 2025 • 369
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs Paper • 2503.07067 • Published Mar 10, 2025 • 31
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Feb 7, 2025 • 267