Collections
Discover the best community collections!
Collections including paper arxiv:2509.07054
- Open Data Synthesis For Deep Research
  Paper • 2509.00375 • Published • 71
- Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
  Paper • 2509.03403 • Published • 23
- LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
  Paper • 2509.03405 • Published • 24
- SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
  Paper • 2509.00930 • Published • 5

- microsoft/bitnet-b1.58-2B-4T
  Text Generation • Updated • 15k • 1.3k
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
  Paper • 2504.10449 • Published • 15
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
  Text Generation • 8B • Updated • 31 • 15
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 63

- Statistical Methods in Generative AI
  Paper • 2509.07054 • Published • 11
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
  Paper • 2509.07980 • Published • 104
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 273
- GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms
  Paper • 2511.17592 • Published • 119

- ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks
  Paper • 2508.15804 • Published • 15
- Behavioral Fingerprinting of Large Language Models
  Paper • 2509.04504 • Published • 6
- Statistical Methods in Generative AI
  Paper • 2509.07054 • Published • 11
- CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
  Paper • 2510.01591 • Published • 28

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 29
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 15
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 50
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 33