models
jinaai/ReaderLM-v2 • Text Generation • 2B • Updated Mar 4, 2025 • 4.07k • 753
m-a-p/YuE-s1-7B-anneal-en-cot • Text Generation • 6B • Updated Mar 12, 2025 • 6.42k • 438
starvector/starvector-1b-im2svg • Text Generation • 1B • Updated Mar 19, 2025 • 2.16k • 181
stepfun-ai/Step1X-Edit • Image-to-Image • Updated Jul 9, 2025 • 84 • 327
papers
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models • Paper • 2402.03300 • Published Feb 5, 2024 • 138
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published Jan 28, 2025 • 123
The Ultra-Scale Playbook 🌌 • Running • 3.64k • The ultimate guide to training LLMs on large GPU clusters
The Ultimate Guide to LLM Training | The Ultra-Scale Playbook 🔥 • Running • 251 • Learn about every aspect of LLM training