-
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Paper • 2510.04721 • Published -
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
Paper • 2505.02735 • Published • 34 -
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Paper • 2504.18428 • Published -
MathConstruct: Challenging LLM Reasoning with Constructive Proofs
Paper • 2502.10197 • Published
Shuo Xing
shuoxing
AI & ML interests
MLLMs, LLMs
Recent Activity
upvoted
a
paper
2 days ago
Qwen3-VL Technical Report
upvoted
a
paper
3 days ago
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
updated
a model
6 days ago
shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs32