When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research • Paper • 2505.11855 • Published May 17
ArxivBench: Can LLMs Assist Researchers in Conducting Research? • Paper • 2504.10496 • Published Apr 6