Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents Paper • 2606.19704 • Published 5 days ago • 30
Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines Paper • 2605.20630 • Published May 20 • 12
MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments Paper • 2605.09131 • Published May 9 • 59