Dhaval Patel

DhavalPatel

dhaval-patel-2b287033

AI & ML interests

None yet

Recent Activity


upvoted a paper 2 days ago

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

Paper • 2606.19704 • Published 3 days ago • 27

submitted a paper to Daily Papers 2 days ago

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

Paper • 2606.19704 • Published 3 days ago • 27

upvoted a paper 5 days ago

Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Paper • 2606.12674 • Published 11 days ago • 5

New activity in ibm-research/AssetOpsBench 14 days ago

Request for agent's traces

#11 opened 2 months ago by kyzor

liked a dataset 14 days ago

AnandMayank/QueST-PartNetMobility-SAPIEN

Updated 12 days ago • 50.6k • 14.3k • 9

commented on Harness, Scaffold, and the AI Agent Terms Worth Getting Right 19 days ago

Appreciate the nice write-up. Can we add a) Leaderboard and b) Benchmark?

https://github.com/IBM/AssetOpsBench

upvoted an article 19 days ago

Article

Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic

ibm-research • 19 days ago • 87

New activity in ibm-research/AssetOpsBench 23 days ago

Update data/scenarios/all_utterance.jsonl

#12 opened 23 days ago by shuxinl

upvoted an article 24 days ago

Article

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

ibm-research • 24 days ago • 17

upvoted a paper 25 days ago

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows

Paper • 2605.24219 • Published 26 days ago • 9

submitted a paper to Daily Papers 25 days ago

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows

Paper • 2605.24219 • Published 26 days ago • 9

commented on 2 papers 25 days ago

Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge

Paper • 2605.08518 • Published May 8 • 11

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

Paper • 2605.20630 • Published May 20 • 12

authored 3 papers about 1 month ago

DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules

Paper • 2605.08614 • Published May 9 • 7

Code-Guided Reasoning for Small Language Models: Evaluating Executable MCQA Scaffolds

Paper • 2605.18827 • Published May 12 • 7

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

Paper • 2605.20630 • Published May 20 • 12

upvoted a paper about 1 month ago

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

Paper • 2605.20630 • Published May 20 • 12

submitted 2 papers to Daily Papers about 1 month ago

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

Paper • 2605.20630 • Published May 20 • 12

Code-Guided Reasoning for Small Language Models: Evaluating Executable MCQA Scaffolds

Paper • 2605.18827 • Published May 12 • 7

authored a paper about 1 month ago

SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search

Paper • 2512.23167 • Published Dec 29, 2025 • 1
