Controlled study: AI operational experience improves performance by 1.07 SD (open data + code)
#115
by Rushnur - opened
Hi everyone,
We just published a controlled experiment measuring the effect of accumulated operational experience on AI assistant performance.
Quick summary:
- An AI assistant (ARIA) that has been running for months, accumulating experience fragments and operational memory, was compared against the same base model (Claude Opus 4.6) without that experience
- 50 real-world questions, 1,200 blind judgments from 3 independent judges
- Result: Cohen's d = 1.07, Friedman p < 10^-25
- The effect is domain-specific — strong on operational tasks, near zero on algorithmic controls
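
For anyone less familiar with the two statistics quoted above, here is a minimal Python sketch of how a Cohen's d and a Friedman statistic can be computed. It is an illustration only, not the study's analysis code (that lives in the linked repository). The function names and the toy scores are made up for this example, and the pooled-SD form of Cohen's d is an assumption, since the post does not say which variant the paper reports.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def average_ranks(values):
    """Rank values from 1..k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic, without tie correction.

    blocks: one row per question, one score per condition in each row.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Toy scores, invented purely for illustration (not study data).
with_experience = [2, 4, 6]
without_experience = [1, 3, 5]
d = cohens_d(with_experience, without_experience)

toy_blocks = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
q = friedman_statistic(toy_blocks)
```

The Friedman statistic is compared against a chi-square distribution with k - 1 degrees of freedom to get a p-value; that step is left out here to keep the sketch dependency-free.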
This builds on ExpeL, MemGPT, Generative Agents, and Reflexion, but measures experience effects in a production system rather than a sandbox.
Everything is open:
- Paper: https://zenodo.org/records/19533311
- Data + code: https://github.com/patechlabs/aria-experience-study
Would love feedback from this community. We are also seeking an arXiv cs.AI endorser, if anyone is qualified (endorsement code: MJLELZ):
https://arxiv.org/auth/endorse?x=MJLELZ
Thanks!
Ravshan Nuraliev, PaTech Labs