Spaces: fffiloni/MS-Image2Video


Controlled study: AI operational experience improves performance by 1.07 SD (open data + code)

#115

by Rushnur - opened 26 days ago


Rushnur

26 days ago

Hi everyone,

We just published a controlled experiment measuring the effect of accumulated operational experience on AI assistant performance.

Quick summary:

- An AI assistant (ARIA) that has been operating for months, accumulating experience fragments and operational memory, was compared against the same base model (Claude Opus 4.6) without that accumulated experience.
- 50 real-world questions, 1,200 blind judgments from 3 independent judges.
- Result: Cohen's d = 1.07, Friedman p < 10^-25.
- The effect is domain-specific: strong on operational tasks, near zero on algorithmic control tasks.
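For readers less familiar with the two statistics in the summary, here is a minimal, stdlib-only Python sketch of how they are typically computed. It is not the study's code (that lives in the linked repository). It assumes the common pooled-standard-deviation form of Cohen's d, since the post does not say which variant (for example a paired d_z) was used, and it computes the Friedman statistic without a tie correction. All function names and the toy scores are hypothetical, chosen only for illustration.

```python
"""Sketch: Cohen's d and a Friedman test on hypothetical toy scores."""
import math
import statistics


def cohens_d(a, b):
    """Cohen's d for two groups, assuming the pooled-standard-deviation form."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def average_ranks(values):
    """Rank values 1..k within one block; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability, using closed forms for integer df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus sqrt(2/pi) e^(-x/2) x^(i-1/2) / (2i-1)!!
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * i + 1)
    return p


def friedman(blocks):
    """Friedman statistic (no tie correction) and its chi-square p-value.

    blocks: one row per block (e.g. per question), one score per condition.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    q = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return q, chi2_sf(q, k - 1)


# Hypothetical judge scores per question: with vs. without accumulated experience.
with_exp = [8, 7, 9, 6, 8]
without_exp = [6, 5, 7, 6, 5]

if __name__ == "__main__":
    d = cohens_d(with_exp, without_exp)
    q, p = friedman(list(zip(with_exp, without_exp)))
    print(f"Cohen's d = {d:.2f}")
    print(f"Friedman Q = {q:.2f}, p = {p:.3f}")
```

On these five toy questions the sketch gives d = 1.8 and a Friedman statistic of 3.2 (p of about 0.07), which shows how a large standardized effect can still fall short of significance with very few blocks. The closed-form chi-square tail avoids a SciPy dependency and keeps very small p-values from being lost to `1 - cdf` cancellation. Library implementations such as SciPy's `scipy.stats.friedmanchisquare` apply a tie correction, so their results can differ slightly when ties occur.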

This builds on prior work (ExpeL, MemGPT, Generative Agents, and Reflexion) but measures experience effects in a production system rather than a sandbox.

Everything is open:

Paper: https://zenodo.org/records/19533311
Data + code: https://github.com/patechlabs/aria-experience-study

Would love feedback from this community. I'm also seeking an arXiv endorser for cs.AI; if anyone is qualified, the endorsement code is MJLELZ:

https://arxiv.org/auth/endorse?x=MJLELZ

Thanks!
Ravshan Nuraliev, PaTech Labs
