AutoTrainess: Teaching Language Models to Improve Language Models Autonomously
Abstract
AutoTrainess enables autonomous language model training by providing structured agent-computer interfaces that guide planning, data preparation, training, evaluation, and logging operations more effectively than traditional command-line approaches.
Training language models (LMs) remains a highly human-intensive process, even as frontier language model agents become increasingly capable at software engineering and other long-horizon tasks. A central challenge is that autonomous post-training is not just a coding problem: it requires the agent to repeatedly plan iterations, construct benchmark-aligned data, run stable training jobs, evaluate checkpoints, and preserve experiment state across many hours of interaction. We present AutoTrainess, an LM agent that exposes these operations as a repository of agent-computer interfaces for planning, data preparation, training, evaluation, and logging. Rather than leaving the agent to operate in a raw CLI environment with an underspecified action space, AutoTrainess externalizes prior human experience as explicit workflows, rules, and execution constraints that guide the agent toward effective and reliable training behavior. On PostTrainBench, AutoTrainess consistently outperforms CLI-only baselines, achieving an average score of 26.94 with GPT-5.4 (Codex) versus 23.21 for the CLI-only baseline. It also generalizes across models and harnesses, improving DeepSeek-V4-Flash (OpenCode) from 12.13 to 19.58.
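To make the idea of structured agent-computer interfaces more concrete, here is a minimal sketch of what a stage registry with execution constraints might look like. It is not the AutoTrainess implementation: the class names (`Interface`, `Harness`), the `lr_in_range` rule, its bound of 1e-3, and the `fake_train` stand-in are all hypothetical, chosen only to illustrate the five stages named in the abstract and the notion of rules that are checked before an action runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Stage names taken from the abstract; everything else here is hypothetical.
STAGES = ("plan", "prepare_data", "train", "evaluate", "log")

Rule = Callable[[dict], Optional[str]]  # returns a violation message, or None


@dataclass
class Interface:
    name: str
    run: Callable[[dict], dict]
    rules: List[Rule] = field(default_factory=list)


class Harness:
    """Routes agent actions through registered interfaces instead of a raw CLI."""

    def __init__(self) -> None:
        self.interfaces: Dict[str, Interface] = {}
        self.journal: List[dict] = []  # persistent record of every call

    def register(self, iface: Interface) -> None:
        if iface.name not in STAGES:
            raise ValueError(f"unknown stage: {iface.name}")
        self.interfaces[iface.name] = iface

    def call(self, name: str, args: dict) -> dict:
        iface = self.interfaces[name]
        # Check every rule before running; collect all violation messages.
        violations = [msg for rule in iface.rules if (msg := rule(args)) is not None]
        if violations:
            result = {"ok": False, "violations": violations}
        else:
            result = {"ok": True, **iface.run(args)}
        # Log rejected and accepted calls alike so experiment state is preserved.
        self.journal.append({"stage": name, "args": args, "result": result})
        return result


def lr_in_range(args: dict) -> Optional[str]:
    # Assumed constraint for illustration; the bound is not from the paper.
    lr = args.get("learning_rate", 0.0)
    return None if 0 < lr <= 1e-3 else f"learning_rate {lr} outside (0, 1e-3]"


def fake_train(args: dict) -> dict:
    # Stand-in for a real training job.
    return {"checkpoint": f"ckpt-lr{args['learning_rate']}"}


harness = Harness()
harness.register(Interface("train", fake_train, [lr_in_range]))
rejected = harness.call("train", {"learning_rate": 0.5})
accepted = harness.call("train", {"learning_rate": 1e-5})
```

In this sketch the agent's action space is limited to the registered stages, an out-of-range configuration is rejected with an explicit message rather than launching an unstable job, and the journal keeps both outcomes so later iterations can consult them.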
Community
How big do you think a model needs to be to be able to escape? Size helps it be smart enough to escape, but it also makes transferring and hiding its weights significantly harder, though a big enough model could figure out decentralized serving and self-scaling. It's so end game lol. I built something similar, with more modularity and a more standardized interface. Direct it at itself as a training environment, i.e., can current agents train a model to train models? fable 5 training sonnet 5 to train haiku 5s.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- ANDES: Agent Native Data Evolving Synthesis Tool for Autonomous Instruction Alignment (2026)
- Exploring Autonomous Agentic Data Engineering for Model Specialization (2026)
- Can Generalist Agents Automate Data Curation? (2026)
- Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents (2026)
- LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents (2026)
- OpenThoughts-Agent: Data Recipes for Agentic Models (2026)
- WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation (2026)