karpathy/autoresearch | DeepWiki
- URL: https://deepwiki.com/karpathy/autoresearch
- Repository: https://github.com/karpathy/autoresearch
- Saved: 2026-04-10
Summary
DeepWiki describes karpathy/autoresearch as an autonomous ML research framework built around a deliberately tiny code surface:
- prepare.py is immutable and owns data preparation, tokenizer training, constants, and evaluation.
- train.py is the mutable core that the agent edits to try ideas.
- program.md is the human-authored research brief that tells the agent what to optimize and how to behave.
The system is designed to let an AI agent run repeated overnight experiments on a small but real LLM training setup. In each loop, the agent edits train.py, runs training for a fixed 5-minute wall-clock budget, extracts val_bpb, decides whether the change improved the result, and either keeps or discards the commit.
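The keep/discard decision in that loop can be sketched as a small pure function. This is an illustrative sketch only, not code from the repository; the names should_keep and update_frontier are made up here, and the comparison assumes lower val_bpb is better:

```python
# Hypothetical sketch of the keep/discard logic described above.
# Names (should_keep, update_frontier, frontier_bpb) are illustrative,
# not taken from the actual autoresearch codebase.

def should_keep(frontier_bpb: float, new_bpb: float, min_delta: float = 0.0) -> bool:
    """Keep a change only if it strictly improves validation bits per byte
    (lower is better), optionally by at least min_delta."""
    return new_bpb < frontier_bpb - min_delta

def update_frontier(frontier_bpb: float, results: list[float]) -> float:
    """Advance the frontier through a sequence of experiment results.

    Each result is the val_bpb from one 5-minute run; runs that do not
    improve on the best-so-far are discarded and the frontier stays put."""
    for val_bpb in results:
        if should_keep(frontier_bpb, val_bpb):
            frontier_bpb = val_bpb  # commit kept; frontier advances
    return frontier_bpb
```

In the real system the "keep" branch would correspond to keeping the git commit and the "discard" branch to reverting it; only the comparison logic is modeled here.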
Key ideas surfaced by DeepWiki
- Fixed-time optimization: every experiment gets the same 5-minute budget, making runs directly comparable on a given machine.
- Single metric: the main target is val_bpb (validation bits per byte), chosen because it remains comparable even if the tokenizer or vocabulary changes.
- Single-file mutation: constraining edits to train.py keeps the search space manageable and diffs reviewable.
- Human as org designer: instead of editing Python directly, the human mainly edits program.md, which acts like lightweight “research org code” for the agent.
- Keep/discard frontier: all attempts are logged, but only improvements advance the git branch frontier.
Useful references
- DeepWiki sections include: overview, system architecture, design principles, getting started, agent operation, metrics/evaluation, and advanced topics.
- README quick start uses uv sync, uv run prepare.py, and uv run train.py.
- The default workflow targets a single NVIDIA GPU and was tested on H100-class hardware.