karpathy/autoresearch | DeepWiki

Summary

DeepWiki describes karpathy/autoresearch as an autonomous ML research framework built around a deliberately tiny code surface:

  • prepare.py is immutable (off-limits to the agent) and owns data preparation, tokenizer training, shared constants, and evaluation.
  • train.py is the mutable core that the agent edits to try ideas.
  • program.md is the human-authored research brief that tells the agent what to optimize and how to behave.

The system is designed to let an AI agent run repeated overnight experiments on a small but real LLM training setup. In each loop, the agent edits train.py, runs training for a fixed 5-minute wall-clock budget, extracts val_bpb, decides whether the change improved the result, and either keeps or discards the commit.
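The keep/discard logic of that loop can be sketched in a few lines. This is an illustrative reconstruction, not the repo's actual code: function names like frontier and the val_bpb values are made up, and the real loop edits train.py and shells out to a timed training run rather than reading from a list.

```python
def keep_if_improved(best_bpb, candidate_bpb):
    """Lower val_bpb is better; keep a commit only on strict improvement."""
    return candidate_bpb < best_bpb

def frontier(results):
    """Walk a sequence of (name, val_bpb) attempts; return the kept ones."""
    best = float("inf")
    kept = []
    for name, bpb in results:
        if keep_if_improved(best, bpb):
            best = bpb
            kept.append((name, bpb))  # commit kept, frontier advances
        # otherwise the attempt is logged but its commit is discarded
    return kept

attempts = [("baseline", 1.42), ("wider-mlp", 1.45), ("lr-warmup", 1.38)]
print(frontier(attempts))  # → [('baseline', 1.42), ('lr-warmup', 1.38)]
```

Because every run gets the same 5-minute budget, a strict "lower val_bpb wins" comparison like this is sufficient; no per-run normalization is needed.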

Key ideas surfaced by DeepWiki

  • Fixed-time optimization: every experiment gets the same 5-minute budget, making runs directly comparable on a given machine.
  • Single metric: the main target is val_bpb (validation bits per byte), chosen because it remains comparable even if the tokenizer or vocabulary changes.
  • Single-file mutation: constraining edits to train.py keeps the search space manageable and diffs reviewable.
  • Human as org designer: instead of editing Python directly, the human mainly edits program.md, which acts like lightweight “research org code” for the agent.
  • Keep/discard frontier: all attempts are logged, but only improvements advance the git branch frontier.
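The bits-per-byte metric mentioned above can be sketched as follows. This assumes the standard definition of bits per byte (summed cross-entropy loss converted from nats to bits, divided by the raw byte count of the split); the function name and the example numbers are illustrative, not taken from the repo.

```python
import math

def bits_per_byte(total_loss_nats, total_bytes):
    """Convert summed cross-entropy (in nats) over a text split into
    bits per byte. Normalizing by raw bytes rather than tokens is what
    keeps the metric comparable when the tokenizer or vocabulary changes."""
    return total_loss_nats / (math.log(2) * total_bytes)

# e.g. 1000 nats of summed loss over 800 bytes of validation text
print(round(bits_per_byte(1000.0, 800), 4))
```

A tokenizer change alters the number of tokens and the per-token loss, but the byte count of the validation text stays fixed, so val_bpb remains a fair comparison across such edits.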

Useful references

  • DeepWiki sections include: overview, system architecture, design principles, getting started, agent operation, metrics/evaluation, and advanced topics.
  • README quick start uses uv sync, uv run prepare.py, and uv run train.py.
  • The default workflow targets a single NVIDIA GPU and was tested on H100-class hardware.
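Put together, the quick start amounts to a short command sequence. The uv commands come from the summary above; the GPU check is an illustrative addition, and actually running this requires the cloned repo and a CUDA-capable machine.

```shell
nvidia-smi            # confirm a single NVIDIA GPU is visible
uv sync               # install dependencies
uv run prepare.py     # data prep, tokenizer training, eval setup (immutable)
uv run train.py       # one fixed-budget training run; reports val_bpb
```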