Parameter Golf

Parameter Golf is the OpenAI challenge used in this writeup to demonstrate agentic research with elastic compute.

Task framing

The challenge asks participants to compress a language model into an artifact of at most 16 MB that runs on 8×H100 GPUs in under 10 minutes, while minimizing bits per byte (lower is better).
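The two numeric constraints can be made concrete with a minimal sketch. It is not the challenge's official scoring code. It assumes bits per byte is computed in the usual way, as the summed negative log-likelihood converted from nats to bits and divided by the number of raw text bytes. It also assumes 16 MB means 16,000,000 bytes and ignores any code or metadata that counts toward the artifact. The function names and example numbers are hypothetical.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by bytes normalizes.
    return total_nll_nats / (math.log(2) * total_bytes)


def max_params(budget_bytes: int, bits_per_param: int) -> int:
    """Upper bound on parameter count for a byte budget at a fixed bit width."""
    if bits_per_param <= 0:
        raise ValueError("bits_per_param must be positive")
    return (budget_bytes * 8) // bits_per_param


# Hypothetical evaluation: 1,000 tokens covering 4,000 bytes of text,
# with a mean loss of 1.0 nat per token.
example_bpb = bits_per_byte(total_nll_nats=1.0 * 1_000, total_bytes=4_000)

# Assumed budget: 16 MB taken as 16,000,000 bytes.
BUDGET_BYTES = 16_000_000
params_at_8_bits = max_params(BUDGET_BYTES, 8)
params_at_4_bits = max_params(BUDGET_BYTES, 4)
```

Because the metric is normalized by bytes, not tokens, it stays comparable across tokenizers. The parameter bound shows why lower-precision weights are attractive under a fixed size cap.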

Why it is a good demo

The benchmark stresses multiple phases of work:

  • pipeline smoke tests
  • broad hyperparameter sweeps
  • full-scale validation
  • post-training bottleneck debugging
  • final optimization rounds

Because these phases differ widely in how much compute they need, the challenge is a strong example for Modal and for the broader idea of agentic-research-autoscaling.

Reported outcome

The showcased agent ran 113 experiments across 238 GPU-hours, with the core training finishing about 5× faster than it would on a single workstation.
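As a rough sanity check on these figures, the average cost per experiment follows directly from the two reported totals. This is only back-of-the-envelope arithmetic: the writeup does not say how GPU-hours were split across experiments, so the mean may hide large differences between smoke tests and full-scale runs.

```python
# Reported totals from the writeup.
experiments = 113
gpu_hours = 238

# Mean GPU-hours per experiment; the per-experiment distribution is unknown.
avg_gpu_hours = gpu_hours / experiments
print(f"{avg_gpu_hours:.2f} GPU-hours per experiment on average")
```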