We build RL environments for long-horizon software engineering tasks on production-grade codebases. These tasks involve using multiple tools and reasoning in dynamic landscapes.
AI coding agents have become remarkably capable within the IDE, progressing from autocomplete to single-file edits to changes across an entire repository. However, true software engineering requires operating beyond the editor.
In practice, engineers reason across time, tools, and uncertainty: implementing a spec that requires modifying multiple services, managing CI/CD pipelines, resolving Sentry and Linear tickets, inspecting GCP metrics and logs, coordinating with teammates on Slack, and more. These workflows unfold over hours or days, require reasoning about changes in the environment, and draw on a variety of tools.
Agents aren't good at this yet. We believe the next step-change in agentic coding will come from training and evaluating agents in realistic, long-horizon environments that span the full software development lifecycle.
Building these environments is non-trivial. We're creating environments that behave like real production systems: tasks unfold over time, dependencies are enforced, and performance is defined by verifiable outcomes. We obsess over realism and quality because RL on bad data only degrades model performance. We're building the core infrastructure to produce and run high-fidelity environments and tasks at scale.
As professional software engineers and researchers, we've experienced firsthand how AI has transformed our work. We believe the next frontier is enabling agents to operate reliably outside the IDE. The most difficult engineering problems live at the seams: deploying code, debugging production systems, and coordinating with teammates. Agents should work reliably at these boundaries, not just inside the repository.
We're excited about this future, and we work with frontier labs to customize and scale these environments, unlocking greater autonomy and reliability in software engineering agents. As agents improve, we introduce harder tasks, more complex environments, and longer horizons, continuously pushing the frontier of what's possible. Polymath is backed by Y Combinator.