Polymath
Simulation environments for training and evaluating long-horizon AI agents.
- Applications: the services that agents interact with (e.g. databases, backend servers, Slack, Linear)
- Data: information seeded into the environment to represent its initial state
- Tasks: descriptions of what the agent should accomplish
- Verifiers: criteria that evaluate how well agents perform on tasks in the environment
- Agent(s): AI actor(s) that navigate the environment and complete tasks using the available tools
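
One way to picture how these five pieces fit together is the minimal Python sketch below. It is only an illustration under stated assumptions, not the project's actual API: every name in it (`Environment`, `Task`, `ticket_closed`, `scripted_agent`, the `tracker` application and ticket `T-1`) is hypothetical, application state is reduced to nested dictionaries, and the "agent" is a hard-coded stand-in rather than a model calling tools.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names here are hypothetical and chosen for illustration only.
State = Dict[str, Dict[str, str]]  # application name -> key/value records


@dataclass
class Task:
    description: str  # what the agent should accomplish


@dataclass
class Environment:
    applications: List[str]  # services the agent can interact with
    data: State              # seeded initial state
    task: Task
    verifiers: List[Callable[[State], float]] = field(default_factory=list)

    def score(self, final_state: State) -> float:
        """Average the verifier scores for the state the agent left behind."""
        if not self.verifiers:
            return 0.0
        return sum(v(final_state) for v in self.verifiers) / len(self.verifiers)


def ticket_closed(state: State) -> float:
    """Verifier: 1.0 if ticket T-1 is marked done, else 0.0."""
    return 1.0 if state["tracker"].get("T-1") == "done" else 0.0


def scripted_agent(env: Environment) -> State:
    """Stand-in for an AI agent: copies the seeded state and closes the ticket."""
    state = {app: dict(records) for app, records in env.data.items()}
    state["tracker"]["T-1"] = "done"
    return state


env = Environment(
    applications=["tracker"],
    data={"tracker": {"T-1": "open"}},
    task=Task("Close ticket T-1 in the tracker."),
    verifiers=[ticket_closed],
)
final_state = scripted_agent(env)
print(env.score(final_state))  # prints 1.0
```

Scoring the final state rather than the agent's individual actions is a design choice made for this sketch: it keeps verifiers independent of how the agent reached the outcome. Scoring the untouched seed data with `env.score(env.data)` returns 0.0, since the ticket is still open.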