Model card
Agent Q (GPT-4o)
MultiOn · proprietary · Unknown params · MCTS + DPO self-play web agent on GPT-4o
Agent Q combines Monte Carlo Tree Search (MCTS) with Direct Preference Optimization (DPO), trained via self-play. arxiv:2408.07199.
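For readers unfamiliar with DPO, the standard pairwise objective compares a preferred response y_w with a rejected one y_l by how much the trained policy raises each one's log-probability relative to a frozen reference model. The loss is -log σ(β · [(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))]). The sketch below is a minimal, hypothetical illustration of that loss on precomputed log-probabilities. It is not Agent Q's training code: the function name `dpo_loss` and the default `beta=0.1` are assumptions, and the paper's exact variant, including how it builds preference pairs from search, may differ.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:  # beta default is an assumption
    """Standard DPO loss for one preference pair (hypothetical helper)."""
    # Log-ratio of policy vs. reference for the preferred and rejected outputs.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) == softplus(-x), computed stably.
    return softplus(-logits)
```

When the policy equals the reference, the margin is zero and the loss is log 2. It falls below log 2 as the policy shifts probability toward the preferred output.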
§ 01 · Benchmarks
Every benchmark Agent Q (GPT-4o) has a recorded score for.
| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | WebArena | Agentic AI · Web & Desktop Agents | success-rate | 50.5% | #4 | 2023-07-26 | source ↗ |
The Rank column shows this model’s position among all models scored on the same benchmark and metric (where shown, the number of competitors follows a slash). A rank of #1 marks the current state of the art (SOTA). Rows are sorted by rank, then by newest result.
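The ranking rule above can be sketched as follows. This is a hypothetical illustration, not the site's implementation: the `Result` record, the helpers `rank_of` and `model_table`, and the assumption that a higher value is better (true for success-rate) are all introduced here. Ties share a rank (competition ranking), which is also an assumption.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Result(NamedTuple):
    model: str
    benchmark: str
    metric: str
    value: float  # assumption: higher is better, as for success-rate
    date: date


def rank_of(result: Result, all_results: List[Result]) -> int:
    """1 + number of other models scoring strictly higher on the same benchmark + metric."""
    better = sum(
        1 for r in all_results
        if r.benchmark == result.benchmark
        and r.metric == result.metric
        and r.model != result.model
        and r.value > result.value
    )
    return 1 + better


def model_table(model: str, all_results: List[Result]) -> List[Tuple[int, Result]]:
    """One model's rows, sorted by rank, then newest result first."""
    rows = [(rank_of(r, all_results), r) for r in all_results if r.model == model]
    rows.sort(key=lambda pair: (pair[0], -pair[1].date.toordinal()))
    return rows
```

Usage with hypothetical placeholder models and scores: `model_table("m2", results)` returns that model's rows ordered first by rank and, among equal ranks, by most recent date.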
§ 03 · Papers
1 paper with results for Agent Q (GPT-4o).
- WebArena: A Realistic Web Environment for Building Autonomous Agents (2023-07-26 · Agentic AI · 1 result)
§ 05 · Sources & freshness
Where these numbers come from.
arxiv: 1 result
1 of 1 rows marked verified.