Agent Eval

Browser Agent Task Benchmark Builder

Turn a browser task, success criteria, selector hints, failure cases, and time budget into an AI-agent benchmark JSON, Playwright replay check, scoring rubric, SVG route map, and proof receipt.
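
As a rough illustration of how the five inputs above could map onto a benchmark JSON and a scoring rubric, here is a minimal sketch in Python. It is not the tool's actual export format: the `BenchmarkTask` class, its field names, the `to_benchmark_json` and `score_run` helpers, and the example URL and selectors are all hypothetical, chosen only to mirror the inputs named in the description.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class BenchmarkTask:
    """Hypothetical container for the five inputs named in the description."""
    task: str
    success_criteria: list[str]
    selector_hints: dict[str, str]
    failure_cases: list[str]
    time_budget_s: float


def to_benchmark_json(spec: BenchmarkTask) -> str:
    """Serialize the spec to JSON (assumed layout, not the tool's real schema)."""
    return json.dumps(asdict(spec), indent=2, sort_keys=True)


def score_run(spec: BenchmarkTask, criteria_met: set[str],
              failures_hit: set[str], elapsed_s: float) -> dict:
    """Score one agent run: pass only if every criterion is met,
    no listed failure case occurred, and the run stayed within budget."""
    met = [c for c in spec.success_criteria if c in criteria_met]
    hit = [f for f in spec.failure_cases if f in failures_hit]
    within_budget = elapsed_s <= spec.time_budget_s
    total = len(spec.success_criteria)
    fraction = len(met) / total if total else 0.0
    return {
        "criteria_fraction": fraction,
        "failures_hit": hit,
        "within_budget": within_budget,
        "passed": fraction == 1.0 and not hit and within_budget,
    }


# Illustrative values only; the task, URL, and selectors are made up.
example = BenchmarkTask(
    task="Add one item to the cart on https://shop.example.test",
    success_criteria=["cart_count_is_1", "cart_page_lists_item"],
    selector_hints={"add_button": "[data-test=add-to-cart]"},
    failure_cases=["navigated_to_login", "clicked_ad_banner"],
    time_budget_s=60.0,
)

if __name__ == "__main__":
    print(to_benchmark_json(example))
    print(score_run(example, {"cart_count_is_1"}, set(), elapsed_s=42.0))
```

The all-or-nothing `passed` flag is one possible rubric. Under the same assumptions, a weighted rubric could instead report `criteria_fraction` as partial credit. A replay check, for example one written with Playwright, would be the component that supplies `criteria_met`, `failures_hit`, and `elapsed_s` from a real browser session.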

Reviewed 2026-06-20

AI
Browser-first, Real exports, JSON receipt, Launch intent

WHY THIS IS DIFFERENT

It creates launch artifacts, not generic advice.

Boundary: This is not a guarantee that an AI agent will pass, not a substitute for real QA, and not a way to bypass site access controls, rate limits, authentication, or terms of service.