Agent Eval
Browser Agent Task Benchmark Builder
Turn a browser task, success criteria, selector hints, failure cases, and time budget into an AI-agent benchmark JSON, Playwright replay check, scoring rubric, SVG route map, and proof receipt.
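As a rough illustration of what the benchmark JSON might contain, here is a minimal Python sketch. The class name `BenchmarkTask`, the function `to_benchmark_json`, and every field name are assumptions chosen to mirror the five inputs listed above. They are not the tool's actual export schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class BenchmarkTask:
    # Hypothetical field names mirroring the inputs named in the description;
    # the real export format may differ.
    task: str
    success_criteria: list[str]
    selector_hints: list[str] = field(default_factory=list)
    failure_cases: list[str] = field(default_factory=list)
    time_budget_seconds: int = 60


def to_benchmark_json(spec: BenchmarkTask) -> str:
    """Serialize the task spec to a stable, human-readable JSON string."""
    return json.dumps(asdict(spec), indent=2, sort_keys=True)


# Made-up example values, for illustration only.
example = BenchmarkTask(
    task="Add one item to the cart and reach the checkout page",
    success_criteria=["URL path ends with /checkout", "Cart badge shows 1"],
    selector_hints=["[data-testid='add-to-cart']"],
    failure_cases=["Cookie banner covers the button", "Item is out of stock"],
    time_budget_seconds=90,
)

if __name__ == "__main__":
    print(to_benchmark_json(example))
```

Sorting the keys keeps the output stable between runs, which makes two exported benchmarks easier to diff.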
Reviewed 2026-06-20
AI | Browser-first | Real exports | JSON receipt | Launch intent
WHY THIS IS DIFFERENT
It creates launch artifacts, not generic advice.
- Builds a reusable AI-browser-agent evaluation from a real task instead of a vague prompt.
- Exports a replay script, scoring rubric, fallback selector hints, SVG route map, and receipt.
- Flags ambiguous success criteria, brittle selectors, overlays, and task blockers before agents fail silently.
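The flagging step in the last bullet could be approximated with simple pattern checks. The sketch below is an assumption about how such checks might look, not the tool's actual logic: the pattern table `BRITTLE_PATTERNS`, the word list `VAGUE_WORDS`, and both function names are hypothetical.

```python
import re

# Hypothetical heuristics for selectors that tend to break when a page changes.
BRITTLE_PATTERNS = {
    "positional index": re.compile(r":nth-(child|of-type)\("),
    "generated class name": re.compile(r"\.(css|sc|jsx)-[A-Za-z0-9]+"),
    "absolute XPath": re.compile(r"^/html/"),
}

# Hypothetical list of words that make a success criterion hard to verify.
VAGUE_WORDS = {"works", "correctly", "properly", "looks", "fine", "successfully"}


def flag_selector(selector: str) -> list[str]:
    """Return the names of the brittle patterns found in a selector."""
    return [name for name, pattern in BRITTLE_PATTERNS.items() if pattern.search(selector)]


def flag_criterion(criterion: str) -> list[str]:
    """Return the vague words found in a success criterion, sorted alphabetically."""
    words = set(re.findall(r"[a-z]+", criterion.lower()))
    return sorted(words & VAGUE_WORDS)


if __name__ == "__main__":
    print(flag_selector("div:nth-child(3) > .css-1a2b3c"))
    print(flag_criterion("The page works correctly"))
```

A criterion such as "URL path ends with /checkout" passes this check because it names an observable condition, while "The page works correctly" is flagged for both "works" and "correctly".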
Boundary: This is not a guarantee that an AI agent will pass, not a substitute for real QA, and not a way to bypass site access controls, rate limits, authentication, or terms of service.