BROWSER EVIDENCE EVAL LAB

Replay agent runs, verify citations, and build safe eval fixtures.

This kit produces evidence artifacts that humans and AI agents can inspect: timelines, JSONL replay files, claim-source matrices, heatmaps, synthetic fixtures, schema contracts, and proof receipts.

Run Replay

Agent Run Replay Blackbox

Turn an AI-agent trace, browser steps, tool calls, console notes, screenshots, and outcome into a replayable timeline, JSONL sequence, postmortem, GitHub issue draft, and proof receipt.
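As a rough illustration of the replay idea, the sketch below turns an ordered list of steps into a JSONL sequence and derives a receipt from its hash. The step fields (`kind`, `action`, `target`), the function names, and the receipt layout are assumptions made for this example, not the tool's actual format.

```python
import hashlib
import json

def to_jsonl(steps):
    """Serialize steps in order, one JSON object per line, numbered from 1."""
    lines = []
    for seq, step in enumerate(steps, start=1):
        record = {"seq": seq, **step}
        # sort_keys keeps the byte output stable, so the hash below is repeatable
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines) + "\n"

def proof_receipt(jsonl_text):
    """Hypothetical receipt: a SHA-256 digest of the replay file plus a record count."""
    digest = hashlib.sha256(jsonl_text.encode("utf-8")).hexdigest()
    return {"algorithm": "sha256", "digest": digest, "records": jsonl_text.count("\n")}

# Illustrative trace; the field names are assumed, not a documented schema.
steps = [
    {"kind": "browser", "action": "goto", "target": "https://example.com/login"},
    {"kind": "tool", "action": "call", "target": "search"},
    {"kind": "outcome", "action": "fail", "target": "timeout"},
]

replay = to_jsonl(steps)
receipt = proof_receipt(replay)
print(replay, end="")
print(json.dumps(receipt, indent=2))
```

Because the serialization is stable, anyone holding the same JSONL file can recompute the digest and compare it with the receipt.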

RAG Eval

Retrieval Citation Replay Studio

Turn AI answer claims, retrieved chunks, URLs, scores, and source notes into a claim-source matrix, unsupported-claim report, retrieval replay JSON, heatmap SVG, eval cases, and proof receipt.
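A claim-source matrix can be pictured as one row per claim and one column per retrieved chunk. The sketch below fills it with a simple token-overlap score and flags claims whose best score falls under a threshold. The scoring method, the 0.5 threshold, and all names here are assumptions chosen for illustration; the studio may score support differently.

```python
import re

def tokens(text):
    """Lowercased alphanumeric tokens, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def claim_source_matrix(claims, chunks, threshold=0.5):
    """Return (matrix, unsupported): matrix[i][j] is the share of claim i's
    tokens that also appear in chunk j. The threshold is an assumed cutoff."""
    matrix = []
    for claim in claims:
        claim_tokens = tokens(claim)
        row = []
        for chunk in chunks:
            overlap = len(claim_tokens & tokens(chunk["text"]))
            row.append(overlap / len(claim_tokens) if claim_tokens else 0.0)
        matrix.append(row)
    unsupported = [
        claim for claim, row in zip(claims, matrix)
        if max(row, default=0.0) < threshold
    ]
    return matrix, unsupported

# Illustrative inputs; the URLs and texts are made up for the example.
claims = [
    "The cache expires after 30 minutes",
    "The API supports GraphQL",
]
chunks = [
    {"url": "https://example.com/docs/cache",
     "text": "Cache entries expire after 30 minutes by default."},
    {"url": "https://example.com/docs/api",
     "text": "Responses are returned as JSON."},
]

matrix, unsupported = claim_source_matrix(claims, chunks)
for claim, row in zip(claims, matrix):
    print(claim, [round(score, 2) for score in row])
print("Unsupported:", unsupported)
```

Token overlap is a crude proxy for support; it is used here only because it needs no external model and keeps the matrix easy to audit by hand.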

Fixtures

Privacy-Safe Eval Fixture Builder

Turn a schema, task, field rules, edge cases, redaction policy, and seed into deterministic synthetic CSV and JSONL fixtures, a schema contract, edge cases, a PII risk report, and a proof receipt.
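One way to get deterministic output from a seed is to drive all generation through a single seeded random generator. The sketch below assumes a small hypothetical schema format (`int`, `choice`, and `email` field rules) and writes the same records as CSV and JSONL. None of these names come from the tool itself.

```python
import csv
import io
import json
import random

def build_fixtures(schema, seed, rows=3):
    """Generate synthetic records; the same schema and seed give the same rows."""
    rng = random.Random(seed)  # isolated generator, unaffected by global random state
    records = []
    for i in range(rows):
        record = {}
        for field, rule in schema.items():
            if rule["type"] == "int":
                record[field] = rng.randint(rule["min"], rule["max"])
            elif rule["type"] == "choice":
                record[field] = rng.choice(rule["values"])
            elif rule["type"] == "email":
                # Synthetic address on a reserved test domain, never a real person's
                record[field] = f"user{i:04d}@example.test"
            else:
                raise ValueError(f"unknown field type: {rule['type']}")
        records.append(record)
    return records

def to_csv(records):
    """Render records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def to_jsonl(records):
    """Render records as one JSON object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)

# Hypothetical schema format, assumed for this sketch.
schema = {
    "age": {"type": "int", "min": 18, "max": 90},
    "plan": {"type": "choice", "values": ["free", "pro", "team"]},
    "email": {"type": "email"},
}

fixtures = build_fixtures(schema, seed=42)
print(to_csv(fixtures), end="")
print(to_jsonl(fixtures), end="")
```

Rerunning with the same seed reproduces the same fixtures, which lets an eval failure be replayed exactly; changing the seed yields a fresh but equally synthetic set.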

ARTIFACT PROMISE

Every tool produces a reusable replay, eval, or fixture package.

Expected outputs include HTML timelines, JSONL replay files, CSV matrices, source heatmaps, schema contracts, synthetic data, markdown reports, and receipts.