docs: add agent-browser + Compound Engineering patterns

- agent-browser (Vercel Labs): AI-native browser CLI, 12K+ stars, ~82.5% token reduction vs Playwright MCP. Section in §MCP Browser Tools, eval file added. - Compound Engineering philosophy (Every.to): Plan→Work→Review→Compound loop, docs/solutions/ pattern for cross-PR memory. Section in §CLAUDE.md Memory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 10:14:58 +01:00 · 2026-03-04 10:14:58 +01:00 · 6e1f7a3e3b
commit 6e1f7a3e3b
parent 29e8b62679
2 changed files with 350 additions and 1 deletions
--- a/docs/resource-evaluations/2026-03-04-agent-browser-vercel-labs.md
+++ b/docs/resource-evaluations/2026-03-04-agent-browser-vercel-labs.md
@ -0,0 +1,184 @@
+# agent-browser (Vercel Labs) - Resource Evaluation
+
+**Evaluated**: 2026-03-04
+**Source**: https://github.com/vercel-labs/agent-browser
+**Type**: CLI Tool (Headless Browser Automation for AI Agents)
+**License**: MIT
+**Stars**: 12,100+ (as of 2026-03-04)
+**Status**: Active development — v0.15.0 (February 2026), rapid release cycle
+
+---
+
+## Executive Summary
+
+**Final Score**: **5/5 (CRITICAL)**
+
+agent-browser is a headless browser CLI built specifically for AI agents. Launched by Vercel Labs in January 2026, it uses Playwright under the hood but abstracts all complexity for LLM-native workflows. Key differentiator: 82.5% fewer tokens consumed vs Playwright MCP for identical test scenarios, thanks to compact element references (`@e1`, `@e2`) and minimal actionable output. Written in Rust for sub-millisecond startup. Directly relevant to the guide's testing and agentic sections — fills a documented gap at line 10510 where Playwright MCP is the only browser automation tool listed.
+
+---
+
+## Resource Overview
+
+### Key Features
+
+| Feature | Details |
+|---------|---------|
+| **Implementation** | Rust (primary, sub-ms startup) + Node.js fallback |
+| **Protocol** | Chrome DevTools Protocol (CDP) via Playwright |
+| **Browsers** | Chromium, Firefox, WebKit |
+| **Token reduction** | -82.5% vs Playwright MCP (documented benchmark, Pulumi blog 2026-03-03) |
+| **Element refs** | Stable `@e1`, `@e2` format (vs XPath/CSS selectors) |
+| **Security (v0.15.0)** | Auth vaults, domain allowlists, action policies, output length limits |
+| **Multi-session** | Isolated browser instances with separate cookies/storage/auth |
+| **Session persistence** | Save/restore state with AES-256-GCM encryption (`--session-name`) |
+| **Browser streaming** | WebSocket live preview for "pair browsing" (human + agent) |
+| **AI integrations** | Native support for Claude, Gemini, Cursor, GitHub Copilot |
+
+### Core Capabilities
+
+- Navigation, click, type, scroll, screenshot, PDF generation
+- Accessibility tree snapshots (optimized for LLM processing)
+- Visual pixel diffs against baselines
+- Accessibility tree diffs with customizable depth and selectors
+- Network interception and mocking
+- Cookie and storage management
+- Device emulation, geolocation simulation
+- Keyboard/mouse control, dialog handling
+- JavaScript evaluation, console capture
+- Performance tracing and profiling
+
+### Release Velocity
+
+v0.10.0 → v0.15.0 between mid-February and late February 2026. New in recent releases: advanced profiling, config files, enhanced device emulation, auth vaults, domain allowlists.
+
+---
+
+## Evaluation
+
+### Score: 5/5 (CRITICAL)
+
+**Justification**:
+
+1. **Direct gap**: Guide documents Playwright MCP at line 10510 with 4 tools and 3 use cases only. agent-browser is the purpose-built successor for agentic workflows.
+
+2. **Token efficiency is the primary driver**: The guide's core mission includes token optimization. -82.5% vs Playwright MCP on identical scenarios is a concrete, documented metric (not marketing).
+
+3. **Ralph Wiggum Loop pattern**: The Pulumi blog (2026-03-03) documents the self-verifying agent pattern that maps perfectly to the guide's agentic workflows sections. Build → Deploy → Verify → Iterate, all autonomous.
+
+4. **12,100+ stars in < 3 months**: Adoption signal at the level of the fastest-growing dev tools of 2025-2026.
+
+5. **RPA displacement**: Community consensus (Slack discussions, "10 Best Agentic Browsers" Bright Data 2026-02-04) positions agent-browser as the tool killing traditional RPA for web workflows.
+
+### Caveats
+
+- **Not a Playwright replacement for traditional E2E suites**: Still designed for agentic workflows, not drop-in for Playwright test runners (jest-playwright, etc.)
+- **Anti-bot wall unchanged**: IP reputation, session behavior, scroll patterns, user agents — none of this changes. Browserbase-type services still needed for external site scraping.
+- **Early stage**: Security features (v0.15.0) are new. Production hardening ongoing.
+
+---
+
+## Comparison: agent-browser vs Playwright MCP
+
+| Dimension | Playwright MCP | agent-browser |
+|-----------|---------------|---------------|
+| Primary audience | Developers (tests) | AI agents |
+| Output verbosity | High (full DOM) | Minimal (actionable only) |
+| Token usage | Baseline | **-82.5%** |
+| Element references | XPath/CSS | `@e1`, `@e2` (stable) |
+| Implementation | Node.js | **Rust** (sub-ms startup) |
+| Session persistence | No | Yes (AES-256-GCM) |
+| Multi-session | No | Yes |
+| Security controls | None | Auth vaults, allowlists |
+| Browser streaming | No | Yes (experimental) |
+| Self-verifying agents | Awkward | **Native pattern** |
+
+---
+
+## The Ralph Wiggum Loop
+
+The core pattern agent-browser enables for agentic workflows:
+
+```
+Build → Deploy → [agent-browser verifies] → Fix → Repeat
+```
+
+1. Agent writes code (feature, component, fix)
+2. Deploys (Vercel, Pulumi, or any target)
+3. Launches agent-browser autonomously
+4. Navigates to deployed URL, tests scenarios
+5. If failure: agent reads snapshot, fixes code, re-deploys
+6. No human in loop until verification passes
+
+Documented in production at Pulumi (2026-03-03) across 6 test scenarios: homepage load, URL shortening, dashboard view, analytics navigation, analytics overview, date filter.
+
+---
+
+## Use Cases for Claude Code Guide
+
+1. **E2E testing in agentic coding loops** (primary: replaces manual Playwright scripts)
+2. **Self-verifying deployments** (Ralph Wiggum Loop)
+3. **Observability feedback loops** (screenshot + accessibility diffs on each deploy)
+4. **Form automation** for AI agents needing to interact with web UIs
+5. **Visual regression testing** with pixel diffs against baselines
+
+---
+
+## Gap Analysis
+
+### Current Guide Coverage (Before Integration)
+
+| Section | Coverage |
+|---------|---------|
+| Playwright MCP (line 10510) | 4 tools, 3 use cases, no comparison |
+| agent-browser mention | Line 6602 (agentskills `allowed-tools` example only) |
+| Self-verifying agents pattern | ❌ Not documented |
+| Token comparison browser tools | ❌ Not documented |
+| Ralph Wiggum Loop | ❌ Not documented |
+
+### Post-Integration Target
+
+- New subsection: "agent-browser (Vercel Labs)" after Playwright MCP (line ~10527)
+- Comparison table: agent-browser vs Playwright MCP
+- Ralph Wiggum Loop workflow example
+- When to use which (decision guide)
+- Update reference.yaml with line pointers
+
+---
+
+## Integration Details
+
+### Placement
+
+**Section**: MCP Servers Ecosystem → Browser Automation
+**After**: `### Playwright (Browser Automation)` (line 10510)
+**Words added**: ~350 words
+**Tables**: 2 (features, comparison)
+**Code snippets**: 1 (installation)
+
+### reference.yaml keys to add
+
+```yaml
+agent_browser: "guide/ultimate-guide.md:XXXX"
+agent_browser_vs_playwright: "comparison table"
+ralph_wiggum_loop: "self-verifying agent pattern"
+agent_browser_repo: "https://github.com/vercel-labs/agent-browser"
+agent_browser_score: "5/5 CRITICAL - 2026-03-04"
+```
+
+---
+
+## Sources
+
+| Source | Type | Date |
+|--------|------|------|
+| [vercel-labs/agent-browser](https://github.com/vercel-labs/agent-browser) | Official repo | 2026-01-13 launch |
+| [Pulumi: Self-Verifying AI Agents](https://www.pulumi.com/blog/self-verifying-ai-agents-vercels-agent-browser-in-the-ralph-wiggum-loop/) | Production case study | 2026-03-03 |
+| [Releasebot: agent-browser releases](https://releasebot.io/updates/vercel-labs/agent-browser) | Release notes | 2026-02-22 to 03-03 |
+| [Bright Data: 10 Best Agentic Browsers](https://brightdata.com/blog/ai/best-agent-browsers) | Market analysis | 2026-02-04 |
+| [Towards AI: Vercel Solved Browser Automation](https://pub.towardsai.net/vercel-just-solved-browser-automation-for-ai-agents-b3414ebdb4d7) | Analysis | 2026-01-22 |
+| [aibase.com: Vercel Launches Agent Browser](https://news.aibase.com/news/24556) | News | 2026-01-13 |
+| Community Slack discussions | Primary source | 2026-03-04 |
+
+---
+
+**End of Evaluation**