* Fix blank terminal after split operations and add visual tests ## Blank Terminal Fix - Add `needsRefreshAfterWindowChange` flag in GhosttyTerminalView - Force terminal refresh when view is added to window, even if size unchanged - Add `ghostty_surface_refresh()` call in attachToView for same-view reattachment - Add debug logging for surface attachment lifecycle (DEBUG builds only) ## Bonsplit Migration - Add bonsplit as local Swift package (vendor/bonsplit submodule) - Replace custom SplitTree with BonsplitController - Add Panel protocol with TerminalPanel and BrowserPanel implementations - Add SidebarTab as main tab container with BonsplitController - Remove old Splits/ directory (SplitTree, SplitView, TerminalSplitTreeView) ## Visual Screenshot Tests - Add test_visual_screenshots.py for automated visual regression testing - Uses in-app screenshot API (CGWindowListCreateImage) - no screen recording needed - Generates HTML report with before/after comparisons - Tests: splits, browser panels, focus switching, close operations, rapid cycles - Includes annotation fields for easy feedback ## Browser Shortcut (⌘⇧B) - Add keyboard shortcut to open browser panel in current pane - Add openBrowser() method to TabManager - Add shortcut configuration in KeyboardShortcutSettings ## Screenshot Command - Add 'screenshot' command to TerminalController for in-app window capture - Returns OK with screenshot ID and path ## Other - Add tests/visual_output/ and tests/visual_report.html to .gitignore * Add browser title subscription and set tab height to 30px - Subscribe to BrowserPanel.$pageTitle changes to update bonsplit tabs - Update tab titles in real-time as page navigation occurs - Clean up subscriptions when panels are removed - Set bonsplit tab bar and tab height to 30px (in submodule) * Fix socket API regressions in list_surfaces, list_bonsplit_tabs, focus_pane - list_surfaces: Remove [terminal]/[browser] suffix to keep UUID-only format that clients and tests expect for parsing - list_bonsplit_tabs --pane: Properly look up pane by UUID instead of creating a new PaneID (requires bonsplit PaneID.id to be public) - focus_pane: Accept both UUID strings and integer indices as documented * Fix browser panel stability and keyboard shortcuts - Prevent WKWebView focus lifecycle crashes during split/view reshuffles - Match bracket shortcuts via keyCode (Cmd+Shift+[ / ], Cmd+Ctrl+[ / ]) - Support Ghostty config goto_split:* keybinds when WebView is focused - Add focus_webview/is_webview_focused socket commands and regression tests - Rename SidebarTab to Workspace and update docs * Make ctrl+enter keybind test skippable Skip when the Ghostty keybind isn't configured or when osascript can't send keystrokes (no Accessibility permission), so VM runs stay green. * Auto-focus browser omnibar when blank When a browser surface is focused but no URL is loaded yet, focus the address bar instead of the WKWebView. * Stabilize socket surface indexing * Focus browser omnibar escape; add webview keybind UI tests - Escape in omnibar now returns focus to WKWebView\n- Add UI tests for Cmd+Ctrl+H pane navigation with WebKit focused (including Ghostty config)\n- Avoid flaky element screenshots in UpdatePillUITests on the UTM VM * Fix browser drag-to-split blanks and socket parsing * Fix webview-focused shortcuts and stabilize browser splits - Match ctrl/shift shortcuts by keyCode where needed (Ctrl+H, bracket keys) - Load Ghostty goto_split triggers reliably and refresh on config load - Add debug socket helpers: set_shortcut + simulate_shortcut for tests - Convert browser goto_split/keybind tests to socket-based injection (no osascript) - Bump bonsplit for drag-to-split fixes * Fix split layout collapse and harden socket pane APIs * Stabilize OSC 99 notification test timing * Fix terminal focus routing after split reparent * Support simulate_shortcut enter for focus routing test * Stabilize terminal focus routing test * Fix frozen new terminal tabs after many splits * Fix frozen new terminal tabs after splits * Fix terminal freeze on launch/new tabs * Update ghostty submodule * Fix terminal focus/render stalls after split churn * Fix nested split collapsing existing pane * Fix nested split collapse + stabilize new-surface focus * Update bonsplit submodule * Fix SIGINT test flake * Remove bonsplit tab-switch crossfade * Remove PROJECTS.md * Remove bonsplit tab selection animation * Ignore generated test reports * Middle click closes tab * Revert unintended .gitignore change * Fix build after main merge * Revert "Fix build after main merge" This reverts commit 16bf9816d0856b5385d52f886aa5eb50f3c9d9a4. * Revert "Merge remote-tracking branch 'origin/main' into fix/blank-terminal-and-visual-tests" This reverts commit 7c20fb53fd71fea7a19a3673f2dd73e5f0c783c4, reversing changes made to 0aff107d787bc9d8bbc28220090b4ca7af72e040. * Remove tab close fade animation * Use terminal.fill icon * Make terminal tab icon smaller * Match browser globe tab icon size * Bonsplit: tab min width 48 and tighter close button * Bonsplit: smaller tab title font * Show unread notification badge in bonsplit tabs and improve UI polish Sync unread notification state to bonsplit tab badges (blue dot). Improve EmptyPanelView with Terminal/Browser buttons and shortcut hints. Add tooltips to close tab button and search overlay buttons. * Fix reload.sh single-instance safety check on macOS Replace GNU-only `ps -o etimes=` with portable `ps -o etime=` and parse the dd-hh:mm:ss format manually for macOS compatibility. * Centralize keyboard shortcut definitions into Action enum Replace per-shortcut boilerplate with a single Action enum that holds the label, defaults key, and default binding for each shortcut. All call sites now use shortcut(for:). Settings UI is data-driven via ForEach(Action.allCases). Titlebar tooltips update dynamically when shortcuts are changed. Remove duplicate .keyboardShortcut() modifiers from menu items that are already handled by the event monitor. * Fix WKWebView consuming app menu shortcuts and close panel confirmation Add CmuxWebView subclass that routes key equivalents through the main menu before WebKit, so Cmd+N/Cmd+W/tab switching work when a browser pane is focused. Fix Cmd+W close-panel path: bypass Bonsplit delegate gating after the user confirms the running-process dialog by tracking forceCloseTabIds. Add unit tests (CmuxWebViewKeyEquivalentTests) and UI test scaffolding (MenuKeyEquivalentRoutingUITests) with a new cmux-unit Xcode scheme. * Update CLAUDE.md and PROJECTS.md with recent changes CLAUDE.md: enforce --tag for reload commands, add cleanup safety rules. PROJECTS.md: log notification badge, reload.sh fix, Cmd+W fix, WebView key equiv fix, and centralized shortcuts work. * Keep selection index stable on close * Add concepts page documenting terminology hierarchy New docs page explaining Window > Workspace > Pane > Surface > Panel hierarchy with aligned ASCII diagram. Updated tabs.mdx and splits.mdx to use consistent terminology (workspace instead of tab, surface instead of panel) and corrected outdated CLI command references. * Update bonsplit submodule * WIP: improve split close stability and UI regressions * Close terminal panel on child exit; hide terminal dirty dot * Fix split close/focus regressions and stabilize UI tests * Add unread Dock/Cmd+Tab badge with settings toggle * Fix browser-surface shortcuts and Cmd+L browser opening * Snapshot current workspace state before regression fixes * Update bonsplit submodule snapshot * Stabilize split-close regression capture and sidebar resize assertions * Change default Show Notifications shortcut from Cmd+Shift+I to Cmd+I * Fix update check readiness race, enable release update logging, and improve checking spinner * Restore terminal file drop, fix browser omnibar click focus, and add panel workspace ID mutation for surface moves * Add Cmd+digit workspace hints, titlebar shortcut pills, sidebar drag-reorder, and workspace placement settings * Add v2 browser automation API, surface move/reorder commands, and short-handle ref system to TerminalController * Add CLI browser command surface, --id-format flag, and move/reorder commands * Extend test clients with move/reorder APIs, ref-handle support, and increased timeouts * Harden test runner scripts with deterministic builds, retry logic, and robust socket readiness * Stabilize existing test suites with focus-wait helpers, increased timeouts, and API shape updates * Add terminal file drop e2e regression test * Add v2 browser API, CLI ref resolution, and surface move/reorder test suites * Add unit tests for shortcut hints, workspace reorder, drop planner, and update UI test stabilization * Add cmux-debug-windows skill with snapshot script and agent config * Update project docs: mark browser parity and move/reorder phases complete, add parallel agent workflow guidelines * Update bonsplit submodule: re-entrant setPosition guard, tab shortcut hints, and moveTab/reorderTab API * Add browser agent UX improvements: snapshot refs, placement reuse, diagnostics, and skill docs - Upgrade browser.snapshot to emit accessibility tree text with element refs (eN) - Add right-sibling pane reuse policy for browser.open_split placement - Add rich not_found diagnostics with retry logic for selector actions - Support --snapshot-after for post-action verification on mutating commands - Allow browser fill with empty text for clearing inputs - Default CLI --id-format to refs-first (UUIDs opt-in via --id-format uuids|both) - Format legacy new-pane/new-surface output with short surface refs - Add skills/cmuxterm-browser/ and skills/cmuxterm/ end-user skill docs - Add regression tests for placement policy, snapshot refs, diagnostics, and ID defaults * Update bonsplit submodule: keep raster favicons in color when inactive
435 lines
16 KiB
Markdown
435 lines
16 KiB
Markdown
# Agent-Browser Port Spec
|
|
|
|
Last updated: February 13, 2026
|
|
Source inventory snapshot: `vercel-labs/agent-browser` @ `03a8cb9`
|
|
|
|
This document tracks implemented behavior and remaining parity gaps for the cmux browser port.
|
|
|
|
## Goals
|
|
|
|
1. Provide an LLM-friendly browser automation API in cmux with stable handles.
|
|
2. Keep v1 CLI/socket behavior working while v2 reaches full parity.
|
|
3. Port `agent-browser` command surface (where meaningful for `WKWebView`).
|
|
4. Ensure move/reorder operations preserve `surface_id` identity.
|
|
5. Rebuild/port tests so both v1 and v2 suites pass before deprecating v1.
|
|
|
|
## Validation Status
|
|
|
|
As of February 12, 2026:
|
|
1. `./scripts/run-tests-v1.sh` passes on `cmux-vm`.
|
|
2. `./scripts/run-tests-v2.sh` passes on `cmux-vm`.
|
|
3. Browser parity suites passing in v2: `test_browser_api_comprehensive.py`, `test_browser_api_p0.py`, `test_browser_api_extended_families.py`, `test_browser_api_unsupported_matrix.py`, and `test_browser_cli_agent_port.py`.
|
|
4. Visual suite note: `tests/test_visual_screenshots.py` and `tests_v2/test_visual_screenshots.py` both report D12 (`Nested: Close Top of T-shape`) as a known non-blocking VM failure when it reproduces (`VIEW_DETACHED`).
|
|
|
|
## Concepts (Canonical Terms)
|
|
|
|
1. `window`: native macOS window.
|
|
2. `workspace`: sidebar entry within a window (often called "tab" in UI).
|
|
3. `pane`: split region inside a workspace.
|
|
4. `surface`: tab within a pane (terminal or browser). This is the primary automation target.
|
|
5. `panel`: internal implementation term; CLI/API should prefer `surface`.
|
|
|
|
Terminology decision:
|
|
- Public v2 API and new CLI docs should standardize on `surface` and `pane`.
|
|
- Keep `--panel` as compatibility alias in CLI until v1 is retired.
|
|
|
|
## Self-Identify Requirement
|
|
|
|
`system.identify` is the canonical "where am I?" call for agents and should remain first-class.
|
|
|
|
Required response fields for agent workflows:
|
|
1. `focused.window_id`
|
|
2. `focused.workspace_id`
|
|
3. `focused.pane_id`
|
|
4. `focused.surface_id`
|
|
5. `caller` validation result when caller context is supplied
|
|
|
|
Recommended extension for browser workflows:
|
|
1. `focused.surface_type`
|
|
2. `focused.browser.url`
|
|
3. `focused.browser.title`
|
|
4. `focused.browser.loading`
|
|
|
|
## Agent-Browser Command Inventory
|
|
|
|
### Top-Level CLI Verbs (from `cli/src/commands.rs`)
|
|
|
|
1. `open|goto|navigate`
|
|
2. `back`
|
|
3. `forward`
|
|
4. `reload`
|
|
5. `click`
|
|
6. `dblclick`
|
|
7. `fill`
|
|
8. `type`
|
|
9. `hover`
|
|
10. `focus`
|
|
11. `check`
|
|
12. `uncheck`
|
|
13. `select`
|
|
14. `drag`
|
|
15. `upload`
|
|
16. `download`
|
|
17. `press|key`
|
|
18. `keydown`
|
|
19. `keyup`
|
|
20. `scroll`
|
|
21. `scrollintoview|scrollinto`
|
|
22. `wait`
|
|
23. `screenshot`
|
|
24. `pdf`
|
|
25. `snapshot`
|
|
26. `eval`
|
|
27. `close|quit|exit`
|
|
28. `connect`
|
|
29. `get`
|
|
30. `is`
|
|
31. `find`
|
|
32. `mouse`
|
|
33. `set`
|
|
34. `network`
|
|
35. `storage`
|
|
36. `cookies`
|
|
37. `tab`
|
|
38. `window`
|
|
39. `frame`
|
|
40. `dialog`
|
|
41. `trace`
|
|
42. `record`
|
|
43. `console`
|
|
44. `errors`
|
|
45. `highlight`
|
|
46. `state`
|
|
47. `tap`
|
|
48. `swipe`
|
|
49. `device`
|
|
|
|
### CLI Subcommands
|
|
|
|
1. `get`: `text|html|value|attr|url|title|count|box|styles`
|
|
2. `is`: `visible|enabled|checked`
|
|
3. `find`: `role|text|label|placeholder|alt|title|testid|first|last|nth`
|
|
4. `mouse`: `move|down|up|wheel`
|
|
5. `set`: `viewport|device|geo|geolocation|offline|headers|credentials|auth|media`
|
|
6. `network`: `route|unroute|requests`
|
|
7. `storage`: `local|session` + `get|set|clear`
|
|
8. `cookies`: default get, plus `set|clear`
|
|
9. `tab`: default list, plus `new|list|close|<index>`
|
|
10. `window`: `new`
|
|
11. `frame`: `<selector>|main`
|
|
12. `dialog`: `accept|dismiss`
|
|
13. `trace`: `start|stop`
|
|
14. `record`: `start|stop|restart`
|
|
15. `state`: `save|load`
|
|
16. `device`: `list`
|
|
|
|
### Global Flags
|
|
|
|
1. `--json`
|
|
2. `--full|-f`
|
|
3. `--headed`
|
|
4. `--debug`
|
|
5. `--session`
|
|
6. `--headers`
|
|
7. `--executable-path`
|
|
8. `--extension` (repeatable)
|
|
9. `--cdp`
|
|
10. `--profile`
|
|
11. `--state`
|
|
12. `--proxy`
|
|
13. `--proxy-bypass`
|
|
14. `--args`
|
|
15. `--user-agent`
|
|
16. `-p|--provider`
|
|
17. `--ignore-https-errors`
|
|
18. `--allow-file-access`
|
|
19. `--device`
|
|
|
|
### Protocol Actions in `src/protocol.ts`
|
|
|
|
Counts:
|
|
1. total actions: 125
|
|
2. directly emitted by CLI parser: 93
|
|
3. protocol-only (not directly emitted by CLI parser): 32
|
|
|
|
Protocol-only action names:
|
|
1. `addinitscript`
|
|
2. `addscript`
|
|
3. `addstyle`
|
|
4. `bringtofront`
|
|
5. `clear`
|
|
6. `clipboard`
|
|
7. `content`
|
|
8. `dispatch`
|
|
9. `evalhandle`
|
|
10. `expose`
|
|
11. `har_start`
|
|
12. `har_stop`
|
|
13. `innertext`
|
|
14. `input_keyboard`
|
|
15. `input_mouse`
|
|
16. `input_touch`
|
|
17. `inserttext`
|
|
18. `keyboard`
|
|
19. `locale`
|
|
20. `multiselect`
|
|
21. `pause`
|
|
22. `permissions`
|
|
23. `responsebody`
|
|
24. `screencast_start`
|
|
25. `screencast_stop`
|
|
26. `selectall`
|
|
27. `setcontent`
|
|
28. `setvalue`
|
|
29. `timezone`
|
|
30. `useragent`
|
|
31. `video_start`
|
|
32. `video_stop`
|
|
|
|
## cmux Target API (v2)
|
|
|
|
### Already Present in cmux
|
|
|
|
1. `system.ping`
|
|
2. `system.capabilities`
|
|
3. `system.identify`
|
|
4. `window.list|current|focus|create|close`
|
|
5. `workspace.list|create|select|current|close|move_to_window`
|
|
6. `pane.list|focus|surfaces|create`
|
|
7. `surface.list|focus|split|create|close|drag_to_split|refresh|health|send_text|send_key|trigger_flash`
|
|
8. `browser.open_split|navigate|back|forward|reload|url.get|focus_webview|is_webview_focused`
|
|
9. notification methods and debug/test methods
|
|
|
|
### New Browser Parity Method Families (Proposed)
|
|
|
|
P0 (core parity for daily automation):
|
|
1. `browser.snapshot`
|
|
2. `browser.eval`
|
|
3. `browser.wait`
|
|
4. `browser.click`
|
|
5. `browser.dblclick`
|
|
6. `browser.type`
|
|
7. `browser.fill`
|
|
8. `browser.press|keydown|keyup`
|
|
9. `browser.hover|focus`
|
|
10. `browser.check|uncheck`
|
|
11. `browser.select`
|
|
12. `browser.scroll|scroll_into_view`
|
|
13. `browser.get.*` (`url|title|text|html|value|attr|count|box|styles`)
|
|
14. `browser.is.*` (`visible|enabled|checked`)
|
|
15. `browser.screenshot`
|
|
16. `browser.focus_webview` and `browser.is_webview_focused` (already present, keep)
|
|
|
|
P1 (important but not blocking initial parity):
|
|
1. `browser.find.*` locators (`role|text|label|placeholder|alt|title|testid|nth|first|last`)
|
|
2. `browser.frame.select`
|
|
3. `browser.frame.main`
|
|
4. `browser.dialog.respond`
|
|
5. `browser.download.wait`
|
|
6. `browser.tab.*` compatibility aliases mapped to cmux surfaces
|
|
7. `browser.console.list`
|
|
8. `browser.errors.list`
|
|
9. `browser.highlight`
|
|
10. `browser.state.save|load` (browser state in cmux context)
|
|
|
|
P2 (advanced parity / optional):
|
|
1. network interception/mocking equivalents (`route|unroute|requests|responsebody`)
|
|
2. emulation/settings (`viewport|media|offline|geolocation|permissions|headers|credentials|useragent|locale|timezone|device`)
|
|
3. trace/video/screencast/har equivalents
|
|
4. script injection utilities (`addinitscript|addscript|addstyle|dispatch|expose|evalhandle`)
|
|
5. raw input device injection (`input_mouse|input_keyboard|input_touch`)
|
|
|
|
### Object/Handle Semantics
|
|
|
|
1. stable handles: `window_id`, `workspace_id`, `pane_id`, `surface_id`
|
|
2. browser refs (`@e1`) are session-local and ephemeral
|
|
3. move/reorder must preserve `surface_id`
|
|
4. responses may include `index` for debugging/order, but requests should accept IDs
|
|
|
|
## CLI Spec (Proposed)
|
|
|
|
Primary form:
|
|
```bash
|
|
cmux browser --surface <surface-id> <agent-browser-style-command...>
|
|
```
|
|
|
|
Shorthand:
|
|
```bash
|
|
cmux browser <surface-id> <agent-browser-style-command...>
|
|
```
|
|
|
|
Agent discovery:
|
|
```bash
|
|
cmux identify
|
|
cmux capabilities
|
|
cmux browser identify --surface <surface-id> # wrapper over system.identify + browser fields
|
|
```
|
|
|
|
Flash:
|
|
```bash
|
|
cmux trigger-flash [--workspace <id>] [--surface <id>]
|
|
```
|
|
|
|
Compatibility:
|
|
1. Keep v1 commands.
|
|
2. Add v1->v2 shim for migrated browser/surface commands.
|
|
3. Keep `--panel` as alias for `--surface` during migration.
|
|
|
|
## Move/Reorder Spec (Required)
|
|
|
|
Required capabilities:
|
|
1. reorder surfaces within a pane
|
|
2. move surfaces between panes in same workspace
|
|
3. move surfaces across workspaces
|
|
4. move surfaces across windows
|
|
5. reorder workspaces within window
|
|
|
|
Proposed methods:
|
|
1. `surface.move` with `surface_id` + destination (`pane_id` or `workspace_id`/`window_id`) + placement (`before_surface_id|after_surface_id|start|end`)
|
|
2. `surface.reorder` with `surface_id` + sibling anchor (`before_surface_id|after_surface_id`)
|
|
3. `workspace.reorder` with `workspace_id` + anchor (`before_workspace_id|after_workspace_id`)
|
|
|
|
Hard invariant:
|
|
1. `surface_id` must remain unchanged after all move/reorder operations.
|
|
|
|
## Comprehensive TODO
|
|
|
|
### Phase 0: Contract + Routing
|
|
|
|
- [x] Lock method names/payload schemas for all new `browser.*` methods.
|
|
- [x] Add schema validation for each new method with strict error codes (`invalid_params`, `not_found`, `invalid_state`).
|
|
- [x] Add `browser` command group in `CLI/cmux.swift` that accepts agent-browser-style command grammar.
|
|
- [x] Add `--surface` mandatory targeting (with fallback from `system.identify` when explicitly desired).
|
|
- [x] Add consistent JSON output mode for all browser commands.
|
|
- [x] Implement short-ref allocator and resolver for `window/pane/workspace/surface` (`window:N`, `workspace:N`, `pane:N`, `surface:N`).
|
|
- [x] Add `--id-format refs|uuids|both` across relevant CLI commands (`--json` default refs, plain-text default refs).
|
|
- [x] Ensure browser placement APIs always return decision-rich metadata (resolved target pane, created splits, resulting handles).
|
|
|
|
### Phase 1: Core Browser Parity (P0)
|
|
|
|
- [x] Implement `browser.snapshot` (with refs).
|
|
- [x] Implement `browser.eval`.
|
|
- [x] Implement `browser.wait` variants: selector, timeout, URL pattern, load state, function, text.
|
|
- [x] Implement click family: `click`, `dblclick`, `hover`, `focus`.
|
|
- [x] Implement input family: `type`, `fill`, `press`, `keydown`, `keyup`.
|
|
- [x] Implement checkbox/select family: `check`, `uncheck`, `select`.
|
|
- [x] Implement scrolling family: `scroll`, `scroll_into_view`.
|
|
- [x] Implement getters: text/html/value/attr/url/title/count/box/styles.
|
|
- [x] Implement state checks: visible/enabled/checked.
|
|
- [x] Implement screenshots (surface/full-page where feasible).
|
|
|
|
### Phase 2: Locator + Session Parity (P1)
|
|
|
|
- [x] Implement `browser.find.role`.
|
|
- [x] Implement `browser.find.text`.
|
|
- [x] Implement `browser.find.label`.
|
|
- [x] Implement `browser.find.placeholder`.
|
|
- [x] Implement `browser.find.alt`.
|
|
- [x] Implement `browser.find.title`.
|
|
- [x] Implement `browser.find.testid`.
|
|
- [x] Implement `browser.find.nth|first|last`.
|
|
- [x] Implement frame context switching (`frame.select`, `frame.main`).
|
|
- [x] Implement dialog handling (`accept`, `dismiss`, optional prompt text).
|
|
- [x] Implement download waiting.
|
|
- [x] Implement console/error buffers and retrieval.
|
|
- [x] Implement highlight helper.
|
|
- [x] Implement browser state save/load format.
|
|
|
|
### Phase 3: Move/Reorder + Window/Workspace Integration
|
|
|
|
- [x] Implement `surface.move` with handle-based destination rules.
|
|
- [x] Implement `surface.reorder` within pane.
|
|
- [x] Implement cross-workspace surface moves.
|
|
- [x] Implement cross-window surface moves.
|
|
- [x] Implement `workspace.reorder`.
|
|
- [x] Add CLI commands for tab/surface reordering and moving (`move-surface`, `reorder-surface`, `reorder-workspace`).
|
|
- [x] Add response payloads that confirm final `window_id/workspace_id/pane_id/surface_id`.
|
|
- [x] Add explicit invariants tests for `surface_id` stability.
|
|
|
|
### Phase 4: Advanced/Optional Parity (P2)
|
|
|
|
- [ ] Evaluate feasibility of request interception/mocking in `WKWebView`; implement supported subset.
|
|
- [ ] Add emulation settings that are feasible in `WKWebView`.
|
|
- [ ] Add trace/recording equivalents where practical.
|
|
- [x] Add script/style injection helpers.
|
|
- [x] Document unsupported commands with explicit error `not_supported`.
|
|
|
|
### Phase 5: Compatibility + Migration
|
|
|
|
- [x] Add v1-to-v2 shim for migrated command families.
|
|
- [x] Keep existing v1 behavior unchanged while shim is active.
|
|
- [ ] Document v1/v2 mapping table for all browser/topology commands.
|
|
- [ ] Add deprecation warnings only after parity + test completion.
|
|
|
|
### Phase 6: Docs + Examples
|
|
|
|
- [x] Update `docs/v2-api-migration.md` with browser parity status.
|
|
- [ ] Add dedicated browser automation doc in `docs-site`.
|
|
- [ ] Add examples for LLM workflow: identify -> choose surface -> snapshot -> act -> verify.
|
|
- [ ] Add explicit "surface vs pane vs workspace vs window" section to CLI docs.
|
|
|
|
## Test Port Plan (Comprehensive)
|
|
|
|
### Port Targets from `agent-browser`
|
|
|
|
1. `src/browser.test.ts` -> ported/adapted into:
|
|
- `tests_v2/test_browser_api_p0.py`
|
|
- `tests_v2/test_browser_api_comprehensive.py`
|
|
- `tests_v2/test_browser_api_unsupported_matrix.py`
|
|
2. `src/actions.test.ts` -> adapted negative coverage in `tests_v2/test_browser_api_comprehensive.py` (`invalid_params`, `not_found`, `timeout`).
|
|
3. `src/protocol.test.ts` -> adapted browser command/shape validation in `tests_v2/test_browser_api_unsupported_matrix.py` and existing `CLI/cmux.swift` command grammar checks.
|
|
4. `test/file-access.test.ts` and `test/launch-options.test.ts` -> partially applicable to `WKWebView`; currently tracked as follow-up parity work (not blocking current browser method coverage).
|
|
5. `src/daemon.test.ts`, `src/stream-server.test.ts`, `test/serverless.test.ts`, `src/ios-manager.test.ts` -> out-of-scope for cmux browser parity (different transport/runtime).
|
|
|
|
### Implemented cmux Browser Suites
|
|
|
|
1. `tests_v2/test_browser_api_p0.py`
|
|
2. `tests_v2/test_browser_api_comprehensive.py`
|
|
3. `tests_v2/test_browser_api_unsupported_matrix.py`
|
|
4. `tests_v2/test_browser_goto_split.py`
|
|
5. `tests_v2/test_browser_panel_stability.py`
|
|
6. `tests_v2/test_browser_custom_keybinds.py`
|
|
|
|
### Test Design Rules
|
|
|
|
1. Prefer deterministic local fixtures (embedded HTML or local HTTP server), not public websites.
|
|
2. Every command gets at least one positive and one negative test.
|
|
3. Every handle-accepting API gets tests for UUID target and index-compat shim target.
|
|
4. Every move/reorder test asserts `surface_id` stability pre/post operation.
|
|
5. Browser tests must verify behavior from both focused and unfocused webview states.
|
|
6. Self-identify tests must validate `focused` and `caller` fields.
|
|
|
|
### Migration Gate Criteria
|
|
|
|
1. New browser parity tests in `tests_v2/` pass.
|
|
2. Existing v2 regression suites still pass.
|
|
3. v1 suites still pass with shim active.
|
|
4. No regressions in existing window/workspace/surface workflows.
|
|
|
|
Planned verification commands at implementation completion:
|
|
1. `ssh cmux-vm 'cd /Users/cmux/GhosttyTabs && ./scripts/run-tests-v2.sh'`
|
|
2. `ssh cmux-vm 'cd /Users/cmux/GhosttyTabs && ./scripts/run-tests-v1.sh'`
|
|
|
|
## Decision Log (Locked - February 12, 2026)
|
|
|
|
1. `cmux browser tab ...` maps to browser `surface` tabs only (no separate workspace-level tab meaning inside `browser` namespace).
|
|
2. Default browser placement without explicit target is caller-relative: reuse the nearest right sibling pane; if none exists, split right from the caller pane.
|
|
3. Deeply nested layouts use local split ancestry: choose the nearest right sibling leaf in the caller's subtree path and avoid reshuffling unrelated panes.
|
|
4. Network parity target is full parity (not block-only phase).
|
|
5. Output shape is cmux-native overall, but `browser.snapshot` and selector `not_found` diagnostics intentionally mirror agent-browser semantics for agent usability.
|
|
6. ID model accepts UUIDs and short refs.
|
|
7. Short ref format uses full words and colon: `surface:N`, `pane:N`, `workspace:N`, `window:N`.
|
|
8. Short refs are global per daemon, monotonic, and never reused until daemon restart.
|
|
9. Plain-text CLI output defaults to short refs.
|
|
10. JSON output defaults to short refs (UUIDs available via `--id-format uuids|both`).
|
|
11. CLI supports `--id-format refs|uuids|both` for output shaping.
|
|
12. Browser create/move commands should expose enough placement/result metadata for agents to make deterministic follow-up decisions.
|
|
13. Reuse behavior is implicit by default (caller-relative right-pane reuse); explicit handles can still force deterministic targeting.
|
|
14. `browser fill` accepts empty text and treats it as a clear operation.
|
|
15. Mutating browser actions can opt into post-action verification snapshots via `snapshot_after` (`--snapshot-after` in CLI), returning `post_action_snapshot` (+ refs/title/url).
|
|
16. Legacy `new-pane`/`new-surface` plain output prefers short `surface:N` refs under default CLI ID formatting.
|
|
|
|
## Remaining Open Decisions
|
|
|
|
1. Unsupported command policy: strict `not_supported` errors vs best-effort fallback for commands that cannot be implemented on `WKWebView` with correct semantics.
|
|
2. Whether to expose protocol-only agent-browser actions in first public release of `cmux browser` or gate them behind a second rollout phase.
|