* Fix blank terminal after split operations and add visual tests ## Blank Terminal Fix - Add `needsRefreshAfterWindowChange` flag in GhosttyTerminalView - Force terminal refresh when view is added to window, even if size unchanged - Add `ghostty_surface_refresh()` call in attachToView for same-view reattachment - Add debug logging for surface attachment lifecycle (DEBUG builds only) ## Bonsplit Migration - Add bonsplit as local Swift package (vendor/bonsplit submodule) - Replace custom SplitTree with BonsplitController - Add Panel protocol with TerminalPanel and BrowserPanel implementations - Add SidebarTab as main tab container with BonsplitController - Remove old Splits/ directory (SplitTree, SplitView, TerminalSplitTreeView) ## Visual Screenshot Tests - Add test_visual_screenshots.py for automated visual regression testing - Uses in-app screenshot API (CGWindowListCreateImage) - no screen recording needed - Generates HTML report with before/after comparisons - Tests: splits, browser panels, focus switching, close operations, rapid cycles - Includes annotation fields for easy feedback ## Browser Shortcut (⌘⇧B) - Add keyboard shortcut to open browser panel in current pane - Add openBrowser() method to TabManager - Add shortcut configuration in KeyboardShortcutSettings ## Screenshot Command - Add 'screenshot' command to TerminalController for in-app window capture - Returns OK with screenshot ID and path ## Other - Add tests/visual_output/ and tests/visual_report.html to .gitignore * Add browser title subscription and set tab height to 30px - Subscribe to BrowserPanel.$pageTitle changes to update bonsplit tabs - Update tab titles in real-time as page navigation occurs - Clean up subscriptions when panels are removed - Set bonsplit tab bar and tab height to 30px (in submodule) * Fix socket API regressions in list_surfaces, list_bonsplit_tabs, focus_pane - list_surfaces: Remove [terminal]/[browser] suffix to keep UUID-only format that clients and tests expect for parsing - list_bonsplit_tabs --pane: Properly look up pane by UUID instead of creating a new PaneID (requires bonsplit PaneID.id to be public) - focus_pane: Accept both UUID strings and integer indices as documented * Fix browser panel stability and keyboard shortcuts - Prevent WKWebView focus lifecycle crashes during split/view reshuffles - Match bracket shortcuts via keyCode (Cmd+Shift+[ / ], Cmd+Ctrl+[ / ]) - Support Ghostty config goto_split:* keybinds when WebView is focused - Add focus_webview/is_webview_focused socket commands and regression tests - Rename SidebarTab to Workspace and update docs * Make ctrl+enter keybind test skippable Skip when the Ghostty keybind isn't configured or when osascript can't send keystrokes (no Accessibility permission), so VM runs stay green. * Auto-focus browser omnibar when blank When a browser surface is focused but no URL is loaded yet, focus the address bar instead of the WKWebView. * Stabilize socket surface indexing * Focus browser omnibar escape; add webview keybind UI tests - Escape in omnibar now returns focus to WKWebView\n- Add UI tests for Cmd+Ctrl+H pane navigation with WebKit focused (including Ghostty config)\n- Avoid flaky element screenshots in UpdatePillUITests on the UTM VM * Fix browser drag-to-split blanks and socket parsing * Fix webview-focused shortcuts and stabilize browser splits - Match ctrl/shift shortcuts by keyCode where needed (Ctrl+H, bracket keys) - Load Ghostty goto_split triggers reliably and refresh on config load - Add debug socket helpers: set_shortcut + simulate_shortcut for tests - Convert browser goto_split/keybind tests to socket-based injection (no osascript) - Bump bonsplit for drag-to-split fixes * Fix split layout collapse and harden socket pane APIs * Stabilize OSC 99 notification test timing * Fix terminal focus routing after split reparent * Support simulate_shortcut enter for focus routing test * Stabilize terminal focus routing test * Fix frozen new terminal tabs after many splits * Fix frozen new terminal tabs after splits * Fix terminal freeze on launch/new tabs * Update ghostty submodule * Fix terminal focus/render stalls after split churn * Fix nested split collapsing existing pane * Fix nested split collapse + stabilize new-surface focus * Update bonsplit submodule * Fix SIGINT test flake * Remove bonsplit tab-switch crossfade * Remove PROJECTS.md * Remove bonsplit tab selection animation * Ignore generated test reports * Middle click closes tab * Revert unintended .gitignore change * Fix build after main merge * Revert "Fix build after main merge" This reverts commit 16bf9816d0856b5385d52f886aa5eb50f3c9d9a4. * Revert "Merge remote-tracking branch 'origin/main' into fix/blank-terminal-and-visual-tests" This reverts commit 7c20fb53fd71fea7a19a3673f2dd73e5f0c783c4, reversing changes made to 0aff107d787bc9d8bbc28220090b4ca7af72e040. * Remove tab close fade animation * Use terminal.fill icon * Make terminal tab icon smaller * Match browser globe tab icon size * Bonsplit: tab min width 48 and tighter close button * Bonsplit: smaller tab title font * Show unread notification badge in bonsplit tabs and improve UI polish Sync unread notification state to bonsplit tab badges (blue dot). Improve EmptyPanelView with Terminal/Browser buttons and shortcut hints. Add tooltips to close tab button and search overlay buttons. * Fix reload.sh single-instance safety check on macOS Replace GNU-only `ps -o etimes=` with portable `ps -o etime=` and parse the dd-hh:mm:ss format manually for macOS compatibility. * Centralize keyboard shortcut definitions into Action enum Replace per-shortcut boilerplate with a single Action enum that holds the label, defaults key, and default binding for each shortcut. All call sites now use shortcut(for:). Settings UI is data-driven via ForEach(Action.allCases). Titlebar tooltips update dynamically when shortcuts are changed. Remove duplicate .keyboardShortcut() modifiers from menu items that are already handled by the event monitor. * Fix WKWebView consuming app menu shortcuts and close panel confirmation Add CmuxWebView subclass that routes key equivalents through the main menu before WebKit, so Cmd+N/Cmd+W/tab switching work when a browser pane is focused. Fix Cmd+W close-panel path: bypass Bonsplit delegate gating after the user confirms the running-process dialog by tracking forceCloseTabIds. Add unit tests (CmuxWebViewKeyEquivalentTests) and UI test scaffolding (MenuKeyEquivalentRoutingUITests) with a new cmux-unit Xcode scheme. * Update CLAUDE.md and PROJECTS.md with recent changes CLAUDE.md: enforce --tag for reload commands, add cleanup safety rules. PROJECTS.md: log notification badge, reload.sh fix, Cmd+W fix, WebView key equiv fix, and centralized shortcuts work. * Keep selection index stable on close * Add concepts page documenting terminology hierarchy New docs page explaining Window > Workspace > Pane > Surface > Panel hierarchy with aligned ASCII diagram. Updated tabs.mdx and splits.mdx to use consistent terminology (workspace instead of tab, surface instead of panel) and corrected outdated CLI command references. * Update bonsplit submodule * WIP: improve split close stability and UI regressions * Close terminal panel on child exit; hide terminal dirty dot * Fix split close/focus regressions and stabilize UI tests * Add unread Dock/Cmd+Tab badge with settings toggle * Fix browser-surface shortcuts and Cmd+L browser opening * Snapshot current workspace state before regression fixes * Update bonsplit submodule snapshot * Stabilize split-close regression capture and sidebar resize assertions * Change default Show Notifications shortcut from Cmd+Shift+I to Cmd+I * Fix update check readiness race, enable release update logging, and improve checking spinner * Restore terminal file drop, fix browser omnibar click focus, and add panel workspace ID mutation for surface moves * Add Cmd+digit workspace hints, titlebar shortcut pills, sidebar drag-reorder, and workspace placement settings * Add v2 browser automation API, surface move/reorder commands, and short-handle ref system to TerminalController * Add CLI browser command surface, --id-format flag, and move/reorder commands * Extend test clients with move/reorder APIs, ref-handle support, and increased timeouts * Harden test runner scripts with deterministic builds, retry logic, and robust socket readiness * Stabilize existing test suites with focus-wait helpers, increased timeouts, and API shape updates * Add terminal file drop e2e regression test * Add v2 browser API, CLI ref resolution, and surface move/reorder test suites * Add unit tests for shortcut hints, workspace reorder, drop planner, and update UI test stabilization * Add cmux-debug-windows skill with snapshot script and agent config * Update project docs: mark browser parity and move/reorder phases complete, add parallel agent workflow guidelines * Update bonsplit submodule: re-entrant setPosition guard, tab shortcut hints, and moveTab/reorderTab API * Add browser agent UX improvements: snapshot refs, placement reuse, diagnostics, and skill docs - Upgrade browser.snapshot to emit accessibility tree text with element refs (eN) - Add right-sibling pane reuse policy for browser.open_split placement - Add rich not_found diagnostics with retry logic for selector actions - Support --snapshot-after for post-action verification on mutating commands - Allow browser fill with empty text for clearing inputs - Default CLI --id-format to refs-first (UUIDs opt-in via --id-format uuids|both) - Format legacy new-pane/new-surface output with short surface refs - Add skills/cmuxterm-browser/ and skills/cmuxterm/ end-user skill docs - Add regression tests for placement policy, snapshot refs, diagnostics, and ID defaults * Update bonsplit submodule: keep raster favicons in color when inactive
16 KiB
16 KiB
Agent-Browser Port Spec
Last updated: February 13, 2026
Source inventory snapshot: vercel-labs/agent-browser @ 03a8cb9
This document tracks implemented behavior and remaining parity gaps for the cmux browser port.
Goals
- Provide an LLM-friendly browser automation API in cmux with stable handles.
- Keep v1 CLI/socket behavior working while v2 reaches full parity.
- Port
agent-browsercommand surface (where meaningful forWKWebView). - Ensure move/reorder operations preserve
surface_ididentity. - Rebuild/port tests so both v1 and v2 suites pass before deprecating v1.
Validation Status
As of February 12, 2026:
./scripts/run-tests-v1.shpasses oncmux-vm../scripts/run-tests-v2.shpasses oncmux-vm.- Browser parity suites passing in v2:
test_browser_api_comprehensive.py,test_browser_api_p0.py,test_browser_api_extended_families.py,test_browser_api_unsupported_matrix.py, andtest_browser_cli_agent_port.py. - Visual suite note:
tests/test_visual_screenshots.pyandtests_v2/test_visual_screenshots.pyboth report D12 (Nested: Close Top of T-shape) as a known non-blocking VM failure when it reproduces (VIEW_DETACHED).
Concepts (Canonical Terms)
window: native macOS window.workspace: sidebar entry within a window (often called "tab" in UI).pane: split region inside a workspace.surface: tab within a pane (terminal or browser). This is the primary automation target.panel: internal implementation term; CLI/API should prefersurface.
Terminology decision:
- Public v2 API and new CLI docs should standardize on
surfaceandpane. - Keep
--panelas compatibility alias in CLI until v1 is retired.
Self-Identify Requirement
system.identify is the canonical "where am I?" call for agents and should remain first-class.
Required response fields for agent workflows:
focused.window_idfocused.workspace_idfocused.pane_idfocused.surface_idcallervalidation result when caller context is supplied
Recommended extension for browser workflows:
focused.surface_typefocused.browser.urlfocused.browser.titlefocused.browser.loading
Agent-Browser Command Inventory
Top-Level CLI Verbs (from cli/src/commands.rs)
open|goto|navigatebackforwardreloadclickdblclickfilltypehoverfocuscheckuncheckselectdraguploaddownloadpress|keykeydownkeyupscrollscrollintoview|scrollintowaitscreenshotpdfsnapshotevalclose|quit|exitconnectgetisfindmousesetnetworkstoragecookiestabwindowframedialogtracerecordconsoleerrorshighlightstatetapswipedevice
CLI Subcommands
get:text|html|value|attr|url|title|count|box|stylesis:visible|enabled|checkedfind:role|text|label|placeholder|alt|title|testid|first|last|nthmouse:move|down|up|wheelset:viewport|device|geo|geolocation|offline|headers|credentials|auth|medianetwork:route|unroute|requestsstorage:local|session+get|set|clearcookies: default get, plusset|cleartab: default list, plusnew|list|close|<index>window:newframe:<selector>|maindialog:accept|dismisstrace:start|stoprecord:start|stop|restartstate:save|loaddevice:list
Global Flags
--json--full|-f--headed--debug--session--headers--executable-path--extension(repeatable)--cdp--profile--state--proxy--proxy-bypass--args--user-agent-p|--provider--ignore-https-errors--allow-file-access--device
Protocol Actions in src/protocol.ts
Counts:
- total actions: 125
- directly emitted by CLI parser: 93
- protocol-only (not directly emitted by CLI parser): 32
Protocol-only action names:
addinitscriptaddscriptaddstylebringtofrontclearclipboardcontentdispatchevalhandleexposehar_starthar_stopinnertextinput_keyboardinput_mouseinput_touchinserttextkeyboardlocalemultiselectpausepermissionsresponsebodyscreencast_startscreencast_stopselectallsetcontentsetvaluetimezoneuseragentvideo_startvideo_stop
cmux Target API (v2)
Already Present in cmux
system.pingsystem.capabilitiessystem.identifywindow.list|current|focus|create|closeworkspace.list|create|select|current|close|move_to_windowpane.list|focus|surfaces|createsurface.list|focus|split|create|close|drag_to_split|refresh|health|send_text|send_key|trigger_flashbrowser.open_split|navigate|back|forward|reload|url.get|focus_webview|is_webview_focused- notification methods and debug/test methods
New Browser Parity Method Families (Proposed)
P0 (core parity for daily automation):
browser.snapshotbrowser.evalbrowser.waitbrowser.clickbrowser.dblclickbrowser.typebrowser.fillbrowser.press|keydown|keyupbrowser.hover|focusbrowser.check|uncheckbrowser.selectbrowser.scroll|scroll_into_viewbrowser.get.*(url|title|text|html|value|attr|count|box|styles)browser.is.*(visible|enabled|checked)browser.screenshotbrowser.focus_webviewandbrowser.is_webview_focused(already present, keep)
P1 (important but not blocking initial parity):
browser.find.*locators (role|text|label|placeholder|alt|title|testid|nth|first|last)browser.frame.selectbrowser.frame.mainbrowser.dialog.respondbrowser.download.waitbrowser.tab.*compatibility aliases mapped to cmux surfacesbrowser.console.listbrowser.errors.listbrowser.highlightbrowser.state.save|load(browser state in cmux context)
P2 (advanced parity / optional):
- network interception/mocking equivalents (
route|unroute|requests|responsebody) - emulation/settings (
viewport|media|offline|geolocation|permissions|headers|credentials|useragent|locale|timezone|device) - trace/video/screencast/har equivalents
- script injection utilities (
addinitscript|addscript|addstyle|dispatch|expose|evalhandle) - raw input device injection (
input_mouse|input_keyboard|input_touch)
Object/Handle Semantics
- stable handles:
window_id,workspace_id,pane_id,surface_id - browser refs (
@e1) are session-local and ephemeral - move/reorder must preserve
surface_id - responses may include
indexfor debugging/order, but requests should accept IDs
CLI Spec (Proposed)
Primary form:
cmux browser --surface <surface-id> <agent-browser-style-command...>
Shorthand:
cmux browser <surface-id> <agent-browser-style-command...>
Agent discovery:
cmux identify
cmux capabilities
cmux browser identify --surface <surface-id> # wrapper over system.identify + browser fields
Flash:
cmux trigger-flash [--workspace <id>] [--surface <id>]
Compatibility:
- Keep v1 commands.
- Add v1->v2 shim for migrated browser/surface commands.
- Keep
--panelas alias for--surfaceduring migration.
Move/Reorder Spec (Required)
Required capabilities:
- reorder surfaces within a pane
- move surfaces between panes in same workspace
- move surfaces across workspaces
- move surfaces across windows
- reorder workspaces within window
Proposed methods:
surface.movewithsurface_id+ destination (pane_idorworkspace_id/window_id) + placement (before_surface_id|after_surface_id|start|end)surface.reorderwithsurface_id+ sibling anchor (before_surface_id|after_surface_id)workspace.reorderwithworkspace_id+ anchor (before_workspace_id|after_workspace_id)
Hard invariant:
surface_idmust remain unchanged after all move/reorder operations.
Comprehensive TODO
Phase 0: Contract + Routing
- Lock method names/payload schemas for all new
browser.*methods. - Add schema validation for each new method with strict error codes (
invalid_params,not_found,invalid_state). - Add
browsercommand group inCLI/cmux.swiftthat accepts agent-browser-style command grammar. - Add
--surfacemandatory targeting (with fallback fromsystem.identifywhen explicitly desired). - Add consistent JSON output mode for all browser commands.
- Implement short-ref allocator and resolver for
window/pane/workspace/surface(window:N,workspace:N,pane:N,surface:N). - Add
--id-format refs|uuids|bothacross relevant CLI commands (--jsondefault refs, plain-text default refs). - Ensure browser placement APIs always return decision-rich metadata (resolved target pane, created splits, resulting handles).
Phase 1: Core Browser Parity (P0)
- Implement
browser.snapshot(with refs). - Implement
browser.eval. - Implement
browser.waitvariants: selector, timeout, URL pattern, load state, function, text. - Implement click family:
click,dblclick,hover,focus. - Implement input family:
type,fill,press,keydown,keyup. - Implement checkbox/select family:
check,uncheck,select. - Implement scrolling family:
scroll,scroll_into_view. - Implement getters: text/html/value/attr/url/title/count/box/styles.
- Implement state checks: visible/enabled/checked.
- Implement screenshots (surface/full-page where feasible).
Phase 2: Locator + Session Parity (P1)
- Implement
browser.find.role. - Implement
browser.find.text. - Implement
browser.find.label. - Implement
browser.find.placeholder. - Implement
browser.find.alt. - Implement
browser.find.title. - Implement
browser.find.testid. - Implement
browser.find.nth|first|last. - Implement frame context switching (
frame.select,frame.main). - Implement dialog handling (
accept,dismiss, optional prompt text). - Implement download waiting.
- Implement console/error buffers and retrieval.
- Implement highlight helper.
- Implement browser state save/load format.
Phase 3: Move/Reorder + Window/Workspace Integration
- Implement
surface.movewith handle-based destination rules. - Implement
surface.reorderwithin pane. - Implement cross-workspace surface moves.
- Implement cross-window surface moves.
- Implement
workspace.reorder. - Add CLI commands for tab/surface reordering and moving (
move-surface,reorder-surface,reorder-workspace). - Add response payloads that confirm final
window_id/workspace_id/pane_id/surface_id. - Add explicit invariants tests for
surface_idstability.
Phase 4: Advanced/Optional Parity (P2)
- Evaluate feasibility of request interception/mocking in
WKWebView; implement supported subset. - Add emulation settings that are feasible in
WKWebView. - Add trace/recording equivalents where practical.
- Add script/style injection helpers.
- Document unsupported commands with explicit error
not_supported.
Phase 5: Compatibility + Migration
- Add v1-to-v2 shim for migrated command families.
- Keep existing v1 behavior unchanged while shim is active.
- Document v1/v2 mapping table for all browser/topology commands.
- Add deprecation warnings only after parity + test completion.
Phase 6: Docs + Examples
- Update
docs/v2-api-migration.mdwith browser parity status. - Add dedicated browser automation doc in
docs-site. - Add examples for LLM workflow: identify -> choose surface -> snapshot -> act -> verify.
- Add explicit "surface vs pane vs workspace vs window" section to CLI docs.
Test Port Plan (Comprehensive)
Port Targets from agent-browser
src/browser.test.ts-> ported/adapted into:tests_v2/test_browser_api_p0.pytests_v2/test_browser_api_comprehensive.pytests_v2/test_browser_api_unsupported_matrix.py
src/actions.test.ts-> adapted negative coverage intests_v2/test_browser_api_comprehensive.py(invalid_params,not_found,timeout).src/protocol.test.ts-> adapted browser command/shape validation intests_v2/test_browser_api_unsupported_matrix.pyand existingCLI/cmux.swiftcommand grammar checks.test/file-access.test.tsandtest/launch-options.test.ts-> partially applicable toWKWebView; currently tracked as follow-up parity work (not blocking current browser method coverage).src/daemon.test.ts,src/stream-server.test.ts,test/serverless.test.ts,src/ios-manager.test.ts-> out-of-scope for cmux browser parity (different transport/runtime).
Implemented cmux Browser Suites
tests_v2/test_browser_api_p0.pytests_v2/test_browser_api_comprehensive.pytests_v2/test_browser_api_unsupported_matrix.pytests_v2/test_browser_goto_split.pytests_v2/test_browser_panel_stability.pytests_v2/test_browser_custom_keybinds.py
Test Design Rules
- Prefer deterministic local fixtures (embedded HTML or local HTTP server), not public websites.
- Every command gets at least one positive and one negative test.
- Every handle-accepting API gets tests for UUID target and index-compat shim target.
- Every move/reorder test asserts
surface_idstability pre/post operation. - Browser tests must verify behavior from both focused and unfocused webview states.
- Self-identify tests must validate
focusedandcallerfields.
Migration Gate Criteria
- New browser parity tests in
tests_v2/pass. - Existing v2 regression suites still pass.
- v1 suites still pass with shim active.
- No regressions in existing window/workspace/surface workflows.
Planned verification commands at implementation completion:
ssh cmux-vm 'cd /Users/cmux/GhosttyTabs && ./scripts/run-tests-v2.sh'ssh cmux-vm 'cd /Users/cmux/GhosttyTabs && ./scripts/run-tests-v1.sh'
Decision Log (Locked - February 12, 2026)
cmux browser tab ...maps to browsersurfacetabs only (no separate workspace-level tab meaning insidebrowsernamespace).- Default browser placement without explicit target is caller-relative: reuse the nearest right sibling pane; if none exists, split right from the caller pane.
- Deeply nested layouts use local split ancestry: choose the nearest right sibling leaf in the caller's subtree path and avoid reshuffling unrelated panes.
- Network parity target is full parity (not block-only phase).
- Output shape is cmux-native overall, but
browser.snapshotand selectornot_founddiagnostics intentionally mirror agent-browser semantics for agent usability. - ID model accepts UUIDs and short refs.
- Short ref format uses full words and colon:
surface:N,pane:N,workspace:N,window:N. - Short refs are global per daemon, monotonic, and never reused until daemon restart.
- Plain-text CLI output defaults to short refs.
- JSON output defaults to short refs (UUIDs available via
--id-format uuids|both). - CLI supports
--id-format refs|uuids|bothfor output shaping. - Browser create/move commands should expose enough placement/result metadata for agents to make deterministic follow-up decisions.
- Reuse behavior is implicit by default (caller-relative right-pane reuse); explicit handles can still force deterministic targeting.
browser fillaccepts empty text and treats it as a clear operation.- Mutating browser actions can opt into post-action verification snapshots via
snapshot_after(--snapshot-afterin CLI), returningpost_action_snapshot(+ refs/title/url). - Legacy
new-pane/new-surfaceplain output prefers shortsurface:Nrefs under default CLI ID formatting.
Remaining Open Decisions
- Unsupported command policy: strict
not_supportederrors vs best-effort fallback for commands that cannot be implemented onWKWebViewwith correct semantics. - Whether to expose protocol-only agent-browser actions in first public release of
cmux browseror gate them behind a second rollout phase.