cmux/scripts/run-tests-v1.sh
Lawrence Chen 50f0dd334d
Fix frozen terminals after split churn (#12)
* Fix blank terminal after split operations and add visual tests

## Blank Terminal Fix
- Add `needsRefreshAfterWindowChange` flag in GhosttyTerminalView
- Force terminal refresh when view is added to window, even if size unchanged
- Add `ghostty_surface_refresh()` call in attachToView for same-view reattachment
- Add debug logging for surface attachment lifecycle (DEBUG builds only)

## Bonsplit Migration
- Add bonsplit as local Swift package (vendor/bonsplit submodule)
- Replace custom SplitTree with BonsplitController
- Add Panel protocol with TerminalPanel and BrowserPanel implementations
- Add SidebarTab as main tab container with BonsplitController
- Remove old Splits/ directory (SplitTree, SplitView, TerminalSplitTreeView)

## Visual Screenshot Tests
- Add test_visual_screenshots.py for automated visual regression testing
- Uses in-app screenshot API (CGWindowListCreateImage) - no screen recording needed
- Generates HTML report with before/after comparisons
- Tests: splits, browser panels, focus switching, close operations, rapid cycles
- Includes annotation fields for easy feedback

## Browser Shortcut (⌘⇧B)
- Add keyboard shortcut to open browser panel in current pane
- Add openBrowser() method to TabManager
- Add shortcut configuration in KeyboardShortcutSettings

## Screenshot Command
- Add 'screenshot' command to TerminalController for in-app window capture
- Returns OK with screenshot ID and path

## Other
- Add tests/visual_output/ and tests/visual_report.html to .gitignore

* Add browser title subscription and set tab height to 30px

- Subscribe to BrowserPanel.$pageTitle changes to update bonsplit tabs
- Update tab titles in real-time as page navigation occurs
- Clean up subscriptions when panels are removed
- Set bonsplit tab bar and tab height to 30px (in submodule)

* Fix socket API regressions in list_surfaces, list_bonsplit_tabs, focus_pane

- list_surfaces: Remove [terminal]/[browser] suffix to keep UUID-only format
  that clients and tests expect for parsing
- list_bonsplit_tabs --pane: Properly look up pane by UUID instead of
  creating a new PaneID (requires bonsplit PaneID.id to be public)
- focus_pane: Accept both UUID strings and integer indices as documented

* Fix browser panel stability and keyboard shortcuts

- Prevent WKWebView focus lifecycle crashes during split/view reshuffles
- Match bracket shortcuts via keyCode (Cmd+Shift+[ / ], Cmd+Ctrl+[ / ])
- Support Ghostty config goto_split:* keybinds when WebView is focused
- Add focus_webview/is_webview_focused socket commands and regression tests
- Rename SidebarTab to Workspace and update docs

* Make ctrl+enter keybind test skippable

Skip when the Ghostty keybind isn't configured or when osascript can't send keystrokes (no Accessibility permission), so VM runs stay green.

* Auto-focus browser omnibar when blank

When a browser surface is focused but no URL is loaded yet, focus the address bar instead of the WKWebView.

* Stabilize socket surface indexing

* Focus browser omnibar escape; add webview keybind UI tests

- Escape in omnibar now returns focus to WKWebView\n- Add UI tests for Cmd+Ctrl+H pane navigation with WebKit focused (including Ghostty config)\n- Avoid flaky element screenshots in UpdatePillUITests on the UTM VM

* Fix browser drag-to-split blanks and socket parsing

* Fix webview-focused shortcuts and stabilize browser splits

- Match ctrl/shift shortcuts by keyCode where needed (Ctrl+H, bracket keys)
- Load Ghostty goto_split triggers reliably and refresh on config load
- Add debug socket helpers: set_shortcut + simulate_shortcut for tests
- Convert browser goto_split/keybind tests to socket-based injection (no osascript)
- Bump bonsplit for drag-to-split fixes

* Fix split layout collapse and harden socket pane APIs

* Stabilize OSC 99 notification test timing

* Fix terminal focus routing after split reparent

* Support simulate_shortcut enter for focus routing test

* Stabilize terminal focus routing test

* Fix frozen new terminal tabs after many splits

* Fix frozen new terminal tabs after splits

* Fix terminal freeze on launch/new tabs

* Update ghostty submodule

* Fix terminal focus/render stalls after split churn

* Fix nested split collapsing existing pane

* Fix nested split collapse + stabilize new-surface focus

* Update bonsplit submodule

* Fix SIGINT test flake

* Remove bonsplit tab-switch crossfade

* Remove PROJECTS.md

* Remove bonsplit tab selection animation

* Ignore generated test reports

* Middle click closes tab

* Revert unintended .gitignore change

* Fix build after main merge

* Revert "Fix build after main merge"

This reverts commit 16bf9816d0856b5385d52f886aa5eb50f3c9d9a4.

* Revert "Merge remote-tracking branch 'origin/main' into fix/blank-terminal-and-visual-tests"

This reverts commit 7c20fb53fd71fea7a19a3673f2dd73e5f0c783c4, reversing
changes made to 0aff107d787bc9d8bbc28220090b4ca7af72e040.

* Remove tab close fade animation

* Use terminal.fill icon

* Make terminal tab icon smaller

* Match browser globe tab icon size

* Bonsplit: tab min width 48 and tighter close button

* Bonsplit: smaller tab title font

* Show unread notification badge in bonsplit tabs and improve UI polish

Sync unread notification state to bonsplit tab badges (blue dot).
Improve EmptyPanelView with Terminal/Browser buttons and shortcut hints.
Add tooltips to close tab button and search overlay buttons.

* Fix reload.sh single-instance safety check on macOS

Replace GNU-only `ps -o etimes=` with portable `ps -o etime=` and
parse the dd-hh:mm:ss format manually for macOS compatibility.

* Centralize keyboard shortcut definitions into Action enum

Replace per-shortcut boilerplate with a single Action enum that holds
the label, defaults key, and default binding for each shortcut. All
call sites now use shortcut(for:). Settings UI is data-driven via
ForEach(Action.allCases). Titlebar tooltips update dynamically when
shortcuts are changed. Remove duplicate .keyboardShortcut() modifiers
from menu items that are already handled by the event monitor.

* Fix WKWebView consuming app menu shortcuts and close panel confirmation

Add CmuxWebView subclass that routes key equivalents through the main
menu before WebKit, so Cmd+N/Cmd+W/tab switching work when a browser
pane is focused. Fix Cmd+W close-panel path: bypass Bonsplit delegate
gating after the user confirms the running-process dialog by tracking
forceCloseTabIds. Add unit tests (CmuxWebViewKeyEquivalentTests) and
UI test scaffolding (MenuKeyEquivalentRoutingUITests) with a new
cmux-unit Xcode scheme.

* Update CLAUDE.md and PROJECTS.md with recent changes

CLAUDE.md: enforce --tag for reload commands, add cleanup safety rules.
PROJECTS.md: log notification badge, reload.sh fix, Cmd+W fix, WebView
key equiv fix, and centralized shortcuts work.

* Keep selection index stable on close

* Add concepts page documenting terminology hierarchy

New docs page explaining Window > Workspace > Pane > Surface > Panel
hierarchy with aligned ASCII diagram. Updated tabs.mdx and splits.mdx
to use consistent terminology (workspace instead of tab, surface
instead of panel) and corrected outdated CLI command references.

* Update bonsplit submodule

* WIP: improve split close stability and UI regressions

* Close terminal panel on child exit; hide terminal dirty dot

* Fix split close/focus regressions and stabilize UI tests

* Add unread Dock/Cmd+Tab badge with settings toggle

* Fix browser-surface shortcuts and Cmd+L browser opening

* Snapshot current workspace state before regression fixes

* Update bonsplit submodule snapshot

* Stabilize split-close regression capture and sidebar resize assertions

* Change default Show Notifications shortcut from Cmd+Shift+I to Cmd+I

* Fix update check readiness race, enable release update logging, and improve checking spinner

* Restore terminal file drop, fix browser omnibar click focus, and add panel workspace ID mutation for surface moves

* Add Cmd+digit workspace hints, titlebar shortcut pills, sidebar drag-reorder, and workspace placement settings

* Add v2 browser automation API, surface move/reorder commands, and short-handle ref system to TerminalController

* Add CLI browser command surface, --id-format flag, and move/reorder commands

* Extend test clients with move/reorder APIs, ref-handle support, and increased timeouts

* Harden test runner scripts with deterministic builds, retry logic, and robust socket readiness

* Stabilize existing test suites with focus-wait helpers, increased timeouts, and API shape updates

* Add terminal file drop e2e regression test

* Add v2 browser API, CLI ref resolution, and surface move/reorder test suites

* Add unit tests for shortcut hints, workspace reorder, drop planner, and update UI test stabilization

* Add cmux-debug-windows skill with snapshot script and agent config

* Update project docs: mark browser parity and move/reorder phases complete, add parallel agent workflow guidelines

* Update bonsplit submodule: re-entrant setPosition guard, tab shortcut hints, and moveTab/reorderTab API

* Add browser agent UX improvements: snapshot refs, placement reuse, diagnostics, and skill docs

- Upgrade browser.snapshot to emit accessibility tree text with element refs (eN)
- Add right-sibling pane reuse policy for browser.open_split placement
- Add rich not_found diagnostics with retry logic for selector actions
- Support --snapshot-after for post-action verification on mutating commands
- Allow browser fill with empty text for clearing inputs
- Default CLI --id-format to refs-first (UUIDs opt-in via --id-format uuids|both)
- Format legacy new-pane/new-surface output with short surface refs
- Add skills/cmuxterm-browser/ and skills/cmuxterm/ end-user skill docs
- Add regression tests for placement policy, snapshot refs, diagnostics, and ID defaults

* Update bonsplit submodule: keep raster favicons in color when inactive
2026-02-13 16:45:31 -08:00

243 lines
6.5 KiB
Bash
Executable file

#!/usr/bin/env bash
set -euo pipefail
# This runner is intended for the UTM macOS VM (ssh cmux-vm).
# It is intentionally guarded so we don't accidentally kill the host user's cmux instances.
if [ "$(id -un)" != "cmux" ]; then
echo "ERROR: This script is intended to be run on the cmux-vm (user: cmux)." >&2
echo "Run via: ssh cmux-vm 'cd /Users/cmux/GhosttyTabs && ./scripts/run-tests-v1.sh'" >&2
exit 2
fi
cd "$(dirname "$0")/.."
DERIVED_DATA_PATH="$HOME/Library/Developer/Xcode/DerivedData/cmux-tests-v1"
APP="$DERIVED_DATA_PATH/Build/Products/Debug/cmux DEV.app"
echo "== build =="
# Work around stale explicit-module cache artifacts (notably Sentry headers) that can
# intermittently break incremental VM builds with "file ... has been modified since the
# module file ... was built".
rm -rf "$DERIVED_DATA_PATH/Build/Intermediates.noindex/SwiftExplicitPrecompiledModules" || true
xcodebuild \
-project GhosttyTabs.xcodeproj \
-scheme cmux \
-configuration Debug \
-destination "platform=macOS" \
-derivedDataPath "$DERIVED_DATA_PATH" \
build >/dev/null
if [ ! -d "$APP" ]; then
echo "ERROR: cmux DEV.app not found at expected path: $APP" >&2
exit 1
fi
cleanup() {
pkill -x "cmux DEV" || true
pkill -x "cmux" || true
rm -f /tmp/cmux*.sock || true
}
launch_and_wait() {
cleanup
# Wait briefly for the previous instance to fully terminate; LaunchServices can flake if we
# relaunch too quickly.
for _ in {1..50}; do
pgrep -x "cmux DEV" >/dev/null 2>&1 || break
sleep 0.1
done
# Force socket mode for deterministic automation runs, independent of prior user settings.
defaults write com.cmuxterm.app.debug socketControlMode -string full >/dev/null 2>&1 || true
# Launch directly with UI test mode enabled so startup follows deterministic test codepaths.
CMUX_UI_TEST_MODE=1 "$APP/Contents/MacOS/cmux DEV" >/dev/null 2>&1 &
SOCK=""
for _ in {1..120}; do
SOCK=$(ls -t /tmp/cmux-debug*.sock /tmp/cmux*.sock 2>/dev/null | head -1 || true)
if [ -n "$SOCK" ] && [ -S "$SOCK" ]; then
break
fi
sleep 0.25
done
if [ -z "$SOCK" ] || [ ! -S "$SOCK" ]; then
echo "ERROR: Socket not ready (looked for /tmp/cmux*.sock)" >&2
exit 1
fi
export CMUX_SOCKET_PATH="$SOCK"
export CMUX_SOCKET="$SOCK"
# Ensure LaunchServices has a visible/main window attached for rendering checks.
open "$APP" >/dev/null 2>&1 || true
sleep 0.5
echo "== wait ready =="
python3 - <<'PY'
import time
import os
import sys
sys.path.insert(0, os.path.join(os.getcwd(), "tests"))
from cmux import cmux # type: ignore
deadline = time.time() + 30.0
last = None
client = None
while time.time() < deadline:
try:
client = cmux()
client.connect()
break
except Exception as e:
last = e
time.sleep(0.1)
else:
raise SystemExit(f"ERROR: Socket path exists but connect keeps failing: {last}")
workspace_ready = False
while time.time() < deadline:
try:
_ = client.current_workspace()
# Many focus-sensitive tests require the main window to be key.
# `open "$APP"` does not reliably activate the app when launched from SSH.
try:
client.activate_app()
except Exception:
pass
workspace_ready = True
break
except Exception as e:
last = e
time.sleep(0.1)
if not workspace_ready:
print(f"WARN: continuing without workspace-ready state: {last}")
# Use a fresh connection to avoid stale-listener races where the first connection succeeds but
# immediate reconnects fail with ECONNREFUSED.
probe_deadline = time.time() + 10.0
while time.time() < probe_deadline:
probe = None
try:
probe = cmux()
probe.connect()
if not probe.ping():
raise RuntimeError("ping returned false")
print("ready")
break
except Exception as e:
last = e
time.sleep(0.1)
finally:
if probe is not None:
try:
probe.close()
except Exception:
pass
else:
raise SystemExit(f"ERROR: Ready-check reconnect/ping failed: {last}")
# Force a single fresh workspace so startup-state restoration doesn't leave tests
# focused on non-terminal panels (which breaks read_screen/read_terminal_text assumptions)
# or with extra pre-existing workspaces that make ordering-dependent tests flaky.
bootstrap_last = None
for _ in range(3):
try:
existing_ids = []
try:
existing_ids = [row[1] for row in client.list_workspaces() if len(row) >= 2]
except Exception:
existing_ids = []
ws_id = client.new_workspace()
client.select_workspace(ws_id)
for old_id in existing_ids:
if old_id == ws_id:
continue
try:
client.close_workspace(old_id)
except Exception:
pass
surfaces = client.list_surfaces()
if not surfaces:
raise RuntimeError("new workspace has no surfaces")
client.focus_surface(0)
break
except Exception as e:
bootstrap_last = e
time.sleep(0.2)
else:
raise SystemExit(f"ERROR: Failed to bootstrap fresh terminal workspace: {bootstrap_last}")
window_last = None
window_deadline = time.time() + 10.0
while time.time() < window_deadline:
try:
health = client.surface_health()
if any(bool(row.get("in_window")) for row in health):
break
client.activate_app()
except Exception as e:
window_last = e
time.sleep(0.1)
else:
print(f"WARN: no in-window terminal surface detected before test start: {window_last}")
if client is not None:
try:
client.close()
except Exception:
pass
PY
}
run_test_with_retry() {
local f="$1"
local attempts=3
local n=1
while [ "$n" -le "$attempts" ]; do
echo "RUN $f (attempt $n/$attempts)"
if python3 "$f"; then
return 0
fi
if [ "$n" -ge "$attempts" ]; then
return 1
fi
echo "WARN: attempt $n failed for $f; relaunching and retrying" >&2
echo "== relaunch (retry) =="
launch_and_wait
n=$((n + 1))
done
return 1
}
echo "== tests (v1) =="
fail=0
for f in tests/test_*.py; do
base=$(basename "$f")
if [ "$base" = "test_ctrl_interactive.py" ]; then
echo "SKIP $f"
continue
fi
echo "== launch ($base) =="
launch_and_wait
if ! run_test_with_retry "$f"; then
echo "FAIL $f" >&2
fail=1
break
fi
done
echo "== cleanup =="
cleanup
exit "$fail"