* Migrate CI/CD to WarpBuild, consolidate test jobs
Replace all macOS runner labels across workflows:
- depot-macos-latest → warp-macos-15-arm64-6x
- macos-15 → warp-macos-15-arm64-6x
- macos-14 → warp-macos-14-arm64-6x
Consolidates tests + tests-depot into a single tests job that runs
unit tests, regressions, UI tests, and lag tests sequentially on one
WarpBuild runner. Ubuntu jobs remain on ubuntu-latest.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Upgrade stale zig on runners that have an outdated version pre-installed
WarpBuild macos-14 ships zig 0.15.1 but the project requires 0.15.2.
The install step skipped because zig was found, just outdated.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Pin zig 0.15.2 via direct tarball instead of Homebrew
Homebrew's zig bottle for macOS 14 (Sonoma) is stuck at 0.15.1 but the
ghostty submodule requires 0.15.2. Download zig directly from
ziglang.org to guarantee the correct version on all runner images.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix zig tarball URL: arch-os order is aarch64-macos, not macos-aarch64
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Create /usr/local/bin and /usr/local/lib before copying zig
WarpBuild runners don't have /usr/local/lib by default.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add 20-min timeout to WarpBuild jobs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix UI test hang: stream output instead of variable capture, use GitHub runner for macOS 14
The OUTPUT=$(...) pattern buffers all xcodebuild output into a bash
variable. For the full cmux scheme (build + UI tests), this can be
hundreds of MB, causing the shell to hang. Replace with tee streaming.
macOS 14 on WarpBuild consistently hangs (unit tests timeout at 20min
vs 4min on macOS 15, same M4 Pro hardware). Use GitHub-hosted macos-14
runner for compat tests instead, which works on main today.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Split UI tests to GitHub-hosted runner (WarpBuild can't activate GUI apps)
WarpBuild macOS VMs leave XCUIApplication stuck in "Running Background"
state, causing every UI test to burn ~62s waiting for activation and
timing out the job. Root cause: WarpBuild ephemeral VMs don't provide
a full GUI session for app activation.
Split CI into parallel jobs:
- tests: WarpBuild (unit tests + regressions, ~6 min)
- tests-ui: GitHub-hosted macos-15 (UI tests + lag regression)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Move tests-ui to WarpBuild with TCC permission grants
Grant accessibility, post-event, and screen capture TCC permissions
to Xcode and XCTest processes on WarpBuild ephemeral VMs. This should
fix "Failed to activate application (Running Background)" errors that
prevent XCUITests from bringing the app to foreground.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add GUI session diagnostics and DevToolsSecurity for WarpBuild UI tests
Add session diagnostics (who, console user, GUI domain, WindowServer,
loginwindow) to understand WarpBuild VM session state. Also enable
DevToolsSecurity and security authorizationdb for XCTest process
control. Try bootstrapping GUI session if missing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix TCC permissions: use Xcode-Helper + user DB (CircleCI approach)
Previous TCC grants used wrong client IDs (com.apple.dt.Xcode) and
only wrote to the system database. CircleCI's proven approach grants:
- kTCCServiceAccessibility to com.apple.dt.Xcode-Helper (not Xcode)
- kTCCServiceDeveloperTool to com.apple.Terminal
- Both system AND user-level TCC databases
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Reduce UI test timeout to 15s for WarpBuild expected failures
WarpBuild Virtualization.framework VMs cannot activate macOS GUI apps
(XCUIApplication stuck "Running Background"). Tests still execute and
report expected failures. But the 62s per-test activation timeout
makes 30+ tests take 30+ minutes total.
Set per-test timeout to 15s so expected failures resolve quickly.
Full interactive UI test coverage runs via test-e2e.yml on
GitHub-hosted runners with proper display support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Replace XCUITest run with build + lag regression on WarpBuild
WarpBuild Virtualization.framework VMs cannot activate macOS GUI apps
(XCUIApplication stuck "Running Background" with 62s activation
timeout per test). Tried TCC permissions, DevToolsSecurity, virtual
display, reduced timeouts, nothing fixes the framework-level issue.
Replace tests-ui job with tests-build-and-lag:
- Build the full cmux scheme (verifies compilation)
- Run workspace churn typing-lag regression (socket-based, no GUI)
- XCUITests run via test-e2e.yml on GitHub-hosted runners
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Move macOS 14 compat to WarpBuild (no GitHub-hosted runners)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add diagnostic workflow to probe WarpBuild GUI activation
Tests multiple app activation approaches on WarpBuild VMs:
- open -a, NSWorkspace, NSRunningApplication.activate, osascript
- Virtual display state before/after CGVirtualDisplay
- TCC/accessibility permissions, Quartz session info
- VM type detection
This is a workflow_dispatch-only diagnostic to determine if
XCUITest can work on WarpBuild with the right configuration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Trigger GUI probe on branch push (workflow_dispatch needs main)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Rewrite GUI probe with Swift (Python lacks AppKit on WarpBuild)
v1 failed because WarpBuild's Python isn't a framework build and
can't import AppKit/Quartz. v2 uses a compiled Swift binary to test
NSRunningApplication.activate(), osascript, Quartz session state,
display info, and AX trust.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* GUI probe v3: try 5 approaches to unlock WarpBuild screen
1. defaults write (screensaver, loginwindow, pmset)
2. automationmodetool enable-automationmode-without-authentication
3. CGSSessionSetScreenLocked private API + System Events keystroke
4. sysadminctl -screenLock off + keychain unlock
5. CGEvent simulation (mouse move + Return key to dismiss lock)
Each approach is followed by an activation check to see if it worked.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Test GUI activation on macOS 14, 15, and 26 (Tahoe)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add DerivedData and GhosttyKit caching to CI workflows
Major caching improvements across ci.yml and ci-macos-compat.yml:
- Cache GhosttyKit.xcframework keyed on ghostty submodule SHA
(skip download on cache hit)
- Cache DerivedData keyed on OS + Xcode version + Package.resolved +
project.pbxproj (enables incremental builds across runs)
- Remove explicit DerivedData wipe (rely on cache key invalidation)
- Use download-prebuilt-ghosttykit.sh in compat workflow too
This should significantly speed up macOS 14 compat tests which were
taking 20+ min due to full recompilation every run.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Bump macOS 14 compat timeout to 45 min for cold cache seeding
The DerivedData cache wasn't saved because the job timed out at 30 min,
causing the post-job cache save step to be skipped. 45 min gives enough
headroom for the first uncached run to complete and seed the cache.
Subsequent runs should be much faster with incremental builds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Use Depot runners for E2E tests (WarpBuild has screen lock on macOS 15/26)
WarpBuild VMs on macOS 15 and 26 have CGSSessionScreenIsLocked=1, which
prevents XCUIApplication activation. Depot runners have working GUI
activation. Can switch back to WarpBuild once they fix the VM images.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Skip smoke test on macOS 14 compat, remove GUI diagnostic workflow
macOS 14 was slow because it built the full app (cmux scheme) on top of
unit tests (cmux-unit scheme). Unit tests are the real compat check;
smoke test runs on macOS 15 only. Also removes the temporary
test-warpbuild-gui.yml diagnostic workflow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Replace Sonoma with Tahoe in compat matrix, drop macOS 14
Swap macOS 14 (Sonoma) for macOS 26 (Tahoe). Smoke test runs on
macOS 15 only (WarpBuild screen lock blocks app activation on 26).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Drop macOS 26 from compat matrix (zig 0.15.2 linker failure)
Zig 0.15.2 can't link against the macOS 26 (Tahoe) SDK: undefined
symbols for basic libc functions (_abort, _free, _fork, etc.). The zig
toolchain needs an update to support Tahoe. Keep macOS 15 only for now.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Lawrence Chen <lawrencecchen@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Fix stale Claude status in sidebar by adding missing hooks and OSC suppression
The Claude Code integration only used 3 hooks (SessionStart, Stop, Notification),
leaving gaps that caused stale sidebar status. Now uses 6 hooks:
- SessionEnd: clears status when Claude exits (covers Ctrl+C where Stop doesn't fire)
- UserPromptSubmit: clears "Needs input" and sets "Running" on new prompt
- PreToolUse (async): clears "Needs input" when Claude resumes after permission grant
Also:
- Suppress OSC 9/99 desktop notifications for workspaces with active Claude hook
sessions to prevent duplicates from the raw OSC path
- Store Claude process PID in status entries for stale-session detection
- Add 30-second sweep timer that checks agent PIDs and clears stale entries
(safety net for SIGKILL/crash where no hook fires)
- Update wrapper test expectations for the new hook set
Fixes https://github.com/manaflow-ai/cmux/issues/1301
* Don't show "Running" status on Claude launch, only when actually working
SessionStart now registers the PID for tracking and OSC suppression via
set_agent_pid without setting a visible status entry. "Running" only
appears when the user submits a prompt (UserPromptSubmit) or Claude
starts using tools (PreToolUse).
Added set_agent_pid / clear_agent_pid socket commands to decouple PID
tracking from visible status entries. OSC suppression checks agentPIDs
instead of statusEntries so it works during the initial idle period.
* Don't restore status entries across app restarts
Status entries are ephemeral runtime state tied to running processes
(e.g. claude_code "Running"). Restoring them after restart shows stale
status for processes that no longer exist.
* Address PR review comments and remove debug logging
- session-end: only clear status/PID/notifications when Stop didn't fire first
- PID sweep: check errno == ESRCH instead of treating all kill(pid,0) failures as dead
- Validate CMUX_CLAUDE_PID > 0
- Propagate tracked PID in pre-tool-use setClaudeStatus
- OSC suppression: use tabManagerFor(tabId:) for multi-window support
- clearAgentPID: resolve tab UUID before async dispatch
- restoreSessionSnapshot: also clear agentPIDs alongside statusEntries
- Fix AskUserQuestion surfaceId overwrite (wrong workspace notification)
- Fix notification text matching for "Claude Code needs your attention"
- AskUserQuestion: render option labels as bracketed inline text
- Remove artificial text truncation limits
- Remove temporary JSONL debug logging from all handlers
* Use resolveTabIdForSidebarMutation in clearAgentPID
* Fallback stable socket listener to user socket path
* Move stable socket path out of /tmp
* Keep socket health checks active on fallback paths
---------
Co-authored-by: Lawrence Chen <lawrencecchen@users.noreply.github.com>
* Add claude-teams CLI command
* Add claude-teams launcher regression test
* Exec claude-teams launcher in place
* Add existing-shim claude-teams regression test
* Reuse claude-teams shim and refresh dev CLI
* Add wrapper-selection claude-teams regression test
* Launch real claude binary for claude-teams
* Add claude-teams auto-mode launcher regression test
* Default claude-teams to fake tmux auto mode
* Build tagged reloads under DerivedData
* Add claude-teams tmux sequence regression test
* Fix claude-teams tmux teammate compatibility
* Add claude-teams split focus regression test
* Keep claude-teams leader pane focused
* Tighten claude-teams review fixes
* Pass claude-teams help through to Claude
* Use sentinel TERM_PROGRAM in claude-teams test
* Fix sidebar branch refresh after checkout
* Fix bash PR probe not refreshing on checkout (PR review feedback)
When HEAD changes (e.g. git checkout), the bash integration now resets
_CMUX_PR_LAST_RUN=0 so the PR probe is forced to re-run immediately.
This matches the zsh integration which already sets _CMUX_PR_FORCE=1
on HEAD change.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Address PR review comments:
- Test1: record original panel ID, wait for a *different* panel to report
the expected CWD — prevents false pass when focus stays on source pane
- Test2: record original tab ID, wait for a *different* tab with the
expected CWD — prevents false pass when checking the old workspace
- Remove unused `as e` exception bindings (Ruff F841)
- Shell race (set -m) was already fixed in 4402a5b0 via disown
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>