* Reduce shell integration prompt latency
Three changes to cut ~10-15ms from every precmd/preexec cycle:
1. Use zsh/net/socket (zsocket) for socket sends when available. Eliminates
fork+exec of ncat/socat/nc for every telemetry send (~3ms per send,
3-4 sends per prompt cycle). Falls back to external tools if the
module is unavailable.
2. Replace _cmux_kill_process_tree (synchronous /bin/ps -ax | awk) with
direct kill in _cmux_stop_pr_poll_loop. The tree-kill enumerated all
system processes on every command (~5-13ms). Orphaned children (gh,
sleep) finish on their own within seconds.
3. Minor savings: guard _cmux_patch_ghostty_semantic_redraw so it is
skipped after its first success, make _cmux_clear_pr_for_panel async,
and cache the bash send tool.
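A minimal bash sketch of caching the send tool with an external-tool fallback. Only `_cmux_detect_send_tool` is named in this commit. `_CMUX_SEND_TOOL`, `_cmux_send`, and the ncat, then socat, then nc order are assumptions for illustration, and the zsh `zsocket` fast path is not shown. The point is that the PATH lookup runs once instead of on every send:

```bash
#!/usr/bin/env bash
# Sketch: pick a socket-send tool once and reuse it on every prompt.
# _CMUX_SEND_TOOL and _cmux_send are hypothetical names; only
# _cmux_detect_send_tool is named in the commit message.

_CMUX_SEND_TOOL=""

# Resolve the first available tool against the current PATH and cache it,
# so later sends skip the lookup. Returns 1 if none is installed.
_cmux_detect_send_tool() {
  local tool
  for tool in ncat socat nc; do
    if command -v "$tool" >/dev/null 2>&1; then
      _CMUX_SEND_TOOL=$tool
      return 0
    fi
  done
  _CMUX_SEND_TOOL=""
  return 1
}

# Send one line to a Unix socket with the cached tool. Each call still
# forks the external tool; only the detection cost is saved.
_cmux_send() {
  local socket_path=$1 payload=$2
  case $_CMUX_SEND_TOOL in
    ncat)  printf '%s\n' "$payload" | ncat --send-only -U "$socket_path" ;;
    socat) printf '%s\n' "$payload" | socat - "UNIX-CONNECT:$socket_path" ;;
    nc)    printf '%s\n' "$payload" | nc -U "$socket_path" ;;
    *)     return 1 ;;
  esac
}
```

Because the cached value reflects PATH at the moment `_cmux_detect_send_tool` runs, where it is called during init determines which tool is found.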
* Address review: process-group kill, fix clear_pr race, reorder bash init
1. Use kill -KILL -- -$PID (process-group kill) instead of plain kill.
With job control on (the default in interactive shells), background
jobs are process-group leaders, so this kills every descendant that
stays in the group (gh, sleep) without the /bin/ps overhead.
2. Keep bash _cmux_clear_pr_for_panel synchronous to prevent a race
with the next report_pr from the poll loop. The zsh version uses
_cmux_send_bg, which is synchronous when zsocket is available.
3. Move _cmux_detect_send_tool after _cmux_fix_path in bash so the
cached tool lookup runs with the final PATH.
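The process-group kill can be sketched in bash as follows. It assumes job control is on when the loop is launched: interactive bash and zsh enable it by default, and this script turns it on with `set -m` only for the launch. The function and variable names are hypothetical, not the actual cmux functions. The sketch shows that a child started inside the loop dies together with the group leader. A descendant that moves itself into a new process group or session (for example via `setsid`) would escape this kill.

```bash
#!/usr/bin/env bash
# Sketch of a process-group kill for a background poll loop.
# _demo_start_poll_loop, _demo_stop_poll_loop and _DEMO_POLL_PID are
# hypothetical names, not identifiers from the cmux integration.

_DEMO_POLL_PID=""

_demo_start_poll_loop() {
  local child_pid_file=$1
  # With job control (monitor mode) on, each background job gets its own
  # process group whose ID equals the job's PID. Interactive shells enable
  # it by default; a script must opt in, so enable it only for the launch.
  set -m
  (
    # Job control is off inside the subshell, so children started here
    # stay in the subshell's process group.
    while :; do
      sleep 30 &                      # stands in for a gh or sleep child
      echo "$!" > "$child_pid_file"   # record the child PID for inspection
      wait "$!"
    done
  ) &
  _DEMO_POLL_PID=$!
  set +m
}

_demo_stop_poll_loop() {
  [ -n "$_DEMO_POLL_PID" ] || return 0
  # A negative PID addresses the whole process group; "--" stops kill from
  # parsing "-PID" as an option or signal name.
  kill -KILL -- "-$_DEMO_POLL_PID" 2>/dev/null
  wait "$_DEMO_POLL_PID" 2>/dev/null   # reap the job so no zombie remains
  _DEMO_POLL_PID=""
}
```

Usage: call `_demo_start_poll_loop "$pidfile"` to launch the loop and `_demo_stop_poll_loop` to tear it down. No process enumeration is involved, which is the cost the tree-kill paid on every command.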
---------
Co-authored-by: Lawrence Chen <lawrencecchen@users.noreply.github.com>