fix(runtime): add server-side sweeper to detect stale runtimes

Previously, the only way a runtime was marked offline was the daemon's
deregister call on graceful shutdown. If the daemon crashed, was killed,
or lost its network connection, the status stayed "online" indefinitely.
Add a background goroutine that sweeps every 30s and marks runtimes
offline after 45s without a heartbeat (3 missed intervals).
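
The timing in this message can be worked through in a few lines of Go. This is a sketch only: the 15s heartbeat interval is inferred from "45s ... (3 missed intervals)", and the constant and function names are hypothetical, not taken from the commit. The worst-case figure ignores query latency and ticker jitter.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// Assumption: 45s / 3 missed intervals implies a 15s heartbeat.
	heartbeatInterval = 15 * time.Second
	missedBeforeStale = 3
	// Stated in the commit message.
	sweepInterval = 30 * time.Second
)

// staleThreshold is how long a runtime may go without a heartbeat
// before a sweep considers it stale.
func staleThreshold() time.Duration {
	return missedBeforeStale * heartbeatInterval
}

// worstCaseDetection is an upper bound on the delay between the last
// heartbeat and the status flip: the runtime crosses the threshold just
// after one sweep and has to wait a full interval for the next one.
func worstCaseDetection() time.Duration {
	return staleThreshold() + sweepInterval
}

func main() {
	fmt.Println(staleThreshold(), worstCaseDetection()) // prints: 45s 1m15s
}
```

Under these assumptions a crashed daemon shows as offline somewhere between 45s and roughly 75s after its last heartbeat.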

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Jiayuan 2026-03-29 14:22:12 +08:00
parent 586c3bf470
commit b3bbf92a1d
4 changed files with 110 additions and 0 deletions


@@ -44,3 +44,10 @@ RETURNING *;
UPDATE agent_runtime
SET status = 'offline', updated_at = now()
WHERE id = $1;
-- name: MarkStaleRuntimesOffline :many
UPDATE agent_runtime
SET status = 'offline', updated_at = now()
WHERE status = 'online'
AND last_seen_at < now() - make_interval(secs => @stale_seconds::double precision)
RETURNING id, workspace_id;
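
A minimal sketch of how a background goroutine might drive the MarkStaleRuntimesOffline query. Nothing here is the code from this commit: the RuntimeStore interface, StaleRuntime struct, sweepOnce, runSweeper, and the in-memory memStore fake are all hypothetical names, and the string ID types are an assumption about the id and workspace_id columns.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// StaleRuntime mirrors the two columns the query returns.
// String IDs are an assumption; the real columns may be UUIDs.
type StaleRuntime struct {
	ID          string
	WorkspaceID string
}

// RuntimeStore is a hypothetical stand-in for the generated query code.
type RuntimeStore interface {
	MarkStaleRuntimesOffline(ctx context.Context, staleSeconds float64) ([]StaleRuntime, error)
}

// sweepOnce runs one pass and reports how many runtimes were flipped.
func sweepOnce(ctx context.Context, store RuntimeStore, staleAfter time.Duration) (int, error) {
	rows, err := store.MarkStaleRuntimesOffline(ctx, staleAfter.Seconds())
	if err != nil {
		return 0, err
	}
	return len(rows), nil
}

// runSweeper sweeps on every tick until ctx is cancelled.
func runSweeper(ctx context.Context, store RuntimeStore, every, staleAfter time.Duration) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if n, err := sweepOnce(ctx, store, staleAfter); err != nil {
				fmt.Println("sweep failed:", err)
			} else if n > 0 {
				fmt.Printf("marked %d runtimes offline\n", n)
			}
		}
	}
}

// memStore is an in-memory fake that imitates the query's WHERE clause.
type memStore struct {
	now      time.Time
	lastSeen map[string]time.Time // runtimes currently "online"
}

func (m *memStore) MarkStaleRuntimesOffline(_ context.Context, staleSeconds float64) ([]StaleRuntime, error) {
	cutoff := m.now.Add(-time.Duration(staleSeconds * float64(time.Second)))
	var out []StaleRuntime
	for id, seen := range m.lastSeen {
		if seen.Before(cutoff) { // last_seen_at < now() - interval
			out = append(out, StaleRuntime{ID: id, WorkspaceID: "ws-1"})
			delete(m.lastSeen, id) // no longer 'online', so later sweeps skip it
		}
	}
	return out, nil
}

// demoSweep runs two passes over one stale and one fresh runtime.
func demoSweep() (first, second int) {
	now := time.Now()
	store := &memStore{now: now, lastSeen: map[string]time.Time{
		"stale": now.Add(-60 * time.Second),
		"fresh": now.Add(-10 * time.Second),
	}}
	first, _ = sweepOnce(context.Background(), store, 45*time.Second)
	second, _ = sweepOnce(context.Background(), store, 45*time.Second)
	return first, second
}

func main() {
	fmt.Println(demoSweep()) // prints: 1 0
}
```

In a server, something like `go runSweeper(ctx, store, 30*time.Second, 45*time.Second)` would start the loop. Because the query filters on status = 'online', a second pass over the same rows is a no-op, which is why the demo's second sweep reports zero.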