release: v3.37.2 - hook format fix, 3 resource evals, cross-model review sections

- Fix: hook format updated to matcher+hooks[] structure (settings.json, learning-mode.md)
- New guide sections: Cross-Model Review, Lightweight Role-Switch, Task Sizing (ultimate-guide.md)
- Resource Eval: ManoMano Project Aegis - Serena MCP benchmark (3/5, ecosystem gap identified)
- Resource Eval: Multi-Session Management Landscape (4/5)
- Resource Eval: Ischenko workflow quality (2/5, marginal)
- Version bump: 3.37.1 → 3.37.2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 21:22:01 +01:00 · 53ac314a15
commit 53ac314a15
parent ea7ce092dc
11 changed files with 485 additions and 17 deletions
--- a/docs/resource-evaluations/082-multi-session-management-landscape.md
+++ b/docs/resource-evaluations/082-multi-session-management-landscape.md
@ -0,0 +1,145 @@
+# Resource Evaluation: Multi-Session Claude Code Management — Landscape Overview
+
+**Date**: 2026-03-19
+**Evaluator**: Claude (research session + structured synthesis)
+**Status**: Reference — Integrate into guide (ecosystem / third-party tools section)
+
+---
+
+## Summary
+
+This evaluation covers the full landscape of tools for managing multiple Claude Code sessions across multiple projects simultaneously. The research identified 14 tools across 4 categories: monitoring dashboards, remote/browser access, multi-project orchestrators, and sound/notification systems.
+
+No single tool covers all use cases. The space is fragmented, actively evolving (most repos < 6 months old), and missing one obvious feature: per-project audio differentiation.
+
+---
+
+## Score: 4/5 (as a category)
+
+| Score | Meaning |
+|-------|---------|
+| 5 | Essential — Major gap |
+| **4** | **High value — Significant improvement** |
+| 3 | Pertinent — Useful complement |
+| 2 | Marginal — Secondary info |
+| 1 | Out of scope |
+
+**Justification**: Multi-agent, multi-project workflows are increasingly common. The guide covers individual session hooks and notifications but has no consolidated view of the tooling available for running 3-10 parallel Claude Code sessions. vibetunnel (4 276 stars) and multi-agent-shogun (1 082 stars) signal strong community demand. Integration would fill a documented gap.
+
+---
+
+## Tool Landscape
+
+### Category 1 — Monitoring Dashboards
+
+| Tool | GitHub | Stars | Stack | Key Features |
+|------|--------|-------|-------|-------------|
+| **claude-code-monitor (ccm)** | [onikan27/claude-code-monitor](https://github.com/onikan27/claude-code-monitor) | ⭐ 212 | TypeScript / Node | TUI vim-style (j/k, 1-9), session switching via AppleScript, mobile WebUI + QR code pairing, WebSocket real-time |
+| **claude-code-dashboard** | [Stargx/claude-code-dashboard](https://github.com/Stargx/claude-code-dashboard) | ⭐ 5 | Node + Express + React | Auto-detects all sessions, token/cost per session, context progress bars, git branch, permission mode badges |
+| **sniffly** | [chiphuyen/sniffly](https://github.com/chiphuyen/sniffly) | ⭐ 1 170 | Python | Analytics-first: usage patterns, error breakdown, shareable dashboard. Post-hoc, not real-time |
+| **ClaudeCode-Dashboard** | [Quriov/ClaudeCode-Dashboard](https://github.com/Quriov/ClaudeCode-Dashboard) | ⭐ 0 | Next.js 16 + ReactFlow | Config topology viewer (hooks, agents, MCP, skills), not session monitoring |
+
+**Best pick**: `ccm` for real-time monitoring on macOS (easiest setup: `npx claude-code-monitor`). `sniffly` for post-session analytics on any platform.
+
+---
+
+### Category 2 — Remote / Browser Access
+
+| Tool | GitHub | Stars | Stack | Key Features |
+|------|--------|-------|-------|-------------|
+| **vibetunnel** | [amantus-ai/vibetunnel](https://github.com/amantus-ai/vibetunnel) | ⭐ 4 276 | TypeScript + Swift | Wraps any terminal in browser tabs, multiple sessions, Git Follow Mode, VibeTunnelTalk voice narration |
+| **cc-hub** | [m0a/cc-hub](https://github.com/m0a/cc-hub) | ⭐ 1 | Go + tmux + Tailscale | Split panes + session color themes, file diff tracking (Claude edits vs git), mobile optimized, dashboard with cost stats |
+
+**Best pick**: `vibetunnel` for broad use (4 276 stars, very active). `cc-hub` if you need session color differentiation + file diffs per project (requires Tailscale).
+
+---
+
+### Category 3 — Multi-Project Orchestrators
+
+| Tool | GitHub | Stars | Stack | Key Features |
+|------|--------|-------|-------|-------------|
+| **claudio** | [Iron-Ham/claudio](https://github.com/Iron-Ham/claudio) | ⭐ 22 | Go + tmux | Isolated git worktrees, TUI dashboard, 14 color themes, task chaining (`--depends-on`), planning modes: UltraPlan / TripleShot / Adversarial Review, PR automation, cost limits |
+| **multi-agent-shogun** | [yohey-w/multi-agent-shogun](https://github.com/yohey-w/multi-agent-shogun) | ⭐ 1 082 | Shell + tmux | Shogun/Karo/Ashigaru hierarchy, 7 workers + 1 strategist, multi-CLI (Claude, Codex, Copilot, Kimi), zero API coordination cost |
+| **zenportal** | [kgang/zenportal](https://github.com/kgang/zenportal) | ⭐ 1 | Python/Textual | Multi-CLI support, git worktrees per session, 3-tier config, session persistence via tmux |
+| **praktor** | [mtzanidakis/praktor](https://github.com/mtzanidakis/praktor) | ⭐ 17 | Go + Docker Compose | Telegram I/O (chat from phone), 1 Docker container per agent, cron tasks, swarms, AES-256-GCM secrets vault |
+
+**Best pick**: `claudio` for serious multi-project orchestration (isolated worktrees + color themes + advanced planning). `multi-agent-shogun` for high-parallelism fan-out patterns with tmux visibility.
+
+---
+
+### Category 4 — Sound / Notification Systems
+
+| Tool | GitHub | Stars | Stack | Per-Project Sound |
+|------|--------|-------|-------|------------------|
+| **karina-voice-notification** | [t1seo/karina-voice-notification](https://github.com/t1seo/karina-voice-notification) | ⭐ 0 | Python (Qwen3-TTS) | Clone any voice from YouTube → custom `.wav` per project (DIY assembly) |
+| **sound-micro-server** | [arc-co/claude-code-notification-sound-micro-server](https://github.com/arc-co/claude-code-notification-sound-micro-server) | ⭐ 0 | Node.js | Browser-based sound via hook `Stop` + `curl POST`. Single sound for all sessions |
+| **ccnotify** | [Helmi/ccnotify](https://github.com/Helmi/ccnotify) | n/a | Shell | Voice notifications (spoken text) |
+| **claude-session-manager** | [Swarek/claude-session-manager](https://github.com/Swarek/claude-session-manager) | ⭐ 4 | Shell | Colored status line per session, session IDs (`cx` command), live description updates |
+
+**Gap**: No tool provides per-project audio differentiation out of the box. The cleanest DIY approach: configure `settings.local.json` per project with a different audio file in the `Stop` hook.
+
+```json
+// project-a/.claude/settings.local.json
+{
+  "hooks": {
+    "Stop": [{ "hooks": [{ "type": "command", "command": "afplay ~/sounds/project-a.wav" }] }]
+  }
+}
+```
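+
+On Linux, where `afplay` is not available, the same idea can be sketched with `paplay`, assuming PulseAudio's `paplay` is installed. This fragment uses the matcher+hooks[] structure this release moves to. The project name `project-b` and the sound path are hypothetical placeholders, and the `//` line is only a label for the file location, as in the example above:
+
+```json
+// project-b/.claude/settings.local.json (hypothetical project and sound path)
+{
+  "hooks": {
+    "Stop": [{ "hooks": [{ "type": "command", "command": "paplay ~/sounds/project-b.wav" }] }]
+  }
+}
+```
+
+Because each project carries its own `settings.local.json`, the sound that plays identifies which session just stopped. The distinct audio file per project is the only differentiator.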
+
+---
+
+## Capability Matrix
+
+| Tool | Multi-session visibility | Session switching | Per-project differentiation | Sound | Platform |
+|------|--------------------------|-------------------|-----------------------------|-------|----------|
+| ccm | TUI list | AppleScript focus | Status icons | No | macOS only |
+| claude-code-dashboard | Web dashboard | No | Git branch / badges | No | All |
+| vibetunnel | Browser tabs | Manual tab switch | Terminal titles | No (voice narration optional) | macOS M1+ / Linux |
+| cc-hub | Split panes | Click | Color themes per session | No | macOS/Linux + Tailscale |
+| claudio | TUI per instance | TUI controls | 14 color themes | No | macOS/Linux |
+| multi-agent-shogun | tmux panes | tmux | Pane position | No | macOS/Linux |
+| zenportal | TUI list | TUI controls | Session name | No | macOS/Linux |
+| praktor | Web + Telegram | @agent_name | Docker container | No | Linux/Docker |
+| karina-voice-notification | n/a | n/a | Custom voice per project | YES (DIY) | macOS M1+ / Linux (CUDA) |
+| sound-micro-server | n/a | n/a | No | YES (single sound) | All (browser) |
+| claude-session-manager | Terminal status line | Manual | Color-coded status | No | macOS/Linux |
+
+---
+
+## Key Findings
+
+**High adoption signal**: vibetunnel (4 276 stars) and multi-agent-shogun (1 082 stars) are the two breakout tools. Both are actively maintained and solve real problems at scale.
+
+**Missing feature**: Per-project audio differentiation does not exist as a packaged solution. DIY with `afplay` / `paplay` + `settings.local.json` hooks per project is the only current approach.
+
+**Ecosystem maturity**: The space is 3-6 months old. Most tools are single-maintainer experiments. `claudio`, `ccm`, and `vibetunnel` have the strongest signals for longevity.
+
+**Platform gap**: Most orchestrators require tmux and work only on macOS/Linux. No solid Windows option exists.
+
+---
+
+## Recommendations
+
+**Action**: Integrate a "Multi-Session Management" section into the guide (third-party tools or workflows section).
+
+**Priority picks to document**:
+
+| Use case | Recommended tool |
+|----------|-----------------|
+| Quick multi-session visibility on macOS | ccm (`npx claude-code-monitor`) |
+| Post-session analytics (all platforms) | sniffly (`uvx sniffly@latest init`) |
+| Browser/remote access | vibetunnel |
+| Serious orchestration with isolation | claudio |
+| High-parallelism fan-out | multi-agent-shogun |
+| Per-project audio (DIY) | `settings.local.json` + `afplay` |
+
+**Watch list**: cc-hub, zenportal, praktor — interesting architectures but < 20 stars each. Re-evaluate at 100+ stars.
+
+---
+
+## Related Evaluations
+
+- [078-claude-swarm-monitor.md](078-claude-swarm-monitor.md) — TUI for monitoring agents across worktrees (Rust, Linux)
+- [074-ruflo-multi-agent-orchestration.md](074-ruflo-multi-agent-orchestration.md) — Ruflo orchestration platform
+- [079-fabro-workflow-orchestration.md](079-fabro-workflow-orchestration.md) — Fabro workflow runtime
--- a/docs/resource-evaluations/2026-03-19-manomano-project-aegis-serena.md
+++ b/docs/resource-evaluations/2026-03-19-manomano-project-aegis-serena.md
@ -0,0 +1,138 @@
+# Resource Evaluation: ManoMano "Project Aegis" — Serena MCP Benchmarking
+
+**Date**: 2026-03-19
+**Evaluator**: Claude Sonnet 4.6
+**Resource URL**: https://medium.com/manomano-tech/project-aegis-benchmarking-ai-agents-and-why-serena-is-our-new-must-have-311673db35dd
+**Resource Type**: Engineering blog post (Medium)
+**Author**: ManoMano Engineering Team
+**Company**: ManoMano (e-commerce, ~1000 devs)
+**Article access**: Medium 403 during evaluation — content reconstructed from Perplexity + Serena GitHub (oraios/serena)
+
+---
+
+## Executive Summary
+
+ManoMano's engineering team ran "Project Aegis," an internal benchmark of AI coding agents across their dev stack. Their conclusion: Serena MCP became a must-have tool. The article surfaces real production usage data for Serena, an LSP-based MCP server that provides symbol-level code navigation and session memory. The guide already documents Serena extensively (8+ files, high depth in `ultimate-guide.md` and `search-tools-mastery.md`) but has a specific consistency gap: no entry in `mcp-servers-ecosystem.md`, which lists GrepAI as the only code search/analysis MCP. A reader landing on that page gets an incomplete picture.
+
+---
+
+## Content Summary
+
+**What the article covers** (reconstructed — direct fetch failed):
+
+- Internal benchmark ("Project Aegis") comparing multiple AI coding agents on production tasks
+- Serena MCP identified as the standout tool for large codebase navigation
+- Rationale: LSP-based symbol navigation (vs embedding/vector search like GrepAI) provides precise, low-latency, deterministic results
+- Token efficiency: Serena provides targeted context (symbol + callers/references) rather than full-file reads
+- Conclusion: Serena is now part of ManoMano's standard AI dev setup
+
+**What Serena does** (verified via GitHub oraios/serena + Perplexity):
+
+- Uses Language Server Protocol (LSP) for semantic code understanding — actual compiler-level symbol resolution, not embeddings
+- 30+ languages supported natively (Python, TypeScript/JS, PHP, Go, Rust, C/C++, Java out of the box)
+- Core tools: `find_symbol`, `find_referencing_symbols`, `get_symbols_overview`, `replace_symbol_body`
+- Session memory: `write_memory` / `read_memory` / `list_memories` stored in `.serena/memories/`
+- Behavioral modes: planning, editing, interactive, one-shot — contexts: desktop-app, agent, ide-assistant
+- Free, open-source (GitHub: oraios/serena), runs locally via `uvx`
+- Integrates with Claude Code, Claude Desktop, VSCode, Cursor, Cline
+
+**Key distinction vs GrepAI**:
+
+| Aspect | Serena | GrepAI |
+|--------|--------|--------|
+| Approach | LSP (compiler-level symbols) | Embeddings (Ollama vector search) |
+| Latency | ~100ms | ~500ms |
+| Use case | Known symbol navigation, refactoring | Intent-based discovery, unfamiliar code |
+| Setup | Language server per language | Ollama + nomic-embed-text |
+| Memory | Built-in session memory | None |
+| Accuracy | Deterministic (exact symbols) | Probabilistic (similarity score) |
+
+---
+
+## Gap Analysis vs. Guide
+
+| Area | ManoMano article / Serena | Guide coverage |
+|------|--------------------------|----------------|
+| Serena — dedicated section | ✅ Endorses as must-have | ✅ `ultimate-guide.md:10527`, `search-tools-mastery.md` |
+| Serena session memory | ✅ Implicit (persistent workflow) | ✅ `ultimate-guide.md:1797-1843`, cheatsheet |
+| Serena — ecosystem entry | ✅ Would fit under Code Search | ❌ **Not in `mcp-servers-ecosystem.md`** |
+| Serena vs GrepAI comparison | ✅ Context from benchmarking | ✅ `search-tools-mastery.md` comparison table |
+| Production benchmarking methodology | ✅ Real team, real codebase | ❌ Guide has no multi-agent benchmark section |
+| LSP setup friction (polyglot codebases) | ⚠️ Not addressed in article | ⚠️ Understated in guide |
+
+**Real gap**: `mcp-servers-ecosystem.md` lists GrepAI as the only entry under "Code Search & Analysis." A reader arriving via that page has no path to Serena. The rest of the guide recommends both tools as complementary, creating a discoverability inconsistency.
+
+---
+
+## Relevance Score: 3/5
+
+### Why 3/5 (Pertinent — Integrate when time available)?
+
+**✅ Strengths**:
+
+1. **Production validation**: ManoMano is a real e-commerce company running this at scale, not a tutorial author
+2. **Corroborates existing guide position**: The guide already recommends Serena — this adds external credibility
+3. **Benchmarking angle**: Real-world comparison between agents is an angle the guide does not cover
+4. **Signals the discoverability gap**: The fact that a production team writes "why Serena is our must-have" suggests readers aren't finding it easily — consistent with the mcp-servers-ecosystem.md gap
+
+**⚠️ Weaknesses**:
+
+1. **Single-team case study**: One engineering team's benchmark, methodology not published
+2. **"Must-have" is marketing language**: No reproducible metrics, no controlled comparison
+3. **Article inaccessible**: Medium 403 — content could not be directly verified during evaluation
+4. **Narrow gap**: The guide already covers Serena well; the fix is a targeted addition to one file, not a major integration
+
+---
+
+## Recommendations
+
+**Primary action** (independent of this article — fix the consistency gap):
+
+Add a formal Serena entry to `guide/ecosystem/mcp-servers-ecosystem.md` under "Code Search & Analysis," after the GrepAI entry. Include:
+- Repository, license, status
+- LSP vs embedding distinction (why it complements GrepAI)
+- Key tools: `find_symbol`, `get_symbols_overview`, `write_memory`
+- Setup (uvx install, `--project-root` arg)
+- Cross-link to `guide/workflows/search-tools-mastery.md`
+
+**Secondary action** (optional, using this article as source):
+
+Mention ManoMano's production benchmarking as a real-world reference within the Serena entry or the search-tools-mastery workflow. Frame it as: "Production teams choosing Serena for large codebase work consistently cite the LSP approach's precision over embedding-based alternatives."
+
+**Priority**: Medium — the ecosystem page inconsistency is the real driver, not the article itself.
+
+---
+
+## Challenge Notes (technical-writer agent)
+
+The agent challenge during evaluation raised three valid points:
+
+1. **Score should separate resource quality from gap severity**: The 4/5 initially assigned conflated "how important is Serena" with "how good is this article." Adjusted to 3/5 after separating the two.
+
+2. **LSP setup friction understated**: Serena requires a running language server per language. For polyglot repos, this is non-trivial. Worth flagging in the guide entry.
+
+3. **Serena session memory overlaps with ICM**: The guide currently does not clearly distinguish Serena's `.serena/memories/` from ICM's cross-session memory. A clarification note would prevent user confusion when both are configured.
+
+---
+
+## Fact-Check
+
+| Claim | Verified | Source |
+|-------|----------|--------|
+| Serena uses LSP for symbol navigation | ✅ | github.com/oraios/serena, Perplexity |
+| 30+ languages supported | ✅ | Multiple sources (aiagentslist.com, vibetools.net) |
+| Native Claude Code integration | ✅ | Serena README |
+| Free and open-source (MIT) | ✅ | GitHub license |
+| Session memory via `.serena/memories/` | ✅ | Guide documentation + quiz |
+| ManoMano article exists at URL | ✅ | URL valid, 403 on fetch |
+| ManoMano benchmark stats/methodology | ⚠️ | Article inaccessible — not verifiable |
+| "Must-have" as measured outcome | ❌ | Marketing claim, no reproducible metric |
+
+---
+
+## Decision
+
+- **Score**: 3/5
+- **Action**: Integrate — add Serena entry to `mcp-servers-ecosystem.md` (fix the consistency gap). Optionally cite ManoMano as production reference within that entry.
+- **Confidence**: High on the gap diagnosis; Medium on the article content (inaccessible)
+- **Urgency**: Low-Medium — the guide works without it, but the discoverability gap is real
--- a/docs/resource-evaluations/ischenko-claude-code-workflow-quality.md
+++ b/docs/resource-evaluations/ischenko-claude-code-workflow-quality.md
@ -0,0 +1,76 @@
+# Resource Evaluation: "You're probably using Claude Code wrong" - Alex Ischenko
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Author** | Alex Ischenko |
+| **Role** | AI-Driven CTO, Top 100 Leaders @ CTO Craft |
+| **Published** | 2026-03-19 |
+| **Type** | LinkedIn Pulse article |
+| **URL** | https://www.linkedin.com/pulse/youre-probably-using-claude-code-wrong-i-too-until-shift-ischenko-bwdkf/ |
+| **Evaluated** | 2026-03-19 |
+| **Score** | 2/5 (Marginal) |
+| **Decision** | Do not integrate |
+
+## Summary
+
+LinkedIn article arguing that Claude Code quality is an engineering system question, not a model question. Proposes 7 workflow patterns for improving output quality, each with a full copy-paste prompt template:
+
+1. **Reality checks before implementation** - verify codebase assumptions before coding
+2. **Separate author/reviewer** - two-role pattern within same session
+3. **Project-aware reviews** - review with project context, not just diff
+4. **Requirements as mandatory artifact** - REQUIREMENTS.md before code
+5. **TDD workflow** - anchor behavior with tests first
+6. **Small task sizes** - reduce scope for better AI output
+7. **Human abstraction elevation** - move engineers to architecture/trade-off level
+
+Claims "20-30% quality improvement" from these workflow changes.
+
+## Scoring Rationale
+
+### Overlap with Guide (75-85%)
+
+| Pattern | Guide Coverage | Location |
+|---|---|---|
+| Reality checks | Partial | `exploration-workflow.md`, Plan Mode (L3717) |
+| Author/reviewer | Moderate | SE-CoVe (L13095), Scope-Focused Agents (L4410) |
+| Project-aware reviews | Partial | `code-review.md` (CLAUDE.md + REVIEW.md) |
+| Requirements artifact | Partial | `spec-first.md` (full workflow) |
+| TDD | Strong | `tdd-with-claude.md`, L19183-19320, skill template L7336 |
+| Small tasks | Scattered | `spec-first.md` L62-93, L1529, L1733 |
+| Human elevation | Thin | L17458, L15725, L3216 |
+
+### What's Unique
+
+The 7 copy-paste prompt templates are the only non-redundant element. They are a practical formatting convenience, not a structural insight. The guide's existing workflow files and skill templates serve the same purpose.
+
+### Credibility Assessment
+
+- No GitHub repo, no production artifact, no tooling behind the article
+- "20-30% quality improvement" has no methodology, no baseline, no control group
+- Compare to higher-scored resources: Cullen (shipped working slash command, 5/5), Chabaud (clonable repo, 3/5), Rusitschka (repo with working code, 4/5)
+
+### Accumulation Risk
+
+The guide already integrates Chabaud, Rusitschka, Cullen, and paddo.dev team tips covering adjacent workflow territory. Adding Ischenko without new substance dilutes the signal-to-noise ratio.
+
+## Identified Gaps (for future work, not from this resource)
+
+Two gaps surfaced during analysis that the guide could address independently:
+
+1. **Multi-model review pattern** (near-zero coverage): deliberately using different models to review each other's work. Ischenko mentions it briefly but provides no template.
+2. **Consolidated task sizing section**: currently scattered across multiple files with no single reference point.
+
+## Fact-Check
+
+| Claim | Status | Notes |
+|---|---|---|
+| Author credentials | Unverifiable | CTO Craft exists, "Top 100" not independently verifiable |
+| "20-30% quality improvement" | Unfalsifiable | No methodology described |
+| Tool landscape (Claude Code, Cursor, etc.) | Verified | All exist as active tools |
+| LLM behavioral patterns (overconfidence, compound errors) | Verified | Well-documented in literature |
+
+## Decision
+
+**Do not integrate.** The engineering advice is solid, but the guide already covers these patterns through better-sourced, more detailed, and more production-grounded resources. The prompt templates could be extracted as addenda to existing workflow files, but this is low priority.