release: v3.37.2 - hook format fix, 3 resource evals, cross-model review sections

- Fix: hook format updated to matcher+hooks[] structure (settings.json, learning-mode.md)
- New guide sections: Cross-Model Review, Lightweight Role-Switch, Task Sizing (ultimate-guide.md)
- Resource Eval: ManoMano Project Aegis — Serena MCP benchmark (3/5, ecosystem gap identified)
- Resource Eval: Multi-Session Management Landscape (4/5)
- Resource Eval: Ischenko workflow quality (2/5, marginal)
- Version bump: 3.37.1 → 3.37.2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Florian BRUNIAUX 2026-03-19 21:22:01 +01:00
parent ea7ce092dc
commit 53ac314a15
11 changed files with 485 additions and 17 deletions

@ -0,0 +1,145 @@
# Resource Evaluation: Multi-Session Claude Code Management — Landscape Overview
**Date**: 2026-03-19
**Evaluator**: Claude (research session + structured synthesis)
**Status**: Reference — Integrate into guide (ecosystem / third-party tools section)
---
## Summary
This evaluation covers the full landscape of tools for managing multiple Claude Code sessions across multiple projects simultaneously. The research identified 14 tools across 4 categories: monitoring dashboards, remote/browser access, multi-project orchestrators, and sound/notification systems.
No single tool covers all use cases. The space is fragmented, actively evolving (most repos < 6 months old), and missing one obvious feature: per-project audio differentiation.
---
## Score: 4/5 (as a category)
| Score | Meaning |
|-------|---------|
| 5 | Essential — Major gap |
| **4** | **High value — Significant improvement** |
| 3 | Pertinent — Useful complement |
| 2 | Marginal — Secondary info |
| 1 | Out of scope |
**Justification**: Multi-agent, multi-project workflows are increasingly common. The guide covers individual session hooks and notifications but has no consolidated view of the tooling available for running 3-10 parallel Claude Code sessions. vibetunnel (4 276 stars) and multi-agent-shogun (1 082 stars) signal strong community demand. Integration would fill a documented gap.
---
## Tool Landscape
### Category 1 — Monitoring Dashboards
| Tool | GitHub | Stars | Stack | Key Features |
|------|--------|-------|-------|-------------|
| **claude-code-monitor (ccm)** | [onikan27/claude-code-monitor](https://github.com/onikan27/claude-code-monitor) | ⭐ 212 | TypeScript / Node | TUI vim-style (j/k, 1-9), session switching via AppleScript, mobile WebUI + QR code pairing, WebSocket real-time |
| **claude-code-dashboard** | [Stargx/claude-code-dashboard](https://github.com/Stargx/claude-code-dashboard) | ⭐ 5 | Node + Express + React | Auto-detects all sessions, token/cost per session, context progress bars, git branch, permission mode badges |
| **sniffly** | [chiphuyen/sniffly](https://github.com/chiphuyen/sniffly) | ⭐ 1 170 | Python | Analytics-first: usage patterns, error breakdown, shareable dashboard. Post-hoc, not real-time |
| **ClaudeCode-Dashboard** | [Quriov/ClaudeCode-Dashboard](https://github.com/Quriov/ClaudeCode-Dashboard) | ⭐ 0 | Next.js 16 + ReactFlow | Config topology viewer (hooks, agents, MCP, skills), not session monitoring |
**Best pick**: `ccm` for real-time monitoring on macOS (easiest setup: `npx claude-code-monitor`). `sniffly` for post-session analytics on any platform.
---
### Category 2 — Remote / Browser Access
| Tool | GitHub | Stars | Stack | Key Features |
|------|--------|-------|-------|-------------|
| **vibetunnel** | [amantus-ai/vibetunnel](https://github.com/amantus-ai/vibetunnel) | ⭐ 4 276 | TypeScript + Swift | Wraps any terminal in browser tabs, multiple sessions, Git Follow Mode, VibeTunnelTalk voice narration |
| **cc-hub** | [m0a/cc-hub](https://github.com/m0a/cc-hub) | ⭐ 1 | Go + tmux + Tailscale | Split panes + session color themes, file diff tracking (Claude edits vs git), mobile optimized, dashboard with cost stats |
**Best pick**: `vibetunnel` for broad use (4 276 stars, very active). `cc-hub` if you need session color differentiation + file diffs per project (requires Tailscale).
---
### Category 3 — Multi-Project Orchestrators
| Tool | GitHub | Stars | Stack | Key Features |
|------|--------|-------|-------|-------------|
| **claudio** | [Iron-Ham/claudio](https://github.com/Iron-Ham/claudio) | ⭐ 22 | Go + tmux | Isolated git worktrees, TUI dashboard, 14 color themes, task chaining (`--depends-on`), planning modes: UltraPlan / TripleShot / Adversarial Review, PR automation, cost limits |
| **multi-agent-shogun** | [yohey-w/multi-agent-shogun](https://github.com/yohey-w/multi-agent-shogun) | ⭐ 1 082 | Shell + tmux | Shogun/Karo/Ashigaru hierarchy, 7 workers + 1 strategist, multi-CLI (Claude, Codex, Copilot, Kimi), zero API coordination cost |
| **zenportal** | [kgang/zenportal](https://github.com/kgang/zenportal) | ⭐ 1 | Python/Textual | Multi-CLI support, git worktrees per session, 3-tier config, session persistence via tmux |
| **praktor** | [mtzanidakis/praktor](https://github.com/mtzanidakis/praktor) | ⭐ 17 | Go + Docker Compose | Telegram I/O (chat from phone), 1 Docker container per agent, cron tasks, swarms, AES-256-GCM secrets vault |
**Best pick**: `claudio` for serious multi-project orchestration (isolated worktrees + color themes + advanced planning). `multi-agent-shogun` for high-parallelism fan-out patterns with tmux visibility.
---
### Category 4 — Sound / Notification Systems
| Tool | GitHub | Stars | Stack | Per-Project Sound |
|------|--------|-------|-------|------------------|
| **karina-voice-notification** | [t1seo/karina-voice-notification](https://github.com/t1seo/karina-voice-notification) | ⭐ 0 | Python (Qwen3-TTS) | Clone any voice from YouTube → custom `.wav` per project (DIY assembly) |
| **sound-micro-server** | [arc-co/claude-code-notification-sound-micro-server](https://github.com/arc-co/claude-code-notification-sound-micro-server) | ⭐ 0 | Node.js | Browser-based sound via hook `Stop` + `curl POST`. Single sound for all sessions |
| **ccnotify** | [Helmi/ccnotify](https://github.com/Helmi/ccnotify) | n/a | Shell | Voice notifications (spoken text) |
| **claude-session-manager** | [Swarek/claude-session-manager](https://github.com/Swarek/claude-session-manager) | ⭐ 4 | Shell | Colored status line per session, session IDs (`cx` command), live description updates |
**Gap**: No tool provides per-project audio differentiation out of the box. The cleanest DIY approach: configure `settings.local.json` per project with a different audio file in the `Stop` hook.
For example, in `project-a/.claude/settings.local.json`:
```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "afplay ~/sounds/project-a.wav" }
        ]
      }
    ]
  }
}
```
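The DIY approach can be scripted so that each project gets its own sound. The sketch below is a minimal illustration, not a packaged tool: the `~/sounds/<project>.wav` layout and all helper names are assumptions, and the hook entry uses the `hooks[]` shape that this release's changelog describes. It picks `afplay` on macOS and assumes `paplay` is available elsewhere.

```python
import json
import platform
from pathlib import Path


def default_player() -> str:
    """afplay ships with macOS; paplay is assumed to be installed on Linux."""
    return "afplay" if platform.system() == "Darwin" else "paplay"


def stop_hook_entry(sound_file: str, player: str) -> list:
    """Build the value of hooks["Stop"]: one group holding one command hook."""
    return [{"hooks": [{"type": "command", "command": f"{player} {sound_file}"}]}]


def write_project_settings(project_dir: Path, sound_file: str) -> Path:
    """Create or update <project>/.claude/settings.local.json, keeping other keys."""
    target = project_dir / ".claude" / "settings.local.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    settings = json.loads(target.read_text()) if target.exists() else {}
    # Only the Stop hook is replaced; any other hooks or settings are preserved.
    settings.setdefault("hooks", {})["Stop"] = stop_hook_entry(sound_file, default_player())
    target.write_text(json.dumps(settings, indent=2) + "\n")
    return target
```

Calling `write_project_settings(Path("project-a"), "~/sounds/project-a.wav")` once per project, each with a different file, gives every session a distinct completion sound.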
---
## Capability Matrix
| Tool | Multi-session visibility | Session switching | Per-project differentiation | Sound | Platform |
|------|--------------------------|-------------------|-----------------------------|-------|----------|
| ccm | TUI list | AppleScript focus | Status icons | No | macOS only |
| claude-code-dashboard | Web dashboard | No | Git branch / badges | No | All |
| vibetunnel | Browser tabs | Manual tab switch | Terminal titles | No (voice narration optional) | macOS M1+ / Linux |
| cc-hub | Split panes | Click | Color themes per session | No | macOS/Linux + Tailscale |
| claudio | TUI per instance | TUI controls | 14 color themes | No | macOS/Linux |
| multi-agent-shogun | tmux panes | tmux | Pane position | No | macOS/Linux |
| zenportal | TUI list | TUI controls | Session name | No | macOS/Linux |
| praktor | Web + Telegram | @agent_name | Docker container | No | Linux/Docker |
| karina-voice-notification | n/a | n/a | Custom voice per project | Yes (DIY) | macOS M1+ / Linux (CUDA) |
| sound-micro-server | n/a | n/a | No | Yes (single sound) | All (browser) |
| claude-session-manager | Terminal status line | Manual | Color-coded status | No | macOS/Linux |
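
The matrix can also be read as data. The sketch below encodes the Platform and Sound columns in simplified form and filters on them. The `Tool` class, the `shortlist` helper, and the reading of "All" as macOS, Linux, and Windows are assumptions made for this example, not part of any listed tool.

```python
from dataclasses import dataclass

ALL = frozenset({"macos", "linux", "windows"})  # assumed meaning of "All"
UNIX = frozenset({"macos", "linux"})


@dataclass(frozen=True)
class Tool:
    name: str
    platforms: frozenset
    visibility: bool  # False where the matrix says "n/a"
    sound: bool


# Simplified transcription of the capability matrix (hardware notes dropped).
TOOLS = [
    Tool("ccm", frozenset({"macos"}), True, False),
    Tool("claude-code-dashboard", ALL, True, False),
    Tool("vibetunnel", UNIX, True, False),
    Tool("cc-hub", UNIX, True, False),
    Tool("claudio", UNIX, True, False),
    Tool("multi-agent-shogun", UNIX, True, False),
    Tool("zenportal", UNIX, True, False),
    Tool("praktor", frozenset({"linux"}), True, False),
    Tool("karina-voice-notification", UNIX, False, True),
    Tool("sound-micro-server", ALL, False, True),
    Tool("claude-session-manager", UNIX, True, False),
]


def shortlist(platform: str, need_sound: bool = False) -> list[str]:
    """Tools that run on `platform` and offer session visibility (or sound)."""
    return [
        t.name
        for t in TOOLS
        if platform in t.platforms and (t.sound if need_sound else t.visibility)
    ]
```

With this encoding, `shortlist("windows")` returns only `claude-code-dashboard`, which makes the thin Windows coverage in the matrix easy to see.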
---
## Key Findings
**High adoption signal**: vibetunnel (4 276 stars) and multi-agent-shogun (1 082 stars) are the two breakout tools. Both are actively maintained and solve real problems at scale.
**Missing feature**: Per-project audio differentiation does not exist as a packaged solution. DIY with `afplay` / `paplay` + `settings.local.json` hooks per project is the only current approach.
**Ecosystem maturity**: The space is 3-6 months old. Most tools are single-maintainer experiments. `claudio`, `ccm`, and `vibetunnel` have the strongest signals for longevity.
**Platform gap**: Most orchestrators require tmux and work only on macOS/Linux. No solid Windows option exists.
---
## Recommendations
**Action**: Integrate a "Multi-Session Management" section into the guide (third-party tools or workflows section).
**Priority picks to document**:
| Use case | Recommended tool |
|----------|-----------------|
| Quick multi-session visibility on macOS | ccm (`npx claude-code-monitor`) |
| Post-session analytics (all platforms) | sniffly (`uvx sniffly@latest init`) |
| Browser/remote access | vibetunnel |
| Serious orchestration with isolation | claudio |
| High-parallelism fan-out | multi-agent-shogun |
| Per-project audio (DIY) | `settings.local.json` + `afplay` |
**Watch list**: cc-hub, zenportal, praktor — interesting architectures but < 20 stars each. Re-evaluate at 100+ stars.
---
## Related Evaluations
- [078-claude-swarm-monitor.md](078-claude-swarm-monitor.md) — TUI for monitoring agents across worktrees (Rust, Linux)
- [074-ruflo-multi-agent-orchestration.md](074-ruflo-multi-agent-orchestration.md) — Ruflo orchestration platform
- [079-fabro-workflow-orchestration.md](079-fabro-workflow-orchestration.md) — Fabro workflow runtime

@ -0,0 +1,138 @@
# Resource Evaluation: ManoMano "Project Aegis" — Serena MCP Benchmarking
**Date**: 2026-03-19
**Evaluator**: Claude Sonnet 4.6
**Resource URL**: https://medium.com/manomano-tech/project-aegis-benchmarking-ai-agents-and-why-serena-is-our-new-must-have-311673db35dd
**Resource Type**: Engineering blog post (Medium)
**Author**: ManoMano Engineering Team
**Company**: ManoMano (e-commerce, ~1000 devs)
**Article access**: Medium 403 during evaluation — content reconstructed from Perplexity + Serena GitHub (oraios/serena)
---
## Executive Summary
ManoMano's engineering team ran "Project Aegis," an internal benchmark of AI coding agents across their dev stack. Their conclusion: Serena MCP became a must-have tool. The article surfaces real production usage data for Serena, an LSP-based MCP server that provides symbol-level code navigation and session memory. The guide already documents Serena extensively (8+ files, high depth in `ultimate-guide.md` and `search-tools-mastery.md`) but has a specific consistency gap: no entry in `mcp-servers-ecosystem.md`, which lists GrepAI as the only code search/analysis MCP. A reader landing on that page gets an incomplete picture.
---
## Content Summary
**What the article covers** (reconstructed — direct fetch failed):
- Internal benchmark ("Project Aegis") comparing multiple AI coding agents on production tasks
- Serena MCP identified as the standout tool for large codebase navigation
- Rationale: LSP-based symbol navigation (vs embedding/vector search like GrepAI) provides precise, low-latency, deterministic results
- Token efficiency: Serena provides targeted context (symbol + callers/references) rather than full-file reads
- Conclusion: Serena is now part of ManoMano's standard AI dev setup
**What Serena does** (verified via GitHub oraios/serena + Perplexity):
- Uses Language Server Protocol (LSP) for semantic code understanding — actual compiler-level symbol resolution, not embeddings
- 30+ languages supported, including Python, TypeScript/JS, PHP, Go, Rust, C/C++, and Java out of the box
- Core tools: `find_symbol`, `find_referencing_symbols`, `get_symbols_overview`, `replace_symbol_body`
- Session memory: `write_memory` / `read_memory` / `list_memories` stored in `.serena/memories/`
- Behavioral modes (planning, editing, interactive, one-shot) and contexts (desktop-app, agent, ide-assistant)
- Free, open-source (GitHub: oraios/serena), runs locally via `uvx`
- Integrates with Claude Code, Claude Desktop, VSCode, Cursor, Cline
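
To make the session-memory bullet concrete, here is a toy file-backed store that mirrors the three tool names and the `.serena/memories/` location listed above. It is an illustration only: the one-Markdown-file-per-memory layout and the function signatures are assumptions, not Serena's actual implementation or API.

```python
from pathlib import Path

MEMORY_DIR = Path(".serena") / "memories"  # location named in this evaluation


def write_memory(name: str, content: str, root: Path = Path(".")) -> Path:
    """Persist a note under <root>/.serena/memories/ (assumed: one .md file each)."""
    folder = root / MEMORY_DIR
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}.md"
    path.write_text(content, encoding="utf-8")
    return path


def read_memory(name: str, root: Path = Path(".")) -> str:
    """Return the text of a previously written note."""
    return (root / MEMORY_DIR / f"{name}.md").read_text(encoding="utf-8")


def list_memories(root: Path = Path(".")) -> list[str]:
    """List stored note names, or an empty list if nothing was written yet."""
    folder = root / MEMORY_DIR
    return sorted(p.stem for p in folder.glob("*.md")) if folder.exists() else []
```

Because the notes live in the project directory, a later session can call `list_memories()` and `read_memory(...)` to recover context that an earlier session saved.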
**Key distinction vs GrepAI**:
| Aspect | Serena | GrepAI |
|--------|--------|--------|
| Approach | LSP (compiler-level symbols) | Embeddings (Ollama vector search) |
| Latency | ~100ms | ~500ms |
| Use case | Known symbol navigation, refactoring | Intent-based discovery, unfamiliar code |
| Setup | Language server per language | Ollama + nomic-embed-text |
| Memory | Built-in session memory | None |
| Accuracy | Deterministic (exact symbols) | Probabilistic (similarity score) |
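
The deterministic-versus-probabilistic row can be illustrated with standard-library stand-ins. In the sketch below, Python's `ast` module plays the role of compiler-level symbol resolution, and a bag-of-words cosine score plays the role of embedding search. Neither function is Serena's or GrepAI's API; the sample source and both helper names are hypothetical.

```python
import ast
import math
import re
from collections import Counter

SOURCE = '''
def charge_card(order):
    return order.total

def refund_payment(order):
    return -order.total
'''


def find_symbol_exact(source: str, name: str) -> list[int]:
    """Line numbers of function definitions named exactly `name`."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and node.name == name
    ]


def _vector(text: str) -> Counter:
    """Bag of lowercase word tokens; underscores split identifiers."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_by_similarity(source: str, query: str) -> list[tuple[str, float]]:
    """Score every function against a free-text query, best match first."""
    q = _vector(query)
    tree = ast.parse(source)
    scored = [
        (node.name, _cosine(q, _vector(ast.get_source_segment(source, node))))
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The exact lookup either finds `refund_payment` or returns nothing for a near miss such as `refund`, while the similarity ranking always returns a scored list, which is why the two approaches suit known-symbol navigation and intent-based discovery respectively.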
---
## Gap Analysis vs. Guide
| Area | ManoMano article / Serena | Guide coverage |
|------|--------------------------|----------------|
| Serena — dedicated section | ✅ Endorses as must-have | ✅ `ultimate-guide.md:10527`, `search-tools-mastery.md` |
| Serena session memory | ✅ Implicit (persistent workflow) | ✅ `ultimate-guide.md:1797-1843`, cheatsheet |
| Serena — ecosystem entry | ✅ Would fit under Code Search | ❌ **Not in `mcp-servers-ecosystem.md`** |
| Serena vs GrepAI comparison | ✅ Context from benchmarking | ✅ `search-tools-mastery.md` comparison table |
| Production benchmarking methodology | ✅ Real team, real codebase | ❌ Guide has no multi-agent benchmark section |
| LSP setup friction (polyglot codebases) | ⚠️ Not addressed in article | ⚠️ Understated in guide |
**Real gap**: `mcp-servers-ecosystem.md` lists GrepAI as the only entry under "Code Search & Analysis." A reader arriving via that page has no path to Serena. The rest of the guide recommends both tools as complementary, creating a discoverability inconsistency.
---
## Relevance Score: 3/5
### Why 3/5 (Pertinent — Integrate when time available)?
**✅ Strengths**:
1. **Production validation**: ManoMano is a real e-commerce company running this at scale, not a tutorial author
2. **Corroborates existing guide position**: The guide already recommends Serena — this adds external credibility
3. **Benchmarking angle**: Real-world comparison between agents is an angle the guide does not cover
4. **Signals the discoverability gap**: The fact that a production team writes "why Serena is our must-have" suggests readers aren't finding it easily — consistent with the mcp-servers-ecosystem.md gap
**⚠️ Weaknesses**:
1. **Single-team case study**: One engineering team's benchmark, methodology not published
2. **"Must-have" is marketing language**: No reproducible metrics, no controlled comparison
3. **Article inaccessible**: Medium 403 — content could not be directly verified during evaluation
4. **Narrow gap**: The guide already covers Serena well; the fix is a targeted addition to one file, not a major integration
---
## Recommendations
**Primary action** (independent of this article — fix the consistency gap):
Add a formal Serena entry to `guide/ecosystem/mcp-servers-ecosystem.md` under "Code Search & Analysis," after the GrepAI entry. Include:
- Repository, license, status
- LSP vs embedding distinction (why it complements GrepAI)
- Key tools: `find_symbol`, `get_symbols_overview`, `write_memory`
- Setup (uvx install, `--project-root` arg)
- Cross-link to `guide/workflows/search-tools-mastery.md`
**Secondary action** (optional, using this article as source):
Mention ManoMano's production benchmarking as a real-world reference within the Serena entry or the search-tools-mastery workflow. Frame it as a single-team report, for example: "ManoMano's engineering team reports adopting Serena for large codebase work, citing the precision of the LSP approach over embedding-based alternatives."
**Priority**: Medium — the ecosystem page inconsistency is the real driver, not the article itself.
---
## Challenge Notes (technical-writer agent)
The agent challenge during evaluation raised three valid points:
1. **Score should separate resource quality from gap severity**: The 4/5 initially assigned conflated "how important is Serena" with "how good is this article." Adjusted to 3/5 after separating the two.
2. **LSP setup friction understated**: Serena requires a running language server per language. For polyglot repos, this is non-trivial. Worth flagging in the guide entry.
3. **Serena session memory overlaps with ICM**: The guide currently does not clearly distinguish Serena's `.serena/memories/` from ICM's cross-session memory. A clarification note would prevent user confusion when both are configured.
---
## Fact-Check
| Claim | Verified | Source |
|-------|----------|--------|
| Serena uses LSP for symbol navigation | ✅ | github.com/oraios/serena, Perplexity |
| 30+ languages supported | ✅ | Multiple sources (aiagentslist.com, vibetools.net) |
| Claude Code integration native | ✅ | Serena README |
| Free and open-source (MIT) | ✅ | GitHub license |
| Session memory via `.serena/memories/` | ✅ | Guide documentation + quiz |
| ManoMano article exists at URL | ✅ | URL valid, 403 on fetch |
| ManoMano benchmark stats/methodology | ⚠️ | Article inaccessible — not verifiable |
| "Must-have" as measured outcome | ❌ | Marketing claim, no reproducible metric |
---
## Decision
- **Score**: 3/5
- **Action**: Integrate — add Serena entry to `mcp-servers-ecosystem.md` (fix the consistency gap). Optionally cite ManoMano as production reference within that entry.
- **Confidence**: High on the gap diagnosis; Medium on the article content (inaccessible)
- **Urgency**: Low-Medium — the guide works without it, but the discoverability gap is real

@ -0,0 +1,76 @@
# Resource Evaluation: "You're probably using Claude Code wrong" - Alex Ischenko
## Metadata
| Field | Value |
|-------|-------|
| **Author** | Alex Ischenko |
| **Role** | AI-Driven CTO, Top 100 Leaders @ CTO Craft |
| **Published** | 2026-03-19 |
| **Type** | LinkedIn Pulse article |
| **URL** | https://www.linkedin.com/pulse/youre-probably-using-claude-code-wrong-i-too-until-shift-ischenko-bwdkf/ |
| **Evaluated** | 2026-03-19 |
| **Score** | 2/5 (Marginal) |
| **Decision** | Do not integrate |
## Summary
LinkedIn article arguing that Claude Code quality is an engineering system question, not a model question. Proposes 7 workflow patterns for improving output quality, each with a full copy-paste prompt template:
1. **Reality checks before implementation** - verify codebase assumptions before coding
2. **Separate author/reviewer** - two-role pattern within same session
3. **Project-aware reviews** - review with project context, not just diff
4. **Requirements as mandatory artifact** - REQUIREMENTS.md before code
5. **TDD workflow** - anchor behavior with tests first
6. **Small task sizes** - reduce scope for better AI output
7. **Human abstraction elevation** - move engineers to architecture/trade-off level
The article claims a "20-30% quality improvement" from these workflow changes.
## Scoring Rationale
### Overlap with Guide (75-85%)
| Pattern | Guide Coverage | Location |
|---|---|---|
| Reality checks | Partial | `exploration-workflow.md`, Plan Mode (L3717) |
| Author/reviewer | Moderate | SE-CoVe (L13095), Scope-Focused Agents (L4410) |
| Project-aware reviews | Partial | `code-review.md` (CLAUDE.md + REVIEW.md) |
| Requirements artifact | Partial | `spec-first.md` (full workflow) |
| TDD | Strong | `tdd-with-claude.md`, L19183-19320, skill template L7336 |
| Small tasks | Scattered | `spec-first.md` L62-93, L1529, L1733 |
| Human elevation | Thin | L17458, L15725, L3216 |
### What's Unique
The 7 copy-paste prompt templates are the only non-redundant element. They are a practical formatting convenience, not a structural insight. The guide's existing workflow files and skill templates serve the same purpose.
### Credibility Assessment
- No GitHub repo, no production artifact, no tooling behind the article
- "20-30% quality improvement" has no methodology, no baseline, no control group
- Compare to higher-scored resources: Cullen (shipped working slash command, 5/5), Chabaud (clonable repo, 3/5), Rusitschka (repo with working code, 4/5)
### Accumulation Risk
The guide already integrates Chabaud, Rusitschka, Cullen, and paddo.dev team tips covering adjacent workflow territory. Adding Ischenko without new substance dilutes the signal-to-noise ratio.
## Identified Gaps (for future work, not from this resource)
Two gaps surfaced during analysis that the guide could address independently:
1. **Multi-model review pattern** (near zero coverage): deliberately using different models to review each other's work. Ischenko mentions it briefly but provides no template.
2. **Consolidated task sizing section**: currently scattered across multiple files with no single reference point.
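
As a rough picture of what a multi-model review template might look like, the sketch below wires two model callables into an author/reviewer loop. Everything here is hypothetical: `Model` is just a function from prompt to text, no real provider API is called, and the `APPROVE` reply convention is an assumption chosen for the example.

```python
from typing import Callable

Model = Callable[[str], str]  # hypothetical: prompt in, text out


def cross_model_review(
    task: str, author: Model, reviewer: Model, max_rounds: int = 2
) -> tuple[str, list[str]]:
    """Let one model draft and a different model review, for a few rounds."""
    draft = author(task)
    critiques: list[str] = []
    for _ in range(max_rounds):
        verdict = reviewer(
            f"Review this work for the task '{task}'. "
            f"Reply APPROVE if acceptable, otherwise list the issues.\n\n{draft}"
        )
        if verdict.strip().upper().startswith("APPROVE"):
            break
        critiques.append(verdict)
        # Feed the reviewer's issues back to the author model for a revision.
        draft = author(
            f"{task}\n\nRevise the draft below to address this review:\n"
            f"{verdict}\n\n{draft}"
        )
    return draft, critiques
```

Passing two different models as `author` and `reviewer` is the whole point of the pattern; the returned critique list keeps a record of what the second model objected to.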
## Fact-Check
| Claim | Status | Notes |
|---|---|---|
| Author credentials | Unverifiable | CTO Craft exists, "Top 100" not independently verifiable |
| "20-30% quality improvement" | Unfalsifiable | No methodology described |
| Tool landscape (Claude Code, Cursor, etc.) | Verified | All exist as active tools |
| LLM behavioral patterns (overconfidence, compound errors) | Verified | Well-documented in literature |
## Decision
**Do not integrate.** The engineering advice is solid, but the guide already covers these patterns through better-sourced, more detailed, and more production-grounded resources. The prompt templates could be extracted as addenda to existing workflow files, but this is low priority.