Commit graph

4 commits

Author SHA1 Message Date
Naiyuan Qing
1bd25c5aec fix(media): don't cache whisper binary detection failure
Only cache successful whisper binary lookups. When whisper is not found,
leave the cache empty so subsequent calls re-check PATH. This allows
the app to detect a newly installed whisper without requiring a restart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 18:27:06 +08:00
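
The caching rule described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: `createBinaryFinder` and the injected `lookup` callback are hypothetical names, and the real PATH search is abstracted away so the example stays self-contained.

```typescript
// Hypothetical sketch: createBinaryFinder and lookup are illustrative names,
// not identifiers taken from the repository.
type Lookup = (name: string) => string | null;

function createBinaryFinder(candidates: string[], lookup: Lookup): () => string | null {
  // Holds only a successful result; a failed search never writes here.
  let cached: string | null = null;

  return () => {
    if (cached !== null) return cached;
    for (const name of candidates) {
      const found = lookup(name);
      if (found !== null) {
        cached = found; // cache success only
        return found;
      }
    }
    // Failure is not cached, so the next call searches again.
    return null;
  };
}
```

With this shape, a binary installed after the first failed call is picked up on the next call, and once found it is not looked up again.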
Naiyuan Qing
614d2cfd88 fix(channels): address code review issues
Critical:
- describe-video: add mkdir for MEDIA_CACHE_DIR before ffmpeg write
- telegram: check bot ID (not is_bot) for reply-to detection in groups

Important:
- telegram: check @mention in caption for media messages in groups
- hub: add .catch() to channelManager.startAll()
- describe-image: add 20MB file size check to prevent OOM
- async-agent: remove dead writeWithImages, refactor with enqueue()
- manager: lazy agent subscription via ensureSubscribed() to handle
  late agent availability and agent replacement
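
One way the lazy subscription item could look is sketched below. Only the name `ensureSubscribed()` comes from the commit message; the `Agent` interface, the `LazySubscriber` class, and the `getAgent` callback are assumptions made for illustration.

```typescript
// Hypothetical sketch: Agent, LazySubscriber and getAgent are assumed shapes,
// not the repository's actual types.
type Listener = (message: string) => void;

interface Agent {
  // Returns a function that removes the subscription.
  subscribe(listener: Listener): () => void;
}

class LazySubscriber {
  private current: Agent | null = null;
  private unsubscribe: (() => void) | null = null;
  private readonly getAgent: () => Agent | null;
  private readonly onMessage: Listener;

  constructor(getAgent: () => Agent | null, onMessage: Listener) {
    this.getAgent = getAgent;
    this.onMessage = onMessage;
  }

  // Call before each use. Returns false while no agent is available yet.
  ensureSubscribed(): boolean {
    const agent = this.getAgent();
    if (agent === null) return false; // agent not available yet, retry later
    if (agent === this.current) return true; // already subscribed to this agent

    // The agent was replaced (or this is the first one): drop the old subscription.
    if (this.unsubscribe !== null) this.unsubscribe();
    this.unsubscribe = agent.subscribe(this.onMessage);
    this.current = agent;
    return true;
  }
}
```

Checking identity on every call is what covers both cases named in the commit: an agent that appears late, and an agent that is swapped for a new instance.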

Suggestions:
- telegram-format: escape quotes in link URLs to prevent HTML breakout
- transcribe: catch API errors and return null (match local fallback)
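
The quote-escaping suggestion can be illustrated with a short sketch. The helper names `escapeHtmlAttr`, `escapeHtmlText`, and `renderLink` are hypothetical and may not match what `telegram-format` actually does; the point is only that a `"` inside the URL must not be able to close the `href` attribute.

```typescript
// Hypothetical helpers, named for illustration only.
function escapeHtmlText(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function escapeHtmlAttr(value: string): string {
  // Same as text escaping, plus the double quote that delimits the attribute.
  return escapeHtmlText(value).replace(/"/g, "&quot;");
}

function renderLink(text: string, url: string): string {
  return `<a href="${escapeHtmlAttr(url)}">${escapeHtmlText(text)}</a>`;
}
```

For example, a URL ending in `"><b>` stays inside the attribute value as `&quot;&gt;&lt;b&gt;` instead of opening a new tag.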

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 11:22:17 +08:00
Naiyuan Qing
db214b25ca feat(media): add image/video description and local whisper priority
- Add describe-image.ts: OpenAI Vision API (gpt-4o-mini) image description
- Add describe-video.ts: ffmpeg frame extraction + Vision API description
- Rewrite transcribe.ts: local whisper/whisper-cli → OpenAI API → null
- Update manager.ts routeMedia(): all media converted to text before agent
  - Image: describeImage() → text (was: raw ImageContent via writeWithImages)
  - Video: describeVideo() → text (was: file path info only)
  - Audio: unchanged (but underlying transcribeAudio now tries local first)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 11:03:31 +08:00
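
The fallback order named in the commit above (local whisper, then the OpenAI API, then null) can be sketched as a generic chain. This is an assumption-laden illustration: `Transcriber` and `transcribeWithFallback` are hypothetical names, and the real backends are replaced by injected functions so the example runs without external tools or API keys.

```typescript
// Hypothetical sketch: the backends are injected, so no whisper binary or API key is needed.
type Transcriber = (audioPath: string) => Promise<string | null>;

async function transcribeWithFallback(
  audioPath: string,
  backends: Transcriber[],
): Promise<string | null> {
  for (const backend of backends) {
    try {
      const text = await backend(audioPath);
      if (text !== null) return text; // first backend that produces text wins
    } catch {
      // Treat a backend error as "no result" and try the next backend.
    }
  }
  return null; // every backend failed or was unavailable
}
```

A caller would pass the backends in priority order, for example a local whisper wrapper first and an API wrapper second, and handle a `null` result as "no transcript available".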
Naiyuan Qing
4e5780692e feat(media): transcribe audio via Whisper API before reaching agent
Move audio transcription from agent-driven (exec + local whisper) to
Manager-layer processing via OpenAI Whisper API. Voice messages are
now transcribed automatically before the agent sees them, so the
agent only receives text. The local whisper skill remains as a fallback
when the API key is not configured. Also change the default model from
turbo to base for a faster first-time experience.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:06:11 +08:00