LlamaCppClient gains text_to_speech (OpenAI-style /audio/speech), plus
list_voices and create_voice (voice-library passthrough at the swap-root
/upstream/<model>/voices), and a tts_model slot configured via
LLAMA_SWAP_TTS_MODEL (default "chatterbox").
New Claims-gated routes:
- POST /tts/speech -> { audio_base64, format } for data: URI playback
- GET /tts/voices -> voice library passthrough
- POST /tts/voices/upload -> clone a voice from an uploaded clip (multipart)
- POST /tts/voices/from-library -> clone from a library file (audio is
extracted from video files with ffmpeg; audio files are forwarded as-is)
Security: voice_name sanitized to [A-Za-z0-9_-] (it becomes an upstream
filename), 25 MB upload cap, library refs restricted to real audio/video,
path confined via is_valid_full_path. Adds is_audio_file + unit tests for the
sanitizer, mime guesser, and swap-root derivation.
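
A minimal sketch of the voice_name guard described above, not the code in
this commit. The function names, the choice to strip disallowed characters
(rather than reject the request), and reading "25 MB" as 25 * 1024 * 1024
bytes are all assumptions made for illustration.

```rust
/// Hypothetical upload cap; the commit says "25 MB", the exact byte count is assumed.
const MAX_UPLOAD_BYTES: usize = 25 * 1024 * 1024;

/// Keep only [A-Za-z0-9_-] so the name is safe to use as an upstream filename.
/// Returns None when nothing usable is left (assumed behaviour).
fn sanitize_voice_name(raw: &str) -> Option<String> {
    let cleaned: String = raw
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '_' || *c == '-')
        .collect();
    if cleaned.is_empty() {
        None
    } else {
        Some(cleaned)
    }
}

/// True when an uploaded clip of `len` bytes fits under the cap.
fn within_upload_cap(len: usize) -> bool {
    len <= MAX_UPLOAD_BYTES
}
```

With this sketch, a hostile name such as "../etc/passwd" collapses to
"etcpasswd", so path separators and dots never reach the upstream filename.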
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Symptom: Apollo's logs showed bursts of 422 decode_failed from
ImageApi's CLIP backfill, e.g. for `._DSC_2182-S.jpg`. macOS writes
`._<name>` AppleDouble sidecars when copying to non-Apple filesystems
(SMB, FAT, exFAT). The sidecars carry the original file's extension
even though their bytes are extended-attribute metadata, not the
image. ImageApi's walker matched them via the extension predicate,
sent them through the ingest pipeline, and accumulated failed rows
in face_detections + clip_embedding while pinning Apollo's eviction
timer with the 422 burst.
Fix: predicate-level guard in is_image_file / is_video_file (and
by inheritance is_media_file). Every walker that already gates on
these (face_watch, backfill, clip_watch, watcher, files,
probe_clip_search) inherits the skip without per-callsite edits.
Narrow scope on purpose — `._*` prefix + the exact `.DS_Store`
basename — rather than blanket dotfile filtering, because a user
could plausibly name a cover image `.cover.jpg`.
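
A rough sketch of what a predicate-level guard of this shape could look
like. The function names is_image_file / is_video_file / is_media_file come
from the message above; the helper names and the extension lists are
hypothetical and much shorter than a real list would be.

```rust
use std::path::Path;

/// Hypothetical helper: true for AppleDouble sidecars (`._*`) and `.DS_Store`.
fn is_macos_metadata(path: &Path) -> bool {
    match path.file_name().and_then(|n| n.to_str()) {
        Some(name) => name.starts_with("._") || name == ".DS_Store",
        None => false,
    }
}

/// Hypothetical helper: case-insensitive extension match.
fn has_extension_in(path: &Path, exts: &[&str]) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .map(|e| exts.iter().any(|x| x.eq_ignore_ascii_case(e)))
        .unwrap_or(false)
}

fn is_image_file(path: &Path) -> bool {
    // The guard runs first, so every caller of the predicate inherits the skip.
    !is_macos_metadata(path) && has_extension_in(path, &["jpg", "jpeg", "png", "webp"])
}

fn is_video_file(path: &Path) -> bool {
    !is_macos_metadata(path) && has_extension_in(path, &["mp4", "mov", "mkv"])
}

fn is_media_file(path: &Path) -> bool {
    is_image_file(path) || is_video_file(path)
}
```

Under these assumptions `._DSC_2182-S.jpg` is rejected while `.cover.jpg`
still counts as an image, which is the narrow scope argued for above.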
Existing rows are not cleaned by this change. To purge what
already accumulated (one-shot, run from your DB shell after
deploying):
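    -- '_' is a single-character wildcard in LIKE, so it is escaped with
    -- '!' here; unescaped, '%/._%' would also match e.g. '/.cover.jpg'.
    DELETE FROM image_exif
     WHERE file_path LIKE '%/.!_%' ESCAPE '!'
        OR file_path LIKE '%/.DS!_Store' ESCAPE '!';
    DELETE FROM face_detections
     WHERE rel_path LIKE '%/.!_%' ESCAPE '!'
        OR rel_path LIKE '%/.DS!_Store' ESCAPE '!';
    DELETE FROM tagged_photo
     WHERE file_path LIKE '%/.!_%' ESCAPE '!'
        OR file_path LIKE '%/.DS!_Store' ESCAPE '!';
    DELETE FROM favorites
     WHERE path LIKE '%/.!_%' ESCAPE '!'
        OR path LIKE '%/.DS!_Store' ESCAPE '!';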
The maintenance pipeline's missing-file scan would NOT catch these
on its own — the files exist on disk (they're real macOS metadata,
just not images), so stat() returns Ok and the row sticks.
Three recurring issues on every full scan:
1. Video playlist scans re-enqueued every file only to reject it as
AlreadyExists. Pre-filter in ScanDirectoryMessage and QueueVideosMessage
so we skip videos whose .m3u8 already exists, and demote the leaked
AlreadyExists log to debug.
2. The image crate was built with only the jpeg/png features, so webp/tiff/avif
files logged "format not supported" on every scan. Enable those features.
3. RAW (ARW/NEF/CR2/...) and HEIC thumbnails weren't generated, so the
scan kept retrying them. Try the file's embedded JPEG preview via
kamadak-exif first (fast, pure-Rust, works on Sony ARW where ffmpeg's
TIFF decoder fails). Fall back to ffmpeg for HEIC/HEIF and RAWs with
no preview. Anything still undecodable gets a <thumb>.unsupported
sentinel so future scans skip it.
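
The fallback chain and sentinel in item 3 can be sketched roughly as
follows. This is an illustration under stated assumptions, not the code in
this commit: the decoders are passed in as plain function pointers standing
in for the kamadak-exif preview and ffmpeg steps, all function names are
hypothetical, and the sentinel is assumed to be the thumbnail path with
".unsupported" appended.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Stand-in for one decode attempt (embedded preview, ffmpeg, ...).
type Decoder = fn(&Path) -> Option<Vec<u8>>;

/// Assumed sentinel naming: "<thumb>.unsupported" next to the thumbnail.
fn unsupported_sentinel(thumb: &Path) -> PathBuf {
    let mut s = thumb.as_os_str().to_owned();
    s.push(".unsupported");
    PathBuf::from(s)
}

/// A scan only retries files with neither a thumbnail nor a sentinel.
fn needs_thumbnail(thumb: &Path) -> bool {
    !thumb.exists() && !unsupported_sentinel(thumb).exists()
}

/// Try each decoder in order; write the first result, else leave a sentinel.
/// Returns Ok(true) when a thumbnail was written.
fn generate_with_fallbacks(src: &Path, thumb: &Path, decoders: &[Decoder]) -> io::Result<bool> {
    for decode in decoders {
        if let Some(bytes) = decode(src) {
            fs::write(thumb, bytes)?;
            return Ok(true);
        }
    }
    // Nothing could decode the file: mark it so future scans skip it.
    fs::write(unsupported_sentinel(thumb), b"")?;
    Ok(false)
}
```

Ordering the decoders as [embedded preview, ffmpeg] matches the order
described above; the empty sentinel file keeps the skip decision on disk, so
it survives restarts without a database column.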
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>