Add TTS endpoints backed by Chatterbox via llama-swap

LlamaCppClient gains text_to_speech (OpenAI /audio/speech), list_voices and create_voice (voice library at the swap-root /upstream/<model>/voices passthrough), plus a tts_model slot configured via LLAMA_SWAP_TTS_MODEL (default "chatterbox").

New Claims-gated routes:
- POST /tts/speech -> { audio_base64, format } for data: URI playback
- GET /tts/voices -> voice library passthrough
- POST /tts/voices/upload -> clone a voice from an uploaded clip (multipart)
- POST /tts/voices/from-library -> clone from a library file (ffmpeg extracts audio from video; audio is forwarded as-is)

Security: voice_name sanitized to [A-Za-z0-9_-] (it becomes an upstream filename), 25 MB upload cap, library refs restricted to real audio/video, path confined via is_valid_full_path.

Adds is_audio_file + unit tests for the sanitizer, mime guesser, and swap-root derivation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
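
Illustrative note on the voice_name rule above: a minimal sketch of restricting a name to [A-Za-z0-9_-] before it is used as an upstream filename. The function name `sanitize_voice_name` and the policy shown here (drop disallowed characters, reject a name that ends up empty) are assumptions for illustration; the sanitizer in this commit may instead replace characters or reject the input outright.

```rust
/// Hypothetical helper, not the code from this commit.
/// Keeps only ASCII letters, digits, '_' and '-', so the result cannot
/// contain path separators, dots, or whitespace.
/// Assumption: disallowed characters are dropped rather than replaced,
/// and a name with no allowed characters is rejected with `None`.
pub fn sanitize_voice_name(raw: &str) -> Option<String> {
    let cleaned: String = raw
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '_' || *c == '-')
        .collect();

    if cleaned.is_empty() {
        None
    } else {
        Some(cleaned)
    }
}
```

Under these assumptions, a traversal attempt such as "../etc/passwd" loses its dots and slashes and can no longer escape the voices directory, while a name made up entirely of disallowed characters is refused rather than turned into an empty filename.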
2026-06-02 22:04:42 -04:00
parent 015dc976e3
commit 69268d03fe
6 changed files with 558 additions and 0 deletions
@@ -11,6 +11,7 @@ pub mod llm_client;
 pub mod ollama;
 pub mod openrouter;
 pub mod sms_client;
+pub mod tts;
 pub mod turn_registry;

 // strip_summary_boilerplate is used by binaries (test_daily_summary), not the library
@@ -34,6 +35,10 @@ pub use llm_client::{
 };
 pub use ollama::{EMBEDDING_MODEL, OllamaClient};
 pub use sms_client::{SmsApiClient, SmsMessage};
+pub use tts::{
+    create_voice_from_library_handler, create_voice_upload_handler, list_voices_handler,
+    tts_speech_handler,
+};

 /// Display name used for the user in message transcripts and first-person
 /// prompt text. Reads the `USER_NAME` env var; defaults to `"Me"`. Models