Add TTS endpoints backed by Chatterbox via llama-swap

LlamaCppClient gains text_to_speech (OpenAI-style /audio/speech), plus
list_voices and create_voice, which pass through to the voice library at the
swap-root /upstream/<model>/voices. It also gains a tts_model slot configured
via LLAMA_SWAP_TTS_MODEL (default "chatterbox").

New Claims-gated routes:
- POST /tts/speech        -> { audio_base64, format } for data: URI playback
- GET  /tts/voices        -> voice library passthrough
- POST /tts/voices/upload -> clone a voice from an uploaded clip (multipart)
- POST /tts/voices/from-library -> clone from a library file (ffmpeg extracts
  the audio track from video; audio files are forwarded as-is)
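
For illustration, a client could turn the POST /tts/speech response into a
data: URI roughly as follows. This is a minimal sketch, not the code in this
commit: the SpeechResponse struct and to_data_uri function are hypothetical
names, and it assumes `format` is a short audio subtype such as "wav".

```rust
/// Hypothetical mirror of the `{ audio_base64, format }` response body.
struct SpeechResponse {
    audio_base64: String,
    format: String,
}

/// Build a `data:` URI suitable for an audio element's `src`.
/// Assumption: `format` maps directly onto an `audio/<format>` MIME type.
fn to_data_uri(resp: &SpeechResponse) -> String {
    format!("data:audio/{};base64,{}", resp.format, resp.audio_base64)
}
```

Returning base64 in JSON keeps playback to a single request, at the cost of
roughly one third more bytes on the wire than the raw audio.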

Security: voice_name sanitized to [A-Za-z0-9_-] (it becomes an upstream
filename), 25 MB upload cap, library refs restricted to real audio/video,
path confined via is_valid_full_path. Adds is_audio_file + unit tests for the
sanitizer, mime guesser, and swap-root derivation.
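
The voice_name rule above can be sketched as a character filter. This is an
assumption about the approach, not the committed code: sanitize_voice_name is
a hypothetical name, and the sketch assumes disallowed characters are dropped
(rather than the request being rejected) and that an empty result is an error.

```rust
/// Hypothetical helper: keep only ASCII letters, digits, '_' and '-'.
/// Returns `None` when no usable characters remain.
fn sanitize_voice_name(raw: &str) -> Option<String> {
    let cleaned: String = raw
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '_' || *c == '-')
        .collect();

    if cleaned.is_empty() {
        None
    } else {
        Some(cleaned)
    }
}
```

Because '.', '/' and '\' are never in the allowed set, a name such as
"../etc/my voice" cannot carry a path traversal into the upstream filename.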

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cameron Cordes
2026-06-02 22:04:42 -04:00
parent 015dc976e3
commit 69268d03fe
6 changed files with 558 additions and 0 deletions
@@ -391,6 +391,9 @@ fn build_llamacpp_from_env() -> Option<Arc<LlamaCppClient>> {
    if let Ok(model) = env::var("LLAMA_SWAP_VISION_MODEL") {
        client.set_vision_model(model);
    }
    if let Ok(model) = env::var("LLAMA_SWAP_TTS_MODEL") {
        client.set_tts_model(model);
    }
    Some(Arc::new(client))
}
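
The hunk only shows the call site. The slot it writes to might look like the
sketch below, which assumes the default lives on the client and the
environment variable merely overrides it. TtsSlot and apply_tts_override are
hypothetical names used for illustration; only set_tts_model and the
"chatterbox" default come from the commit.

```rust
/// Hypothetical stand-in for the part of the client that holds the TTS model.
struct TtsSlot {
    tts_model: String,
}

impl Default for TtsSlot {
    fn default() -> Self {
        // Default named in the commit message.
        Self { tts_model: "chatterbox".to_string() }
    }
}

impl TtsSlot {
    fn set_tts_model(&mut self, model: impl Into<String>) {
        self.tts_model = model.into();
    }
}

/// Apply an optional override, e.g. `env::var("LLAMA_SWAP_TTS_MODEL").ok()`.
fn apply_tts_override(slot: &mut TtsSlot, value: Option<String>) {
    if let Some(model) = value {
        slot.set_tts_model(model);
    }
}
```

Passing the override in as an Option keeps the logic testable without
touching the process environment.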