Add TTS endpoints backed by Chatterbox via llama-swap
LlamaCppClient gains text_to_speech (OpenAI /audio/speech), list_voices and
create_voice (voice library at the swap-root /upstream/<model>/voices
passthrough), plus a tts_model slot configured via LLAMA_SWAP_TTS_MODEL
(default "chatterbox").
New Claims-gated routes:
- POST /tts/speech -> { audio_base64, format } for data: URI playback
- GET /tts/voices -> voice library passthrough
- POST /tts/voices/upload -> clone a voice from an uploaded clip (multipart)
- POST /tts/voices/from-library -> clone from a library file (ffmpeg-extracts
audio from video; audio forwarded as-is)
Security: voice_name sanitized to [A-Za-z0-9_-] (it becomes an upstream
filename), 25 MB upload cap, library refs restricted to real audio/video,
path confined via is_valid_full_path. Adds is_audio_file + unit tests for the
sanitizer, mime guesser, and swap-root derivation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -22,6 +22,10 @@ pub fn needs_ffmpeg_thumbnail(path: &Path) -> bool {
|
||||
/// Supported video file extensions
|
||||
pub const VIDEO_EXTENSIONS: &[&str] = &["mp4", "mov", "avi", "mkv"];
|
||||
|
||||
/// Audio file extensions accepted as voice-clone references (TTS). Mirrors
|
||||
/// the formats Chatterbox can decode (wav/mp3/flac/m4a/aac/ogg).
|
||||
pub const AUDIO_EXTENSIONS: &[&str] = &["wav", "mp3", "flac", "m4a", "aac", "ogg", "oga", "opus"];
|
||||
|
||||
/// Filenames that are filesystem metadata, not real media — exact
|
||||
/// basename match. Extend if a new platform sidecar appears (Windows
|
||||
/// Thumbs.db / desktop.ini live here too if those libraries land).
|
||||
@@ -75,6 +79,19 @@ pub fn is_video_file(path: &Path) -> bool {
|
||||
}
|
||||
}
|
||||
|
||||
/// Check if a path has an audio extension (voice-clone references)
|
||||
pub fn is_audio_file(path: &Path) -> bool {
|
||||
if is_filesystem_metadata(path) {
|
||||
return false;
|
||||
}
|
||||
if let Some(ext) = path.extension().and_then(|e| e.to_str()) {
|
||||
let ext_lower = ext.to_lowercase();
|
||||
AUDIO_EXTENSIONS.contains(&ext_lower.as_str())
|
||||
} else {
|
||||
false
|
||||
}
|
||||
}
|
||||
|
||||
/// Check if a path has a supported media extension (image or video)
|
||||
pub fn is_media_file(path: &Path) -> bool {
|
||||
is_image_file(path) || is_video_file(path)
|
||||
|
||||
Reference in New Issue
Block a user