Document TTS endpoints and env in README + .env.example

Documents the /tts/speech and /tts/voices* endpoints plus the LLAMA_SWAP_TTS_MODEL / LLAMA_SWAP_TTS_VOICE env vars (TTS only needs LLAMA_SWAP_URL, not LLM_BACKEND=llamacpp).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 22:34:34 -04:00
parent 51be5df214
commit 35c5ecb427
2 changed files with 27 additions and 0 deletions
@@ -147,6 +147,25 @@ so you can rewrite the saved summary from within chat.
 - `AGENTIC_CHAT_MAX_ITERATIONS` - Cap on tool-calling iterations per chat turn [default: `6`]
  - Per-request `max_iterations` (when sent by the client) is clamped to this cap

+#### Text-to-Speech (Optional)
+Reads insights aloud and manages cloned voices using a Chatterbox model served
+behind the same llama-swap proxy. It only requires `LLAMA_SWAP_URL`: the TTS
+client is built whenever that is set, regardless of `LLM_BACKEND`. Endpoints:
+- `POST /tts/speech` - synthesizes speech. Body:
+  `{ text, voice?, format?, exaggeration?, cfg_weight?, temperature? }`;
+  returns `{ audio_base64, format }`. Input is cleaned server-side (Markdown and
+  emoji stripped), and the generation parameters are clamped to Chatterbox's ranges.
+- `GET /tts/voices` - lists the voices in the voice library.
+- `POST /tts/voices/upload` - multipart form with `voice_name` and `voice_file`;
+  clones a voice from an uploaded clip (max 25 MB).
+- `POST /tts/voices/from-library` - body `{ voice_name, path, library? }`; clones
+  a voice from a library file (audio files are forwarded as-is; for video files
+  the audio track is extracted with ffmpeg first).
+
+Env:
+- `LLAMA_SWAP_TTS_MODEL` - TTS model ID as defined in llama-swap's `config.yaml` [default: `chatterbox`]
+- `LLAMA_SWAP_TTS_VOICE` - Default voice used when a `/tts/speech` request omits `voice` (optional)
+
 #### Fallback Behavior
 - Primary server is tried first with 5-second connection timeout
 - On failure, automatically falls back to secondary server (if configured)
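A minimal client sketch for the `POST /tts/speech` endpoint documented above, using only the Python standard library. It assumes the service is reachable at a base URL you pass in (the README section names no host or port) and that no authentication is required; the helper names `build_speech_body`, `decode_speech_response`, and `synthesize` are hypothetical. Optional fields are left out of the body entirely, which is the case the README describes for falling back to `LLAMA_SWAP_TTS_VOICE`, and nothing is clamped client-side because the server already clamps the generation parameters.

```python
import base64
import json
import urllib.request
from typing import Optional, Tuple


def build_speech_body(
    text: str,
    voice: Optional[str] = None,
    fmt: Optional[str] = None,
    exaggeration: Optional[float] = None,
    cfg_weight: Optional[float] = None,
    temperature: Optional[float] = None,
) -> dict:
    """Build a /tts/speech JSON body, leaving out every optional field that is None."""
    body = {"text": text}
    optional = {
        "voice": voice,  # omitted -> server uses LLAMA_SWAP_TTS_VOICE, if configured
        "format": fmt,
        "exaggeration": exaggeration,
        "cfg_weight": cfg_weight,
        "temperature": temperature,
    }
    # No clamping here: the server clamps the generation parameters itself.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


def decode_speech_response(payload: dict) -> Tuple[bytes, str]:
    """Convert a `{ audio_base64, format }` response into (raw audio bytes, format)."""
    return base64.b64decode(payload["audio_base64"]), payload["format"]


def synthesize(base_url: str, text: str, **options) -> Tuple[bytes, str]:
    """POST to <base_url>/tts/speech and return (audio bytes, format).

    Assumption: `base_url` is wherever this service is deployed; the README
    section does not specify a host or port.
    """
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/tts/speech",
        data=json.dumps(build_speech_body(text, **options)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous, arbitrary timeout: synthesis (plus a llama-swap model swap) may take a while.
    with urllib.request.urlopen(request, timeout=120) as response:
        return decode_speech_response(json.load(response))
```

For example, `synthesize("http://localhost:8000", "Hello there", exaggeration=0.5)` would return the decoded audio bytes and their format; `localhost:8000` is only a placeholder, and the accepted `format` values are not listed in this section.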