Document TTS endpoints and env in README + .env.example

Documents the /tts/speech and /tts/voices* endpoints and the LLAMA_SWAP_TTS_MODEL / LLAMA_SWAP_TTS_VOICE variables. TTS only needs LLAMA_SWAP_URL, not LLM_BACKEND=llamacpp.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -80,6 +80,14 @@ AGENTIC_CHAT_MAX_ITERATIONS=6
 # LLAMA_SWAP_ALLOWED_MODELS=chat,vision,embed
 # LLAMA_SWAP_REQUEST_TIMEOUT_SECONDS=180
 
+# ── Text-to-speech (optional, requires LLAMA_SWAP_URL) ───────────────────
+# TTS routes through the same llama-swap proxy (a Chatterbox model id), so it
+# only needs LLAMA_SWAP_URL — it does NOT require LLM_BACKEND=llamacpp.
+# Powers POST /tts/speech and the /tts/voices* endpoints (read-aloud insights
+# + voice cloning in the mobile app).
+# LLAMA_SWAP_TTS_MODEL=chatterbox # TTS model id in config.yaml
+# LLAMA_SWAP_TTS_VOICE=m # default voice when a request omits one
+
 # ── AI Insights — sibling services (optional) ───────────────────────────
 # Apollo (places, face inference, CLIP encoders). Single-Apollo deploys
 # typically set only APOLLO_API_BASE_URL and let the face + CLIP
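The comments in this hunk say that TTS is enabled by `LLAMA_SWAP_URL` alone, regardless of `LLM_BACKEND`. A minimal Python sketch of that resolution rule follows. It is an illustration only, not the project's code: `TtsConfig` and `load_tts_config` are hypothetical names, treating an empty value as unset is an assumption, and the `chatterbox` default is taken from the README hunk in this commit.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TtsConfig:
    """Hypothetical container for the three TTS-related settings."""
    base_url: str
    model: str
    default_voice: Optional[str]


def load_tts_config(env: Mapping[str, str] = os.environ) -> Optional[TtsConfig]:
    """Return a TTS config when LLAMA_SWAP_URL is set, else None.

    LLM_BACKEND is deliberately never consulted: per the commit, TTS only
    depends on the llama-swap URL.
    """
    base_url = env.get("LLAMA_SWAP_URL", "").strip()
    if not base_url:
        return None  # TTS disabled: no proxy to route through
    # Assumption: an empty value is treated the same as an unset one.
    model = env.get("LLAMA_SWAP_TTS_MODEL", "").strip() or "chatterbox"
    voice = env.get("LLAMA_SWAP_TTS_VOICE", "").strip() or None
    return TtsConfig(base_url.rstrip("/"), model, voice)
```

Passing a plain dict instead of `os.environ` makes the rule easy to check in isolation, for example `load_tts_config({"LLAMA_SWAP_URL": "http://localhost:8080"})`.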
@@ -147,6 +147,25 @@ so you can rewrite the saved summary from within chat.
 - `AGENTIC_CHAT_MAX_ITERATIONS` - Cap on tool-calling iterations per chat turn [default: `6`]
 - Per-request `max_iterations` (when sent by the client) is clamped to this cap
 
+#### Text-to-Speech (Optional)
+Reads insights aloud and manages cloned voices via a Chatterbox model served
+behind the same llama-swap proxy. Only requires `LLAMA_SWAP_URL` (the TTS client
+is built whenever that's set — independent of `LLM_BACKEND`). Endpoints:
+- `POST /tts/speech` — body `{ text, voice?, format?, exaggeration?, cfg_weight?,
+temperature? }`; returns `{ audio_base64, format }`. Input is cleaned
+server-side (markdown + emoji stripped) and the generation knobs are clamped
+to Chatterbox's ranges.
+- `GET /tts/voices` — list the voice library.
+- `POST /tts/voices/upload` — multipart `voice_name` + `voice_file`; clone a
+voice from an uploaded clip (≤25 MB).
+- `POST /tts/voices/from-library` — body `{ voice_name, path, library? }`; clone
+from a library file (audio forwarded as-is; video has its audio extracted via
+ffmpeg).
+
+Env:
+- `LLAMA_SWAP_TTS_MODEL` - TTS model id in llama-swap's `config.yaml` [default: `chatterbox`]
+- `LLAMA_SWAP_TTS_VOICE` - default voice used when a `/tts/speech` request omits `voice` (optional)
+
 #### Fallback Behavior
 - Primary server is tried first with 5-second connection timeout
 - On failure, automatically falls back to secondary server (if configured)
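The README text in this hunk gives the request and response shape of `POST /tts/speech`. A client could be sketched roughly as below. This is a hedged illustration, not the project's client: the helper names (`build_speech_body`, `decode_speech_response`, `synthesize`), the `api_base` parameter, and the 60-second timeout are assumptions, and any authentication the API may require is left out.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Optional, Tuple


def build_speech_body(
    text: str,
    voice: Optional[str] = None,
    format: Optional[str] = None,
    exaggeration: Optional[float] = None,
    cfg_weight: Optional[float] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Build the JSON body, leaving out optional fields that were not given."""
    optional = {
        "voice": voice,
        "format": format,
        "exaggeration": exaggeration,
        "cfg_weight": cfg_weight,
        "temperature": temperature,
    }
    body: Dict[str, Any] = {"text": text}
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


def decode_speech_response(payload: Dict[str, Any]) -> Tuple[bytes, str]:
    """Turn `{ audio_base64, format }` into raw audio bytes plus the format."""
    return base64.b64decode(payload["audio_base64"]), payload["format"]


def synthesize(api_base: str, text: str, **options: Any) -> Tuple[bytes, str]:
    """POST to /tts/speech on a server at `api_base` (hypothetical URL)."""
    data = json.dumps(build_speech_body(text, **options)).encode("utf-8")
    request = urllib.request.Request(
        api_base.rstrip("/") + "/tts/speech",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The timeout value is an assumption, not a documented limit.
    with urllib.request.urlopen(request, timeout=60) as response:
        return decode_speech_response(json.load(response))
```

Omitting `voice` from the body lets the server fall back to `LLAMA_SWAP_TTS_VOICE`, as the Env list describes. Since the README says the server clamps the generation knobs, the sketch does no client-side range checks.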