Give TTS synthesis its own (longer) request timeout
Long insights are chunked + synthesized server-side and can run past the shared 180s chat/embedding client timeout, causing spurious timeouts. /tts/speech now uses a per-request timeout from LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS (default 600), overriding the client default without affecting chat/embeddings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -169,6 +169,10 @@ Env:
[default: `30`]. Reference audio is ffmpeg-normalized to mono 24 kHz WAV (so any
source format works); Chatterbox is zero-shot, so a clean ~10–20s sample is the
sweet spot — more rarely helps.
- `LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS` - per-request synthesis timeout in
seconds [default: `600`]. Long insights are chunked + synthesized server-side
and can take minutes; this is separate from (and overrides, for `/tts/speech`)
the shared `LLAMA_SWAP_REQUEST_TIMEOUT_SECONDS`.
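
As a rough illustration of the override described above, the sketch below picks a timeout per request path. It is not the project's actual code: the function names, the fallback-on-invalid-value behavior, and the example chat path are assumptions made for illustration, and the 180-second shared default is inferred from the commit message rather than confirmed by the README.

```python
import os

# Env var names come from the README; the rest is a hypothetical sketch.
SHARED_TIMEOUT_ENV = "LLAMA_SWAP_REQUEST_TIMEOUT_SECONDS"
TTS_TIMEOUT_ENV = "LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS"

DEFAULT_SHARED_TIMEOUT = 180.0  # assumed default, taken from the commit message
DEFAULT_TTS_TIMEOUT = 600.0     # documented default


def read_seconds(env, name, default):
    """Parse a positive number of seconds from env, else return default."""
    raw = env.get(name, "").strip()
    try:
        value = float(raw)
    except ValueError:
        # Missing or non-numeric values fall back (assumed behavior).
        return default
    return value if value > 0 else default


def timeout_for(path, env=os.environ):
    """Return the request timeout in seconds for the given endpoint path."""
    if path == "/tts/speech":
        # Synthesis gets its own, longer timeout.
        return read_seconds(env, TTS_TIMEOUT_ENV, DEFAULT_TTS_TIMEOUT)
    # Every other endpoint keeps the shared client timeout.
    return read_seconds(env, SHARED_TIMEOUT_ENV, DEFAULT_SHARED_TIMEOUT)
```

With an HTTP client that accepts a per-call timeout, the value returned for `/tts/speech` would be passed on that call only, leaving the client-wide default untouched for chat and embedding requests.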
#### Fallback Behavior
- Primary server is tried first with a 5-second connection timeout