Give TTS synthesis its own (longer) request timeout

Long insights are chunked and synthesized server-side, so a single request can run past the shared 180s chat/embedding client timeout and fail with a spurious timeout. /tts/speech now uses a per-request timeout from LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS (default 600), overriding the client default without affecting chat/embeddings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 10:25:06 -04:00
parent 9978b28b52
commit d8dd260c6b
4 changed files with 17 additions and 0 deletions
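As a rough illustration of the override described in the commit message (not the code in this commit, whose source files are not shown here), the timeout selection could be sketched as below. The helper names `read_timeout_seconds` and `timeout_for` are hypothetical, the 180-second fallback is taken from the shared value quoted in the message, and falling back to the default on a missing, non-numeric, or non-positive value is an assumption.

```python
import os

# Env var names come from the commit; everything else here is illustrative.
SHARED_TIMEOUT_ENV = "LLAMA_SWAP_REQUEST_TIMEOUT_SECONDS"
TTS_TIMEOUT_ENV = "LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS"


def read_timeout_seconds(name, default, env=None):
    """Return a positive timeout read from `env`, or `default` if unusable."""
    env = os.environ if env is None else env
    raw = env.get(name, "").strip()
    try:
        value = float(raw)
    except ValueError:
        # Missing or non-numeric value: assumed to fall back to the default.
        return default
    return value if value > 0 else default


def timeout_for(path, env=None):
    """Pick the timeout for one request; only /tts/speech gets the longer one."""
    if path == "/tts/speech":
        return read_timeout_seconds(TTS_TIMEOUT_ENV, 600.0, env)
    # Chat and embedding requests keep using the shared client timeout.
    return read_timeout_seconds(SHARED_TIMEOUT_ENV, 180.0, env)
```

Passing the timeout per request, rather than raising the client-wide default, is what keeps chat and embedding calls on the shorter limit.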
@@ -169,6 +169,10 @@ Env:
  [default: `30`]. Reference audio is ffmpeg-normalized to mono 24 kHz WAV (so any
  source format works); Chatterbox is zero-shot, so a clean ~10–20s sample is the
  sweet spot — more rarely helps.
+- `LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS` - per-request synthesis timeout in
+  seconds [default: `600`]. Long insights are chunked + synthesized server-side
+  and can take minutes; this is separate from (and overrides, for `/tts/speech`)
+  the shared `LLAMA_SWAP_REQUEST_TIMEOUT_SECONDS`.

 #### Fallback Behavior
 - Primary server is tried first with 5-second connection timeout