Give TTS synthesis its own (longer) request timeout

Long insights are chunked + synthesized server-side and can run past the shared 180s chat/embedding client timeout, causing spurious timeouts. /tts/speech now uses a per-request timeout from LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS (default 600), overriding the client default without affecting chat/embeddings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 10:25:06 -04:00
parent 9978b28b52
commit d8dd260c6b
4 changed files with 17 additions and 0 deletions
@@ -88,6 +88,7 @@ AGENTIC_CHAT_MAX_ITERATIONS=6
 # LLAMA_SWAP_TTS_MODEL=chatterbox        # TTS model id in config.yaml
 # LLAMA_SWAP_TTS_VOICE=m                 # default voice when a request omits one
 # LLAMA_SWAP_TTS_REF_SECONDS=30          # max voice-clone reference clip length (s)
+# LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS=600   # synth timeout (long chunked text)

 # ── AI Insights — sibling services (optional) ───────────────────────────
 # Apollo (places, face inference, CLIP encoders). Single-Apollo deploys