Give TTS synthesis its own (longer) request timeout

Long insights are chunked and synthesized server-side, so one request can
run past the shared 180s chat/embedding client timeout and fail spuriously.
/tts/speech now uses a per-request timeout read from
LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS (default 600), which overrides the
client default without affecting chat/embeddings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cameron Cordes
2026-06-03 10:25:06 -04:00
parent 9978b28b52
commit d8dd260c6b
4 changed files with 17 additions and 0 deletions
@@ -666,6 +666,8 @@ LLAMA_SWAP_TTS_MODEL=chatterbox # TTS model id in config.yaml (de
LLAMA_SWAP_TTS_VOICE=m # Default voice when /tts/speech omits one (optional)
LLAMA_SWAP_TTS_REF_SECONDS=30 # Max voice-clone reference clip length, seconds
# (Chatterbox is zero-shot; ~10-20s clean ref is ideal)
LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS=600 # Per-request synth timeout (long chunked insights take
# minutes); overrides the shared client timeout for /tts/speech
# Insight Chat Continuation
AGENTIC_CHAT_MAX_ITERATIONS=6 # Cap on tool-calling iterations per chat turn (default 6)
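
The timeout override described in the commit message can be sketched roughly as follows. This is an illustration only, not the project's code: the source does not show the implementation or its language, so Python is used here, and the helper names (`tts_request_timeout`, `timeout_for`), the `/chat` path in the usage note, and the fallback to the default on an empty, unparsable, or non-positive value are all assumptions. Only the variable name, the 600s default, the 180s shared timeout, and the `/tts/speech` path come from the commit.

```python
import os

# Values taken from the commit message.
SHARED_CLIENT_TIMEOUT_SECONDS = 180.0   # shared chat/embedding client timeout
DEFAULT_TTS_TIMEOUT_SECONDS = 600.0     # default for the TTS override
TTS_TIMEOUT_ENV = "LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS"


def tts_request_timeout(env=None):
    """Return the per-request timeout, in seconds, for /tts/speech.

    Hypothetical helper. Assumption: an unset, empty, unparsable, or
    non-positive value falls back to the 600s default.
    """
    env = os.environ if env is None else env
    raw = env.get(TTS_TIMEOUT_ENV, "").strip()
    if not raw:
        return DEFAULT_TTS_TIMEOUT_SECONDS
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_TTS_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_TTS_TIMEOUT_SECONDS


def timeout_for(path, env=None):
    """Pick a timeout by request path (hypothetical helper).

    Only /tts/speech gets the longer override; every other path keeps
    the shared client timeout, so chat and embeddings are unaffected.
    """
    if path == "/tts/speech":
        return tts_request_timeout(env)
    return SHARED_CLIENT_TIMEOUT_SECONDS
```

With this sketch, `timeout_for("/tts/speech", {})` yields the 600s default, while a placeholder non-TTS path such as `timeout_for("/chat", {})` keeps the 180s shared timeout. Resolving the value per request, rather than raising the client-wide timeout, is what keeps slow synthesis from loosening the limits on chat and embedding calls.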