Unified search: UNIFIED_SEARCH_MODEL env override for the translation step
Pin the NL->structured translation to a small, fast model that can stay co-resident with CLIP (and the chat model) so it never evicts them on a tight VRAM budget.

Precedence: UNIFIED_SEARCH_MODEL env > client-selected model > configured default.

Logs the effective model (backend.model()) so model A/B tests are visible. Documented in .env.example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -80,6 +80,16 @@ AGENTIC_CHAT_MAX_ITERATIONS=6
 # LLAMA_SWAP_ALLOWED_MODELS=chat,vision,embed
 # LLAMA_SWAP_REQUEST_TIMEOUT_SECONDS=180
 
+# ── Unified search translation model (optional) ─────────────────────────
+# /photos/search/unified runs one small LLM call to translate a natural-
+# language query into structured filters + a semantic term, then CLIP-ranks.
+# That step needs an LLM AND CLIP available at once. On a tight VRAM budget a
+# large chat model can't co-reside with CLIP, so pin a small, fast model here
+# (it can stay loaded alongside CLIP and the chat model). Precedence:
+# UNIFIED_SEARCH_MODEL > the client's selected model > the configured default.
+# Use the configured backend (LLM_BACKEND); local only — no hybrid.
+# UNIFIED_SEARCH_MODEL=qwen3-0.6b
+
 # ── Text-to-speech (optional, requires LLAMA_SWAP_URL) ───────────────────
 # TTS routes through the same llama-swap proxy (a Chatterbox model id), so it
 # only needs LLAMA_SWAP_URL — it does NOT require LLM_BACKEND=llamacpp.
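
The precedence stated in the commit message (UNIFIED_SEARCH_MODEL env > client-selected model > configured default) could be resolved roughly as in the sketch below. This is an illustration only, not the project's actual code: the function name `resolve_unified_search_model`, its parameters, and the choice to treat an empty or whitespace-only value as unset are all assumptions made for the example.

```python
import os
from typing import Mapping, Optional


def resolve_unified_search_model(
    client_model: Optional[str],
    default_model: str,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the model for the NL->structured translation step.

    Hypothetical helper. Order: UNIFIED_SEARCH_MODEL env override,
    then the client's selected model, then the configured default.
    """
    # Assumption: a blank override counts as "not set".
    override = env.get("UNIFIED_SEARCH_MODEL", "").strip()
    if override:
        return override
    # Fall back to the client's choice when it is non-empty.
    if client_model and client_model.strip():
        return client_model.strip()
    return default_model


# Env override wins over the client's selection.
print(resolve_unified_search_model(
    "big-chat", "default-model", {"UNIFIED_SEARCH_MODEL": "qwen3-0.6b"}
))  # prints "qwen3-0.6b"

# Without the override, the client's selection is used.
print(resolve_unified_search_model("big-chat", "default-model", {}))  # prints "big-chat"
```

Passing the environment in as a mapping keeps the sketch easy to test; a real implementation would presumably also log the resolved model, as the commit message describes.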