Reverts the per-request backend="llamacpp" value. Chat/vision/embedding
backend is now a deploy-time decision (LLM_BACKEND=ollama|llamacpp),
applied globally across chat, vision describe, and embeddings — so
embedding vectors stay in one space across the index.
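
A minimal sketch of reading the deploy-time switch, for illustration
only. The names LlmBackend and resolve_backend are hypothetical, and
the default to Ollama when the variable is unset is an assumption; the
commit does not say what happens in that case.

```rust
use std::str::FromStr;

/// Deploy-time backend choice, mirroring LLM_BACKEND=ollama|llamacpp.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LlmBackend {
    Ollama,
    LlamaCpp,
}

impl FromStr for LlmBackend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accept the two documented values; reject everything else.
        match s.trim().to_ascii_lowercase().as_str() {
            "ollama" => Ok(LlmBackend::Ollama),
            "llamacpp" => Ok(LlmBackend::LlamaCpp),
            other => Err(format!("unsupported LLM_BACKEND value: {other:?}")),
        }
    }
}

/// Resolve the backend from an optional raw value.
/// ASSUMPTION: an unset variable defaults to Ollama.
pub fn resolve_backend(raw: Option<&str>) -> Result<LlmBackend, String> {
    match raw {
        None => Ok(LlmBackend::Ollama),
        Some(value) => value.parse(),
    }
}
```

At startup this could be called once, for example as
resolve_backend(std::env::var("LLM_BACKEND").ok().as_deref()), so that
chat, describe, and embeddings all see the same value.
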
- Per-request backend whitelist is back to "local"|"hybrid". A request
arriving with backend="llamacpp" is rejected.
- LLM_BACKEND=llamacpp swaps the entire local stack to llama-swap:
chat hits the chat slot, describe hits the vision slot, embeddings
hit the embed slot. Hybrid mode still routes chat to OpenRouter
but uses LLM_BACKEND for the describe pass.
- Drops env vars HYBRID_VISION_BACKEND, LLAMA_SWAP_VISION_MODELS,
EMBEDDING_BACKEND (the last never shipped). Drops the
LlamaCppClient.vision_models allowlist — capability inference now
reports has_vision only for the configured vision_model slot.
- Drops the /insights/llamacpp/models handler. /insights/models is
the single endpoint; it returns Ollama servers under LLM_BACKEND=ollama
and llama-swap slots (from LLAMA_SWAP_ALLOWED_MODELS) under
LLM_BACKEND=llamacpp. Same envelope shape either way.
- New ai::embed_one helper routes embeddings through llama-swap when
LLM_BACKEND=llamacpp (else Ollama). Wires it into the four
insight_generator embedding sites.
- Cross-replay matrix simplifies to its pre-llamacpp shape (local↔local,
hybrid↔hybrid, hybrid→local allowed; local→hybrid rejected).
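
The per-request whitelist and the cross-replay matrix described in the
list above can be sketched as follows. This is an illustration under
stated assumptions, not the code in this change: RequestBackend,
parse_request_backend, and replay_allowed are hypothetical names, and
the error text is invented.

```rust
/// Per-request backend values accepted after this change.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestBackend {
    Local,
    Hybrid,
}

/// Parse the per-request `backend` field. Anything outside the
/// whitelist, including "llamacpp", is rejected.
pub fn parse_request_backend(value: &str) -> Result<RequestBackend, String> {
    match value {
        "local" => Ok(RequestBackend::Local),
        "hybrid" => Ok(RequestBackend::Hybrid),
        other => Err(format!("backend {other:?} is not allowed per request")),
    }
}

/// Cross-replay matrix: may a replay go from `from` to `to`?
pub fn replay_allowed(from: RequestBackend, to: RequestBackend) -> bool {
    use RequestBackend::*;
    match (from, to) {
        // Same-mode replays and hybrid -> local are allowed.
        (Local, Local) | (Hybrid, Hybrid) | (Hybrid, Local) => true,
        // local -> hybrid is rejected.
        (Local, Hybrid) => false,
    }
}
```

Matching on the (from, to) pair keeps the matrix exhaustive: adding a
new variant would fail to compile until every combination is decided.
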
APOLLO_CLIP_API_BASE_URL (falls back to APOLLO_API_BASE_URL),
CLIP_BACKLOG_MAX_PER_TICK, CLIP_ENCODE_CONCURRENCY, and
CLIP_REQUEST_TIMEOUT_SEC — all of which the code already reads.
Apollo's side was documented earlier; this closes the parity gap.
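
The APOLLO_CLIP_API_BASE_URL fallback mentioned above could look
roughly like this. It is a sketch, not the code the project already
has: clip_base_url is a hypothetical name, the lookup closure stands in
for the process environment, and treating an empty value as unset is
an assumption.

```rust
/// Resolve the CLIP base URL: prefer APOLLO_CLIP_API_BASE_URL and
/// fall back to APOLLO_API_BASE_URL when it is missing.
/// ASSUMPTION: an empty or whitespace-only value counts as unset.
pub fn clip_base_url<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    lookup("APOLLO_CLIP_API_BASE_URL")
        .filter(|value| !value.trim().is_empty())
        .or_else(|| lookup("APOLLO_API_BASE_URL"))
}
```

Against the real environment it would be called as
clip_base_url(|key| std::env::var(key).ok()); passing the lookup in as
a closure keeps the function testable without mutating process state.
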
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The face-recognition plan and CLAUDE.md document the full env-var
surface (face detection knobs, Apollo / Ollama / OpenRouter / SMS
integrations, watch intervals, RAG flags), but no example file
existed, so operators copying the project to a new deploy had nothing
to start from. Variables are grouped by section, and optional
integrations are commented out so a minimal copy boots without
external services.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>