5 Commits

Author SHA1 Message Date
Cameron Cordes
fb388c29d7 docs: update env + CLAUDE.md for direct-vision llamacpp + ResolvedBackend
llamacpp models now receive images directly instead of
describe-then-inline. LLAMA_SWAP_VISION_MODEL defaults to the
primary model. Document the ResolvedBackend dispatch pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:03:12 -04:00
Cameron Cordes
be51421b38 ai: collapse llamacpp into LLM_BACKEND env switch
Reverts the per-request backend="llamacpp" value. Chat/vision/embedding
backend is now a deploy-time decision (LLM_BACKEND=ollama|llamacpp),
applied globally across chat, vision describe, and embeddings — so
embedding vectors stay in one space across the index.

- Per-request backend whitelist back to "local"|"hybrid". A request
  arriving with backend="llamacpp" is rejected.
- LLM_BACKEND=llamacpp swaps the entire local stack to llama-swap:
  chat hits the chat slot, describe hits the vision slot, embeddings
  hit the embed slot. Hybrid mode still routes chat to OpenRouter
  but uses LLM_BACKEND for the describe pass.
- Drops env vars HYBRID_VISION_BACKEND, LLAMA_SWAP_VISION_MODELS,
  EMBEDDING_BACKEND (the last never shipped). Drops the
  LlamaCppClient.vision_models allowlist — capability inference now
  reports has_vision only for the configured vision_model slot.
- Drops the /insights/llamacpp/models handler. /insights/models is
  the single endpoint; returns Ollama servers under LLM_BACKEND=ollama
  and llama-swap slots (from LLAMA_SWAP_ALLOWED_MODELS) under
  LLM_BACKEND=llamacpp. Same envelope shape either way.
- New ai::embed_one helper routes embeddings through llama-swap when
  LLM_BACKEND=llamacpp (else Ollama). Wires it into the four
  insight_generator embedding sites.
- Cross-replay matrix simplifies to pre-llamacpp shape (local↔local,
  hybrid↔hybrid, hybrid→local allowed; local→hybrid rejected).
2026-05-21 11:36:58 -04:00
Cameron Cordes
d14df63f19 env.example: document LLAMA_SWAP_* + HYBRID_VISION_BACKEND vars
Mirrors the section added to CLAUDE.md so deploys can opt into the
llamacpp backend from the template alone.
2026-05-20 17:54:08 -04:00
Cameron Cordes
ee2ed3005b clip-search: document env knobs in .env.example
APOLLO_CLIP_API_BASE_URL (falls back to APOLLO_API_BASE_URL),
CLIP_BACKLOG_MAX_PER_TICK, CLIP_ENCODE_CONCURRENCY, and
CLIP_REQUEST_TIMEOUT_SEC — all of which the code already reads.
Apollo's side was documented earlier; this closes the parity gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:10:52 -04:00
Cameron Cordes
675b4a4849 faces: add .env.example template covering all documented env vars
The face-recognition plan and CLAUDE.md document the full env-var
surface (face detection knobs, Apollo / Ollama / OpenRouter / SMS
integrations, watch intervals, RAG flags), but no example file
existed — operators copying the project to a new deploy had nothing
to start from. Group by section, comment out optional integrations
so a minimal copy boots without external services.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 13:51:45 +00:00
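
The rules described in commit be51421b38 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the type and function names (LocalBackend, RequestBackend, parse_request_backend, replay_allowed) are hypothetical, and the sketch assumes that "a→b" in the cross-replay matrix means an insight produced under backend a being replayed under backend b. Only the accepted string values and the allowed/rejected pairings come from the commit message.

```rust
use std::str::FromStr;

/// Deploy-time backend for the local stack, chosen via LLM_BACKEND.
/// (Hypothetical type name; the commit only names the env values.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LocalBackend {
    Ollama,
    LlamaCpp,
}

impl FromStr for LocalBackend {
    type Err = String;

    // Accepts only the two values named in the commit message.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ollama" => Ok(LocalBackend::Ollama),
            "llamacpp" => Ok(LocalBackend::LlamaCpp),
            other => Err(format!("unsupported LLM_BACKEND value: {}", other)),
        }
    }
}

/// Per-request backend; "llamacpp" is deliberately not a member.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestBackend {
    Local,
    Hybrid,
}

/// Hypothetical validator for the per-request whitelist.
pub fn parse_request_backend(s: &str) -> Result<RequestBackend, String> {
    match s {
        "local" => Ok(RequestBackend::Local),
        "hybrid" => Ok(RequestBackend::Hybrid),
        other => Err(format!("backend {:?} is not accepted per request", other)),
    }
}

/// Hypothetical check for the cross-replay matrix. Assumes "a→b" means
/// an insight produced under `a` being replayed under `b`.
pub fn replay_allowed(from: RequestBackend, to: RequestBackend) -> bool {
    // Only local→hybrid is rejected; every other pairing is allowed.
    !matches!((from, to), (RequestBackend::Local, RequestBackend::Hybrid))
}
```

Under these assumptions, a deploy would parse LLM_BACKEND once at startup (for example from std::env::var) and apply the result to chat, vision describe, and embeddings alike, while each incoming request is validated separately with the narrower per-request whitelist.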