Make the embedding model swappable via env for A/B testing

Trialing Qwen3-Embedding-0.6B (1024-dim, instruct-prefixed queries) against nomic required code changes at every hardcoded seam; now it's a config flip plus a reembed_embeddings run.

- EMBEDDING_DIM env (default 768) replaces every hardcoded dim check: daily summary / calendar / search / location DAOs, Ollama batch validation, reembed_embeddings
- entities gains the dim guard it never had: a wrong-dim vector silently kills dedup/recall (cosine over mismatched lengths is 0), so store None and warn instead
- embed_query / embed_document split with EMBED_QUERY_PREFIX / EMBED_DOCUMENT_PREFIX (literal \n expanded): retrieval models treat the two sides differently. nomic wants search_query:/search_document:, Qwen3 wants Instruct:...\nQuery: on queries only. All query-side call sites and all corpus writers now declare their side.
- document the contract in CLAUDE.md: change the model or any of these vars → re-run reembed_embeddings or search is garbage

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
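
For readers without the full diff at hand, the prefix expansion and dimension guard described above might look roughly like the sketch below. It is illustrative only and is not the code from this commit: the helper names (`expand_prefix`, `prefix_from_env`, `embedding_dim_from_env`, `with_prefix`, `guard_dim`) are hypothetical, and only the env var names and the default of 768 come from the message.

```rust
use std::env;

/// Expand the two-character sequence `\n` (backslash + n), as it would be
/// typed in a .env file, into a real newline. Hypothetical helper name.
fn expand_prefix(raw: &str) -> String {
    raw.replace("\\n", "\n")
}

/// Read a prefix such as EMBED_QUERY_PREFIX; an unset variable means
/// "no prefix" (assumed behaviour for this sketch).
fn prefix_from_env(var: &str) -> String {
    env::var(var).map(|v| expand_prefix(&v)).unwrap_or_default()
}

/// Read EMBEDDING_DIM, falling back to 768 when unset or unparsable.
fn embedding_dim_from_env() -> usize {
    env::var("EMBEDDING_DIM")
        .ok()
        .and_then(|v| v.trim().parse().ok())
        .unwrap_or(768)
}

/// Build the text actually sent to the embedding model for one side
/// (query or document) by prepending that side's prefix.
fn with_prefix(prefix: &str, text: &str) -> String {
    format!("{prefix}{text}")
}

/// Keep a vector only if it has the expected length; otherwise warn and
/// return None so a wrong-dim vector is never stored.
fn guard_dim(v: Vec<f32>, expected: usize) -> Option<Vec<f32>> {
    if v.len() == expected {
        Some(v)
    } else {
        eprintln!(
            "embedding has {} dims, expected {}; storing None",
            v.len(),
            expected
        );
        None
    }
}
```

In this sketch a query-side caller would pass `prefix_from_env("EMBED_QUERY_PREFIX")` to `with_prefix`, and a corpus writer would pass `prefix_from_env("EMBED_DOCUMENT_PREFIX")`, so each call site states which side it is on.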
2026-06-11 21:40:40 -04:00
parent b1493f5aca
commit efd05db523
12 changed files with 159 additions and 67 deletions
@@ -645,6 +645,14 @@ OPENROUTER_APP_TITLE=ImageApi                  # Optional attribution header
 # re-embedding — mixed vector spaces break similarity search.
 LLM_BACKEND=ollama

+# Embedding model contract. Corpus and queries must be embedded by the same
+# model with matching prefixes — after changing the embed model or any of
+# these, run `cargo run --bin reembed_embeddings` (all tables) or search is
+# garbage. Prefix values may contain a literal \n (expanded to a newline).
+EMBEDDING_DIM=768           # 768 = nomic-embed-text v1.5; 1024 = Qwen3-Embedding-0.6B
+EMBED_QUERY_PREFIX=         # nomic: "search_query: " | Qwen3: "Instruct: <task>\nQuery: "
+EMBED_DOCUMENT_PREFIX=      # nomic: "search_document: " | Qwen3: leave empty
+
 # llama.cpp / llama-swap (used when LLM_BACKEND=llamacpp). OpenAI-compatible
 # proxy hosting one or more llama-server processes. Chat models receive
 # images directly via content-parts (all models assumed vision-capable).