Make the embedding model swappable via env for A/B testing

Trialing Qwen3-Embedding-0.6B (1024-dim, instruct-prefixed queries)
against nomic required code changes at every hardcoded seam; now it's a
config flip plus a reembed_embeddings run.

- EMBEDDING_DIM env (default 768) replaces every hardcoded dim check:
  daily summary / calendar / search / location DAOs, Ollama batch
  validation, reembed_embeddings
- entities gains the dim guard it never had — a wrong-dim vector
  silently kills dedup/recall (cosine over mismatched lengths is 0),
  so store None and warn instead
- embed_query / embed_document split with EMBED_QUERY_PREFIX /
  EMBED_DOCUMENT_PREFIX (literal \n expanded): retrieval models treat
  the two sides differently — nomic wants search_query:/search_document:,
  Qwen3 wants Instruct:...\nQuery: on queries only. All query-side
  call sites and all corpus writers now declare their side.
- document the contract in CLAUDE.md: after changing the model or any of
  these vars, re-run reembed_embeddings; otherwise queries are compared
  against vectors from the old model/prefix and search is garbage
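
A rough sketch of the config surface described in the bullets above, using
only the standard library. The helper names parse_dim, expand_prefix,
with_prefix and guard_dim are hypothetical and are not taken from the
changed files; only the EMBEDDING_DIM variable, its 768 default, the
literal \n expansion and the store-None-and-warn behaviour come from this
message. How the real code reads and caches these values is an assumption.

```rust
use std::env;

/// Hypothetical helper: parse an EMBEDDING_DIM-style value.
/// Falls back to 768 (the default named in the commit message) when the
/// value is missing, unparsable, or zero.
fn parse_dim(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&d| d > 0)
        .unwrap_or(768)
}

/// Read the dimension from the environment (assumed to be cheap enough to
/// call per check; the real code may cache it).
fn embedding_dim() -> usize {
    parse_dim(env::var("EMBEDDING_DIM").ok().as_deref())
}

/// Expand a literal backslash + 'n' (two characters, as typed in an env
/// file) into a real newline.
fn expand_prefix(raw: &str) -> String {
    raw.replace("\\n", "\n")
}

/// Prepend the side-specific prefix (query or document) to the text.
fn with_prefix(prefix: &str, text: &str) -> String {
    format!("{}{}", expand_prefix(prefix), text)
}

/// Keep the vector only if its length matches the expected dimension;
/// otherwise warn on stderr and return None so nothing wrong-sized is stored.
fn guard_dim(emb: Vec<f32>, expected: usize) -> Option<Vec<f32>> {
    if emb.len() == expected {
        Some(emb)
    } else {
        eprintln!(
            "warning: got {}-dim embedding, expected {}; storing None",
            emb.len(),
            expected
        );
        None
    }
}
```

With a made-up prefix value such as "Instruct: find notes\nQuery: " (the
instruction text is illustrative only), with_prefix yields a two-line query
string, while an empty document prefix leaves corpus text unchanged. That
matches the "queries only" behaviour described for Qwen3.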

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Cameron Cordes
2026-06-11 21:40:40 -04:00
parent b1493f5aca
commit efd05db523
12 changed files with 159 additions and 67 deletions
+8 -7
@@ -141,7 +141,7 @@ async fn embed_with_truncation(llm: &LocalLlm, text: &str) -> Result<(Vec<f32>,
     let mut text = text.to_string();
     let mut truncated = false;
     loop {
-        match llm.embed(&text).await {
+        match llm.embed_document(&text).await {
             Ok(emb) => return Ok((emb, truncated)),
             Err(e)
                 if e.to_string().contains("too large to process") && text.chars().count() > 64 =>
@@ -194,14 +194,15 @@ async fn reembed_table(
         }
     };
-    // The whole pipeline (DAO checks, stored corpora) assumes 768 dims.
-    // A different dim means the active backend is not serving a
-    // nomic-compatible model — stop rather than corrupt the table.
+    // The whole pipeline (DAO checks, stored corpora) assumes
+    // EMBEDDING_DIM dims. A mismatch means the active embed slot is not
+    // serving the configured model — stop rather than corrupt the table.
     anyhow::ensure!(
-        new_emb.len() == 768,
-        "backend returned {}-dim embedding (expected 768) — '{}' is not \
-         serving a nomic-embed-text-v1.5-compatible model",
+        new_emb.len() == image_api::ai::embedding_dim(),
+        "backend returned {}-dim embedding (expected {}) — '{}' does not \
+         match the configured EMBEDDING_DIM",
         new_emb.len(),
+        image_api::ai::embedding_dim(),
         llm.embedding_model_version()
     );