Files
ImageApi/migrations/2026-05-14-000000_add_clip_embedding/up.sql
Cameron Cordes 8d9e76cf15 clip-search: migration + client + probe binary
Probe-phase scaffolding for CLIP semantic search. Adds the column
that will hold per-photo embeddings, the HTTP client to Apollo's
inference service, and a throwaway probe binary so we can eyeball
search-result quality on the live library before building the
persistence layer (backlog drain, /photos/search endpoint, UI).

- migrations/2026-05-14-000000_add_clip_embedding/ — adds
  image_exif.clip_embedding (BLOB) and clip_model_version (TEXT),
  plus a partial index on (clip_embedding IS NULL AND content_hash
  IS NOT NULL) for the future backfill drain.
- src/database/models.rs — extends ImageExif struct to match.
- src/ai/clip_client.rs — encode_image / encode_text / health,
  same Permanent/Transient/Disabled taxonomy as face_client.
- src/bin/probe_clip_search.rs — --query <q> --library N --limit M
  --top K. Encodes a sample and prints top-K cosine similarities.
  No DB writes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:10:52 -04:00

28 lines
1.4 KiB
SQL
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
-- CLIP semantic photo search: store a per-photo image embedding so
-- text queries can rerank against the live library via cosine
-- similarity. Apollo encodes the bytes via its CLIP service; ImageApi
-- writes the resulting blob here.
--
-- `clip_embedding` is the raw little-endian float32 buffer of an
-- L2-normalized vector (dim depends on the model — 768 bytes×4 for
-- ViT-L/14, 512 bytes×4 for ViT-B/32). Apollo always returns the
-- normalized form so the search-time dot product reduces to a plain
-- cosine similarity.
--
-- `clip_model_version` echoes the upstream `APOLLO_CLIP_MODEL` (e.g.
-- "ViT-L/14"). A model swap shouldn't silently mix geometries — the
-- backfill drain will re-eligibilize rows whose stored model_version
-- differs from the live engine's, and the search route refuses to
-- mix rows from two model_versions in the same response.
ALTER TABLE image_exif ADD COLUMN clip_embedding BLOB;
ALTER TABLE image_exif ADD COLUMN clip_model_version TEXT;
-- Partial index for the backfill drain. Mirrors the shape of
-- `idx_image_exif_date_backfill`: candidate rows are those with a
-- known content_hash (so we don't race the unhashed backlog) but no
-- embedding yet. SELECT cost stays O(missing rows) instead of full
-- table scan once the column is mostly populated.
CREATE INDEX IF NOT EXISTS idx_image_exif_clip_backfill
ON image_exif (id)
WHERE clip_embedding IS NULL AND content_hash IS NOT NULL;