faces: phase 4 — people-tag bootstrap + auto-bind on detection

Wires the existing string people-tags into the new persons table and
auto-binds new detections to a same-named person when the photo carries
exactly one matching tag. ImageApi has no notion of which tags are
people-tags today (purely a user mental model), so this is operator-
confirmed: the suggester surfaces candidates with a heuristic flag, the
operator confirms, then bootstrap creates persons rows. Auto-bind
follows on every detection thereafter.

New endpoints:
  GET  /tags/people-bootstrap-candidates
       Per case-insensitive name group: display name (most-frequent
       capitalization), normalized lowercase, summed usage_count,
       looks_like_person heuristic flag, already_exists check against
       the persons table. Sorted persons-likely-first then by count.
  POST /persons/bootstrap
       Body: {names: [string]}. Idempotent — pre-fetches the existing-
       name set so a duplicate request reports per-row "already exists"
       instead of 409-ing each insert. Created rows get
       created_from_tag=true; failed rows surface in `skipped` with a
       reason.

looks_like_person heuristic — conservative on purpose because the
operator confirms in the UI:
  - 1–2 whitespace-separated words
  - Each word starts uppercase, no digits anywhere
  - Single-word names not on a small denylist (cat, christmas, beach,
    sunset, untagged, ...). Two-word names skip the denylist so
    "Sarah Smith" is never false-rejected.

FaceDao additions:
  - find_persons_by_names_ci — bulk lowercase-name → person_id lookup
    via sql_query (Diesel's BoxedSelectStatement + LOWER() doesn't
    play well with the type system).
  - person_reference_embedding — L2-normalized mean of a person's
    detected embeddings, *filtered by model_version* so a future
    buffalo_xl row can never contaminate an in-flight buffalo_l auto-
    bind decision. Returns None when the person has no faces yet.
  - assign_face_to_person — sets face_detections.person_id and, only
    when persons.cover_face_id is NULL, claims this face as cover. The
    UI's hand-picked cover survives later auto-binds.
  - decode_embedding_bytes / cosine_similarity helpers — pub(crate)
    so face_watch can decode the wire bytes once and feed them through
    the cosine threshold.

Auto-bind in face_watch::process_one:
  After every successful detect, for each newly-stored auto face we
  pull the photo's tags, look up which (if any) map to existing
  persons, and:
    - skip when zero or multiple distinct persons are matched
      (multi-match is genuinely ambiguous; cluster suggester handles it)
    - on first face for a person: bind unconditionally so bootstrap can
      ever produce a usable reference
    - thereafter: bind iff cosine(new_emb, person_ref) >=
      FACE_AUTOBIND_MIN_COS (default 0.4, env-tunable to 0..=1)
  The reference embedding comes from person_reference_embedding under
  the same model_version as the candidate, so a model upgrade never
  silently re-anchors a person's centroid.

Plumbing: watch_files now constructs its own SqliteTagDao alongside the
other watcher DAOs and threads it through process_new_files →
run_face_detection_pass → process_one. The handler-side TagDao
registration in main.rs already covers bootstrap_candidates_handler;
no extra app_data wiring needed.

Tests: 8 new (faces.rs):
  - looks_like_person accepts/rejects/two-word-skips-denylist (3)
  - cosine_similarity on identical / orthogonal / opposite / mismatch /
    zero / empty inputs
  - decode_embedding_bytes round-trip + size validation
  - find_persons_by_names_ci groups case + handles empty input
  - person_reference_embedding filters by model_version (buffalo_l ref
    must not include buffalo_xl rows)
  - assign_face_to_person sets cover when unset, doesn't overwrite

cargo test --lib: 179 / 0; fmt + clippy clean for new code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Cameron Cordes
2026-04-29 18:55:01 +00:00
parent f985a0d658
commit 1859399759
3 changed files with 997 additions and 49 deletions

View File

@@ -1827,6 +1827,12 @@ fn watch_files(
let face_dao = Arc::new(Mutex::new(
Box::new(faces::SqliteFaceDao::new()) as Box<dyn faces::FaceDao>
));
// tag_dao for the watcher's auto-bind path. Independent of the
// request-handler tag_dao instance — both end up pointing at the
// same SQLite file via SqliteTagDao::default().
let watcher_tag_dao = Arc::new(Mutex::new(
Box::new(SqliteTagDao::default()) as Box<dyn tags::TagDao>
));
let mut last_quick_scan = SystemTime::now();
let mut last_full_scan = SystemTime::now();
@@ -1853,6 +1859,7 @@ fn watch_files(
Arc::clone(&exif_dao),
Arc::clone(&preview_dao),
Arc::clone(&face_dao),
Arc::clone(&watcher_tag_dao),
face_client.clone(),
&excluded_dirs,
None,
@@ -1873,6 +1880,7 @@ fn watch_files(
Arc::clone(&exif_dao),
Arc::clone(&preview_dao),
Arc::clone(&face_dao),
Arc::clone(&watcher_tag_dao),
face_client.clone(),
&excluded_dirs,
Some(check_since),
@@ -1922,6 +1930,7 @@ fn process_new_files(
exif_dao: Arc<Mutex<Box<dyn ExifDao>>>,
preview_dao: Arc<Mutex<Box<dyn PreviewDao>>>,
face_dao: Arc<Mutex<Box<dyn faces::FaceDao>>>,
tag_dao: Arc<Mutex<Box<dyn tags::TagDao>>>,
face_client: crate::ai::face_client::FaceClient,
excluded_dirs: &[String],
modified_since: Option<SystemTime>,
@@ -2112,6 +2121,7 @@ fn process_new_files(
excluded_dirs,
&face_client,
Arc::clone(&face_dao),
Arc::clone(&tag_dao),
candidates,
);
}