Files
ImageApi/src/face_watch.rs
Cameron Cordes 1859399759 faces: phase 4 — people-tag bootstrap + auto-bind on detection
Wires the existing string people-tags into the new persons table and
auto-binds new detections to a same-named person when the photo carries
exactly one matching tag. ImageApi has no notion of which tags are
people-tags today (purely a user mental model), so this is operator-
confirmed: the suggester surfaces candidates with a heuristic flag, the
operator confirms, then bootstrap creates persons rows. Auto-bind
follows on every detection thereafter.

New endpoints:
  GET  /tags/people-bootstrap-candidates
       Per case-insensitive name group: display name (most-frequent
       capitalization), normalized lowercase, summed usage_count,
       looks_like_person heuristic flag, already_exists check against
       the persons table. Sorted persons-likely-first then by count.
  POST /persons/bootstrap
       Body: {names: [string]}. Idempotent — pre-fetches the existing-
       name set so a duplicate request reports per-row "already exists"
       instead of 409-ing each insert. Created rows get
       created_from_tag=true; failed rows surface in `skipped` with a
       reason.
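The idempotency contract of POST /persons/bootstrap reduces to a pure partition over the pre-fetched name set. A minimal sketch (illustrative only; the real handler also writes created_from_tag and carries a per-row reason for every skipped entry):

```rust
use std::collections::HashSet;

/// Split a bootstrap request into rows to create and rows to skip.
/// `existing` is the pre-fetched lowercase-name set from the persons
/// table, so a duplicate request reports "already exists" per row
/// instead of failing inserts one by one.
fn partition_bootstrap(
    requested: &[&str],
    existing: &HashSet<String>,
) -> (Vec<String>, Vec<(String, &'static str)>) {
    let mut to_create = Vec::new();
    let mut skipped = Vec::new();
    for name in requested {
        if existing.contains(&name.to_lowercase()) {
            skipped.push((name.to_string(), "already exists"));
        } else {
            to_create.push(name.to_string());
        }
    }
    (to_create, skipped)
}
```

Running the same body twice just moves every name from `to_create` to `skipped`; no insert ever 409s.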

looks_like_person heuristic — conservative on purpose because the
operator confirms in the UI:
  - 1–2 whitespace-separated words
  - Each word starts uppercase, no digits anywhere
  - Single-word names not on a small denylist (cat, christmas, beach,
    sunset, untagged, ...). Two-word names skip the denylist so
    "Sarah Smith" is never false-rejected.
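The rules above can be sketched as a small predicate (the denylist here is a stand-in for the real one, which is longer):

```rust
/// Conservative people-tag heuristic: 1–2 capitalized words, no digits,
/// single words checked against a denylist, two-word names exempt from it.
fn looks_like_person(name: &str) -> bool {
    const DENYLIST: &[&str] = &["cat", "christmas", "beach", "sunset", "untagged"];
    let words: Vec<&str> = name.split_whitespace().collect();
    if words.is_empty() || words.len() > 2 {
        return false;
    }
    // Each word starts uppercase, no digits anywhere.
    let shape_ok = words.iter().all(|w| {
        w.chars().next().is_some_and(|c| c.is_uppercase())
            && !w.chars().any(|c| c.is_ascii_digit())
    });
    if !shape_ok {
        return false;
    }
    // Two-word names skip the denylist so "Sarah Smith" is never rejected.
    words.len() == 2 || !DENYLIST.contains(&name.to_lowercase().as_str())
}
```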

FaceDao additions:
  - find_persons_by_names_ci — bulk lowercase-name → person_id lookup
    via sql_query (Diesel's BoxedSelectStatement + LOWER() doesn't
    play well with the type system).
  - person_reference_embedding — L2-normalized mean of a person's
    detected embeddings, *filtered by model_version* so a future
    buffalo_xl row can never contaminate an in-flight buffalo_l auto-
    bind decision. Returns None when the person has no faces yet.
  - assign_face_to_person — sets face_detections.person_id and, only
    when persons.cover_face_id is NULL, claims this face as cover. The
    UI's hand-picked cover survives later auto-binds.
  - decode_embedding_bytes / cosine_similarity helpers — pub(crate)
    so face_watch can decode the wire bytes once and feed them through
    the cosine threshold.
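The embedding helpers can be sketched as below. Assumptions flagged: the wire format is taken to be little-endian f32 bytes, and the model_version filter is elided here because the real `person_reference_embedding` applies it in SQL before these math steps run:

```rust
/// Decode raw embedding bytes into f32s; None on empty or misaligned input.
fn decode_embedding_bytes(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.is_empty() || bytes.len() % 4 != 0 {
        return None; // size validation: must be a whole number of f32s
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Cosine similarity; mismatched, empty, or zero-norm inputs score 0.0
/// rather than panicking.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.is_empty() || a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// L2-normalized mean of a person's embeddings (all assumed same length);
/// None when the person has no faces yet or the mean is degenerate.
fn reference_embedding(embs: &[Vec<f32>]) -> Option<Vec<f32>> {
    let first = embs.first()?;
    let mut mean = vec![0.0f32; first.len()];
    for e in embs {
        for (m, v) in mean.iter_mut().zip(e) {
            *m += v / embs.len() as f32;
        }
    }
    let norm: f32 = mean.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return None;
    }
    Some(mean.iter().map(|x| x / norm).collect())
}
```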

Auto-bind in face_watch::process_one:
  After every successful detect, for each newly-stored auto face we
  pull the photo's tags, look up which (if any) map to existing
  persons, and:
    - skip when zero or multiple distinct persons are matched
      (multi-match is genuinely ambiguous; cluster suggester handles it)
    - on first face for a person: bind unconditionally, since otherwise
      bootstrap could never produce a usable reference
    - thereafter: bind iff cosine(new_emb, person_ref) >=
      FACE_AUTOBIND_MIN_COS (default 0.4, env-tunable to 0..=1)
  The reference embedding comes from person_reference_embedding under
  the same model_version as the candidate, so a model upgrade never
  silently re-anchors a person's centroid.
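The per-face decision reduces to a small pure function. A sketch under stated assumptions: the real code reads the threshold from FACE_AUTOBIND_MIN_COS and fetches the reference under the candidate's model_version before this comparison runs:

```rust
/// First face for a person binds unconditionally (no reference exists
/// yet); every later face must clear the cosine threshold.
fn should_bind(reference: Option<&[f32]>, candidate: &[f32], threshold: f32) -> bool {
    match reference {
        None => true, // nothing to compare against; seed the person
        Some(r) => cosine(candidate, r) >= threshold,
    }
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```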

Plumbing: watch_files now constructs its own SqliteTagDao alongside the
other watcher DAOs and threads it through process_new_files →
run_face_detection_pass → process_one. The handler-side TagDao
registration in main.rs already covers bootstrap_candidates_handler;
no extra app_data wiring needed.

Tests: 8 new (faces.rs):
  - looks_like_person accepts/rejects/two-word-skips-denylist (3)
  - cosine_similarity on identical / orthogonal / opposite / mismatch /
    zero / empty inputs
  - decode_embedding_bytes round-trip + size validation
  - find_persons_by_names_ci groups case + handles empty input
  - person_reference_embedding filters by model_version (buffalo_l ref
    must not include buffalo_xl rows)
  - assign_face_to_person sets cover when unset, doesn't overwrite

cargo test --lib: 179 passed / 0 failed; fmt + clippy clean for new code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:55:01 +00:00

//! Face-detection pass for the file watcher.
//!
//! `process_new_files` calls [`run_face_detection_pass`] after the EXIF
//! registration loop. We walk the candidates (images, not yet face-scanned,
//! not excluded by EXCLUDED_DIRS), fan out parallel detect calls to Apollo,
//! and persist the results — detected faces, `no_faces` markers when Apollo
//! found nothing, `failed` markers on permanent decode errors, no marker on
//! transient failures so the next scan retries.
//!
//! The watcher runs in a plain `std::thread`, so we build a short-lived
//! tokio runtime per pass and `block_on` a join of K detect futures. K is
//! configurable via `FACE_DETECT_CONCURRENCY` (default 8). Apollo's
//! threadpool is bounded to 12 workers anyway, so the runs queue
//! server-side; the client-side fan-out is purely about overlapping IO
//! (file read + JSON encode) with someone else's inference.
use crate::ai::face_client::{DetectMeta, FaceClient, FaceDetectError};
use crate::exif;
use crate::faces::{self, FaceDao, InsertFaceDetectionInput};
use crate::file_types;
use crate::libraries::Library;
use crate::memories::PathExcluder;
use crate::tags::TagDao;
use log::{debug, info, warn};
use std::path::Path;
use std::sync::{Arc, Mutex};
use tokio::sync::Semaphore;

/// One file the watcher would like to face-scan. Built by the caller from
/// the EXIF batch (we need `content_hash` to key everything against).
#[derive(Debug, Clone)]
pub struct FaceCandidate {
    pub rel_path: String,
    pub content_hash: String,
}

/// Synchronous entry point. Returns once every candidate has been
/// processed (or definitively skipped). When `face_client.is_enabled()`
/// is false this is a no-op so the watcher can call unconditionally.
pub fn run_face_detection_pass(
    library: &Library,
    excluded_dirs: &[String],
    face_client: &FaceClient,
    face_dao: Arc<Mutex<Box<dyn FaceDao>>>,
    tag_dao: Arc<Mutex<Box<dyn TagDao>>>,
    candidates: Vec<FaceCandidate>,
) {
    if !face_client.is_enabled() {
        return;
    }
    if candidates.is_empty() {
        return;
    }
    let base = Path::new(&library.root_path);
    let filtered = filter_excluded(base, excluded_dirs, candidates, Some(&library.name));
    if filtered.is_empty() {
        return;
    }
    let concurrency: usize = std::env::var("FACE_DETECT_CONCURRENCY")
        .ok()
        .and_then(|s| s.parse().ok())
        .filter(|n: &usize| *n > 0)
        .unwrap_or(8);
    info!(
        "face_watch: running detection on {} candidates (library '{}', concurrency {})",
        filtered.len(),
        library.name,
        concurrency
    );
    // Per-pass tokio runtime. The watcher thread isn't in any pre-existing
    // async context — building one here keeps the rest of the watcher
    // sync-only. Worker count is small; the parallelism we care about is
    // task-level (semaphore) not thread-level.
    let rt = match tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .enable_all()
        .build()
    {
        Ok(rt) => rt,
        Err(e) => {
            warn!("face_watch: failed to build tokio runtime: {e}");
            return;
        }
    };
    let library_id = library.id;
    let library_root = library.root_path.clone();
    rt.block_on(async move {
        let sem = Arc::new(Semaphore::new(concurrency));
        let mut handles = Vec::with_capacity(filtered.len());
        for cand in filtered {
            let permit_sem = sem.clone();
            let face_client = face_client.clone();
            let face_dao = face_dao.clone();
            let tag_dao = tag_dao.clone();
            let library_root = library_root.clone();
            handles.push(tokio::spawn(async move {
                // acquire_owned would let us drop the permit explicitly
                // before await points; for a one-shot call into Apollo
                // the simpler bounded acquire is enough.
                let _permit = permit_sem.acquire().await.expect("face semaphore");
                process_one(
                    library_id,
                    &library_root,
                    cand,
                    &face_client,
                    face_dao,
                    tag_dao,
                )
                .await;
            }));
        }
        for h in handles {
            // join; per-task panics are logged inside process_one before
            // they reach here, so we don't propagate.
            let _ = h.await;
        }
    });
}

async fn process_one(
    library_id: i32,
    library_root: &str,
    cand: FaceCandidate,
    face_client: &FaceClient,
    face_dao: Arc<Mutex<Box<dyn FaceDao>>>,
    tag_dao: Arc<Mutex<Box<dyn TagDao>>>,
) {
    let abs = Path::new(library_root).join(&cand.rel_path);
    // Read the bytes off disk in a blocking-friendly task. Filesystem IO
    // is sync but cheap; a small spawn_blocking would be overkill.
    let bytes = match read_image_bytes_for_detect(&abs) {
        Ok(b) => b,
        Err(e) => {
            // Don't mark — file may have been moved/renamed mid-scan; let
            // the next pass try again. Future-bug check: a permanently
            // unreadable file would loop forever; we accept that for v1
            // because process_new_files already prunes vanished rows on
            // full scans.
            warn!(
                "face_watch: read failed for {} ({}): {}",
                cand.rel_path, library_id, e
            );
            return;
        }
    };
    let meta = DetectMeta {
        content_hash: cand.content_hash.clone(),
        library_id,
        rel_path: cand.rel_path.clone(),
        orientation: None,
        model_version: None,
    };
    let ctx = opentelemetry::Context::current();
    match face_client.detect(bytes, meta).await {
        Ok(resp) => {
            // Stage 1: persist detections, holding the dao lock only
            // across synchronous DB writes.
            let mut stored_for_autobind: Vec<(i32, Vec<f32>)> = Vec::new();
            {
                let mut dao = face_dao.lock().expect("face dao");
                if resp.faces.is_empty() {
                    if let Err(e) = dao.mark_status(
                        &ctx,
                        library_id,
                        &cand.content_hash,
                        &cand.rel_path,
                        "no_faces",
                        &resp.model_version,
                    ) {
                        warn!(
                            "face_watch: mark no_faces failed for {}: {:?}",
                            cand.rel_path, e
                        );
                    }
                    debug!(
                        "face_watch: {} → no faces (model {})",
                        cand.rel_path, resp.model_version
                    );
                } else {
                    let face_count = resp.faces.len();
                    for face in &resp.faces {
                        let emb = match face.decode_embedding() {
                            Ok(b) => b,
                            Err(e) => {
                                warn!("face_watch: bad embedding for {}: {:?}", cand.rel_path, e);
                                continue;
                            }
                        };
                        // Decode the f32 vector once for auto-bind comparison.
                        let emb_floats = faces::decode_embedding_bytes(&emb);
                        match dao.store_detection(
                            &ctx,
                            InsertFaceDetectionInput {
                                library_id,
                                content_hash: cand.content_hash.clone(),
                                rel_path: cand.rel_path.clone(),
                                bbox: Some((face.bbox.x, face.bbox.y, face.bbox.w, face.bbox.h)),
                                embedding: Some(emb),
                                confidence: Some(face.confidence),
                                source: "auto".to_string(),
                                person_id: None,
                                status: "detected".to_string(),
                                model_version: resp.model_version.clone(),
                            },
                        ) {
                            Ok(row) => {
                                if let Some(floats) = emb_floats {
                                    stored_for_autobind.push((row.id, floats));
                                }
                            }
                            Err(e) => warn!(
                                "face_watch: store_detection failed for {}: {:?}",
                                cand.rel_path, e
                            ),
                        }
                    }
                    info!(
                        "face_watch: {} → {} face(s) ({}ms, {})",
                        cand.rel_path, face_count, resp.duration_ms, resp.model_version
                    );
                }
            }
            // Stage 2: auto-bind newly-stored faces against same-named
            // people-tags. Done outside the dao lock so the lookups don't
            // serialize with concurrent detect tasks.
            if !stored_for_autobind.is_empty() {
                try_auto_bind(
                    &ctx,
                    &cand.rel_path,
                    &resp.model_version,
                    stored_for_autobind,
                    &tag_dao,
                    &face_dao,
                );
            }
        }
        Err(FaceDetectError::Permanent(e)) => {
            warn!(
                "face_watch: permanent failure on {}: {} — marking failed",
                cand.rel_path, e
            );
            let mut dao = face_dao.lock().expect("face dao");
            // model_version is best-effort here — the engine that rejected
            // the bytes may not have echoed one. Empty string is fine; this
            // row is purely a "don't retry" sentinel.
            if let Err(e) = dao.mark_status(
                &ctx,
                library_id,
                &cand.content_hash,
                &cand.rel_path,
                "failed",
                "",
            ) {
                warn!(
                    "face_watch: mark failed errored for {}: {:?}",
                    cand.rel_path, e
                );
            }
        }
        Err(FaceDetectError::Transient(e)) => {
            // Don't mark anything; next scan tick retries naturally.
            // Demoted to debug because OOM and engine-not-ready are noisy
            // and self-resolving.
            debug!(
                "face_watch: transient on {}: {} (will retry next pass)",
                cand.rel_path, e
            );
        }
        Err(FaceDetectError::Disabled) => {
            // Caller already checked is_enabled(); this branch is defensive.
        }
    }
}

/// Auto-bind newly-detected faces to a same-named person, when a tag on the
/// photo unambiguously identifies one. Driven by `FACE_AUTOBIND_MIN_COS`
/// (default 0.4): the new face's embedding must reach this cosine
/// similarity against the L2-normalized mean of the person's existing
/// faces. The first face for a person binds unconditionally — there's
/// nothing to compare against, and the alternative ("never bind without
/// a reference") would mean bootstrap never kicks off.
///
/// Multi-match (the photo carries tags for two different known persons)
/// is intentionally a no-op — we can't tell which face is which without
/// additional matching. Those faces stay unassigned for the cluster
/// suggester (Phase 6) to handle.
fn try_auto_bind(
    ctx: &opentelemetry::Context,
    rel_path: &str,
    model_version: &str,
    new_faces: Vec<(i32, Vec<f32>)>, // (face_id, decoded embedding)
    tag_dao: &Arc<Mutex<Box<dyn TagDao>>>,
    face_dao: &Arc<Mutex<Box<dyn FaceDao>>>,
) {
    // 1. Pull the photo's tags.
    let tag_names: Vec<String> = {
        let mut td = tag_dao.lock().expect("tag dao");
        match td.get_tags_for_path(ctx, rel_path) {
            Ok(tags) => tags.into_iter().map(|t| t.name).collect(),
            Err(e) => {
                warn!(
                    "face_watch: get_tags_for_path failed for {}: {:?}",
                    rel_path, e
                );
                return;
            }
        }
    };
    if tag_names.is_empty() {
        return;
    }
    // 2. Find tags that map to existing persons (case-insensitive).
    let person_for_tag: std::collections::HashMap<String, i32> = {
        let mut fd = face_dao.lock().expect("face dao");
        match fd.find_persons_by_names_ci(ctx, &tag_names) {
            Ok(m) => m,
            Err(e) => {
                warn!(
                    "face_watch: find_persons_by_names_ci failed for {}: {:?}",
                    rel_path, e
                );
                return;
            }
        }
    };
    // 3. Multi-match: ambiguous, skip. Single match: candidate person.
    let unique_person_ids: std::collections::HashSet<i32> =
        person_for_tag.values().copied().collect();
    if unique_person_ids.len() != 1 {
        if !unique_person_ids.is_empty() {
            debug!(
                "face_watch: {} carries tags for {} different persons; skipping auto-bind",
                rel_path,
                unique_person_ids.len()
            );
        }
        return;
    }
    let person_id = *unique_person_ids.iter().next().expect("nonempty set");
    let threshold: f32 = std::env::var("FACE_AUTOBIND_MIN_COS")
        .ok()
        .and_then(|s| s.parse().ok())
        .filter(|t: &f32| *t >= 0.0 && *t <= 1.0)
        .unwrap_or(0.4);
    // 4. Reference embedding (if any) under the same model_version.
    let reference: Option<Vec<f32>> = {
        let mut fd = face_dao.lock().expect("face dao");
        match fd.person_reference_embedding(ctx, person_id, model_version) {
            Ok(r) => r,
            Err(e) => {
                warn!(
                    "face_watch: person_reference_embedding failed for person {}: {:?}",
                    person_id, e
                );
                return;
            }
        }
    };
    // 5. Bind each new face that meets the criterion. Hold the lock once
    //    for the whole batch; assign_face_to_person uses its own short
    //    transaction internally.
    let mut fd = face_dao.lock().expect("face dao");
    for (face_id, emb) in new_faces {
        let bind = match &reference {
            None => {
                // Person has no faces yet — first one wins so bootstrap
                // can ever produce a usable reference. After this row
                // commits, future faces evaluate against it.
                debug!(
                    "face_watch: auto-binding first face {} → person {} (no reference yet)",
                    face_id, person_id
                );
                true
            }
            Some(ref_vec) => {
                let sim = faces::cosine_similarity(&emb, ref_vec);
                if sim >= threshold {
                    debug!(
                        "face_watch: auto-binding face {} → person {} (cos={:.3} ≥ {:.3})",
                        face_id, person_id, sim, threshold
                    );
                    true
                } else {
                    debug!(
                        "face_watch: leaving face {} unassigned (cos={:.3} < {:.3} for person {})",
                        face_id, sim, threshold, person_id
                    );
                    false
                }
            }
        };
        if bind && let Err(e) = fd.assign_face_to_person(ctx, face_id, person_id) {
            warn!(
                "face_watch: assign_face_to_person failed (face={}, person={}): {:?}",
                face_id, person_id, e
            );
        }
    }
}

/// Drop candidates whose path matches the watcher's `EXCLUDED_DIRS` rules.
/// Pulled out for unit testing — the same `PathExcluder` /memories uses,
/// just applied at the face-detect candidate set instead of the memories
/// listing. Skip @eaDir / .thumbnails / user-defined paths before we burn
/// a detect call (and Apollo's GPU memory) on junk.
pub(crate) fn filter_excluded(
    base: &Path,
    excluded_dirs: &[String],
    candidates: Vec<FaceCandidate>,
    library_name: Option<&str>,
) -> Vec<FaceCandidate> {
    if excluded_dirs.is_empty() {
        return candidates;
    }
    let excluder = PathExcluder::new(base, excluded_dirs);
    candidates
        .into_iter()
        .filter(|c| {
            let abs = base.join(&c.rel_path);
            if excluder.is_excluded(&abs) {
                debug!(
                    "face_watch: skipping excluded path {} (library {})",
                    c.rel_path,
                    library_name.unwrap_or("<unknown>")
                );
                return false;
            }
            true
        })
        .collect()
}

/// Read image bytes for face detection. Insightface (via opencv) can't
/// decode RAW or HEIC — for those we extract the embedded JPEG preview
/// the way the thumbnail pipeline does. Plain JPEG/PNG/WebP/etc. go
/// through a direct read.
pub(crate) fn read_image_bytes_for_detect(path: &Path) -> std::io::Result<Vec<u8>> {
    if file_types::needs_ffmpeg_thumbnail(path)
        && let Some(preview) = exif::extract_embedded_jpeg_preview(path)
    {
        return Ok(preview);
    }
    // Plain read for everything else. RAW/HEIC files without an embedded
    // preview fall through here too; Apollo will then 422 and the caller
    // marks the row failed. That's fine; we tried.
    std::fs::read(path)
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::fs;

    fn cand(rel_path: &str) -> FaceCandidate {
        FaceCandidate {
            rel_path: rel_path.to_string(),
            content_hash: format!("hash-{rel_path}"),
        }
    }

    #[test]
    fn filter_excluded_pattern_drops_dir_components() {
        // A pattern matches a path *component* under base, not a substring.
        // Phase 3 needs this for @eaDir / .thumbnails skipping.
        let tmp = tempfile::tempdir().unwrap();
        let base = tmp.path();
        let candidates = vec![
            cand("photos/a.jpg"),                  // keep
            cand("photos/@eaDir/SYNOPHOTO_THUMB"), // drop (component match)
            cand("photos/eaDir-not-a-thing.jpg"),  // keep (substring, not component)
        ];
        let kept = filter_excluded(base, &["@eaDir".to_string()], candidates, Some("test"));
        let kept_paths: Vec<_> = kept.iter().map(|c| c.rel_path.as_str()).collect();
        assert_eq!(
            kept_paths,
            vec!["photos/a.jpg", "photos/eaDir-not-a-thing.jpg"]
        );
    }

    #[test]
    fn filter_excluded_absolute_dir_drops_subtree() {
        // Absolute (under-base) entries drop the whole subtree.
        let tmp = tempfile::tempdir().unwrap();
        let base = tmp.path();
        let candidates = vec![
            cand("public/a.jpg"),
            cand("private/a.jpg"),
            cand("private/sub/b.jpg"),
        ];
        let kept = filter_excluded(base, &["/private".to_string()], candidates, None);
        let kept_paths: Vec<_> = kept.iter().map(|c| c.rel_path.as_str()).collect();
        assert_eq!(kept_paths, vec!["public/a.jpg"]);
    }

    #[test]
    fn filter_excluded_empty_rules_passes_all() {
        // Skip the PathExcluder build entirely on the common path where
        // EXCLUDED_DIRS is unset — saves an allocation per pass.
        let tmp = tempfile::tempdir().unwrap();
        let base = tmp.path();
        let candidates = vec![cand("a.jpg"), cand("b.jpg")];
        let kept = filter_excluded(base, &[], candidates, None);
        assert_eq!(kept.len(), 2);
    }

    #[test]
    fn read_bytes_passes_through_for_jpeg() {
        // JPEG goes through plain read — we DON'T want to lose orientation
        // metadata or re-encode here; insightface's exif_transpose handles
        // orientation on its end.
        let tmp = tempfile::tempdir().unwrap();
        let path = tmp.path().join("test.jpg");
        let mut buf = Vec::new();
        // Tiny 4x4 grey JPEG — encoded by image crate so we know it round-trips.
        let img = image::DynamicImage::ImageRgb8(image::RgbImage::from_pixel(
            4,
            4,
            image::Rgb([128, 128, 128]),
        ));
        img.write_to(
            &mut std::io::Cursor::new(&mut buf),
            image::ImageFormat::Jpeg,
        )
        .unwrap();
        fs::write(&path, &buf).unwrap();
        let read = read_image_bytes_for_detect(&path).expect("read jpeg");
        assert_eq!(read, buf, "JPEG bytes must pass through verbatim");
    }

    #[test]
    fn read_bytes_falls_back_when_raw_has_no_preview() {
        // A `.nef` file with non-RAW bytes won't have an embedded preview —
        // the helper falls through to plain read rather than refusing. This
        // matches the docstring contract; Apollo will then 422 and we'll
        // mark the row as failed.
        let tmp = tempfile::tempdir().unwrap();
        let path = tmp.path().join("not_really.nef");
        fs::write(&path, b"definitely-not-a-raw-file").unwrap();
        let read = read_image_bytes_for_detect(&path).expect("fallback read");
        assert_eq!(read, b"definitely-not-a-raw-file");
    }
}