multi-library: handoff + orphan GC with two-tick consensus

Branch C of the multi-library data-model rollout. Implements the
operational maintenance pipeline pinned in CLAUDE.md → "Multi-library
data model" / "Library availability and safety". Branches A and B
land first; this branch builds on top.

New module: src/library_maintenance.rs

Three idempotent passes the watcher runs every tick after the
per-library ingest loop:

1. Missing-file scan (per online library)

   For each Online library, load a paginated page of image_exif rows
   (IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE, default 500), stat() each one,
   and delete rows whose source file is NotFound. Permission/IO
   errors are skipped, never deleted. Capped at
   IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK (default 200) per library
   per tick — so a pathological mount that returns NotFound for
   everything can't wipe the table in one cycle. Cursor advances
   across ticks, wraps on partial-page returns, and naturally cycles
   through the entire library over many minutes. Skipped wholesale
   for Stale libraries via the existing probe gate.

2. Back-ref refresh (DB-only)

   For face_detections / tagged_photo / photo_insights: any
   hash-keyed row whose (library_id, rel_path) no longer matches an
   image_exif row, but whose content_hash does, is repointed at a
   surviving image_exif location. Pure SQL with EXISTS guards so
   rows whose hash is fully orphaned are left alone (the orphan GC
   handles those). Idempotent; no availability gate needed.

   This is what makes a recent → archive move invisible to readers:
   when pass 1 retires the lib-A row, pass 2 pivots tags / faces /
   insights to lib-B's surviving path before any client notices.

3. Orphan GC (destructive)

   Hash-keyed derived rows whose content_hash has no image_exif
   referent are GC-eligible. Two-tick consensus: a hash must be
   observed orphaned on two consecutive ticks AND every library must
   be Online for both. A single Stale tick within the window cancels
   all pending deletes (they remain marked but won't be promoted) —
   they're re-evaluated next tick. The pending set lives in
   OrphanGcState (in-memory); a watcher restart resets it, which can
   only delay a delete, never cause one. Hashes that re-appear in
   image_exif between ticks are "revived" from the pending set
   (handles transient share unmount / remount).

Two new ExifDao methods:
  - list_rel_paths_for_library_page(library_id, limit, offset) for
    the paginated missing-file scan.
  - (count_for_library landed in Branch A.)

Watcher wiring (main.rs)

Per-library: missing-file scan inside the existing per-library
loop, after process_new_files, gated by the same probe check that
already protects ingest. After the loop: reconcile (Branch B),
back-ref refresh, then run_orphan_gc. The maintenance connection is
opened once per tick (image_api::database::connect), used by all
three DB-only passes, and dropped at end of tick.

CLAUDE.md gains a "Maintenance pipeline" subsection that describes
the three passes and their interaction with the existing
availability-and-safety policy.

Tests: 225 pass (217 from Branch B + 8 new in library_maintenance
covering back-ref refresh including the fully-orphaned no-op case,
two-tick GC consensus, Stale-tick consensus reset, image_exif
re-appearance revival, multi-table delete, and the
all_libraries_online helper).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Cameron Cordes
2026-05-01 16:27:53 +00:00
parent a0283a6362
commit 263e27e108
6 changed files with 907 additions and 0 deletions

View File

@@ -466,6 +466,18 @@ pub trait ExifDao: Sync + Send {
context: &opentelemetry::Context,
library_id: i32,
) -> Result<i64, DbError>;
/// Paginated rel_path listing for a single library, ordered by id
/// ascending. Used by the missing-file detector to scan a library
/// in capped chunks across consecutive watcher ticks rather than
/// stat()ing every row every minute. Returns `(id, rel_path)`.
fn list_rel_paths_for_library_page(
&mut self,
context: &opentelemetry::Context,
library_id: i32,
limit: i64,
offset: i64,
) -> Result<Vec<(i32, String)>, DbError>;
}
pub struct SqliteExifDao {
@@ -1129,6 +1141,28 @@ impl ExifDao for SqliteExifDao {
})
.map_err(|_| DbError::new(DbErrorKind::QueryError))
}
fn list_rel_paths_for_library_page(
&mut self,
context: &opentelemetry::Context,
library_id_val: i32,
limit: i64,
offset: i64,
) -> Result<Vec<(i32, String)>, DbError> {
trace_db_call(context, "query", "list_rel_paths_for_library_page", |_span| {
use schema::image_exif::dsl::*;
image_exif
.filter(library_id.eq(library_id_val))
.order(id.asc())
.select((id, rel_path))
.limit(limit)
.offset(offset)
.load::<(i32, String)>(self.connection.lock().unwrap().deref_mut())
.map_err(|_| anyhow::anyhow!("Query error"))
})
.map_err(|_| DbError::new(DbErrorKind::QueryError))
}
}
#[cfg(test)]