multi-library: handoff + orphan GC with two-tick consensus

Branch C of the multi-library data-model rollout. Implements the
operational maintenance pipeline pinned in CLAUDE.md → "Multi-library
data model" / "Library availability and safety". Branches A and B
land first; this branch builds on top.

New module: src/library_maintenance.rs

Three idempotent passes the watcher runs every tick after the
per-library ingest loop:

1. Missing-file scan (per online library)

   For each Online library, load a paginated page of image_exif rows
   (IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE, default 500), stat() each one,
   and delete rows whose source file is NotFound. Permission/IO
   errors are skipped, never deleted. Capped at
   IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK (default 200) per library
   per tick — so a pathological mount that returns NotFound for
   everything can't wipe the table in one cycle. Cursor advances
   across ticks, wraps on partial-page returns, and naturally cycles
   through the entire library over many minutes. Skipped wholesale
   for Stale libraries via the existing probe gate.

2. Back-ref refresh (DB-only)

   For face_detections / tagged_photo / photo_insights: any
   hash-keyed row whose (library_id, rel_path) no longer matches an
   image_exif row, but whose content_hash does, is repointed at a
   surviving image_exif location. Pure SQL with EXISTS guards so
   rows whose hash is fully orphaned are left alone (the orphan GC
   handles those). Idempotent; no availability gate needed.

   This is what makes a recent → archive move invisible to readers:
   when pass 1 retires the lib-A row, pass 2 pivots tags / faces /
   insights to lib-B's surviving path before any client notices.

3. Orphan GC (destructive)

   Hash-keyed derived rows whose content_hash has no image_exif
   referent are GC-eligible. Two-tick consensus: a hash must be
   observed orphaned on two consecutive ticks AND every library must
   be Online for both. A single Stale tick within the window cancels
   all pending deletes (they remain marked but won't be promoted) —
   they're re-evaluated next tick. The pending set lives in
   OrphanGcState (in-memory); a watcher restart resets it, which can
   only delay a delete, never cause one. Hashes that re-appear in
   image_exif between ticks are "revived" from the pending set
   (handles transient share unmount / remount).

Two new ExifDao methods:
  - list_rel_paths_for_library_page(library_id, limit, offset) for
    the paginated missing-file scan.
  - (count_for_library landed in Branch A.)

Watcher wiring (main.rs)

Per-library: missing-file scan inside the existing per-library
loop, after process_new_files, gated by the same probe check that
already protects ingest. After the loop: reconcile (Branch B),
back-ref refresh, then run_orphan_gc. The maintenance connection is
opened once per tick (image_api::database::connect), used by all
three DB-only passes, and dropped at end of tick.

CLAUDE.md gains a "Maintenance pipeline" subsection that describes
the three passes and their interaction with the existing
availability-and-safety policy.

Tests: 225 pass (217 from Branch B + 8 new in library_maintenance
covering back-ref refresh including the fully-orphaned no-op case,
two-tick GC consensus, Stale-tick consensus reset, image_exif
re-appearance revival, multi-table delete, and the
all_libraries_online helper).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Cameron Cordes
2026-05-01 16:27:53 +00:00
parent a0283a6362
commit 263e27e108
6 changed files with 907 additions and 0 deletions

View File

@@ -229,6 +229,47 @@ disappearance with no matching appearance is treated as
"unavailable-or-deleted, defer judgment" — it does not re-key any rows
and does not enqueue GC.
**Maintenance pipeline (`src/library_maintenance.rs`).** The watcher
runs three maintenance passes per tick that together implement the
move/handoff and orphan rules:
1. **Missing-file scan** — per online library, paginated. A page of
`image_exif` rows is loaded (`IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE`,
default 500), each row's `(root_path/rel_path)` is `stat()`-ed,
and confirmed-not-found rows are deleted from `image_exif`
(capped at `IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK`, default 200).
Permission/IO errors are skipped, never deleted — only `NotFound`
triggers a deletion. The cursor wraps every time a partial page
comes back, so the whole library is swept across consecutive ticks.
Skipped wholesale for Stale libraries via the per-library probe
gate at the top of the loop iteration.
2. **Back-ref refresh** — DB-only. For `face_detections`,
`tagged_photo`, and `photo_insights`: any hash-keyed row whose
`(library_id, rel_path)` no longer matches an `image_exif` row
*but whose `content_hash` does* is repointed at the surviving
`image_exif` location. Idempotent SQL; no health gate needed.
This is what makes the recent → archive handoff invisible to
read paths: when the missing-file scan retires the lib-A row,
tags/faces/insights pivot to lib-B's path before any user
notices.
3. **Orphan GC** — destructive. Hash-keyed derived rows whose
`content_hash` no longer has any `image_exif` row are eligible.
Two-tick consensus: a hash must be observed orphaned on two
consecutive ticks AND every library must be online for both. A
single Stale tick within the window cancels all pending deletes.
The pending set is held in memory (`OrphanGcState`) — restart
resets it, which only delays a delete, never causes one. Tags,
faces, and insights for orphaned hashes are deleted in one batch
per tick.
A backup library that briefly disappears, then returns within two
ticks, never loses any derived data. A move from lib-A to lib-B
without disappearance flips through pass 1 (lib-A row retired) and
pass 2 (back-refs follow), with pass 3 noting nothing because the
hash is still present in `image_exif` (lib-B's row).
### File Processing Pipeline
**Thumbnail Generation:**