multi-library: handoff + orphan GC with two-tick consensus

Branch C of the multi-library data-model rollout. Implements the
operational maintenance pipeline pinned in CLAUDE.md → "Multi-library
data model" / "Library availability and safety". Branches A and B
land first; this branch builds on top.
New module: src/library_maintenance.rs
Three idempotent passes the watcher runs every tick after the
per-library ingest loop:
1. Missing-file scan (per online library)
For each Online library, load a paginated page of image_exif rows
(IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE, default 500), stat() each one,
and delete rows whose source file is NotFound. Permission/IO
errors are skipped, never deleted. Capped at
IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK (default 200) per library
per tick — so a pathological mount that returns NotFound for
everything can't wipe the table in one cycle. Cursor advances
across ticks, wraps on partial-page returns, and naturally cycles
through the entire library over many minutes. Skipped wholesale
for Stale libraries via the existing probe gate.
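
The scan rules above can be sketched roughly as follows. This is a minimal illustration, not the code in src/library_maintenance.rs: the names Probe, classify, missing_in_page and next_offset are hypothetical, and the offset arithmetic (subtracting deleted rows so that OFFSET pagination does not skip entries) is an assumption about how the cursor might behave.

```rust
use std::io;
use std::path::Path;

/// Outcome of probing one indexed file (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum Probe {
    Present,
    Missing,
    Skipped,
}

/// Only a definite NotFound counts as missing. Permission or other IO
/// errors are skipped so that a flaky mount never triggers a delete.
pub fn classify<T>(result: &io::Result<T>) -> Probe {
    match result {
        Ok(_) => Probe::Present,
        Err(e) if e.kind() == io::ErrorKind::NotFound => Probe::Missing,
        Err(_) => Probe::Skipped,
    }
}

/// Returns the rel_paths from one page that should be deleted, stopping
/// once `budget` (the remaining per-tick delete cap) is used up.
pub fn missing_in_page(root: &Path, page: &[String], budget: usize) -> Vec<String> {
    let mut doomed = Vec::new();
    for rel in page {
        if doomed.len() >= budget {
            break;
        }
        if classify(&std::fs::metadata(root.join(rel))) == Probe::Missing {
            doomed.push(rel.clone());
        }
    }
    doomed
}

/// Cursor advance: a short page means the end of the table was reached,
/// so wrap to 0. Otherwise move forward by the rows that still exist
/// (assumption: deleted rows shift later rows toward the front).
pub fn next_offset(offset: usize, returned: usize, page_size: usize, deleted: usize) -> usize {
    if returned < page_size {
        0
    } else {
        offset + returned.saturating_sub(deleted)
    }
}
```

With the defaults quoted above, a full page of 500 rows with no deletions moves the cursor from 500 to 1000, and any page shorter than 500 rows sends it back to 0.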
2. Back-ref refresh (DB-only)
For face_detections / tagged_photo / photo_insights: any
hash-keyed row whose (library_id, rel_path) no longer matches an
image_exif row, but whose content_hash does, is repointed at a
surviving image_exif location. Pure SQL with EXISTS guards so
rows whose hash is fully orphaned are left alone (the orphan GC
handles those). Idempotent; no availability gate needed.
This is what makes a recent → archive move invisible to readers:
when pass 1 retires the lib-A row, pass 2 pivots tags / faces /
insights to lib-B's surviving path before any client notices.
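
The repointing rule can be illustrated with an in-memory model. The real pass is described as pure SQL with EXISTS guards, so this is only a sketch of the logic: DerivedRow, Location and refresh_back_refs are hypothetical names, and picking the first surviving location per hash is an assumption (the commit does not say which location wins when several survive).

```rust
use std::collections::{HashMap, HashSet};

/// (library_id, rel_path) of a file.
pub type Location = (i32, String);

/// Minimal stand-in for a hash-keyed derived row (tag, face, or insight).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DerivedRow {
    pub library_id: i32,
    pub rel_path: String,
    pub content_hash: String,
}

/// `exif` models image_exif as (location, content_hash) pairs.
/// Rows whose own location is gone but whose hash survives elsewhere are
/// repointed; rows whose hash is fully orphaned are left untouched.
/// Returns how many rows were repointed.
pub fn refresh_back_refs(exif: &[(Location, String)], rows: &mut [DerivedRow]) -> usize {
    let live: HashSet<&Location> = exif.iter().map(|(loc, _)| loc).collect();

    // First surviving location per hash, in input order (assumption).
    let mut by_hash: HashMap<&str, &Location> = HashMap::new();
    for (loc, hash) in exif {
        by_hash.entry(hash.as_str()).or_insert(loc);
    }

    let mut repointed = 0;
    for row in rows.iter_mut() {
        let here: Location = (row.library_id, row.rel_path.clone());
        if live.contains(&here) {
            continue; // still matches an image_exif row, nothing to do
        }
        if let Some(loc) = by_hash.get(row.content_hash.as_str()) {
            row.library_id = loc.0;
            row.rel_path = loc.1.clone();
            repointed += 1;
        }
    }
    repointed
}
```

Running the function a second time on the same data repoints nothing, which mirrors the idempotence claimed for the SQL pass.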
3. Orphan GC (destructive)
Hash-keyed derived rows whose content_hash has no image_exif
referent are GC-eligible. Two-tick consensus: a hash must be
observed orphaned on two consecutive ticks AND every library must
be Online for both. A single Stale tick within the window cancels
all pending deletes (they remain marked but won't be promoted) —
they're re-evaluated next tick. The pending set lives in
OrphanGcState (in-memory); a watcher restart resets it, which can
only delay a delete, never cause one. Hashes that re-appear in
image_exif between ticks are "revived" from the pending set
(handles transient share unmount / remount).
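
A minimal sketch of the two-tick consensus, under stated assumptions: PendingOrphans is a hypothetical stand-in for OrphanGcState, and a Stale tick is modelled as clearing the pending set outright, which is the conservative reading of "cancels all pending deletes". The actual state handling may differ.

```rust
use std::collections::HashSet;

/// Hashes observed orphaned on the previous all-Online tick.
#[derive(Debug, Default)]
pub struct PendingOrphans {
    pending: HashSet<String>,
}

impl PendingOrphans {
    /// Feeds one tick's observation and returns the hashes that are now
    /// confirmed (orphaned on two consecutive all-Online ticks).
    pub fn tick(&mut self, orphaned_now: &HashSet<String>, all_online: bool) -> Vec<String> {
        if !all_online {
            // Assumption: a Stale tick restarts the two-tick window.
            self.pending.clear();
            return Vec::new();
        }

        // Seen last tick and still orphaned now: safe to delete.
        let mut confirmed: Vec<String> =
            orphaned_now.intersection(&self.pending).cloned().collect();
        confirmed.sort();

        // Newly observed orphans wait one more tick. Hashes that were
        // pending but are no longer orphaned simply drop out ("revived").
        self.pending = orphaned_now.difference(&self.pending).cloned().collect();
        confirmed
    }
}
```

Because the state starts empty, a restart can only postpone a confirmation by one tick, which matches the "delay, never cause" property described above.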
One new ExifDao method:
- list_rel_paths_for_library_page(library_id, limit, offset) for
the paginated missing-file scan.
(count_for_library already landed in Branch A.)
Watcher wiring (main.rs)
Per-library: missing-file scan inside the existing per-library
loop, after process_new_files, gated by the same probe check that
already protects ingest. After the loop: reconcile (Branch B),
back-ref refresh, then run_orphan_gc. The maintenance connection is
opened once per tick (image_api::database::connect), used by all
three DB-only passes, and dropped at end of tick.
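
The per-tick ordering described above might look like this when reduced to a plan of step names. Health, Lib and plan_tick are hypothetical and exist only for illustration; all_libraries_online is named in the test list below, but its signature here is a guess.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Online,
    Stale,
}

pub struct Lib {
    pub id: i32,
    pub health: Health,
}

/// True only when every library passed its probe this tick.
pub fn all_libraries_online(libs: &[Lib]) -> bool {
    libs.iter().all(|l| l.health == Health::Online)
}

/// Lists, in order, the steps one watcher tick would run.
pub fn plan_tick(libs: &[Lib]) -> Vec<String> {
    let mut steps = Vec::new();
    for lib in libs {
        // Probe gate: Stale libraries skip ingest and the missing-file scan.
        if lib.health != Health::Online {
            continue;
        }
        steps.push(format!("process_new_files({})", lib.id));
        steps.push(format!("missing_file_scan({})", lib.id));
    }
    // DB-only passes run once per tick, after the per-library loop.
    steps.push("reconcile".to_string());
    steps.push("back_ref_refresh".to_string());
    steps.push("run_orphan_gc".to_string());
    steps
}
```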
CLAUDE.md gains a "Maintenance pipeline" subsection that describes
the three passes and their interaction with the existing
availability-and-safety policy.
Tests: 225 pass (217 from Branch B + 8 new in library_maintenance
covering back-ref refresh including the fully-orphaned no-op case,
two-tick GC consensus, Stale-tick consensus reset, image_exif
re-appearance revival, multi-table delete, and the
all_libraries_online helper).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
41
CLAUDE.md
@@ -229,6 +229,47 @@ disappearance with no matching appearance is treated as
 "unavailable-or-deleted, defer judgment" — it does not re-key any rows
 and does not enqueue GC.
 
+**Maintenance pipeline (`src/library_maintenance.rs`).** The watcher
+runs three maintenance passes per tick that together implement the
+move/handoff and orphan rules:
+
+1. **Missing-file scan** — per online library, paginated. A page of
+`image_exif` rows is loaded (`IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE`,
+default 500), each row's `(root_path/rel_path)` is `stat()`-ed,
+and confirmed-not-found rows are deleted from `image_exif`
+(capped at `IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK`, default 200).
+Permission/IO errors are skipped, never deleted — only `NotFound`
+triggers a deletion. The cursor wraps every time a partial page
+comes back, so the whole library is swept across consecutive ticks.
+Skipped wholesale for Stale libraries via the per-library probe
+gate at the top of the loop iteration.
+
+2. **Back-ref refresh** — DB-only. For `face_detections`,
+`tagged_photo`, and `photo_insights`: any hash-keyed row whose
+`(library_id, rel_path)` no longer matches an `image_exif` row
+*but whose `content_hash` does* is repointed at the surviving
+`image_exif` location. Idempotent SQL; no health gate needed.
+This is what makes the recent → archive handoff invisible to
+read paths: when the missing-file scan retires the lib-A row,
+tags/faces/insights pivot to lib-B's path before any user
+notices.
+
+3. **Orphan GC** — destructive. Hash-keyed derived rows whose
+`content_hash` no longer has any `image_exif` row are eligible.
+Two-tick consensus: a hash must be observed orphaned on two
+consecutive ticks AND every library must be online for both. A
+single Stale tick within the window cancels all pending deletes.
+The pending set is held in memory (`OrphanGcState`) — restart
+resets it, which only delays a delete, never causes one. Tags,
+faces, and insights for orphaned hashes are deleted in one batch
+per tick.
+
+A backup library that briefly disappears, then returns within two
+ticks, never loses any derived data. A move from lib-A to lib-B
+without disappearance flips through pass 1 (lib-A row retired) and
+pass 2 (back-refs follow), with pass 3 noting nothing because the
+hash is still present in `image_exif` (lib-B's row).
+
 ### File Processing Pipeline
 
 **Thumbnail Generation:**