"marked 2 new" parses as "2 new files" on first read — but the
unit is content_hashes, and the action is observing them as
orphaned (becoming-deleted, not appearing). Reword:
"{} new orphan hash(es) marked, {} revived"
instead of "marked {} new, revived {}". Also pluralize the deleted
counts ("row(s)") and append the pending-set size to the success
log so a tick that both deletes and re-marks doesn't lose the
trailing-state context.
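A minimal sketch of the reworded summary line. The helper name
orphan_gc_summary, the "orphan GC:" prefix, and the field order after
"revived" are assumptions for illustration; only the "{} new orphan
hash(es) marked, {} revived" fragment and the "row(s)" pluralization
come from this change.

```rust
/// Hypothetical helper: builds the per-tick orphan GC summary line.
/// Only the "new orphan hash(es) marked / revived" wording and the
/// "row(s)" pluralization are taken from the commit; the rest of the
/// layout is an assumption.
fn orphan_gc_summary(
    newly_marked: usize,
    revived: usize,
    deleted_rows: usize,
    pending: usize,
) -> String {
    format!(
        "orphan GC: {} new orphan hash(es) marked, {} revived, {} row(s) deleted, {} pending",
        newly_marked, revived, deleted_rows, pending
    )
}
```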
No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
run_orphan_gc returned early on the !all_online branch before the
final debug/info log line, so the GC was effectively invisible
whenever any library was Stale — exactly the dry-run scenario where
operators most want to confirm the safety gate is firing. Add the
same conditional log inside the early-return branch (plus a
"deferred — at least one library Stale" hint in the info-level
variant when there's something newly marked).
No behavior change beyond observability.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The maintenance pipeline added in Branch C assumes the bytes at a
given (library_id, rel_path) stay stable for as long as the file
lives at that path. In-place edits (crop, re-export to same name)
bypass process_new_files's already-indexed check, so the row's
content_hash stays pinned to the original bytes, and tags / faces /
insights silently remain attached to that stale hash.
Document the gap and the proposed shape of the fix:
- Stale-content detection pass: compare last_modified / size_bytes
to fs::metadata, re-hash on mismatch, update image_exif.
- "Content branched" semantics on hash change: faces re-run, tags
migrate forward (user intent survives a crop), insights migrate
+ flag for re-generation, favorites follow path.
- Apollo derived.db cache invalidation belongs in the same design
cycle, not after.
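The stale-content check in the first bullet could look roughly like
the sketch below. IndexedFile and needs_rehash are hypothetical names,
and storing last_modified as whole seconds since the Unix epoch is an
assumption; the real image_exif columns may differ.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Hypothetical view of the two image_exif columns the check needs.
struct IndexedFile {
    size_bytes: u64,
    /// Assumed representation: seconds since the Unix epoch.
    last_modified_secs: u64,
}

/// Returns Ok(true) when the on-disk size or mtime no longer matches
/// the indexed row, i.e. the file should be re-hashed. IO errors are
/// propagated so the caller can skip the row instead of guessing.
fn needs_rehash(path: &Path, row: &IndexedFile) -> io::Result<bool> {
    let meta = fs::metadata(path)?;
    let mtime_secs = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0); // pre-epoch mtimes collapse to 0 in this sketch
    Ok(meta.len() != row.size_bytes || mtime_secs != row.last_modified_secs)
}
```

A size-or-mtime mismatch only triggers the re-hash; whether the
content actually branched is decided by comparing the new hash to the
stored content_hash.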
Captured here so the design intent is clear before someone hits the
case in real life. No code change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Branch C of the multi-library data-model rollout. Implements the
operational maintenance pipeline pinned in CLAUDE.md → "Multi-library
data model" / "Library availability and safety". Branches A and B
land first; this branch builds on top.
New module: src/library_maintenance.rs
Three idempotent passes the watcher runs every tick after the
per-library ingest loop:
1. Missing-file scan (per online library)
For each Online library, load a paginated page of image_exif rows
(IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE, default 500), stat() each one,
and delete rows whose source file is NotFound. Permission/IO
errors are skipped, never deleted. Capped at
IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK (default 200) per library
per tick — so a pathological mount that returns NotFound for
everything can't wipe the table in one cycle. Cursor advances
across ticks, wraps on partial-page returns, and naturally cycles
through the entire library over many minutes. Skipped wholesale
for Stale libraries via the existing probe gate.
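A sketch of the per-page decision logic, under stated assumptions:
Probe, probe, select_missing and next_offset are illustrative names,
not the functions in src/library_maintenance.rs, and the DAO call
that loads the page and performs the delete is left out.

```rust
use std::io::ErrorKind;
use std::path::Path;

/// Result of stat()-ing one indexed path.
#[derive(Debug, PartialEq)]
enum Probe {
    Present,
    Missing, // NotFound: the only case that may lead to a delete
    Skipped, // permission / IO error: never deleted
}

fn probe(path: &Path) -> Probe {
    match std::fs::metadata(path) {
        Ok(_) => Probe::Present,
        Err(e) if e.kind() == ErrorKind::NotFound => Probe::Missing,
        Err(_) => Probe::Skipped,
    }
}

/// Picks the rel_paths in one page whose file is gone, stopping once
/// `delete_cap` entries are selected so one tick cannot wipe a table.
fn select_missing(root: &Path, page: &[String], delete_cap: usize) -> Vec<String> {
    let mut missing = Vec::new();
    for rel in page {
        if missing.len() >= delete_cap {
            break;
        }
        if probe(&root.join(rel)) == Probe::Missing {
            missing.push(rel.clone());
        }
    }
    missing
}

/// Cursor update: a short page means the end of the table was
/// reached, so the next tick starts again from offset 0.
fn next_offset(offset: usize, page_size: usize, returned: usize) -> usize {
    if returned < page_size { 0 } else { offset + returned }
}
```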
2. Back-ref refresh (DB-only)
For face_detections / tagged_photo / photo_insights: any
hash-keyed row whose (library_id, rel_path) no longer matches an
image_exif row, but whose content_hash does, is repointed at a
surviving image_exif location. Pure SQL with EXISTS guards so
rows whose hash is fully orphaned are left alone (the orphan GC
handles those). Idempotent; no availability gate needed.
This is what makes a recent → archive move invisible to readers:
when pass 1 retires the lib-A row, pass 2 pivots tags / faces /
insights to lib-B's surviving path before any client notices.
3. Orphan GC (destructive)
Hash-keyed derived rows whose content_hash has no image_exif
referent are GC-eligible. Two-tick consensus: a hash must be
observed orphaned on two consecutive ticks AND every library must
be Online for both. A single Stale tick within the window blocks
promotion of every pending delete: the hashes stay marked but are
not deleted, and are re-evaluated next tick. The pending set lives in
OrphanGcState (in-memory); a watcher restart resets it, which can
only delay a delete, never cause one. Hashes that re-appear in
image_exif between ticks are "revived" from the pending set
(handles transient share unmount / remount).
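The consensus rule can be sketched as a small in-memory state
machine. This is an illustration, not the code in this branch: the
tick function, TickOutcome, and the per-hash "observed on an
all-Online tick" flag are assumptions chosen to express "orphaned on
two consecutive ticks, all libraries Online for both".

```rust
use std::collections::{HashMap, HashSet};

/// Sketch of the pending set. The bool records whether the latest
/// observation of the hash happened on an all-Online tick (assumed
/// representation, not necessarily the real OrphanGcState layout).
#[derive(Default)]
struct OrphanGcState {
    pending: HashMap<String, bool>,
}

#[derive(Debug, Default, PartialEq)]
struct TickOutcome {
    newly_marked: usize,
    revived: usize,
    to_delete: Vec<String>,
}

fn tick(
    state: &mut OrphanGcState,
    orphaned_now: &HashSet<String>,
    all_online: bool,
) -> TickOutcome {
    let mut out = TickOutcome::default();

    // Revive: hashes that re-appeared in image_exif leave the set.
    let before = state.pending.len();
    state.pending.retain(|h, _| orphaned_now.contains(h));
    out.revived = before - state.pending.len();

    for h in orphaned_now {
        match state.pending.get(h).copied() {
            // Second consecutive all-Online observation: promote.
            Some(true) if all_online => out.to_delete.push(h.clone()),
            // Stale tick, or previous observation was Stale: stay
            // marked, restart the consensus window.
            Some(_) => {
                state.pending.insert(h.clone(), all_online);
            }
            None => {
                state.pending.insert(h.clone(), all_online);
                out.newly_marked += 1;
            }
        }
    }
    for h in &out.to_delete {
        state.pending.remove(h);
    }
    out.to_delete.sort();
    out
}
```

Because the state is rebuilt from scratch after a restart, every hash
starts unmarked again, which matches the "can only delay a delete,
never cause one" property above.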
One new ExifDao method:
- list_rel_paths_for_library_page(library_id, limit, offset) for
the paginated missing-file scan.
- (count_for_library landed in Branch A.)
Watcher wiring (main.rs)
Per-library: missing-file scan inside the existing per-library
loop, after process_new_files, gated by the same probe check that
already protects ingest. After the loop: reconcile (Branch B),
back-ref refresh, then run_orphan_gc. The maintenance connection is
opened once per tick (image_api::database::connect), used by all
three DB-only passes, and dropped at end of tick.
CLAUDE.md gains a "Maintenance pipeline" subsection that describes
the three passes and their interaction with the existing
availability-and-safety policy.
Tests: 225 pass (217 from Branch B + 8 new in library_maintenance
covering back-ref refresh including the fully-orphaned no-op case,
two-tick GC consensus, Stale-tick consensus reset, image_exif
re-appearance revival, multi-table delete, and the
all_libraries_online helper).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>