docs(claude): note in-place edit gap as future Branch D

The maintenance pipeline added in Branch C assumes the bytes at a
(library_id, rel_path) are stable for as long as the file lives at
that path. In-place edits (crop, re-export to same name) are skipped
by process_new_files's already-indexed check, so the row's
content_hash stays pinned to the original bytes, and tags / faces /
insights silently remain attached to that hash.

Document the gap and the proposed shape of the fix:
  - Stale-content detection pass: compare last_modified / size_bytes
    to fs::metadata, re-hash on mismatch, update image_exif.
  - "Content branched" semantics on hash change: faces re-run, tags
    migrate forward (user intent survives a crop), insights migrate
    + flag for re-generation, favorites follow path.
  - Apollo derived.db cache invalidation belongs in the same design
    cycle, not after.

Captured here so the design intent is clear before someone hits the
case in real life. No code change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cameron Cordes
2026-05-01 16:53:08 +00:00
parent 263e27e108
commit 5f247be1f1

@@ -270,6 +270,37 @@ without disappearance flips through pass 1 (lib-A row retired) and
pass 2 (back-refs follow), with pass 3 noting nothing because the
hash is still present in `image_exif` (lib-B's row).

**Known gap: in-place content changes (future Branch D).** The
maintenance pipeline assumes the bytes at a given `(library_id, rel_path)`
are stable for as long as the file exists at that path. If a user edits
a file in place (crop, re-export) without renaming, the watcher's
quick scan walks the file (mtime is recent) but `process_new_files`
short-circuits because `(library_id, rel_path)` already has an
`image_exif` row, so there is no re-hash, no re-EXIF, and no face
redetection. The row's `content_hash` keeps pointing at the original
bytes. Tags / faces / insights stay attached to the original hash and
continue to display because the `rel_path` back-ref still resolves; new
faces introduced by the edit are never detected.

The right place to fix this is a **stale-content detection pass**
that compares `image_exif.last_modified` / `size_bytes` to
`fs::metadata` for rows the quick scan would otherwise skip. On
mismatch, recompute the hash, update `image_exif`, and apply the
"content branched" semantics:
- **Faces** re-run (faces are fully derived from bytes).
- **Tags** migrate to the new hash (user intent: "this photo is
  vacation" survives a crop).
- **Insights** migrate forward as a starting point and are flagged
  for re-generation.
- **Favorites** (when migrated to hash-keyed) follow the path, that
  is, user intent.

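A minimal sketch of the detection step described above, using only the standard library. The `ExifRow` struct, its field types, and the helper names `is_stale`, `disk_stamp`, and `find_stale` are assumptions made for illustration; only the `last_modified` / `size_bytes` column names and the use of `fs::metadata` come from the text. Re-hashing and the `image_exif` update are left out.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Hypothetical subset of an `image_exif` row. Field names mirror the
/// columns mentioned above; the types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ExifRow {
    pub library_id: i64,
    pub rel_path: String,
    pub content_hash: String,
    pub last_modified: i64, // seconds since the Unix epoch (assumed unit)
    pub size_bytes: u64,
}

/// True when the recorded mtime or size no longer matches the file on disk.
pub fn is_stale(row: &ExifRow, disk_mtime: i64, disk_size: u64) -> bool {
    row.last_modified != disk_mtime || row.size_bytes != disk_size
}

/// Reads mtime (whole seconds) and size for `path` via `fs::metadata`.
pub fn disk_stamp(path: &Path) -> io::Result<(i64, u64)> {
    let meta = fs::metadata(path)?;
    let mtime = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs() as i64)
        .unwrap_or(0); // pre-epoch mtimes collapse to 0 in this sketch
    Ok((mtime, meta.len()))
}

/// Returns the rows whose file changed on disk and therefore needs a re-hash.
/// Rows whose file cannot be read are skipped here, not reported as stale.
pub fn find_stale<'a>(library_root: &Path, rows: &'a [ExifRow]) -> Vec<&'a ExifRow> {
    rows.iter()
        .filter(|row| match disk_stamp(&library_root.join(&row.rel_path)) {
            Ok((mtime, size)) => is_stale(row, mtime, size),
            Err(_) => false,
        })
        .collect()
}
```

Comparing mtime and size first keeps the pass cheap: the full re-hash runs only for rows that `find_stale` returns. An edit that preserves both values would slip through this check, which is a limitation of the sketch.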
The interesting case is the operator who keeps an unedited copy in
the archive library and edits the local copy: post-detection, the
archive copy stays on the original hash, the local copy branches to
the new hash, and the two histories cleanly split. Apollo's
`derived.db` cache will need an invalidation hook for the changed
hash — design it alongside Branch D.
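The archive / local split can be illustrated with a toy in-memory model. Everything here is hypothetical: the `Store` and `Insight` types, the `branch_content` function, and the choice to copy (not move) tags and insights so that rows still on the original hash keep their history. Favorites are omitted because they follow the path key, which an in-place edit does not change.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical insight record; `needs_regen` is the re-generation flag.
#[derive(Debug, Clone, PartialEq)]
pub struct Insight {
    pub text: String,
    pub needs_regen: bool,
}

/// Toy in-memory stand-in for the database; all names are assumptions.
#[derive(Debug, Default)]
pub struct Store {
    /// (library_id, rel_path) -> content_hash, standing in for `image_exif`.
    pub rows: HashMap<(i64, String), String>,
    /// Hash-keyed user data and derived data.
    pub tags: HashMap<String, BTreeSet<String>>,
    pub insights: HashMap<String, Insight>,
    /// Hashes waiting for face detection.
    pub face_queue: Vec<String>,
}

/// Re-points one row at `new_hash` and applies the branch rules.
/// Returns false when the row is unknown or the hash did not change.
pub fn branch_content(store: &mut Store, library_id: i64, rel_path: &str, new_hash: &str) -> bool {
    let key = (library_id, rel_path.to_string());
    let old_hash = match store.rows.get(&key) {
        Some(h) if h.as_str() != new_hash => h.clone(),
        _ => return false,
    };
    store.rows.insert(key, new_hash.to_string());

    // Tags: copy forward. The old hash keeps its set for any other row
    // (for example an archive copy) that still points at it.
    if let Some(tags) = store.tags.get(&old_hash).cloned() {
        store.tags.entry(new_hash.to_string()).or_default().extend(tags);
    }

    // Insights: copy forward as a starting point, flagged for re-generation.
    if let Some(insight) = store.insights.get(&old_hash).cloned() {
        store.insights.insert(
            new_hash.to_string(),
            Insight { needs_regen: true, ..insight },
        );
    }

    // Faces: never copied; they are re-derived from the new bytes.
    store.face_queue.push(new_hash.to_string());
    true
}
```

With two rows on the same hash (one per library), calling `branch_content` for the local row leaves the archive row and its tags on the original hash while the local row moves to the new one. A cache invalidation hook for `derived.db` would plausibly be triggered at the same point, but its shape is not specified here.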
### File Processing Pipeline

**Thumbnail Generation:**