Compare commits

34 Commits

Author SHA1 Message Date
82dd21b205 Merge pull request 'feature/duplicate-detection' (#73) from feature/duplicate-detection into master
Reviewed-on: #73
2026-05-03 22:34:49 +00:00
Cameron Cordes
57b7bad086 duplicates: library-aware visibility — only hide a demoted row when its survivor is reachable
Soft-marked rows used to disappear from /photos globally, including
from a library-scoped view that didn't contain the survivor at all.
A user browsing lib A who'd promoted a file from lib B as the
survivor would silently lose visibility on their own copy in lib A,
even though lib B's file isn't reachable from lib A's view.

Library-scoped queries now keep a demoted row visible when its
survivor lives in a library outside the current scope. Implemented
as a NOT EXISTS subquery against the same image_exif table aliased
as `survivor`. The unscoped (all-libraries) view is unchanged — every
survivor is reachable, so demoted rows stay hidden as before.
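
For illustration, a toy in-memory model of the scoped-visibility rule (hypothetical Rust types; the real check is the NOT EXISTS subquery described above):

```rust
struct Row {
    library_id: i32,
    content_hash: &'static str,
    duplicate_of_hash: Option<&'static str>, // soft-mark: Some(survivor hash)
}

/// In a library-scoped view, a demoted row is hidden only when its
/// survivor is reachable from the same scope.
fn visible_in_library(row: &Row, scope: i32, all: &[Row]) -> bool {
    if row.library_id != scope {
        return false;
    }
    match row.duplicate_of_hash {
        None => true, // not demoted
        Some(survivor) => !all
            .iter()
            .any(|r| r.library_id == scope && r.content_hash == survivor),
    }
}

fn main() {
    let rows = [
        Row { library_id: 1, content_hash: "aaa", duplicate_of_hash: Some("bbb") },
        Row { library_id: 2, content_hash: "bbb", duplicate_of_hash: None },
    ];
    // Survivor "bbb" lives only in lib 2, so lib 1's demoted copy stays visible.
    assert!(visible_in_library(&rows[0], 1, &rows));
}
```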

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:24:07 -04:00
Cameron Cordes
98057c98a1 duplicates: tighten perceptual cluster — entropy band, asymmetric dHash, medoid prune
Three changes against "still too loose at lowest sensitivity":

- Popcount entropy band tightened from [8, 56] to [16, 48]. The wider
  band let too much low-frequency content through (skies, scans,
  faded film) where pHash collapses to near-uniform values that
  collide Hamming-trivially with hundreds of unrelated images.
- dHash check now uses an asymmetric stricter threshold
  (dhash_threshold = max(2, threshold/2)). pHash is the candidate-
  discovery signal; dHash is validation. Splitting the budget means
  a real near-dup survives both while incidental pHash collisions
  on uniform content get vetoed. Missing dHash on either side now
  rejects the edge (was: trust pHash alone).
- Single-link union-find can chain weakly-similar images via
  transitive edges. Added a medoid-validation pass: per cluster,
  pick the member with smallest summed distance to others, then
  drop any whose distance to it exceeds threshold. Two new tests
  pin both invariants.
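
For illustration, a sketch of two of the checks above, under assumed types (u64 hashes); names are hypothetical, not the crate's actual API:

```rust
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Validation budget split: dHash gets a stricter threshold than pHash.
fn dhash_threshold(threshold: u32) -> u32 {
    std::cmp::max(2, threshold / 2)
}

/// Per cluster: pick the member with the smallest summed distance to the
/// others (the medoid), then drop any member farther than `threshold`
/// from it. Assumes a non-empty cluster.
fn medoid_prune(cluster: &[u64], threshold: u32) -> Vec<u64> {
    let medoid = *cluster
        .iter()
        .min_by_key(|&&h| cluster.iter().map(|&o| hamming(h, o)).sum::<u32>())
        .unwrap();
    cluster
        .iter()
        .copied()
        .filter(|&h| hamming(h, medoid) <= threshold)
        .collect()
}

fn main() {
    assert_eq!(dhash_threshold(8), 4);
    assert_eq!(dhash_threshold(3), 2); // floor of 2
    // 0x00 and 0x03 are near; 0xFF only chains in transitively. The medoid
    // is 0x03, and 0xFF sits 6 bits from it, beyond threshold 4, so it's dropped.
    assert_eq!(medoid_prune(&[0x00, 0x03, 0xFF], 4), vec![0x00, 0x03]);
}
```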

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:19:48 -04:00
Cameron Cordes
7ca888e95d duplicates: filter low-entropy hashes + dHash double-check, fix backfill loop
The perceptual cluster was producing one giant first group that
contained hundreds of unrelated images. Two causes:
- Solid-colour images (skies, black frames, monochrome scans) all
  hash to near-zero pHashes that Hamming-distance-zero to each other.
- Single-link clustering on pHash alone is too permissive — a chain
  of weakly-similar images all collapses into one cluster.

Fixed by skipping hashes outside the popcount [8, 56] band (uniform
content) and requiring dHash agreement within threshold before
unioning a candidate edge from the BK-tree. Two new tests pin both
invariants.
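
A minimal sketch of the popcount entropy filter described above (the [8, 56] band is from this commit; names are illustrative):

```rust
const BAND_LO: u32 = 8;
const BAND_HI: u32 = 56;

/// Uniform content (skies, black frames, monochrome scans) hashes to
/// popcounts near 0 or 64; such hashes are excluded from clustering.
fn passes_entropy_band(phash: u64) -> bool {
    let bits = phash.count_ones();
    (BAND_LO..=BAND_HI).contains(&bits)
}

fn main() {
    assert!(!passes_entropy_band(0)); // all-zero: solid colour (or sentinel)
    assert!(!passes_entropy_band(u64::MAX));
    assert!(passes_entropy_band(0x0F0F_0F0F_0F0F_0F0F)); // 32 bits set
}
```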

Separate backfill fix: decode-failed rows kept phash_64=NULL
and got re-pulled by every batch, infinite-looping on a queue of
unbreakable formats. Persist a 0/0 sentinel on decode failure so
the row leaves the candidate set; the all-zero hash is excluded
from clustering by the same entropy filter so it doesn't pollute
results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:08:05 -04:00
Cameron Cordes
7584cd8792 duplicates: perceptual hash + soft-mark resolution + upload 409
Adds pHash + dHash columns alongside the existing blake3 content_hash so
near-duplicates (re-encoded, resized, format-converted copies) become
queryable. /duplicates/{exact,perceptual} return groups; /duplicates/
{resolve,unresolve} flip a duplicate_of_hash soft-mark on losing rows
and union perceptual-only tag sets onto the survivor. The default
/photos listing filters duplicate_of_hash IS NULL so demoted siblings
stop cluttering the grid; include_duplicates=true opts back in for
Apollo's review modal. Upload now hashes bytes pre-write and returns
409 with the canonical sibling when a file's bytes already exist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:36:01 -04:00
4340b164eb Merge pull request 'perf/faces-embeddings-no-clone' (#72) from perf/faces-embeddings-no-clone into master
Reviewed-on: #72
2026-05-01 23:09:22 +00:00
Cameron Cordes
fb4df4b195 style: cargo fmt sweep
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:01:00 -04:00
Cameron Cordes
1d9b9a0bc4 faces: avoid 40 MB row clone in /faces/embeddings
list_embeddings cloned the full FaceDetectionRow inside the filter_map
just to pair it with the base64-encoded embedding. The 2 KB BLOB was
already on the row — at 20k unassigned faces that's 40 MB of pointless
heap traffic per Apollo cluster-suggest run. Move the bytes out via
Option::take() so the row drops the BLOB instead of duplicating it.
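
The move-instead-of-clone pattern, sketched with a simplified row type (FaceDetectionRow's real fields differ):

```rust
struct DetectionRow {
    id: i64,
    embedding: Option<Vec<u8>>, // the ~2 KB BLOB
}

fn take_embedding(row: &mut DetectionRow) -> Option<(i64, Vec<u8>)> {
    // Option::take moves the BLOB out, leaving None behind; the row is
    // dropped afterwards anyway, so nothing gets cloned.
    row.embedding.take().map(|bytes| (row.id, bytes))
}

fn main() {
    let mut row = DetectionRow { id: 7, embedding: Some(vec![0u8; 4]) };
    let (id, bytes) = take_embedding(&mut row).unwrap();
    assert_eq!((id, bytes.len()), (7, 4));
    assert!(row.embedding.is_none()); // moved, not cloned
}
```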

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:00:55 -04:00
7998a0c9b0 Merge pull request 'feature/per-library-excluded-dirs' (#71) from feature/per-library-excluded-dirs into master
Reviewed-on: #71
2026-05-01 20:11:10 +00:00
Cameron Cordes
58f010f302 docs(claude): pin excluded_dirs entry-form syntax
The two entry shapes for libraries.excluded_dirs / EXCLUDED_DIRS
are not symmetric:
  - /sub/path → multi-segment, library-root-anchored, recursive
  - name     → single component anywhere in the tree

Without this pinned, a reasonable read of the column doc would be
"any path-like string works" — but a multi-segment string without a
leading slash silently never matches (the no-slash form scans path
components for exact string equality, and components are
slash-free).

No code change; just documentation.
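
For illustration only, the two entry shapes behave roughly like this sketch (hypothetical function; the real matcher lives in the indexer):

```rust
fn is_excluded(rel_path: &str, entry: &str) -> bool {
    if let Some(anchored) = entry.strip_prefix('/') {
        // /sub/path: multi-segment, library-root-anchored, recursive.
        rel_path == anchored || rel_path.starts_with(&format!("{anchored}/"))
    } else {
        // name: single component anywhere in the tree. Components are
        // slash-free, so a multi-segment no-slash entry can never match.
        rel_path.split('/').any(|c| c == entry)
    }
}

fn main() {
    assert!(is_excluded("sub/path/img.jpg", "/sub/path"));
    assert!(is_excluded("a/name/c.jpg", "name"));
    // The silent failure mode the doc pin warns about:
    assert!(!is_excluded("sub/path/img.jpg", "sub/path"));
}
```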

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 20:05:58 +00:00
Cameron Cordes
814066551e multi-library: per-library excluded_dirs
Adds a nullable comma-separated TEXT column to the libraries table.
Effective excludes for a walk = (env-var globals) ∪
(library.excluded_dirs). Empty / NULL = no library-specific
extras; the global env var still applies.

Migration (2026-05-01-110000_libraries_excluded_dirs)

  ALTER TABLE libraries ADD COLUMN excluded_dirs TEXT. NULL on every
  existing row — no behavior change on upgrade.

Library struct + helpers (libraries.rs)

  - Library gains excluded_dirs: Vec<String>, parsed from the column
    by parse_excluded_dirs_column (drops empties / whitespace,
    matches the env-var parser).
  - Library::effective_excluded_dirs(globals) returns the union.
  - From<LibraryRow> hydrates the field on AppState construction so
    /libraries surfaces it.
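
  A sketch of the parser and union under assumed signatures (the real
  helpers live in libraries.rs):

```rust
fn parse_excluded_dirs_column(raw: Option<&str>) -> Vec<String> {
    raw.unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty()) // drop empties / whitespace
        .map(String::from)
        .collect()
}

/// Effective excludes = env-var globals ∪ per-library extras.
fn effective_excluded_dirs(globals: &[String], per_library: &[String]) -> Vec<String> {
    let mut out: Vec<String> = globals.to_vec();
    for d in per_library {
        if !out.contains(d) {
            out.push(d.clone());
        }
    }
    out
}

fn main() {
    let lib = parse_excluded_dirs_column(Some(" .thumbs, ,/staging "));
    assert_eq!(lib, vec![".thumbs", "/staging"]);
    let eff = effective_excluded_dirs(&["tmp".to_string()], &lib);
    assert_eq!(eff.len(), 3);
}
```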

Watcher / walkers / memories

  Every per-library walker now consults the effective set:
    - process_new_files (file-watch ingest, RAW/EXIF/face)
    - process_face_backlog (filter_excluded inherits)
    - create_thumbnails (startup + new-file branch)
    - update_media_counts (Prometheus gauge)
    - cleanup_orphaned_playlists (per-library source-existence check)
    - memories endpoint (PathExcluder)

  Effective set is computed once per library per watcher tick and
  threaded through; called functions retain their flat &[String]
  signature (no per-library awareness needed inside the walker
  primitives).

Use case: mount a parent directory while a sibling library covers
a child subtree, and exclude the child subtree from the parent so
the libraries don't double-walk / double-write image_exif. With
hash-keyed derived data (Branches B/C), that duplicated walk/write
is the only cost this prevents — face / tag / insight sharing was
already correct via content_hash.

Tests: 228 pass (226 from previous + 2 new in libraries::tests:
parse_excluded_dirs_column edge cases,
effective_excluded_dirs_unions_global_and_per_library).

CLAUDE.md gains a "Per-library excludes" subsection of the
multi-library data model.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:54:17 +00:00
4f17af688e Merge pull request 'multi-library: operator kill switch via libraries.enabled' (#70) from feature/library-enabled-flag into master
Reviewed-on: #70
2026-05-01 19:15:20 +00:00
Cameron Cordes
3598bb2cfe multi-library: operator kill switch via libraries.enabled
A small follow-up to Branches A/B/C. Adds a NOT NULL, default-1
boolean column to the `libraries` table that controls whether the
watcher considers the library at all. Useful for staging a new
mount before committing to ingest, and as a maintenance kill
switch when a library needs to be quiet without being unmounted.

Migration (2026-05-01-100000_libraries_enabled_flag)

  ALTER TABLE libraries ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT 1.
  Existing rows stay enabled — no behavior change on upgrade.

Watcher gate (main.rs)

  At the top of the per-library loop, if !lib.enabled { continue; }
  — runs BEFORE the availability probe. Disabled libraries don't
  enter the health map, don't get probed, don't get ingest, don't
  get any maintenance pass. The initial sweep before the loop's
  first sleep also skips disabled libraries.

Orphan-GC consensus (library_maintenance.rs)

  all_libraries_online filters disabled libraries out of the
  consensus check — they're treated as out-of-scope, not as
  blockers. Otherwise flipping enabled=false would permanently
  halt orphan GC for the rest of the system, which is the opposite
  of the intended kill-switch semantics.
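
A toy version of that consensus check, with simplified stand-in types:

```rust
enum Health {
    Online,
    Stale,
}

struct Lib {
    enabled: bool,
    health: Health,
}

/// Disabled libraries are out-of-scope, not blockers: only enabled
/// libraries must be Online for the destructive passes to proceed.
fn all_libraries_online(libs: &[Lib]) -> bool {
    libs.iter()
        .filter(|l| l.enabled)
        .all(|l| matches!(l.health, Health::Online))
}

fn main() {
    let libs = [
        Lib { enabled: true, health: Health::Online },
        Lib { enabled: false, health: Health::Stale }, // doesn't block GC
    ];
    assert!(all_libraries_online(&libs));
}
```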

Cross-library duplicates: safe by construction. Hash-keyed derived
data (face_detections, tagged_photo with hash, photo_insights with
hash) is anchored by ANY image_exif row carrying the hash. Disabling
a library does NOT delete its image_exif rows, so a hash referenced
by a disabled library's row stays anchored — derived data survives.
collect_orphan_hashes deliberately doesn't filter image_exif by
library.enabled for exactly this reason.

No HTTP endpoint. Library mutation is rare-enough infra work that a
SQL toggle is fine, and a public mutation endpoint without a role /
permission story would be poorly-prioritized exposure for a
single-user tool. Documented in CLAUDE.md.

Tests: 226 pass (225 from Branch C + 1 new
all_libraries_online_treats_disabled_as_out_of_scope, which proves
that even an explicit Stale entry on a disabled library doesn't
block the consensus).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:10:24 +00:00
23448cf5e6 Merge pull request 'feature/library-handoff-and-gc' (#69) from feature/library-handoff-and-gc into master
Reviewed-on: #69
2026-05-01 18:27:40 +00:00
Cameron Cordes
d809ddee44 library_maintenance: clarify orphan-gc log wording
"marked 2 new" parses as "2 new files" on first read — but the
unit is content_hashes, and the action is observing them as
orphaned (becoming-deleted, not appearing). Reword:

  "{} new orphan hash(es) marked, {} revived"

instead of "marked {} new, revived {}". Also pluralize the deleted
counts ("row(s)") and append the pending-set size to the success
log so a tick that both deletes and re-marks doesn't lose the
trailing-state context.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 18:01:01 +00:00
Cameron Cordes
fa98d147be library_maintenance: log orphan-gc decisions in stale-library path too
run_orphan_gc returned early on the !all_online branch before the
final debug/info log line, so the GC was effectively invisible
whenever any library was Stale — exactly the dry-run scenario where
operators most want to confirm the safety gate is firing. Add the
same conditional log inside the early-return branch (plus a
"deferred — at least one library Stale" hint in the info-level
variant when there's something newly marked).

No behavior change beyond observability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:14:09 +00:00
Cameron Cordes
5f247be1f1 docs(claude): note in-place edit gap as future Branch D
The maintenance pipeline added in Branch C assumes (library_id,
rel_path) bytes are stable for as long as the file lives at that
path. In-place edits (crop, re-export to same name) bypass
process_new_files's already-indexed check, so the row's
content_hash stays pinned to the original bytes — tags / faces /
insights remain attached to that hash silently.

Document the gap and the proposed shape of the fix:
  - Stale-content detection pass: compare last_modified / size_bytes
    to fs::metadata, re-hash on mismatch, update image_exif.
  - "Content branched" semantics on hash change: faces re-run, tags
    migrate forward (user intent survives a crop), insights migrate
    + flag for re-generation, favorites follow path.
  - Apollo derived.db cache invalidation belongs in the same design
    cycle, not after.

Captured here so the design intent is clear before someone hits the
case in real life. No code change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:53:08 +00:00
Cameron Cordes
263e27e108 multi-library: handoff + orphan GC with two-tick consensus
Branch C of the multi-library data-model rollout. Implements the
operational maintenance pipeline pinned in CLAUDE.md → "Multi-library
data model" / "Library availability and safety". Branches A and B
land first; this branch builds on top.

New module: src/library_maintenance.rs

Three idempotent passes the watcher runs every tick after the
per-library ingest loop:

1. Missing-file scan (per online library)

   For each Online library, load a paginated page of image_exif rows
   (IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE, default 500), stat() each one,
   and delete rows whose source file is NotFound. Permission/IO
   errors are skipped, never deleted. Capped at
   IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK (default 200) per library
   per tick — so a pathological mount that returns NotFound for
   everything can't wipe the table in one cycle. Cursor advances
   across ticks, wraps on partial-page returns, and naturally cycles
   through the entire library over many minutes. Skipped wholesale
   for Stale libraries via the existing probe gate.

2. Back-ref refresh (DB-only)

   For face_detections / tagged_photo / photo_insights: any
   hash-keyed row whose (library_id, rel_path) no longer matches an
   image_exif row, but whose content_hash does, is repointed at a
   surviving image_exif location. Pure SQL with EXISTS guards so
   rows whose hash is fully orphaned are left alone (the orphan GC
   handles those). Idempotent; no availability gate needed.

   This is what makes a recent → archive move invisible to readers:
   when pass 1 retires the lib-A row, pass 2 pivots tags / faces /
   insights to lib-B's surviving path before any client notices.

3. Orphan GC (destructive)

   Hash-keyed derived rows whose content_hash has no image_exif
   referent are GC-eligible. Two-tick consensus: a hash must be
   observed orphaned on two consecutive ticks AND every library must
   be Online for both. A single Stale tick within the window cancels
   all pending deletes (they remain marked but won't be promoted) —
   they're re-evaluated next tick. The pending set lives in
   OrphanGcState (in-memory); a watcher restart resets it, which can
   only delay a delete, never cause one. Hashes that re-appear in
   image_exif between ticks are "revived" from the pending set
   (handles transient share unmount / remount).
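
   One possible toy reading of the two-tick consensus, with hypothetical
   names (the real OrphanGcState tracks more than this, and the exact
   stale-tick semantics are described in prose above):

```rust
use std::collections::HashSet;

#[derive(Default)]
struct OrphanGcState {
    pending: HashSet<String>, // hashes observed orphaned last time
}

impl OrphanGcState {
    /// Returns the hashes promoted for deletion this tick.
    fn tick(&mut self, orphaned_now: &HashSet<String>, all_online: bool) -> Vec<String> {
        // Hashes that re-appeared in image_exif are revived from pending.
        self.pending.retain(|h| orphaned_now.contains(h));
        if !all_online {
            // A Stale tick promotes nothing; pending stays marked and is
            // re-evaluated on a later tick.
            return Vec::new();
        }
        // Second consecutive orphaned observation with everyone online.
        let promoted: Vec<String> = self.pending.iter().cloned().collect();
        self.pending = orphaned_now
            .iter()
            .filter(|h| !promoted.contains(h))
            .cloned()
            .collect();
        promoted
    }
}

fn main() {
    let mut gc = OrphanGcState::default();
    let orphans: HashSet<String> = ["h1".to_string()].into_iter().collect();
    assert!(gc.tick(&orphans, true).is_empty()); // tick 1: marked only
    assert!(gc.tick(&orphans, false).is_empty()); // Stale tick: held
    assert_eq!(gc.tick(&orphans, true), vec!["h1".to_string()]); // promoted
}
```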

Two new ExifDao methods:
  - list_rel_paths_for_library_page(library_id, limit, offset) for
    the paginated missing-file scan.
  - (count_for_library landed in Branch A.)

Watcher wiring (main.rs)

Per-library: missing-file scan inside the existing per-library
loop, after process_new_files, gated by the same probe check that
already protects ingest. After the loop: reconcile (Branch B),
back-ref refresh, then run_orphan_gc. The maintenance connection is
opened once per tick (image_api::database::connect), used by all
three DB-only passes, and dropped at end of tick.

CLAUDE.md gains a "Maintenance pipeline" subsection that describes
the three passes and their interaction with the existing
availability-and-safety policy.

Tests: 225 pass (217 from Branch B + 8 new in library_maintenance
covering back-ref refresh including the fully-orphaned no-op case,
two-tick GC consensus, Stale-tick consensus reset, image_exif
re-appearance revival, multi-table delete, and the
all_libraries_online helper).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:27:53 +00:00
a0283a6362 Merge pull request 'multi-library: hash-keyed tagged_photo + photo_insights with reconciliation' (#68) from feature/hash-keyed-derived-data into master
Reviewed-on: #68
2026-05-01 16:16:38 +00:00
Cameron Cordes
48cac8c285 multi-library: hash-keyed tagged_photo + photo_insights with reconciliation
Branch B of the multi-library data-model rollout. tagged_photo and
photo_insights now follow the bytes (content_hash), not the path,
matching the policy pinned in CLAUDE.md "Multi-library data model".
Branch A's availability probe and EXIF scoping land first; this
branch builds on top.

Migration (2026-05-01-000000_hash_keyed_derived_data)

  Adds nullable content_hash columns to tagged_photo and photo_insights,
  with partial indexes on the non-null subset to keep the index small
  during the transitional window. The migration backfills from
  image_exif:
    * tagged_photo joins on rel_path alone (no library_id available);
    * photo_insights joins on (library_id, rel_path), unambiguous.
  Rows whose image_exif hash isn't known yet stay null and the runtime
  reconciliation pass populates them as the hash backlog drains.

Insert-time population

  TagDao::tag_file looks up image_exif.content_hash by rel_path before
  inserting; the hash is written into the new column.
  InsightDao::store_insight does the same scoped to (library_id,
  rel_path). Caller-supplied hash on InsertPhotoInsight wins; otherwise
  the DAO does the lookup. Both paths fall back to None if the hash
  isn't known yet — reconciliation backfills.

Reconciliation (database/reconcile.rs)

  Three idempotent passes the watcher runs once per tick after the
  per-library backfill loop:
    1. tagged_photo NULL hashes → populate from image_exif by rel_path.
    2. photo_insights NULL hashes → populate by (library_id, rel_path).
    3. photo_insights scalar merge — when multiple is_current rows
       share a content_hash, keep the earliest generated_at as
       current; demote the rest. Demoted rows keep their data so
       /insights/history is unaffected; only the "current" pointer
       narrows to one per hash.
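
  Pass 3's earliest-wins collapse, sketched in memory with hypothetical
  types (the real pass is pure SQL):

```rust
use std::collections::HashMap;

struct Insight {
    content_hash: &'static str,
    generated_at: i64,
    is_current: bool,
}

/// Among current rows sharing a hash, keep the earliest generated_at as
/// current; demote the rest (their data is kept for /insights/history).
/// Idempotent: re-running changes nothing.
fn collapse_current(rows: &mut [Insight]) {
    let mut earliest: HashMap<&str, i64> = HashMap::new();
    for r in rows.iter().filter(|r| r.is_current) {
        let e = earliest.entry(r.content_hash).or_insert(r.generated_at);
        if r.generated_at < *e {
            *e = r.generated_at;
        }
    }
    for r in rows.iter_mut() {
        if r.is_current {
            r.is_current = r.generated_at == earliest[r.content_hash];
        }
    }
}

fn main() {
    let mut rows = vec![
        Insight { content_hash: "h", generated_at: 20, is_current: true },
        Insight { content_hash: "h", generated_at: 10, is_current: true },
    ];
    collapse_current(&mut rows);
    assert!(!rows[0].is_current); // re-generated row demoted
    assert!(rows[1].is_current); // earliest wins
}
```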

  No filesystem dependency, so reconcile doesn't need the availability
  gate; runs every tick. Logs once when something changed, debug
  otherwise.

  Tags are set-valued under the policy (union on read, already
  DISTINCT in queries), so there is no analogous tag-collapse pass —
  duplicate (tag_id, content_hash) rows across libraries are
  harmless.

Read paths are unchanged in this branch — lookup_tags_batch's
existing rel_path-via-hash-sibling expansion still produces the
correct merge. A follow-up can simplify reads to use the new column
directly for performance.

Tests: 217 pass (212 pre-existing + 5 new in reconcile covering
NULL-fill, hash-not-yet-known no-op, library scoping on insights,
earliest-wins collapse, idempotency).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:52:16 +00:00
cce8f0c1b7 Merge pull request 'feature/multi-library-data-model' (#67) from feature/multi-library-data-model into master
Reviewed-on: #67
2026-05-01 14:40:16 +00:00
Cameron Cordes
48ed7be5d9 libraries: initial availability sweep before watcher's first sleep
new_health_map seeds every library as Online, and the watcher's tick
loop sleeps WATCH_QUICK_INTERVAL_SECONDS (default 60s) before its
first probe — meaning /libraries reported the optimistic default for
up to a minute after boot, even when a share was clearly unmounted.

Run the same refresh_health pass once at the top of the watcher
thread before entering the sleep loop. /libraries is then truthful
within milliseconds of the watcher thread starting (effectively from
the first HTTP request, since the watcher spawns well before the
server binds).

The per-tick gate inside the loop is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:33:45 +00:00
Cameron Cordes
eea1bf3181 multi-library: availability probe + scoped EXIF queries + collision fixes
Branch A of the multi-library data-model rollout. Three threads of
correctness/safety work that ship together because the new mount
needs all three before it can land:

1. Library availability probe (libraries.rs, state.rs, main.rs)

   New LibraryHealth (Online | Stale { reason, since }) and a shared
   LibraryHealthMap on AppState. Probe checks root_path exists +
   is_dir + readable + non-empty (relative to a "had_data" signal so
   fresh mounts aren't downgraded). The watcher tick begins with a
   refresh_health() per library; stale libraries skip ingest, the
   hash backfill, and face-detection backlog drains for that tick.
   The orphaned-playlist cleanup also gates on every library being
   online — a missing source on a stale library is indistinguishable
   from a transient unmount, and the cleanup is destructive.

   /libraries now returns each library with its current health
   state. Logs only on Online↔Stale transitions so a long outage
   doesn't spam.

   New ExifDao::count_for_library is the "had_data" signal.

2. EXIF queries scoped by library_id (database/mod.rs, files.rs,
   main.rs, tags.rs)

   query_by_exif gains an Option<i32> library filter; /photos and
   /photos/exif now pass it. Without this, an EXIF-filtered request
   scoped to ?library=N returned cross-library results because the
   handler resolved the library but didn't push it through to SQL.

   get_exif_batch gains the same option. The watcher's per-library
   ingest, face-candidate build, and content-hash backfill all
   scope to their library; the union-mode /photos date-sort path
   and the library-agnostic tag fan-out (lookup_tags_batch, by
   design) keep using None.

3. Derivative-path collision fixes (content_hash.rs, main.rs)

   New content_hash::library_scoped_legacy_path helper:
   <derivative_dir>/<library_id>/<rel_path>. Thumbnail generation
   (startup walk + watcher needs-thumb check) and serving now use
   it; serving falls back to the bare-legacy mirrored path so
   pre-multi-library deployments keep working without
   regeneration. Without this, lib2 with the same rel_path as lib1
   would have its thumbnail request short-circuit to lib1's image.

   Orphaned-playlist cleanup walks every library when checking for
   the source video (was: BASE_PATH only). Without this, mounting
   a 2nd library and waiting 24h would delete every playlist whose
   source lived only in the 2nd library.

   The HLS playlist write path collision (filename-only basename,
   not rel_path) is left as a known issue with a TODO at the call
   site — the actor-pipeline rewrite belongs in Branch B/C.
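
   The path-scoping helper's documented layout, sketched under an assumed
   signature:

```rust
use std::path::{Path, PathBuf};

/// <derivative_dir>/<library_id>/<rel_path> keeps lib1 and lib2
/// derivatives apart even when rel_path collides across libraries.
fn library_scoped_legacy_path(derivative_dir: &Path, library_id: i32, rel_path: &str) -> PathBuf {
    derivative_dir.join(library_id.to_string()).join(rel_path)
}

fn main() {
    let p1 = library_scoped_legacy_path(Path::new("/thumbs"), 1, "2024/IMG.jpg");
    let p2 = library_scoped_legacy_path(Path::new("/thumbs"), 2, "2024/IMG.jpg");
    assert_eq!(p1, PathBuf::from("/thumbs/1/2024/IMG.jpg"));
    assert_ne!(p1, p2); // same rel_path, different library: no collision
}
```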

Tests: 212 pass (cargo test --lib). New tests cover the probe
states (online / missing root / non-dir / empty-with-prior-data),
refresh_health transitions, query_by_exif scoping, get_exif_batch
keying on (library_id, rel_path), library_scoped_legacy_path, and
count_for_library.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:12:49 +00:00
Cameron Cordes
2f91891459 docs(claude): pin multi-library data model + availability/safety policy
Adds a "Multi-library data model" section that classifies each table as
intrinsic-to-bytes (hash-keyed), user-intent-about-a-photo (hash-keyed),
or library-administrative ((library_id, rel_path)). Spells out merge
semantics on read (union for set-valued, earliest-wins for scalar),
write attribution (binds to bytes, not to current library), the
transitional-state rules for hash-less rows, library handoff behavior
on archive moves, and orphan GC.

Adds a "Library availability and safety" subsection: every watcher
tick begins with a presence probe; destructive paths (move-handoff
re-keying, orphan GC) require both/all libraries online and
confirmed-clean for two consecutive ticks. A NAS reboot, USB pull, or
VPN drop must never trigger destruction — the worst case is that
derived-data work pauses until the share returns.

The face_detections table is referenced as the existing reference
implementation of the policy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:11:42 +00:00
3d162105f7 Merge pull request 'feature/edit-tag' (#66) from feature/edit-tag into master
Reviewed-on: #66
2026-05-01 01:03:40 +00:00
Cameron
98601973f7 faces: log at the three 503 paths in update_face_handler
PATCH /image/faces/{id} can return 503 from three places (face client
disabled, transient embed error, mid-flight disable) and none of them
were logging — operator sees the status code but nothing in the Rust
log explaining why. Add warn! lines at each so future bbox-edit
failures aren't silent. Response body is unchanged so existing clients
keep working.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:57:51 -04:00
Cameron
862917b0d1 gitignore: SQLite WAL runtime + local docs/specs dirs
*.db-shm / *.db-wal show up in the working tree whenever the server
runs (the WAL/journal pragmas in connect()), and /docs and /specs
hold per-feature design notes that stay local per the project's
"spec docs not in git" convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:31:19 -04:00
Cameron
44d677528e tags: add edit + delete endpoints, enable FK enforcement
PUT /image/tags/{id} renames a tag globally; DELETE /image/tags/{id}
removes a tag and every photo's reference. Rename returns 200/404/409
(case-insensitive name conflict) / 400 (empty name); delete returns
204/404. New migration adds a UNIQUE COLLATE NOCASE index on
tags.name with a pre-flight pass that collapses existing case-
insensitive duplicates onto the lowest id.

The connection setup now sets PRAGMA foreign_keys = ON. The schema
already declares ON DELETE CASCADE / SET NULL on several tables —
those clauses were documentation-only because SQLite has FK
enforcement off per-connection by default. Audited every
diesel::delete site; each touches either no inbound FKs or has a
matching policy. delete_tag relies on the tagged_photo cascade
instead of doing manual cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:26:35 -04:00
89b743ba54 Merge pull request 'faces: count distinct content_hash in stats total_photos' (#65) from face-stats-dedup-hash into master
Reviewed-on: #65
2026-04-30 22:43:58 +00:00
Cameron Cordes
323097c650 faces: count distinct content_hash in stats total_photos
face_detections is keyed on content_hash (one row per unique bytes,
shared across libraries / duplicate paths) but total_photos was
COUNT(*) over image_exif rows. A file present at multiple rel_paths or
across libraries inflated the denominator without inflating the
numerator, leaving a permanent gap (e.g. 1101/1103 with nothing
actually pending detection).

Switch total_photos to COUNT(DISTINCT content_hash) so numerator and
denominator live in the same domain. Exclude rows with NULL
content_hash from the count — they're held in the hash-backfill
backlog, not the detection backlog, and counting them pins the bar
below 100% for the duration of that pass.
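
An in-memory analogue of the denominator fix — COUNT(DISTINCT content_hash) over non-NULL hashes rather than COUNT(*) over rows:

```rust
use std::collections::HashSet;

/// NULL hashes sit in the hash-backfill backlog, not the detection
/// backlog, so they are excluded from the denominator.
fn total_photos(rows: &[Option<&str>]) -> usize {
    rows.iter()
        .flatten() // drops the None (NULL content_hash) rows
        .collect::<HashSet<_>>()
        .len()
}

fn main() {
    // Same bytes at two rel_paths, plus one row awaiting hash backfill:
    let rows = [Some("aaa"), Some("aaa"), Some("bbb"), None];
    assert_eq!(total_photos(&rows), 2); // COUNT(*) would have said 4
}
```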

CLAUDE.md: document the stats domain rule next to the rest of the
face-detection notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:41:20 +00:00
d0833177c7 Merge pull request 'feature/face-stats-exclude-videos' (#64) from feature/face-stats-exclude-videos into master
Reviewed-on: #64
2026-04-30 21:17:19 +00:00
Cameron Cordes
67abd8d8ff style: cargo fmt
Pre-existing whitespace drift in test bodies, normalized by rustfmt.
No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:16:34 +00:00
Cameron Cordes
0840d55c70 faces: exclude videos from backlog drain and SCANNED denominator
list_unscanned_candidates pulled every hashed image_exif row, including
videos. filter_excluded then dropped them client-side without writing a
marker, so the same set re-appeared every watcher tick — emitting the
"backlog drain — running detection on N candidate(s)" log forever and
producing no progress.

face_stats.total_photos counted the same video rows in the denominator,
so the SCANNED percentage was structurally capped below 100%.

Add an image-extension SQL predicate (case-insensitive, sourced from
file_types::IMAGE_EXTENSIONS) and apply it to both queries. Videos
never enter the candidate set, total_photos counts only what can
actually be scanned, and 100% becomes reachable.
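
A sketch of the case-insensitive extension predicate, with a stand-in extension list (the real one is built into SQL from file_types::IMAGE_EXTENSIONS):

```rust
// Stand-in subset for illustration; not the project's actual list.
const IMAGE_EXTENSIONS: &[&str] = &["jpg", "jpeg", "png", "nef"];

fn is_image(rel_path: &str) -> bool {
    rel_path
        .rsplit_once('.')
        .map(|(_, ext)| IMAGE_EXTENSIONS.contains(&ext.to_ascii_lowercase().as_str()))
        .unwrap_or(false)
}

fn main() {
    assert!(is_image("2024/IMG_0001.NEF")); // case-insensitive
    assert!(!is_image("2024/clip.mp4")); // videos never enter the candidate set
    assert!(!is_image("noext"));
}
```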

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:16:30 +00:00
dbb046dfa8 Merge pull request 'indexer: prune EXCLUDED_DIRS at WalkDir time, extract enumerate_indexable_files' (#63) from feature/exclude-dirs-at-index-time into master
Reviewed-on: #63
2026-04-30 20:24:18 +00:00
36 changed files with 5341 additions and 112 deletions

4
.gitignore vendored

@@ -2,8 +2,12 @@
 database/target
 *.db
 *.db.bak
+*.db-shm
+*.db-wal
 .env
 /tmp
+/docs
+/specs
 # Default ignored files
 .idea/shelf/

240
CLAUDE.md

@@ -104,6 +104,242 @@ All database access goes through trait-based DAOs (e.g., `ExifDao`, `SqliteExifD
- `query_by_exif()`: Complex filtering by camera, GPS bounds, date ranges
- Batch operations minimize DB hits during file watching
### Multi-library data model
ImageApi supports more than one library (a library = a `(name, root_path)`
row in the `libraries` table that maps to a mounted directory tree). The
same bytes may exist under more than one library — typical case is an
"active" library plus an "archive" library that ingests files as they age
out — and the data model is designed so that derived data follows the
**bytes**, not the path, while user-managed data does the same.
**The principle.** A photo's identity is its `content_hash` (blake3, see
`src/content_hash.rs`). Anything we compute from or attach to a photo is
keyed on that hash so it survives:
- the same file appearing in a second library (backup / archive / mirror),
- the file moving between libraries (recent → archive handoff),
- the file moving within a library (re-organized rel_path),
- intra-library duplicates (same bytes at two paths).
**Table classification.** Three categories drive the keying decision:
| Category | Key | Rationale | Tables |
|---|---|---|---|
| Intrinsic to bytes | `content_hash` | Rerunning is wasted work (or LLM cost) | `face_detections` ✓, `image_exif` (target), `photo_insights` (target), `video_preview_clips` (target) |
| User intent about a photo | `content_hash` | "Tag this photo" means the bytes, not a path | `tagged_photo` (target), `favorites` (target) |
| Library administrative | `(library_id, rel_path)` | Tied to a specific filesystem location | `libraries`, `entity_photo_links`, the `rel_path` back-ref columns on hash-keyed tables |
✓ = already implemented this way. *(target)* = today still keyed on
`(library_id, rel_path)` and slated for migration. The migration adds a
nullable `content_hash` column, populates it from `image_exif` where
known, and read paths fall back to rel_path while the hash is null.
**Carrying a `rel_path` even when hash-keyed.** Hash-keyed tables retain
`(library_id, rel_path)` columns as a denormalized **back-reference**, not
as the key. This lets a single query answer "what is at this path right
now" without joining through `image_exif`, and supports the path-only
endpoints that predate the hash. `face_detections` is the reference
implementation: hash is the truth, path is a hint.
**Merge semantics on read.** When the same hash has rows under more than
one library:
- Set-valued data (tags, favorites, faces, entity links) → **union**.
- Scalar data (current insight, EXIF row, video preview clip) → earliest
`generated_at` / `created_time` wins. The historical lib1 row beats a
re-generated lib2 row, so the user's curated insight isn't shadowed by
a re-run on archive ingest.
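A minimal sketch of these merge rules in Rust (row shapes and names are illustrative, not the actual DAO types):

```rust
use std::collections::BTreeSet;

// Illustrative row shape, not the actual DAO struct.
#[derive(Clone, Debug)]
pub struct InsightRow {
    pub library_id: i32,
    pub generated_at: i64, // unix seconds
    pub summary: String,
}

/// Merge rows that share one content_hash across libraries:
/// set-valued data unions; scalar data resolves to the earliest
/// `generated_at`, so a curated insight is never shadowed by a re-run.
pub fn merge_for_hash(
    tags_by_lib: &[(i32, Vec<String>)],
    insights: &[InsightRow],
) -> (BTreeSet<String>, Option<InsightRow>) {
    let tags = tags_by_lib
        .iter()
        .flat_map(|(_, t)| t.iter().cloned())
        .collect();
    let insight = insights.iter().min_by_key(|r| r.generated_at).cloned();
    (tags, insight)
}
```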
**Write attribution.** A new tag/favorite/insight created while viewing
under lib2 binds to the bytes, not to lib2 — so it shows up under lib1
too. This is by design, but it's the most surprising rule on first
encounter; clients should not assume tags are library-scoped.
**Hash-less rows (transitional state).** During and immediately after a
new mount, `image_exif.content_hash` is being populated by
`backfill_unhashed_backlog` (capped per tick). Rules during this window:
- Writes: if the hash is known, write hash-keyed. If not, write
`(library_id, rel_path)`-keyed and let the reconciliation job collapse
duplicates once the hash lands.
- Reads: prefer hash key, fall back to `(library_id, rel_path)`.
- Reconciliation: a one-shot pass after every backfill tick collapses
rows that now share a hash, applying the merge semantics above.
Idempotent — safe to re-run.
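The read preference can be sketched as a key-selection helper (names here are hypothetical):

```rust
/// Illustrative key selection for the transitional window: prefer the
/// content hash when it's known, otherwise fall back to the path key.
#[derive(Debug, PartialEq)]
pub enum RowKey {
    Hash(String),
    Path { library_id: i32, rel_path: String },
}

pub fn key_for(content_hash: Option<&str>, library_id: i32, rel_path: &str) -> RowKey {
    match content_hash {
        Some(h) => RowKey::Hash(h.to_string()),
        None => RowKey::Path {
            library_id,
            rel_path: rel_path.to_string(),
        },
    }
}
```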
**Library handoff (recent → archive).** When a file moves between
libraries (e.g. operator moves `~/photos/2024/IMG.nef` to the archive
mount), the file watcher sees the disappearance under lib1 and the
appearance under lib2. Hash-keyed rows don't need migration; the
`(library_id, rel_path)` back-ref columns are updated to point to the new
location. Library administrative rows (`entity_photo_links`,
`(library_id, rel_path)` rows in `image_exif` for hash-less items) are
re-keyed by the move detector, which matches a disappearance to an
appearance by `content_hash` within a configurable window.
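A sketch of the window match, assuming simplified event types (the real detector works off watcher events and DB rows):

```rust
use std::collections::HashMap;

// Hypothetical event shapes for the move detector; field names are
// assumptions for illustration.
pub struct Disappearance { pub content_hash: String, pub at: i64 }
pub struct Appearance {
    pub content_hash: String,
    pub library_id: i32,
    pub rel_path: String,
    pub at: i64,
}

/// Pair each disappearance with an appearance sharing its content_hash
/// within `window_secs`. Unmatched disappearances are
/// "unavailable-or-deleted, defer judgment": no re-key, no GC.
pub fn match_moves(
    gone: &[Disappearance],
    seen: &[Appearance],
    window_secs: i64,
) -> Vec<(String, i32, String)> {
    let by_hash: HashMap<&str, &Appearance> =
        seen.iter().map(|a| (a.content_hash.as_str(), a)).collect();
    gone.iter()
        .filter_map(|d| {
            by_hash.get(d.content_hash.as_str()).and_then(|a| {
                if (a.at - d.at).abs() <= window_secs {
                    Some((d.content_hash.clone(), a.library_id, a.rel_path.clone()))
                } else {
                    None
                }
            })
        })
        .collect()
}
```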
**Orphans (source deleted while a copy survives).** When the only
`image_exif` row for a hash is deleted (file removed from disk), the
hash-keyed derived rows survive **as long as another `image_exif` row
references the same hash**. If the last reference is gone, derived rows
are eligible for GC (deferred — the GC job runs on a slow schedule so
that a brief unmount or rename doesn't wipe history).
**Stats and counts.** When reporting "how many photos do you have," count
`DISTINCT content_hash` over `image_exif`, not row count. Faces stats
already does this (`FaceDao::stats` in `src/faces.rs`); other counters
should follow suit. Numerator and denominator must live in the same
domain — see the face-stats commentary below for the cautionary tale.
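The counting rule, sketched in Rust (the real counters run the equivalent SQL):

```rust
use std::collections::HashSet;

/// Count photos as distinct content hashes, not image_exif rows; the
/// same JPEG at two rel_paths (or in two libraries) is one photo.
/// SQL equivalent: SELECT COUNT(DISTINCT content_hash) FROM image_exif.
pub fn distinct_photo_count<'a>(hashes: impl IntoIterator<Item = &'a str>) -> usize {
    hashes.into_iter().collect::<HashSet<_>>().len()
}
```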
**Per-library scoping when the user asks for it.** A request scoped to
`?library=N` filters the `image_exif` view to that library, and the
hash-keyed derived data is joined through that view. The user sees only
photos that have a copy under lib N, but the derived data attached to
those photos is the merged hash-keyed view. This is the answer to "show
me archive photos with their original tags."
**Operator kill switch (`libraries.enabled`).** Setting `enabled=0` on a
library is a hard pause: the watcher skips it entirely — before the
probe, before ingest, before any maintenance pass — and the orphan-GC
all-online consensus check filters disabled libraries out (they don't
keep the GC window closed). Reads / serving are unaffected; nothing
prevents `/image?path=...` from resolving against a disabled library's
root if the file is on disk. The existing `image_exif` rows for a
disabled library are **not deleted** — they continue to anchor
hash-keyed derived data, so cross-library duplicates survive the
disable. Toggle via SQL; there is intentionally no HTTP endpoint for
library mutation (single-user tool, no role / permission story).
Typical workflows: stage a new mount with `enabled=0` then flip to `1`;
quiet a flaky NAS during maintenance without disturbing the rest of
the system.
**Per-library excludes (`libraries.excluded_dirs`).** A
comma-separated column, same shape as the global `EXCLUDED_DIRS` env
var, that's applied **in union** with the env-var globals when a
walker scans this library. Use case: mount a parent directory as a
new library while a sibling library covers a child subtree, and
exclude that child subtree from the parent so the two libraries
don't double-walk and double-write `image_exif`. Two entry forms
(parsed by `memories::PathExcluder`):
- `/sub/path` — leading slash flags it as a path under the library
root. Joins to root + matches by `path.starts_with(...)`. Works
at any depth (`/photos`, `/media/2024/raw`).
- `name` — no leading slash flags it as a component name to skip
anywhere in the tree (`@eaDir`, `.thumbnails`). Single segment
only — `media/photos/a` without a leading slash never matches
anything.
Hash-keyed derived data (faces, tags, insights) is unaffected either
way — those follow the bytes — but `image_exif` row count, walker CPU,
and thumbnail disk usage all drop to 1× instead of 2× for the overlap.
Affects: file-watch ingest (`process_new_files`), thumbnail
generation, media-count gauges, the orphaned-playlist cleanup walk,
and the `/memories` endpoint. The face-detection backlog drain
inherits via `face_watch::filter_excluded`. NULL = no extras (only
the global env var applies).
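The two entry forms can be sketched as follows; this mirrors the documented semantics, not `memories::PathExcluder`'s actual API:

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Sketch of the two exclude-entry forms: `/sub/path` prefix-matches
/// under the library root; bare `name` skips that component anywhere;
/// a multi-segment entry without a leading slash matches nothing.
pub fn is_excluded(root: &Path, abs: &Path, entries: &[&str]) -> bool {
    entries.iter().any(|e| {
        if let Some(rel) = e.strip_prefix('/') {
            // Path form: join to the library root, prefix-match.
            abs.starts_with(root.join(rel))
        } else if e.contains('/') {
            // Multi-segment without a leading slash never matches.
            false
        } else {
            // Name form: skip this component anywhere in the tree.
            abs.components().any(|c| c.as_os_str() == OsStr::new(*e))
        }
    })
}
```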
**Library availability and safety.** Libraries can be on network shares
or removable media; the file watcher must not interpret a temporary
unavailability as a mass-deletion event. Every tick begins with a
**presence probe** per library: the library is considered online iff
its `root_path` exists, is readable, and a top-level scan returns at
least one expected entry (or matches a recent file-count high-water
mark within a tolerance). The probe result gates which actions are safe
to run on that library this tick:
| Action | Requires online? |
|---|---|
| Quick / full scan ingest of new files | yes |
| EXIF / face / insight backlog drains | yes — but the work runs against any online library |
| Move-handoff detection (lib1 disappearance ↔ lib2 appearance match) | **both** libraries online |
| `(library_id, rel_path)` re-keying on detected move | **both** libraries online |
| Orphan GC of hash-keyed derived data | all libraries that have *ever* held the hash must be online and confirmed-clean for two consecutive ticks |
| Reads / serving | always allowed; falls back to whichever library is online |
A library that fails the probe enters a "stale" state: writes scoped to
it are paused, its rows are flagged stale (not deleted) in
`/libraries` status, and the watcher logs at `warn` once per
state-transition (not per tick). A library that recovers re-enters the
online set automatically; no operator action required for transient
outages. The intent is that pulling a USB drive, rebooting a NAS, or
losing a VPN never triggers a destructive code path — the worst case is
that derived-data work pauses until the share returns.
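A minimal sketch of the probe itself, omitting the file-count high-water-mark tolerance:

```rust
use std::fs;
use std::path::Path;

/// Per-tick presence probe sketch: a library counts as online only if
/// its root exists, is readable, and a top-level scan returns at least
/// one entry. Anything else gates destructive passes off this tick.
pub fn probe_online(root_path: &Path) -> bool {
    match fs::read_dir(root_path) {
        Ok(mut entries) => entries.next().is_some(),
        // NotFound and permission errors both mean "not safe to treat
        // observations from this library as authoritative".
        Err(_) => false,
    }
}
```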
The same rule constrains the move-handoff matcher: a disappearance
under lib1 only counts as a "move" if there is a matching appearance
under another **online** library within the window. A bare
disappearance with no matching appearance is treated as
"unavailable-or-deleted, defer judgment" — it does not re-key any rows
and does not enqueue GC.
**Maintenance pipeline (`src/library_maintenance.rs`).** The watcher
runs three maintenance passes per tick that together implement the
move/handoff and orphan rules:
1. **Missing-file scan** — per online library, paginated. A page of
`image_exif` rows is loaded (`IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE`,
default 500), each row's `(root_path/rel_path)` is `stat()`-ed,
and confirmed-not-found rows are deleted from `image_exif`
(capped at `IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK`, default 200).
Permission/IO errors are skipped, never deleted — only `NotFound`
triggers a deletion. The cursor wraps every time a partial page
comes back, so the whole library is swept across consecutive ticks.
Skipped wholesale for Stale libraries via the per-library probe
gate at the top of the loop iteration.
2. **Back-ref refresh** — DB-only. For `face_detections`,
`tagged_photo`, and `photo_insights`: any hash-keyed row whose
`(library_id, rel_path)` no longer matches an `image_exif` row
*but whose `content_hash` does* is repointed at the surviving
`image_exif` location. Idempotent SQL; no health gate needed.
This is what makes the recent → archive handoff invisible to
read paths: when the missing-file scan retires the lib-A row,
tags/faces/insights pivot to lib-B's path before any user
notices.
3. **Orphan GC** — destructive. Hash-keyed derived rows whose
`content_hash` no longer has any `image_exif` row are eligible.
Two-tick consensus: a hash must be observed orphaned on two
consecutive ticks AND every library must be online for both. A
single Stale tick within the window cancels all pending deletes.
The pending set is held in memory (`OrphanGcState`) — restart
resets it, which only delays a delete, never causes one. Tags,
faces, and insights for orphaned hashes are deleted in one batch
per tick.
A backup library that briefly disappears, then returns within two
ticks, never loses any derived data. A move from lib-A to lib-B
without disappearance flips through pass 1 (lib-A row retired) and
pass 2 (back-refs follow), with pass 3 noting nothing because the
hash is still present in `image_exif` (lib-B's row).
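The consensus rule can be sketched as a small state machine; the struct name and layout here are illustrative, not the actual `OrphanGcState`:

```rust
use std::collections::HashSet;

/// Two-tick consensus: a hash is released for deletion only when it
/// was observed orphaned on two consecutive ticks AND every library
/// was online for both.
#[derive(Default)]
pub struct GcConsensus {
    pending: HashSet<String>,
}

impl GcConsensus {
    /// Returns the hashes whose derived rows may be deleted this tick.
    pub fn tick(&mut self, orphaned_now: &HashSet<String>, all_online: bool) -> Vec<String> {
        if !all_online {
            // A single stale tick cancels all pending deletes.
            self.pending.clear();
            return Vec::new();
        }
        let ready: Vec<String> =
            self.pending.intersection(orphaned_now).cloned().collect();
        // Deleted hashes drop out naturally next tick: once their
        // derived rows are gone they are no longer observed orphaned.
        self.pending = orphaned_now.clone();
        ready
    }
}
```

Restarting the process resets `pending`, which only ever delays a delete, never causes one.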
**Known gap: in-place content changes (future Branch D).** The
maintenance pipeline assumes a `(library_id, rel_path)`'s bytes are
stable for as long as the file exists at that path. If a user edits
a file in place (crop, re-export) without renaming, the watcher's
quick scan walks the file (mtime is recent) but `process_new_files`
short-circuits because `(library_id, rel_path)` already has an
`image_exif` row — no re-hash, no re-EXIF, no face redetection. The
row's `content_hash` keeps pointing at the original bytes. Tags /
faces / insights stay attached to the original hash and continue to
display because the rel_path back-ref still resolves; new faces
introduced by the edit are never detected.
The right place to fix this is a **stale-content detection pass**
that compares `image_exif.last_modified` / `size_bytes` to
`fs::metadata` for rows the quick scan would otherwise skip. On
mismatch, recompute the hash, update `image_exif`, and apply the
"content branched" semantics:
- **Faces** re-run (faces are fully derived from bytes).
- **Tags** migrate to the new hash (user intent — "this photo is
vacation" survives a crop). Insights migrate forward as a
starting point and are flagged for re-generation.
- **Favorites** (when migrated to hash-keyed) follow the path /
user intent.
The interesting case is the operator who keeps an unedited copy in
the archive library and edits the local copy: post-detection, the
archive copy stays on the original hash, the local copy branches to
the new hash, and the two histories cleanly split. Apollo's
`derived.db` cache will need an invalidation hook for the changed
hash — design it alongside Branch D.
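The proposed check could start as a metadata comparison like this (field names are assumptions, not the actual schema):

```rust
use std::fs;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Sketch of the proposed Branch D check: flag a row whose on-disk
/// metadata no longer matches what image_exif recorded, as a cheap
/// trigger for re-hashing. Recorded values are hypothetical columns.
pub fn content_possibly_changed(
    abs_path: &Path,
    recorded_size: u64,
    recorded_mtime_secs: u64,
) -> std::io::Result<bool> {
    let meta = fs::metadata(abs_path)?;
    let mtime = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    Ok(meta.len() != recorded_size || mtime != recorded_mtime_secs)
}
```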
### File Processing Pipeline
**Thumbnail Generation:**
@@ -219,7 +455,7 @@ ImageApi owns the face data; Apollo (sibling repo) hosts the insightface inferen
- `persons(id, name UNIQUE COLLATE NOCASE, cover_face_id, entity_id, created_from_tag, notes, ...)` — operator-managed, name is the user-visible identity.
- `face_detections(id, library_id, content_hash, rel_path, bbox_*, embedding BLOB, confidence, source, person_id, status, model_version, ...)` — keyed on `content_hash` so a photo duplicated across libraries is detected once. Marker rows for `status IN ('no_faces','failed')` carry NULL bbox/embedding (CHECK constraint enforces this).
**Why content_hash and not (library_id, rel_path):** ties face data to the bytes, not the path. A backup mount that copies files from the primary library naturally inherits the existing detections without re-running inference. This is the reference implementation of the multi-library data model — see "Multi-library data model" above.
**File-watch hook** (`src/main.rs::process_new_files`): for each photo with a populated `content_hash`, check `FaceDao::already_scanned(hash)`; if not, send bytes (or embedded JPEG preview for RAW via `exif::extract_embedded_jpeg_preview`) to Apollo's `/api/internal/faces/detect`. K=`FACE_DETECT_CONCURRENCY` (default 8) parallel calls per scan tick; Apollo serializes them via its single-worker GPU pool. `face_watch.rs` is the Tokio orchestration layer.
@@ -233,6 +469,8 @@ ImageApi owns the face data; Apollo (sibling repo) hosts the insightface inferen
**Rerun preserves manual rows** (`POST /image/faces/{id}/rerun`): only `source='auto'` rows are deleted before re-running detection. `already_scanned` returns true on ANY row, so a photo whose only faces are manually drawn never auto-redetects.
**Stats domain — content_hash, not file rows** (`FaceDao::stats` in `src/faces.rs`): `total_photos` counts `DISTINCT content_hash` over `image_exif` (filtered to image extensions, `content_hash IS NOT NULL`), and so do `scanned` / `with_faces` / `no_faces` / `failed` over `face_detections`. Numerator and denominator must live in the same domain — `face_detections` is keyed on content_hash, so the same JPEG present at two rel_paths or in two libraries scans once. Counting `image_exif` rows in the denominator inflated total by one per duplicate file and produced a permanent gap (e.g. 1101/1103 with nothing actually pending). Hash-less rows are excluded from total_photos while they sit in the `backfill_unhashed_backlog` queue; otherwise the bar pins below 100% for the duration of that backfill even though those rows aren't pending detection yet — they're pending hashing.
Module map:
- `src/faces.rs``FaceDao` trait + `SqliteFaceDao` impl, route handlers for `/faces/*`, `/image/faces/*`, `/persons/*`. Mirror of `tags.rs` layout.
- `src/face_watch.rs` — Tokio orchestration for the file-watch detect pass; `filter_excluded` (PathExcluder + image-extension filter), `read_image_bytes_for_detect` (RAW preview fallback).

Cargo.lock generated

@@ -600,6 +600,16 @@ version = "2.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6099cdc01846bc367c4e7dd630dc5966dccf36b652fae7a74e17b640411a91b2"
[[package]]
name = "bk-tree"
version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a8283fb8e64b873918f8bc527efa6aff34956296e48ea750a9c909cd47c01546"
dependencies = [
"fnv",
"triple_accel",
]
[[package]]
name = "blake3"
version = "1.8.4"
@@ -1928,6 +1938,7 @@ dependencies = [
"async-trait",
"base64",
"bcrypt",
"bk-tree",
"blake3",
"bytes",
"chrono",
@@ -1939,6 +1950,7 @@ dependencies = [
"futures",
"ical",
"image",
"image_hasher",
"indicatif",
"infer",
"jsonwebtoken",
@@ -1978,6 +1990,19 @@ dependencies = [
"quick-error",
]
[[package]]
name = "image_hasher"
version = "3.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dd266c66b0a0e2d4c6db8e710663fc163a2d33595ce997b6fbda407c8759d344"
dependencies = [
"base64",
"image",
"rustdct",
"serde",
"transpose",
]
[[package]]
name = "imgref"
version = "1.11.0"
@@ -2438,6 +2463,15 @@ dependencies = [
"num-traits",
]
[[package]]
name = "num-complex"
version = "0.4.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "73f88a1307638156682bada9d7604135552957b7818057dcef22705b4d509495"
dependencies = [
"num-traits",
]
[[package]]
name = "num-conv"
version = "0.1.0"
@@ -2907,6 +2941,15 @@ version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "925383efa346730478fb4838dbe9137d2a47675ad789c546d150a6e1dd4ab31c"
[[package]]
name = "primal-check"
version = "0.3.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dc0d895b311e3af9902528fbb8f928688abbd95872819320517cc24ca6b2bd08"
dependencies = [
"num-integer",
]
[[package]]
name = "proc-macro2"
version = "1.0.101"
@@ -3286,6 +3329,29 @@ dependencies = [
"semver",
]
[[package]]
name = "rustdct"
version = "0.7.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8b61555105d6a9bf98797c063c362a1d24ed8ab0431655e38f1cf51e52089551"
dependencies = [
"rustfft",
]
[[package]]
name = "rustfft"
version = "6.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "21db5f9893e91f41798c88680037dba611ca6674703c1a18601b01a72c8adb89"
dependencies = [
"num-complex",
"num-integer",
"num-traits",
"primal-check",
"strength_reduce",
"transpose",
]
[[package]]
name = "rustix"
version = "1.0.8"
@@ -3624,6 +3690,12 @@ version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a8f112729512f8e442d81f95a8a7ddf2b7c6b8a1a6f509a95864142b30cab2d3"
[[package]]
name = "strength_reduce"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fe895eb47f22e2ddd4dabc02bce419d2e643c8e3b585c78158b349195bc24d82"
[[package]]
name = "strfmt"
version = "0.2.5"
@@ -4122,6 +4194,22 @@ dependencies = [
"once_cell",
]
[[package]]
name = "transpose"
version = "0.2.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1ad61aed86bc3faea4300c7aee358b4c6d0c8d6ccc36524c96e4c92ccf26e77e"
dependencies = [
"num-integer",
"strength_reduce",
]
[[package]]
name = "triple_accel"
version = "0.3.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "622b09ce2fe2df4618636fb92176d205662f59803f39e70d1c333393082de96c"
[[package]]
name = "try-lock"
version = "0.2.5"


@@ -59,5 +59,7 @@ ical = "0.11"
scraper = "0.20"
base64 = "0.22"
blake3 = "1.5"
image_hasher = "3.0"
bk-tree = "0.5"
async-trait = "0.1"
indicatif = "0.17"


@@ -0,0 +1 @@
DROP INDEX IF EXISTS idx_tags_name_nocase;


@@ -0,0 +1,28 @@
-- Tags previously enforced uniqueness only in application code (the
-- add_tag handler looks up by name before inserting). The schema itself
-- accepted duplicates,
-- so a divergent code path could land two tags with the same name. Now
-- that we expose a rename endpoint we want a hard guarantee: case-
-- insensitive UNIQUE on tags.name.
-- Pre-flight: collapse exact-name duplicates (case-insensitive) onto the
-- lowest-id row before adding the constraint, otherwise the index
-- creation fails on any DB that ever produced dupes. On a clean DB this
-- is a no-op.
UPDATE tagged_photo
SET tag_id = (
SELECT MIN(t2.id) FROM tags t2
WHERE LOWER(t2.name) = LOWER((SELECT name FROM tags WHERE id = tagged_photo.tag_id))
)
WHERE tag_id IN (
SELECT t.id FROM tags t
WHERE t.id <> (
SELECT MIN(t2.id) FROM tags t2 WHERE LOWER(t2.name) = LOWER(t.name)
)
);
DELETE FROM tags
WHERE id <> (
SELECT MIN(t2.id) FROM tags t2 WHERE LOWER(t2.name) = LOWER(tags.name)
);
CREATE UNIQUE INDEX idx_tags_name_nocase ON tags (name COLLATE NOCASE);


@@ -0,0 +1,5 @@
DROP INDEX IF EXISTS idx_photo_insights_content_hash;
ALTER TABLE photo_insights DROP COLUMN content_hash;
DROP INDEX IF EXISTS idx_tagged_photo_content_hash;
ALTER TABLE tagged_photo DROP COLUMN content_hash;


@@ -0,0 +1,64 @@
-- Phase B of the multi-library data-model rollout: add a nullable
-- `content_hash` column to derived/user-intent tables that should follow
-- the bytes rather than the path. Reads will prefer hash-key joins and
-- fall back to rel_path while the column is null. A separate
-- reconciliation pass collapses duplicates as the column populates.
--
-- See CLAUDE.md → "Multi-library data model" for the policy. The
-- reference implementation is `face_detections`, which has been
-- hash-keyed since it was introduced.
--
-- Tables in this migration:
-- * tagged_photo — user-intent (tags follow the bytes)
-- * photo_insights — intrinsic to bytes (LLM-generated description)
--
-- favorites is the natural third candidate but its DAO is barely used in
-- v1 and the row count is tiny; deferring lets this migration stay
-- focused on the high-volume tables that drive cross-library overhead.
-- ---------------------------------------------------------------------------
-- tagged_photo
-- ---------------------------------------------------------------------------
ALTER TABLE tagged_photo ADD COLUMN content_hash TEXT;
-- Backfill: for each tagged_photo row, find the content_hash for its
-- rel_path. tagged_photo doesn't carry a library_id, so a rel_path that
-- exists under multiple libraries with different content is genuinely
-- ambiguous — we take the first matching image_exif row. The
-- reconciliation pass at runtime cleans up any rows that resolve
-- differently once a hash is known per library.
UPDATE tagged_photo
SET content_hash = (
SELECT content_hash FROM image_exif
WHERE image_exif.rel_path = tagged_photo.rel_path
AND image_exif.content_hash IS NOT NULL
LIMIT 1
)
WHERE content_hash IS NULL;
-- Hash-key index. Partial (only non-null rows) to keep the index small
-- during the transitional window where most rows are still null.
CREATE INDEX idx_tagged_photo_content_hash
ON tagged_photo (content_hash)
WHERE content_hash IS NOT NULL;
-- ---------------------------------------------------------------------------
-- photo_insights
-- ---------------------------------------------------------------------------
ALTER TABLE photo_insights ADD COLUMN content_hash TEXT;
-- Backfill keyed on (library_id, rel_path) — photo_insights already
-- carries library_id, so the resolution is unambiguous.
UPDATE photo_insights
SET content_hash = (
SELECT content_hash FROM image_exif
WHERE image_exif.library_id = photo_insights.library_id
AND image_exif.rel_path = photo_insights.rel_path
AND image_exif.content_hash IS NOT NULL
LIMIT 1
)
WHERE content_hash IS NULL;
CREATE INDEX idx_photo_insights_content_hash
ON photo_insights (content_hash)
WHERE content_hash IS NOT NULL;


@@ -0,0 +1,2 @@
-- Requires SQLite 3.35+ for ALTER TABLE DROP COLUMN.
ALTER TABLE libraries DROP COLUMN enabled;


@@ -0,0 +1,14 @@
-- Operator-controlled kill switch for a library. When `enabled = 0` the
-- watcher tick skips that library entirely — before the availability
-- probe, before ingest, before any maintenance pass — and the orphan-GC
-- all-online check treats it as out-of-scope rather than as a blocker.
--
-- The intended workflow is staging a new mount: insert with enabled=0,
-- verify the row appears in /libraries with enabled=false, then UPDATE
-- to 1 to start ingest. Same toggle works as a maintenance kill switch
-- after the fact ("don't keep probing this NAS while I'm rebooting it").
--
-- Default 1 so every existing library stays running on upgrade — no
-- behavior change without an explicit flip.
ALTER TABLE libraries ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT 1;


@@ -0,0 +1,2 @@
-- Requires SQLite 3.35+ for ALTER TABLE DROP COLUMN.
ALTER TABLE libraries DROP COLUMN excluded_dirs;


@@ -0,0 +1,14 @@
-- Per-library excluded directories.
--
-- The global EXCLUDED_DIRS env var is the right knob for excludes that
-- every library shares (Synology @eaDir, .thumbnails, etc.). It's a
-- poor fit for "exclude this subtree from THIS library only", whose
-- natural use case is mounting a parent directory while another
-- library already covers a child subtree underneath.
--
-- This column is parsed comma-separated, same shape as the env var,
-- and the watcher / memories / thumbnail walks each apply the union
-- (env_globals ∪ library.excluded_dirs) when scanning the library.
-- NULL = no extra excludes; the global env var still applies.
ALTER TABLE libraries ADD COLUMN excluded_dirs TEXT;


@@ -0,0 +1,8 @@
DROP INDEX IF EXISTS idx_image_exif_duplicate_of_hash;
DROP INDEX IF EXISTS idx_image_exif_dhash;
DROP INDEX IF EXISTS idx_image_exif_phash;
ALTER TABLE image_exif DROP COLUMN duplicate_decided_at;
ALTER TABLE image_exif DROP COLUMN duplicate_of_hash;
ALTER TABLE image_exif DROP COLUMN dhash_64;
ALTER TABLE image_exif DROP COLUMN phash_64;


@@ -0,0 +1,41 @@
-- Adds perceptual-hash signals + soft-mark resolution state to image_exif so
-- the duplicates surface in Apollo can group near-duplicates (re-encoded,
-- resized, format-converted copies) and let the user demote losers without
-- touching the file on disk. Image-only for v1: phash_64/dhash_64 are NULL
-- on videos and on images that fail to decode. See Apollo CLAUDE.md →
-- Duplicate detection / Caching layer for the policy.
--
-- Soft-mark columns are media-type-agnostic — when video perceptual hashing
-- arrives, it lives in a separate hash-keyed companion table and reuses the
-- same duplicate_of_hash / duplicate_decided_at machinery.
-- pHash (DCT, 64-bit) packed as i64 for fast XOR + popcount Hamming.
ALTER TABLE image_exif ADD COLUMN phash_64 BIGINT;
-- dHash (gradient, 64-bit). Cheap, robust to compression/resize. Stored
-- alongside pHash so the query layer can fall back if either is null.
ALTER TABLE image_exif ADD COLUMN dhash_64 BIGINT;
-- When non-null, this row is a soft-marked duplicate of the row whose
-- content_hash matches. The duplicate file stays on disk; the default
-- /photos listing filters it out. /photos?include_duplicates=true opts
-- back in (the Apollo duplicates modal uses this).
ALTER TABLE image_exif ADD COLUMN duplicate_of_hash TEXT;
-- Unix seconds of the resolve. Distinguishes "never reviewed" from
-- "reviewed and resolved" for the Apollo include_resolved toggle.
ALTER TABLE image_exif ADD COLUMN duplicate_decided_at BIGINT;
-- Partial indexes — the columns are NULL for the vast majority of rows
-- during the transitional window and forever for videos / decode failures.
CREATE INDEX idx_image_exif_phash
ON image_exif (phash_64)
WHERE phash_64 IS NOT NULL;
CREATE INDEX idx_image_exif_dhash
ON image_exif (dhash_64)
WHERE dhash_64 IS NOT NULL;
CREATE INDEX idx_image_exif_duplicate_of_hash
ON image_exif (duplicate_of_hash)
WHERE duplicate_of_hash IS NOT NULL;


@@ -383,7 +383,10 @@ mod tests {
// body cap and rejected normal-size photos before they reached
// the backend.
assert!(is_transient(&classify_error_response(408, "")));
assert!(is_transient(&classify_error_response(
413,
"<html>nginx</html>"
)));
assert!(is_transient(&classify_error_response(429, "{}")));
}


@@ -521,6 +521,7 @@ impl InsightChatService {
training_messages: Some(json),
backend: effective_backend.clone(),
fewshot_source_ids: None,
content_hash: None,
};
let cx = opentelemetry::Context::new();
let mut dao = self.insight_dao.lock().expect("Unable to lock InsightDao");
@@ -983,6 +984,7 @@ impl InsightChatService {
training_messages: Some(json),
backend: effective_backend.clone(),
fewshot_source_ids: None,
content_hash: None,
};
let cx = opentelemetry::Context::new();
let mut dao = self.insight_dao.lock().expect("Unable to lock InsightDao");


@@ -1255,7 +1255,9 @@ impl InsightGenerator {
.span()
.set_attribute(KeyValue::new("summary_length", summary.len() as i64));
// 11. Store in database. content_hash is None here — store_insight
// looks it up from image_exif before persisting; reconciliation
// backfills if the hash isn't known yet.
let insight = InsertPhotoInsight {
library_id: crate::libraries::PRIMARY_LIBRARY_ID,
file_path: file_path.to_string(),
@@ -1267,6 +1269,7 @@ impl InsightGenerator {
training_messages: None,
backend: "local".to_string(),
fewshot_source_ids: None,
content_hash: None,
};
let mut dao = self.insight_dao.lock().expect("Unable to lock InsightDao");
@@ -3530,6 +3533,7 @@ Return ONLY the summary, nothing else."#,
training_messages,
backend: backend_label.clone(),
fewshot_source_ids: fewshot_source_ids_json,
content_hash: None,
};
let stored = {


@@ -0,0 +1,243 @@
//! Backfill `image_exif.phash_64` + `dhash_64` for image rows that
//! were ingested before perceptual hashing was wired into the watcher.
//!
//! The watcher computes perceptual hashes for new images as they're
//! ingested, so this binary is a one-shot for the historical backlog.
//! Idempotent — only rows with a non-null content_hash and a null
//! phash are processed, so re-runs are safe and pick up where they
//! left off (e.g. after a crash or interrupt).
//!
//! Image-only by design: `get_rows_missing_perceptual_hash` filters by
//! file extension at the DB layer so videos and other non-decodable
//! media are skipped without round-tripping `image_hasher`. Files that
//! can't be opened (missing on disk, permission errors) are quietly
//! left as null and counted as "missing"; on next run, if the file is
//! restored, the row will surface again.
use std::path::Path;
use std::sync::{Arc, Mutex};
use std::time::Instant;
use clap::Parser;
use log::{error, warn};
use rayon::prelude::*;
use image_api::bin_progress;
use image_api::database::{ExifDao, SqliteExifDao, connect};
use image_api::libraries::{self, Library};
use image_api::perceptual_hash;
#[derive(Parser, Debug)]
#[command(name = "backfill_perceptual_hash")]
#[command(about = "Compute pHash + dHash for image_exif rows missing one")]
struct Args {
/// Max rows to hash per batch. The process loops until no rows remain.
#[arg(long, default_value_t = 256)]
batch_size: i64,
/// Rayon parallelism override. 0 uses the default thread pool size.
#[arg(long, default_value_t = 0)]
parallelism: usize,
/// Dry-run: log what would be hashed without writing to the DB.
#[arg(long)]
dry_run: bool,
}
fn main() -> anyhow::Result<()> {
env_logger::init();
dotenv::dotenv().ok();
let args = Args::parse();
if args.parallelism > 0 {
rayon::ThreadPoolBuilder::new()
.num_threads(args.parallelism)
.build_global()
.expect("Unable to configure rayon thread pool");
}
let base_path = dotenv::var("BASE_PATH").ok();
let mut seed_conn = connect();
if let Some(base) = base_path.as_deref() {
libraries::seed_or_patch_from_env(&mut seed_conn, base);
}
let libs = libraries::load_all(&mut seed_conn);
drop(seed_conn);
if libs.is_empty() {
anyhow::bail!("No libraries configured; cannot backfill perceptual hashes");
}
let libs_by_id: std::collections::HashMap<i32, Library> =
libs.into_iter().map(|lib| (lib.id, lib)).collect();
println!(
"Configured libraries: {}",
libs_by_id
.values()
.map(|l| format!("{} -> {}", l.name, l.root_path))
.collect::<Vec<_>>()
.join(", ")
);
let dao: Arc<Mutex<Box<dyn ExifDao>>> = Arc::new(Mutex::new(Box::new(SqliteExifDao::new())));
let ctx = opentelemetry::Context::new();
let mut total_hashed = 0u64;
let mut total_missing = 0u64;
let mut total_decode_failures = 0u64;
let mut total_errors = 0u64;
let start = Instant::now();
let pb = bin_progress::spinner("perceptual-hashing");
loop {
let rows = {
let mut guard = dao.lock().expect("Unable to lock ExifDao");
guard
.get_rows_missing_perceptual_hash(&ctx, args.batch_size)
.map_err(|e| anyhow::anyhow!("DB error: {:?}", e))?
};
if rows.is_empty() {
break;
}
let batch_size = rows.len();
pb.set_message(format!(
"batch of {} (hashed={} decode_fail={} missing={} errors={})",
batch_size, total_hashed, total_decode_failures, total_missing, total_errors
));
// Compute perceptual hashes in parallel: the work is CPU-bound,
// and rayon's default thread pool matches the host's logical-core
// count, which is the right ceiling for image_hasher's DCT pass.
let results: Vec<(i32, String, FilePerceptualResult)> = rows
.into_par_iter()
.map(|(library_id, rel_path)| {
let abs = libs_by_id
.get(&library_id)
.map(|lib| Path::new(&lib.root_path).join(&rel_path));
match abs {
Some(abs_path) if abs_path.exists() => {
match perceptual_hash::compute(&abs_path) {
Some(id) => (library_id, rel_path, FilePerceptualResult::Ok(id)),
None => (library_id, rel_path, FilePerceptualResult::DecodeFailed),
}
}
Some(_) => (library_id, rel_path, FilePerceptualResult::MissingOnDisk),
None => {
warn!("Row refers to unknown library_id {}", library_id);
(library_id, rel_path, FilePerceptualResult::MissingOnDisk)
}
}
})
.collect();
// Persist sequentially — SQLite writes serialize anyway.
if !args.dry_run {
let mut guard = dao.lock().expect("Unable to lock ExifDao");
for (library_id, rel_path, result) in &results {
match result {
FilePerceptualResult::Ok(id) => {
match guard.backfill_perceptual_hash(
&ctx,
*library_id,
rel_path,
Some(id.phash_64),
Some(id.dhash_64),
) {
Ok(_) => {
total_hashed += 1;
pb.inc(1);
}
Err(e) => {
pb.println(format!("persist error for {}: {:?}", rel_path, e));
total_errors += 1;
}
}
}
FilePerceptualResult::DecodeFailed => {
// Persist phash_64=0/dhash_64=0 as a "tried,
// unhashable" sentinel so this row leaves the
// `phash_64 IS NULL` candidate set and the
// backfill doesn't infinite-loop on a queue of
undecodable formats (HEIC, RAW, CMYK JPEGs,
// truncated bytes). The all-zero hash is
// explicitly excluded from clustering by
// is_informative_hash in duplicates.rs, so it
// won't pollute group output — it just becomes
// invisible to the duplicate finder.
log::debug!(
"perceptual decode failed for {} (lib {}); marking unhashable",
rel_path,
library_id
);
match guard.backfill_perceptual_hash(
&ctx,
*library_id,
rel_path,
Some(0),
Some(0),
) {
Ok(_) => {
total_decode_failures += 1;
}
Err(e) => {
pb.println(format!(
"persist error (decode-fail sentinel) for {}: {:?}",
rel_path, e
));
total_errors += 1;
}
}
}
FilePerceptualResult::MissingOnDisk => {
total_missing += 1;
}
}
}
} else {
for (_, rel_path, result) in &results {
match result {
FilePerceptualResult::Ok(id) => {
pb.println(format!(
"[dry-run] {} -> phash={:016x} dhash={:016x}",
rel_path, id.phash_64, id.dhash_64
));
total_hashed += 1;
pb.inc(1);
}
FilePerceptualResult::DecodeFailed => {
total_decode_failures += 1;
}
FilePerceptualResult::MissingOnDisk => {
total_missing += 1;
}
}
}
pb.println(format!(
"[dry-run] processed one batch of {}. Stopping — a real run would continue \
until no NULL phash_64 image rows remain.",
results.len()
));
break;
}
}
pb.finish_and_clear();
println!(
"Done. hashed={}, decode_failed={}, skipped (missing on disk)={}, errors={}, elapsed={:.1}s",
total_hashed,
total_decode_failures,
total_missing,
total_errors,
start.elapsed().as_secs_f64()
);
if total_errors > 0 {
error!("Backfill completed with {} persist errors", total_errors);
}
Ok(())
}
enum FilePerceptualResult {
Ok(perceptual_hash::PerceptualIdentity),
DecodeFailed,
MissingOnDisk,
}
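The asymmetric pHash/dHash validation described in this branch (candidate discovery via pHash, stricter dHash veto, reject when dHash is missing) can be sketched in a few lines. This is an illustrative restatement, not the project's actual clustering code; the function names and the packed-`i64` representation mirror the `phash_64`/`dhash_64` columns above, and `dhash_threshold = max(2, threshold/2)` comes from the commit message.

```rust
/// Hamming distance between two 64-bit hashes packed as i64.
fn hamming(a: i64, b: i64) -> u32 {
    (a ^ b).count_ones()
}

/// pHash proposes a candidate edge; dHash validates it at a stricter,
/// asymmetric threshold. A missing dHash on either side rejects the
/// edge rather than trusting pHash alone.
fn is_near_duplicate(
    phash_a: i64,
    phash_b: i64,
    dhash_a: Option<i64>,
    dhash_b: Option<i64>,
    threshold: u32,
) -> bool {
    if hamming(phash_a, phash_b) > threshold {
        return false;
    }
    let dhash_threshold = std::cmp::max(2, threshold / 2);
    match (dhash_a, dhash_b) {
        (Some(da), Some(db)) => hamming(da, db) <= dhash_threshold,
        _ => false, // missing dHash: veto the edge
    }
}

fn main() {
    // A 1-bit pHash flip with identical dHashes passes both gates.
    assert!(is_near_duplicate(0, 1, Some(0), Some(0), 8));
    // The same pHash edge is rejected when one side lacks a dHash.
    assert!(!is_near_duplicate(0, 1, None, Some(0), 8));
}
```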

View File

@@ -53,12 +53,34 @@ pub fn thumbnail_path(thumbs_dir: &Path, hash: &str) -> PathBuf {
/// Hash-keyed HLS output directory: `<video_dir>/<hash[..2]>/<hash>/`.
/// The playlist lives at `playlist.m3u8` inside this directory and its
/// segments are co-located so HLS relative references Just Work.
///
/// Allow-dead until Branch B/C rewires the HLS pipeline to use it; the
/// helper lives here today so Branch A's path layout decisions stay
/// adjacent to thumbnail/legacy ones.
#[allow(dead_code)]
pub fn hls_dir(video_dir: &Path, hash: &str) -> PathBuf {
let shard = shard_prefix(hash);
video_dir.join(shard).join(hash)
}
/// Library-scoped legacy mirrored path:
/// `<derivative_dir>/<library_id>/<rel_path>`. Used as the fallback when
/// `content_hash` isn't available — the library prefix prevents the
/// "lib1 wrote `vacation/IMG.jpg` first, lib2 sees thumb_path.exists()
/// and serves the wrong image" failure mode.
///
/// Existing single-library deployments may already have thumbnails at the
/// bare-legacy `<derivative_dir>/<rel_path>` shape; serving code is
/// expected to check both this scoped path and the bare-legacy path so
/// nothing 404s during the transition.
pub fn library_scoped_legacy_path(
derivative_dir: &Path,
library_id: i32,
rel_path: impl AsRef<Path>,
) -> PathBuf {
derivative_dir.join(library_id.to_string()).join(rel_path)
}
fn shard_prefix(hash: &str) -> &str {
let end = hash
.char_indices()
@@ -105,4 +127,17 @@ mod tests {
let d = hls_dir(video, "1234deadbeef");
assert_eq!(d, PathBuf::from("/tmp/video/12/1234deadbeef"));
}
#[test]
fn library_scoped_legacy_path_prefixes_with_library_id() {
let thumbs = Path::new("/tmp/thumbs");
let p = library_scoped_legacy_path(thumbs, 7, "vacation/IMG.jpg");
assert_eq!(p, PathBuf::from("/tmp/thumbs/7/vacation/IMG.jpg"));
// Same rel_path, different library — different output. This is
// the whole point: lib 1 and lib 2 don't clobber each other.
let p1 = library_scoped_legacy_path(thumbs, 1, "vacation/IMG.jpg");
let p2 = library_scoped_legacy_path(thumbs, 2, "vacation/IMG.jpg");
assert_ne!(p1, p2);
}
}
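The doc comment above says serving code is expected to check both the library-scoped path and the bare-legacy path during the transition. A minimal sketch of that lookup order, assuming the serving side probes candidates in preference order (`candidate_thumb_paths` is an illustrative name, not a function in this file; `library_scoped_legacy_path` is reproduced from it):

```rust
use std::path::{Path, PathBuf};

/// Reproduced from the path-layout module above.
fn library_scoped_legacy_path(dir: &Path, library_id: i32, rel: &str) -> PathBuf {
    dir.join(library_id.to_string()).join(rel)
}

/// Candidate locations in preference order: scoped first, then the
/// bare-legacy shape single-library deployments may still have.
fn candidate_thumb_paths(dir: &Path, library_id: i32, rel: &str) -> [PathBuf; 2] {
    [
        library_scoped_legacy_path(dir, library_id, rel),
        dir.join(rel),
    ]
}

fn main() {
    let c = candidate_thumb_paths(Path::new("/tmp/thumbs"), 7, "vacation/IMG.jpg");
    assert_eq!(c[0], PathBuf::from("/tmp/thumbs/7/vacation/IMG.jpg"));
    assert_eq!(c[1], PathBuf::from("/tmp/thumbs/vacation/IMG.jpg"));
}
```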

View File

@@ -165,6 +165,15 @@ pub struct FilesRequest {
/// Optional library filter. Accepts a library id (e.g. "1") or name
/// (e.g. "main"). When omitted, results span all libraries.
pub library: Option<String>,
/// When true, include rows soft-marked as duplicates of another file
/// (i.e. `image_exif.duplicate_of_hash IS NOT NULL`). Defaults to
/// false — the standard /photos listing hides demoted siblings, so
/// the grid quietly shrinks after a resolve. The Apollo duplicates
/// modal passes `true` so it can show both survivors and demoted
/// members inside a group.
#[serde(default)]
pub include_duplicates: Option<bool>,
}
#[derive(Copy, Clone, Deserialize, PartialEq, Debug)]

View File

@@ -111,13 +111,30 @@ impl InsightDao for SqliteInsightDao {
fn store_insight(
&mut self,
context: &opentelemetry::Context,
insight: InsertPhotoInsight,
mut insight: InsertPhotoInsight,
) -> Result<PhotoInsight, DbError> {
trace_db_call(context, "insert", "store_insight", |_span| {
use schema::photo_insights::dsl::*;
let mut connection = self.connection.lock().expect("Unable to get InsightDao");
// Eagerly populate content_hash so this insight follows the
// bytes (CLAUDE.md "Multi-library data model"). Caller-
// supplied hash wins; otherwise look it up from image_exif
// for the (library_id, rel_path) tuple. None is acceptable —
// reconciliation backfills it once the hash lands.
if insight.content_hash.is_none() {
use schema::image_exif as ie;
insight.content_hash = ie::table
.filter(ie::library_id.eq(insight.library_id))
.filter(ie::rel_path.eq(&insight.file_path))
.filter(ie::content_hash.is_not_null())
.select(ie::content_hash)
.first::<Option<String>>(connection.deref_mut())
.ok()
.flatten();
}
// Mark all existing insights for this file as no longer current
diesel::update(
photo_insights

View File

@@ -9,6 +9,25 @@ use crate::database::models::{
};
use crate::otel::trace_db_call;
/// Wire shape for a single member of a duplicate group, returned by
/// `list_duplicates_*` and `lookup_duplicate_row`. Carries everything
/// the Apollo modal needs to render a member tile and its meta line —
/// thumbnails are derived from `(library_id, rel_path)` upstream.
#[derive(Debug, Clone, serde::Serialize)]
pub struct DuplicateRow {
pub library_id: i32,
pub rel_path: String,
pub content_hash: String,
pub size_bytes: Option<i64>,
pub date_taken: Option<i64>,
pub width: Option<i32>,
pub height: Option<i32>,
pub phash_64: Option<i64>,
pub dhash_64: Option<i64>,
pub duplicate_of_hash: Option<String>,
pub duplicate_decided_at: Option<i64>,
}
pub mod calendar_dao;
pub mod daily_summary_dao;
pub mod insights_dao;
@@ -16,6 +35,7 @@ pub mod knowledge_dao;
pub mod location_dao;
pub mod models;
pub mod preview_dao;
pub mod reconcile;
pub mod schema;
pub mod search_dao;
@@ -136,10 +156,19 @@ pub fn connect() -> SqliteConnection {
// rollback-journal durability; we accept the narrow last-fsync
// window for the 210× write throughput).
use diesel::connection::SimpleConnection;
// foreign_keys = ON is per-connection in SQLite (off by default), so
// it has to be set here alongside the other pragmas. Without it
// every `REFERENCES … ON DELETE CASCADE / SET NULL` clause in the
// schema is documentation-only — orphan rows would survive the
// referenced row's deletion. With it, the cascade fires
// automatically and code that previously did manual two-step
// cleanup (delete child rows, then parent) becomes redundant but
// still correct.
conn.batch_execute(
"PRAGMA journal_mode = WAL; \
PRAGMA busy_timeout = 5000; \
PRAGMA synchronous = NORMAL;",
PRAGMA synchronous = NORMAL; \
PRAGMA foreign_keys = ON;",
)
.expect("set sqlite pragmas");
conn
@@ -286,17 +315,29 @@ pub trait ExifDao: Sync + Send {
library_id: Option<i32>,
) -> Result<Vec<(String, i64)>, DbError>;
/// Batch load EXIF data for multiple file paths (single query)
/// Batch load EXIF data for multiple file paths (single query). When
/// `library_id = Some(id)` the lookup is keyed on `(library_id,
/// rel_path)`; cross-library duplicates with the same rel_path are
/// excluded. `None` keeps the legacy rel-path-only behavior — used by
/// the union-mode `/photos` listing, which already disambiguates by
/// `(file_path, library_id)` in the caller.
fn get_exif_batch(
&mut self,
context: &opentelemetry::Context,
library_id: Option<i32>,
file_paths: &[String],
) -> Result<Vec<ImageExif>, DbError>;
/// Query files by EXIF criteria with optional filters
/// Query files by EXIF criteria with optional filters. `library_id =
/// Some(id)` restricts to that library; `None` spans every library
/// (used by the unscoped `/photos` form). The composite
/// `(library_id, date_taken)` index added in the multi_library
/// migration depends on `library_id` being part of the WHERE clause —
/// callers that have a library context must pass it.
fn query_by_exif(
&mut self,
context: &opentelemetry::Context,
library_id: Option<i32>,
camera_make: Option<&str>,
camera_model: Option<&str>,
lens_model: Option<&str>,
@@ -355,6 +396,104 @@ pub trait ExifDao: Sync + Send {
size_bytes: i64,
) -> Result<(), DbError>;
/// Return image rows that have a `content_hash` but no `phash_64`,
/// oldest first. Used by the `backfill_perceptual_hash` binary.
/// Filters by image extension at the DB layer to avoid ever asking
/// `image_hasher` to decode a video. Returns `(library_id, rel_path)`.
fn get_rows_missing_perceptual_hash(
&mut self,
context: &opentelemetry::Context,
limit: i64,
) -> Result<Vec<(i32, String)>, DbError>;
/// Persist computed perceptual hashes (pHash + dHash) for an
/// existing image_exif row. Either column may be left NULL by
/// passing `None`, but in practice the binary computes both or
/// neither — `image_hasher` either decodes the image and produces
/// both signals, or fails entirely.
fn backfill_perceptual_hash(
&mut self,
context: &opentelemetry::Context,
library_id: i32,
rel_path: &str,
phash_64: Option<i64>,
dhash_64: Option<i64>,
) -> Result<(), DbError>;
/// Group exact-hash duplicates: rows whose `content_hash` appears
/// more than once across the (optionally library-scoped) corpus.
/// Returns one [`DuplicateRow`] per member; callers group by
/// `content_hash`. When `include_resolved=false`, rows already
/// soft-marked (`duplicate_of_hash IS NOT NULL`) are excluded so
/// the modal doesn't re-surface decisions the user already made.
fn list_duplicates_exact(
&mut self,
context: &opentelemetry::Context,
library_id: Option<i32>,
include_resolved: bool,
) -> Result<Vec<DuplicateRow>, DbError>;
/// Return all rows with a non-null `phash_64` (optionally library-
/// scoped), used by the perceptual-cluster routine in
/// [`crate::main`] to single-link cluster via Hamming distance.
/// Each returned row is a *distinct content_hash* — exact duplicates
/// are collapsed inside the DAO (client-side dedup after the query)
/// so the in-memory clusterer doesn't rediscover them.
fn list_perceptual_candidates(
&mut self,
context: &opentelemetry::Context,
library_id: Option<i32>,
include_resolved: bool,
) -> Result<Vec<DuplicateRow>, DbError>;
/// Look up a single row's metadata by `(library_id, rel_path)`. Used
/// by the resolve endpoint to map the request payload to the
/// underlying `content_hash` before writing the soft-mark. Returns
/// `Ok(None)` if the file doesn't exist in `image_exif`.
fn lookup_duplicate_row(
&mut self,
context: &opentelemetry::Context,
library_id: i32,
rel_path: &str,
) -> Result<Option<DuplicateRow>, DbError>;
/// Soft-mark a file as a duplicate of `survivor_hash`. Sets
/// `duplicate_of_hash` and `duplicate_decided_at` on the row(s)
/// matching `(library_id, rel_path)`. The file stays on disk; the
/// default `/photos` listing hides it because of the
/// `duplicate_of_hash IS NULL` filter.
fn set_duplicate_of(
&mut self,
context: &opentelemetry::Context,
library_id: i32,
rel_path: &str,
survivor_hash: &str,
decided_at: i64,
) -> Result<(), DbError>;
/// Reverse a soft-mark: clears `duplicate_of_hash` and
/// `duplicate_decided_at`. Used by the modal's UNRESOLVE chip.
fn clear_duplicate_of(
&mut self,
context: &opentelemetry::Context,
library_id: i32,
rel_path: &str,
) -> Result<(), DbError>;
/// Union the tags from `demoted_hash` onto `survivor_hash`. Used at
/// resolve time for *perceptual* duplicates (different content_hashes,
/// independent tag sets) so the user doesn't lose their tagging work
/// when promoting a survivor. Idempotent: a tag already on the survivor
/// is left alone. Exact duplicates (same content_hash) don't need this
/// because their tag rows are already shared.
fn union_perceptual_tags(
&mut self,
context: &opentelemetry::Context,
survivor_hash: &str,
demoted_hash: &str,
survivor_rel_path: &str,
) -> Result<(), DbError>;
/// Return the first EXIF row with the given content hash (any library).
/// Used by thumbnail/HLS generation to detect pre-existing derivatives
/// from another library before regenerating.
@@ -418,11 +557,17 @@ pub trait ExifDao: Sync + Send {
/// `library_ids` is empty, rows from every library are returned. Used by
/// `/photos` recursive listing to skip the filesystem walk — the watcher
/// keeps image_exif in parity with disk via the reconciliation pass.
///
/// `include_duplicates=false` filters out rows soft-marked with
/// `duplicate_of_hash IS NOT NULL` so the default photo listing hides
/// demoted siblings; the Apollo duplicates modal passes `true` to
/// see both survivors and demoted members inside a group.
fn list_rel_paths_for_libraries(
&mut self,
context: &opentelemetry::Context,
library_ids: &[i32],
path_prefix: Option<&str>,
include_duplicates: bool,
) -> Result<Vec<(i32, String)>, DbError>;
/// Delete a single image_exif row scoped to `(library_id, rel_path)`.
@@ -434,6 +579,28 @@ pub trait ExifDao: Sync + Send {
library_id: i32,
rel_path: &str,
) -> Result<(), DbError>;
/// Number of image_exif rows for a library. Used by the availability
/// probe to decide whether an empty mount is "fresh" (zero rows: fine)
/// or "the share went offline" (non-zero rows: stale). Zero on query
/// error so a transient DB hiccup doesn't itself cause a Stale flip.
fn count_for_library(
&mut self,
context: &opentelemetry::Context,
library_id: i32,
) -> Result<i64, DbError>;
/// Paginated rel_path listing for a single library, ordered by id
/// ascending. Used by the missing-file detector to scan a library
/// in capped chunks across consecutive watcher ticks rather than
/// stat()ing every row every minute. Returns `(id, rel_path)`.
fn list_rel_paths_for_library_page(
&mut self,
context: &opentelemetry::Context,
library_id: i32,
limit: i64,
offset: i64,
) -> Result<Vec<(i32, String)>, DbError>;
}
pub struct SqliteExifDao {
@@ -613,6 +780,7 @@ impl ExifDao for SqliteExifDao {
fn get_exif_batch(
&mut self,
context: &opentelemetry::Context,
library_id_filter: Option<i32>,
file_paths: &[String],
) -> Result<Vec<ImageExif>, DbError> {
trace_db_call(context, "query", "get_exif_batch", |_span| {
@@ -623,8 +791,11 @@ impl ExifDao for SqliteExifDao {
}
let mut connection = self.connection.lock().expect("Unable to get ExifDao");
image_exif
let mut query = image_exif.into_boxed();
if let Some(lib_id) = library_id_filter {
query = query.filter(library_id.eq(lib_id));
}
query
.filter(rel_path.eq_any(file_paths))
.load::<ImageExif>(connection.deref_mut())
.map_err(|_| anyhow::anyhow!("Query error"))
@@ -635,6 +806,7 @@ impl ExifDao for SqliteExifDao {
fn query_by_exif(
&mut self,
context: &opentelemetry::Context,
library_id_filter: Option<i32>,
camera_make_filter: Option<&str>,
camera_model_filter: Option<&str>,
lens_model_filter: Option<&str>,
@@ -648,6 +820,12 @@ impl ExifDao for SqliteExifDao {
let mut connection = self.connection.lock().expect("Unable to get ExifDao");
let mut query = image_exif.into_boxed();
// Library scope (most-selective filter — apply first so the
// `(library_id, ...)` indexes are eligible).
if let Some(lib_id) = library_id_filter {
query = query.filter(library_id.eq(lib_id));
}
// Camera filters (case-insensitive partial match)
if let Some(make) = camera_make_filter {
query = query.filter(camera_make.like(format!("%{}%", make)));
@@ -1022,6 +1200,7 @@ impl ExifDao for SqliteExifDao {
context: &opentelemetry::Context,
library_ids: &[i32],
path_prefix: Option<&str>,
include_duplicates: bool,
) -> Result<Vec<(i32, String)>, DbError> {
trace_db_call(context, "query", "list_rel_paths_for_libraries", |_span| {
use schema::image_exif::dsl::*;
@@ -1042,6 +1221,41 @@ impl ExifDao for SqliteExifDao {
query = query.filter(rel_path.like(pattern).escape('\\'));
}
if !include_duplicates {
if library_ids.is_empty() {
// Unscoped (all-libraries) view — every survivor is
// reachable somewhere, so a soft-marked row is
// genuinely a duplicate from the user's perspective.
// Hide it.
query = query.filter(duplicate_of_hash.is_null());
} else {
// Scoped to specific libraries: only hide a
// soft-marked row when the survivor is reachable
// *in this view*. If the survivor lives in a
// library the user can't see right now, the
// demoted file is the only copy of those bytes
// they have access to — keep it visible.
//
// Implemented as a correlated NOT EXISTS subquery
// over an aliased image_exif. Library ids are i32
// so format!-inlining the integer list is safe.
use diesel::sql_types::Bool;
let lib_list = library_ids
.iter()
.map(i32::to_string)
.collect::<Vec<_>>()
.join(",");
let raw = format!(
"(image_exif.duplicate_of_hash IS NULL OR NOT EXISTS \
(SELECT 1 FROM image_exif AS survivor \
WHERE survivor.content_hash = image_exif.duplicate_of_hash \
AND survivor.library_id IN ({})))",
lib_list
);
query = query.filter(diesel::dsl::sql::<Bool>(&raw));
}
}
query
.load::<(i32, String)>(connection.deref_mut())
.map_err(|_| anyhow::anyhow!("Query error"))
@@ -1069,6 +1283,465 @@ impl ExifDao for SqliteExifDao {
})
.map_err(|_| DbError::new(DbErrorKind::QueryError))
}
fn count_for_library(
&mut self,
context: &opentelemetry::Context,
library_id_val: i32,
) -> Result<i64, DbError> {
trace_db_call(context, "query", "count_for_library", |_span| {
use schema::image_exif::dsl::*;
image_exif
.filter(library_id.eq(library_id_val))
.count()
.get_result::<i64>(self.connection.lock().unwrap().deref_mut())
.map_err(|_| anyhow::anyhow!("Count error"))
})
.map_err(|_| DbError::new(DbErrorKind::QueryError))
}
fn list_rel_paths_for_library_page(
&mut self,
context: &opentelemetry::Context,
library_id_val: i32,
limit: i64,
offset: i64,
) -> Result<Vec<(i32, String)>, DbError> {
trace_db_call(
context,
"query",
"list_rel_paths_for_library_page",
|_span| {
use schema::image_exif::dsl::*;
image_exif
.filter(library_id.eq(library_id_val))
.order(id.asc())
.select((id, rel_path))
.limit(limit)
.offset(offset)
.load::<(i32, String)>(self.connection.lock().unwrap().deref_mut())
.map_err(|_| anyhow::anyhow!("Query error"))
},
)
.map_err(|_| DbError::new(DbErrorKind::QueryError))
}
fn get_rows_missing_perceptual_hash(
&mut self,
context: &opentelemetry::Context,
limit: i64,
) -> Result<Vec<(i32, String)>, DbError> {
trace_db_call(
context,
"query",
"get_rows_missing_perceptual_hash",
|_span| {
use schema::image_exif::dsl::*;
let mut connection = self.connection.lock().expect("Unable to get ExifDao");
// Image-only filter via extension. Videos and decode-failures
// would always come back NULL otherwise and the binary would
// grind through them on every run. The list mirrors the file
// formats `image` 0.25 / `image_hasher` 3.x can decode.
image_exif
.filter(content_hash.is_not_null())
.filter(phash_64.is_null())
.filter(
rel_path
.like("%.jpg")
.or(rel_path.like("%.jpeg"))
.or(rel_path.like("%.JPG"))
.or(rel_path.like("%.JPEG"))
.or(rel_path.like("%.png"))
.or(rel_path.like("%.PNG"))
.or(rel_path.like("%.webp"))
.or(rel_path.like("%.WEBP"))
.or(rel_path.like("%.tif"))
.or(rel_path.like("%.tiff"))
.or(rel_path.like("%.TIF"))
.or(rel_path.like("%.TIFF"))
.or(rel_path.like("%.avif"))
.or(rel_path.like("%.AVIF")),
)
.select((library_id, rel_path))
.order(id.asc())
.limit(limit)
.load::<(i32, String)>(connection.deref_mut())
.map_err(|_| anyhow::anyhow!("Query error"))
},
)
.map_err(|_| DbError::new(DbErrorKind::QueryError))
}
fn backfill_perceptual_hash(
&mut self,
context: &opentelemetry::Context,
library_id_val: i32,
rel_path_val: &str,
phash_val: Option<i64>,
dhash_val: Option<i64>,
) -> Result<(), DbError> {
trace_db_call(context, "update", "backfill_perceptual_hash", |_span| {
use schema::image_exif::dsl::*;
let mut connection = self.connection.lock().expect("Unable to get ExifDao");
diesel::update(
image_exif
.filter(library_id.eq(library_id_val))
.filter(rel_path.eq(rel_path_val)),
)
.set((phash_64.eq(phash_val), dhash_64.eq(dhash_val)))
.execute(connection.deref_mut())
.map(|_| ())
.map_err(|_| anyhow::anyhow!("Update error"))
})
.map_err(|_| DbError::new(DbErrorKind::UpdateError))
}
fn list_duplicates_exact(
&mut self,
context: &opentelemetry::Context,
library_id_filter: Option<i32>,
include_resolved: bool,
) -> Result<Vec<DuplicateRow>, DbError> {
trace_db_call(context, "query", "list_duplicates_exact", |_span| {
// Sub-select the content_hashes that appear more than once
// (optionally library-scoped), then load the full member rows
// for those hashes ordered by hash + library + path so the
// caller can stream-group without buffering the full dataset.
let mut connection = self.connection.lock().expect("Unable to get ExifDao");
// Step 1: hashes with count > 1.
let dup_hashes: Vec<String> = {
use schema::image_exif::dsl::*;
let mut q = image_exif
.filter(content_hash.is_not_null())
.group_by(content_hash)
.select(content_hash.assume_not_null())
.having(diesel::dsl::count_star().gt(1))
.into_boxed();
if let Some(lib) = library_id_filter {
q = q.filter(library_id.eq(lib));
}
q.load::<String>(connection.deref_mut())
.map_err(|_| anyhow::anyhow!("Query error"))?
};
if dup_hashes.is_empty() {
return Ok(Vec::new());
}
// Step 2: every member row for those hashes.
use schema::image_exif::dsl::*;
let mut q = image_exif
.filter(content_hash.eq_any(&dup_hashes))
.select((
library_id,
rel_path,
content_hash.assume_not_null(),
size_bytes,
date_taken,
width,
height,
phash_64,
dhash_64,
duplicate_of_hash,
duplicate_decided_at,
))
.order((content_hash.asc(), library_id.asc(), rel_path.asc()))
.into_boxed();
if let Some(lib) = library_id_filter {
q = q.filter(library_id.eq(lib));
}
if !include_resolved {
q = q.filter(duplicate_of_hash.is_null());
}
let rows: Vec<(
i32,
String,
String,
Option<i64>,
Option<i64>,
Option<i32>,
Option<i32>,
Option<i64>,
Option<i64>,
Option<String>,
Option<i64>,
)> = q
.load(connection.deref_mut())
.map_err(|_| anyhow::anyhow!("Query error"))?;
Ok(rows
.into_iter()
.map(|r| DuplicateRow {
library_id: r.0,
rel_path: r.1,
content_hash: r.2,
size_bytes: r.3,
date_taken: r.4,
width: r.5,
height: r.6,
phash_64: r.7,
dhash_64: r.8,
duplicate_of_hash: r.9,
duplicate_decided_at: r.10,
})
.collect())
})
.map_err(|_| DbError::new(DbErrorKind::QueryError))
}
fn list_perceptual_candidates(
&mut self,
context: &opentelemetry::Context,
library_id_filter: Option<i32>,
include_resolved: bool,
) -> Result<Vec<DuplicateRow>, DbError> {
trace_db_call(context, "query", "list_perceptual_candidates", |_span| {
use schema::image_exif::dsl::*;
let mut connection = self.connection.lock().expect("Unable to get ExifDao");
// For perceptual candidates we want one canonical row per
// distinct content_hash — exact dups are clustered by the
// exact-dup query and would only pollute the perceptual
// graph with zero-distance edges. Diesel doesn't have a
// clean `DISTINCT ON`, so we load every row and dedup
// client-side keyed on content_hash. The result set is small
// (only rows with a phash) and the cost is negligible vs
// the BK-tree clustering that follows.
let mut q = image_exif
.filter(content_hash.is_not_null())
.filter(phash_64.is_not_null())
.select((
library_id,
rel_path,
content_hash.assume_not_null(),
size_bytes,
date_taken,
width,
height,
phash_64,
dhash_64,
duplicate_of_hash,
duplicate_decided_at,
))
.order((content_hash.asc(), library_id.asc(), rel_path.asc()))
.into_boxed();
if let Some(lib) = library_id_filter {
q = q.filter(library_id.eq(lib));
}
if !include_resolved {
q = q.filter(duplicate_of_hash.is_null());
}
let rows: Vec<(
i32,
String,
String,
Option<i64>,
Option<i64>,
Option<i32>,
Option<i32>,
Option<i64>,
Option<i64>,
Option<String>,
Option<i64>,
)> = q
.load(connection.deref_mut())
.map_err(|_| anyhow::anyhow!("Query error"))?;
// Dedup keyed on content_hash, keeping the first occurrence
// (deterministic by the SQL ORDER BY: lowest library_id,
// then lexicographically smallest rel_path).
let mut seen = std::collections::HashSet::new();
let mut out = Vec::with_capacity(rows.len());
for r in rows {
if seen.insert(r.2.clone()) {
out.push(DuplicateRow {
library_id: r.0,
rel_path: r.1,
content_hash: r.2,
size_bytes: r.3,
date_taken: r.4,
width: r.5,
height: r.6,
phash_64: r.7,
dhash_64: r.8,
duplicate_of_hash: r.9,
duplicate_decided_at: r.10,
});
}
}
Ok(out)
})
.map_err(|_| DbError::new(DbErrorKind::QueryError))
}
fn lookup_duplicate_row(
&mut self,
context: &opentelemetry::Context,
library_id_val: i32,
rel_path_val: &str,
) -> Result<Option<DuplicateRow>, DbError> {
trace_db_call(context, "query", "lookup_duplicate_row", |_span| {
use schema::image_exif::dsl::*;
let mut connection = self.connection.lock().expect("Unable to get ExifDao");
image_exif
.filter(library_id.eq(library_id_val))
.filter(rel_path.eq(rel_path_val))
.filter(content_hash.is_not_null())
.select((
library_id,
rel_path,
content_hash.assume_not_null(),
size_bytes,
date_taken,
width,
height,
phash_64,
dhash_64,
duplicate_of_hash,
duplicate_decided_at,
))
.first::<(
i32,
String,
String,
Option<i64>,
Option<i64>,
Option<i32>,
Option<i32>,
Option<i64>,
Option<i64>,
Option<String>,
Option<i64>,
)>(connection.deref_mut())
.optional()
.map(|opt| {
opt.map(|r| DuplicateRow {
library_id: r.0,
rel_path: r.1,
content_hash: r.2,
size_bytes: r.3,
date_taken: r.4,
width: r.5,
height: r.6,
phash_64: r.7,
dhash_64: r.8,
duplicate_of_hash: r.9,
duplicate_decided_at: r.10,
})
})
.map_err(|_| anyhow::anyhow!("Query error"))
})
.map_err(|_| DbError::new(DbErrorKind::QueryError))
}
fn set_duplicate_of(
&mut self,
context: &opentelemetry::Context,
library_id_val: i32,
rel_path_val: &str,
survivor_hash: &str,
decided_at: i64,
) -> Result<(), DbError> {
trace_db_call(context, "update", "set_duplicate_of", |_span| {
use schema::image_exif::dsl::*;
let mut connection = self.connection.lock().expect("Unable to get ExifDao");
diesel::update(
image_exif
.filter(library_id.eq(library_id_val))
.filter(rel_path.eq(rel_path_val)),
)
.set((
duplicate_of_hash.eq(survivor_hash),
duplicate_decided_at.eq(decided_at),
))
.execute(connection.deref_mut())
.map(|_| ())
.map_err(|_| anyhow::anyhow!("Update error"))
})
.map_err(|_| DbError::new(DbErrorKind::UpdateError))
}
fn clear_duplicate_of(
&mut self,
context: &opentelemetry::Context,
library_id_val: i32,
rel_path_val: &str,
) -> Result<(), DbError> {
trace_db_call(context, "update", "clear_duplicate_of", |_span| {
use schema::image_exif::dsl::*;
let mut connection = self.connection.lock().expect("Unable to get ExifDao");
diesel::update(
image_exif
.filter(library_id.eq(library_id_val))
.filter(rel_path.eq(rel_path_val)),
)
.set((
duplicate_of_hash.eq::<Option<String>>(None),
duplicate_decided_at.eq::<Option<i64>>(None),
))
.execute(connection.deref_mut())
.map(|_| ())
.map_err(|_| anyhow::anyhow!("Update error"))
})
.map_err(|_| DbError::new(DbErrorKind::UpdateError))
}
fn union_perceptual_tags(
&mut self,
context: &opentelemetry::Context,
survivor_hash: &str,
demoted_hash: &str,
survivor_rel_path: &str,
) -> Result<(), DbError> {
trace_db_call(context, "update", "union_perceptual_tags", |_span| {
// INSERT OR IGNORE handles two relevant uniqueness paths:
// - tagged_photo (rel_path, tag_id) is the historical key,
// so existing tag rows under the survivor's path collide
// and stay put.
// - The (rel_path, tag_id) collision is the one that
// matters for idempotence; (content_hash, tag_id) at the
// bytes level isn't enforced by SQLite but the read path
// dedups on it, so an extra row would be cosmetic.
// Tags whose rel_path differs are inserted, picking up the
// survivor's content_hash so they live under the right bytes.
let mut connection = self.connection.lock().expect("Unable to get ExifDao");
diesel::sql_query(
"INSERT OR IGNORE INTO tagged_photo (rel_path, tag_id, created_time, content_hash) \
SELECT ?, tag_id, strftime('%s','now'), ? \
FROM tagged_photo \
WHERE content_hash = ? \
AND tag_id NOT IN ( \
SELECT tag_id FROM tagged_photo WHERE content_hash = ? \
)",
)
.bind::<diesel::sql_types::Text, _>(survivor_rel_path)
.bind::<diesel::sql_types::Text, _>(survivor_hash)
.bind::<diesel::sql_types::Text, _>(demoted_hash)
.bind::<diesel::sql_types::Text, _>(survivor_hash)
.execute(connection.deref_mut())
.map(|_| ())
.map_err(|_| anyhow::anyhow!("Tag union error"))
})
.map_err(|_| DbError::new(DbErrorKind::UpdateError))
}
}
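The tag-union semantics `union_perceptual_tags` implements in SQL (copy every tag from the demoted bytes onto the survivor, leave already-present tags alone, safe to re-run) can be restated in pure Rust over an in-memory map. This is an illustrative model only — the real path is the single `INSERT OR IGNORE … SELECT` above — and the `HashMap<content_hash, tag ids>` shape is an assumption for the sketch:

```rust
use std::collections::{HashMap, HashSet};

/// Union the demoted file's tags onto the survivor. Set union makes
/// this idempotent: a tag already on the survivor is left alone and
/// a second call is a no-op.
fn union_tags(
    tags: &mut HashMap<String, HashSet<i32>>,
    survivor_hash: &str,
    demoted_hash: &str,
) {
    let demoted: HashSet<i32> = tags.get(demoted_hash).cloned().unwrap_or_default();
    tags.entry(survivor_hash.to_string()).or_default().extend(demoted);
}

fn main() {
    let mut tags = HashMap::new();
    tags.insert("demoted".to_string(), HashSet::from([1, 2]));
    tags.insert("survivor".to_string(), HashSet::from([2, 3]));
    union_tags(&mut tags, "survivor", "demoted");
    assert_eq!(tags["survivor"], HashSet::from([1, 2, 3]));
    union_tags(&mut tags, "survivor", "demoted"); // idempotent re-run
    assert_eq!(tags["survivor"], HashSet::from([1, 2, 3]));
}
```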
#[cfg(test)]
@@ -1105,6 +1778,8 @@ mod exif_dao_tests {
last_modified: 0,
content_hash: None,
size_bytes: None,
phash_64: None,
dhash_64: None,
},
)
.expect("insert exif row");
@@ -1118,6 +1793,8 @@ mod exif_dao_tests {
name: "archive",
root_path: "/tmp/archive",
created_at: 0,
enabled: true,
excluded_dirs: None,
})
.execute(&mut conn)
.expect("seed second library");
@@ -1158,4 +1835,61 @@ mod exif_dao_tests {
let lib1 = dao.get_all_with_date_taken(&ctx(), Some(1)).unwrap();
assert_eq!(lib1, vec![("main/a.jpg".to_string(), 100)]);
}
#[test]
fn query_by_exif_scopes_by_library_id() {
let mut dao = setup_two_libraries();
insert_row(&mut dao, 1, "main/a.jpg", Some(100));
insert_row(&mut dao, 2, "archive/a.jpg", Some(200));
// Union: both rows.
let all = dao
.query_by_exif(&ctx(), None, None, None, None, None, None, None)
.unwrap();
assert_eq!(all.len(), 2);
// Scoped to lib 2: only archive row.
let lib2 = dao
.query_by_exif(&ctx(), Some(2), None, None, None, None, None, None)
.unwrap();
assert_eq!(lib2.len(), 1);
assert_eq!(lib2[0].file_path, "archive/a.jpg");
assert_eq!(lib2[0].library_id, 2);
}
#[test]
fn get_exif_batch_scopes_by_library_id() {
let mut dao = setup_two_libraries();
// Same rel_path, different libraries — the cross-library duplicate
// case the audit flagged.
insert_row(&mut dao, 1, "shared/photo.jpg", Some(100));
insert_row(&mut dao, 2, "shared/photo.jpg", Some(200));
// None spans both libraries (legacy union behavior).
let union = dao
.get_exif_batch(&ctx(), None, &["shared/photo.jpg".to_string()])
.unwrap();
assert_eq!(union.len(), 2);
// Some(2) returns only the archive row.
let scoped = dao
.get_exif_batch(&ctx(), Some(2), &["shared/photo.jpg".to_string()])
.unwrap();
assert_eq!(scoped.len(), 1);
assert_eq!(scoped[0].library_id, 2);
assert_eq!(scoped[0].date_taken, Some(200));
}
#[test]
fn count_for_library_returns_per_library_count() {
let mut dao = setup_two_libraries();
insert_row(&mut dao, 1, "main/a.jpg", None);
insert_row(&mut dao, 1, "main/b.jpg", None);
insert_row(&mut dao, 2, "archive/a.jpg", None);
assert_eq!(dao.count_for_library(&ctx(), 1).unwrap(), 2);
assert_eq!(dao.count_for_library(&ctx(), 2).unwrap(), 1);
// Unknown library: zero, no error.
assert_eq!(dao.count_for_library(&ctx(), 999).unwrap(), 0);
}
}
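The library-aware visibility rule from `list_rel_paths_for_libraries` (hide a demoted row only when its survivor is reachable in the current scope; an empty scope means all libraries, where every survivor is reachable) is easiest to see restated outside SQL. A minimal in-memory model — illustrative only, the real filter is the correlated `NOT EXISTS` subquery above, and the `Row` struct here is a stand-in for the relevant `image_exif` columns:

```rust
use std::collections::HashSet;

struct Row {
    library_id: i32,
    content_hash: String,
    duplicate_of_hash: Option<String>,
}

/// A demoted row stays visible unless a row carrying its survivor's
/// content_hash exists inside the current library scope. Empty scope
/// = unscoped (all-libraries) view: demoted rows are always hidden.
fn is_visible(row: &Row, all_rows: &[Row], scope: &[i32]) -> bool {
    let survivor_hash = match &row.duplicate_of_hash {
        None => return true,
        Some(h) => h,
    };
    if scope.is_empty() {
        return false;
    }
    let scope: HashSet<i32> = scope.iter().copied().collect();
    !all_rows
        .iter()
        .any(|r| scope.contains(&r.library_id) && r.content_hash == *survivor_hash)
}

fn main() {
    let rows = vec![
        Row { library_id: 2, content_hash: "S".into(), duplicate_of_hash: None },
        Row { library_id: 1, content_hash: "D".into(), duplicate_of_hash: Some("S".into()) },
    ];
    // Scoped to lib 1: survivor lives in lib 2, keep the demoted copy visible.
    assert!(is_visible(&rows[1], &rows, &[1]));
    // Scope covering lib 2: survivor reachable, hide the demoted row.
    assert!(!is_visible(&rows[1], &rows, &[1, 2]));
    // Unscoped view: hidden, as before this branch.
    assert!(!is_visible(&rows[1], &rows, &[]));
}
```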

View File

@@ -59,6 +59,10 @@ pub struct InsertImageExif {
pub last_modified: i64,
pub content_hash: Option<String>,
pub size_bytes: Option<i64>,
/// 64-bit pHash (DCT) packed as i64. NULL for videos and decode failures.
pub phash_64: Option<i64>,
/// 64-bit dHash (gradient). NULL for videos and decode failures.
pub dhash_64: Option<i64>,
}
// Field order matches the post-migration column order in `image_exif`.
@@ -86,6 +90,14 @@ pub struct ImageExif {
pub last_modified: i64,
pub content_hash: Option<String>,
pub size_bytes: Option<i64>,
pub phash_64: Option<i64>,
pub dhash_64: Option<i64>,
/// When non-null, this row is a soft-marked duplicate of the file
/// whose `content_hash` matches this value. The default `/photos`
/// listing filters such rows out.
pub duplicate_of_hash: Option<String>,
/// Unix seconds at which the resolve was committed.
pub duplicate_decided_at: Option<i64>,
}
#[derive(Insertable)]
@@ -108,6 +120,13 @@ pub struct InsertPhotoInsight {
/// generation). Used downstream to filter out contaminated rows when
/// assembling an unbiased training / evaluation set.
pub fewshot_source_ids: Option<String>,
/// Bytes-keyed identity. When present, this insight is considered
/// to belong to the content rather than the path — see CLAUDE.md
/// "Multi-library data model". The DAO populates this from
/// `image_exif.content_hash` at insert time when known; rows
/// inserted before the hash is available stay null and the
/// reconciliation pass backfills them.
pub content_hash: Option<String>,
}
#[derive(Serialize, Queryable, Clone, Debug)]
@@ -126,6 +145,7 @@ pub struct PhotoInsight {
/// `"local"` (Ollama with images) | `"hybrid"` (local vision + OpenRouter chat).
pub backend: String,
pub fewshot_source_ids: Option<String>,
pub content_hash: Option<String>,
}
// --- Libraries ---
@@ -136,6 +156,20 @@ pub struct LibraryRow {
pub name: String,
pub root_path: String,
pub created_at: i64,
/// Operator kill switch. `false` = the watcher skips this library
/// entirely (no probe, no ingest, no maintenance) and orphan-GC
/// treats it as out-of-scope for the all-online consensus rule.
/// Toggle via SQL today — there is intentionally no HTTP endpoint
/// for library mutation (see CLAUDE.md "Multi-library data model").
pub enabled: bool,
/// Per-library excluded paths/patterns, stored comma-separated
/// (same shape as the global `EXCLUDED_DIRS` env var). NULL = no
/// extra excludes for this library; the global env var still
/// applies. The runtime `Library` struct parses this into a
/// `Vec<String>` and the walker applies the union of (global,
/// library) excludes when scanning. Use case: mount a parent
/// directory while another library covers a child subtree.
pub excluded_dirs: Option<String>,
}
#[derive(Insertable)]
@@ -144,6 +178,8 @@ pub struct InsertLibrary<'a> {
pub name: &'a str,
pub root_path: &'a str,
pub created_at: i64,
pub enabled: bool,
pub excluded_dirs: Option<&'a str>,
}
// --- Knowledge memory models ---

src/database/reconcile.rs (new file)

@@ -0,0 +1,382 @@
//! Reconciliation pass for hash-keyed derived data.
//!
//! As `backfill_unhashed_backlog` populates `image_exif.content_hash`
//! for legacy rows, we want the matching `tagged_photo` and
//! `photo_insights` rows — which were inserted before the hash was
//! known — to inherit the hash too. Otherwise reads keep falling back
//! to the rel_path path even when a hash is now available.
//!
//! Two passes:
//! 1. **Hash backfill** — for every `tagged_photo` / `photo_insights`
//! row with NULL `content_hash`, look up the matching
//! `image_exif.content_hash` and write it. SQL-only; idempotent;
//! a no-op once everything is hashed.
//! 2. **Insight scalar merge** — when multiple `photo_insights` rows
//! share a `content_hash` with `is_current = true`, only the
//! earliest `generated_at` keeps `is_current = true` (per the
//! "earliest wins" rule in CLAUDE.md → "Multi-library data
//! model"). Others are demoted, not deleted, so they remain
//! visible in history endpoints.
//!
//! Tags are set-valued under the policy (union on read), so there's no
//! analogous "collapse" pass — duplicate `(tag_id, content_hash)` rows
//! across libraries are harmless and correctly de-duped at read time
//! by the existing `DISTINCT` queries.
//!
//! The pass operates on the database alone — no filesystem access —
//! so it doesn't need the library availability gate.
// The lib doesn't call into this module directly — the watcher (in the
// bin) does. Dead-code analysis at the lib level can't see that, so
// suppress at the module level. Tests still exercise every function.
#![allow(dead_code)]
use diesel::prelude::*;
use diesel::sql_query;
use diesel::sqlite::SqliteConnection;
use log::{debug, info, warn};
/// Outcome of a reconciliation tick. Tracked so the watcher can log
/// progress when something changed and stay quiet when nothing did.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct ReconcileStats {
pub tagged_photo_hashes_filled: usize,
pub photo_insights_hashes_filled: usize,
pub photo_insights_demoted: usize,
}
impl ReconcileStats {
pub fn changed(&self) -> bool {
self.tagged_photo_hashes_filled > 0
|| self.photo_insights_hashes_filled > 0
|| self.photo_insights_demoted > 0
}
}
/// Run the reconciliation pass. Idempotent — safe to call on every
/// watcher tick. Errors are logged but never propagated; reconciliation
/// is best-effort and a transient DB hiccup must not stall the watcher.
pub fn run(conn: &mut SqliteConnection) -> ReconcileStats {
let mut stats = ReconcileStats::default();
stats.tagged_photo_hashes_filled = match backfill_tagged_photo_hashes(conn) {
Ok(n) => n,
Err(e) => {
warn!("reconcile: tagged_photo hash backfill failed: {:?}", e);
0
}
};
stats.photo_insights_hashes_filled = match backfill_photo_insights_hashes(conn) {
Ok(n) => n,
Err(e) => {
warn!("reconcile: photo_insights hash backfill failed: {:?}", e);
0
}
};
stats.photo_insights_demoted = match collapse_insight_currents(conn) {
Ok(n) => n,
Err(e) => {
warn!("reconcile: photo_insights scalar merge failed: {:?}", e);
0
}
};
if stats.changed() {
info!(
"reconcile: filled {} tagged_photo hash(es), {} photo_insights hash(es); demoted {} non-current insight row(s)",
stats.tagged_photo_hashes_filled,
stats.photo_insights_hashes_filled,
stats.photo_insights_demoted,
);
} else {
debug!("reconcile: no changes this tick");
}
stats
}
/// Populate `tagged_photo.content_hash` for any row that still has
/// NULL by joining on `rel_path` against `image_exif`. tagged_photo
/// doesn't carry `library_id`, so a path that exists under multiple
/// libraries with different content is genuinely ambiguous; we pick
/// any non-null hash for that path. Same trade-off as the migration
/// backfill — see `migrations/2026-05-01-000000_hash_keyed_derived_data`.
fn backfill_tagged_photo_hashes(conn: &mut SqliteConnection) -> QueryResult<usize> {
sql_query(
"UPDATE tagged_photo \
SET content_hash = ( \
SELECT content_hash FROM image_exif \
WHERE image_exif.rel_path = tagged_photo.rel_path \
AND image_exif.content_hash IS NOT NULL \
LIMIT 1 \
) \
WHERE content_hash IS NULL \
AND EXISTS ( \
SELECT 1 FROM image_exif \
WHERE image_exif.rel_path = tagged_photo.rel_path \
AND image_exif.content_hash IS NOT NULL \
)",
)
.execute(conn)
}
/// Populate `photo_insights.content_hash` from `image_exif`, keyed on
/// `(library_id, rel_path)`. Unambiguous because photo_insights carries
/// library_id.
fn backfill_photo_insights_hashes(conn: &mut SqliteConnection) -> QueryResult<usize> {
sql_query(
"UPDATE photo_insights \
SET content_hash = ( \
SELECT content_hash FROM image_exif \
WHERE image_exif.library_id = photo_insights.library_id \
AND image_exif.rel_path = photo_insights.rel_path \
AND image_exif.content_hash IS NOT NULL \
LIMIT 1 \
) \
WHERE content_hash IS NULL \
AND EXISTS ( \
SELECT 1 FROM image_exif \
WHERE image_exif.library_id = photo_insights.library_id \
AND image_exif.rel_path = photo_insights.rel_path \
AND image_exif.content_hash IS NOT NULL \
)",
)
.execute(conn)
}
/// Scalar-merge step: when multiple rows share a `content_hash` and
/// claim `is_current = true`, demote all but the earliest by
/// `generated_at` (ties broken by lowest id, deterministic).
///
/// Demoted rows keep their data — only `is_current` flips. Clients that
/// hit `/insights/history` still see the full sequence; only the
/// "current" pointer is unique per hash.
fn collapse_insight_currents(conn: &mut SqliteConnection) -> QueryResult<usize> {
sql_query(
"UPDATE photo_insights \
SET is_current = 0 \
WHERE is_current = 1 \
AND content_hash IS NOT NULL \
AND id NOT IN ( \
SELECT MIN(p2.id) FROM photo_insights p2 \
WHERE p2.is_current = 1 \
AND p2.content_hash = photo_insights.content_hash \
AND p2.generated_at = ( \
SELECT MIN(p3.generated_at) FROM photo_insights p3 \
WHERE p3.is_current = 1 \
AND p3.content_hash = p2.content_hash \
) \
)",
)
.execute(conn)
}
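The UPDATE above is easier to sanity-check against an in-memory model. A hedged sketch of the same "earliest wins" rule over plain structs — the `Insight` struct and `collapse_currents` helper here are illustrative only, not part of the codebase, which does this in a single SQL statement:

```rust
use std::collections::{HashMap, HashSet};

// Illustrative stand-in for a photo_insights row.
#[derive(Debug, Clone)]
struct Insight {
    id: i32,
    content_hash: Option<String>,
    generated_at: i64,
    is_current: bool,
}

/// Demote every `is_current` row except the earliest per non-null
/// content_hash (earliest `generated_at`, ties broken by lowest id).
/// Returns the number of rows demoted. Idempotent, like the SQL pass.
fn collapse_currents(rows: &mut [Insight]) -> usize {
    // hash -> index of the earliest current row seen so far
    let mut winner: HashMap<String, usize> = HashMap::new();
    for (i, r) in rows.iter().enumerate() {
        let Some(h) = r.content_hash.clone() else { continue };
        if !r.is_current {
            continue;
        }
        match winner.get(&h).copied() {
            Some(w) if (r.generated_at, r.id) < (rows[w].generated_at, rows[w].id) => {
                winner.insert(h, i);
            }
            None => {
                winner.insert(h, i);
            }
            _ => {}
        }
    }
    let keep: HashSet<usize> = winner.into_values().collect();
    let mut demoted = 0;
    for (i, r) in rows.iter_mut().enumerate() {
        if r.is_current && r.content_hash.is_some() && !keep.contains(&i) {
            r.is_current = false;
            demoted += 1;
        }
    }
    demoted
}
```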
#[cfg(test)]
mod tests {
use super::*;
use crate::database::test::in_memory_db_connection;
fn ensure_library(conn: &mut SqliteConnection, library_id: i32) {
// Migration seeds library id=1; tests that reference id>1 must
// create those rows themselves, otherwise FK enforcement (added
// in the tags-edit migration) rejects image_exif inserts.
diesel::sql_query(
"INSERT OR IGNORE INTO libraries (id, name, root_path, created_at) \
VALUES (?, 'test-' || ?, '/tmp/test-' || ?, 0)",
)
.bind::<diesel::sql_types::Integer, _>(library_id)
.bind::<diesel::sql_types::Integer, _>(library_id)
.bind::<diesel::sql_types::Integer, _>(library_id)
.execute(conn)
.unwrap();
}
fn insert_image_exif(
conn: &mut SqliteConnection,
library_id: i32,
rel_path: &str,
content_hash: Option<&str>,
) {
use crate::database::schema::image_exif;
ensure_library(conn, library_id);
diesel::sql_query(
"INSERT INTO image_exif (library_id, rel_path, created_time, last_modified, content_hash) \
VALUES (?, ?, 0, 0, ?)",
)
.bind::<diesel::sql_types::Integer, _>(library_id)
.bind::<diesel::sql_types::Text, _>(rel_path)
.bind::<diesel::sql_types::Nullable<diesel::sql_types::Text>, _>(content_hash)
.execute(conn)
.unwrap();
// Keep clippy happy that the import is used.
let _ = image_exif::table;
}
fn insert_tagged_photo(conn: &mut SqliteConnection, rel_path: &str, tag_id: i32) {
diesel::sql_query(
"INSERT INTO tagged_photo (rel_path, tag_id, created_time) VALUES (?, ?, 0)",
)
.bind::<diesel::sql_types::Text, _>(rel_path)
.bind::<diesel::sql_types::Integer, _>(tag_id)
.execute(conn)
.unwrap();
}
fn insert_tag(conn: &mut SqliteConnection, id: i32, name: &str) {
diesel::sql_query("INSERT INTO tags (id, name, created_time) VALUES (?, ?, 0)")
.bind::<diesel::sql_types::Integer, _>(id)
.bind::<diesel::sql_types::Text, _>(name)
.execute(conn)
.unwrap();
}
fn insert_insight(
conn: &mut SqliteConnection,
library_id: i32,
rel_path: &str,
generated_at: i64,
is_current: bool,
) -> i32 {
ensure_library(conn, library_id);
diesel::sql_query(
"INSERT INTO photo_insights (library_id, rel_path, title, summary, generated_at, model_version, is_current, backend) \
VALUES (?, ?, 't', 's', ?, 'v', ?, 'local')",
)
.bind::<diesel::sql_types::Integer, _>(library_id)
.bind::<diesel::sql_types::Text, _>(rel_path)
.bind::<diesel::sql_types::BigInt, _>(generated_at)
.bind::<diesel::sql_types::Bool, _>(is_current)
.execute(conn)
.unwrap();
diesel::sql_query("SELECT last_insert_rowid() AS id")
.get_result::<TestId>(conn)
.map(|r| r.id)
.unwrap()
}
#[derive(QueryableByName)]
struct TestId {
#[diesel(sql_type = diesel::sql_types::Integer)]
id: i32,
}
#[derive(QueryableByName, Debug)]
struct HashOnly {
#[diesel(sql_type = diesel::sql_types::Nullable<diesel::sql_types::Text>)]
content_hash: Option<String>,
}
#[derive(QueryableByName, Debug)]
struct CurrentRow {
#[diesel(sql_type = diesel::sql_types::Integer)]
id: i32,
#[diesel(sql_type = diesel::sql_types::Bool)]
is_current: bool,
}
#[test]
fn backfill_fills_tagged_photo_hash_when_image_exif_has_one() {
let mut conn = in_memory_db_connection();
insert_tag(&mut conn, 1, "vacation");
insert_tagged_photo(&mut conn, "trip/IMG.jpg", 1);
// No image_exif row yet — backfill no-op.
let stats = run(&mut conn);
assert_eq!(stats.tagged_photo_hashes_filled, 0);
// image_exif row appears with a hash; next reconcile fills it.
insert_image_exif(&mut conn, 1, "trip/IMG.jpg", Some("hashabc"));
let stats = run(&mut conn);
assert_eq!(stats.tagged_photo_hashes_filled, 1);
let row = diesel::sql_query(
"SELECT content_hash FROM tagged_photo WHERE rel_path = 'trip/IMG.jpg'",
)
.get_result::<HashOnly>(&mut conn)
.unwrap();
assert_eq!(row.content_hash.as_deref(), Some("hashabc"));
// Idempotent: a second run is a no-op.
let stats = run(&mut conn);
assert_eq!(stats.tagged_photo_hashes_filled, 0);
}
#[test]
fn backfill_skips_tagged_photo_when_image_exif_has_no_hash() {
let mut conn = in_memory_db_connection();
insert_tag(&mut conn, 1, "vacation");
insert_tagged_photo(&mut conn, "trip/IMG.jpg", 1);
// image_exif exists but its hash is null.
insert_image_exif(&mut conn, 1, "trip/IMG.jpg", None);
let stats = run(&mut conn);
assert_eq!(stats.tagged_photo_hashes_filled, 0);
}
#[test]
fn backfill_fills_photo_insights_hash_scoped_by_library() {
let mut conn = in_memory_db_connection();
// Row in library 1 only — must not be filled by a hash from
// library 2's same-rel_path entry.
insert_image_exif(&mut conn, 1, "shared.jpg", Some("hash-lib1"));
let id1 = insert_insight(&mut conn, 1, "shared.jpg", 100, true);
let stats = run(&mut conn);
assert_eq!(stats.photo_insights_hashes_filled, 1);
let row = diesel::sql_query("SELECT content_hash FROM photo_insights WHERE id = ?")
.bind::<diesel::sql_types::Integer, _>(id1)
.get_result::<HashOnly>(&mut conn)
.unwrap();
assert_eq!(row.content_hash.as_deref(), Some("hash-lib1"));
}
#[test]
fn collapse_keeps_earliest_is_current_per_hash() {
let mut conn = in_memory_db_connection();
// Two libraries, same content_hash via image_exif. Insights
// were generated independently in each library, both currently
// is_current = true. The earlier one wins.
insert_image_exif(&mut conn, 1, "a.jpg", Some("h1"));
insert_image_exif(&mut conn, 2, "a.jpg", Some("h1"));
let earlier = insert_insight(&mut conn, 1, "a.jpg", 100, true);
let later = insert_insight(&mut conn, 2, "a.jpg", 200, true);
// First pass fills the content_hash; second collapses.
let stats = run(&mut conn);
assert_eq!(stats.photo_insights_hashes_filled, 2);
assert_eq!(stats.photo_insights_demoted, 1);
let rows = diesel::sql_query("SELECT id, is_current FROM photo_insights ORDER BY id")
.get_results::<CurrentRow>(&mut conn)
.unwrap();
let earlier_row = rows.iter().find(|r| r.id == earlier).unwrap();
let later_row = rows.iter().find(|r| r.id == later).unwrap();
assert!(
earlier_row.is_current,
"earlier insight should remain current"
);
assert!(!later_row.is_current, "later insight should be demoted");
// Idempotent.
let stats = run(&mut conn);
assert_eq!(stats.photo_insights_demoted, 0);
}
#[test]
fn collapse_does_not_demote_a_solo_current_row() {
let mut conn = in_memory_db_connection();
insert_image_exif(&mut conn, 1, "a.jpg", Some("h1"));
let solo = insert_insight(&mut conn, 1, "a.jpg", 100, true);
let stats = run(&mut conn);
assert_eq!(stats.photo_insights_demoted, 0);
let row = diesel::sql_query("SELECT id, is_current FROM photo_insights WHERE id = ?")
.bind::<diesel::sql_types::Integer, _>(solo)
.get_result::<CurrentRow>(&mut conn)
.unwrap();
assert!(row.is_current);
}
}


@@ -121,6 +121,10 @@ diesel::table! {
last_modified -> BigInt,
content_hash -> Nullable<Text>,
size_bytes -> Nullable<BigInt>,
phash_64 -> Nullable<BigInt>,
dhash_64 -> Nullable<BigInt>,
duplicate_of_hash -> Nullable<Text>,
duplicate_decided_at -> Nullable<BigInt>,
}
}
@@ -130,6 +134,8 @@ diesel::table! {
name -> Text,
root_path -> Text,
created_at -> BigInt,
enabled -> Bool,
excluded_dirs -> Nullable<Text>,
}
}
@@ -178,6 +184,7 @@ diesel::table! {
approved -> Nullable<Bool>,
backend -> Text,
fewshot_source_ids -> Nullable<Text>,
content_hash -> Nullable<Text>,
}
}
@@ -199,6 +206,7 @@ diesel::table! {
rel_path -> Text,
tag_id -> Integer,
created_time -> BigInt,
content_hash -> Nullable<Text>,
}
}

src/duplicates.rs (new file)

@@ -0,0 +1,893 @@
//! Duplicate detection surface — exact (blake3) and perceptual
//! (pHash + Hamming) groups, plus the soft-mark resolve flow that
//! Apollo's DUPLICATES modal drives.
//!
//! All routes require auth (Claims). Endpoints:
//!
//! - `GET /duplicates/exact?library=&include_resolved=` — count>1 byte-identical groups.
//! - `GET /duplicates/perceptual?library=&threshold=&include_resolved=` — Hamming-clustered groups.
//! - `POST /duplicates/resolve` — soft-mark demoted siblings.
//! - `POST /duplicates/unresolve` — clear a prior soft-mark.
//!
//! Perceptual clustering caches the BK-tree result for 5 minutes so
//! repeated opens of the modal don't re-cluster the whole library.
//! Cache invalidation is best-effort: resolve/unresolve clear the
//! cache, but new files arriving via the watcher don't (the next
//! 5-minute window picks them up). For a single-user personal tool
//! that's the right trade-off.
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};
use actix_web::{App, HttpRequest, HttpResponse, Responder, dev::ServiceFactory, web};
use bk_tree::{BKTree, Metric};
use lazy_static::lazy_static;
use opentelemetry::trace::{TraceContextExt, Tracer};
use serde::{Deserialize, Serialize};
use crate::data::Claims;
use crate::database::{DuplicateRow, ExifDao};
use crate::libraries;
use crate::otel::{extract_context_from_request, global_tracer};
use crate::state::AppState;
// ── Cache ────────────────────────────────────────────────────────────────
const PERCEPTUAL_CACHE_TTL: Duration = Duration::from_secs(300);
#[derive(Clone)]
struct PerceptualCacheEntry {
/// Cache key: (library_id, threshold, include_resolved). `library_id`
/// is `None` for "all libraries". Cluster output is the same shape we
/// return on the wire so we can serve cached requests with zero work.
library_id: Option<i32>,
threshold: u32,
include_resolved: bool,
computed_at: Instant,
groups: Vec<DuplicateGroup>,
}
lazy_static! {
static ref PERCEPTUAL_CACHE: Mutex<Option<PerceptualCacheEntry>> = Mutex::new(None);
}
/// Drop the perceptual-cluster cache. Called from `resolve`/`unresolve`
/// so the next modal open reflects the soft-mark change immediately.
fn invalidate_perceptual_cache() {
if let Ok(mut guard) = PERCEPTUAL_CACHE.lock() {
*guard = None;
}
}
// ── Wire shapes ──────────────────────────────────────────────────────────
#[derive(Serialize, Debug, Clone)]
pub struct DuplicateMember {
pub library_id: i32,
pub rel_path: String,
pub content_hash: String,
pub size_bytes: Option<i64>,
pub date_taken: Option<i64>,
pub width: Option<i32>,
pub height: Option<i32>,
pub duplicate_of_hash: Option<String>,
pub duplicate_decided_at: Option<i64>,
}
impl From<DuplicateRow> for DuplicateMember {
fn from(r: DuplicateRow) -> Self {
Self {
library_id: r.library_id,
rel_path: r.rel_path,
content_hash: r.content_hash,
size_bytes: r.size_bytes,
date_taken: r.date_taken,
width: r.width,
height: r.height,
duplicate_of_hash: r.duplicate_of_hash,
duplicate_decided_at: r.duplicate_decided_at,
}
}
}
#[derive(Serialize, Debug, Clone)]
#[serde(rename_all = "lowercase")]
pub enum DuplicateKind {
Exact,
Perceptual,
}
#[derive(Serialize, Debug, Clone)]
pub struct DuplicateGroup {
pub kind: DuplicateKind,
/// Representative content_hash. For exact groups, the shared hash
/// (every member has the same one). For perceptual groups, an
/// arbitrary cluster member's hash, used only as a stable id for
/// the UI to key off.
pub representative_hash: String,
pub members: Vec<DuplicateMember>,
}
#[derive(Deserialize, Debug)]
pub struct ListDuplicatesQuery {
pub library: Option<String>,
#[serde(default)]
pub include_resolved: Option<bool>,
/// Perceptual only — Hamming-distance threshold. Ignored on the
/// exact endpoint. Defaults to 8 (~12% similarity tolerance, the
/// sweet spot for resized/recompressed copies).
#[serde(default)]
pub threshold: Option<u32>,
}
#[derive(Deserialize, Debug)]
pub struct DuplicateMemberRef {
pub library_id: i32,
pub rel_path: String,
}
#[derive(Deserialize, Debug)]
pub struct ResolveDuplicatesReq {
pub survivor: DuplicateMemberRef,
pub demoted: Vec<DuplicateMemberRef>,
}
#[derive(Serialize, Debug)]
pub struct ResolveResponse {
pub resolved_count: usize,
}
#[derive(Deserialize, Debug)]
pub struct UnresolveDuplicateReq {
pub library_id: i32,
pub rel_path: String,
}
// ── Handlers ─────────────────────────────────────────────────────────────
async fn list_exact_handler(
_: Claims,
request: HttpRequest,
app_state: web::Data<AppState>,
query: web::Query<ListDuplicatesQuery>,
exif_dao: web::Data<Mutex<Box<dyn ExifDao>>>,
) -> impl Responder {
let context = extract_context_from_request(&request);
let span = global_tracer().start_with_context("duplicates.list_exact", &context);
let span_context = opentelemetry::Context::current_with_span(span);
let library_id = libraries::resolve_library_param(&app_state, query.library.as_deref())
.ok()
.flatten()
.map(|l| l.id);
let include_resolved = query.include_resolved.unwrap_or(false);
let rows = {
let mut dao = exif_dao.lock().expect("exif dao lock");
match dao.list_duplicates_exact(&span_context, library_id, include_resolved) {
Ok(rows) => rows,
Err(e) => {
return HttpResponse::InternalServerError().body(format!("{:?}", e));
}
}
};
let groups = group_exact(rows);
HttpResponse::Ok().json(GroupsResponse { groups })
}
async fn list_perceptual_handler(
_: Claims,
request: HttpRequest,
app_state: web::Data<AppState>,
query: web::Query<ListDuplicatesQuery>,
exif_dao: web::Data<Mutex<Box<dyn ExifDao>>>,
) -> impl Responder {
let context = extract_context_from_request(&request);
let span = global_tracer().start_with_context("duplicates.list_perceptual", &context);
let span_context = opentelemetry::Context::current_with_span(span);
let library_id = libraries::resolve_library_param(&app_state, query.library.as_deref())
.ok()
.flatten()
.map(|l| l.id);
let threshold = query.threshold.unwrap_or(8).clamp(0, 32);
let include_resolved = query.include_resolved.unwrap_or(false);
// Cache hit?
if let Ok(guard) = PERCEPTUAL_CACHE.lock()
&& let Some(entry) = guard.as_ref()
&& entry.library_id == library_id
&& entry.threshold == threshold
&& entry.include_resolved == include_resolved
&& entry.computed_at.elapsed() < PERCEPTUAL_CACHE_TTL
{
return HttpResponse::Ok().json(GroupsResponse {
groups: entry.groups.clone(),
});
}
let rows = {
let mut dao = exif_dao.lock().expect("exif dao lock");
match dao.list_perceptual_candidates(&span_context, library_id, include_resolved) {
Ok(rows) => rows,
Err(e) => {
return HttpResponse::InternalServerError().body(format!("{:?}", e));
}
}
};
let groups = cluster_perceptual(rows, threshold);
if let Ok(mut guard) = PERCEPTUAL_CACHE.lock() {
*guard = Some(PerceptualCacheEntry {
library_id,
threshold,
include_resolved,
computed_at: Instant::now(),
groups: groups.clone(),
});
}
HttpResponse::Ok().json(GroupsResponse { groups })
}
async fn resolve_handler(
_: Claims,
request: HttpRequest,
body: web::Json<ResolveDuplicatesReq>,
exif_dao: web::Data<Mutex<Box<dyn ExifDao>>>,
) -> impl Responder {
let context = extract_context_from_request(&request);
let span = global_tracer().start_with_context("duplicates.resolve", &context);
let span_context = opentelemetry::Context::current_with_span(span);
if body.demoted.is_empty() {
return HttpResponse::BadRequest().body("demoted list is empty");
}
let mut dao = exif_dao.lock().expect("exif dao lock");
// Resolve survivor → its content_hash, plus the canonical rel_path
// we'll use as the destination for any tag-union INSERTs.
let survivor = match dao.lookup_duplicate_row(
&span_context,
body.survivor.library_id,
&body.survivor.rel_path,
) {
Ok(Some(row)) => row,
Ok(None) => return HttpResponse::NotFound().body("survivor not found"),
Err(e) => return HttpResponse::InternalServerError().body(format!("{:?}", e)),
};
// Survivor must not itself be soft-marked — otherwise the modal is
// pointing at a row we've already demoted, which would create a chain.
if survivor.duplicate_of_hash.is_some() {
return HttpResponse::Conflict().body("survivor is itself soft-marked as a duplicate");
}
let now = chrono::Utc::now().timestamp();
let mut resolved_count = 0usize;
for member_ref in &body.demoted {
let demoted = match dao.lookup_duplicate_row(
&span_context,
member_ref.library_id,
&member_ref.rel_path,
) {
Ok(Some(row)) => row,
Ok(None) => {
log::warn!(
"duplicates.resolve: skipping unknown demoted ({}, {})",
member_ref.library_id,
member_ref.rel_path
);
continue;
}
Err(e) => {
return HttpResponse::InternalServerError().body(format!("{:?}", e));
}
};
// Survivor and demoted must not be the same row (would set
// duplicate_of_hash to its own hash — recursive nonsense).
if demoted.library_id == survivor.library_id && demoted.rel_path == survivor.rel_path {
continue;
}
// For perceptual dups (different content_hash), union the
// demoted's tag set onto the survivor before flipping the
// soft-mark. For exact dups (same content_hash), tags are
// already shared at the bytes layer — the union is a no-op.
if demoted.content_hash != survivor.content_hash
&& let Err(e) = dao.union_perceptual_tags(
&span_context,
&survivor.content_hash,
&demoted.content_hash,
&survivor.rel_path,
)
{
log::warn!(
"duplicates.resolve: tag union failed for {}: {:?}",
demoted.rel_path,
e
);
// Continue with the soft-mark anyway — losing tag
// continuity is recoverable (unresolve restores the
// demoted row's grid presence, and the original tags
// never moved off the demoted hash).
}
if let Err(e) = dao.set_duplicate_of(
&span_context,
demoted.library_id,
&demoted.rel_path,
&survivor.content_hash,
now,
) {
return HttpResponse::InternalServerError().body(format!("{:?}", e));
}
resolved_count += 1;
}
drop(dao);
invalidate_perceptual_cache();
HttpResponse::Ok().json(ResolveResponse { resolved_count })
}
async fn unresolve_handler(
_: Claims,
request: HttpRequest,
body: web::Json<UnresolveDuplicateReq>,
exif_dao: web::Data<Mutex<Box<dyn ExifDao>>>,
) -> impl Responder {
let context = extract_context_from_request(&request);
let span = global_tracer().start_with_context("duplicates.unresolve", &context);
let span_context = opentelemetry::Context::current_with_span(span);
let mut dao = exif_dao.lock().expect("exif dao lock");
if let Err(e) = dao.clear_duplicate_of(&span_context, body.library_id, &body.rel_path) {
return HttpResponse::InternalServerError().body(format!("{:?}", e));
}
drop(dao);
invalidate_perceptual_cache();
HttpResponse::Ok().finish()
}
// ── Grouping / clustering ────────────────────────────────────────────────
#[derive(Serialize, Debug)]
struct GroupsResponse {
groups: Vec<DuplicateGroup>,
}
fn group_exact(rows: Vec<DuplicateRow>) -> Vec<DuplicateGroup> {
let mut by_hash: HashMap<String, Vec<DuplicateRow>> = HashMap::new();
for row in rows {
by_hash
.entry(row.content_hash.clone())
.or_default()
.push(row);
}
let mut groups: Vec<DuplicateGroup> = by_hash
.into_iter()
.filter(|(_, members)| members.len() > 1)
.map(|(hash, members)| DuplicateGroup {
kind: DuplicateKind::Exact,
representative_hash: hash,
members: members.into_iter().map(DuplicateMember::from).collect(),
})
.collect();
// Largest groups first (most reward per click), then deterministic.
groups.sort_by(|a, b| {
b.members
.len()
.cmp(&a.members.len())
.then_with(|| a.representative_hash.cmp(&b.representative_hash))
});
groups
}
/// Bits set in a "useful" perceptual hash. Real photographic content
/// produces roughly 50/50 bit distributions; anything outside the
/// [16, 48] band is low-entropy structure (uniform skies, black
/// frames, monochrome scans, faded film) where pHash collapses to
/// near-uniform values that land trivially close in Hamming distance
/// to hundreds of unrelated images. The [8, 56] band that shipped
/// first was too permissive: even at threshold=4 the false-positive
/// cluster persisted.
const MIN_INFORMATIVE_POPCOUNT: u32 = 16;
const MAX_INFORMATIVE_POPCOUNT: u32 = 64 - MIN_INFORMATIVE_POPCOUNT;
#[inline]
fn is_informative_hash(h: i64) -> bool {
let pop = (h as u64).count_ones();
(MIN_INFORMATIVE_POPCOUNT..=MAX_INFORMATIVE_POPCOUNT).contains(&pop)
}
/// dHash gets a stricter threshold than pHash. pHash is the
/// candidate-discovery signal (BK-tree neighbourhood lookup); dHash
/// is the validation signal that has to actively agree before we
/// union. Splitting the budget asymmetrically means a real near-dup
/// (which scores well on both) survives while an incidental pHash
/// collision (uniform-content false positive) gets vetoed.
///
/// Floor of 2 so threshold=4 still allows a 1-bit jitter in dHash —
/// genuine resampling can flip a low-frequency gradient bit even
/// when the visual content is identical.
#[inline]
fn dhash_threshold(phash_threshold: u32) -> u32 {
(phash_threshold / 2).max(2)
}
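The entropy band and the asymmetric dHash threshold combine into a single pair-validation rule in the clustering loop. Restated as a standalone predicate for illustration — the constants and helper names here mirror the ones above, but the real code inlines this logic rather than calling an `edge_ok` function:

```rust
const MIN_POP: u32 = 16;
const MAX_POP: u32 = 64 - MIN_POP;

// Popcount must sit in the informative [16, 48] band.
fn informative(h: u64) -> bool {
    (MIN_POP..=MAX_POP).contains(&h.count_ones())
}

// dHash budget is roughly half the pHash budget, floored at 2.
fn dhash_threshold(phash_threshold: u32) -> u32 {
    (phash_threshold / 2).max(2)
}

/// True when (a, b) qualify as a validated near-dup edge:
/// informative pHashes within `threshold`, plus informative dHashes
/// within the stricter asymmetric threshold. A missing dHash on
/// either side rejects the pair.
fn edge_ok(
    phash_a: u64,
    phash_b: u64,
    dhash_a: Option<u64>,
    dhash_b: Option<u64>,
    threshold: u32,
) -> bool {
    if !informative(phash_a) || !informative(phash_b) {
        return false;
    }
    if (phash_a ^ phash_b).count_ones() > threshold {
        return false;
    }
    match (dhash_a, dhash_b) {
        (Some(a), Some(b)) => {
            informative(a)
                && informative(b)
                && (a ^ b).count_ones() <= dhash_threshold(threshold)
        }
        // Can't validate without both dHashes: reject.
        _ => false,
    }
}
```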
/// Single-link cluster the input rows by Hamming distance over their
/// pHash, with `threshold` as the maximum distance for an edge. Rows
/// without a pHash, or with a degenerate (low-entropy) pHash, are
/// excluded — they'd chain together unrelated images.
///
/// Two-signal validation: the BK-tree gives candidate pairs cheaply,
/// then we additionally require dHash agreement before unioning. pHash
/// alone is too permissive; pairing it with dHash collapses the false-
/// positive cluster significantly (a real near-dup stays close on both
/// the DCT and gradient signatures, while a spurious pHash collision
/// on uniform content doesn't survive the dHash check).
///
/// Implementation: BK-tree neighbourhood lookup per row, union-find
/// over the validated edges. O(N log N) instead of the O(N²) naive
/// pairwise scan; on a 1.26M-row library that's the difference between
/// "responds in 1.5 s" and "responds in 25 minutes".
fn cluster_perceptual(rows: Vec<DuplicateRow>, threshold: u32) -> Vec<DuplicateGroup> {
let candidates: Vec<DuplicateRow> = rows
.into_iter()
.filter(|r| r.phash_64.is_some_and(is_informative_hash))
.collect();
if candidates.len() < 2 {
return Vec::new();
}
// Build BK-tree keyed on (phash_u64, index-in-candidates).
let mut tree: BKTree<HashKey, HammingMetric> = BKTree::new(HammingMetric);
for (idx, row) in candidates.iter().enumerate() {
if let Some(p) = row.phash_64 {
tree.add(HashKey {
phash: p as u64,
idx,
});
}
}
// Union-find over edges within `threshold`. For a candidate pair
// surfaced by the pHash BK-tree, require dHash within a *stricter*
// threshold (`dhash_threshold(threshold)`) before unioning. pHash
// agreement on low-entropy structure can be incidental; pHash
// agreement AND dHash within roughly half that distance is a
// strong near-dup signal. dHash on either side missing → reject
// (was: trust pHash alone). Missing dHash means we can't validate
// the candidate, and the false-positive cost outweighs the rare
// case of a partial backfill.
let dhash_max = dhash_threshold(threshold);
let mut uf = UnionFind::new(candidates.len());
for (idx, row) in candidates.iter().enumerate() {
let Some(p) = row.phash_64 else { continue };
let key = HashKey {
phash: p as u64,
idx,
};
for (_, neighbour) in tree.find(&key, threshold) {
if neighbour.idx == idx {
continue;
}
let other = &candidates[neighbour.idx];
let dhash_ok = match (row.dhash_64, other.dhash_64) {
(Some(a), Some(b)) => {
(a as u64 ^ b as u64).count_ones() <= dhash_max
&& is_informative_hash(a)
&& is_informative_hash(b)
}
_ => false,
};
if dhash_ok {
uf.union(idx, neighbour.idx);
}
}
}
// Bucket by root.
let mut by_root: HashMap<usize, Vec<DuplicateRow>> = HashMap::new();
for (idx, row) in candidates.into_iter().enumerate() {
let root = uf.find(idx);
by_root.entry(root).or_default().push(row);
}
// Medoid-validate each cluster to break single-link chains.
// Single-link unions any pair within threshold; that means a chain
// A↔B↔C can collapse into one cluster even when A and C aren't
// similar. The medoid pass picks the cluster's most-central member
// and drops any other whose distance to it exceeds threshold —
// chains lose their tail, dense real-near-dup clusters keep all
// members. Discard clusters that drop below 2 after refinement.
let mut groups: Vec<DuplicateGroup> = by_root
.into_values()
.filter_map(|cluster| refine_cluster(cluster, threshold, dhash_max))
.map(|cluster| {
let representative_hash = cluster[0].content_hash.clone();
DuplicateGroup {
kind: DuplicateKind::Perceptual,
representative_hash,
members: cluster.into_iter().map(DuplicateMember::from).collect(),
}
})
.collect();
groups.sort_by(|a, b| {
b.members
.len()
.cmp(&a.members.len())
.then_with(|| a.representative_hash.cmp(&b.representative_hash))
});
groups
}
/// Tighten a single-link cluster to its medoid neighbourhood. Returns
/// `None` when fewer than 2 members survive — caller drops the cluster.
fn refine_cluster(
cluster: Vec<DuplicateRow>,
phash_max: u32,
dhash_max: u32,
) -> Option<Vec<DuplicateRow>> {
if cluster.len() < 2 {
return None;
}
if cluster.len() == 2 {
// No chain can exist with only two members; the union-find
// already validated both signals when it joined this pair.
return Some(cluster);
}
// Pick the medoid: the member whose summed pHash+dHash distance
// to the rest of the cluster is smallest. Deterministic via the
// first-best-wins tie break (on equal scores, the earliest member
// in the cluster's input order keeps the medoid slot).
let phashes: Vec<u64> = cluster
.iter()
.map(|r| r.phash_64.unwrap_or(0) as u64)
.collect();
let dhashes: Vec<u64> = cluster
.iter()
.map(|r| r.dhash_64.unwrap_or(0) as u64)
.collect();
let mut best_idx = 0usize;
let mut best_score = u32::MAX;
for i in 0..cluster.len() {
let mut score: u32 = 0;
for j in 0..cluster.len() {
if i == j {
continue;
}
score = score.saturating_add((phashes[i] ^ phashes[j]).count_ones());
score = score.saturating_add((dhashes[i] ^ dhashes[j]).count_ones());
}
if score < best_score {
best_score = score;
best_idx = i;
}
}
let medoid_phash = phashes[best_idx];
let medoid_dhash = dhashes[best_idx];
let kept: Vec<DuplicateRow> = cluster
.into_iter()
.enumerate()
.filter(|(i, _)| {
*i == best_idx
|| ((phashes[*i] ^ medoid_phash).count_ones() <= phash_max
&& (dhashes[*i] ^ medoid_dhash).count_ones() <= dhash_max)
})
.map(|(_, r)| r)
.collect();
if kept.len() < 2 { None } else { Some(kept) }
}
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct HashKey {
phash: u64,
idx: usize,
}
struct HammingMetric;
impl Metric<HashKey> for HammingMetric {
fn distance(&self, a: &HashKey, b: &HashKey) -> u32 {
(a.phash ^ b.phash).count_ones()
}
fn threshold_distance(&self, a: &HashKey, b: &HashKey, _: u32) -> Option<u32> {
Some(self.distance(a, b))
}
}
struct UnionFind {
parent: Vec<usize>,
rank: Vec<u8>,
}
impl UnionFind {
fn new(n: usize) -> Self {
Self {
parent: (0..n).collect(),
rank: vec![0; n],
}
}
fn find(&mut self, x: usize) -> usize {
if self.parent[x] != x {
let root = self.find(self.parent[x]);
self.parent[x] = root;
}
self.parent[x]
}
fn union(&mut self, a: usize, b: usize) {
let ra = self.find(a);
let rb = self.find(b);
if ra == rb {
return;
}
if self.rank[ra] < self.rank[rb] {
self.parent[ra] = rb;
} else if self.rank[ra] > self.rank[rb] {
self.parent[rb] = ra;
} else {
self.parent[rb] = ra;
self.rank[ra] += 1;
}
}
}
// ── Routing ──────────────────────────────────────────────────────────────
pub fn add_duplicate_services<T>(app: App<T>) -> App<T>
where
T: ServiceFactory<
actix_web::dev::ServiceRequest,
Config = (),
Error = actix_web::Error,
InitError = (),
>,
{
app.service(web::resource("/duplicates/exact").route(web::get().to(list_exact_handler)))
.service(
web::resource("/duplicates/perceptual").route(web::get().to(list_perceptual_handler)),
)
.service(web::resource("/duplicates/resolve").route(web::post().to(resolve_handler)))
.service(web::resource("/duplicates/unresolve").route(web::post().to(unresolve_handler)))
}
// ── Tests ────────────────────────────────────────────────────────────────
#[cfg(test)]
mod tests {
use super::*;
fn row(library_id: i32, rel: &str, hash: &str, phash: Option<i64>) -> DuplicateRow {
DuplicateRow {
library_id,
rel_path: rel.into(),
content_hash: hash.into(),
size_bytes: Some(1000),
date_taken: None,
width: None,
height: None,
phash_64: phash,
dhash_64: None,
duplicate_of_hash: None,
duplicate_decided_at: None,
}
}
#[test]
fn group_exact_collapses_by_hash() {
let rows = vec![
row(1, "a.jpg", "h1", None),
row(1, "b.jpg", "h1", None),
row(2, "c.jpg", "h1", None),
row(1, "lonely.jpg", "h2", None),
];
let groups = group_exact(rows);
assert_eq!(groups.len(), 1);
assert_eq!(groups[0].representative_hash, "h1");
assert_eq!(groups[0].members.len(), 3);
}
/// All hashes used below have popcount in the "informative"
/// 16..=48 band so they survive the entropy filter that keeps
/// solid-colour images out of the cluster graph.
const INFORMATIVE_BASE: i64 = 0x55AA_55AA_55AA_55AA; // popcount = 32
const INFORMATIVE_NEAR: i64 = 0x55AA_55AA_55AA_55AB; // 1-bit away from BASE
const INFORMATIVE_FAR: i64 = 0x6996_6996_6996_6996; // 32 bits away from BASE
fn row_with_dhash(
library_id: i32,
rel: &str,
hash: &str,
phash: Option<i64>,
dhash: Option<i64>,
) -> DuplicateRow {
DuplicateRow {
library_id,
rel_path: rel.into(),
content_hash: hash.into(),
size_bytes: Some(1000),
date_taken: None,
width: None,
height: None,
phash_64: phash,
dhash_64: dhash,
duplicate_of_hash: None,
duplicate_decided_at: None,
}
}
#[test]
fn cluster_perceptual_unites_close_hashes() {
// Two rows near each other on both pHash and dHash; one far
// on pHash. Threshold 4 should merge the close pair.
let rows = vec![
row_with_dhash(
1,
"a.jpg",
"h1",
Some(INFORMATIVE_BASE),
Some(INFORMATIVE_BASE),
),
row_with_dhash(
1,
"b.jpg",
"h2",
Some(INFORMATIVE_NEAR),
Some(INFORMATIVE_NEAR),
),
row_with_dhash(
1,
"c.jpg",
"h3",
Some(INFORMATIVE_FAR),
Some(INFORMATIVE_FAR),
),
];
let groups = cluster_perceptual(rows, 4);
assert_eq!(groups.len(), 1);
assert_eq!(groups[0].members.len(), 2);
let paths: Vec<&str> = groups[0]
.members
.iter()
.map(|m| m.rel_path.as_str())
.collect();
assert!(paths.contains(&"a.jpg"));
assert!(paths.contains(&"b.jpg"));
}
#[test]
fn cluster_perceptual_threshold_zero_drops_distinct() {
let rows = vec![
row_with_dhash(
1,
"a.jpg",
"h1",
Some(INFORMATIVE_BASE),
Some(INFORMATIVE_BASE),
),
row_with_dhash(
1,
"b.jpg",
"h2",
Some(INFORMATIVE_NEAR),
Some(INFORMATIVE_NEAR),
),
];
let groups = cluster_perceptual(rows, 0);
assert!(groups.is_empty());
}
#[test]
fn cluster_perceptual_skips_singletons() {
let rows = vec![row(1, "alone.jpg", "h1", Some(INFORMATIVE_BASE))];
assert!(cluster_perceptual(rows, 8).is_empty());
}
#[test]
fn cluster_perceptual_filters_low_entropy_hashes() {
// Both 0 (popcount 0) and i64::MAX (popcount 63) fall outside
// the informative band. A pair of these would trivially match
// (Hamming distance to each other small or zero) without the
// entropy filter — that's exactly the regression that was
// producing a giant first cluster of solid-colour images.
let rows = vec![
row(1, "blank-a.jpg", "h1", Some(0)),
row(1, "blank-b.jpg", "h2", Some(0)),
row(1, "white-a.jpg", "h3", Some(i64::MAX)),
row(1, "white-b.jpg", "h4", Some(i64::MAX)),
];
assert!(cluster_perceptual(rows, 8).is_empty());
}
#[test]
fn cluster_perceptual_requires_dhash_agreement() {
// pHash within threshold but dHash far apart — the candidate
// edge from the BK-tree must be rejected. Without the dHash
// double-check this would form a 2-member cluster.
let rows = vec![
row_with_dhash(
1,
"a.jpg",
"h1",
Some(INFORMATIVE_BASE),
Some(INFORMATIVE_BASE),
),
row_with_dhash(
1,
"b.jpg",
"h2",
Some(INFORMATIVE_NEAR),
Some(INFORMATIVE_FAR),
),
];
assert!(cluster_perceptual(rows, 4).is_empty());
}
#[test]
fn cluster_perceptual_breaks_long_chain_at_medoid() {
// 4-member chain at threshold=2 with pairwise distances chosen
// so single-link unions all four but the endpoints sit past
// the medoid's neighbourhood. Each hop flips a fresh pair of
// bits (0x03, then 0x0C, then 0x30; no shared bits), so
// consecutive hops compose into wider distant-pair distances:
// A↔B = 2, B↔C = 2, C↔D = 2,
// A↔C = 4, B↔D = 4, A↔D = 6.
// Medoid (B or C) keeps Δ ≤ 2 of itself; the far endpoint
// gets chopped, leaving exactly 3 members.
const A: i64 = 0x55AA_55AA_55AA_55AA;
const B: i64 = 0x55AA_55AA_55AA_55A9; // ^0x03 last byte
const C: i64 = 0x55AA_55AA_55AA_55A5; // ^0x0C from B
const D: i64 = 0x55AA_55AA_55AA_5595; // ^0x30 from C
let rows = vec![
row_with_dhash(1, "a.jpg", "h1", Some(A), Some(A)),
row_with_dhash(1, "b.jpg", "h2", Some(B), Some(B)),
row_with_dhash(1, "c.jpg", "h3", Some(C), Some(C)),
row_with_dhash(1, "d.jpg", "h4", Some(D), Some(D)),
];
let groups = cluster_perceptual(rows, 2);
assert_eq!(groups.len(), 1);
assert_eq!(
groups[0].members.len(),
3,
"medoid pass should chop one chain endpoint past Δ=2"
);
}
/// Sanity-check the BK-tree's metric, which is what the duplicates
/// path actually clusters on.
#[test]
fn hamming_metric_is_symmetric() {
let m = HammingMetric;
let a = HashKey {
phash: 0b1010,
idx: 0,
};
let b = HashKey {
phash: 0b0101,
idx: 1,
};
let d1 = m.distance(&a, &b);
let d2 = m.distance(&b, &a);
assert_eq!(d1, d2);
assert_eq!(d1, 4);
}
}
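
The entropy band and medoid prune above can be exercised in isolation. The sketch below is illustrative only: `is_informative_hash` and `medoid_prune` are stand-ins with assumed signatures, the [16, 48] popcount band mirrors the commit message, and plain `u64`s replace the crate's `DuplicateRow`.

```rust
// Standalone sketch — names and the [16, 48] band are assumptions
// mirroring the commit message, not the crate's actual API.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

// Entropy band: reject near-uniform hashes (solid colours, blank
// scans) whose popcount sits outside [16, 48].
fn is_informative_hash(h: u64) -> bool {
    (16..=48).contains(&h.count_ones())
}

// Medoid prune: pick the member with the smallest summed distance
// to the rest, then drop anyone farther than `max_dist` from it.
fn medoid_prune(hashes: &[u64], max_dist: u32) -> Vec<u64> {
    let medoid = *hashes
        .iter()
        .min_by_key(|&&h| hashes.iter().map(|&o| hamming(h, o)).sum::<u32>())
        .expect("cluster is non-empty");
    hashes
        .iter()
        .copied()
        .filter(|&h| hamming(h, medoid) <= max_dist)
        .collect()
}

fn main() {
    assert!(!is_informative_hash(0)); // blank frame, popcount 0
    assert!(is_informative_hash(0x55AA_55AA_55AA_55AA)); // popcount 32
    // 2-bit-per-step chain: the endpoints sit 6 apart, so the medoid
    // pass at max_dist = 2 drops one endpoint.
    let chain = [0x55AAu64, 0x55A9, 0x55A5, 0x5595];
    assert_eq!(medoid_prune(&chain, 2).len(), 3);
}
```

The medoid lands on an interior chain member (B or C), so exactly one endpoint falls outside its neighbourhood — the same shape the `cluster_perceptual_breaks_long_chain_at_medoid` test asserts.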

View File

@@ -20,9 +20,10 @@
use crate::Claims;
use crate::ai::face_client::{DetectMeta, FaceClient, FaceDetectError};
use crate::exif;
use crate::database::schema::{face_detections, image_exif, persons};
use crate::error::IntoHttpError;
use crate::exif;
use crate::file_types;
use crate::libraries::{self, Library};
use crate::otel::{extract_context_from_request, global_tracer, trace_db_call};
use crate::state::AppState;
@@ -99,9 +100,30 @@ pub struct FaceDetectionRow {
pub created_at: i64,
}
/// SQL fragment restricting an `image_exif.rel_path` (or `face_detections.rel_path`)
/// column to image extensions. Videos register in `image_exif` with a
/// populated `content_hash` but can never produce a `face_detections` row
/// — applying this filter at query time keeps videos out of the per-tick
/// backlog drain (which would otherwise loop forever — `filter_excluded`
/// drops them client-side without writing a marker) and out of the SCANNED
/// stat denominator (so 100% is reachable).
fn image_path_predicate(col: &str) -> String {
let clauses: Vec<String> = file_types::IMAGE_EXTENSIONS
.iter()
.map(|ext| format!("lower({col}) LIKE '%.{ext}'"))
.collect();
format!("({})", clauses.join(" OR "))
}
/// Row shape for `list_unscanned_candidates`'s raw SQL. Diesel's
/// `sql_query` requires a `QueryableByName` row type with explicit
/// column SQL types; using a tuple isn't supported.
#[derive(diesel::QueryableByName, Debug)]
struct CountRow {
#[diesel(sql_type = diesel::sql_types::BigInt)]
count: i64,
}
#[derive(diesel::QueryableByName, Debug)]
struct UnscannedRow {
#[diesel(sql_type = diesel::sql_types::Text)]
@@ -601,26 +623,32 @@ impl FaceDao for SqliteFaceDao {
// fire multiple detect calls for the same hash if it lives
// under several rel_paths in the same library. The
// anti-join (NOT EXISTS) drains hashes that have no row in
// face_detections at all.
let rows: Vec<(String, String)> = diesel::sql_query(
// face_detections at all. The image-extension predicate
// keeps videos out of the candidate set; without it they'd
// be filtered client-side and re-pulled every tick forever
// because no marker row is written for excluded paths.
let ext_predicate = image_path_predicate("rel_path");
let sql = format!(
"SELECT rel_path, content_hash \
FROM image_exif e \
WHERE library_id = ? \
AND content_hash IS NOT NULL \
AND {ext_predicate} \
AND NOT EXISTS ( \
SELECT 1 FROM face_detections f \
WHERE f.content_hash = e.content_hash \
) \
GROUP BY content_hash \
LIMIT ?",
)
.bind::<diesel::sql_types::Integer, _>(library_id)
.bind::<diesel::sql_types::BigInt, _>(limit)
.load::<UnscannedRow>(conn.deref_mut())
.with_context(|| "list_unscanned_candidates")?
.into_iter()
.map(|r| (r.rel_path, r.content_hash))
.collect();
LIMIT ?"
);
let rows: Vec<(String, String)> = diesel::sql_query(sql)
.bind::<diesel::sql_types::Integer, _>(library_id)
.bind::<diesel::sql_types::BigInt, _>(limit)
.load::<UnscannedRow>(conn.deref_mut())
.with_context(|| "list_unscanned_candidates")?
.into_iter()
.map(|r| (r.rel_path, r.content_hash))
.collect();
Ok(rows)
})
}
@@ -856,14 +884,18 @@ impl FaceDao for SqliteFaceDao {
// Pair with the base64-encoded embedding string so the handler
// doesn't need to know the wire format. Skip rows with NULL
// embedding (shouldn't happen on detected rows, but defensive).
// `embedding.take()` moves the bytes out of the row so we can
// hand the (now-empty-embedding) row plus the encoded string
// back to the caller without cloning the whole row — at 20k
// rows × 2 KB that clone was 40 MB of pointless heap traffic
// per cluster-suggest run.
use base64::Engine;
Ok(rows
.into_iter()
.filter_map(|r| {
r.embedding.as_ref().map(|bytes| {
let b64 = base64::engine::general_purpose::STANDARD.encode(bytes);
(r.clone(), b64)
})
.filter_map(|mut r| {
let bytes = r.embedding.take()?;
let b64 = base64::engine::general_purpose::STANDARD.encode(&bytes);
Some((r, b64))
})
.collect())
})
@@ -1013,14 +1045,42 @@ impl FaceDao for SqliteFaceDao {
.first(conn.deref_mut())
.with_context(|| "stats: failed")?
};
// Image-extension filter mirrors `list_unscanned_candidates` so
// SCANNED can actually reach 100%: videos sit in `image_exif` but
// never get a `face_detections` row, so counting them here
// permanently caps the percentage below 100%.
//
// Count DISTINCT content_hash (not rows) so the numerator
// (`scanned`, also distinct-content_hash) and denominator live
// in the same domain. Without this, a file present at multiple
// rel_paths or across libraries inflates total_photos by one
// per duplicate row while face_detections — keyed on
// content_hash — counts the bytes once, leaving a permanent
// gap (e.g. 1101/1103 with nothing actually pending). Rows
// with NULL content_hash are excluded; they're held in the
// hash-backfill backlog and counting them would pin the bar
// below 100% for the duration of that backfill.
let total_photos: i64 = {
let mut q = image_exif::table.into_boxed();
if let Some(lib) = library_id {
q = q.filter(image_exif::library_id.eq(lib));
}
q.select(diesel::dsl::count_star())
.first(conn.deref_mut())
.with_context(|| "stats: total_photos")?
let ext_predicate = image_path_predicate("rel_path");
let row: CountRow = if let Some(lib) = library_id {
let sql = format!(
"SELECT COUNT(DISTINCT content_hash) AS count FROM image_exif \
WHERE library_id = ? AND content_hash IS NOT NULL AND {ext_predicate}"
);
diesel::sql_query(sql)
.bind::<diesel::sql_types::Integer, _>(lib)
.get_result(conn.deref_mut())
.with_context(|| "stats: total_photos")?
} else {
let sql = format!(
"SELECT COUNT(DISTINCT content_hash) AS count FROM image_exif \
WHERE content_hash IS NOT NULL AND {ext_predicate}"
);
diesel::sql_query(sql)
.get_result(conn.deref_mut())
.with_context(|| "stats: total_photos")?
};
row.count
};
let persons_count: i64 = persons::table
.select(diesel::dsl::count_star())
@@ -2255,6 +2315,12 @@ async fn update_face_handler<D: FaceDao>(
let mut new_embedding: Option<Vec<u8>> = None;
if let Some((bx, by, bw, bh)) = bbox_patch {
if !face_client.is_enabled() {
warn!(
"PATCH /image/faces/{}: 503 — face client not enabled \
(APOLLO_FACE_API_BASE_URL / APOLLO_API_BASE_URL both unset). \
Bbox edit requires Apollo to re-embed.",
id
);
return HttpResponse::ServiceUnavailable()
.body("face client disabled — bbox edit requires Apollo");
}
@@ -2284,8 +2350,7 @@ async fn update_face_handler<D: FaceDao>(
"PATCH /image/faces/{}: crop failed for {:?}: {:?}",
id, abs_path, e
);
return HttpResponse::BadRequest()
.body(format!("cannot crop new bbox: {}", e));
return HttpResponse::BadRequest().body(format!("cannot crop new bbox: {}", e));
}
};
let meta = DetectMeta {
@@ -2332,11 +2397,20 @@ async fn update_face_handler<D: FaceDao>(
);
}
Err(FaceDetectError::Transient(e)) => {
warn!(
"PATCH /image/faces/{}: 503 — Apollo face client transient \
error during re-embed: {}",
id, e
);
return HttpResponse::ServiceUnavailable().body(format!("{}", e));
}
Err(FaceDetectError::Disabled) => {
return HttpResponse::ServiceUnavailable()
.body("face client disabled mid-flight");
warn!(
"PATCH /image/faces/{}: 503 — face client became disabled \
mid-flight",
id
);
return HttpResponse::ServiceUnavailable().body("face client disabled mid-flight");
}
}
}
@@ -3145,6 +3219,39 @@ mod tests {
assert_eq!(stats.with_faces, 0);
}
#[test]
fn stats_total_photos_excludes_videos() {
// SCANNED counts content_hashes in face_detections; total_photos
// must apply the same image-extension filter as the watcher
// backlog query so the percentage can reach 100%. Without this,
// videos sit in image_exif but never produce a face_detections
// row (Apollo decodes images only) and the bar caps below 100%.
let mut dao = fresh_dao();
diesel::sql_query(
"INSERT OR IGNORE INTO libraries (id, name, root_path, created_at) \
VALUES (1, 'main', '/tmp', 0)",
)
.execute(dao.connection.lock().unwrap().deref_mut())
.expect("seed libraries");
diesel::sql_query(
"INSERT INTO image_exif \
(library_id, rel_path, content_hash, created_time, last_modified) VALUES \
(1, 'a.jpg', 'h-a', 0, 0), \
(1, 'b.JPEG', 'h-b', 0, 0), \
(1, 'movie.mp4', 'h-mp4', 0, 0), \
(1, 'clip.MOV', 'h-mov', 0, 0)",
)
.execute(dao.connection.lock().unwrap().deref_mut())
.expect("seed image_exif");
let stats = dao.stats(&ctx(), Some(1)).expect("stats");
assert_eq!(
stats.total_photos, 2,
"videos should not count toward total"
);
}
#[test]
fn merge_persons_repoints_faces() {
let mut dao = fresh_dao();
@@ -3325,8 +3432,7 @@ mod tests {
)
.unwrap();
let row = seed_library_and_face(&mut dao, Some(p.id));
let joined =
hydrate_face_with_person(&mut dao, &ctx(), row).expect("hydrate assigned");
let joined = hydrate_face_with_person(&mut dao, &ctx(), row).expect("hydrate assigned");
assert_eq!(joined.person_id, Some(p.id));
assert_eq!(joined.person_name.as_deref(), Some("Alice"));
// Bbox + confidence + source must round-trip — these are what
@@ -3345,8 +3451,7 @@ mod tests {
// previously-assigned row's serialization.
let mut dao = fresh_dao();
let row = seed_library_and_face(&mut dao, None);
let joined =
hydrate_face_with_person(&mut dao, &ctx(), row).expect("hydrate unassigned");
let joined = hydrate_face_with_person(&mut dao, &ctx(), row).expect("hydrate unassigned");
assert!(joined.person_id.is_none());
assert!(joined.person_name.is_none());
}
@@ -3367,7 +3472,12 @@ mod tests {
.execute(dao.connection.lock().unwrap().deref_mut())
.expect("seed libraries");
// Seed image_exif: mix of hashed/unhashed/scanned/cross-library.
// Seed image_exif: mix of hashed/unhashed/scanned/cross-library,
// plus a video and a mixed-case image extension. Videos register
// in image_exif but can never produce a face_detections row, so
// the SQL must filter them out — otherwise the per-tick backlog
// drain re-pulls them every tick (no marker is ever written, so
// they loop forever) and the SCANNED stat is permanently capped.
diesel::sql_query(
"INSERT INTO image_exif \
(library_id, rel_path, content_hash, created_time, last_modified) VALUES \
@@ -3375,6 +3485,9 @@ mod tests {
(1, 'b.jpg', 'h-b', 0, 0), \
(1, 'c.jpg', NULL, 0, 0), \
(1, 'd.jpg', 'h-d', 0, 0), \
(1, 'movie.mp4', 'h-mp4', 0, 0), \
(1, 'clip.MOV', 'h-mov', 0, 0), \
(1, 'photo.JPG', 'h-jpg-upper', 0, 0), \
(2, 'e.jpg', 'h-e', 0, 0)",
)
.execute(dao.connection.lock().unwrap().deref_mut())
@@ -3388,16 +3501,26 @@ mod tests {
.list_unscanned_candidates(&ctx(), 1, 10)
.expect("list unscanned");
let hashes: std::collections::HashSet<_> =
cands.iter().map(|(_, h)| h.clone()).collect();
let hashes: std::collections::HashSet<_> = cands.iter().map(|(_, h)| h.clone()).collect();
// Should contain a and d (hashed, unscanned, library 1).
// Should contain a, d, and the upper-case .JPG (image-extension
// match is case-insensitive).
assert!(hashes.contains("h-a"), "missing h-a: {:?}", hashes);
assert!(hashes.contains("h-d"), "missing h-d: {:?}", hashes);
// Should NOT contain b (scanned), c (no hash), e (other library).
assert!(
hashes.contains("h-jpg-upper"),
"missing h-jpg-upper: {:?}",
hashes
);
// Should NOT contain b (scanned), c (no hash), e (other library),
// or videos (mp4/mov are not image extensions).
assert!(!hashes.contains("h-b"), "expected h-b filtered (scanned)");
assert!(!hashes.contains("h-e"), "expected h-e filtered (other library)");
assert_eq!(cands.len(), 2, "unexpected candidates: {:?}", cands);
assert!(
!hashes.contains("h-e"),
"expected h-e filtered (other library)"
);
assert!(!hashes.contains("h-mp4"), "expected h-mp4 filtered (video)");
assert!(!hashes.contains("h-mov"), "expected h-mov filtered (video)");
assert_eq!(cands.len(), 3, "unexpected candidates: {:?}", cands);
}
}
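
The image-extension predicate the faces queries lean on is a string-built SQL fragment; a minimal standalone sketch of that builder, with a stand-in extension list (the real crate reads `file_types::IMAGE_EXTENSIONS`, whose exact contents are assumed here):

```rust
// Stand-in for file_types::IMAGE_EXTENSIONS — illustrative only.
const IMAGE_EXTENSIONS: &[&str] = &["jpg", "jpeg", "png", "heic"];

// One case-insensitive LIKE clause per extension, OR-ed together and
// parenthesised so the fragment composes with surrounding AND terms.
fn image_path_predicate(col: &str) -> String {
    let clauses: Vec<String> = IMAGE_EXTENSIONS
        .iter()
        .map(|ext| format!("lower({col}) LIKE '%.{ext}'"))
        .collect();
    format!("({})", clauses.join(" OR "))
}

fn main() {
    let p = image_path_predicate("rel_path");
    // `.JPG` matches because the column is lowercased, not the pattern.
    assert!(p.starts_with("(lower(rel_path) LIKE '%.jpg'"));
    assert!(p.contains(" OR "));
    assert!(p.ends_with(')'));
}
```

Lowercasing the column rather than the pattern is what makes the mixed-case `photo.JPG` test case pass without enumerating case variants per extension.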

View File

@@ -110,11 +110,18 @@ fn in_memory_date_sort(
let total_count = files.len() as i64;
let file_paths: Vec<String> = files.iter().map(|f| f.file_name.clone()).collect();
// Batch fetch EXIF data (keyed by rel_path; in union mode a rel_path may
// correspond to rows in multiple libraries — pick the date from the one
// matching the requesting row's library_id when possible).
// Batch fetch EXIF data. When every file in this batch belongs to the
// same library, scope the SQL filter to that library so cross-library
// duplicates with the same rel_path don't get fetched and discarded.
// In genuine union mode (mixed libraries) keep the rel-path-only
// lookup; the caller's `(file_path, library_id)` map below picks the
// right row.
let scope_library = match file_libraries.first() {
Some(&first) if file_libraries.iter().all(|&id| id == first) => Some(first),
_ => None,
};
let exif_rows = exif_dao
.get_exif_batch(span_context, &file_paths)
.get_exif_batch(span_context, scope_library, &file_paths)
.unwrap_or_default();
let exif_map: std::collections::HashMap<(String, i32), i64> = exif_rows
.into_iter()
@@ -309,11 +316,15 @@ pub async fn list_photos<TagD: TagDao, FS: FileSystemAccess>(
None
};
// Query EXIF database
// Query EXIF database. When the request named a library, the EXIF
// filter must be scoped to it — otherwise camera/date/GPS hits
// from other libraries would pollute the result set even though
// downstream filesystem walks would never visit those files.
let mut exif_dao_guard = exif_dao.lock().expect("Unable to get ExifDao");
let exif_results = exif_dao_guard
.query_by_exif(
&span_context,
library.map(|l| l.id),
req.camera_make.as_deref(),
req.camera_model.as_deref(),
req.lens_model.as_deref(),
@@ -572,9 +583,10 @@ pub async fn list_photos<TagD: TagDao, FS: FileSystemAccess>(
} else {
Some(trimmed)
};
let include_duplicates = req.include_duplicates.unwrap_or(false);
let rows = {
let mut dao = exif_dao.lock().expect("Unable to get ExifDao");
dao.list_rel_paths_for_libraries(&span_context, &lib_ids, prefix)
dao.list_rel_paths_for_libraries(&span_context, &lib_ids, prefix, include_duplicates)
.unwrap_or_else(|e| {
warn!("list_rel_paths_for_libraries failed: {:?}", e);
Vec::new()
@@ -1242,15 +1254,19 @@ pub async fn list_exif_summary(
.collect();
let mut exif_dao_guard = exif_dao.lock().expect("Unable to get ExifDao");
match exif_dao_guard.query_by_exif(&cx, None, None, None, None, req.date_from, req.date_to) {
match exif_dao_guard.query_by_exif(
&cx,
library_filter,
None,
None,
None,
None,
req.date_from,
req.date_to,
) {
Ok(rows) => {
let photos: Vec<ExifSummary> = rows
.into_iter()
// Library filter post-query: keeps the DAO trait (and its
// mocks) unchanged. For typical 2-3 library setups the in-
// memory pass over a date-bounded result set is negligible;
// can be pushed into SQL later if it ever isn't.
.filter(|r| library_filter.is_none_or(|id| r.library_id == id))
.map(|r| ExifSummary {
library_name: library_names.get(&r.library_id).cloned(),
file_path: r.file_path,
@@ -1488,6 +1504,10 @@ mod tests {
last_modified: data.last_modified,
content_hash: data.content_hash.clone(),
size_bytes: data.size_bytes,
phash_64: data.phash_64,
dhash_64: data.dhash_64,
duplicate_of_hash: None,
duplicate_decided_at: None,
})
}
@@ -1527,6 +1547,10 @@ mod tests {
last_modified: data.last_modified,
content_hash: data.content_hash.clone(),
size_bytes: data.size_bytes,
phash_64: data.phash_64,
dhash_64: data.dhash_64,
duplicate_of_hash: None,
duplicate_decided_at: None,
})
}
@@ -1549,6 +1573,7 @@ mod tests {
fn get_exif_batch(
&mut self,
_context: &opentelemetry::Context,
_library_id: Option<i32>,
_: &[String],
) -> Result<Vec<crate::database::models::ImageExif>, DbError> {
Ok(Vec::new())
@@ -1557,6 +1582,7 @@ mod tests {
fn query_by_exif(
&mut self,
_context: &opentelemetry::Context,
_library_id: Option<i32>,
_: Option<&str>,
_: Option<&str>,
_: Option<&str>,
@@ -1672,6 +1698,7 @@ mod tests {
_context: &opentelemetry::Context,
_library_ids: &[i32],
_path_prefix: Option<&str>,
_include_duplicates: bool,
) -> Result<Vec<(i32, String)>, DbError> {
Ok(vec![])
}
@@ -1684,6 +1711,100 @@ mod tests {
) -> Result<(), DbError> {
Ok(())
}
fn count_for_library(
&mut self,
_context: &opentelemetry::Context,
_library_id: i32,
) -> Result<i64, DbError> {
Ok(0)
}
fn list_rel_paths_for_library_page(
&mut self,
_context: &opentelemetry::Context,
_library_id: i32,
_limit: i64,
_offset: i64,
) -> Result<Vec<(i32, String)>, DbError> {
Ok(Vec::new())
}
fn get_rows_missing_perceptual_hash(
&mut self,
_context: &opentelemetry::Context,
_limit: i64,
) -> Result<Vec<(i32, String)>, DbError> {
Ok(Vec::new())
}
fn backfill_perceptual_hash(
&mut self,
_context: &opentelemetry::Context,
_library_id: i32,
_rel_path: &str,
_phash_64: Option<i64>,
_dhash_64: Option<i64>,
) -> Result<(), DbError> {
Ok(())
}
fn list_duplicates_exact(
&mut self,
_context: &opentelemetry::Context,
_library_id: Option<i32>,
_include_resolved: bool,
) -> Result<Vec<crate::database::DuplicateRow>, DbError> {
Ok(Vec::new())
}
fn list_perceptual_candidates(
&mut self,
_context: &opentelemetry::Context,
_library_id: Option<i32>,
_include_resolved: bool,
) -> Result<Vec<crate::database::DuplicateRow>, DbError> {
Ok(Vec::new())
}
fn lookup_duplicate_row(
&mut self,
_context: &opentelemetry::Context,
_library_id: i32,
_rel_path: &str,
) -> Result<Option<crate::database::DuplicateRow>, DbError> {
Ok(None)
}
fn set_duplicate_of(
&mut self,
_context: &opentelemetry::Context,
_library_id: i32,
_rel_path: &str,
_survivor_hash: &str,
_decided_at: i64,
) -> Result<(), DbError> {
Ok(())
}
fn clear_duplicate_of(
&mut self,
_context: &opentelemetry::Context,
_library_id: i32,
_rel_path: &str,
) -> Result<(), DbError> {
Ok(())
}
fn union_perceptual_tags(
&mut self,
_context: &opentelemetry::Context,
_survivor_hash: &str,
_demoted_hash: &str,
_survivor_rel_path: &str,
) -> Result<(), DbError> {
Ok(())
}
}
mod api {

View File

@@ -10,6 +10,7 @@ pub mod cleanup;
pub mod content_hash;
pub mod data;
pub mod database;
pub mod duplicates;
pub mod error;
pub mod exif;
pub mod face_watch;
@@ -19,9 +20,11 @@ pub mod file_types;
pub mod files;
pub mod geo;
pub mod libraries;
pub mod library_maintenance;
pub mod memories;
pub mod otel;
pub mod parsers;
pub mod perceptual_hash;
pub mod service;
pub mod state;
pub mod tags;
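
The libraries diff that follows parses a comma-separated `excluded_dirs` column and unions it with the global env-var excludes; the two helpers can be exercised standalone (mirroring the diff's logic, trimmed of the `Library` struct):

```rust
// Mirrors parse_excluded_dirs_column from the diff: NULL → empty,
// entries trimmed, empties dropped.
fn parse_excluded_dirs_column(raw: Option<&str>) -> Vec<String> {
    match raw {
        None => Vec::new(),
        Some(s) => s
            .split(',')
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect(),
    }
}

// Union of global and per-library excludes; order is irrelevant to
// the walker and repeats are tolerated, so plain concatenation works.
fn effective_excluded_dirs(per_library: &[String], globals: &[String]) -> Vec<String> {
    let mut combined = globals.to_vec();
    combined.extend_from_slice(per_library);
    combined
}

fn main() {
    let per_lib = parse_excluded_dirs_column(Some(" .thumbs, ,cache "));
    assert_eq!(per_lib, vec![".thumbs".to_string(), "cache".to_string()]);
    assert!(parse_excluded_dirs_column(None).is_empty());
    let all = effective_excluded_dirs(&per_lib, &["@eaDir".to_string()]);
    assert_eq!(all.len(), 3);
}
```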

View File

@@ -3,7 +3,9 @@ use chrono::Utc;
use diesel::prelude::*;
use diesel::sqlite::SqliteConnection;
use log::{info, warn};
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock};
use crate::data::Claims;
use crate::database::models::{InsertLibrary, LibraryRow};
@@ -26,6 +28,19 @@ pub struct Library {
pub id: i32,
pub name: String,
pub root_path: String,
/// Operator kill switch (mirrors `libraries.enabled`). When `false`
/// the watcher skips this library entirely — before the probe,
/// before ingest, before maintenance. Reads / serving still work
/// (a request whose path resolves to a disabled library's root
/// will succeed if the file is on disk; nothing prevents that
/// today and there's no obvious reason to). Toggle via SQL.
pub enabled: bool,
/// Per-library excluded paths/patterns, parsed from the
/// comma-separated DB column. The walker applies these
/// **in union** with the global `EXCLUDED_DIRS` env var; either
/// list matching a path is enough to exclude. Empty = no
/// library-specific excludes (only the global env var applies).
pub excluded_dirs: Vec<String>,
}
impl Library {
@@ -47,6 +62,36 @@ impl Library {
.ok()
.map(|p| p.to_string_lossy().replace('\\', "/"))
}
/// Effective excluded directories for a walk of this library:
/// the union of the global env-var excludes (passed in by the
/// caller as `globals`) and this library's per-row excludes.
/// Order doesn't matter; `PathExcluder` accepts repeats.
pub fn effective_excluded_dirs(&self, globals: &[String]) -> Vec<String> {
if self.excluded_dirs.is_empty() {
return globals.to_vec();
}
let mut combined: Vec<String> =
Vec::with_capacity(globals.len() + self.excluded_dirs.len());
combined.extend_from_slice(globals);
combined.extend(self.excluded_dirs.iter().cloned());
combined
}
}
/// Parse a comma-separated excluded_dirs column into a Vec, dropping
/// empty entries (mirrors `AppState::parse_excluded_dirs` for the env
/// var). NULL → empty Vec.
pub fn parse_excluded_dirs_column(raw: Option<&str>) -> Vec<String> {
match raw {
None => Vec::new(),
Some(s) => s
.split(',')
.map(str::trim)
.filter(|s| !s.is_empty())
.map(String::from)
.collect(),
}
}
impl From<LibraryRow> for Library {
@@ -55,6 +100,8 @@ impl From<LibraryRow> for Library {
id: row.id,
name: row.name,
root_path: row.root_path,
enabled: row.enabled,
excluded_dirs: parse_excluded_dirs_column(row.excluded_dirs.as_deref()),
}
}
}
@@ -109,6 +156,8 @@ pub fn seed_or_patch_from_env(conn: &mut SqliteConnection, base_path: &str) {
name: "main",
root_path: base_path,
created_at: now,
enabled: true,
excluded_dirs: None,
})
.execute(conn);
match result {
@@ -146,16 +195,165 @@ pub fn resolve_library_param<'a>(
.ok_or_else(|| format!("unknown library name: {}", raw))
}
/// Health of a library at a point in time. Probed at the top of each
/// file-watcher tick. The `Stale` state is the "be conservative" signal:
/// destructive paths (ingest writes, future move-handoff and orphan GC in
/// branches B/C) skip a stale library, but reads/serving stay unaffected.
///
/// See `CLAUDE.md` → "Library availability and safety" for the policy.
#[derive(Clone, Debug, serde::Serialize, PartialEq, Eq)]
#[serde(tag = "state", rename_all = "snake_case")]
pub enum LibraryHealth {
Online,
Stale {
reason: String,
/// Unix timestamp (seconds) of the most recent transition into
/// Stale. Held for telemetry / `/libraries` surfacing only —
/// gating logic doesn't read it.
since: i64,
},
}
impl LibraryHealth {
pub fn is_online(&self) -> bool {
matches!(self, LibraryHealth::Online)
}
}
/// Shared snapshot of every configured library's health, keyed by
/// `library_id`. The watcher writes; HTTP handlers read. RwLock because
/// reads vastly outnumber writes (one tick vs. every status request).
pub type LibraryHealthMap = Arc<RwLock<HashMap<i32, LibraryHealth>>>;
/// Construct an initial health map. Libraries start `Online`; the first
/// probe will downgrade any that fail. Starting `Stale` would block ingest
/// for the watcher's first tick on a healthy mount, which is the wrong
/// default for a server that's just been restarted.
pub fn new_health_map(libs: &[Library]) -> LibraryHealthMap {
let mut m = HashMap::with_capacity(libs.len());
for lib in libs {
m.insert(lib.id, LibraryHealth::Online);
}
Arc::new(RwLock::new(m))
}
/// Probe a library's mount point. Cheap: stat + open dir + peek one entry.
///
/// `had_data` is the caller's prior knowledge that this library has been
/// non-empty before — typically `image_exif` row count > 0. When true, an
/// empty directory is suspicious (it's how an unmounted NFS share looks);
/// when false, it's accepted as a fresh mount that simply hasn't been
/// indexed yet.
///
/// Note: stat / read_dir on a hard-mounted, unreachable NFS share can
/// block. The watcher accepts that risk for now — the worst case is that
/// the tick stalls until the mount returns, which is no more destructive
/// than the pre-probe behavior. A future enhancement can wrap this in a
/// thread + timeout if it becomes an operational issue.
pub fn probe_online(lib: &Library, had_data: bool) -> LibraryHealth {
let now = Utc::now().timestamp();
let path = Path::new(&lib.root_path);
let metadata = match std::fs::metadata(path) {
Ok(m) => m,
Err(e) => {
return LibraryHealth::Stale {
reason: format!("root_path stat failed: {}", e),
since: now,
};
}
};
if !metadata.is_dir() {
return LibraryHealth::Stale {
reason: format!("root_path is not a directory: {}", lib.root_path),
since: now,
};
}
let mut entries = match std::fs::read_dir(path) {
Ok(it) => it,
Err(e) => {
return LibraryHealth::Stale {
reason: format!("read_dir failed: {}", e),
since: now,
};
}
};
// Empty directory only counts as Stale when we have prior evidence
// this library used to have content. A genuinely fresh mount is
// legitimately empty, and degrading it would block first-time ingest.
if had_data && entries.next().is_none() {
return LibraryHealth::Stale {
reason: "library is empty but image_exif has rows for it".to_string(),
since: now,
};
}
LibraryHealth::Online
}
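The probe's doc comment above notes that a hard-mounted, unreachable NFS share can block `stat` indefinitely, and floats a thread + timeout as a future fix. A minimal std-only sketch of that pattern — `stat_with_timeout` is a hypothetical helper, not part of this change:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Run the potentially-blocking stat() on a worker thread. A timeout
// maps to None, which a caller could treat as Stale instead of letting
// the watcher tick stall until the mount answers.
fn stat_with_timeout(
    path: std::path::PathBuf,
    timeout: Duration,
) -> Option<std::io::Result<std::fs::Metadata>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver already timed out and dropped, this send
        // fails silently; the detached thread just exits whenever the
        // mount finally responds.
        let _ = tx.send(std::fs::metadata(&path));
    });
    rx.recv_timeout(timeout).ok()
}
```

The detached thread leaks until the blocked syscall returns, which is the usual cost of this pattern — acceptable for a once-per-tick probe.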
/// Probe `lib`, update `map`, and return the new state. Logs only on a
/// state transition (Online↔Stale) so a long outage doesn't spam at every
/// tick — operators get one warn on the way down and one info on the way
/// up.
pub fn refresh_health(map: &LibraryHealthMap, lib: &Library, had_data: bool) -> LibraryHealth {
let new_state = probe_online(lib, had_data);
let mut guard = map.write().unwrap_or_else(|e| e.into_inner());
let prev = guard.get(&lib.id).cloned();
let transitioned = matches!(
(&prev, &new_state),
(None, LibraryHealth::Stale { .. })
| (Some(LibraryHealth::Online), LibraryHealth::Stale { .. })
| (Some(LibraryHealth::Stale { .. }), LibraryHealth::Online)
);
if transitioned {
match &new_state {
LibraryHealth::Online => info!(
"Library '{}' (id={}) recovered: {} is online",
lib.name, lib.id, lib.root_path
),
LibraryHealth::Stale { reason, .. } => warn!(
"Library '{}' (id={}) is STALE — pausing writes. Reason: {}. Path: {}",
lib.name, lib.id, reason, lib.root_path
),
}
}
guard.insert(lib.id, new_state.clone());
new_state
}
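The `matches!` in `refresh_health` encodes a small transition table. Distilled over plain booleans (a hypothetical helper for illustration, not in the codebase), where `prev` is `None` on the very first probe:

```rust
// Mirrors the transition check in refresh_health: a missing entry that
// probes Stale still counts as a transition so the first outage gets a
// warn, while the seeded None -> Online case stays silent.
fn transitioned(prev: Option<bool>, new_online: bool) -> bool {
    match (prev, new_online) {
        (None, false) => true,       // first probe already Stale: warn
        (Some(true), false) => true, // Online -> Stale: warn
        (Some(false), true) => true, // Stale -> Online: info
        _ => false,                  // steady state, incl. None -> Online
    }
}
```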
/// Snapshot of one library + its current health, for `/libraries`.
#[derive(serde::Serialize)]
pub struct LibraryStatus {
#[serde(flatten)]
pub library: Library,
pub health: LibraryHealth,
}
#[derive(serde::Serialize)]
pub struct LibrariesResponse {
pub libraries: Vec<LibraryStatus>,
}
#[get("/libraries")]
pub async fn list_libraries(_claims: Claims, app_state: Data<AppState>) -> impl Responder {
let health_guard = app_state
.library_health
.read()
.unwrap_or_else(|e| e.into_inner());
let libraries = app_state
.libraries
.iter()
.map(|lib| LibraryStatus {
library: lib.clone(),
health: health_guard
.get(&lib.id)
.cloned()
.unwrap_or(LibraryHealth::Online),
})
.collect();
HttpResponse::Ok().json(LibrariesResponse { libraries })
}
#[cfg(test)]
@@ -192,6 +390,8 @@ mod tests {
id: 1,
name: "main".into(),
root_path: "/tmp/media".into(),
enabled: true,
excluded_dirs: Vec::new(),
};
let rel = lib.strip_root(Path::new("/tmp/media/2024/photo.jpg"));
assert_eq!(rel.as_deref(), Some("2024/photo.jpg"));
@@ -205,6 +405,8 @@ mod tests {
id: 1,
name: "main".into(),
root_path: "/tmp/media".into(),
enabled: true,
excluded_dirs: Vec::new(),
};
let abs = lib.resolve("2024/photo.jpg");
assert_eq!(abs, PathBuf::from("/tmp/media/2024/photo.jpg"));
@@ -222,11 +424,15 @@ mod tests {
id: 1,
name: "main".into(),
root_path: "/tmp/main".into(),
enabled: true,
excluded_dirs: Vec::new(),
},
Library {
id: 7,
name: "archive".into(),
root_path: "/tmp/archive".into(),
enabled: true,
excluded_dirs: Vec::new(),
},
]
}
@@ -279,4 +485,138 @@ mod tests {
let err = resolve_library_param(&state, Some("missing")).unwrap_err();
assert!(err.contains("unknown library name"));
}
#[test]
fn parse_excluded_dirs_column_handles_null_and_whitespace() {
assert_eq!(parse_excluded_dirs_column(None), Vec::<String>::new());
assert_eq!(parse_excluded_dirs_column(Some("")), Vec::<String>::new());
assert_eq!(
parse_excluded_dirs_column(Some(" /a , /b/sub , @eaDir ,, ")),
vec!["/a".to_string(), "/b/sub".to_string(), "@eaDir".to_string()]
);
}
#[test]
fn effective_excluded_dirs_unions_global_and_per_library() {
let lib_no_extras = Library {
id: 1,
name: "main".into(),
root_path: "/x".into(),
enabled: true,
excluded_dirs: Vec::new(),
};
let globals = vec!["@eaDir".to_string(), ".thumbnails".to_string()];
// Empty per-library excludes → exactly the globals.
assert_eq!(lib_no_extras.effective_excluded_dirs(&globals), globals);
let lib_with_extras = Library {
id: 2,
name: "archive".into(),
root_path: "/y".into(),
enabled: true,
excluded_dirs: vec!["/photos".to_string()],
};
let combined = lib_with_extras.effective_excluded_dirs(&globals);
assert!(combined.contains(&"@eaDir".to_string()));
assert!(combined.contains(&".thumbnails".to_string()));
assert!(combined.contains(&"/photos".to_string()));
assert_eq!(combined.len(), 3);
}
fn probe_lib(id: i32, root: String) -> Library {
Library {
id,
name: "main".into(),
root_path: root,
enabled: true,
excluded_dirs: Vec::new(),
}
}
#[test]
fn probe_online_for_existing_non_empty_dir() {
let tmp = tempfile::tempdir().unwrap();
std::fs::write(tmp.path().join("photo.jpg"), b"hello").unwrap();
let lib = probe_lib(1, tmp.path().to_string_lossy().into());
// had_data doesn't matter when the dir has entries.
assert!(probe_online(&lib, true).is_online());
assert!(probe_online(&lib, false).is_online());
}
#[test]
fn probe_stale_when_root_missing() {
let lib = probe_lib(1, "/nonexistent/definitely/not/here".into());
assert!(matches!(
probe_online(&lib, false),
LibraryHealth::Stale { .. }
));
}
#[test]
fn probe_stale_when_root_is_a_file() {
let tmp = tempfile::tempdir().unwrap();
let file = tmp.path().join("not-a-dir");
std::fs::write(&file, b"x").unwrap();
let lib = probe_lib(1, file.to_string_lossy().into());
assert!(matches!(
probe_online(&lib, false),
LibraryHealth::Stale { .. }
));
}
#[test]
fn probe_empty_dir_is_online_when_no_prior_data() {
// Fresh mount: empty directory, no rows in image_exif. Accept it.
let tmp = tempfile::tempdir().unwrap();
let lib = probe_lib(1, tmp.path().to_string_lossy().into());
assert!(probe_online(&lib, false).is_online());
}
#[test]
fn probe_empty_dir_is_stale_when_prior_data_existed() {
// The "share went offline" signal: directory exists but is empty,
// and we know the library used to have content. Treat as Stale.
let tmp = tempfile::tempdir().unwrap();
let lib = probe_lib(1, tmp.path().to_string_lossy().into());
match probe_online(&lib, true) {
LibraryHealth::Stale { reason, .. } => {
assert!(reason.contains("empty"), "unexpected reason: {}", reason)
}
other => panic!("expected Stale, got {:?}", other),
}
}
#[test]
fn refresh_health_logs_only_on_transition() {
// Smoke test: refresh_health updates the map and reports correctly.
// (We can't easily assert on logs without a custom logger; the
// important thing is that the state churns properly.)
let tmp = tempfile::tempdir().unwrap();
let lib = Library {
id: 42,
name: "test".into(),
root_path: tmp.path().to_string_lossy().into(),
enabled: true,
excluded_dirs: Vec::new(),
};
let map = new_health_map(&[lib.clone()]);
// First probe: empty dir, no prior data — Online.
let s1 = refresh_health(&map, &lib, false);
assert!(s1.is_online());
// Probe again with had_data=true on the same empty dir — Stale.
let s2 = refresh_health(&map, &lib, true);
assert!(matches!(s2, LibraryHealth::Stale { .. }));
assert_eq!(
map.read().unwrap().get(&lib.id).cloned(),
Some(s2.clone()),
"map should reflect the latest probe"
);
// Recovery: drop a file and probe again.
std::fs::write(tmp.path().join("photo.jpg"), b"x").unwrap();
let s3 = refresh_health(&map, &lib, true);
assert!(s3.is_online());
}
}

src/library_maintenance.rs Normal file

@@ -0,0 +1,828 @@
//! Filesystem-backed maintenance of `image_exif`, the back-ref columns
//! on hash-keyed tables, and orphan derived data.
//!
//! These passes are the operational implementation of the library
//! handoff and orphan rules from CLAUDE.md → "Multi-library data
//! model" / "Library availability and safety":
//!
//! 1. **Missing-file detection** — when a file disappears from disk
//! but its `image_exif` row remains, the row is removed. Naturally
//! implements the move case: when a user moves a file from lib-A
//! to lib-B, the watcher's normal ingest creates the lib-B row;
//! this pass eventually retires the lib-A row.
//!
//! 2. **Back-ref refresh** — hash-keyed rows (`face_detections` and,
//! after Branch B, `tagged_photo` / `photo_insights`) carry a
//! denormalized `(library_id, rel_path)` back-ref. After a move,
//! that back-ref may point at a deleted row. The refresh pass
//! finds rows whose `(library_id, rel_path)` no longer matches
//! any `image_exif` row but whose `content_hash` does, and updates
//! the back-ref to one of the surviving paths. Idempotent.
//!
//! 3. **Orphan GC** — when a `content_hash` no longer has any
//! `image_exif` row referencing it, hash-keyed derived rows for
//! that hash become eligible for deletion. To survive transient
//! unmounts, the pass uses a **two-tick consensus rule**: a hash
//! must be observed orphaned for two consecutive ticks AND every
//! library must be online for both observations. The "marked but
//! not yet deleted" state is held in memory; restarting the
//! watcher resets it (which is fine — deletion is merely deferred
//! until two fresh consecutive observations accumulate).
//!
//! Pass 1 is filesystem-dependent and gated on the per-library
//! availability probe. Passes 2 and 3 are database-only but pass 3
//! additionally requires every library to be online for the
//! consensus window.
use std::collections::HashSet;
use std::path::Path;
use std::sync::{Arc, Mutex};
use diesel::prelude::*;
use diesel::sql_query;
use diesel::sqlite::SqliteConnection;
use log::{debug, info, warn};
use crate::database::ExifDao;
use crate::libraries::{Library, LibraryHealthMap};
/// Cap on missing-file deletions per library per tick. Prevents a
/// pathological mount that returns "not found" for everything (e.g.
/// case-sensitivity flip on a network share that the probe didn't
/// catch) from wiping the entire image_exif table in one tick. Tunable
/// via `IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK`.
pub const DEFAULT_MISSING_DELETE_CAP: usize = 200;
/// Page size for the missing-file scan. We stat() every row in this
/// batch but only delete those that are confirmed-not-found (subject
/// to the delete cap above). Tunable via
/// `IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE`.
pub const DEFAULT_SCAN_PAGE_SIZE: i64 = 500;
/// Scan a page of `image_exif` rows for `library`, stat() each one,
/// and delete rows whose source file is gone. Returns
/// `(deleted, next_offset)`. `next_offset` wraps to 0 when the page
/// returned fewer rows than the page size, so the watcher cycles
/// through the whole library across ticks.
///
/// Caller must already have confirmed the library is online — running
/// against a Stale library would interpret every row as missing.
pub fn detect_missing_files_for_library(
context: &opentelemetry::Context,
library: &Library,
exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
offset: i64,
page_size: i64,
delete_cap: usize,
) -> (usize, i64) {
let rows = {
let mut dao = exif_dao.lock().expect("exif_dao poisoned");
match dao.list_rel_paths_for_library_page(context, library.id, page_size, offset) {
Ok(r) => r,
Err(e) => {
warn!(
"missing-file scan: list page failed for library '{}' (offset={}): {:?}",
library.name, offset, e
);
return (0, offset);
}
}
};
let n_returned = rows.len();
// Wrap offset when we hit the end of the table — next tick starts
// a fresh sweep. Doing it here rather than on the next call keeps
// the offset accounting visible in one place.
let next_offset = if (n_returned as i64) < page_size {
0
} else {
offset + page_size
};
if rows.is_empty() {
return (0, next_offset);
}
let root = Path::new(&library.root_path);
let mut to_delete: Vec<String> = Vec::new();
for (_id, rel_path) in &rows {
if to_delete.len() >= delete_cap {
break;
}
let abs = root.join(rel_path);
match std::fs::metadata(&abs) {
Ok(_) => {
// File still exists — nothing to do.
}
Err(e) if e.kind() == std::io::ErrorKind::NotFound => {
to_delete.push(rel_path.clone());
}
Err(e) => {
// Permission denied / IO error / etc. — skip this row,
// leave it for the next sweep. We never want a transient
// FS hiccup to mass-delete metadata.
debug!(
"missing-file scan: stat() error for {:?}, skipping: {:?}",
abs, e
);
}
}
}
if to_delete.is_empty() {
return (0, next_offset);
}
let mut deleted = 0;
{
let mut dao = exif_dao.lock().expect("exif_dao poisoned");
for rel_path in &to_delete {
match dao.delete_exif_by_library(context, library.id, rel_path) {
Ok(()) => deleted += 1,
Err(e) => warn!(
"missing-file scan: delete failed for ({}, {}): {:?}",
library.id, rel_path, e
),
}
}
}
if deleted > 0 {
info!(
"missing-file scan: removed {} stale image_exif row(s) from library '{}'",
deleted, library.name
);
}
(deleted, next_offset)
}
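The wrap-to-zero offset accounting above is easy to isolate. A standalone sketch (`next_scan_offset` is a hypothetical helper mirroring the logic in `detect_missing_files_for_library`):

```rust
// A short page means we hit the end of the table, so the next tick
// restarts the sweep at offset 0; otherwise advance by one page.
fn next_scan_offset(offset: i64, page_size: i64, n_returned: usize) -> i64 {
    if (n_returned as i64) < page_size {
        0
    } else {
        offset + page_size
    }
}
```

Note that a full final page costs one extra empty query before wrapping, which is the simple trade the in-place accounting makes.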
/// Refresh the `(library_id, rel_path)` back-refs on hash-keyed
/// tables. A back-ref is stale when:
/// - its `content_hash` is non-null,
/// - that hash is referenced by at least one `image_exif` row, but
/// - the row's own `(library_id, rel_path)` does not appear in
/// `image_exif`.
///
/// In that case, point the back-ref at any surviving image_exif row
/// for the same hash. `face_detections` and `photo_insights` carry
/// full `(library_id, rel_path)` back-refs; `tagged_photo` historically
/// carries only `rel_path` — we still keep it in sync here for
/// consistency, picking any surviving rel_path.
///
/// All-SQL, idempotent. Returns the number of rows updated.
pub fn refresh_back_refs(conn: &mut SqliteConnection) -> usize {
let mut total = 0usize;
// face_detections — back-ref is (library_id, rel_path). Repoint to
// any surviving image_exif row carrying the same content_hash.
let updated = sql_query(
"UPDATE face_detections \
SET library_id = ( \
SELECT ie.library_id FROM image_exif ie \
WHERE ie.content_hash = face_detections.content_hash \
ORDER BY ie.id LIMIT 1 \
), \
rel_path = ( \
SELECT ie.rel_path FROM image_exif ie \
WHERE ie.content_hash = face_detections.content_hash \
ORDER BY ie.id LIMIT 1 \
) \
WHERE EXISTS ( \
SELECT 1 FROM image_exif ie \
WHERE ie.content_hash = face_detections.content_hash \
) \
AND NOT EXISTS ( \
SELECT 1 FROM image_exif ie \
WHERE ie.library_id = face_detections.library_id \
AND ie.rel_path = face_detections.rel_path \
)",
)
.execute(conn)
.unwrap_or_else(|e| {
warn!("back-ref refresh: face_detections update failed: {:?}", e);
0
});
total += updated;
// tagged_photo — only rel_path. Update to any surviving rel_path
// for the same content_hash so the path-only DAO read still finds
// tags after a move.
let updated = sql_query(
"UPDATE tagged_photo \
SET rel_path = ( \
SELECT ie.rel_path FROM image_exif ie \
WHERE ie.content_hash = tagged_photo.content_hash \
ORDER BY ie.id LIMIT 1 \
) \
WHERE content_hash IS NOT NULL \
AND EXISTS ( \
SELECT 1 FROM image_exif ie \
WHERE ie.content_hash = tagged_photo.content_hash \
) \
AND NOT EXISTS ( \
SELECT 1 FROM image_exif ie \
WHERE ie.rel_path = tagged_photo.rel_path \
)",
)
.execute(conn)
.unwrap_or_else(|e| {
warn!("back-ref refresh: tagged_photo update failed: {:?}", e);
0
});
total += updated;
// photo_insights — has both library_id and rel_path. Update both
// when the (library_id, rel_path) tuple no longer matches any
// image_exif row but the hash does.
let updated = sql_query(
"UPDATE photo_insights \
SET library_id = ( \
SELECT ie.library_id FROM image_exif ie \
WHERE ie.content_hash = photo_insights.content_hash \
ORDER BY ie.id LIMIT 1 \
), \
rel_path = ( \
SELECT ie.rel_path FROM image_exif ie \
WHERE ie.content_hash = photo_insights.content_hash \
ORDER BY ie.id LIMIT 1 \
) \
WHERE content_hash IS NOT NULL \
AND EXISTS ( \
SELECT 1 FROM image_exif ie \
WHERE ie.content_hash = photo_insights.content_hash \
) \
AND NOT EXISTS ( \
SELECT 1 FROM image_exif ie \
WHERE ie.library_id = photo_insights.library_id \
AND ie.rel_path = photo_insights.rel_path \
)",
)
.execute(conn)
.unwrap_or_else(|e| {
warn!("back-ref refresh: photo_insights update failed: {:?}", e);
0
});
total += updated;
if total > 0 {
info!("back-ref refresh: updated {} hash-keyed row(s)", total);
}
total
}
/// One tick's outcome of the orphan-GC pass.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct GcStats {
/// Hashes newly observed orphaned this tick (added to the
/// pending set).
pub newly_marked: usize,
/// Hashes that were marked last tick AND are still orphaned this
/// tick AND every library is online — these are deleted.
pub deleted_face_detections: usize,
pub deleted_tagged_photo: usize,
pub deleted_photo_insights: usize,
/// Hashes dropped from the pending set because they re-appeared
/// in image_exif (e.g. user remounted a backup that was briefly
/// missing).
pub revived: usize,
}
impl GcStats {
pub fn changed(&self) -> bool {
self.newly_marked > 0
|| self.deleted_face_detections > 0
|| self.deleted_tagged_photo > 0
|| self.deleted_photo_insights > 0
|| self.revived > 0
}
pub fn total_deleted(&self) -> usize {
self.deleted_face_detections + self.deleted_tagged_photo + self.deleted_photo_insights
}
}
/// Two-tick orphan-GC state. The watcher constructs one of these once
/// at startup and passes it back into `run_orphan_gc` every tick.
#[derive(Debug, Default)]
pub struct OrphanGcState {
/// Hashes observed orphaned on the previous tick. A hash gets
/// promoted to "delete" when it survives a second consecutive
/// observation with all libraries online.
pending: HashSet<String>,
/// Whether every library was online on the previous tick. Combined
/// with the all-online check on the current tick, this gives the
/// "two consecutive ticks of full availability" guard described in
/// CLAUDE.md → "Library availability and safety".
prev_tick_all_online: bool,
}
/// Run one tick of the orphan GC. The function is responsible for the
/// full lifecycle: probing for orphans, updating `state.pending`,
/// performing deletes when consensus is reached, and returning stats
/// for the watcher to log.
///
/// Safety guard: `all_online` MUST reflect every configured library
/// being Online right now. Even if true, deletes only happen when the
/// previous tick was also all-online. A single Stale tick within the
/// window cancels any pending deletes (they stay marked but won't be
/// promoted) — they're then re-evaluated next tick.
pub fn run_orphan_gc(
conn: &mut SqliteConnection,
state: &mut OrphanGcState,
all_online: bool,
) -> GcStats {
let mut stats = GcStats::default();
// Find every distinct content_hash referenced by hash-keyed
// derived data that is NOT currently referenced by image_exif.
// These are this tick's orphan candidates. The query is cheap:
// three index scans plus a HashSet sized to the derived tables'
// row count, which is small.
let orphans: HashSet<String> = match collect_orphan_hashes(conn) {
Ok(set) => set,
Err(e) => {
warn!("orphan-gc: candidate query failed: {:?}", e);
return stats;
}
};
// Drop entries from pending that are no longer orphaned
// ("revived"). Common case: a network share that briefly went
// stale comes back, image_exif gets re-populated by ingest, and
// the hash is no longer orphaned.
let revived = state
.pending
.difference(&orphans)
.cloned()
.collect::<Vec<_>>();
if !revived.is_empty() {
for h in &revived {
state.pending.remove(h);
}
stats.revived = revived.len();
}
if !all_online {
// Any Stale library cancels both the consensus window AND
// any pending deletes. We *do* still note newly observed
// orphans below — that's harmless bookkeeping. But we never
// delete this tick.
for h in &orphans {
if state.pending.insert(h.clone()) {
stats.newly_marked += 1;
}
}
state.prev_tick_all_online = false;
if stats.changed() {
info!(
"orphan-gc: {} new orphan hash(es) marked, {} revived (deferred — at least one library Stale; pending: {})",
stats.newly_marked,
stats.revived,
state.pending.len()
);
} else {
debug!(
"orphan-gc: stale library, no changes (pending: {})",
state.pending.len()
);
}
return stats;
}
// All-online + previous-tick-also-all-online: hashes that are
// both pending AND still orphaned this tick are confirmed and
// get deleted. Hashes orphaned this tick but not pending get
// freshly marked.
let consensus_window_open = state.prev_tick_all_online;
let to_delete: Vec<String> = if consensus_window_open {
orphans
.iter()
.filter(|h| state.pending.contains(*h))
.cloned()
.collect()
} else {
Vec::new()
};
for h in &orphans {
if !state.pending.contains(h) {
state.pending.insert(h.clone());
stats.newly_marked += 1;
}
}
if !to_delete.is_empty() {
match delete_hash_keyed_rows(conn, &to_delete) {
Ok((faces, tags, insights)) => {
stats.deleted_face_detections = faces;
stats.deleted_tagged_photo = tags;
stats.deleted_photo_insights = insights;
// Drop deleted hashes from pending so we don't try to
// re-delete them next tick (they'll have already been
// removed from the orphan set).
for h in &to_delete {
state.pending.remove(h);
}
}
Err(e) => warn!("orphan-gc: delete batch failed: {:?}", e),
}
}
state.prev_tick_all_online = true;
if stats.changed() {
info!(
"orphan-gc: {} new orphan hash(es) marked, {} revived; deleted {} face_detections / {} tagged_photo / {} photo_insights row(s) (pending: {})",
stats.newly_marked,
stats.revived,
stats.deleted_face_detections,
stats.deleted_tagged_photo,
stats.deleted_photo_insights,
state.pending.len(),
);
} else {
debug!(
"orphan-gc: no changes this tick (pending: {})",
state.pending.len()
);
}
stats
}
/// Helper for the watcher: are *all enabled* libraries currently Online?
///
/// Disabled libraries are out-of-scope for the orphan-GC consensus
/// rule — they don't get probed, don't have a health entry, and a
/// system with one disabled library should still be able to GC
/// orphans for the remaining online libraries. Treating disabled as
/// "blocking" would mean flipping a library to `enabled=false` would
/// permanently halt GC, which is the opposite of the intended kill-
/// switch semantics ("turn this library off and let the rest of the
/// system run normally").
pub fn all_libraries_online(libs: &[Library], health: &LibraryHealthMap) -> bool {
let guard = health.read().unwrap_or_else(|e| e.into_inner());
libs.iter()
.filter(|lib| lib.enabled)
.all(|lib| guard.get(&lib.id).map(|h| h.is_online()).unwrap_or(false))
}
#[derive(QueryableByName, Debug)]
struct HashRow {
#[diesel(sql_type = diesel::sql_types::Text)]
content_hash: String,
}
fn collect_orphan_hashes(conn: &mut SqliteConnection) -> QueryResult<HashSet<String>> {
// Union of every distinct content_hash carried by hash-keyed
// derived tables, minus those still referenced by image_exif.
let rows = sql_query(
"SELECT DISTINCT content_hash FROM ( \
SELECT content_hash FROM face_detections WHERE content_hash IS NOT NULL \
UNION ALL \
SELECT content_hash FROM tagged_photo WHERE content_hash IS NOT NULL \
UNION ALL \
SELECT content_hash FROM photo_insights WHERE content_hash IS NOT NULL \
) AS derived \
WHERE content_hash NOT IN ( \
SELECT content_hash FROM image_exif WHERE content_hash IS NOT NULL \
)",
)
.get_results::<HashRow>(conn)?;
Ok(rows.into_iter().map(|r| r.content_hash).collect())
}
/// Delete every hash-keyed row whose `content_hash` is in `hashes`.
/// Returns `(faces, tagged_photo, photo_insights)`.
fn delete_hash_keyed_rows(
conn: &mut SqliteConnection,
hashes: &[String],
) -> QueryResult<(usize, usize, usize)> {
if hashes.is_empty() {
return Ok((0, 0, 0));
}
use crate::database::schema::{face_detections, photo_insights, tagged_photo};
let faces =
diesel::delete(face_detections::table.filter(face_detections::content_hash.eq_any(hashes)))
.execute(conn)?;
let tags =
diesel::delete(tagged_photo::table.filter(tagged_photo::content_hash.eq_any(hashes)))
.execute(conn)?;
let insights =
diesel::delete(photo_insights::table.filter(photo_insights::content_hash.eq_any(hashes)))
.execute(conn)?;
Ok((faces, tags, insights))
}
#[cfg(test)]
mod tests {
use super::*;
use crate::database::test::in_memory_db_connection;
fn ensure_library(conn: &mut SqliteConnection, library_id: i32) {
diesel::sql_query(
"INSERT OR IGNORE INTO libraries (id, name, root_path, created_at) \
VALUES (?, 'test-' || ?, '/tmp/test-' || ?, 0)",
)
.bind::<diesel::sql_types::Integer, _>(library_id)
.bind::<diesel::sql_types::Integer, _>(library_id)
.bind::<diesel::sql_types::Integer, _>(library_id)
.execute(conn)
.unwrap();
}
fn insert_image_exif(
conn: &mut SqliteConnection,
library_id: i32,
rel_path: &str,
content_hash: Option<&str>,
) {
ensure_library(conn, library_id);
diesel::sql_query(
"INSERT INTO image_exif (library_id, rel_path, created_time, last_modified, content_hash) \
VALUES (?, ?, 0, 0, ?)",
)
.bind::<diesel::sql_types::Integer, _>(library_id)
.bind::<diesel::sql_types::Text, _>(rel_path)
.bind::<diesel::sql_types::Nullable<diesel::sql_types::Text>, _>(content_hash)
.execute(conn)
.unwrap();
}
fn insert_face(conn: &mut SqliteConnection, library_id: i32, rel_path: &str, hash: &str) {
ensure_library(conn, library_id);
diesel::sql_query(
"INSERT INTO face_detections (library_id, content_hash, rel_path, source, status, model_version, created_at) \
VALUES (?, ?, ?, 'auto', 'no_faces', 'v', 0)",
)
.bind::<diesel::sql_types::Integer, _>(library_id)
.bind::<diesel::sql_types::Text, _>(hash)
.bind::<diesel::sql_types::Text, _>(rel_path)
.execute(conn)
.unwrap();
}
fn insert_tag_with_hash(conn: &mut SqliteConnection, rel_path: &str, hash: &str) {
diesel::sql_query("INSERT OR IGNORE INTO tags (id, name, created_time) VALUES (1, 't', 0)")
.execute(conn)
.unwrap();
diesel::sql_query(
"INSERT INTO tagged_photo (rel_path, tag_id, created_time, content_hash) VALUES (?, 1, 0, ?)",
)
.bind::<diesel::sql_types::Text, _>(rel_path)
.bind::<diesel::sql_types::Text, _>(hash)
.execute(conn)
.unwrap();
}
fn insert_insight_with_hash(
conn: &mut SqliteConnection,
library_id: i32,
rel_path: &str,
hash: &str,
) {
ensure_library(conn, library_id);
diesel::sql_query(
"INSERT INTO photo_insights (library_id, rel_path, title, summary, generated_at, model_version, is_current, backend, content_hash) \
VALUES (?, ?, 't', 's', 0, 'v', 1, 'local', ?)",
)
.bind::<diesel::sql_types::Integer, _>(library_id)
.bind::<diesel::sql_types::Text, _>(rel_path)
.bind::<diesel::sql_types::Text, _>(hash)
.execute(conn)
.unwrap();
}
#[derive(QueryableByName, Debug)]
struct CountRow {
#[diesel(sql_type = diesel::sql_types::BigInt)]
n: i64,
}
fn count(conn: &mut SqliteConnection, sql: &str) -> i64 {
diesel::sql_query(sql)
.get_result::<CountRow>(conn)
.unwrap()
.n
}
#[test]
fn refresh_back_refs_repoints_face_detection_after_move() {
let mut conn = in_memory_db_connection();
// Original location lib 1, rel "old.jpg". image_exif row gone
// (file moved); only the new lib 2 row remains.
insert_image_exif(&mut conn, 2, "new.jpg", Some("h1"));
insert_face(&mut conn, 1, "old.jpg", "h1");
let updated = refresh_back_refs(&mut conn);
assert_eq!(updated, 1);
let row = diesel::sql_query("SELECT library_id AS n FROM face_detections")
.get_result::<CountRow>(&mut conn)
.unwrap();
assert_eq!(row.n, 2, "library_id should now point at lib 2");
}
#[test]
fn refresh_back_refs_no_change_when_back_ref_still_valid() {
let mut conn = in_memory_db_connection();
insert_image_exif(&mut conn, 1, "a.jpg", Some("h1"));
insert_face(&mut conn, 1, "a.jpg", "h1");
let updated = refresh_back_refs(&mut conn);
assert_eq!(updated, 0);
}
#[test]
fn refresh_back_refs_no_change_when_hash_fully_orphaned() {
// Hash exists on face_detections but no surviving image_exif
// row for it → the refresh is a no-op (orphan GC handles
// these). Important: the SET subquery would return NULL and
// we'd null out the back-ref otherwise; the EXISTS guard
// protects against that.
let mut conn = in_memory_db_connection();
insert_face(&mut conn, 1, "gone.jpg", "h1");
let updated = refresh_back_refs(&mut conn);
assert_eq!(updated, 0);
}
#[test]
fn orphan_gc_requires_two_consecutive_all_online_ticks() {
let mut conn = in_memory_db_connection();
// Hash present in face_detections but NOT image_exif → orphan.
insert_face(&mut conn, 1, "x.jpg", "h-orphan");
let mut state = OrphanGcState::default();
// Tick 1: prev_tick_all_online is false (default), so even
// with current tick all-online we mark only.
let stats = run_orphan_gc(&mut conn, &mut state, true);
assert_eq!(stats.newly_marked, 1);
assert_eq!(stats.total_deleted(), 0);
assert_eq!(state.pending.len(), 1);
// Tick 2: prev_tick_all_online is now true, current tick still
// all-online → consensus reached, hash gets deleted.
let stats = run_orphan_gc(&mut conn, &mut state, true);
assert_eq!(stats.deleted_face_detections, 1);
assert!(state.pending.is_empty());
// Tick 3: nothing left.
let stats = run_orphan_gc(&mut conn, &mut state, true);
assert_eq!(stats.total_deleted(), 0);
assert_eq!(stats.newly_marked, 0);
}
#[test]
fn orphan_gc_resets_consensus_on_stale_library() {
let mut conn = in_memory_db_connection();
insert_face(&mut conn, 1, "x.jpg", "h-orphan");
let mut state = OrphanGcState::default();
// Tick 1: all-online, mark.
run_orphan_gc(&mut conn, &mut state, true);
// Tick 2: stale library — consensus window resets, no delete.
let stats = run_orphan_gc(&mut conn, &mut state, false);
assert_eq!(stats.total_deleted(), 0);
assert!(!state.prev_tick_all_online);
// Tick 3: all-online again — but we need ANOTHER tick to set
// prev_tick_all_online before deletes can fire. So tick 3
// marks (no-op on existing pending), tick 4 deletes.
let stats = run_orphan_gc(&mut conn, &mut state, true);
assert_eq!(stats.total_deleted(), 0);
let stats = run_orphan_gc(&mut conn, &mut state, true);
assert_eq!(stats.deleted_face_detections, 1);
}
#[test]
fn orphan_gc_revives_when_image_exif_reappears() {
let mut conn = in_memory_db_connection();
insert_face(&mut conn, 1, "x.jpg", "h-orphan");
let mut state = OrphanGcState::default();
// Tick 1: mark.
run_orphan_gc(&mut conn, &mut state, true);
assert!(state.pending.contains("h-orphan"));
// Between ticks, the image_exif row reappears (e.g. backup
// share was briefly stale). Hash is no longer orphaned.
insert_image_exif(&mut conn, 2, "x.jpg", Some("h-orphan"));
let stats = run_orphan_gc(&mut conn, &mut state, true);
assert_eq!(stats.revived, 1);
assert_eq!(stats.total_deleted(), 0);
assert!(state.pending.is_empty());
}
#[test]
fn orphan_gc_deletes_across_all_three_tables() {
let mut conn = in_memory_db_connection();
// Same orphan hash appears in all three derived tables.
insert_face(&mut conn, 1, "a.jpg", "h-orphan");
insert_tag_with_hash(&mut conn, "a.jpg", "h-orphan");
insert_insight_with_hash(&mut conn, 1, "a.jpg", "h-orphan");
let mut state = OrphanGcState::default();
run_orphan_gc(&mut conn, &mut state, true);
let stats = run_orphan_gc(&mut conn, &mut state, true);
assert_eq!(stats.deleted_face_detections, 1);
assert_eq!(stats.deleted_tagged_photo, 1);
assert_eq!(stats.deleted_photo_insights, 1);
assert_eq!(
count(&mut conn, "SELECT COUNT(*) AS n FROM face_detections"),
0
);
assert_eq!(
count(&mut conn, "SELECT COUNT(*) AS n FROM tagged_photo"),
0
);
assert_eq!(
count(&mut conn, "SELECT COUNT(*) AS n FROM photo_insights"),
0
);
}
#[test]
fn all_libraries_online_helper() {
use crate::libraries::{LibraryHealth, new_health_map};
let libs = vec![
Library {
id: 1,
name: "a".into(),
root_path: "/x".into(),
enabled: true,
excluded_dirs: Vec::new(),
},
Library {
id: 2,
name: "b".into(),
root_path: "/y".into(),
enabled: true,
excluded_dirs: Vec::new(),
},
];
let health = new_health_map(&libs);
assert!(all_libraries_online(&libs, &health));
// Flip lib 2 to stale.
{
let mut g = health.write().unwrap();
g.insert(
2,
LibraryHealth::Stale {
reason: "test".into(),
since: 0,
},
);
}
assert!(!all_libraries_online(&libs, &health));
}
#[test]
fn all_libraries_online_treats_disabled_as_out_of_scope() {
use crate::libraries::{LibraryHealth, new_health_map};
// lib 1 enabled+online, lib 2 disabled (would be treated as
// Online in the health map's optimistic seed but the map
// entry is irrelevant — disabled libs are filtered out
// before the health lookup).
let libs = vec![
Library {
id: 1,
name: "a".into(),
root_path: "/x".into(),
enabled: true,
excluded_dirs: Vec::new(),
},
Library {
id: 2,
name: "b".into(),
root_path: "/y".into(),
enabled: false,
excluded_dirs: Vec::new(),
},
];
let health = new_health_map(&libs);
// Sanity: forcibly mark lib 2 stale to prove disabled wins
// over even an explicit Stale entry — the filter skips it
// before the health check happens.
{
let mut g = health.write().unwrap();
g.insert(
2,
LibraryHealth::Stale {
reason: "intentionally stale".into(),
since: 0,
},
);
}
assert!(
all_libraries_online(&libs, &health),
"disabled library should not block consensus"
);
}
}

View File

@@ -64,6 +64,7 @@ mod auth;
mod content_hash;
mod data;
mod database;
mod duplicates;
mod error;
mod exif;
mod face_watch;
@@ -72,6 +73,8 @@ mod file_types;
mod files;
mod geo;
mod libraries;
mod library_maintenance;
mod perceptual_hash;
mod state;
mod tags;
mod utils;
@@ -150,7 +153,12 @@ async fn get_image(
let relative_path_str = relative_path.to_string_lossy().replace('\\', "/");
let thumbs = &app_state.thumbnail_path;
let legacy_thumb_path = Path::new(&thumbs).join(relative_path);
let bare_legacy_thumb_path = Path::new(&thumbs).join(relative_path);
let scoped_legacy_thumb_path = content_hash::library_scoped_legacy_path(
Path::new(&thumbs),
library.id,
relative_path,
);
// Gif thumbnails are a separate lookup (video GIF previews).
// Dual-lookup for gif is out of scope; preserve existing flow.
@@ -168,8 +176,16 @@ async fn get_image(
}
}
// Resolve the hash-keyed thumbnail (if the row already has a
// content_hash) and fall back to the legacy mirrored path.
// Lookup chain (most-specific first, falling back as we miss):
// 1. hash-keyed (`<thumbs>/<hash[..2]>/<hash>.jpg`) — content
// identity, shared across libraries;
// 2. library-scoped legacy (`<thumbs>/<lib_id>/<rel_path>`) —
// written by current generation when hash isn't known;
// 3. bare legacy (`<thumbs>/<rel_path>`) — pre-multi-library
// thumbs from the days before library prefixing existed.
// Stage (3) goes away once a one-time migration lifts every
// bare-legacy file under a library prefix; until then it
// prevents needless 404s for already-warmed deployments.
let hash_thumb_path: Option<PathBuf> = {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
match dao.get_exif(&context, &relative_path_str) {
@@ -184,7 +200,14 @@ async fn get_image(
.as_ref()
.filter(|p| p.exists())
.cloned()
.unwrap_or_else(|| legacy_thumb_path.clone());
.or_else(|| {
if scoped_legacy_thumb_path.exists() {
Some(scoped_legacy_thumb_path.clone())
} else {
None
}
})
.unwrap_or_else(|| bare_legacy_thumb_path.clone());
// Handle circular thumbnail request
if req.shape == Some(ThumbnailShape::Circle) {
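The three-stage lookup chain described in the comment above can be sketched as a pure fallback over candidate paths. A minimal sketch — the names here are illustrative, not the handler's actual bindings:

```rust
use std::path::PathBuf;

// Resolve a thumbnail path: hash-keyed if present on disk, else the
// library-scoped legacy path if present, else the bare legacy path
// (used unconditionally as the final fallback, existing or not).
fn resolve_thumb(
    hash_keyed: Option<PathBuf>,
    scoped_legacy: PathBuf,
    bare_legacy: PathBuf,
) -> PathBuf {
    hash_keyed
        .filter(|p| p.exists())
        .or_else(|| {
            if scoped_legacy.exists() {
                Some(scoped_legacy.clone())
            } else {
                None
            }
        })
        .unwrap_or(bare_legacy)
}
```

With no candidate on disk, the chain bottoms out at the bare-legacy path, matching the `unwrap_or_else` in the diff.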
@@ -509,6 +532,11 @@ async fn set_image_gps(
.ok()
.map(|c| c.content_hash),
size_bytes: content_hash::compute(&full_path).ok().map(|c| c.size_bytes),
// The GPS-update path doesn't recompute perceptual hashes for an
// existing row — update_exif ignores these columns. Compute them
// best-effort anyway so a newly-inserted row lands with a usable
// signal; failure just leaves prior values in place.
phash_64: perceptual_hash::compute(&full_path).map(|h| h.phash_64),
dhash_64: perceptual_hash::compute(&full_path).map(|h| h.dhash_64),
};
let updated = {
@@ -631,6 +659,37 @@ async fn upload_image(
&full_path.to_str().unwrap().to_string(),
true,
) {
// Pre-write content-hash check: if these exact bytes already
// exist anywhere in any library (and aren't themselves
// soft-marked as duplicates), don't write the file. Return
// 409 with the canonical sibling so the mobile app can show
// a friendly "already in your library" toast.
let upload_hash = blake3::Hasher::new()
.update(&file_content)
.finalize()
.to_hex()
.to_string();
{
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
if let Ok(Some(existing)) = dao.find_by_content_hash(&span_context, &upload_hash)
&& existing.duplicate_of_hash.is_none()
{
let library_name = libraries::load_all(&mut crate::database::connect())
.into_iter()
.find(|l| l.id == existing.library_id)
.map(|l| l.name);
span.set_status(Status::Ok);
return HttpResponse::Conflict().json(serde_json::json!({
"duplicate_of": {
"library_id": existing.library_id,
"rel_path": existing.file_path,
},
"content_hash": upload_hash,
"library_name": library_name,
}));
}
}
let context =
opentelemetry::Context::new().with_remote_span_context(span.span_context().clone());
tracer
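The pre-write dedup decision above reduces to "hash the bytes, look up a surviving row, conflict if found". A dependency-free sketch — `DefaultHasher` stands in for blake3 purely to avoid the external crate, and the index shape is assumed (the real check goes through `find_by_content_hash` on the DAO):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

fn content_key(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

enum UploadDecision {
    Write,
    Conflict { existing_rel_path: String },
}

// Index value: (rel_path, is_demoted). Only a surviving row — one not
// itself soft-marked as a duplicate — blocks the upload with a conflict.
fn check_upload(index: &HashMap<u64, (String, bool)>, bytes: &[u8]) -> UploadDecision {
    match index.get(&content_key(bytes)) {
        Some((rel_path, is_demoted)) if !*is_demoted => UploadDecision::Conflict {
            existing_rel_path: rel_path.clone(),
        },
        _ => UploadDecision::Write,
    }
}
```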
@@ -689,6 +748,7 @@ async fn upload_image(
(None, None)
}
};
let perceptual = perceptual_hash::compute(&uploaded_path);
let insert_exif = InsertImageExif {
library_id: target_library.id,
file_path: relative_path.clone(),
@@ -710,6 +770,8 @@ async fn upload_image(
last_modified: timestamp,
content_hash,
size_bytes,
phash_64: perceptual.map(|h| h.phash_64),
dhash_64: perceptual.map(|h| h.dhash_64),
};
if let Ok(mut dao) = exif_dao.lock() {
@@ -761,6 +823,15 @@ async fn generate_video(
if let Some(name) = filename.file_name() {
let filename = name.to_str().expect("Filename should convert to string");
// KNOWN ISSUE (multi-library): playlist filename is the basename
// alone, so two source files with the same basename — whether in
// different libraries or different subdirs of one library —
// overwrite each other's playlists while ffmpeg runs. The
// hash-keyed `content_hash::hls_dir` is the long-term answer
// (see CLAUDE.md "Multi-library data model"); rewiring the
// actor pipeline to use it is out of scope for this branch.
// The orphan-cleanup job above already walks every library so
// it doesn't false-delete archive playlists.
let playlist = format!("{}/{}.m3u8", app_state.video_path, filename);
let library = libraries::resolve_library_param(&app_state, body.library.as_deref())
@@ -1305,19 +1376,41 @@ fn create_thumbnails(libs: &[libraries::Library], excluded_dirs: &[String]) {
lib.name, lib.root_path
);
let images = PathBuf::from(&lib.root_path);
// Effective excludes = global env-var excludes plus the library
// row's excluded_dirs. Lets a parent-library mount skip the
// subtree already covered by a child library.
let effective_excludes = lib.effective_excluded_dirs(excluded_dirs);
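`effective_excluded_dirs` itself isn't shown in this diff; per the comment it is presumably the union of the global excludes and the library row's own. A sketch under that assumption (order-preserving, deduped):

```rust
// Hypothetical shape of the helper called above: global env-var
// excludes first, then any per-library excludes not already present.
fn effective_excluded_dirs(global: &[String], per_library: &[String]) -> Vec<String> {
    let mut out: Vec<String> = global.to_vec();
    for d in per_library {
        if !out.contains(d) {
            out.push(d.clone());
        }
    }
    out
}
```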
// Prune EXCLUDED_DIRS so we don't generate thumbnails-of-thumbnails
// for Synology @eaDir trees. file_scan handles filter_entry pruning.
image_api::file_scan::walk_library_files(&images, excluded_dirs)
image_api::file_scan::walk_library_files(&images, &effective_excludes)
.into_par_iter()
.for_each(|entry| {
let src = entry.path();
let Ok(relative_path) = src.strip_prefix(&images) else {
return;
};
let thumb_path = Path::new(thumbnail_directory).join(relative_path);
// Library-scoped legacy path: prevents two libraries with
// the same rel_path from clobbering each other's thumbs.
// Hash-keyed promotion happens lazily on first hash-aware
// request — keeping this loop ExifDao-free preserves the
// current "cargo build && go" startup story.
let thumb_path = content_hash::library_scoped_legacy_path(
thumbnail_directory,
lib.id,
relative_path,
);
let bare_legacy = thumbnail_directory.join(relative_path);
if thumb_path.exists() || unsupported_thumbnail_sentinel(&thumb_path).exists() {
// Backwards-compat check: if a single-library install has a
// bare-legacy thumb here already, accept it as present.
// Same for the sentinel. Means we don't redo work after
// upgrade and we don't leave stale duplicates around.
if thumb_path.exists()
|| bare_legacy.exists()
|| unsupported_thumbnail_sentinel(&thumb_path).exists()
|| unsupported_thumbnail_sentinel(&bare_legacy).exists()
{
return;
}
@@ -1365,7 +1458,8 @@ fn create_thumbnails(libs: &[libraries::Library], excluded_dirs: &[String]) {
debug!("Finished making thumbnails");
for lib in libs {
update_media_counts(Path::new(&lib.root_path), excluded_dirs);
let effective_excludes = lib.effective_excluded_dirs(excluded_dirs);
update_media_counts(Path::new(&lib.root_path), &effective_excludes);
}
}
@@ -1462,10 +1556,18 @@ fn main() -> std::io::Result<()> {
preview_gen_for_watcher,
app_state.face_client.clone(),
app_state.excluded_dirs.clone(),
app_state.library_health.clone(),
);
// Start orphaned playlist cleanup job
cleanup_orphaned_playlists(app_state.excluded_dirs.clone());
// Start orphaned playlist cleanup job. Multi-library aware: walks
// every configured library when looking for the source video, and
// skips the whole cycle while any library is stale (a missing
// source is indistinguishable from a transiently-unmounted share).
cleanup_orphaned_playlists(
app_state.libraries.clone(),
app_state.excluded_dirs.clone(),
app_state.library_health.clone(),
);
// Spawn background job to generate daily conversation summaries
{
@@ -1600,6 +1702,7 @@ fn main() -> std::io::Result<()> {
.add_feature(add_tag_services::<_, SqliteTagDao>)
.add_feature(knowledge::add_knowledge_services::<_, SqliteKnowledgeDao>)
.add_feature(faces::add_face_services::<_, faces::SqliteFaceDao>)
.add_feature(duplicates::add_duplicate_services)
.app_data(app_data.clone())
.app_data::<Data<RealFileSystem>>(Data::new(RealFileSystem::new(
app_data.base_path.clone(),
@@ -1657,10 +1760,13 @@ fn run_migrations(
}
/// Clean up orphaned HLS playlists and segments whose source videos no longer exist
fn cleanup_orphaned_playlists(excluded_dirs: Vec<String>) {
fn cleanup_orphaned_playlists(
libs: Vec<libraries::Library>,
excluded_dirs: Vec<String>,
library_health: libraries::LibraryHealthMap,
) {
std::thread::spawn(move || {
let video_path = dotenv::var("VIDEO_PATH").expect("VIDEO_PATH must be set");
let base_path = dotenv::var("BASE_PATH").expect("BASE_PATH must be set");
// Get cleanup interval from environment (default: 24 hours)
let cleanup_interval_secs = dotenv::var("PLAYLIST_CLEANUP_INTERVAL_SECONDS")
@@ -1671,10 +1777,39 @@ fn cleanup_orphaned_playlists(excluded_dirs: Vec<String>) {
info!("Starting orphaned playlist cleanup job");
info!(" Cleanup interval: {} seconds", cleanup_interval_secs);
info!(" Playlist directory: {}", video_path);
for lib in &libs {
info!(
" Checking sources under '{}' at {}",
lib.name, lib.root_path
);
}
loop {
std::thread::sleep(Duration::from_secs(cleanup_interval_secs));
// Safety gate: skip the cleanup cycle if any library is
// stale. A missing source video on a stale library is
// indistinguishable from a transient unmount, and the
// cleanup is destructive — we'd rather leak a few playlist
// files for a tick than delete one whose source is briefly
// unreachable. The cycle re-runs on the next interval.
{
let guard = library_health.read().unwrap_or_else(|e| e.into_inner());
let stale: Vec<String> = libs
.iter()
.filter(|lib| guard.get(&lib.id).map(|h| !h.is_online()).unwrap_or(false))
.map(|lib| lib.name.clone())
.collect();
if !stale.is_empty() {
warn!(
"Skipping orphaned-playlist cleanup: {} library(ies) stale: [{}]",
stale.len(),
stale.join(", ")
);
continue;
}
}
info!("Running orphaned playlist cleanup");
let start = std::time::Instant::now();
let mut deleted_count = 0;
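The safety gate above reduces to a filter over the health snapshot: collect the names of not-online libraries and skip the cycle if any exist. A minimal sketch with assumed shapes (the real code reads a `LibraryHealthMap` guard rather than a closure):

```rust
// Returns the names of libraries that are not online this tick; a
// non-empty result means the destructive cleanup cycle is skipped.
fn stale_libraries(libs: &[(i32, String)], online: impl Fn(i32) -> bool) -> Vec<String> {
    libs.iter()
        .filter(|(id, _)| !online(*id))
        .map(|(_, name)| name.clone())
        .collect()
}
```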
@@ -1703,20 +1838,26 @@ fn cleanup_orphaned_playlists(excluded_dirs: Vec<String>) {
if let Some(filename) = playlist_path.file_stem() {
let video_filename = filename.to_string_lossy();
// Search for this video file in BASE_PATH, respecting
// EXCLUDED_DIRS so we don't false-resurrect playlists for
// videos that only exist inside an excluded subtree.
// Search for this video file across every configured
// library, respecting EXCLUDED_DIRS so we don't
// false-resurrect playlists for videos that only
// exist inside an excluded subtree. As soon as one
// library has a matching source, we're done — the
// playlist isn't orphaned.
let mut video_exists = false;
for entry in image_api::file_scan::walk_library_files(
Path::new(&base_path),
&excluded_dirs,
) {
if let Some(entry_stem) = entry.path().file_stem()
&& entry_stem == filename
&& is_video_file(entry.path())
{
video_exists = true;
break;
'libs: for lib in &libs {
let effective = lib.effective_excluded_dirs(&excluded_dirs);
for entry in image_api::file_scan::walk_library_files(
Path::new(&lib.root_path),
&effective,
) {
if let Some(entry_stem) = entry.path().file_stem()
&& entry_stem == filename
&& is_video_file(entry.path())
{
video_exists = true;
break 'libs;
}
}
}
@@ -1792,6 +1933,7 @@ fn watch_files(
preview_generator: Addr<video::actors::PreviewClipGenerator>,
face_client: crate::ai::face_client::FaceClient,
excluded_dirs: Vec<String>,
library_health: libraries::LibraryHealthMap,
) {
std::thread::spawn(move || {
// Get polling intervals from environment variables
@@ -1850,6 +1992,52 @@ fn watch_files(
let mut last_full_scan = SystemTime::now();
let mut scan_count = 0u64;
// Per-library cursor for the missing-file scan. Each tick reads
// a page from `offset`, stat()s the rows, deletes confirmed-
// missing ones, and advances or wraps the cursor. State held
// in-memory so a watcher restart resumes from 0 — fine, the
// sweep is idempotent.
let mut missing_file_offsets: std::collections::HashMap<i32, i64> =
std::collections::HashMap::new();
let missing_scan_page_size: i64 = dotenv::var("IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE")
.ok()
.and_then(|s| s.parse().ok())
.filter(|n: &i64| *n > 0)
.unwrap_or(library_maintenance::DEFAULT_SCAN_PAGE_SIZE);
let missing_delete_cap: usize = dotenv::var("IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK")
.ok()
.and_then(|s| s.parse().ok())
.filter(|n: &usize| *n > 0)
.unwrap_or(library_maintenance::DEFAULT_MISSING_DELETE_CAP);
// Two-tick orphan-GC consensus state. Carried across ticks via
// `OrphanGcState`; see library_maintenance::run_orphan_gc.
let mut orphan_gc_state = library_maintenance::OrphanGcState::default();
// Initial availability sweep before the loop's first sleep so
// /libraries reports the truth from the very first request,
// rather than the optimistic Online default that
// new_health_map seeds. Without this, an unmounted share would
// appear online for up to WATCH_QUICK_INTERVAL_SECONDS (default
// 60s) after boot. Same probe logic as the per-tick gate
// below; no ingest runs here, just the health update + log.
// Disabled libraries skip the probe entirely — they should
// never enter the health map (treated as out-of-scope).
for lib in &libs {
if !lib.enabled {
continue;
}
let context = opentelemetry::Context::new();
let had_data = exif_dao
.lock()
.expect("exif_dao poisoned")
.count_for_library(&context, lib.id)
.map(|n| n > 0)
.unwrap_or(false);
libraries::refresh_health(&library_health, lib, had_data);
}
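The probe semantics described in the comments — reachable, a directory, readable, and non-empty when the DB already has rows for the library — can be sketched as below. This is a hypothetical stand-in for `libraries::refresh_health`'s check, not its actual implementation:

```rust
use std::path::Path;

// `read_dir` failing covers missing, not-a-directory, and unreadable
// in one shot. A library the DB knows has rows must also be non-empty
// on disk; an empty-but-had-data mount point usually means the share
// isn't actually mounted.
fn probe_online(root: &Path, had_data: bool) -> bool {
    let Ok(mut entries) = std::fs::read_dir(root) else {
        return false;
    };
    !had_data || entries.next().is_some()
}
```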
loop {
std::thread::sleep(Duration::from_secs(quick_interval_secs));
@@ -1861,6 +2049,44 @@ fn watch_files(
let is_full_scan = since_last_full.as_secs() >= full_interval_secs;
for lib in &libs {
// Operator kill switch: a disabled library is invisible
// to the watcher entirely. No probe, no ingest, no
// maintenance, no health entry. Distinct from Stale —
// Stale is "we wanted to but couldn't"; Disabled is
// "we don't want to". Toggle via SQL.
if !lib.enabled {
debug!(
"watcher: skipping library '{}' (id={}) — enabled=false",
lib.name, lib.id
);
continue;
}
// Availability probe: every tick checks that the
// library's mount is reachable, is a directory, is
// readable, and (if image_exif has rows for it) is
// non-empty. A Stale library skips ingest, backlog
// drains, and metric refresh — reads/serving in HTTP
// handlers continue to work. Branches B/C extend the
// probe gate to cover handoff and orphan GC. See
// CLAUDE.md "Library availability and safety".
let had_data = {
let context = opentelemetry::Context::new();
let mut guard = exif_dao.lock().expect("exif_dao poisoned");
guard
.count_for_library(&context, lib.id)
.map(|n| n > 0)
.unwrap_or(false)
};
let health = libraries::refresh_health(&library_health, lib, had_data);
if !health.is_online() {
// Skip every write path for this library this tick.
// Don't refresh the media-count gauge either — a
// probe-failed library would otherwise flap to 0
// image / 0 video and pollute Prometheus.
continue;
}
// Drain the unhashed-hash backlog AND the face-detection
// backlog every tick, regardless of quick/full. Quick
// scans only walk recently-modified files, so the
@@ -1868,6 +2094,11 @@ fn watch_files(
// — without these standalone passes, backfill +
// detection only progressed during full scans
// (default once an hour).
// Effective excludes for this library: global env-var excludes
// plus the library row's excluded_dirs. Compute once per tick —
// used by every walker below for this library.
let effective_excludes = lib.effective_excluded_dirs(&excluded_dirs);
if face_client.is_enabled() {
let context = opentelemetry::Context::new();
backfill_unhashed_backlog(&context, lib, &exif_dao);
@@ -1877,7 +2108,7 @@ fn watch_files(
&face_client,
&face_dao,
&watcher_tag_dao,
&excluded_dirs,
&effective_excludes,
);
}
@@ -1893,7 +2124,7 @@ fn watch_files(
Arc::clone(&face_dao),
Arc::clone(&watcher_tag_dao),
face_client.clone(),
&excluded_dirs,
&effective_excludes,
None,
playlist_manager.clone(),
preview_generator.clone(),
@@ -1914,7 +2145,7 @@ fn watch_files(
Arc::clone(&face_dao),
Arc::clone(&watcher_tag_dao),
face_client.clone(),
&excluded_dirs,
&effective_excludes,
Some(check_since),
playlist_manager.clone(),
preview_generator.clone(),
@@ -1922,7 +2153,66 @@ fn watch_files(
}
// Update media counts per library (metric aggregates across all)
update_media_counts(Path::new(&lib.root_path), &excluded_dirs);
update_media_counts(Path::new(&lib.root_path), &effective_excludes);
// Missing-file detection: prune image_exif rows whose
// source file is no longer on disk. Per-library, so we
// pass library-online-this-tick implicitly (we only
// reach here if the probe gate at the top of the
// iteration passed). Capped + paginated so a huge
// library doesn't stall the watcher; rows we don't
// visit this tick get visited next tick. See
// library_maintenance::detect_missing_files_for_library.
{
let context = opentelemetry::Context::new();
let offset = missing_file_offsets.get(&lib.id).copied().unwrap_or(0);
let (deleted, next_offset) =
library_maintenance::detect_missing_files_for_library(
&context,
lib,
&exif_dao,
offset,
missing_scan_page_size,
missing_delete_cap,
);
missing_file_offsets.insert(lib.id, next_offset);
if deleted > 0 {
debug!(
"missing-file scan: library '{}' next_offset={}",
lib.name, next_offset
);
}
}
}
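The cursor/page/cap mechanics of the missing-file scan can be sketched independently of the DAO. `sweep_page` is a hypothetical helper; the real `detect_missing_files_for_library` also stat()s files and issues the deletes:

```rust
// One tick of the paged sweep: examine rows [offset, offset+page),
// report up to `cap` confirmed-missing rows, and advance the cursor —
// wrapping to 0 at the end so the sweep is resumable and idempotent.
fn sweep_page(
    rows: &[&str],
    offset: usize,
    page: usize,
    cap: usize,
    exists: impl Fn(&str) -> bool,
) -> (Vec<String>, usize) {
    let end = (offset + page).min(rows.len());
    let deleted: Vec<String> = rows[offset..end]
        .iter()
        .filter(|p| !exists(p))
        .take(cap)
        .map(|p| p.to_string())
        .collect();
    let next = if end >= rows.len() { 0 } else { end };
    (deleted, next)
}
```

Rows not visited this tick are picked up on the next, exactly as the comment describes.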
// Reconciliation: cross-library, so it runs once per tick
// outside the per-library loop. Idempotent — fast no-op when
// there's nothing to do. Operates on the database alone, no
// filesystem dependency, so it doesn't need a health gate.
// See database::reconcile and CLAUDE.md "Multi-library data
// model" for the rules.
{
let mut conn = image_api::database::connect();
let _ = image_api::database::reconcile::run(&mut conn);
// Back-ref refresh: hash-keyed rows whose
// (library_id, rel_path) tuple no longer matches any
// image_exif row but whose hash still does. After a
// recent→archive move, the missing-file scan removes
// the old image_exif row; this pass repoints face /
// tag / insight back-refs at the surviving location.
// DB-only, no health gate needed — uses what's in
// image_exif as truth.
let _ = library_maintenance::refresh_back_refs(&mut conn);
// Orphan GC: the destructive end of the maintenance
// pipeline. Two-tick consensus + every-library-online
// requirement is enforced inside run_orphan_gc; we
// pass the current all-online flag and the function
// tracks the previous tick's flag in OrphanGcState.
let all_online = library_maintenance::all_libraries_online(&libs, &library_health);
let _ =
library_maintenance::run_orphan_gc(&mut conn, &mut orphan_gc_state, all_online);
}
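The two-tick consensus can be sketched in isolation, assuming the shape the comments and tests describe (the real `run_orphan_gc` also deletes across three derived tables and reports per-table stats):

```rust
use std::collections::HashSet;

#[derive(Default)]
struct GcState {
    pending: HashSet<String>,
    prev_all_online: bool,
}

// One GC tick: delete only hashes that were pending last tick AND are
// still orphaned now, and only when both consecutive ticks saw every
// library online. A hash that reappears in image_exif between ticks
// simply drops out of `pending` — nothing is deleted ("revived").
fn gc_tick(
    state: &mut GcState,
    orphans_now: &HashSet<String>,
    all_online: bool,
) -> Vec<String> {
    let deleted: Vec<String> = if all_online && state.prev_all_online {
        state.pending.intersection(orphans_now).cloned().collect()
    } else {
        Vec::new()
    };
    // Re-mark for next tick; a stale tick resets consensus entirely.
    let mut next = if all_online {
        orphans_now.clone()
    } else {
        HashSet::new()
    };
    for h in &deleted {
        next.remove(h);
    }
    state.pending = next;
    state.prev_all_online = all_online;
    deleted
}
```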
if is_full_scan {
@@ -1992,7 +2282,9 @@ fn process_new_files(
let existing_exif_paths: HashMap<String, bool> = {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
match dao.get_exif_batch(&context, &file_paths) {
// The walk is per-library, so scope the lookup to this library;
// otherwise a same-named file in another library would make this
// one look already-indexed.
match dao.get_exif_batch(&context, Some(library.id), &file_paths) {
Ok(exif_records) => exif_records
.into_iter()
.map(|record| (record.file_path, true))
@@ -2012,9 +2304,19 @@ fn process_new_files(
// derivative dedup and DB-indexed sort/filter work for every file,
// not just photos with parseable EXIF.
for (file_path, relative_path) in &files {
let thumb_path = thumbnail_directory.join(relative_path);
let needs_thumbnail =
!thumb_path.exists() && !unsupported_thumbnail_sentinel(&thumb_path).exists();
// Check both the library-scoped legacy path (current shape) and
// the bare-legacy path (pre-multi-library shape). Either one
// existing means a thumbnail is already on disk for this file.
let scoped_thumb_path = content_hash::library_scoped_legacy_path(
thumbnail_directory,
library.id,
relative_path,
);
let bare_legacy_thumb_path = thumbnail_directory.join(relative_path);
let needs_thumbnail = !scoped_thumb_path.exists()
&& !bare_legacy_thumb_path.exists()
&& !unsupported_thumbnail_sentinel(&scoped_thumb_path).exists()
&& !unsupported_thumbnail_sentinel(&bare_legacy_thumb_path).exists();
let needs_row = !existing_exif_paths.contains_key(relative_path);
if needs_thumbnail || needs_row {
@@ -2049,6 +2351,12 @@ fn process_new_files(
}
};
// Perceptual hashes (pHash + dHash). Best-effort — None for
// videos and decode failures. Drives near-duplicate detection
// in the Apollo duplicates surface; failure here is non-fatal
// and never blocks indexing.
let perceptual = perceptual_hash::compute(&file_path);
// EXIF is best-effort enrichment. When extraction fails (or the
// file type doesn't support EXIF) we still store a row with all
// EXIF fields NULL; the file remains visible to sort-by-date
@@ -2100,6 +2408,8 @@ fn process_new_files(
last_modified: timestamp,
content_hash,
size_bytes,
phash_64: perceptual.map(|h| h.phash_64),
dhash_64: perceptual.map(|h| h.dhash_64),
};
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
@@ -2131,7 +2441,7 @@ fn process_new_files(
// ensures small/medium deploys self-heal without operator
// action.
backfill_missing_content_hashes(&context, &files, library, &exif_dao);
let candidates = build_face_candidates(&context, &files, &exif_dao, &face_dao);
let candidates = build_face_candidates(&context, library, &files, &exif_dao, &face_dao);
debug!(
"face_watch: scan tick — {} image file(s) walked, {} candidate(s) (library '{}', modified_since={})",
files.iter().filter(|(p, _)| !is_video_file(p)).count(),
@@ -2449,7 +2759,7 @@ fn backfill_missing_content_hashes(
let exif_records = {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
dao.get_exif_batch(context, &image_paths)
dao.get_exif_batch(context, Some(library.id), &image_paths)
.unwrap_or_default()
};
// Cheap lookup back from rel_path → absolute file_path so
@@ -2541,6 +2851,7 @@ fn backfill_missing_content_hashes(
/// covers both new uploads and the initial backlog scan.
fn build_face_candidates(
context: &opentelemetry::Context,
library: &libraries::Library,
files: &[(PathBuf, String)],
exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
face_dao: &Arc<Mutex<Box<dyn faces::FaceDao>>>,
@@ -2558,7 +2869,7 @@ fn build_face_candidates(
let exif_records = {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
dao.get_exif_batch(context, &image_paths)
dao.get_exif_batch(context, Some(library.id), &image_paths)
.unwrap_or_default()
};
// rel_path → content_hash (only rows with a hash; without one we have

View File

@@ -569,7 +569,8 @@ pub async fn list_memories(
for lib in &libraries_to_scan {
let base = Path::new(&lib.root_path);
let path_excluder = PathExcluder::new(base, &app_state.excluded_dirs);
let effective = lib.effective_excluded_dirs(&app_state.excluded_dirs);
let path_excluder = PathExcluder::new(base, &effective);
let exif_memories = collect_exif_memories(
&exif_dao,

src/perceptual_hash.rs Normal file
View File

@@ -0,0 +1,159 @@
//! Perceptual image hashing for near-duplicate detection.
//!
//! Two 64-bit signals per image, packed into i64 for storage and fast
//! Hamming distance via XOR + popcount:
//!
//! - **pHash (DCT)** — robust to lossy recompression, format conversion,
//! moderate brightness/contrast shifts. The primary signal.
//! - **dHash (gradient)** — much cheaper to compute, robust to scaling
//! and small crops. Acts as a fallback / corroboration when pHash is
//! ambiguous (very flat images can collide).
//!
//! Image-only by design. Videos, decode failures, and any image we
//! can't open all return `None` — perceptual hash failure is non-fatal
//! and must not block the indexer; the file is still hashed by blake3
//! and exact-match dedup keeps working.
use std::path::Path;
use image_hasher::{HashAlg, HasherConfig};
/// 64-bit perceptual fingerprint pair.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PerceptualIdentity {
pub phash_64: i64,
pub dhash_64: i64,
}
/// Compute pHash + dHash for an image at `path`. Returns `None` on
/// decode failure (unsupported format, corrupt bytes, video, etc.) —
/// callers should treat that as "no perceptual signal available" and
/// proceed with exact-match dedup only.
pub fn compute(path: &Path) -> Option<PerceptualIdentity> {
let img = image::open(path).ok()?;
// 8x8 = 64 bits, the standard size for pHash/dHash. Larger sizes
// give more discriminative power but no longer fit in i64 and the
// marginal robustness isn't worth the storage / index cost for a
// personal-scale library.
let phash = HasherConfig::new()
.hash_alg(HashAlg::Mean)
.hash_size(8, 8)
.preproc_dct()
.to_hasher()
.hash_image(&img);
let dhash = HasherConfig::new()
.hash_alg(HashAlg::Gradient)
.hash_size(8, 8)
.to_hasher()
.hash_image(&img);
Some(PerceptualIdentity {
phash_64: bytes_to_i64(phash.as_bytes())?,
dhash_64: bytes_to_i64(dhash.as_bytes())?,
})
}
/// Hamming distance between two 64-bit perceptual hashes. The primary
/// query primitive: two images are "near-duplicates" when this is below
/// a threshold (default 8 for pHash, ~12% similarity tolerance). The
/// duplicates module clusters via a BK-tree which uses its own copy of
/// this calculation; this helper is kept for ad-hoc tools and tests.
#[allow(dead_code)]
#[inline]
pub fn hamming_distance(a: i64, b: i64) -> u32 {
(a ^ b).count_ones()
}
fn bytes_to_i64(bytes: &[u8]) -> Option<i64> {
if bytes.len() < 8 {
return None;
}
let mut buf = [0u8; 8];
buf.copy_from_slice(&bytes[..8]);
Some(i64::from_be_bytes(buf))
}
#[cfg(test)]
mod tests {
use super::*;
use image::{ImageBuffer, Rgb};
fn write_test_image(path: &Path, seed: u32) {
// Deterministic-but-distinct image content: simple gradient with
// a per-seed offset. Gives pHash/dHash a real signal to work
// with (a uniform image collapses to all-zero hashes).
let img: ImageBuffer<Rgb<u8>, Vec<u8>> = ImageBuffer::from_fn(64, 64, |x, y| {
let r = ((x + seed) & 0xFF) as u8;
let g = ((y + seed * 2) & 0xFF) as u8;
let b = ((x ^ y ^ seed) & 0xFF) as u8;
Rgb([r, g, b])
});
img.save(path).unwrap();
}
#[test]
fn identical_bytes_yield_identical_hashes() {
let dir = tempfile::tempdir().unwrap();
let a = dir.path().join("a.png");
let b = dir.path().join("b.png");
write_test_image(&a, 42);
write_test_image(&b, 42);
let ha = compute(&a).expect("hash a");
let hb = compute(&b).expect("hash b");
assert_eq!(ha, hb);
assert_eq!(hamming_distance(ha.phash_64, hb.phash_64), 0);
}
#[test]
fn distinct_images_have_distinct_hashes() {
let dir = tempfile::tempdir().unwrap();
let a = dir.path().join("a.png");
let b = dir.path().join("b.png");
write_test_image(&a, 42);
write_test_image(&b, 123);
let ha = compute(&a).expect("hash a");
let hb = compute(&b).expect("hash b");
assert_ne!(ha.phash_64, hb.phash_64);
}
#[test]
fn resized_copy_is_near_duplicate_under_threshold() {
// The whole point of perceptual hashing: a resized copy of the
// same source image should land within a small Hamming distance
// of the original. We check the dHash specifically because it's
// the more resize-robust of the two; pHash is also tight but
// gradient-based dHash gives the most reliable signal here.
let dir = tempfile::tempdir().unwrap();
let a = dir.path().join("a.png");
write_test_image(&a, 7);
let img = image::open(&a).unwrap();
let small = img.resize_exact(32, 32, image::imageops::FilterType::Lanczos3);
let b = dir.path().join("b.png");
small.save(&b).unwrap();
let ha = compute(&a).expect("hash a");
let hb = compute(&b).expect("hash b");
let d_dhash = hamming_distance(ha.dhash_64, hb.dhash_64);
assert!(
d_dhash <= 8,
"expected dhash Hamming distance <= 8 for resized copy, got {}",
d_dhash
);
}
#[test]
fn unsupported_path_returns_none() {
let dir = tempfile::tempdir().unwrap();
let p = dir.path().join("notanimage.txt");
std::fs::write(&p, b"hello").unwrap();
assert!(compute(&p).is_none());
}
#[test]
fn missing_file_returns_none() {
let p = Path::new("/nonexistent/path/that/does/not/exist.png");
assert!(compute(p).is_none());
}
}
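The module comment notes that the duplicates module clusters via a BK-tree over these 64-bit hashes. A minimal BK-tree with the Hamming metric, as a sketch of that structure (the project's actual implementation may differ):

```rust
use std::collections::HashMap;

fn hamming(a: i64, b: i64) -> u32 {
    (a ^ b).count_ones()
}

struct BkNode {
    hash: i64,
    children: HashMap<u32, BkNode>,
}

impl BkNode {
    fn new(hash: i64) -> Self {
        BkNode { hash, children: HashMap::new() }
    }
}

struct BkTree {
    root: Option<BkNode>,
}

impl BkTree {
    fn new() -> Self {
        BkTree { root: None }
    }

    fn insert(&mut self, hash: i64) {
        if self.root.is_none() {
            self.root = Some(BkNode::new(hash));
            return;
        }
        let mut node = self.root.as_mut().expect("root set above");
        loop {
            let d = hamming(node.hash, hash);
            if d == 0 {
                return; // identical hash already present
            }
            // Descend along the edge labeled with distance d, creating
            // the child there if the slot is empty (standard BK insert).
            node = node.children.entry(d).or_insert_with(|| BkNode::new(hash));
        }
    }

    fn within(&self, query: i64, radius: u32) -> Vec<i64> {
        let mut out = Vec::new();
        let mut stack: Vec<&BkNode> = self.root.iter().collect();
        while let Some(node) = stack.pop() {
            let d = hamming(node.hash, query);
            if d <= radius {
                out.push(node.hash);
            }
            // Triangle-inequality prune: only children whose edge
            // distance is within `radius` of d can contain matches.
            for (edge, child) in &node.children {
                if edge.abs_diff(d) <= radius {
                    stack.push(child);
                }
            }
        }
        out
    }
}
```

A radius query visits only the subtrees the triangle inequality allows, which is what makes candidate discovery cheap relative to a linear scan.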

View File

@@ -10,7 +10,7 @@ use crate::database::{
connect,
};
use crate::database::{PreviewDao, SqlitePreviewDao};
use crate::libraries::{self, Library};
use crate::libraries::{self, Library, LibraryHealthMap};
use crate::tags::{SqliteTagDao, TagDao};
use crate::video::actors::{
PlaylistGenerator, PreviewClipGenerator, StreamActor, VideoPlaylistManager,
@@ -26,6 +26,11 @@ pub struct AppState {
/// All configured media libraries. Ordered by `id` ascending; the first
/// entry is the primary library.
pub libraries: Vec<Library>,
/// Per-library availability snapshot. Updated by the file watcher at
/// the top of each tick via `libraries::refresh_health`. HTTP handlers
/// read it (e.g. `/libraries` surfacing). See "Library availability
/// and safety" in CLAUDE.md.
pub library_health: LibraryHealthMap,
/// Legacy shim equal to `libraries[0].root_path`. Phase 2 transitional —
/// new code should go through `primary_library()`.
pub base_path: String,
@@ -105,11 +110,13 @@ impl AppState {
preview_dao,
);
let library_health = libraries::new_health_map(&libraries_vec);
Self {
stream_manager,
playlist_manager: Arc::new(video_playlist_manager.start()),
preview_clip_generator: Arc::new(preview_clip_generator.start()),
libraries: libraries_vec,
library_health,
base_path,
thumbnail_path,
video_path,
@@ -348,6 +355,8 @@ impl AppState {
id: crate::libraries::PRIMARY_LIBRARY_ID,
name: "main".to_string(),
root_path: base_path_str.clone(),
enabled: true,
excluded_dirs: Vec::new(),
};
let insight_generator = InsightGenerator::new(
ollama.clone(),
@@ -384,6 +393,8 @@ impl AppState {
id: crate::libraries::PRIMARY_LIBRARY_ID,
name: "main".to_string(),
root_path: base_path_str.clone(),
enabled: true,
excluded_dirs: Vec::new(),
}];
AppState::new(
Arc::new(StreamActor {}.start()),

View File

@@ -33,6 +33,11 @@ where
.service(web::resource("image/tags/all").route(web::get().to(get_all_tags::<TagD>)))
.service(web::resource("image/tags/batch").route(web::post().to(update_tags::<TagD>)))
.service(web::resource("image/tags/lookup").route(web::post().to(lookup_tags_batch::<TagD>)))
.service(
web::resource("image/tags/{id}")
.route(web::put().to(update_tag::<TagD>))
.route(web::delete().to(delete_tag::<TagD>)),
)
}
async fn add_tag<D: TagDao>(
@@ -53,7 +58,14 @@ async fn add_tag<D: TagDao>(
tag_dao
.get_all_tags(&span_context, None)
.and_then(|tags| {
if let Some((_, tag)) = tags.iter().find(|t| t.1.name == tag_name) {
// Case-insensitive match. With the unique-NOCASE index on
// tags.name now in place, a case-sensitive find here would
// miss a casing-only collision and let the subsequent
// create_tag INSERT crash on the constraint.
if let Some((_, tag)) = tags
.iter()
.find(|t| t.1.name.eq_ignore_ascii_case(&tag_name))
{
Ok(tag.clone())
} else {
info!(
@@ -71,6 +83,74 @@ async fn add_tag<D: TagDao>(
.into_http_internal_err()
}
async fn update_tag<D: TagDao>(
_: Claims,
http_request: HttpRequest,
path: web::Path<i32>,
body: web::Json<UpdateTagRequest>,
tag_dao: web::Data<Mutex<D>>,
) -> impl Responder {
let tracer = global_tracer();
let context = extract_context_from_request(&http_request);
let span = tracer.start_with_context("update_tag", &context);
let span_context = opentelemetry::Context::current_with_span(span);
let id = path.into_inner();
let trimmed = body.name.trim();
if trimmed.is_empty() {
return HttpResponse::BadRequest()
.json(serde_json::json!({ "error": "Tag name must not be empty" }));
}
let mut tag_dao = tag_dao.lock().expect("Unable to get TagDao");
match tag_dao.update_tag_name(&span_context, id, trimmed) {
Ok(UpdateTagOutcome::Renamed(tag)) => {
span_context.span().set_status(Status::Ok);
info!("Renamed tag {} -> '{}'", id, trimmed);
HttpResponse::Ok().json(tag)
}
Ok(UpdateTagOutcome::NotFound) => {
HttpResponse::NotFound().json(serde_json::json!({ "error": "Tag not found" }))
}
Ok(UpdateTagOutcome::Conflict { existing }) => HttpResponse::Conflict().json(
serde_json::json!({ "error": "Tag name already exists", "existing_tag": existing }),
),
Err(e) => {
log::error!("update_tag failed: {:?}", e);
HttpResponse::InternalServerError()
.json(serde_json::json!({ "error": "Update failed" }))
}
}
}
async fn delete_tag<D: TagDao>(
_: Claims,
http_request: HttpRequest,
path: web::Path<i32>,
tag_dao: web::Data<Mutex<D>>,
) -> impl Responder {
let tracer = global_tracer();
let context = extract_context_from_request(&http_request);
let span = tracer.start_with_context("delete_tag", &context);
let span_context = opentelemetry::Context::current_with_span(span);
let id = path.into_inner();
let mut tag_dao = tag_dao.lock().expect("Unable to get TagDao");
match tag_dao.delete_tag(&span_context, id) {
Ok(true) => {
span_context.span().set_status(Status::Ok);
info!("Deleted tag {}", id);
HttpResponse::NoContent().finish()
}
Ok(false) => HttpResponse::NotFound().json(serde_json::json!({ "error": "Tag not found" })),
Err(e) => {
log::error!("delete_tag failed: {:?}", e);
HttpResponse::InternalServerError()
.json(serde_json::json!({ "error": "Delete failed" }))
}
}
}
async fn get_tags<D: TagDao>(
_: Claims,
http_request: HttpRequest,
@@ -284,9 +364,15 @@ async fn lookup_tags_batch<D: TagDao>(
// Stage 1: query → content_hash mapping. Files without a hash yet
// (just-indexed, hash compute failed, etc.) skip the sibling
// expansion and only get tags from their own rel_path.
// Library-agnostic by design: this endpoint takes raw rel_paths from
// the client (typically Apollo) with no library context. Scan all
// libraries and let the hash-keyed sibling expansion below do the
// disambiguation. Same-rel_path/different-content collisions across
// libraries surface as multiple hashes for one path — fine, we union
// every sibling tag set.
let exif_records = {
let mut dao = exif_dao.lock().expect("Unable to get ExifDao");
match dao.get_exif_batch(&span_context, None, &query_paths) {
Ok(rows) => rows,
Err(e) => {
return HttpResponse::InternalServerError()
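The union semantics described above can be sketched as a pure function. This is illustrative only (`union_sibling_tags` and the map shapes are hypothetical, standing in for the DAO queries):

```rust
use std::collections::{BTreeSet, HashMap};

/// Sketch of the hash-keyed sibling expansion: a rel_path may resolve
/// to several content hashes when the same path holds different bytes
/// in different libraries; the endpoint simply unions every sibling
/// tag set. A path with no hash yet contributes nothing here and
/// falls back to tags on its own rel_path.
pub fn union_sibling_tags(
    path_to_hashes: &HashMap<String, Vec<String>>,
    hash_to_tags: &HashMap<String, BTreeSet<String>>,
    query_path: &str,
) -> BTreeSet<String> {
    let mut out = BTreeSet::new();
    for hash in path_to_hashes.get(query_path).into_iter().flatten() {
        if let Some(tags) = hash_to_tags.get(hash) {
            out.extend(tags.iter().cloned());
        }
    }
    out
}
```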
@@ -421,6 +507,11 @@ pub struct InsertTaggedPhoto {
#[diesel(column_name = rel_path)]
pub photo_name: String,
pub created_time: i64,
/// Hash-keyed identity. The DAO populates this from
/// `image_exif.content_hash` at insert time when known; the
/// reconciliation pass backfills rows inserted before the hash
/// landed. See CLAUDE.md "Multi-library data model".
pub content_hash: Option<String>,
}
#[derive(Queryable, Clone, Debug)]
@@ -434,6 +525,8 @@ pub struct TaggedPhoto {
pub tag_id: i32,
#[allow(dead_code)] // Part of API contract
pub created_time: i64,
#[allow(dead_code)]
pub content_hash: Option<String>,
}
#[derive(Debug, Deserialize)]
@@ -442,6 +535,22 @@ pub struct AddTagsRequest {
pub tag_ids: Vec<i32>,
}
#[derive(Debug, Deserialize)]
pub struct UpdateTagRequest {
pub name: String,
}
/// Result of an attempted tag rename. Returning a typed outcome (rather
/// than `anyhow::Result<Tag>`) lets the handler map each case to a
/// distinct HTTP status without sniffing error strings, and keeps the
/// 409 path a normal control-flow result instead of a DB constraint
/// violation surfacing as a generic 500.
pub enum UpdateTagOutcome {
Renamed(Tag),
NotFound,
Conflict { existing: Tag },
}
pub trait TagDao: Send + Sync {
fn get_all_tags(
&mut self,
@@ -511,6 +620,26 @@ pub trait TagDao: Send + Sync {
context: &opentelemetry::Context,
file_paths: &[String],
) -> anyhow::Result<std::collections::HashMap<String, i64>>;
/// Rename a tag in place. The tag id stays stable so existing
/// `tagged_photo` rows automatically reflect the new name without
/// a join-table rewrite. Conflict is resolved against the rest of
/// the table case-insensitively (mirroring the
/// `idx_tags_name_nocase` UNIQUE index) — a rename that changes
/// only the case of the tag's own current name is allowed.
fn update_tag_name(
&mut self,
context: &opentelemetry::Context,
id: i32,
new_name: &str,
) -> anyhow::Result<UpdateTagOutcome>;
/// Globally remove a tag and every `tagged_photo` row that
/// references it. Returns `true` if a tag was deleted, `false` if
/// no row matched the id. The schema's FK is `ON DELETE CASCADE`,
/// and the connection sets `PRAGMA foreign_keys = ON` (SQLite only
/// honors the cascade with that pragma enabled), so a single
/// DELETE on `tags` atomically removes the referencing
/// `tagged_photo` rows.
fn delete_tag(&mut self, context: &opentelemetry::Context, id: i32) -> anyhow::Result<bool>;
}
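The rename-conflict rule documented on `update_tag_name` is a pure predicate over the other rows. A sketch under simplified types (`rename_conflict` and the `(id, name)` pair shape are illustrative, not the real DAO):

```rust
/// Collide case-insensitively against every *other* row, so a rename
/// that only changes the case of the tag's own name is allowed.
/// Returns the conflicting tag's id, if any. `tags` is (id, name)
/// pairs standing in for the real table rows.
pub fn rename_conflict(tags: &[(i32, String)], target_id: i32, new_name: &str) -> Option<i32> {
    tags.iter()
        .find(|(id, name)| *id != target_id && name.eq_ignore_ascii_case(new_name))
        .map(|(id, _)| *id)
}
```

Excluding the target's own row is what makes the case-only self-rename ("vacation" → "Vacation") succeed while any cross-row casing collision still yields a 409.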
pub struct SqliteTagDao {
@@ -704,6 +833,83 @@ impl TagDao for SqliteTagDao {
})
}
fn update_tag_name(
&mut self,
context: &opentelemetry::Context,
id: i32,
new_name: &str,
) -> anyhow::Result<UpdateTagOutcome> {
let mut conn = self
.connection
.lock()
.expect("Unable to lock SqliteTagDao connection");
trace_db_call(context, "update", "update_tag_name", |span| {
span.set_attributes(vec![
KeyValue::new("tag_id", id as i64),
KeyValue::new("new_name", new_name.to_string()),
]);
let target = tags::table
.filter(tags::id.eq(id))
.select((tags::id, tags::name, tags::created_time))
.get_result::<Tag>(conn.deref_mut())
.optional()
.with_context(|| format!("Unable to look up tag id {}", id))?;
let target = match target {
Some(t) => t,
None => return Ok(UpdateTagOutcome::NotFound),
};
// Case-insensitive collision check on every other row.
// Belt-and-suspenders: idx_tags_name_nocase enforces this at
// the index level, but checking up front gives the handler
// a clean 409 with the existing tag's id instead of a
// generic constraint-violation 500. Tags table is small;
// loading peers and comparing in Rust avoids a fragile
// dsl::sql composition for case-insensitive equality.
let conflict = tags::table
.filter(tags::id.ne(id))
.select((tags::id, tags::name, tags::created_time))
.get_results::<Tag>(conn.deref_mut())
.with_context(|| "Unable to query for tag-name conflict")?
.into_iter()
.find(|t| t.name.eq_ignore_ascii_case(new_name));
if let Some(existing) = conflict {
return Ok(UpdateTagOutcome::Conflict { existing });
}
diesel::update(tags::table.filter(tags::id.eq(id)))
.set(tags::name.eq(new_name))
.execute(conn.deref_mut())
.with_context(|| format!("Unable to rename tag {}", id))?;
Ok(UpdateTagOutcome::Renamed(Tag {
id: target.id,
name: new_name.to_string(),
created_time: target.created_time,
}))
})
}
fn delete_tag(&mut self, context: &opentelemetry::Context, id: i32) -> anyhow::Result<bool> {
let mut conn = self
.connection
.lock()
.expect("Unable to lock SqliteTagDao connection");
trace_db_call(context, "delete", "delete_tag", |span| {
span.set_attribute(KeyValue::new("tag_id", id as i64));
// tagged_photo.tag_id is `ON DELETE CASCADE` and the
// connection now sets `PRAGMA foreign_keys = ON`, so a
// single DELETE on tags removes its tagged_photo rows
// atomically.
let removed = diesel::delete(tags::table.filter(tags::id.eq(id)))
.execute(conn.deref_mut())
.with_context(|| format!("Unable to delete tag {}", id))?;
Ok(removed > 0)
})
}
fn remove_tag(
&mut self,
context: &opentelemetry::Context,
@@ -759,11 +965,31 @@ impl TagDao for SqliteTagDao {
KeyValue::new("tag_id", tag_id.to_string()),
]);
// Eagerly populate content_hash so this tag follows the bytes,
// not the path (see CLAUDE.md "Multi-library data model").
// None is fine — the reconciliation pass will backfill once
// image_exif has a hash for this file. We deliberately don't
// require library_id here: the tag handler is library-
// agnostic by design, and any matching image_exif row's hash
// is acceptable. If the path resolves to different bytes in
// different libraries, the per-library reconciliation pass
// refines it later.
let content_hash: Option<String> = {
use crate::database::schema::image_exif as ie;
ie::table
.filter(ie::rel_path.eq(path))
.filter(ie::content_hash.is_not_null())
.select(ie::content_hash)
.first::<Option<String>>(conn.deref_mut())
.ok()
.flatten()
};
diesel::insert_into(tagged_photo::table)
.values(InsertTaggedPhoto {
tag_id,
photo_name: path.to_string(),
created_time: Utc::now().timestamp(),
content_hash,
})
.execute(conn.deref_mut())
.with_context(|| format!("Unable to tag file {:?} in sqlite", path))
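The eager-hash-then-backfill behavior above boils down to an optional lookup. A minimal sketch (the `hash_for_path` name and map shape are hypothetical stand-ins for the `image_exif` query):

```rust
use std::collections::HashMap;

/// Take any matching image_exif row's non-null hash if one exists;
/// otherwise store `None` and let the reconciliation pass backfill
/// once the hash lands. The map stands in for the `image_exif`
/// table (rel_path -> content_hash).
pub fn hash_for_path(
    exif: &HashMap<String, Option<String>>,
    rel_path: &str,
) -> Option<String> {
    exif.get(rel_path).and_then(|h| h.clone())
}
```

Both the "no row yet" and "row exists but hash not computed" cases collapse to `None`, which is exactly the state the reconciliation pass later repairs.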
@@ -1168,6 +1394,7 @@ mod tests {
tag_id: tag.id,
created_time: Utc::now().timestamp(),
photo_name: path.to_string(),
content_hash: None,
};
if self.tagged_photos.borrow().contains_key(path) {
@@ -1238,6 +1465,54 @@ mod tests {
}
Ok(counts)
}
fn update_tag_name(
&mut self,
_context: &opentelemetry::Context,
id: i32,
new_name: &str,
) -> anyhow::Result<UpdateTagOutcome> {
// The conflict check excludes the target's own row, so a
// case-only rename of the tag's current name can't collide
// with itself.
let conflict = self
.tags
.borrow()
.iter()
.find(|t| t.id != id && t.name.eq_ignore_ascii_case(new_name))
.cloned();
if let Some(existing) = conflict {
return Ok(UpdateTagOutcome::Conflict { existing });
}
let mut tags = self.tags.borrow_mut();
match tags.iter_mut().find(|t| t.id == id) {
Some(t) => {
t.name = new_name.to_string();
Ok(UpdateTagOutcome::Renamed(t.clone()))
}
None => Ok(UpdateTagOutcome::NotFound),
}
}
fn delete_tag(
&mut self,
_context: &opentelemetry::Context,
id: i32,
) -> anyhow::Result<bool> {
let target_name = {
let tags = self.tags.borrow();
tags.iter().find(|t| t.id == id).map(|t| t.name.clone())
};
let Some(name) = target_name else {
return Ok(false);
};
// Mirror the cascade: drop any tagged_photo references, then
// remove the tag itself.
for (_path, tags) in self.tagged_photos.borrow_mut().iter_mut() {
tags.retain(|t| t.id != id && t.name != name);
}
self.tags.borrow_mut().retain(|t| t.id != id);
Ok(true)
}
}
#[actix_rt::test]
@@ -1253,20 +1528,29 @@ mod tests {
// Seed: two paths tagged, one path untagged.
dao.tagged_photos.borrow_mut().insert(
"a.jpg".into(),
vec![Tag {
id: 1,
name: "alpha".into(),
created_time: 0,
}],
);
dao.tagged_photos.borrow_mut().insert(
"b.jpg".into(),
vec![
Tag {
id: 2,
name: "beta".into(),
created_time: 0,
},
Tag {
id: 3,
name: "gamma".into(),
created_time: 0,
},
],
);
let grouped = dao
.get_tags_grouped_by_paths(&ctx, &["a.jpg".into(), "b.jpg".into(), "c.jpg".into()])
.unwrap();
assert_eq!(grouped.get("a.jpg").map(|v| v.len()), Some(1));
assert_eq!(grouped.get("b.jpg").map(|v| v.len()), Some(2));
@@ -1381,6 +1665,177 @@ mod tests {
None
);
}
async fn rename_tag(
dao: &Data<Mutex<TestTagDao>>,
id: i32,
new_name: &str,
) -> actix_web::http::StatusCode {
use actix_web::Responder;
let req = TestRequest::default().to_http_request();
let body = web::Json(UpdateTagRequest {
name: new_name.to_string(),
});
let claims = Claims::valid_user(String::from("1"));
let resp = update_tag(claims, req.clone(), web::Path::from(id), body, dao.clone()).await;
resp.respond_to(&req).status()
}
#[actix_rt::test]
async fn update_tag_renames_successfully() {
let mut dao = TestTagDao::new();
let tag = dao
.create_tag(&opentelemetry::Context::current(), "old")
.unwrap();
let dao = Data::new(Mutex::new(dao));
assert_eq!(
rename_tag(&dao, tag.id, "new").await,
actix_web::http::StatusCode::OK
);
let mut locked = dao.lock().unwrap();
let all = locked
.get_all_tags(&opentelemetry::Context::current(), None)
.unwrap();
assert_eq!(all.len(), 1);
assert_eq!(all[0].1.name, "new");
}
#[actix_rt::test]
async fn update_tag_not_found_returns_404() {
let dao = Data::new(Mutex::new(TestTagDao::new()));
assert_eq!(
rename_tag(&dao, 99999, "nope").await,
actix_web::http::StatusCode::NOT_FOUND
);
}
#[actix_rt::test]
async fn update_tag_empty_name_returns_400() {
let mut dao = TestTagDao::new();
let tag = dao
.create_tag(&opentelemetry::Context::current(), "keep")
.unwrap();
let dao = Data::new(Mutex::new(dao));
assert_eq!(
rename_tag(&dao, tag.id, " ").await,
actix_web::http::StatusCode::BAD_REQUEST
);
let mut locked = dao.lock().unwrap();
let all = locked
.get_all_tags(&opentelemetry::Context::current(), None)
.unwrap();
assert_eq!(all[0].1.name, "keep", "name must not change on 400");
}
#[actix_rt::test]
async fn update_tag_conflict_returns_409() {
let mut dao = TestTagDao::new();
let _a = dao
.create_tag(&opentelemetry::Context::current(), "a")
.unwrap();
let b = dao
.create_tag(&opentelemetry::Context::current(), "b")
.unwrap();
let dao = Data::new(Mutex::new(dao));
// Case-insensitive collision: renaming b -> "A" must conflict with a.
assert_eq!(
rename_tag(&dao, b.id, "A").await,
actix_web::http::StatusCode::CONFLICT
);
let mut locked = dao.lock().unwrap();
let all = locked
.get_all_tags(&opentelemetry::Context::current(), None)
.unwrap();
let b_after = all.iter().find(|(_, t)| t.id == b.id).unwrap();
assert_eq!(b_after.1.name, "b", "no DB change on 409");
}
async fn delete_via_handler(
dao: &Data<Mutex<TestTagDao>>,
id: i32,
) -> actix_web::http::StatusCode {
use actix_web::Responder;
let req = TestRequest::default().to_http_request();
let claims = Claims::valid_user(String::from("1"));
let resp = delete_tag(claims, req.clone(), web::Path::from(id), dao.clone()).await;
resp.respond_to(&req).status()
}
#[actix_rt::test]
async fn delete_tag_removes_tag_and_cascades_tagged_photos() {
let mut dao = TestTagDao::new();
let tag = dao
.create_tag(&opentelemetry::Context::current(), "doomed")
.unwrap();
dao.tag_file(&opentelemetry::Context::current(), "a.jpg", tag.id)
.unwrap();
dao.tag_file(&opentelemetry::Context::current(), "b.jpg", tag.id)
.unwrap();
let dao = Data::new(Mutex::new(dao));
assert_eq!(
delete_via_handler(&dao, tag.id).await,
actix_web::http::StatusCode::NO_CONTENT
);
let mut locked = dao.lock().unwrap();
assert!(
locked
.get_all_tags(&opentelemetry::Context::current(), None)
.unwrap()
.is_empty()
);
assert!(
locked
.get_tags_for_path(&opentelemetry::Context::current(), "a.jpg")
.unwrap()
.is_empty(),
"tagged_photo references must be cleaned up by the cascade"
);
assert!(
locked
.get_tags_for_path(&opentelemetry::Context::current(), "b.jpg")
.unwrap()
.is_empty()
);
}
#[actix_rt::test]
async fn delete_tag_unknown_id_returns_404() {
let dao = Data::new(Mutex::new(TestTagDao::new()));
assert_eq!(
delete_via_handler(&dao, 99999).await,
actix_web::http::StatusCode::NOT_FOUND
);
}
#[actix_rt::test]
async fn update_tag_case_only_change_succeeds() {
let mut dao = TestTagDao::new();
let tag = dao
.create_tag(&opentelemetry::Context::current(), "vacation")
.unwrap();
let dao = Data::new(Mutex::new(dao));
// The conflict check excludes the target's own row, so changing
// only the case of the tag's current name must succeed.
assert_eq!(
rename_tag(&dao, tag.id, "Vacation").await,
actix_web::http::StatusCode::OK
);
let mut locked = dao.lock().unwrap();
let all = locked
.get_all_tags(&opentelemetry::Context::current(), None)
.unwrap();
assert_eq!(all[0].1.name, "Vacation");
}
}
#[derive(QueryableByName, Debug, Clone)]
pub struct FileWithTagCount {