feat: add content_hash backfill + register every media file

Adds blake3 content hashing as the basis for derivative dedup
(thumbnails, HLS) across libraries. Hashes are computed inline by the
watcher on ingest and by a new `backfill_hashes` binary for historical
rows.

Key changes:
- `content_hash` and `size_bytes` are now populated on new image_exif
  rows; a new ExifDao surface (`get_rows_missing_hash`,
  `backfill_content_hash`, `find_by_content_hash`) supports backfill and
  future hash-keyed lookups.
- The watcher now registers every image/video in image_exif, not just
  files with parseable EXIF. EXIF becomes optional enrichment; videos
  and other non-EXIF files still get a hashed row. This also makes
  DB-indexed sort/filter cover the full library.
- `/image` thumbnail serving looks up the hash-keyed path first, then
  falls back to the legacy mirrored layout.
- The upload flow accepts a `?library=` query param and hashes uploaded
  files.
- `store_exif` logs the underlying Diesel error on insert failure so
  constraint violations surface instead of hiding behind a generic
  InsertError.
- New migration normalizes rel_path separators to forward slash across
  all tables, deduplicating any rows that collide after normalization.
  Fixes spurious UNIQUE violations from mixed backslash/forward-slash
  paths on Windows ingest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cameron
2026-04-17 16:25:39 -04:00
parent 561e261d39
commit 524f00b068
11 changed files with 681 additions and 69 deletions
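
The hash-first, legacy-fallback lookup described for `/image` can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's Rust code: the `by-hash` directory, the two-character sharding, the `.jpg` suffix, and every function name here are hypothetical and chosen only to show the lookup order.

```python
from pathlib import Path
from typing import Optional


def hashed_thumb_path(thumb_root: Path, content_hash: str) -> Path:
    # Hypothetical hash-keyed layout: shard by the first two hex characters.
    return thumb_root / "by-hash" / content_hash[:2] / f"{content_hash}.jpg"


def legacy_thumb_path(thumb_root: Path, rel_path: str) -> Path:
    # Legacy mirrored layout: the thumbnail mirrors the library-relative path.
    # Separators are normalized to forward slashes, as the migration does.
    return thumb_root / rel_path.replace("\\", "/")


def resolve_thumbnail(
    thumb_root: Path, content_hash: Optional[str], rel_path: str
) -> Optional[Path]:
    # 1. Prefer the hash-keyed derivative when the row already has a hash.
    if content_hash:
        candidate = hashed_thumb_path(thumb_root, content_hash)
        if candidate.is_file():
            return candidate
    # 2. Fall back to the legacy mirrored location.
    legacy = legacy_thumb_path(thumb_root, rel_path)
    return legacy if legacy.is_file() else None
```

Rows that have not been backfilled yet (no `content_hash`) skip step 1 entirely, so existing thumbnails keep being served from the mirrored layout until a hash-keyed derivative exists.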

@@ -0,0 +1,4 @@
-- No-op: there's no sensible way to recover which rows originally used
-- backslashes, and there's no reason to want backslashes back. The
-- deleted duplicates are also gone.
SELECT 1;

@@ -0,0 +1,85 @@
-- Normalize `rel_path` columns to forward slashes. Windows ingest
-- historically produced a mix of `\` and `/`, which broke lookups and
-- caused spurious UNIQUE-constraint violations on re-registration.
--
-- SQLite enforces UNIQUE per-row during UPDATE, so we have to drop
-- losing duplicates BEFORE normalizing. For each table that has a
-- UNIQUE on rel_path, we delete rows whose normalized form already
-- exists in canonical (forward-slash) form — keeping the existing
-- forward-slash row as the survivor. Then a flat UPDATE finishes the
-- job for remaining backslash rows.

-- image_exif: UNIQUE(library_id, rel_path)
DELETE FROM image_exif
WHERE rel_path LIKE '%\%'
  AND EXISTS (
    SELECT 1 FROM image_exif AS other
    WHERE other.library_id = image_exif.library_id
      AND other.rel_path = REPLACE(image_exif.rel_path, '\', '/')
      AND other.id != image_exif.id
  );
UPDATE image_exif
SET rel_path = REPLACE(rel_path, '\', '/')
WHERE rel_path LIKE '%\%';

-- favorites: UNIQUE(userid, rel_path)
DELETE FROM favorites
WHERE rel_path LIKE '%\%'
  AND EXISTS (
    SELECT 1 FROM favorites AS other
    WHERE other.userid = favorites.userid
      AND other.rel_path = REPLACE(favorites.rel_path, '\', '/')
      AND other.id != favorites.id
  );
UPDATE favorites
SET rel_path = REPLACE(rel_path, '\', '/')
WHERE rel_path LIKE '%\%';

-- tagged_photo: UNIQUE(rel_path, tag_id)
DELETE FROM tagged_photo
WHERE rel_path LIKE '%\%'
  AND EXISTS (
    SELECT 1 FROM tagged_photo AS other
    WHERE other.tag_id = tagged_photo.tag_id
      AND other.rel_path = REPLACE(tagged_photo.rel_path, '\', '/')
      AND other.id != tagged_photo.id
  );
UPDATE tagged_photo
SET rel_path = REPLACE(rel_path, '\', '/')
WHERE rel_path LIKE '%\%';

-- entity_photo_links: UNIQUE(entity_id, library_id, rel_path, role)
DELETE FROM entity_photo_links
WHERE rel_path LIKE '%\%'
  AND EXISTS (
    SELECT 1 FROM entity_photo_links AS other
    WHERE other.entity_id = entity_photo_links.entity_id
      AND other.library_id = entity_photo_links.library_id
      AND other.role = entity_photo_links.role
      AND other.rel_path = REPLACE(entity_photo_links.rel_path, '\', '/')
      AND other.id != entity_photo_links.id
  );
UPDATE entity_photo_links
SET rel_path = REPLACE(rel_path, '\', '/')
WHERE rel_path LIKE '%\%';

-- video_preview_clips: UNIQUE(library_id, rel_path)
DELETE FROM video_preview_clips
WHERE rel_path LIKE '%\%'
  AND EXISTS (
    SELECT 1 FROM video_preview_clips AS other
    WHERE other.library_id = video_preview_clips.library_id
      AND other.rel_path = REPLACE(video_preview_clips.rel_path, '\', '/')
      AND other.id != video_preview_clips.id
  );
UPDATE video_preview_clips
SET rel_path = REPLACE(rel_path, '\', '/')
WHERE rel_path LIKE '%\%';

-- photo_insights has no UNIQUE on rel_path (history table), so a plain
-- normalize is safe.
UPDATE photo_insights
SET rel_path = REPLACE(rel_path, '\', '/')
WHERE rel_path LIKE '%\%';

ANALYZE;
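
The delete-then-update ordering above can be checked against an in-memory SQLite database. The sketch below uses Python's standard `sqlite3` module and copies the `image_exif` statements from the migration. The three-column table is a simplified stand-in for the real schema (an assumption), and the sample rows and function names are made up for illustration.

```python
import sqlite3

# Simplified stand-in for image_exif; the real table has more columns.
SCHEMA = """
CREATE TABLE image_exif (
    id INTEGER PRIMARY KEY,
    library_id INTEGER NOT NULL,
    rel_path TEXT NOT NULL,
    UNIQUE (library_id, rel_path)
);
"""

# Statements copied from the migration (raw strings keep the backslashes).
DELETE_LOSERS = r"""
DELETE FROM image_exif
WHERE rel_path LIKE '%\%'
  AND EXISTS (
    SELECT 1 FROM image_exif AS other
    WHERE other.library_id = image_exif.library_id
      AND other.rel_path = REPLACE(image_exif.rel_path, '\', '/')
      AND other.id != image_exif.id
  );
"""

NORMALIZE = r"""
UPDATE image_exif
SET rel_path = REPLACE(rel_path, '\', '/')
WHERE rel_path LIKE '%\%';
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO image_exif (id, library_id, rel_path) VALUES (?, ?, ?)",
        [
            (1, 1, "2024/trip/a.jpg"),    # canonical row, the survivor
            (2, 1, "2024\\trip\\a.jpg"),  # backslash duplicate of row 1
            (3, 1, "2024\\trip\\b.jpg"),  # backslash row with no twin
            (4, 2, "2024\\trip\\a.jpg"),  # same path, different library
        ],
    )
    conn.commit()
    return conn


def update_only_fails() -> bool:
    # Normalizing without the DELETE collides with row 1 on UNIQUE.
    conn = make_db()
    try:
        conn.execute(NORMALIZE)
    except sqlite3.IntegrityError:
        return True
    return False


def delete_then_update() -> list:
    # The migration's order: drop losing duplicates, then normalize.
    conn = make_db()
    conn.execute(DELETE_LOSERS)
    conn.execute(NORMALIZE)
    return conn.execute(
        "SELECT id, library_id, rel_path FROM image_exif ORDER BY id"
    ).fetchall()
```

Row 2 is removed because row 1 already holds its normalized path in the same library; rows 3 and 4 are simply rewritten. One case this sketch does not exercise: two rows that both contain a backslash and normalize to the same path (for example `a\b/c.jpg` and `a/b\c.jpg`) have no canonical row for the DELETE to match, so the UPDATE would still hit the UNIQUE constraint.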