duplicates: perceptual hash + soft-mark resolution + upload 409
Adds pHash + dHash columns alongside the existing blake3 content_hash so
near-duplicates (re-encoded, resized, format-converted copies) become
queryable. /duplicates/{exact,perceptual} return groups; /duplicates/
{resolve,unresolve} flip a duplicate_of_hash soft-mark on losing rows
and union perceptual-only tag sets onto the survivor. The default
/photos listing filters duplicate_of_hash IS NULL so demoted siblings
stop cluttering the grid; include_duplicates=true opts back in for
Apollo's review modal. Upload now hashes bytes pre-write and returns
409 with the canonical sibling when a file's bytes already exist.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
41
migrations/2026-05-03-000000_add_perceptual_hash/up.sql
Normal file
41
migrations/2026-05-03-000000_add_perceptual_hash/up.sql
Normal file
@@ -0,0 +1,41 @@
|
||||
-- Adds perceptual-hash signals + soft-mark resolution state to image_exif so
|
||||
-- the duplicates surface in Apollo can group near-duplicates (re-encoded,
|
||||
-- resized, format-converted copies) and let the user demote losers without
|
||||
-- touching the file on disk. Image-only for v1: phash_64/dhash_64 are NULL
|
||||
-- on videos and on images that fail to decode. See Apollo CLAUDE.md →
|
||||
-- Duplicate detection / Caching layer for the policy.
|
||||
--
|
||||
-- Soft-mark columns are media-type-agnostic — when video perceptual hashing
|
||||
-- arrives, it lives in a separate hash-keyed companion table and reuses the
|
||||
-- same duplicate_of_hash / duplicate_decided_at machinery.
|
||||
|
||||
-- pHash (DCT, 64-bit) packed as i64 for fast XOR + popcount Hamming.
|
||||
ALTER TABLE image_exif ADD COLUMN phash_64 BIGINT;
|
||||
|
||||
-- dHash (gradient, 64-bit). Cheap, robust to compression/resize. Stored
|
||||
-- alongside pHash so the query layer can fall back if either is null.
|
||||
ALTER TABLE image_exif ADD COLUMN dhash_64 BIGINT;
|
||||
|
||||
-- When non-null, this row is a soft-marked duplicate of the row whose
|
||||
-- content_hash matches. The duplicate file stays on disk; the default
|
||||
-- /photos listing filters it out. /photos?include_duplicates=true opts
|
||||
-- back in (the Apollo duplicates modal uses this).
|
||||
ALTER TABLE image_exif ADD COLUMN duplicate_of_hash TEXT;
|
||||
|
||||
-- Unix seconds of the resolve. Distinguishes "never reviewed" from
|
||||
-- "reviewed and resolved" for the Apollo include_resolved toggle.
|
||||
ALTER TABLE image_exif ADD COLUMN duplicate_decided_at BIGINT;
|
||||
|
||||
-- Partial indexes — the columns are NULL for the vast majority of rows
|
||||
-- during the transitional window and forever for videos / decode failures.
|
||||
CREATE INDEX idx_image_exif_phash
|
||||
ON image_exif (phash_64)
|
||||
WHERE phash_64 IS NOT NULL;
|
||||
|
||||
CREATE INDEX idx_image_exif_dhash
|
||||
ON image_exif (dhash_64)
|
||||
WHERE dhash_64 IS NOT NULL;
|
||||
|
||||
CREATE INDEX idx_image_exif_duplicate_of_hash
|
||||
ON image_exif (duplicate_of_hash)
|
||||
WHERE duplicate_of_hash IS NOT NULL;
|
||||
Reference in New Issue
Block a user