Implement end-to-end nightly pre-generation of memory reels with agentic
scripting that grounds narration in calendar, location, messages, and RAG.
Sections A-E from the plan:
A. Extract produce_reel pipeline core from run_reel_job with
ScripterMode::Fast/Agentic and progress callbacks.
B. Agentic scripter: factor run_readonly_tool_loop from the insight
generator, build read-only tool gate, prompt builder with GPS, and
generate_script_agentic with fallback to fast path.
C. Precomputed reels ledger (SQLite table + DAO), GET /reels/precomputed
handler with validity gate, GET /reels/by-key/{key}/video streaming,
and normalize_library_key helper.
D. Nightly scheduler: spawn_pregen_scheduler with configurable hour,
run_pregen_batch (day/week/month spans), pregen_one with dedup and
disk-check, secs_until_next_run_hour time math.
E. user_ai_prefs passive mirror table + DAO for param capture in
create_reel_handler and replay in the scheduler.
Also fixes resolve_library_param signature to take &[Library] and adds
resolve_library_param_state wrapper for AppState callers.
New files: migrations/2026-06-13-000000_add_precomputed_reels/,
migrations/2026-06-13-000010_add_user_ai_prefs/,
src/database/precomputed_reel_dao.rs,
src/database/user_ai_prefs_dao.rs
Bucket exact-dup rows by (library_id, dirname) pair on each side, then
filter by coverage = shared / min(folder_a_total, folder_b_total) and
an absolute floor on shared count. Surfaces "this folder is mostly
contained in that folder" matches that the per-file EXACT view buries
under one row each — e.g. an old phone-backup tree shadowing the
organized library, or a topic-grouped folder duplicating a date-grouped
one within the same library.
New endpoint: GET /duplicates/folder-pairs?library=&include_resolved=
&min_coverage=&min_shared=. Cached 5 min keyed on (library, include_resolved);
the user-tunable thresholds filter the cached unfiltered pair list so
slider drags don't re-bucket. Shares the resolve / unresolve flow with
the existing tabs — the frontend fans out N parallel /resolve calls,
one per shared content_hash.
Folder names carry no signal (BMW lives under Night Photos, not BMW_backup),
so bucketing is purely on (library_id, dirname) co-occurrence in
exact-dup groups. Within-folder dups (same hash twice in the same
folder) are skipped — those belong to the EXACT tab.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes against "still too loose at lowest sensitivity":
- Popcount entropy band tightened from [8, 56] to [16, 48]. The wider
band let too much low-frequency content through (skies, scans,
faded film) where pHash collapses to near-uniform values that
Hamming-trivially across hundreds of unrelated images.
- dHash check now uses an asymmetric stricter threshold
(dhash_threshold = max(2, threshold/2)). pHash is the candidate-
discovery signal; dHash is validation. Splitting the budget means
a real near-dup survives both while incidental pHash collisions
on uniform content get vetoed. Missing dHash on either side now
rejects the edge (was: trust pHash alone).
- Single-link union-find can chain weakly-similar images via
transitive edges. Added a medoid-validation pass: per cluster,
pick the member with smallest summed distance to others, then
drop any whose distance to it exceeds threshold. Two new tests
pin both invariants.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The perceptual cluster was producing one giant first group that
contained hundreds of unrelated images. Two causes:
- Solid-colour images (skies, black frames, monochrome scans) all
hash to near-zero pHashes that Hamming-distance-zero to each other.
- Single-link clustering on pHash alone is too permissive — a chain
of weakly-similar images all collapses into one cluster.
Fixed by skipping hashes outside the popcount [8, 56] band (uniform
content) and requiring dHash agreement within threshold before
unioning a candidate edge from the BK-tree. Two new tests pin both
invariants.
Backfill bin separately fix: decode-failed rows kept phash_64=NULL
and got re-pulled by every batch, infinite-looping on a queue of
unbreakable formats. Persist a 0/0 sentinel on decode failure so
the row leaves the candidate set; the all-zero hash is excluded
from clustering by the same entropy filter so it doesn't pollute
results.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds pHash + dHash columns alongside the existing blake3 content_hash so
near-duplicates (re-encoded, resized, format-converted copies) become
queryable. /duplicates/{exact,perceptual} return groups; /duplicates/
{resolve,unresolve} flip a duplicate_of_hash soft-mark on losing rows
and union perceptual-only tag sets onto the survivor. The default
/photos listing filters duplicate_of_hash IS NULL so demoted siblings
stop cluttering the grid; include_duplicates=true opts back in for
Apollo's review modal. Upload now hashes bytes pre-write and returns
409 with the canonical sibling when a file's bytes already exist.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>