duplicates: folder-pair view of exact dups

Bucket exact-dup rows by (library_id, dirname) pair on each side, then
filter by coverage = shared / min(folder_a_total, folder_b_total) and
an absolute floor on shared count. Surfaces "this folder is mostly
contained in that folder" matches that the per-file EXACT view buries
under one row each — e.g. an old phone-backup tree shadowing the
organized library, or a topic-grouped folder duplicating a date-grouped
one within the same library.
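
The filter described above can be sketched as follows. This is a minimal illustration, not the code from this commit: the function names `coverage` and `passes_thresholds` are hypothetical, and returning `None` for an empty folder is an assumption about how a zero denominator might be handled.

```rust
/// Coverage of a folder pair: shared / min(folder_a_total, folder_b_total).
/// Returns None when the smaller folder is empty (assumed handling,
/// chosen here to avoid dividing by zero).
pub fn coverage(shared: usize, folder_a_total: usize, folder_b_total: usize) -> Option<f64> {
    let smaller = folder_a_total.min(folder_b_total);
    if smaller == 0 {
        return None;
    }
    Some(shared as f64 / smaller as f64)
}

/// A pair is kept only if it clears both the relative threshold (coverage)
/// and the absolute floor on the shared count.
pub fn passes_thresholds(
    shared: usize,
    folder_a_total: usize,
    folder_b_total: usize,
    min_coverage: f64,
    min_shared: usize,
) -> bool {
    shared >= min_shared
        && coverage(shared, folder_a_total, folder_b_total)
            .map_or(false, |c| c >= min_coverage)
}
```

Dividing by the smaller folder is what lets a 50-file backup folder register as half contained in a 400-file library folder when 25 files match; the absolute floor keeps tiny folders with one or two matching files from reaching full coverage trivially.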

New endpoint: GET /duplicates/folder-pairs?library=&include_resolved=
&min_coverage=&min_shared=. Cached 5 min keyed on (library, include_resolved);
the user-tunable thresholds filter the cached unfiltered pair list so
slider drags don't re-bucket. Shares the resolve / unresolve flow with
the existing tabs — the frontend fans out N parallel /resolve calls,
one per shared content_hash.
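
A rough sketch of the caching split described above, assuming a simple in-memory map. `PairCache`, `FolderPair`, and `apply_thresholds` are hypothetical names for illustration; the actual cache in this commit may be structured differently.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical, trimmed-down pair record: only the fields the filter needs.
#[derive(Clone, Debug, PartialEq)]
pub struct FolderPair {
    pub shared: usize,
    pub coverage: f64,
}

/// Cache key is (library, include_resolved); thresholds are deliberately excluded.
type CacheKey = (Option<i32>, bool);

pub struct PairCache {
    ttl: Duration,
    entries: HashMap<CacheKey, (Instant, Vec<FolderPair>)>,
}

impl PairCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns the unfiltered pair list, calling `build` only when the
    /// entry is missing or older than the TTL.
    pub fn get_or_build<F>(&mut self, key: CacheKey, build: F) -> &[FolderPair]
    where
        F: FnOnce() -> Vec<FolderPair>,
    {
        let stale = match self.entries.get(&key) {
            Some((built_at, _)) => built_at.elapsed() >= self.ttl,
            None => true,
        };
        if stale {
            self.entries.insert(key, (Instant::now(), build()));
        }
        &self.entries[&key].1
    }
}

/// Cheap per-request filter over the cached list; no re-bucketing involved.
pub fn apply_thresholds(
    pairs: &[FolderPair],
    min_coverage: f64,
    min_shared: usize,
) -> Vec<FolderPair> {
    pairs
        .iter()
        .filter(|p| p.coverage >= min_coverage && p.shared >= min_shared)
        .cloned()
        .collect()
}
```

With a TTL of `Duration::from_secs(300)`, repeated requests that differ only in `min_coverage` or `min_shared` hit the same cache entry and pay only for the linear filter.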

Folder names carry no signal (BMW lives under Night Photos, not BMW_backup),
so bucketing is purely on (library_id, dirname) co-occurrence in
exact-dup groups. Within-folder dups (same hash twice in the same
folder) are skipped — those belong to the EXACT tab.
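
The bucketing step could look roughly like the sketch below. The input row shape `(content_hash, library_id, rel_path)` and the names `shared_counts` and `dirname` are assumptions made for illustration; they are not taken from the changed files.

```rust
use std::collections::{BTreeSet, HashMap};
use std::path::Path;

/// (library_id, dirname) identifies one folder.
type FolderKey = (i32, String);

/// Parent directory of a relative path; empty string for top-level files.
fn dirname(rel_path: &str) -> String {
    Path::new(rel_path)
        .parent()
        .map(|p| p.to_string_lossy().into_owned())
        .unwrap_or_default()
}

/// Counts, for every unordered folder pair, how many content hashes
/// appear in both folders. Rows are (content_hash, library_id, rel_path).
pub fn shared_counts(
    rows: &[(String, i32, String)],
) -> HashMap<(FolderKey, FolderKey), usize> {
    // Collect the distinct folders each hash appears in. Using a set means
    // a hash that occurs twice in the same folder contributes that folder
    // once, so within-folder dups never form a pair.
    let mut by_hash: HashMap<&str, BTreeSet<FolderKey>> = HashMap::new();
    for (hash, library_id, rel_path) in rows {
        by_hash
            .entry(hash.as_str())
            .or_default()
            .insert((*library_id, dirname(rel_path)));
    }

    // BTreeSet iteration is sorted, so each pair is emitted as (smaller, larger)
    // and (a, b) / (b, a) collapse into one key.
    let mut counts = HashMap::new();
    for folders in by_hash.values() {
        let folders: Vec<&FolderKey> = folders.iter().collect();
        for i in 0..folders.len() {
            for j in (i + 1)..folders.len() {
                *counts
                    .entry((folders[i].clone(), folders[j].clone()))
                    .or_insert(0) += 1;
            }
        }
    }
    counts
}
```

Because the key is the folder identity rather than anything derived from its name, two folders with unrelated names still pair up as soon as they share hashes.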

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cameron Cordes
2026-05-06 12:43:29 -04:00
parent 9ccb48233f
commit 67cf0c7f73
3 changed files with 523 additions and 0 deletions

@@ -1767,6 +1767,15 @@ mod tests {
            Ok(Vec::new())
        }
        fn list_image_paths(
            &mut self,
            _context: &opentelemetry::Context,
            _library_id: Option<i32>,
            _include_resolved: bool,
        ) -> Result<Vec<(i32, String)>, DbError> {
            Ok(Vec::new())
        }
        fn lookup_duplicate_row(
            &mut self,
            _context: &opentelemetry::Context,