duplicates: folder-pair view of exact dups
Bucket exact-dup rows by (library_id, dirname) pair on each side, then
filter by coverage = shared / min(folder_a_total, folder_b_total) and an
absolute floor on shared count. Surfaces "this folder is mostly contained
in that folder" matches that the per-file EXACT view buries under one row
each, e.g. an old phone-backup tree shadowing the organized library, or a
topic-grouped folder duplicating a date-grouped one within the same
library.

New endpoint:
GET /duplicates/folder-pairs?library=&include_resolved=&min_coverage=&min_shared=

Cached 5 min keyed on (library, include_resolved); the user-tunable
thresholds filter the cached unfiltered pair list so slider drags don't
re-bucket.

Shares the resolve / unresolve flow with the existing tabs: the frontend
fans out N parallel /resolve calls, one per shared content_hash.

Folder names carry no signal (BMW lives under Night Photos, not
BMW_backup), so bucketing is purely on (library_id, dirname)
co-occurrence in exact-dup groups. Within-folder dups (same hash twice in
the same folder) are skipped; those belong to the EXACT tab.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
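The bucketing and coverage filter described above can be sketched roughly as follows. This is an illustrative sketch, not the code from this commit: `DupRow`, `FolderPair`, `bucket_pairs` and `filter_pairs` are hypothetical names, `rel_path` is assumed to be '/'-separated, and "shared" is assumed to count distinct content hashes present in both folders.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// A folder is identified by (library_id, dirname).
type Folder = (i32, String);

/// Hypothetical, simplified stand-in for one exact-dup row.
#[derive(Debug, Clone)]
pub struct DupRow {
    pub library_id: i32,
    pub rel_path: String,
    pub content_hash: String,
}

/// Hypothetical result row for the folder-pair view.
#[derive(Debug, Clone, PartialEq)]
pub struct FolderPair {
    pub a: Folder,
    pub b: Folder,
    pub shared: usize,
    pub coverage: f64,
}

/// Directory part of a relative path ("" for files at the library root).
/// Assumes '/' as the separator.
fn dirname(rel_path: &str) -> String {
    match rel_path.rfind('/') {
        Some(i) => rel_path[..i].to_string(),
        None => String::new(),
    }
}

/// Build the unfiltered pair list (the part that would be worth caching).
pub fn bucket_pairs(rows: &[DupRow], totals: &HashMap<Folder, usize>) -> Vec<FolderPair> {
    // content_hash -> distinct folders that hold a copy of it.
    let mut by_hash: BTreeMap<&str, BTreeSet<Folder>> = BTreeMap::new();
    for r in rows {
        by_hash
            .entry(r.content_hash.as_str())
            .or_default()
            .insert((r.library_id, dirname(&r.rel_path)));
    }

    // (folder_a, folder_b) -> number of hashes present in both. The set holds
    // distinct folders in sorted order, so each unordered pair is counted once
    // and a hash repeated inside one folder never forms a pair.
    let mut shared: BTreeMap<(Folder, Folder), usize> = BTreeMap::new();
    for folders in by_hash.values() {
        let fs: Vec<&Folder> = folders.iter().collect();
        for i in 0..fs.len() {
            for j in (i + 1)..fs.len() {
                *shared.entry((fs[i].clone(), fs[j].clone())).or_insert(0) += 1;
            }
        }
    }

    shared
        .into_iter()
        .filter_map(|((a, b), n)| {
            // coverage = shared / min(folder_a_total, folder_b_total)
            let denom = (*totals.get(&a)?).min(*totals.get(&b)?);
            if denom == 0 {
                return None;
            }
            Some(FolderPair { coverage: n as f64 / denom as f64, a, b, shared: n })
        })
        .collect()
}

/// Apply the user-tunable thresholds to an already bucketed pair list.
pub fn filter_pairs(pairs: &[FolderPair], min_coverage: f64, min_shared: usize) -> Vec<FolderPair> {
    pairs
        .iter()
        .filter(|p| p.coverage >= min_coverage && p.shared >= min_shared)
        .cloned()
        .collect()
}
```

Keeping `filter_pairs` separate from `bucket_pairs` mirrors the caching note in the message: the expensive bucketing result can be reused while only the cheap threshold filter reruns on each slider change.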
@@ -446,6 +446,18 @@ pub trait ExifDao: Sync + Send {
         include_resolved: bool,
     ) -> Result<Vec<DuplicateRow>, DbError>;
 
+    /// Lightweight `(library_id, rel_path)` listing for every hashed
+    /// image_exif row, used to compute per-folder file totals for the
+    /// folder-pair duplicate view. Filters mirror `list_duplicates_exact`
+    /// so the denominator (folder population) and numerator (shared
+    /// dups between two folders) come from the same row population.
+    fn list_image_paths(
+        &mut self,
+        context: &opentelemetry::Context,
+        library_id: Option<i32>,
+        include_resolved: bool,
+    ) -> Result<Vec<(i32, String)>, DbError>;
+
     /// Look up a single row's metadata by `(library_id, rel_path)`. Used
     /// by the resolve endpoint to map the request payload to the
     /// underlying `content_hash` before writing the soft-mark. Returns
@@ -1585,6 +1597,33 @@ impl ExifDao for SqliteExifDao {
         .map_err(|_| DbError::new(DbErrorKind::QueryError))
     }
 
+    fn list_image_paths(
+        &mut self,
+        context: &opentelemetry::Context,
+        library_id_filter: Option<i32>,
+        include_resolved: bool,
+    ) -> Result<Vec<(i32, String)>, DbError> {
+        trace_db_call(context, "query", "list_image_paths", |_span| {
+            use schema::image_exif::dsl::*;
+
+            let mut connection = self.connection.lock().expect("Unable to get ExifDao");
+
+            let mut q = image_exif
+                .filter(content_hash.is_not_null())
+                .select((library_id, rel_path))
+                .into_boxed();
+            if let Some(lib) = library_id_filter {
+                q = q.filter(library_id.eq(lib));
+            }
+            if !include_resolved {
+                q = q.filter(duplicate_of_hash.is_null());
+            }
+            q.load::<(i32, String)>(connection.deref_mut())
+                .map_err(|_| anyhow::anyhow!("Query error"))
+        })
+        .map_err(|_| DbError::new(DbErrorKind::QueryError))
+    }
+
     fn lookup_duplicate_row(
         &mut self,
         context: &opentelemetry::Context,
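For orientation, the per-folder totals that feed the coverage denominator could be derived from the `(library_id, rel_path)` rows returned by `list_image_paths` along these lines. This is a hedged sketch only: `folder_totals` and `parent_dir` are hypothetical helpers that do not appear in this diff, and `rel_path` is assumed to use '/' as its separator.

```rust
use std::collections::HashMap;

/// Directory part of a relative path ("" for files at the library root).
/// Assumes '/' as the separator.
fn parent_dir(rel_path: &str) -> &str {
    match rel_path.rfind('/') {
        Some(i) => &rel_path[..i],
        None => "",
    }
}

/// Count hashed files per (library_id, dirname) from the lightweight listing.
pub fn folder_totals(paths: &[(i32, String)]) -> HashMap<(i32, String), usize> {
    let mut totals = HashMap::new();
    for (lib, rel_path) in paths {
        // The same dirname in two libraries counts as two separate folders.
        *totals
            .entry((*lib, parent_dir(rel_path).to_string()))
            .or_insert(0) += 1;
    }
    totals
}
```

Because the listing applies the same `library` and `include_resolved` filters as the exact-dup query, totals computed this way would be drawn from the same row population as the shared counts.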