tags: batch lookup expands content-hash siblings cross-library

The first cut matched by rel_path only — fine for single-library
deploys but wrong for multi-library setups where the same content
lives under different rel_paths (e.g. a backup mount holding copies
of the primary library). A tag applied under library A would silently
not appear in the library-B grid badge even though the carousel's
per-path /image/tags would resolve it correctly via siblings.

The batch handler now does the expansion server-side in three query
steps (large IN lists are chunked) instead of one lookup per path:

  1. image_exif batch lookup → query path → content_hash
  2. image_exif JOIN by content_hash → all sibling rel_paths sharing
     each hash (paths are deduped across libraries)
  3. tagged_photo + tags JOIN over the union of (query + sibling)
     rel_paths

Tags are then aggregated back to query paths via a sibling→originals
reverse map, deduped by tag id. Files without a content_hash (just
indexed, hash compute pending, etc.) skip step 2 and only get tags
from their own rel_path — same fallback the per-path handler uses.
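
The aggregation step can be sketched in isolation. This is a minimal
illustration, not the handler's actual code: `aggregate_tags`, the `Tag`
struct, and the in-memory maps standing in for the three query results
are all assumptions made for the example.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical tag row; the real schema may differ.
#[derive(Debug, Clone, PartialEq)]
struct Tag {
    id: i32,
    name: String,
}

/// Fold tags found on any sibling path back onto each query path,
/// deduplicating by tag id.
///
/// - `path_hash`: query path -> content_hash (step 1; absent = no hash yet)
/// - `hash_paths`: content_hash -> all sibling rel_paths (step 2)
/// - `path_tags`: rel_path -> tags applied under that path (step 3)
fn aggregate_tags(
    query_paths: &[String],
    path_hash: &HashMap<String, String>,
    hash_paths: &HashMap<String, Vec<String>>,
    path_tags: &HashMap<String, Vec<Tag>>,
) -> BTreeMap<String, Vec<Tag>> {
    // Reverse map: sibling rel_path -> query paths it contributes to.
    let mut originals: HashMap<&str, Vec<&str>> = HashMap::new();
    for q in query_paths {
        // A path always contributes to itself (the no-hash fallback).
        originals.entry(q.as_str()).or_default().push(q.as_str());
        if let Some(siblings) = path_hash.get(q).and_then(|h| hash_paths.get(h)) {
            for s in siblings {
                if s != q {
                    originals.entry(s.as_str()).or_default().push(q.as_str());
                }
            }
        }
    }

    let mut out: BTreeMap<String, Vec<Tag>> = BTreeMap::new();
    // (query path, tag id) pairs already emitted, used for deduplication.
    let mut seen: HashSet<(&str, i32)> = HashSet::new();
    for (path, tags) in path_tags {
        let Some(targets) = originals.get(path.as_str()) else { continue };
        for &q in targets {
            for tag in tags {
                if seen.insert((q, tag.id)) {
                    out.entry(q.to_string()).or_default().push(tag.clone());
                }
            }
        }
    }
    out
}
```

In this sketch a path with no entry in `path_hash` never enters the
sibling loop, so it only collects tags stored under its own rel_path,
mirroring the fallback described above. Tag order within each result is
unspecified because it follows `HashMap` iteration order.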

Adds ExifDao::get_rel_paths_for_hashes (batch counterpart of
get_rel_paths_by_hash) chunked at 500 to stay under SQLite's
SQLITE_LIMIT_VARIABLE_NUMBER. Five queries for a 4k-photo grid is
still ~800x fewer round trips than per-path HTTP fan-out.
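
To see how chunking bounds the number of bound variables per statement,
here is a standalone sketch. It is not the DAO code: `chunked_in_clauses`
is a hypothetical helper that only builds placeholder strings, whereas
the real method lets Diesel generate the SQL.

```rust
/// Build one `IN (?, ?, ...)` predicate per chunk so that no single
/// statement binds more than `chunk_size` variables.
fn chunked_in_clauses(n_values: usize, chunk_size: usize) -> Vec<String> {
    // `slice::chunks` panics on a zero chunk size, so fail early with a
    // clearer message.
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let values: Vec<usize> = (0..n_values).collect();
    values
        .chunks(chunk_size)
        .map(|chunk| {
            let placeholders = vec!["?"; chunk.len()].join(", ");
            format!("content_hash IN ({placeholders})")
        })
        .collect()
}
```

With a chunk size of 500, 1,200 hashes become three statements binding
500, 500, and 200 variables, each under the legacy limit of 999.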
Cameron Cordes
2026-04-30 00:36:44 +00:00
parent 3112260dc8
commit 6a6a4a6a46
3 changed files with 207 additions and 21 deletions

@@ -386,6 +386,16 @@ pub trait ExifDao: Sync + Send {
        hash: &str,
    ) -> Result<Vec<String>, DbError>;
    /// Batch version of [`Self::get_rel_paths_by_hash`]. Returns a
    /// `hash → Vec<rel_path>` map for every hash that has at least one
    /// rel_path. Used by the batch tag lookup endpoint to expand
    /// content-hash siblings without firing a query per hash.
    fn get_rel_paths_for_hashes(
        &mut self,
        context: &opentelemetry::Context,
        hashes: &[String],
    ) -> Result<std::collections::HashMap<String, Vec<String>>, DbError>;
    /// List `(library_id, rel_path)` pairs for the given libraries, optionally
    /// restricted to rows whose rel_path starts with `path_prefix`. When
    /// `library_ids` is empty, rows from every library are returned. Used by
@@ -956,6 +966,40 @@ impl ExifDao for SqliteExifDao {
        .map_err(|_| DbError::new(DbErrorKind::QueryError))
    }
    fn get_rel_paths_for_hashes(
        &mut self,
        context: &opentelemetry::Context,
        hashes: &[String],
    ) -> Result<std::collections::HashMap<String, Vec<String>>, DbError> {
        use std::collections::HashMap;
        let mut out: HashMap<String, Vec<String>> = HashMap::new();
        if hashes.is_empty() {
            return Ok(out);
        }
        trace_db_call(context, "query", "get_rel_paths_for_hashes", |_span| {
            use schema::image_exif::dsl::*;
            let mut connection = self.connection.lock().expect("Unable to get ExifDao");
            // Chunk the IN clause to stay safely under SQLite's
            // SQLITE_LIMIT_VARIABLE_NUMBER (32766 modern, 999 legacy).
            const CHUNK: usize = 500;
            for chunk in hashes.chunks(CHUNK) {
                let rows: Vec<(String, String)> = image_exif
                    .filter(content_hash.eq_any(chunk))
                    .select((content_hash.assume_not_null(), rel_path))
                    .distinct()
                    .load::<(String, String)>(connection.deref_mut())
                    .map_err(|_| anyhow::anyhow!("Query error"))?;
                for (hash, path) in rows {
                    out.entry(hash).or_default().push(path);
                }
            }
            Ok(out)
        })
        .map_err(|_| DbError::new(DbErrorKind::QueryError))
    }
    fn list_rel_paths_for_libraries(
        &mut self,
        context: &opentelemetry::Context,