multi-library: availability probe + scoped EXIF queries + collision fixes

Branch A of the multi-library data-model rollout. Three threads of
correctness/safety work that ship together because the new mount
needs all three before it can land:

1. Library availability probe (libraries.rs, state.rs, main.rs)

   New LibraryHealth (Online | Stale { reason, since }) and a shared
   LibraryHealthMap on AppState. The probe checks that root_path
   exists, is a directory, is readable, and is non-empty (the
   emptiness check is gated on a "had_data" signal so fresh,
   still-empty mounts aren't downgraded). The watcher tick begins
   with a
   refresh_health() per library; stale libraries skip ingest, the
   hash backfill, and face-detection backlog drains for that tick.
   The orphaned-playlist cleanup also gates on every library being
   online — a missing source on a stale library is indistinguishable
   from a transient unmount, and the cleanup is destructive.

   /libraries now returns each library with its current health
   state. Health changes are logged only on Online↔Stale transitions,
   so a long outage doesn't spam the log.

   New ExifDao::count_for_library is the "had_data" signal.
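
   A minimal sketch of the probe surface, assuming std-only checks
   (the real refresh_health takes the Library itself and also owns
   the transition logging; the id/root split and the reason strings
   here are illustrative):

      use std::collections::HashMap;
      use std::path::Path;
      use std::sync::{Arc, RwLock};
      use std::time::SystemTime;

      #[derive(Clone, Debug, PartialEq)]
      pub enum LibraryHealth {
          Online,
          Stale { reason: String, since: SystemTime },
      }

      impl LibraryHealth {
          pub fn is_online(&self) -> bool {
              matches!(self, LibraryHealth::Online)
          }
      }

      pub type LibraryHealthMap = Arc<RwLock<HashMap<i32, LibraryHealth>>>;

      /// Probe one library root and record the result. `had_data`
      /// (ExifDao::count_for_library > 0) keeps a fresh, still-empty
      /// mount from being downgraded just for being empty.
      pub fn refresh_health(
          map: &LibraryHealthMap,
          library_id: i32,
          root: &Path,
          had_data: bool,
      ) -> LibraryHealth {
          let reason = if !root.exists() {
              Some("root path does not exist".to_string())
          } else if !root.is_dir() {
              Some("root path is not a directory".to_string())
          } else {
              match std::fs::read_dir(root) {
                  Err(e) => Some(format!("root path is unreadable: {e}")),
                  Ok(mut entries) => {
                      if had_data && entries.next().is_none() {
                          Some("root is empty but library has data".to_string())
                      } else {
                          None
                      }
                  }
              }
          };
          let health = match reason {
              None => LibraryHealth::Online,
              Some(reason) => LibraryHealth::Stale {
                  reason,
                  since: SystemTime::now(),
              },
          };
          let mut guard = map.write().unwrap_or_else(|e| e.into_inner());
          guard.insert(library_id, health.clone());
          health
      }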

2. EXIF queries scoped by library_id (database/mod.rs, files.rs,
   main.rs, tags.rs)

   query_by_exif gains an Option<i32> library filter; /photos and
   /photos/exif now pass it. Without this, an EXIF-filtered request
   scoped to ?library=N returned cross-library results because the
   handler resolved the library but didn't push it through to SQL.

   get_exif_batch gains the same option. The watcher's per-library
   ingest, face-candidate build, and content-hash backfill all
   scope to their library; the union-mode /photos date-sort path
   and the library-agnostic tag fan-out (lookup_tags_batch, by
   design) keep using None.
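
   The scoping itself is mechanical; a self-contained sketch of the
   idea (the real query_by_exif lives on the DAO, takes an
   opentelemetry Context plus the full EXIF filter set, and the
   column names here are assumptions):

      /// Some(id): the handler resolved ?library=N, so the filter
      /// reaches SQL. None: the union-mode /photos date sort and the
      /// tag fan-out (lookup_tags_batch) stay cross-library by design.
      fn exif_query_sql(library_id: Option<i32>) -> String {
          let mut sql = String::from(
              "SELECT file_path FROM image_exif WHERE camera_model = ?",
          );
          if let Some(id) = library_id {
              sql.push_str(&format!(" AND library_id = {id}"));
          }
          sql
      }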

3. Derivative-path collision fixes (content_hash.rs, main.rs)

   New content_hash::library_scoped_legacy_path helper (sketched at
   the end of this section): <derivative_dir>/<library_id>/<rel_path>.
   Thumbnail generation
   (startup walk + watcher needs-thumb check) and serving now use
   it; serving falls back to the bare-legacy mirrored path so
   pre-multi-library deployments keep working without
   regeneration. Without this, lib2 with the same rel_path as lib1
   would have its thumbnail request short-circuit to lib1's image.

   Orphaned-playlist cleanup walks every library when checking for
   the source video (was: BASE_PATH only). Without this, mounting
   a 2nd library and waiting 24h would delete every playlist whose
   source lived only in the 2nd library.

   The HLS playlist write path collision (filename-only basename,
   not rel_path) is left as a known issue with a TODO at the call
   site — the actor-pipeline rewrite belongs in Branch B/C.
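
   The path helper from the first paragraph of this section is small
   enough to sketch in full (a guess at the body; the real one lives
   in content_hash.rs and is called with (dir, library.id,
   relative_path) in the diff below):

      use std::path::{Path, PathBuf};

      /// Sketch of the <derivative_dir>/<library_id>/<rel_path>
      /// layout described above.
      pub fn library_scoped_legacy_path(
          derivative_dir: &Path,
          library_id: i32,
          rel_path: &Path,
      ) -> PathBuf {
          derivative_dir.join(library_id.to_string()).join(rel_path)
      }

   With that shape, lib1 and lib2 both holding 2024/beach.jpg map to
   <thumbs>/1/2024/beach.jpg and <thumbs>/2/2024/beach.jpg instead of
   one request short-circuiting to the other's image.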

Tests: 212 pass (cargo test --lib). New tests cover the probe
states (online / missing root / non-dir / empty-with-prior-data),
refresh_health transitions, query_by_exif scoping, get_exif_batch
keying on (library_id, rel_path), library_scoped_legacy_path, and
count_for_library.
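
An illustrative test in the spirit of the new coverage (names match
the sketches above rather than the real test modules):

    #[test]
    fn scoped_legacy_path_keeps_libraries_apart() {
        use std::path::{Path, PathBuf};
        let a = library_scoped_legacy_path(
            Path::new("/thumbs"), 1, Path::new("2024/beach.jpg"));
        let b = library_scoped_legacy_path(
            Path::new("/thumbs"), 2, Path::new("2024/beach.jpg"));
        assert_ne!(a, b);
        assert_eq!(a, PathBuf::from("/thumbs/1/2024/beach.jpg"));
    }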

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Author: Cameron Cordes
Date:   2026-05-01 14:12:49 +00:00
Parent: 2f91891459
Commit: eea1bf3181
7 changed files with 622 additions and 51 deletions

@@ -150,7 +150,12 @@ async fn get_image(
let relative_path_str = relative_path.to_string_lossy().replace('\\', "/");
let thumbs = &app_state.thumbnail_path;
let legacy_thumb_path = Path::new(&thumbs).join(relative_path);
let bare_legacy_thumb_path = Path::new(&thumbs).join(relative_path);
let scoped_legacy_thumb_path = content_hash::library_scoped_legacy_path(
Path::new(&thumbs),
library.id,
relative_path,
);
// Gif thumbnails are a separate lookup (video GIF previews).
// Dual-lookup for gif is out of scope; preserve existing flow.
@@ -168,8 +173,16 @@ async fn get_image(
}
}
// Resolve the hash-keyed thumbnail (if the row already has a
// content_hash) and fall back to the legacy mirrored path.
// Lookup chain (most-specific first, falling back as we miss):
// 1. hash-keyed (`<thumbs>/<hash[..2]>/<hash>.jpg`) — content
// identity, shared across libraries;
// 2. library-scoped legacy (`<thumbs>/<lib_id>/<rel_path>`) —
// written by current generation when hash isn't known;
// 3. bare legacy (`<thumbs>/<rel_path>`) — pre-multi-library
// thumbs from the days before library prefixing existed.
// Stage (3) goes away once a one-time migration lifts every
// bare-legacy file under a library prefix; until then it
// prevents needless 404s for already-warmed deployments.
let hash_thumb_path: Option<PathBuf> = {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
match dao.get_exif(&context, &relative_path_str) {
@@ -184,7 +197,14 @@ async fn get_image(
.as_ref()
.filter(|p| p.exists())
.cloned()
.unwrap_or_else(|| legacy_thumb_path.clone());
.or_else(|| {
if scoped_legacy_thumb_path.exists() {
Some(scoped_legacy_thumb_path.clone())
} else {
None
}
})
.unwrap_or_else(|| bare_legacy_thumb_path.clone());
// Handle circular thumbnail request
if req.shape == Some(ThumbnailShape::Circle) {
@@ -761,6 +781,15 @@ async fn generate_video(
if let Some(name) = filename.file_name() {
let filename = name.to_str().expect("Filename should convert to string");
// KNOWN ISSUE (multi-library): playlist filename is the basename
// alone, so two source files with the same basename — whether in
// different libraries or different subdirs of one library —
// overwrite each other's playlists while ffmpeg runs. The
// hash-keyed `content_hash::hls_dir` is the long-term answer
// (see CLAUDE.md "Multi-library data model"); rewiring the
// actor pipeline to use it is out of scope for this branch.
// The orphan-cleanup job above already walks every library so
// it doesn't false-delete archive playlists.
let playlist = format!("{}/{}.m3u8", app_state.video_path, filename);
let library = libraries::resolve_library_param(&app_state, body.library.as_deref())
@@ -1315,9 +1344,27 @@ fn create_thumbnails(libs: &[libraries::Library], excluded_dirs: &[String]) {
let Ok(relative_path) = src.strip_prefix(&images) else {
return;
};
let thumb_path = Path::new(thumbnail_directory).join(relative_path);
// Library-scoped legacy path: prevents two libraries with
// the same rel_path from clobbering each other's thumbs.
// Hash-keyed promotion happens lazily on first hash-aware
// request — keeping this loop ExifDao-free preserves the
// current "cargo build && go" startup story.
let thumb_path = content_hash::library_scoped_legacy_path(
thumbnail_directory,
lib.id,
relative_path,
);
let bare_legacy = thumbnail_directory.join(relative_path);
if thumb_path.exists() || unsupported_thumbnail_sentinel(&thumb_path).exists() {
// Backwards-compat check: if a single-library install has a
// bare-legacy thumb here already, accept it as present.
// Same for the sentinel. Means we don't redo work after
// upgrade and we don't leave stale duplicates around.
if thumb_path.exists()
|| bare_legacy.exists()
|| unsupported_thumbnail_sentinel(&thumb_path).exists()
|| unsupported_thumbnail_sentinel(&bare_legacy).exists()
{
return;
}
@@ -1462,10 +1509,18 @@ fn main() -> std::io::Result<()> {
preview_gen_for_watcher,
app_state.face_client.clone(),
app_state.excluded_dirs.clone(),
app_state.library_health.clone(),
);
// Start orphaned playlist cleanup job
cleanup_orphaned_playlists(app_state.excluded_dirs.clone());
// Start orphaned playlist cleanup job. Multi-library aware: walks
// every configured library when looking for the source video, and
// skips the whole cycle while any library is stale (a missing
// source is indistinguishable from a transiently-unmounted share).
cleanup_orphaned_playlists(
app_state.libraries.clone(),
app_state.excluded_dirs.clone(),
app_state.library_health.clone(),
);
// Spawn background job to generate daily conversation summaries
{
@@ -1657,10 +1712,13 @@ fn run_migrations(
}
/// Clean up orphaned HLS playlists and segments whose source videos no longer exist
fn cleanup_orphaned_playlists(excluded_dirs: Vec<String>) {
fn cleanup_orphaned_playlists(
libs: Vec<libraries::Library>,
excluded_dirs: Vec<String>,
library_health: libraries::LibraryHealthMap,
) {
std::thread::spawn(move || {
let video_path = dotenv::var("VIDEO_PATH").expect("VIDEO_PATH must be set");
let base_path = dotenv::var("BASE_PATH").expect("BASE_PATH must be set");
// Get cleanup interval from environment (default: 24 hours)
let cleanup_interval_secs = dotenv::var("PLAYLIST_CLEANUP_INTERVAL_SECONDS")
@@ -1671,10 +1729,41 @@ fn cleanup_orphaned_playlists(excluded_dirs: Vec<String>) {
info!("Starting orphaned playlist cleanup job");
info!(" Cleanup interval: {} seconds", cleanup_interval_secs);
info!(" Playlist directory: {}", video_path);
for lib in &libs {
info!(" Checking sources under '{}' at {}", lib.name, lib.root_path);
}
loop {
std::thread::sleep(Duration::from_secs(cleanup_interval_secs));
// Safety gate: skip the cleanup cycle if any library is
// stale. A missing source video on a stale library is
// indistinguishable from a transient unmount, and the
// cleanup is destructive — we'd rather leak a few playlist
// files for a tick than delete one whose source is briefly
// unreachable. The cycle re-runs on the next interval.
{
let guard = library_health.read().unwrap_or_else(|e| e.into_inner());
let stale: Vec<String> = libs
.iter()
.filter(|lib| {
guard
.get(&lib.id)
.map(|h| !h.is_online())
.unwrap_or(false)
})
.map(|lib| lib.name.clone())
.collect();
if !stale.is_empty() {
warn!(
"Skipping orphaned-playlist cleanup: {} library(ies) stale: [{}]",
stale.len(),
stale.join(", ")
);
continue;
}
}
info!("Running orphaned playlist cleanup");
let start = std::time::Instant::now();
let mut deleted_count = 0;
@@ -1703,20 +1792,25 @@ fn cleanup_orphaned_playlists(excluded_dirs: Vec<String>) {
if let Some(filename) = playlist_path.file_stem() {
let video_filename = filename.to_string_lossy();
// Search for this video file in BASE_PATH, respecting
// EXCLUDED_DIRS so we don't false-resurrect playlists for
// videos that only exist inside an excluded subtree.
// Search for this video file across every configured
// library, respecting EXCLUDED_DIRS so we don't
// false-resurrect playlists for videos that only
// exist inside an excluded subtree. As soon as one
// library has a matching source, we're done — the
// playlist isn't orphaned.
let mut video_exists = false;
for entry in image_api::file_scan::walk_library_files(
Path::new(&base_path),
&excluded_dirs,
) {
if let Some(entry_stem) = entry.path().file_stem()
&& entry_stem == filename
&& is_video_file(entry.path())
{
video_exists = true;
break;
'libs: for lib in &libs {
for entry in image_api::file_scan::walk_library_files(
Path::new(&lib.root_path),
&excluded_dirs,
) {
if let Some(entry_stem) = entry.path().file_stem()
&& entry_stem == filename
&& is_video_file(entry.path())
{
video_exists = true;
break 'libs;
}
}
}
@@ -1792,6 +1886,7 @@ fn watch_files(
preview_generator: Addr<video::actors::PreviewClipGenerator>,
face_client: crate::ai::face_client::FaceClient,
excluded_dirs: Vec<String>,
library_health: libraries::LibraryHealthMap,
) {
std::thread::spawn(move || {
// Get polling intervals from environment variables
@@ -1861,6 +1956,31 @@ fn watch_files(
let is_full_scan = since_last_full.as_secs() >= full_interval_secs;
for lib in &libs {
// Availability probe: every tick checks that the
// library's mount is reachable, is a directory, is
// readable, and (if image_exif has rows for it) is
// non-empty. A Stale library skips ingest, backlog
// drains, and metric refresh — reads/serving in HTTP
// handlers continue to work. Branches B/C extend the
// probe gate to cover handoff and orphan GC. See
// CLAUDE.md "Library availability and safety".
let had_data = {
let context = opentelemetry::Context::new();
let mut guard = exif_dao.lock().expect("exif_dao poisoned");
guard
.count_for_library(&context, lib.id)
.map(|n| n > 0)
.unwrap_or(false)
};
let health = libraries::refresh_health(&library_health, lib, had_data);
if !health.is_online() {
// Skip every write path for this library this tick.
// Don't refresh the media-count gauge either — a
// probe-failed library would otherwise flap to 0
// image / 0 video and pollute Prometheus.
continue;
}
// Drain the unhashed-hash backlog AND the face-detection
// backlog every tick, regardless of quick/full. Quick
// scans only walk recently-modified files, so the
@@ -1992,7 +2112,9 @@ fn process_new_files(
let existing_exif_paths: HashMap<String, bool> = {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
match dao.get_exif_batch(&context, &file_paths) {
// Walk is per-library, so scope the lookup so a same-named file
// in another library doesn't make this one look already-indexed.
match dao.get_exif_batch(&context, Some(library.id), &file_paths) {
Ok(exif_records) => exif_records
.into_iter()
.map(|record| (record.file_path, true))
@@ -2012,9 +2134,19 @@ fn process_new_files(
// derivative dedup and DB-indexed sort/filter work for every file,
// not just photos with parseable EXIF.
for (file_path, relative_path) in &files {
let thumb_path = thumbnail_directory.join(relative_path);
let needs_thumbnail =
!thumb_path.exists() && !unsupported_thumbnail_sentinel(&thumb_path).exists();
// Check both the library-scoped legacy path (current shape) and
// the bare-legacy path (pre-multi-library shape). Either one
// existing means a thumbnail is already on disk for this file.
let scoped_thumb_path = content_hash::library_scoped_legacy_path(
thumbnail_directory,
library.id,
relative_path,
);
let bare_legacy_thumb_path = thumbnail_directory.join(relative_path);
let needs_thumbnail = !scoped_thumb_path.exists()
&& !bare_legacy_thumb_path.exists()
&& !unsupported_thumbnail_sentinel(&scoped_thumb_path).exists()
&& !unsupported_thumbnail_sentinel(&bare_legacy_thumb_path).exists();
let needs_row = !existing_exif_paths.contains_key(relative_path);
if needs_thumbnail || needs_row {
@@ -2131,7 +2263,7 @@ fn process_new_files(
// ensures small/medium deploys self-heal without operator
// action.
backfill_missing_content_hashes(&context, &files, library, &exif_dao);
let candidates = build_face_candidates(&context, &files, &exif_dao, &face_dao);
let candidates = build_face_candidates(&context, library, &files, &exif_dao, &face_dao);
debug!(
"face_watch: scan tick — {} image file(s) walked, {} candidate(s) (library '{}', modified_since={})",
files.iter().filter(|(p, _)| !is_video_file(p)).count(),
@@ -2449,7 +2581,7 @@ fn backfill_missing_content_hashes(
let exif_records = {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
dao.get_exif_batch(context, &image_paths)
dao.get_exif_batch(context, Some(library.id), &image_paths)
.unwrap_or_default()
};
// Cheap lookup back from rel_path → absolute file_path so
@@ -2541,6 +2673,7 @@ fn backfill_missing_content_hashes(
/// covers both new uploads and the initial backlog scan.
fn build_face_candidates(
context: &opentelemetry::Context,
library: &libraries::Library,
files: &[(PathBuf, String)],
exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
face_dao: &Arc<Mutex<Box<dyn faces::FaceDao>>>,
@@ -2558,7 +2691,7 @@ fn build_face_candidates(
let exif_records = {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
dao.get_exif_batch(context, &image_paths)
dao.get_exif_batch(context, Some(library.id), &image_paths)
.unwrap_or_default()
};
// rel_path → content_hash (only rows with a hash; without one we have