multi-library: availability probe + scoped EXIF queries + collision fixes
Branch A of the multi-library data-model rollout. Three threads of
correctness/safety work that ship together, because mounting a second
library needs all three in place before it can land:
1. Library availability probe (libraries.rs, state.rs, main.rs)
New LibraryHealth (Online | Stale { reason, since }) and a shared
LibraryHealthMap on AppState. Probe checks root_path exists +
is_dir + readable + non-empty (relative to a "had_data" signal so
fresh mounts aren't downgraded). The watcher tick begins with a
refresh_health() per library; stale libraries skip ingest, the
hash backfill, and face-detection backlog drains for that tick.
The orphaned-playlist cleanup also gates on every library being
online — a missing source on a stale library is indistinguishable
from a transient unmount, and the cleanup is destructive.
/libraries now returns each library with its current health
state. Logs only on Online↔Stale transitions so a long outage
doesn't spam.
New ExifDao::count_for_library is the "had_data" signal.
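The gating described above can be sketched as follows. This is a hedged
illustration, not the real watcher tick in main.rs; `should_ingest` and
`cleanup_allowed` are hypothetical names invented here:

```rust
// Hypothetical sketch of the per-tick gating. The real LibraryHealth
// lives in libraries.rs; these free functions are illustrative only.
#[derive(Clone, Debug, PartialEq)]
enum LibraryHealth {
    Online,
    Stale { reason: String, since: i64 },
}

fn should_ingest(health: &LibraryHealth) -> bool {
    // Stale libraries skip ingest, the hash backfill, and the
    // face-detection backlog drain for this tick; reads still serve.
    matches!(health, LibraryHealth::Online)
}

fn cleanup_allowed(all: &[LibraryHealth]) -> bool {
    // Orphaned-playlist cleanup is destructive, so it requires every
    // library to be online: a missing source on a stale library is
    // indistinguishable from a transient unmount.
    all.iter().all(|h| matches!(h, LibraryHealth::Online))
}
```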
2. EXIF queries scoped by library_id (database/mod.rs, files.rs,
main.rs, tags.rs)
query_by_exif gains an Option<i32> library filter; /photos and
/photos/exif now pass it. Without this, an EXIF-filtered request
scoped to ?library=N returned cross-library results because the
handler resolved the library but didn't push it through to SQL.
get_exif_batch gains the same option. The watcher's per-library
ingest, face-candidate build, and content-hash backfill all
scope to their library; the union-mode /photos date-sort path
and the library-agnostic tag fan-out (lookup_tags_batch, by
design) keep using None.
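The fix amounts to pushing the resolved library into the query itself. A
minimal sketch as plain SQL-string assembly — the real code goes through
the project's query layer, and `camera_model` is an illustrative column,
not necessarily the real schema:

```rust
// Hedged sketch: thread an Option<i32> library filter into an EXIF
// query. `image_exif` is the table named elsewhere in this commit;
// the WHERE clause is illustrative.
fn exif_query_sql(library_id: Option<i32>) -> String {
    let mut sql =
        String::from("SELECT rel_path FROM image_exif WHERE camera_model = ?");
    if let Some(id) = library_id {
        // The bug: the handler resolved ?library=N but never pushed it
        // through to SQL, so results crossed libraries. Scope it here.
        sql.push_str(&format!(" AND library_id = {}", id));
    }
    sql
}
```

Callers that genuinely want union-mode results (the date-sort path, the
tag fan-out) simply pass `None` and keep the old behavior.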
3. Derivative-path collision fixes (content_hash.rs, main.rs)
New content_hash::library_scoped_legacy_path helper:
<derivative_dir>/<library_id>/<rel_path>. Thumbnail generation
(startup walk + watcher needs-thumb check) and serving now use
it; serving falls back to the bare-legacy mirrored path so
pre-multi-library deployments keep working without
regeneration. Without this, lib2 with the same rel_path as lib1
would have its thumbnail request short-circuit to lib1's image.
Orphaned-playlist cleanup walks every library when checking for
the source video (was: BASE_PATH only). Without this, mounting
a 2nd library and waiting 24h would delete every playlist whose
source lived only in the 2nd library.
The HLS playlist write path collision (filename-only basename,
not rel_path) is left as a known issue with a TODO at the call
site — the actor-pipeline rewrite belongs in Branch B/C.
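The path shape described above (<derivative_dir>/<library_id>/<rel_path>)
can be sketched as follows; the real helper lives in content_hash.rs and
its exact signature is an assumption here:

```rust
use std::path::PathBuf;

// Sketch of the library-scoped derivative path: keying by library_id
// keeps two libraries with identical rel_paths from colliding on the
// same thumbnail file. Signature is assumed, not copied from the repo.
fn library_scoped_legacy_path(derivative_dir: &str, library_id: i32, rel_path: &str) -> PathBuf {
    PathBuf::from(derivative_dir)
        .join(library_id.to_string())
        .join(rel_path)
}
```

Serving can still fall back to the bare legacy path when the scoped path
doesn't exist, which is how pre-multi-library deployments keep working
without regeneration.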
Tests: 212 pass (cargo test --lib). New tests cover the probe
states (online / missing root / non-dir / empty-with-prior-data),
refresh_health transitions, query_by_exif scoping, get_exif_batch
keying on (library_id, rel_path), library_scoped_legacy_path, and
count_for_library.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
 src/libraries.rs | 264

@@ -3,7 +3,9 @@ use chrono::Utc;
 use diesel::prelude::*;
 use diesel::sqlite::SqliteConnection;
+use log::{info, warn};
+use std::collections::HashMap;
 use std::path::{Path, PathBuf};
 use std::sync::{Arc, RwLock};
 
 use crate::data::Claims;
 use crate::database::models::{InsertLibrary, LibraryRow};
@@ -146,16 +148,165 @@ pub fn resolve_library_param<'a>(
         .ok_or_else(|| format!("unknown library name: {}", raw))
 }
 
+/// Health of a library at a point in time. Probed at the top of each
+/// file-watcher tick. The `Stale` state is the "be conservative" signal:
+/// destructive paths (ingest writes, future move-handoff and orphan GC in
+/// branches B/C) skip a stale library, but reads/serving stay unaffected.
+///
+/// See `CLAUDE.md` → "Library availability and safety" for the policy.
+#[derive(Clone, Debug, serde::Serialize, PartialEq, Eq)]
+#[serde(tag = "state", rename_all = "snake_case")]
+pub enum LibraryHealth {
+    Online,
+    Stale {
+        reason: String,
+        /// Unix timestamp (seconds) of the most recent transition into
+        /// Stale. Held for telemetry / `/libraries` surfacing only —
+        /// gating logic doesn't read it.
+        since: i64,
+    },
+}
+
+impl LibraryHealth {
+    pub fn is_online(&self) -> bool {
+        matches!(self, LibraryHealth::Online)
+    }
+}
+
+/// Shared snapshot of every configured library's health, keyed by
+/// `library_id`. The watcher writes; HTTP handlers read. RwLock because
+/// reads vastly outnumber writes (one tick vs. every status request).
+pub type LibraryHealthMap = Arc<RwLock<HashMap<i32, LibraryHealth>>>;
+
+/// Construct an initial health map. Libraries start `Online`; the first
+/// probe will downgrade any that fail. Starting `Stale` would block ingest
+/// for the watcher's first tick on a healthy mount, which is the wrong
+/// default for a server that's just been restarted.
+pub fn new_health_map(libs: &[Library]) -> LibraryHealthMap {
+    let mut m = HashMap::with_capacity(libs.len());
+    for lib in libs {
+        m.insert(lib.id, LibraryHealth::Online);
+    }
+    Arc::new(RwLock::new(m))
+}
+
+/// Probe a library's mount point. Cheap: stat + open dir + peek one entry.
+///
+/// `had_data` is the caller's prior knowledge that this library has been
+/// non-empty before — typically `image_exif` row count > 0. When true, an
+/// empty directory is suspicious (it's how an unmounted NFS share looks);
+/// when false, it's accepted as a fresh mount that simply hasn't been
+/// indexed yet.
+///
+/// Note: stat / read_dir on a hard-mounted, unreachable NFS share can
+/// block. The watcher accepts that risk for now — the worst case is that
+/// the tick stalls until the mount returns, which is no more destructive
+/// than the pre-probe behavior. A future enhancement can wrap this in a
+/// thread + timeout if it becomes an operational issue.
+pub fn probe_online(lib: &Library, had_data: bool) -> LibraryHealth {
+    let now = Utc::now().timestamp();
+    let path = Path::new(&lib.root_path);
+
+    let metadata = match std::fs::metadata(path) {
+        Ok(m) => m,
+        Err(e) => {
+            return LibraryHealth::Stale {
+                reason: format!("root_path stat failed: {}", e),
+                since: now,
+            };
+        }
+    };
+    if !metadata.is_dir() {
+        return LibraryHealth::Stale {
+            reason: format!("root_path is not a directory: {}", lib.root_path),
+            since: now,
+        };
+    }
+
+    let mut entries = match std::fs::read_dir(path) {
+        Ok(it) => it,
+        Err(e) => {
+            return LibraryHealth::Stale {
+                reason: format!("read_dir failed: {}", e),
+                since: now,
+            };
+        }
+    };
+
+    // Empty directory only counts as Stale when we have prior evidence
+    // this library used to have content. A genuinely fresh mount is
+    // legitimately empty, and degrading it would block first-time ingest.
+    if had_data && entries.next().is_none() {
+        return LibraryHealth::Stale {
+            reason: "library is empty but image_exif has rows for it".to_string(),
+            since: now,
+        };
+    }
+
+    LibraryHealth::Online
+}
+
+/// Probe `lib`, update `map`, and return the new state. Logs only on a
+/// state transition (Online↔Stale) so a long outage doesn't spam at every
+/// tick — operators get one warn on the way down and one info on the way
+/// up.
+pub fn refresh_health(map: &LibraryHealthMap, lib: &Library, had_data: bool) -> LibraryHealth {
+    let new_state = probe_online(lib, had_data);
+    let mut guard = map.write().unwrap_or_else(|e| e.into_inner());
+    let prev = guard.get(&lib.id).cloned();
+    let transitioned = matches!(
+        (&prev, &new_state),
+        (None, LibraryHealth::Stale { .. })
+            | (Some(LibraryHealth::Online), LibraryHealth::Stale { .. })
+            | (Some(LibraryHealth::Stale { .. }), LibraryHealth::Online)
+    );
+    if transitioned {
+        match &new_state {
+            LibraryHealth::Online => info!(
+                "Library '{}' (id={}) recovered: {} is online",
+                lib.name, lib.id, lib.root_path
+            ),
+            LibraryHealth::Stale { reason, .. } => warn!(
+                "Library '{}' (id={}) is STALE — pausing writes. Reason: {}. Path: {}",
+                lib.name, lib.id, reason, lib.root_path
+            ),
+        }
+    }
+    guard.insert(lib.id, new_state.clone());
+    new_state
+}
+
+/// Snapshot of one library + its current health, for `/libraries`.
+#[derive(serde::Serialize)]
+pub struct LibraryStatus {
+    #[serde(flatten)]
+    pub library: Library,
+    pub health: LibraryHealth,
+}
+
 #[derive(serde::Serialize)]
 pub struct LibrariesResponse {
-    pub libraries: Vec<Library>,
+    pub libraries: Vec<LibraryStatus>,
 }
 
 #[get("/libraries")]
 pub async fn list_libraries(_claims: Claims, app_state: Data<AppState>) -> impl Responder {
-    HttpResponse::Ok().json(LibrariesResponse {
-        libraries: app_state.libraries.clone(),
-    })
+    let health_guard = app_state
+        .library_health
+        .read()
+        .unwrap_or_else(|e| e.into_inner());
+    let libraries = app_state
+        .libraries
+        .iter()
+        .map(|lib| LibraryStatus {
+            library: lib.clone(),
+            health: health_guard
+                .get(&lib.id)
+                .cloned()
+                .unwrap_or(LibraryHealth::Online),
+        })
+        .collect();
+    HttpResponse::Ok().json(LibrariesResponse { libraries })
 }
 
 #[cfg(test)]
@@ -279,4 +430,109 @@ mod tests {
         let err = resolve_library_param(&state, Some("missing")).unwrap_err();
         assert!(err.contains("unknown library name"));
     }
+
+    #[test]
+    fn probe_online_for_existing_non_empty_dir() {
+        let tmp = tempfile::tempdir().unwrap();
+        std::fs::write(tmp.path().join("photo.jpg"), b"hello").unwrap();
+        let lib = Library {
+            id: 1,
+            name: "main".into(),
+            root_path: tmp.path().to_string_lossy().into(),
+        };
+        // had_data doesn't matter when the dir has entries.
+        assert!(probe_online(&lib, true).is_online());
+        assert!(probe_online(&lib, false).is_online());
+    }
+
+    #[test]
+    fn probe_stale_when_root_missing() {
+        let lib = Library {
+            id: 1,
+            name: "main".into(),
+            root_path: "/nonexistent/definitely/not/here".into(),
+        };
+        assert!(matches!(
+            probe_online(&lib, false),
+            LibraryHealth::Stale { .. }
+        ));
+    }
+
+    #[test]
+    fn probe_stale_when_root_is_a_file() {
+        let tmp = tempfile::tempdir().unwrap();
+        let file = tmp.path().join("not-a-dir");
+        std::fs::write(&file, b"x").unwrap();
+        let lib = Library {
+            id: 1,
+            name: "main".into(),
+            root_path: file.to_string_lossy().into(),
+        };
+        assert!(matches!(
+            probe_online(&lib, false),
+            LibraryHealth::Stale { .. }
+        ));
+    }
+
+    #[test]
+    fn probe_empty_dir_is_online_when_no_prior_data() {
+        // Fresh mount: empty directory, no rows in image_exif. Accept it.
+        let tmp = tempfile::tempdir().unwrap();
+        let lib = Library {
+            id: 1,
+            name: "main".into(),
+            root_path: tmp.path().to_string_lossy().into(),
+        };
+        assert!(probe_online(&lib, false).is_online());
+    }
+
+    #[test]
+    fn probe_empty_dir_is_stale_when_prior_data_existed() {
+        // The "share went offline" signal: directory exists but is empty,
+        // and we know the library used to have content. Treat as Stale.
+        let tmp = tempfile::tempdir().unwrap();
+        let lib = Library {
+            id: 1,
+            name: "main".into(),
+            root_path: tmp.path().to_string_lossy().into(),
+        };
+        match probe_online(&lib, true) {
+            LibraryHealth::Stale { reason, .. } => {
+                assert!(reason.contains("empty"), "unexpected reason: {}", reason)
+            }
+            other => panic!("expected Stale, got {:?}", other),
+        }
+    }
+
+    #[test]
+    fn refresh_health_logs_only_on_transition() {
+        // Smoke test: refresh_health updates the map and reports correctly.
+        // (We can't easily assert on logs without a custom logger; the
+        // important thing is that the state churns properly.)
+        let tmp = tempfile::tempdir().unwrap();
+        let lib = Library {
+            id: 42,
+            name: "test".into(),
+            root_path: tmp.path().to_string_lossy().into(),
+        };
+        let map = new_health_map(&[lib.clone()]);
+
+        // First probe: empty dir, no prior data — Online.
+        let s1 = refresh_health(&map, &lib, false);
+        assert!(s1.is_online());
+
+        // Probe again with had_data=true on the same empty dir — Stale.
+        let s2 = refresh_health(&map, &lib, true);
+        assert!(matches!(s2, LibraryHealth::Stale { .. }));
+        assert_eq!(
+            map.read().unwrap().get(&lib.id).cloned(),
+            Some(s2.clone()),
+            "map should reflect the latest probe"
+        );
+
+        // Recovery: drop a file and probe again.
+        std::fs::write(tmp.path().join("photo.jpg"), b"x").unwrap();
+        let s3 = refresh_health(&map, &lib, true);
+        assert!(s3.is_online());
+    }
 }