hls: rewrite orphan cleanup for hash-keyed layout

The cleanup walk previously looked for `$VIDEO_PATH/<basename>.m3u8`
and matched each file's stem against a recursive walk of every
library. With the hash-keyed layout now in place, every playlist's
file_stem is the literal string "playlist" — the old logic would
treat every hash-keyed playlist as orphaned on its next run and wipe
them all in one tick (default cleanup interval is 24h, so this is a
24-hour bomb on top of the prior commit).

New approach: orphan-ness is decided in the database, not on the
filesystem. The cleanup loop:

- Snapshots every distinct non-NULL `image_exif.content_hash` into a
  HashSet (new `ExifDao::list_distinct_content_hashes` method —
  `SELECT DISTINCT content_hash WHERE content_hash IS NOT NULL`).
- Walks `$VIDEO_PATH` two levels deep: top-level entries are filtered
  to 2-char lowercase hex shard dirs, each shard's children to 64-char
  hex hash dirs. Anything else (legacy `.m3u8` at root from the
  pre-content-hash era, operator-stashed dirs, partial writes) is left
  alone.
- Hash dirs whose hash isn't in the alive set are `remove_dir_all`'d.
  Shard dirs that emptied as a result are reaped on the same pass via
  `remove_dir` (no-op if non-empty).
- The library-stale safety gate is preserved: a stale library skips
  the cycle even though the orphan decision is DB-only, because the
  upstream missing-file scan that retires `image_exif` rows itself
  pauses for stale libraries. Belt-and-suspenders — keeping a hash
  dir for one extra 24h cycle is cheaper than wiping one whose source
  was briefly unreachable. The gate now also filters disabled
  libraries out of the stale set (they're intentionally absent from
  the health map).
- The legacy `excluded_dirs` parameter is preserved on the function
  signature but unused (the walk no longer crosses library trees);
  flagged with a leading underscore. Callers in `main.rs` stay
  unchanged.

`MockExifDao` in `files.rs` grows the new method (returns empty);
unit tests for the new `is_hash_shard` / `is_full_hash` validators
guard against an operator's stashed directory under VIDEO_PATH ever
matching the orphan-rm path. Both pass.

A follow-up commit handles the one-shot startup migration that
retires the legacy basename-keyed `.m3u8` / `.ts` files at
`$VIDEO_PATH` root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Cameron Cordes
2026-05-14 15:41:04 -04:00
parent d1667099c3
commit b8e17e05b7
3 changed files with 220 additions and 115 deletions

View File

@@ -414,6 +414,16 @@ pub trait ExifDao: Sync + Send {
size_bytes: i64,
) -> Result<(), DbError>;
/// Every distinct non-NULL `content_hash` across all libraries. Used
/// by HLS orphan cleanup to identify hash dirs under `$VIDEO_PATH`
/// whose source video no longer exists. Cheap query (single column,
/// indexed) but unbounded in size — the result is a HashSet membership
/// check, so a 100k-photo library produces ~100k strings.
fn list_distinct_content_hashes(
&mut self,
context: &opentelemetry::Context,
) -> Result<Vec<String>, DbError>;
/// Return image_exif rows that need their `date_taken` resolved by the
/// canonical-date waterfall (see `crate::date_resolver`): `date_taken
/// IS NULL`. Returns `(library_id, rel_path)`. The caller filters to
@@ -1231,6 +1241,26 @@ impl ExifDao for SqliteExifDao {
.map_err(|_| DbError::new(DbErrorKind::UpdateError))
}
fn list_distinct_content_hashes(
&mut self,
context: &opentelemetry::Context,
) -> Result<Vec<String>, DbError> {
trace_db_call(context, "query", "list_distinct_content_hashes", |_span| {
use schema::image_exif::dsl::*;
let mut connection = self.connection.lock().expect("Unable to get ExifDao");
image_exif
.filter(content_hash.is_not_null())
.select(content_hash)
.distinct()
.load::<Option<String>>(connection.deref_mut())
.map(|rows| rows.into_iter().flatten().collect())
.map_err(|_| anyhow::anyhow!("Query error"))
})
.map_err(|_| DbError::new(DbErrorKind::QueryError))
}
fn get_rows_needing_date_backfill(
&mut self,
context: &opentelemetry::Context,