The hash-keyed `/video/hls/{hash}/{file}` route fully covers HLS
playback now and both clients (Apollo, FileViewer-React) have
shipped updates that use it directly. Keeping the basename-keyed
fallback only encouraged stale URLs to keep flowing — every legacy
file was deleted by the startup migration, so the routes were
guaranteed 404 machines.
Dropped:
- `stream_video` handler (`GET /video/stream?path=…`) — the original
basename-keyed playlist serve.
- `get_video_part` handler (`GET /video/{path}`) — bare-filename
segment serve. The new layout's segments live in
`<shard>/<hash>/segment_NNN.ts` and reach the client via
`stream_hls_file`.
- `legacy_path` field on `GenerateVideoResponse` (serialised as
`playlist`). The field always pointed at a file the migration had
deleted; current clients ignore it entirely.
- Their service registrations in `main.rs`.
- The body-side `filename` extraction in `generate_video` (existed
only to construct `legacy_path`) and the now-unused `global`
opentelemetry import in `handlers/video.rs`.
All 707 tests still pass. Same hand-rolled validators (`is_valid_hash`
/ `is_allowed_hls_filename`) keep the new route's defense-in-depth
intact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch four `&s[..N]` / `&s[..s.len().min(N)]` sites to
`chars().take(N).collect::<String>()` so truncation lands on character
boundaries instead of mid-codepoint. The agentic summary preview log
was panicking when generated content hit an em-dash at byte 200; the
few-shot passage cap, brief_json_args debug formatter, and a test
assertion message had the same latent bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure mechanical cleanup of accumulated drift in files outside the
HLS-content-hash branch's main change set. No behavior change.
- `cargo fmt` on every previously-misformatted file
(`ai/insight_generator.rs`, `database/knowledge_dao.rs`,
`faces.rs`, `knowledge.rs`, `libraries.rs`).
- `cargo clippy --fix`:
- `needless_borrow`: `&library` → `library` in `handlers/image.rs`
(two sites in the photo-listing path).
- Manual clippy pass for warnings clippy emits but can't auto-apply:
- `field_reassign_with_default` in `database/reconcile.rs::run` —
consolidated into a struct-literal initializer.
- `needless_range_loop` in `database/knowledge_dao.rs::union_perceptual_tags`
— inner `for b in (a+1)..indices.len() { let ib = indices[b]; ... }`
becomes `for &ib in &indices[a + 1..] { ... }`.
- Doc-list indentation: continuation lines under nested bullets in
`database/mod.rs::get_memories_in_window` and
`database/knowledge_dao.rs::build_entity_graph` realigned to the
list-item content column.
Deliberately not touched (each deserves its own focused commit, with
testing, rather than getting bundled into a sweep):
- 4× `deprecated count_distinct` in `faces.rs` — diesel API migration
to `AggregateExpressionMethods::aggregate_distinct` may shift result
types; needs verification against the existing stats queries.
- `await_holding_lock` in `knowledge.rs:807` — `std::sync::Mutex` held
across `ollama.generate(...).await`. Genuine concurrency bug; fix
requires understanding the surrounding flow before just dropping
the guard.
- 2× `type_complexity` in `database/mod.rs` — cosmetic, would need a
`type` alias and corresponding callers updated.
- Dead `total_deleted` on `library_maintenance::GcStats` and
`file_scan::enumerate_indexable_files` — both are public surface
retained for future use; deletion is a separate decision.
All 707 tests still pass. Release build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure mechanical pass on the files this branch added/modified:
rustfmt reflow of a few long lines / chains, and the one
non-pre-existing clippy warning — replacing
`&[rel_path.clone()]` with `std::slice::from_ref(&rel_path)` in
`handlers::video::generate_video` to avoid the alloc + clone for a
single-element slice.
All 707 tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The hash-keyed pipeline transcodes lazily, so a freshly mounted (or
freshly upgraded) library is "mostly pending" for the first hour
while the watcher works through the backlog. The operator wants a
live read on remaining work so they can tune `HLS_CONCURRENCY` and
know when to stop waiting.
Adds:
- `src/hls_stats.rs` — pure compute path (`stats_from_rows`) and an
Arc<Mutex<dyn ExifDao>> wrapper (`compute_and_publish`). Per
library: `total`, `with_playlist`, `pending`, `unsupported`,
`hashless_videos`. Dedup is by content_hash so duplicate-bytes-at-
N-paths counts once (same domain rule as `faces::stats`).
`hashless_videos` is a separate counter so the operator can see
the "hash backfill, then transcode" pipeline depth instead of
having NULL-hash rows just hide.
- Prometheus gauges labeled by library name:
`imageserver_hls_videos_total`, `..._with_playlist`, `..._pending`,
`..._unsupported`. Updated by the watcher at the end of every full-
scan tick *and* on every `/hls/stats` hit, so whichever surface the
operator is watching stays fresh. Registered in `main` alongside
the existing image/video gauges.
- `GET /hls/stats` — Claims-protected JSON snapshot of the same data
plus a top-level cross-library aggregate. Runs on a blocking pool
so it doesn't pin the actix worker; per-call cost is one
`list_paths_and_hashes_for_library` SQL query per library plus a
`stat()` per distinct video hash. Bounded — never invoked from
middleware, only from the explicit endpoint and the full-scan
tick. The watcher's end-of-tick `info!` summary line mirrors the
endpoint output for operators tailing the log.
- New `ExifDao::list_paths_and_hashes_for_library` method:
`SELECT rel_path, content_hash FROM image_exif WHERE library_id =
?`. Single round-trip; callers filter to video extensions
client-side because the schema doesn't carry media-type. Mock
impl in `files.rs` returns an empty vec.
Tests in `hls_stats::tests` exercise stats_from_rows directly (videos-
only filter, hash dedup, playlist vs sentinel decision, NULL-hash
hashless counting) plus a publish_gauges round-trip that reads the
gauge value back. Full suite (347 lib + 360 bin = 707) passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`POST /video/generate` is reshaped to return a JSON object instead of
a bare string. New fields:
- `playlist_url`: stable hash-keyed URL of the form
`/video/hls/<hash>/playlist.m3u8`. Use this with hls.js / native
players — relative segment refs inside the playlist resolve to
`/video/hls/<hash>/segment_NNN.ts` because the URL is path-based.
- `content_hash`: the blake3 hex digest that identifies the bytes.
Stable across libraries, archive ingests, renames; clients can
cache the URL by hash.
- `ready`: true iff the playlist file is already on disk. False means
a transcode was just queued; the client should retry the URL after
a short delay (or rely on hls.js's built-in retry).
- `playlist` (legacy): basename-keyed path string, echoed under the
old field name so clients that destructure `response.playlist` keep
working during the rollout. The startup migration deletes the
underlying file, so this URL will 404; clients should migrate to
`playlist_url`. Field is slated for removal once Apollo / File
Viewer ship the update.
The handler:
- resolves the source path across libraries (same logic as before),
- looks up `image_exif.content_hash` for that (library_id, rel_path),
- falls back to inline `content_hash::compute` when the row is mid-
backfill — pure read, no library mutation,
- sends a single-element `QueueVideosMessage` to `VideoPlaylistManager`
if the playlist isn't already on disk and there's no
`playlist.unsupported` sentinel,
- returns the URL immediately. The actor pipeline owns transcoding.
New route `GET /video/hls/{hash}/{file}`:
- strict validation: hash must be 64 ascii-hex chars; file must be
`playlist.m3u8` or `segment_NNN.ts` (digits only). Anything else
returns 400 so we never have to rely on path canonicalisation
alone to defend against traversal,
- belt-and-suspenders canonicalize() guard verifies the resolved
file lives under `$VIDEO_PATH`,
- serves with the standard `NamedFile::into_response` machinery.
Cleanup in `actors.rs`:
- `ProcessMessage` + its `StreamActor` handler had no senders after
the rewire — removed. `StreamActor` itself stays (still handles
`RefreshThumbnailsMessage` from `files.rs`).
- `create_playlist`, `playlist_file_for`,
`playlist_unsupported_sentinel` are gone — the legacy on-demand
transcode helper and the migration-only path helpers had no
remaining users (the migration uses its own classify() function).
- Imports tightened: dropped `Child`, `ExitStatus`, `trace`.
Tests cover both new validators (`is_valid_hash`,
`is_allowed_hls_filename`) including the strings that motivated the
defence-in-depth (traversal attempts, internal `.tmp`/`.unsupported`
artifacts, malformed segment names).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `video::legacy_migration::retire_legacy_hls_output`, called once
from `main` right after the diesel migrations run and before the
actor pipeline starts. Walks `$VIDEO_PATH` at depth 1, deletes every
`.m3u8` / `.m3u8.tmp` / `.m3u8.unsupported` / `.ts` file at root, and
logs a single info line with per-class counts. Skips directories
(the new layout's `<shard>/<hash>/` lives there) and unknown
extensions, so an operator's stashed README or `.tmp` from a
different tool is safe.
Why this needs its own one-shot pass rather than letting the rewritten
`cleanup_orphaned_playlists` handle it: the cleanup walk deliberately
only looks at `<shard>/<hash>/` dirs (so it can't accidentally `rm`
operator-stashed content), so without this migration the legacy files
would sit at root forever, never served, never refreshed. Operator
complaint count from the previous IMG_NNNN.MOV collision: ~10
duplicate-basename hits on one library alone; total .m3u8 count was
699 vs a much larger video count — i.e. the loser of every collision
was a permanent orphan. This pass collects all of them, then the
running watcher writes hash-keyed playlists going forward.
Idempotent — a second boot finds nothing and reports zero deletions,
so the call site can stay in `main` across releases until the module
is removed in a later cleanup commit. Tests cover the happy path
(legacy artifacts gone, hash dir untouched, unrelated files left
alone), idempotency, and the missing-directory case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cleanup walk previously looked for `$VIDEO_PATH/<basename>.m3u8`
and matched each file's stem against a recursive walk of every
library. With the hash-keyed layout now in place, every playlist's
file_stem is the literal string "playlist" — the old logic would
treat every hash-keyed playlist as orphaned on its next run and wipe
them all in one tick (default cleanup interval is 24h, so this is a
24-hour bomb on top of the prior commit).
New approach: orphan-ness is decided in the database, not on the
filesystem. The cleanup loop:
- Snapshots every distinct non-NULL `image_exif.content_hash` into a
HashSet (new `ExifDao::list_distinct_content_hashes` method —
`SELECT DISTINCT content_hash WHERE content_hash IS NOT NULL`).
- Walks `$VIDEO_PATH` two levels deep: top-level entries are filtered
to 2-char lowercase hex shard dirs, each shard's children to 64-char
hex hash dirs. Anything else (legacy `.m3u8` at root from the
pre-content-hash era, operator-stashed dirs, partial writes) is left
alone.
- Hash dirs whose hash isn't in the alive set are `remove_dir_all`'d.
Shard dirs that emptied as a result are reaped on the same pass via
`remove_dir` (no-op if non-empty).
- The library-stale safety gate is preserved: a stale library skips
the cycle even though the orphan decision is DB-only, because the
upstream missing-file scan that retires `image_exif` rows itself
pauses for stale libraries. Belt-and-suspenders — keeping a hash
dir for one extra 24h cycle is cheaper than wiping one whose source
was briefly unreachable. The gate now also filters disabled
libraries out of the stale set (they're intentionally absent from
the health map).
- The legacy `excluded_dirs` parameter is preserved on the function
signature but unused (the walk no longer crosses library trees);
flagged with a leading underscore. Callers in `main.rs` stay
unchanged.
`MockExifDao` in `files.rs` grows the new method (returns empty);
unit tests for the new `is_hash_shard` / `is_full_hash` validators
guard against an operator's stashed directory under VIDEO_PATH ever
matching the orphan-rm path. Both pass.
A follow-up commit handles the one-shot startup migration that
retires the legacy basename-keyed `.m3u8` / `.ts` files at
`$VIDEO_PATH` root.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches the watcher → VideoPlaylistManager → PlaylistGenerator path
from the basename-keyed layout
(`$VIDEO_PATH/{basename}.m3u8`) to the hash-keyed layout
(`$VIDEO_PATH/{hash[..2]}/{hash}/playlist.m3u8`) introduced in the
prior commit. Source videos that share a basename across libraries
(or across subdirs of one library) no longer overwrite each other's
playlists. The legacy HTTP endpoints in `/video/generate` /
`/video/stream` still use the basename layout — those move in a
follow-up commit alongside the stable streaming URL.
actors.rs:
- `QueueVideosMessage.video_paths: Vec<PathBuf>` →
`videos: Vec<VideoToQueue>`. The queue handler dedups against the
hash-keyed playlist + sentinel and forwards `GeneratePlaylistMessage`
carrying the hash.
- `GeneratePlaylistMessage` now carries `content_hash: String`; the
legacy `playlist_path: String` field is gone.
- `PlaylistGenerator` takes a `video_dir: PathBuf` at construction,
computes the hash dir + playlist + sentinel + segment template via
`hls_paths`, `mkdir -p`s the shard/hash dir before ffmpeg runs, and
cleans up partial output on failure by walking the hash dir.
- `ScanDirectoryMessage` and its handler are retired entirely; their
startup-walk role is taken over by the watcher's first tick (see
`watcher.rs` below). Dropping it avoids threading an `ExifDao` into
`VideoPlaylistManager` just so the actor can resolve hashes.
- Legacy `playlist_file_for` / `playlist_unsupported_sentinel` are
retained behind `#[allow(dead_code)]` for the upcoming migration
pass that retires pre-content-hash output.
watcher.rs:
- `process_new_files` keeps `content_hash` in the EXIF-batch result
(formerly threw it away). Videos with `image_exif.content_hash =
NULL` — mid-backfill rows — are skipped this tick rather than
falling back to a basename-colliding playlist; they get picked up
after `backfill_unhashed_backlog` populates the hash on a
subsequent tick. Skipped count is logged at debug.
- The video staleness check now uses `hls_paths::playlist_for_hash`
instead of `$VIDEO_PATH/{basename}.m3u8`.
- `last_full_scan` initialises to `UNIX_EPOCH` so the watcher's first
tick is treated as a full scan. That covers the catch-up gap left
by removing `ScanDirectoryMessage` — every library's existing media
is checked once at watcher boot (≈60s after startup) instead of
waiting up to `WATCH_FULL_INTERVAL_SECONDS` (1h default).
main.rs: removes the `ScanDirectoryMessage` import and the per-library
`do_send` loop, with a comment pointing at the watcher's first-tick
behavior.
state.rs: `PlaylistGenerator::new` now takes the video dir.
Tests: existing `video::hls_paths` (4) and `watcher::tests` (4) pass.
The basename-keyed `/video/generate` endpoint still compiles and
serves; behavior change there is deferred to the follow-up commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundation for migrating HLS playlist output from basename-keyed
(`$VIDEO_PATH/{basename}.m3u8`) to content-hash-keyed
(`$VIDEO_PATH/{hash[..2]}/{hash}/playlist.m3u8`). The basename layout
collides whenever two source videos share a filename — common with
iPhone-style sequential naming (`IMG_NNNN.MOV`) across libraries — so
the loser's playlist gets overwritten and ffmpeg keeps re-queueing the
file every scan.
This commit adds the path layout and type plumbing without touching the
actor pipeline, watcher, or HTTP handlers yet:
- `src/video/hls_paths.rs`: `playlist_for_hash`, `sentinel_for_hash`,
`segment_template_for_hash` built on top of `content_hash::hls_dir`,
with constants for the filenames inside the hash dir. Unit tests
cover the sharded layout and the playlist/sentinel/segment paths
all landing in the same directory (so HLS relative refs resolve).
- `src/content_hash::hls_dir` un-deaded — was waiting for this branch.
- `VideoToQueue` struct in `actors.rs`: pairs a source path with its
content hash so callers that lack a hash (rows mid-backfill) skip
the video rather than fabricate one.
- `playlist_file_for` / `playlist_unsupported_sentinel` retained as
migration-only helpers — they're only needed by the one-shot startup
pass that retires pre-content-hash output.
Follow-ups (separate commits on this branch): wire `hls_paths` through
the queue handler + `PlaylistGenerator`, update the watcher's
`process_new_files` to build `VideoToQueue`, switch `/video/generate`
and `/video/stream` to resolve path→hash and return stable URLs, add
the legacy-layout migration, rewrite `cleanup_orphaned_playlists` for
the new dir shape, and surface progress via Prometheus + `/hls/stats`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>