feature/hls-content-hash #95
Reference in New Issue
Block a user
Delete Branch "feature/hls-content-hash"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Foundation for migrating HLS playlist output from basename-keyed (`$VIDEO_PATH/{basename}.m3u8`) to content-hash-keyed (`$VIDEO_PATH/{hash[..2]}/{hash}/playlist.m3u8`). The basename layout collides whenever two source videos share a filename — common with iPhone-style sequential naming (`IMG_NNNN.MOV`) across libraries — so the loser's playlist gets overwritten and ffmpeg keeps re-queueing the file every scan. This commit adds the path layout and type plumbing without touching the actor pipeline, watcher, or HTTP handlers yet: - `src/video/hls_paths.rs`: `playlist_for_hash`, `sentinel_for_hash`, `segment_template_for_hash` built on top of `content_hash::hls_dir`, with constants for the filenames inside the hash dir. Unit tests cover the sharded layout and the playlist/sentinel/segment paths all landing in the same directory (so HLS relative refs resolve). - `src/content_hash::hls_dir` un-deaded — was waiting for this branch. - `VideoToQueue` struct in `actors.rs`: pairs a source path with its content hash so callers that lack a hash (rows mid-backfill) skip the video rather than fabricate one. - `playlist_file_for` / `playlist_unsupported_sentinel` retained as migration-only helpers — they're only needed by the one-shot startup pass that retires pre-content-hash output. Follow-ups (separate commits on this branch): wire `hls_paths` through the queue handler + `PlaylistGenerator`, update the watcher's `process_new_files` to build `VideoToQueue`, switch `/video/generate` and `/video/stream` to resolve path→hash and return stable URLs, add the legacy-layout migration, rewrite `cleanup_orphaned_playlists` for the new dir shape, and surface progress via Prometheus + `/hls/stats`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>Switches the watcher → VideoPlaylistManager → PlaylistGenerator path from the basename-keyed layout (`$VIDEO_PATH/{basename}.m3u8`) to the hash-keyed layout (`$VIDEO_PATH/{hash[..2]}/{hash}/playlist.m3u8`) introduced in the prior commit. Source videos that share a basename across libraries (or across subdirs of one library) no longer overwrite each other's playlists. The legacy HTTP endpoints in `/video/generate` / `/video/stream` still use the basename layout — those move in a follow-up commit alongside the stable streaming URL. actors.rs: - `QueueVideosMessage.video_paths: Vec<PathBuf>` → `videos: Vec<VideoToQueue>`. The queue handler dedups against the hash-keyed playlist + sentinel and forwards `GeneratePlaylistMessage` carrying the hash. - `GeneratePlaylistMessage` now carries `content_hash: String`; the legacy `playlist_path: String` field is gone. - `PlaylistGenerator` takes a `video_dir: PathBuf` at construction, computes the hash dir + playlist + sentinel + segment template via `hls_paths`, `mkdir -p`s the shard/hash dir before ffmpeg runs, and cleans up partial output on failure by walking the hash dir. - `ScanDirectoryMessage` and its handler are retired entirely; their startup-walk role is taken over by the watcher's first tick (see `watcher.rs` below). Dropping it avoids threading an `ExifDao` into `VideoPlaylistManager` just so the actor can resolve hashes. - Legacy `playlist_file_for` / `playlist_unsupported_sentinel` are retained behind `#[allow(dead_code)]` for the upcoming migration pass that retires pre-content-hash output. watcher.rs: - `process_new_files` keeps `content_hash` in the EXIF-batch result (formerly threw it away). Videos with `image_exif.content_hash = NULL` — mid-backfill rows — are skipped this tick rather than falling back to a basename-colliding playlist; they get picked up after `backfill_unhashed_backlog` populates the hash on a subsequent tick. Skipped count is logged at debug. - The video staleness check now uses `hls_paths::playlist_for_hash` instead of `$VIDEO_PATH/{basename}.m3u8`. - `last_full_scan` initialises to `UNIX_EPOCH` so the watcher's first tick is treated as a full scan. That covers the catch-up gap left by removing `ScanDirectoryMessage` — every library's existing media is checked once at watcher boot (≈60s after startup) instead of waiting up to `WATCH_FULL_INTERVAL_SECONDS` (1h default). main.rs: removes the `ScanDirectoryMessage` import and the per-library `do_send` loop, with a comment pointing at the watcher's first-tick behavior. state.rs: `PlaylistGenerator::new` now takes the video dir. Tests: existing `video::hls_paths` (4) and `watcher::tests` (4) pass. The basename-keyed `/video/generate` endpoint still compiles and serves; behavior change there is deferred to the follow-up commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>`POST /video/generate` is reshaped to return a JSON object instead of a bare string. New fields: - `playlist_url`: stable hash-keyed URL of the form `/video/hls/<hash>/playlist.m3u8`. Use this with hls.js / native players — relative segment refs inside the playlist resolve to `/video/hls/<hash>/segment_NNN.ts` because the URL is path-based. - `content_hash`: the blake3 hex digest that identifies the bytes. Stable across libraries, archive ingests, renames; clients can cache the URL by hash. - `ready`: true iff the playlist file is already on disk. False means a transcode was just queued; the client should retry the URL after a short delay (or rely on hls.js's built-in retry). - `playlist` (legacy): basename-keyed path string, echoed under the old field name so clients that destructure `response.playlist` keep working during the rollout. The startup migration deletes the underlying file, so this URL will 404; clients should migrate to `playlist_url`. Field is slated for removal once Apollo / File Viewer ship the update. The handler: - resolves the source path across libraries (same logic as before), - looks up `image_exif.content_hash` for that (library_id, rel_path), - falls back to inline `content_hash::compute` when the row is mid- backfill — pure read, no library mutation, - sends a single-element `QueueVideosMessage` to `VideoPlaylistManager` if the playlist isn't already on disk and there's no `playlist.unsupported` sentinel, - returns the URL immediately. The actor pipeline owns transcoding. New route `GET /video/hls/{hash}/{file}`: - strict validation: hash must be 64 ascii-hex chars; file must be `playlist.m3u8` or `segment_NNN.ts` (digits only). Anything else returns 400 so we never have to rely on path canonicalisation alone to defend against traversal, - belt-and-suspenders canonicalize() guard verifies the resolved file lives under `$VIDEO_PATH`, - serves with the standard `NamedFile::into_response` machinery. Cleanup in `actors.rs`: - `ProcessMessage` + its `StreamActor` handler had no senders after the rewire — removed. `StreamActor` itself stays (still handles `RefreshThumbnailsMessage` from `files.rs`). - `create_playlist`, `playlist_file_for`, `playlist_unsupported_sentinel` are gone — the legacy on-demand transcode helper and the migration-only path helpers had no remaining users (the migration uses its own classify() function). - Imports tightened: dropped `Child`, `ExitStatus`, `trace`. Tests cover both new validators (`is_valid_hash`, `is_allowed_hls_filename`) including the strings that motivated the defence-in-depth (traversal attempts, internal `.tmp`/`.unsupported` artifacts, malformed segment names). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>Pure mechanical cleanup of accumulated drift in files outside the HLS-content-hash branch's main change set. No behavior change. - `cargo fmt` on every previously-misformatted file (`ai/insight_generator.rs`, `database/knowledge_dao.rs`, `faces.rs`, `knowledge.rs`, `libraries.rs`). - `cargo clippy --fix`: - `needless_borrow`: `&library` → `library` in `handlers/image.rs` (two sites in the photo-listing path). - Manual clippy pass for warnings clippy emits but can't auto-apply: - `field_reassign_with_default` in `database/reconcile.rs::run` — consolidated into a struct-literal initializer. - `needless_range_loop` in `database/knowledge_dao.rs::union_perceptual_tags` — inner `for b in (a+1)..indices.len() { let ib = indices[b]; ... }` becomes `for &ib in &indices[a + 1..] { ... }`. - Doc-list indentation: continuation lines under nested bullets in `database/mod.rs::get_memories_in_window` and `database/knowledge_dao.rs::build_entity_graph` realigned to the list-item content column. Deliberately not touched (each deserves its own focused commit, with testing, rather than getting bundled into a sweep): - 4× `deprecated count_distinct` in `faces.rs` — diesel API migration to `AggregateExpressionMethods::aggregate_distinct` may shift result types; needs verification against the existing stats queries. - `await_holding_lock` in `knowledge.rs:807` — `std::sync::Mutex` held across `ollama.generate(...).await`. Genuine concurrency bug; fix requires understanding the surrounding flow before just dropping the guard. - 2× `type_complexity` in `database/mod.rs` — cosmetic, would need a `type` alias and corresponding callers updated. - Dead `total_deleted` on `library_maintenance::GcStats` and `file_scan::enumerate_indexable_files` — both are public surface retained for future use; deletion is a separate decision. All 707 tests still pass. Release build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>The hash-keyed `/video/hls/{hash}/{file}` route fully covers HLS playback now and both clients (Apollo, FileViewer-React) have shipped updates that use it directly. Keeping the basename-keyed fallback only encouraged stale URLs to keep flowing — every legacy file was deleted by the startup migration, so the routes were guaranteed 404 machines. Dropped: - `stream_video` handler (`GET /video/stream?path=…`) — the original basename-keyed playlist serve. - `get_video_part` handler (`GET /video/{path}`) — bare-filename segment serve. The new layout's segments live in `<shard>/<hash>/segment_NNN.ts` and reach the client via `stream_hls_file`. - `legacy_path` field on `GenerateVideoResponse` (serialised as `playlist`). The field always pointed at a file the migration had deleted; current clients ignore it entirely. - Their service registrations in `main.rs`. - The body-side `filename` extraction in `generate_video` (existed only to construct `legacy_path`) and the now-unused `global` opentelemetry import in `handlers/video.rs`. All 707 tests still pass. Same hand-rolled validators (`is_valid_hash` / `is_allowed_hls_filename`) keep the new route's defense-in-depth intact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>