Wires the persistence layer for CLIP semantic search. The watcher's
per-tick drain encodes any image_exif row with a known content_hash
but no clip_embedding via Apollo (cap CLIP_BACKLOG_MAX_PER_TICK,
default 32). On a query, /photos/search encodes the text via Apollo
and reranks every stored embedding in-memory.
ExifDao additions:
- list_clip_unencoded_candidates — partial-index scan for drain
- backfill_clip_embedding — touches only the two new columns
- list_clip_index — dedup'd (hash, embedding) pull for search
clip_watch::run_clip_encoding_pass is the parallel fan-out — tokio
runtime per pass with CLIP_ENCODE_CONCURRENCY (default 4). No marker
rows for permanent failures yet; per-tick cap bounds the retry cost.
/photos/search params: q, limit, threshold (default 0.20), library,
model_version. Response is intentionally minimal (path + score) so
the frontend joins against existing photo-metadata routes lazily.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure mechanical pass on the files this branch added/modified:
rustfmt reflow of a few long lines / chains, and the one
non-pre-existing clippy warning — replacing
`&[rel_path.clone()]` with `std::slice::from_ref(&rel_path)` in
`handlers::video::generate_video` to avoid the alloc + clone for a
single-element slice.
All 707 tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The hash-keyed pipeline transcodes lazily, so a freshly mounted (or
freshly upgraded) library is "mostly pending" for the first hour
while the watcher works through the backlog. The operator wants a
live read on remaining work so they can tune `HLS_CONCURRENCY` and
know when to stop waiting.
Adds:
- `src/hls_stats.rs` — pure compute path (`stats_from_rows`) and an
Arc<Mutex<dyn ExifDao>> wrapper (`compute_and_publish`). Per
library: `total`, `with_playlist`, `pending`, `unsupported`,
`hashless_videos`. Dedup is by content_hash so duplicate-bytes-at-
N-paths counts once (same domain rule as `faces::stats`).
`hashless_videos` is a separate counter so the operator can see
the "hash backfill, then transcode" pipeline depth instead of
having NULL-hash rows just hide.
- Prometheus gauges labeled by library name:
`imageserver_hls_videos_total`, `..._with_playlist`, `..._pending`,
`..._unsupported`. Updated by the watcher at the end of every full-
scan tick *and* on every `/hls/stats` hit, so whichever surface the
operator is watching stays fresh. Registered in `main` alongside
the existing image/video gauges.
- `GET /hls/stats` — Claims-protected JSON snapshot of the same data
plus a top-level cross-library aggregate. Runs on a blocking pool
so it doesn't pin the actix worker; per-call cost is one
`list_paths_and_hashes_for_library` SQL query per library plus a
`stat()` per distinct video hash. Bounded — never invoked from
middleware, only from the explicit endpoint and the full-scan
tick. The watcher's end-of-tick `info!` summary line mirrors the
endpoint output for operators tailing the log.
- New `ExifDao::list_paths_and_hashes_for_library` method:
`SELECT rel_path, content_hash FROM image_exif WHERE library_id =
?`. Single round-trip; callers filter to video extensions
client-side because the schema doesn't carry media-type. Mock
impl in `files.rs` returns an empty vec.
Tests in `hls_stats::tests` exercise stats_from_rows directly (videos-
only filter, hash dedup, playlist vs sentinel decision, NULL-hash
hashless counting) plus a publish_gauges round-trip that reads the
gauge value back. Full suite (347 lib + 360 bin = 707) passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cleanup walk previously looked for `$VIDEO_PATH/<basename>.m3u8`
and matched each file's stem against a recursive walk of every
library. With the hash-keyed layout now in place, every playlist's
file_stem is the literal string "playlist" — the old logic would
treat every hash-keyed playlist as orphaned on its next run and wipe
them all in one tick (default cleanup interval is 24h, so this is a
24-hour bomb on top of the prior commit).
New approach: orphan-ness is decided in the database, not on the
filesystem. The cleanup loop:
- Snapshots every distinct non-NULL `image_exif.content_hash` into a
HashSet (new `ExifDao::list_distinct_content_hashes` method —
`SELECT DISTINCT content_hash WHERE content_hash IS NOT NULL`).
- Walks `$VIDEO_PATH` two levels deep: top-level entries are filtered
to 2-char lowercase hex shard dirs, each shard's children to 64-char
hex hash dirs. Anything else (legacy `.m3u8` at root from the
pre-content-hash era, operator-stashed dirs, partial writes) is left
alone.
- Hash dirs whose hash isn't in the alive set are `remove_dir_all`'d.
Shard dirs that emptied as a result are reaped on the same pass via
`remove_dir` (no-op if non-empty).
- The library-stale safety gate is preserved: a stale library skips
the cycle even though the orphan decision is DB-only, because the
upstream missing-file scan that retires `image_exif` rows itself
pauses for stale libraries. Belt-and-suspenders — keeping a hash
dir for one extra 24h cycle is cheaper than wiping one whose source
was briefly unreachable. The gate now also filters disabled
libraries out of the stale set (they're intentionally absent from
the health map).
- The legacy `excluded_dirs` parameter is preserved on the function
signature but unused (the walk no longer crosses library trees);
flagged with a leading underscore. Callers in `main.rs` stay
unchanged.
`MockExifDao` in `files.rs` grows the new method (returns empty);
unit tests for the new `is_hash_shard` / `is_full_hash` validators
guard against an operator's stashed directory under VIDEO_PATH ever
matching the orphan-rm path. Both pass.
A follow-up commit handles the one-shot startup migration that
retires the legacy basename-keyed `.m3u8` / `.ts` files at
`$VIDEO_PATH` root.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches the watcher → VideoPlaylistManager → PlaylistGenerator path
from the basename-keyed layout
(`$VIDEO_PATH/{basename}.m3u8`) to the hash-keyed layout
(`$VIDEO_PATH/{hash[..2]}/{hash}/playlist.m3u8`) introduced in the
prior commit. Source videos that share a basename across libraries
(or across subdirs of one library) no longer overwrite each other's
playlists. The legacy HTTP endpoints in `/video/generate` /
`/video/stream` still use the basename layout — those move in a
follow-up commit alongside the stable streaming URL.
actors.rs:
- `QueueVideosMessage.video_paths: Vec<PathBuf>` →
`videos: Vec<VideoToQueue>`. The queue handler dedups against the
hash-keyed playlist + sentinel and forwards `GeneratePlaylistMessage`
carrying the hash.
- `GeneratePlaylistMessage` now carries `content_hash: String`; the
legacy `playlist_path: String` field is gone.
- `PlaylistGenerator` takes a `video_dir: PathBuf` at construction,
computes the hash dir + playlist + sentinel + segment template via
`hls_paths`, `mkdir -p`s the shard/hash dir before ffmpeg runs, and
cleans up partial output on failure by walking the hash dir.
- `ScanDirectoryMessage` and its handler are retired entirely; their
startup-walk role is taken over by the watcher's first tick (see
`watcher.rs` below). Dropping it avoids threading an `ExifDao` into
`VideoPlaylistManager` just so the actor can resolve hashes.
- Legacy `playlist_file_for` / `playlist_unsupported_sentinel` are
retained behind `#[allow(dead_code)]` for the upcoming migration
pass that retires pre-content-hash output.
watcher.rs:
- `process_new_files` keeps `content_hash` in the EXIF-batch result
(formerly threw it away). Videos with `image_exif.content_hash =
NULL` — mid-backfill rows — are skipped this tick rather than
falling back to a basename-colliding playlist; they get picked up
after `backfill_unhashed_backlog` populates the hash on a
subsequent tick. Skipped count is logged at debug.
- The video staleness check now uses `hls_paths::playlist_for_hash`
instead of `$VIDEO_PATH/{basename}.m3u8`.
- `last_full_scan` initialises to `UNIX_EPOCH` so the watcher's first
tick is treated as a full scan. That covers the catch-up gap left
by removing `ScanDirectoryMessage` — every library's existing media
is checked once at watcher boot (≈60s after startup) instead of
waiting up to `WATCH_FULL_INTERVAL_SECONDS` (1h default).
main.rs: removes the `ScanDirectoryMessage` import and the per-library
`do_send` loop, with a comment pointing at the watcher's first-tick
behavior.
state.rs: `PlaylistGenerator::new` now takes the video dir.
Tests: existing `video::hls_paths` (4) and `watcher::tests` (4) pass.
The basename-keyed `/video/generate` endpoint still compiles and
serves; behavior change there is deferred to the follow-up commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an HTTP mutation surface for `libraries.enabled` and
`libraries.excluded_dirs`, replacing the SQL-only workflow noted in
CLAUDE.md. Apollo's Settings panel calls this from the LIBRARIES
section so the operator no longer has to ssh + sqlite3 to flip a
library off or edit its excludes.
Live-apply (no restart) via a new `live_libraries: Arc<RwLock<Vec<
Library>>>` field on AppState. The existing immutable `libraries`
Vec stays for hot-path handlers that only need stable id → root_path
lookups, avoiding a 19-call-site refactor. The watcher and
cleanup_orphaned_playlists now take the lock instead of a Vec
snapshot and re-read at the top of each tick, so `enabled` /
`excluded_dirs` changes are picked up within one
WATCH_QUICK_INTERVAL_SECONDS. The GET /libraries handler also reads
through the live view.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
main.rs drops from 1200 → 346 lines (90% smaller than the pre-branch
3542). What's left is the startup wiring it was always meant to be:
.env, migrations, AppState construction, route registration, server
bind. The four background-loop functions move into src/watcher.rs:
- watch_files (310 lines) — quick/full scan tick, per-library probe,
backfill drain dispatch, missing-file scan, back-ref refresh,
orphan GC.
- process_new_files (351 lines) — file walk → EXIF write →
face-candidate build → HLS / preview-clip queueing →
reconciliation. The "biggest untested chunk" from the earlier
audit.
- cleanup_orphaned_playlists (167 lines) — separate slower-tick
thread.
- playlist_needs_generation — small mtime-comparison helper.
Plus 4 unit tests for playlist_needs_generation (covers missing
playlist, newer playlist, newer video, video-missing-metadata
fallback).
main.rs's imports correspondingly shrink — Addr, HashSet, WalkDir,
Utc, InsertImageExif, and the bulk of video::actors all leave with
the watcher. CLAUDE.md updated to reflect the new module layout
(layered architecture box + module map for the face-detection
section).
cargo test --bin image-api: 329 passing (no regression).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>