Commit Graph

597 Commits

Author SHA1 Message Date
Cameron Cordes
0a627f4880 Add contact name filter to SMS search tool + misc improvements
- sms search tool: accept contact name, trim/validate, skip when
  contact_id is set, pass to API client
- sms_client: new contact field in SmsSearchParams, URL-encode on wire
- Tool description clarifies contact_id takes precedence when both given
- Add parse_title_body helper for LLM response parsing
- llamacpp backend improvements
2026-05-25 21:46:18 -04:00
b9175e2718 image: add xlarge (4096px) on-demand preview tier
New `PhotoSize::XLarge` variant sits between `Large` (2048px) and
`Full` (original). On-demand generated and disk-cached at
`_xlarge/<hash>.jpg`, same waterfall as `Large` (embedded RAW preview
→ ffmpeg → image crate). Sources below 4096px serve at native size.

Reduces decoded bitmap memory from ~192MB (48MP full) to ~64MB for
the mobile viewer's zoom tier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 15:33:03 -04:00
Cameron Cordes
9dba659d1e test: add llamacpp model-slot consistency and content-null tests
Cover the properties that prevent mid-turn model swaps in llama-swap
exclusive mode: vision_model defaults to primary, cloned local client
mirrors the user-selected model, embeddings stay on their own slot.
Also test the content:null serialization for tool-calling messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 19:29:51 -04:00
Cameron Cordes
208344ad98 ai: mirror chat model on local client to prevent mid-turn model swap
When the user selects a model from the picker, the local client's
primary_model and vision_model now match the chat model. Prevents
llama-swap exclusive mode from swapping models when describe_photo
or rerank fires during an agentic turn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 19:27:29 -04:00
Cameron Cordes
fb388c29d7 docs: update env + CLAUDE.md for direct-vision llamacpp + ResolvedBackend
llamacpp models now receive images directly instead of
describe-then-inline. LLAMA_SWAP_VISION_MODEL defaults to the
primary model. Document the ResolvedBackend dispatch pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:03:12 -04:00
Cameron Cordes
a8a661f70a ai: extract ResolvedBackend, remove ~480 lines of duplicated dispatch
Replace 5 copies of the ~80-line backend resolution pattern with a
single InsightGenerator::resolve_backend() builder that returns a
ResolvedBackend (chat + local clients, BackendKind enum, images_inline
flag). Tool dispatch now takes &ResolvedBackend instead of
&OllamaClient + model + backend strings.

Remove duplicated ollama/openrouter/llamacpp fields from
InsightChatService — InsightGenerator owns them and resolve_backend
uses them. Delete build_chat_clients (replaced by resolve_backend).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:00:50 -04:00
Cameron Cordes
0631820fbf ai: send images directly to llamacpp chat models + add ResolvedBackend
llamacpp models now receive images via OpenAI content-parts instead of
the describe-then-inline strategy (hybrid mode unchanged). Fixes
assistant messages with tool_calls emitting content: null instead of ""
to satisfy strict Jinja template role-alternation checks. Adds debug
logging of message role sequences on llamacpp requests.

Introduces BackendKind enum, SamplingOverrides, and ResolvedBackend in
a new backend.rs module. InsightGenerator::resolve_backend centralises
client construction + vision capability detection — next step wires the
existing inline dispatch through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 14:00:37 -04:00
Cameron Cordes
be51421b38 ai: collapse llamacpp into LLM_BACKEND env switch
Reverts the per-request backend="llamacpp" value. Chat/vision/embedding
backend is now a deploy-time decision (LLM_BACKEND=ollama|llamacpp),
applied globally across chat, vision describe, and embeddings — so
embedding vectors stay in one space across the index.

- Per-request backend whitelist back to "local"|"hybrid". A request
  arriving with backend="llamacpp" is rejected.
- LLM_BACKEND=llamacpp swaps the entire local stack to llama-swap:
  chat hits the chat slot, describe hits the vision slot, embeddings
  hit the embed slot. Hybrid mode still routes chat to OpenRouter
  but uses LLM_BACKEND for the describe pass.
- Drops env vars HYBRID_VISION_BACKEND, LLAMA_SWAP_VISION_MODELS,
  EMBEDDING_BACKEND (the last never shipped). Drops the
  LlamaCppClient.vision_models allowlist — capability inference now
  reports has_vision only for the configured vision_model slot.
- Drops the /insights/llamacpp/models handler. /insights/models is
  the single endpoint; returns Ollama servers under LLM_BACKEND=ollama
  and llama-swap slots (from LLAMA_SWAP_ALLOWED_MODELS) under
  LLM_BACKEND=llamacpp. Same envelope shape either way.
- New ai::embed_one helper routes embeddings through llama-swap when
  LLM_BACKEND=llamacpp (else Ollama). Wires it into the four
  insight_generator embedding sites.
- Cross-replay matrix simplifies to pre-llamacpp shape (local↔local,
  hybrid↔hybrid, hybrid→local allowed; local→hybrid rejected).
2026-05-21 11:36:58 -04:00
Cameron Cordes
d14df63f19 env.example: document LLAMA_SWAP_* + HYBRID_VISION_BACKEND vars
Mirrors the section added to CLAUDE.md so deploys can opt into the
llamacpp backend from the template alone.
2026-05-20 17:54:08 -04:00
Cameron Cordes
f0927f5355 ai: add llamacpp backend (llama-swap) as third LLM client
Wires a new LlamaCppClient (OpenAI-compatible /v1 wire format) alongside
OllamaClient and OpenRouterClient. Per-slot routing for chat/vision/embed
via env (LLAMA_SWAP_URL + *_MODEL vars); capability inference uses an
env allowlist since /v1/models doesn't report modality.

InsightGenerator + InsightChatService gain three-way dispatch on
chat_backend = "local" | "hybrid" | "llamacpp". Hybrid and llamacpp
share the describe-then-inline path (text-only chat after a separate
vision describe). HYBRID_VISION_BACKEND=llamacpp lets hybrid route its
describe pass through llama-swap's vision slot while chat still goes
to OpenRouter.

Cross-replay matrix added (validate_cross_replay): local<->llamacpp
and hybrid<->llamacpp allowed; local->hybrid and llamacpp->hybrid
rejected. New /insights/llamacpp/models handler mirrors the OpenRouter
shape.
2026-05-20 17:52:33 -04:00
d04b86e32c Merge pull request 'image: add on-demand size=large preview tier (~2048px JPEG q85)' (#100) from feature/image-large-preview into master
Reviewed-on: #100
2026-05-19 21:51:08 +00:00
Cameron Cordes
19798184f0 image: add on-demand size=large preview tier (~2048px JPEG q85)
Adds a third PhotoSize between Thumb (200px) and Full (original). The
viewer placeholder and map callout previously upscaled a 200px thumb
into a full-screen / full-width view, which looked visibly blocky on
3× devices. The new tier is generated on-demand, disk-cached, and
served via the existing /image endpoint.

Storage layout mirrors the Thumb branch's lookup chain:
  1. hash-keyed: <thumbs>/_large/<hash[..2]>/<hash>.jpg (shared across
     libraries when content_hash is known)
  2. library-scoped legacy: <thumbs>/_large/<lib_id>/<rel_path>

Generation pipeline mirrors generate_image_thumbnail:
  - RAW: decode the embedded JPEG preview, apply EXIF orientation,
         resize to 2048-long-edge, encode JPEG q85
  - HEIC/HEIF: ffmpeg with scale + q:v 5 (≈ q85)
  - everything else: image crate decode + thumbnail() + JpegEncoder
Never upscales — sources below the 2048 cap re-encode at native size.

Handler offloads decode/resize to web::block to keep the actix worker
free (a 24MP source takes 100–500ms). Writes via tempfile+rename so
concurrent readers can't observe a half-written JPEG. On any
generation failure, falls through to the Full branch (which itself
serves the RAW embedded preview for unrenderable RAW containers).

Video requests for size=large fall back to the existing thumb pipeline
since there's no useful 2048px video tier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 13:14:49 -04:00
c3c6cd03db Merge pull request 'file_types: filter macOS AppleDouble + .DS_Store from media predicates' (#99) from feature/filter-fs-metadata into master
Reviewed-on: #99
2026-05-18 17:12:42 +00:00
Cameron Cordes
b843a4a366 file_types: filter macOS AppleDouble + .DS_Store from media predicates
Symptom: Apollo's logs showed bursts of 422 decode_failed from
ImageApi's CLIP backfill — e.g. `._DSC_2182-S.jpg`. macOS writes
`._<name>` AppleDouble sidecars when copying to non-HFS volumes
(SMB, FAT, exFAT), and they carry the original file's extension
even though their bytes are extended-attribute metadata, not the
image. ImageApi's walker matched them via the extension predicate,
sent them through the ingest pipeline, and accumulated failed rows
in face_detections + clip_embedding while pinning Apollo's eviction
timer with the 422 burst.

Fix: predicate-level guard in is_image_file / is_video_file (and
by inheritance is_media_file). Every walker that already gates on
these (face_watch, backfill, clip_watch, watcher, files,
probe_clip_search) inherits the skip without per-callsite edits.
Narrow scope on purpose — `._*` prefix + the exact `.DS_Store`
basename — rather than blanket dotfile filtering, because a user
could plausibly name a cover image `.cover.jpg`.

Existing rows are not cleaned by this change. To purge what
already accumulated (one-shot, run from your DB shell after
deploying):

  DELETE FROM image_exif
   WHERE file_path LIKE '%/._%' OR file_path LIKE '%/.DS_Store';
  DELETE FROM face_detections
   WHERE rel_path LIKE '%/._%' OR rel_path LIKE '%/.DS_Store';
  DELETE FROM tagged_photo
   WHERE file_path LIKE '%/._%' OR file_path LIKE '%/.DS_Store';
  DELETE FROM favorites
   WHERE path LIKE '%/._%' OR path LIKE '%/.DS_Store';

The maintenance pipeline's missing-file scan would NOT catch these
on its own — the files exist on disk (they're real macOS metadata,
just not images), so stat() returns Ok and the row sticks.
2026-05-17 20:10:16 -04:00
d275150db6 Merge pull request 'feature/video-frame-rate' (#98) from feature/video-frame-rate into master
Reviewed-on: #98
2026-05-18 00:09:35 +00:00
Cameron
acdffc1558 cargo fmt: drop trailing blank line in actors.rs
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:14:30 -04:00
Cameron
bd61e10158 chore: add .gitattributes + unit tests for ffprobe rational parser
LF normalization across OSes; *.sql pinned to LF for stable diffs.

Tests cover the rational frame-rate parser (NTSC 29.97, integer fps,
slow-mo 240, ffprobe's 0/0 unknown sentinel, malformed and out-of-range
inputs). Extracted the closure into a free fn for the test seam.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:13:06 -04:00
Cameron
1b70a6f0b4 video: probe frame rate via ffprobe and return on /video/generate
Adds frame_rate to GenerateVideoResponse so the mobile scrubber can step
at the source's real fps instead of a hardcoded 30. probe_video_stream_meta
gains a frame_rate field (avg_frame_rate preferred, r_frame_rate fallback,
nonsense values rejected) and is now pub so the handler can reuse it.
Cost is one ffprobe per /video/generate call; degrades silently to None
on probe failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:03:21 -04:00
3162a4f477 Merge pull request 'clip-search: accept library_ids (multi-select whitelist) on /photos/search' (#97) from feature/clip-search-library-ids into master
Reviewed-on: #97
2026-05-16 13:38:00 +00:00
Cameron Cordes
87093a63d7 clip-search: accept library_ids (multi-select whitelist) on /photos/search
Previously the endpoint only accepted `library=<id>` (single id) — multi-
select scopes had to be filtered upstream by Apollo, which kept the
filter logic out of FileViewer-React's reach (it calls ImageApi
directly and got no scoping for 2+ active libraries).

Adds `library_ids` (comma-separated id list, e.g. `?library_ids=1,3`).
Parsed inside the existing scope decision: `library_ids` wins when
both are supplied; either / both empty falls back to "every enabled
library" (historical default). Malformed entries return 400.

Dedupes ids while preserving order so a stray `library_ids=1,1,3`
doesn't double-pass to the DAO. The single-id path still works
unchanged for older clients.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 09:30:46 -04:00
dd7b4befb6 Merge pull request 'feature/clip-semantic-search' (#96) from feature/clip-semantic-search into master
Reviewed-on: #96
2026-05-16 00:32:32 +00:00
Cameron Cordes
922f7df8d3 clip-search: offset-based pagination on /photos/search
Adds `offset` query param (default 0) and `total_matching` + `offset`
response fields. Backend already computes the full sorted list of
above-threshold matches per query; pagination just slices it at
[offset, offset+limit) instead of always returning the top window.
Offsets past the end return an empty page cleanly so the client can
stop fetching naturally.

Re-scores on every page rather than caching the sorted list — at
personal-library scale (~14k embeddings, 768d) the dot-product loop
is sub-100ms and the lack of state means no eviction / staleness
concerns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:56:10 -04:00
Cameron Cordes
ee2ed3005b clip-search: document env knobs in .env.example
APOLLO_CLIP_API_BASE_URL (falls back to APOLLO_API_BASE_URL),
CLIP_BACKLOG_MAX_PER_TICK, CLIP_ENCODE_CONCURRENCY, and
CLIP_REQUEST_TIMEOUT_SEC — all of which the code already reads.
Apollo's side was documented earlier; this closes the parity gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:10:52 -04:00
Cameron Cordes
66267cc345 clip-search: fmt + clippy clamp + test AppState arg
Pulls cargo fmt + clippy pass over the new files only — pre-existing
files left untouched even though fmt has drift on them. clamp(1,200)
swaps a manual min/max chain that clippy flagged. test AppState
constructor needed ClipClient::new(None) so the lib-test target
compiles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:10:52 -04:00
Cameron Cordes
32195ed89e clip-search: backlog drain + /photos/search endpoint
Wires the persistence layer for CLIP semantic search. The watcher's
per-tick drain encodes any image_exif row with a known content_hash
but no clip_embedding via Apollo (cap CLIP_BACKLOG_MAX_PER_TICK,
default 32). On a query, /photos/search encodes the text via Apollo
and reranks every stored embedding in-memory.

ExifDao additions:
- list_clip_unencoded_candidates — partial-index scan for drain
- backfill_clip_embedding — touches only the two new columns
- list_clip_index — dedup'd (hash, embedding) pull for search

clip_watch::run_clip_encoding_pass is the parallel fan-out — tokio
runtime per pass with CLIP_ENCODE_CONCURRENCY (default 4). No marker
rows for permanent failures yet; per-tick cap bounds the retry cost.

/photos/search params: q, limit, threshold (default 0.20), library,
model_version. Response is intentionally minimal (path + score) so
the frontend joins against existing photo-metadata routes lazily.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:10:52 -04:00
Cameron Cordes
8d9e76cf15 clip-search: migration + client + probe binary
Probe-phase scaffolding for CLIP semantic search. Adds the column
that will hold per-photo embeddings, the HTTP client to Apollo's
inference service, and a throwaway probe binary so we can eyeball
search-result quality on the live library before building the
persistence layer (backlog drain, /photos/search endpoint, UI).

- migrations/2026-05-14-000000_add_clip_embedding/ — adds
  image_exif.clip_embedding (BLOB) and clip_model_version (TEXT),
  plus a partial index on (clip_embedding IS NULL AND content_hash
  IS NOT NULL) for the future backfill drain.
- src/database/models.rs — extends ImageExif struct to match.
- src/ai/clip_client.rs — encode_image / encode_text / health,
  same Permanent/Transient/Disabled taxonomy as face_client.
- src/bin/probe_clip_search.rs — --query <q> --library N --limit M
  --top K. Encodes a sample and prints top-K cosine similarities.
  No DB writes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:10:52 -04:00
26ffc15c8b Merge pull request 'feature/hls-content-hash' (#95) from feature/hls-content-hash into master
Reviewed-on: #95
2026-05-15 20:09:48 +00:00
Cameron Cordes
0168a4b574 hls: remove legacy /video/stream + /video/{path} routes
The hash-keyed `/video/hls/{hash}/{file}` route fully covers HLS
playback now and both clients (Apollo, FileViewer-React) have
shipped updates that use it directly. Keeping the basename-keyed
fallback only encouraged stale URLs to keep flowing — every legacy
file was deleted by the startup migration, so the routes were
guaranteed 404 machines.

Dropped:
- `stream_video` handler (`GET /video/stream?path=…`) — the original
  basename-keyed playlist serve.
- `get_video_part` handler (`GET /video/{path}`) — bare-filename
  segment serve. The new layout's segments live in
  `<shard>/<hash>/segment_NNN.ts` and reach the client via
  `stream_hls_file`.
- `legacy_path` field on `GenerateVideoResponse` (serialised as
  `playlist`). The field always pointed at a file the migration had
  deleted; current clients ignore it entirely.
- Their service registrations in `main.rs`.
- The body-side `filename` extraction in `generate_video` (existed
  only to construct `legacy_path`) and the now-unused `global`
  opentelemetry import in `handlers/video.rs`.

All 707 tests still pass. Same hand-rolled validators (`is_valid_hash`
/ `is_allowed_hls_filename`) keep the new route's defense-in-depth
intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:00:19 -04:00
Cameron
c30cadde02 ai: fix UTF-8 byte-slice panics in insight_generator log/truncation paths
Switch four `&s[..N]` / `&s[..s.len().min(N)]` sites to
`chars().take(N).collect::<String>()` so truncation lands on character
boundaries instead of mid-codepoint. The agentic summary preview log
was panicking when generated content hit an em-dash at byte 200; the
few-shot passage cap, brief_json_args debug formatter, and a test
assertion message had the same latent bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 15:10:02 -04:00
Cameron Cordes
8503ef7884 chore: cargo fmt + clippy --fix sweep across the crate
Pure mechanical cleanup of accumulated drift in files outside the
HLS-content-hash branch's main change set. No behavior change.

- `cargo fmt` on every previously-misformatted file
  (`ai/insight_generator.rs`, `database/knowledge_dao.rs`,
  `faces.rs`, `knowledge.rs`, `libraries.rs`).
- `cargo clippy --fix`:
  - `needless_borrow`: `&library` → `library` in `handlers/image.rs`
    (two sites in the photo-listing path).
- Manual clippy pass for warnings clippy emits but can't auto-apply:
  - `field_reassign_with_default` in `database/reconcile.rs::run` —
    consolidated into a struct-literal initializer.
  - `needless_range_loop` in `database/knowledge_dao.rs::union_perceptual_tags`
    — inner `for b in (a+1)..indices.len() { let ib = indices[b]; ... }`
    becomes `for &ib in &indices[a + 1..] { ... }`.
  - Doc-list indentation: continuation lines under nested bullets in
    `database/mod.rs::get_memories_in_window` and
    `database/knowledge_dao.rs::build_entity_graph` realigned to the
    list-item content column.

Deliberately not touched (each deserves its own focused commit, with
testing, rather than getting bundled into a sweep):
- 4× `deprecated count_distinct` in `faces.rs` — diesel API migration
  to `AggregateExpressionMethods::aggregate_distinct` may shift result
  types; needs verification against the existing stats queries.
- `await_holding_lock` in `knowledge.rs:807` — `std::sync::Mutex` held
  across `ollama.generate(...).await`. Genuine concurrency bug; fix
  requires understanding the surrounding flow before just dropping
  the guard.
- 2× `type_complexity` in `database/mod.rs` — cosmetic, would need a
  `type` alias and corresponding callers updated.
- Dead `total_deleted` on `library_maintenance::GcStats` and
  `file_scan::enumerate_indexable_files` — both are public surface
  retained for future use; deletion is a separate decision.

All 707 tests still pass. Release build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:25:05 -04:00
Cameron Cordes
8c91bf554b hls: cargo fmt + clippy::cloned_ref_to_slice_refs
Pure mechanical pass on the files this branch added/modified:
rustfmt reflow of a few long lines / chains, and the one
non-pre-existing clippy warning — replacing
`&[rel_path.clone()]` with `std::slice::from_ref(&rel_path)` in
`handlers::video::generate_video` to avoid the alloc + clone for a
single-element slice.

All 707 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:01:16 -04:00
Cameron Cordes
7cd1ea3cf8 hls: per-library readiness gauges + GET /hls/stats endpoint
The hash-keyed pipeline transcodes lazily, so a freshly mounted (or
freshly upgraded) library is "mostly pending" for the first hour
while the watcher works through the backlog. The operator wants a
live read on remaining work so they can tune `HLS_CONCURRENCY` and
know when to stop waiting.

Adds:

- `src/hls_stats.rs` — pure compute path (`stats_from_rows`) and an
  Arc<Mutex<dyn ExifDao>> wrapper (`compute_and_publish`). Per
  library: `total`, `with_playlist`, `pending`, `unsupported`,
  `hashless_videos`. Dedup is by content_hash so duplicate-bytes-at-
  N-paths counts once (same domain rule as `faces::stats`).
  `hashless_videos` is a separate counter so the operator can see
  the "hash backfill, then transcode" pipeline depth instead of
  having NULL-hash rows just hide.

- Prometheus gauges labeled by library name:
  `imageserver_hls_videos_total`, `..._with_playlist`, `..._pending`,
  `..._unsupported`. Updated by the watcher at the end of every full-
  scan tick *and* on every `/hls/stats` hit, so whichever surface the
  operator is watching stays fresh. Registered in `main` alongside
  the existing image/video gauges.

- `GET /hls/stats` — Claims-protected JSON snapshot of the same data
  plus a top-level cross-library aggregate. Runs on a blocking pool
  so it doesn't pin the actix worker; per-call cost is one
  `list_paths_and_hashes_for_library` SQL query per library plus a
  `stat()` per distinct video hash. Bounded — never invoked from
  middleware, only from the explicit endpoint and the full-scan
  tick. The watcher's end-of-tick `info!` summary line mirrors the
  endpoint output for operators tailing the log.

- New `ExifDao::list_paths_and_hashes_for_library` method:
  `SELECT rel_path, content_hash FROM image_exif WHERE library_id =
  ?`. Single round-trip; callers filter to video extensions
  client-side because the schema doesn't carry media-type. Mock
  impl in `files.rs` returns an empty vec.

Tests in `hls_stats::tests` exercise stats_from_rows directly (videos-
only filter, hash dedup, playlist vs sentinel decision, NULL-hash
hashless counting) plus a publish_gauges round-trip that reads the
gauge value back. Full suite (347 lib + 360 bin = 707) passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:58:46 -04:00
Cameron Cordes
7c153596fe hls: hash-keyed HTTP routes for /video/generate and serving
`POST /video/generate` is reshaped to return a JSON object instead of
a bare string. New fields:

- `playlist_url`: stable hash-keyed URL of the form
  `/video/hls/<hash>/playlist.m3u8`. Use this with hls.js / native
  players — relative segment refs inside the playlist resolve to
  `/video/hls/<hash>/segment_NNN.ts` because the URL is path-based.
- `content_hash`: the blake3 hex digest that identifies the bytes.
  Stable across libraries, archive ingests, renames; clients can
  cache the URL by hash.
- `ready`: true iff the playlist file is already on disk. False means
  a transcode was just queued; the client should retry the URL after
  a short delay (or rely on hls.js's built-in retry).
- `playlist` (legacy): basename-keyed path string, echoed under the
  old field name so clients that destructure `response.playlist` keep
  working during the rollout. The startup migration deletes the
  underlying file, so this URL will 404; clients should migrate to
  `playlist_url`. Field is slated for removal once Apollo / File
  Viewer ship the update.

The handler:
- resolves the source path across libraries (same logic as before),
- looks up `image_exif.content_hash` for that (library_id, rel_path),
- falls back to inline `content_hash::compute` when the row is mid-
  backfill — pure read, no library mutation,
- sends a single-element `QueueVideosMessage` to `VideoPlaylistManager`
  if the playlist isn't already on disk and there's no
  `playlist.unsupported` sentinel,
- returns the URL immediately. The actor pipeline owns transcoding.

New route `GET /video/hls/{hash}/{file}`:
- strict validation: hash must be 64 ascii-hex chars; file must be
  `playlist.m3u8` or `segment_NNN.ts` (digits only). Anything else
  returns 400 so we never have to rely on path canonicalisation
  alone to defend against traversal,
- belt-and-suspenders canonicalize() guard verifies the resolved
  file lives under `$VIDEO_PATH`,
- serves with the standard `NamedFile::into_response` machinery.

Cleanup in `actors.rs`:
- `ProcessMessage` + its `StreamActor` handler had no senders after
  the rewire — removed. `StreamActor` itself stays (still handles
  `RefreshThumbnailsMessage` from `files.rs`).
- `create_playlist`, `playlist_file_for`,
  `playlist_unsupported_sentinel` are gone — the legacy on-demand
  transcode helper and the migration-only path helpers had no
  remaining users (the migration uses its own classify() function).
- Imports tightened: dropped `Child`, `ExitStatus`, `trace`.

Tests cover both new validators (`is_valid_hash`,
`is_allowed_hls_filename`) including the strings that motivated the
defence-in-depth (traversal attempts, internal `.tmp`/`.unsupported`
artifacts, malformed segment names).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:51:01 -04:00
Cameron Cordes
78fabc2b32 hls: retire legacy basename-keyed HLS files on startup
Adds `video::legacy_migration::retire_legacy_hls_output`, called once
from `main` right after the diesel migrations run and before the
actor pipeline starts. Walks `$VIDEO_PATH` at depth 1, deletes every
`.m3u8` / `.m3u8.tmp` / `.m3u8.unsupported` / `.ts` file at root, and
logs a single info line with per-class counts. Skips directories
(the new layout's `<shard>/<hash>/` lives there) and unknown
extensions, so an operator's stashed README or `.tmp` from a
different tool is safe.

Why this needs its own one-shot pass rather than letting the rewritten
`cleanup_orphaned_playlists` handle it: the cleanup walk deliberately
only looks at `<shard>/<hash>/` dirs (so it can't accidentally `rm`
operator-stashed content), so without this migration the legacy files
would sit at root forever, never served, never refreshed. Operator
complaint count from the previous IMG_NNNN.MOV collision: ~10
duplicate-basename hits on one library alone; total .m3u8 count was
699 vs a much larger video count — i.e. the loser of every collision
was a permanent orphan. This pass collects all of them, then the
running watcher writes hash-keyed playlists going forward.

Idempotent — a second boot finds nothing and reports zero deletions,
so the call site can stay in `main` across releases until the module
is removed in a later cleanup commit. Tests cover the happy path
(legacy artifacts gone, hash dir untouched, unrelated files left
alone), idempotency, and the missing-directory case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:43:13 -04:00
Cameron Cordes
b8e17e05b7 hls: rewrite orphan cleanup for hash-keyed layout
The cleanup walk previously looked for `$VIDEO_PATH/<basename>.m3u8`
and matched each file's stem against a recursive walk of every
library. With the hash-keyed layout now in place, every playlist's
file_stem is the literal string "playlist" — the old logic would
treat every hash-keyed playlist as orphaned on its next run and wipe
them all in one tick (default cleanup interval is 24h, so this is a
24-hour bomb on top of the prior commit).

New approach: orphan-ness is decided in the database, not on the
filesystem. The cleanup loop:

- Snapshots every distinct non-NULL `image_exif.content_hash` into a
  HashSet (new `ExifDao::list_distinct_content_hashes` method —
  `SELECT DISTINCT content_hash WHERE content_hash IS NOT NULL`).
- Walks `$VIDEO_PATH` two levels deep: top-level entries are filtered
  to 2-char lowercase hex shard dirs, each shard's children to 64-char
  hex hash dirs. Anything else (legacy `.m3u8` at root from the
  pre-content-hash era, operator-stashed dirs, partial writes) is left
  alone.
- Hash dirs whose hash isn't in the alive set are `remove_dir_all`'d.
  Shard dirs that emptied as a result are reaped on the same pass via
  `remove_dir` (no-op if non-empty).
- The library-stale safety gate is preserved: a stale library skips
  the cycle even though the orphan decision is DB-only, because the
  upstream missing-file scan that retires `image_exif` rows itself
  pauses for stale libraries. Belt-and-suspenders — keeping a hash
  dir for one extra 24h cycle is cheaper than wiping one whose source
  was briefly unreachable. The gate now also filters disabled
  libraries out of the stale set (they're intentionally absent from
  the health map).
- The legacy `excluded_dirs` parameter is preserved on the function
  signature but unused (the walk no longer crosses library trees);
  flagged with a leading underscore. Callers in `main.rs` stay
  unchanged.

`MockExifDao` in `files.rs` grows the new method (returns empty);
unit tests for the new `is_hash_shard` / `is_full_hash` validators
guard against an operator's stashed directory under VIDEO_PATH ever
matching the orphan-rm path. Both pass.

A follow-up commit handles the one-shot startup migration that
retires the legacy basename-keyed `.m3u8` / `.ts` files at
`$VIDEO_PATH` root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:41:04 -04:00
Cameron Cordes
d1667099c3 hls: rewire queue + generator to write hash-keyed playlists
Switches the watcher → VideoPlaylistManager → PlaylistGenerator path
from the basename-keyed layout
(`$VIDEO_PATH/{basename}.m3u8`) to the hash-keyed layout
(`$VIDEO_PATH/{hash[..2]}/{hash}/playlist.m3u8`) introduced in the
prior commit. Source videos that share a basename across libraries
(or across subdirs of one library) no longer overwrite each other's
playlists. The legacy HTTP endpoints in `/video/generate` /
`/video/stream` still use the basename layout — those move in a
follow-up commit alongside the stable streaming URL.

actors.rs:
- `QueueVideosMessage.video_paths: Vec<PathBuf>` →
  `videos: Vec<VideoToQueue>`. The queue handler dedups against the
  hash-keyed playlist + sentinel and forwards `GeneratePlaylistMessage`
  carrying the hash.
- `GeneratePlaylistMessage` now carries `content_hash: String`; the
  legacy `playlist_path: String` field is gone.
- `PlaylistGenerator` takes a `video_dir: PathBuf` at construction,
  computes the hash dir + playlist + sentinel + segment template via
  `hls_paths`, `mkdir -p`s the shard/hash dir before ffmpeg runs, and
  cleans up partial output on failure by walking the hash dir.
- `ScanDirectoryMessage` and its handler are retired entirely; their
  startup-walk role is taken over by the watcher's first tick (see
  `watcher.rs` below). Dropping it avoids threading an `ExifDao` into
  `VideoPlaylistManager` just so the actor can resolve hashes.
- Legacy `playlist_file_for` / `playlist_unsupported_sentinel` are
  retained behind `#[allow(dead_code)]` for the upcoming migration
  pass that retires pre-content-hash output.

watcher.rs:
- `process_new_files` keeps `content_hash` in the EXIF-batch result
  (formerly threw it away). Videos with `image_exif.content_hash =
  NULL` — mid-backfill rows — are skipped this tick rather than
  falling back to a basename-colliding playlist; they get picked up
  after `backfill_unhashed_backlog` populates the hash on a
  subsequent tick. Skipped count is logged at debug.
- The video staleness check now uses `hls_paths::playlist_for_hash`
  instead of `$VIDEO_PATH/{basename}.m3u8`.
- `last_full_scan` initialises to `UNIX_EPOCH` so the watcher's first
  tick is treated as a full scan. That covers the catch-up gap left
  by removing `ScanDirectoryMessage` — every library's existing media
  is checked once at watcher boot (≈60s after startup) instead of
  waiting up to `WATCH_FULL_INTERVAL_SECONDS` (1h default).

main.rs: removes the `ScanDirectoryMessage` import and the per-library
`do_send` loop, with a comment pointing at the watcher's first-tick
behavior.

state.rs: `PlaylistGenerator::new` now takes the video dir.

Tests: existing `video::hls_paths` (4) and `watcher::tests` (4) pass.
The basename-keyed `/video/generate` endpoint still compiles and
serves; behavior change there is deferred to the follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:36:01 -04:00
Cameron Cordes
c71e1cdce0 hls: add hash-keyed path helpers + VideoToQueue type
Foundation for migrating HLS playlist output from basename-keyed
(`$VIDEO_PATH/{basename}.m3u8`) to content-hash-keyed
(`$VIDEO_PATH/{hash[..2]}/{hash}/playlist.m3u8`). The basename layout
collides whenever two source videos share a filename — common with
iPhone-style sequential naming (`IMG_NNNN.MOV`) across libraries — so
the loser's playlist gets overwritten and ffmpeg keeps re-queueing the
file every scan.

This commit adds the path layout and type plumbing without touching the
actor pipeline, watcher, or HTTP handlers yet:

- `src/video/hls_paths.rs`: `playlist_for_hash`, `sentinel_for_hash`,
  `segment_template_for_hash` built on top of `content_hash::hls_dir`,
  with constants for the filenames inside the hash dir. Unit tests
  cover the sharded layout and the playlist/sentinel/segment paths
  all landing in the same directory (so HLS relative refs resolve).
- `src/content_hash::hls_dir` un-deaded — was waiting for this branch.
- `VideoToQueue` struct in `actors.rs`: pairs a source path with its
  content hash so callers that lack a hash (rows mid-backfill) skip
  the video rather than fabricate one.
- `playlist_file_for` / `playlist_unsupported_sentinel` retained as
  migration-only helpers — they're only needed by the one-shot startup
  pass that retires pre-content-hash output.

Follow-ups (separate commits on this branch): wire `hls_paths` through
the queue handler + `PlaylistGenerator`, update the watcher's
`process_new_files` to build `VideoToQueue`, switch `/video/generate`
and `/video/stream` to resolve path→hash and return stable URLs, add
the legacy-layout migration, rewrite `cleanup_orphaned_playlists` for
the new dir shape, and surface progress via Prometheus + `/hls/stats`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:23:31 -04:00
22ce1a20e7 Merge pull request 'feature/library-patch-endpoint' (#94) from feature/library-patch-endpoint into master
Reviewed-on: #94
2026-05-13 13:44:36 +00:00
Cameron Cordes
7ec156fc05 libraries: accept newline as an excluded_dirs separator
Splits parse_excluded_dirs_column on `,`, `\n`, AND `\r` so a textarea
submit with one entry per line works the same as comma-separated.
Mixed input (`a, b\nc`) parses cleanly too — the frontend can paste
from any source without preprocessing.

Motivated by the "forgot the comma" footgun: typing
`.thumbnails .thumbnails2` in a single-line input today stores a
never-matching component pattern. With newlines as a first-class
separator and the frontend switching to a textarea, the natural
one-per-line UX makes that mistake impossible.

The DB store form stays comma-joined (normalize_excluded_dirs_input
hasn't changed) so existing rows are unaffected and no migration is
needed. Newline support matters mostly for the inbound write path;
mirroring it on the read side keeps the parser round-trip safe in
case anything writes a newline form directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 09:23:51 -04:00
Cameron Cordes
439532377d libraries: validate excluded_dirs entries on write
Reject the silent-footgun shapes that PathExcluder would store but
never match. The watcher would still walk past every photo as if the
exclude wasn't there, and the operator would have no signal that
their entry is dead. Caught at PATCH time with a descriptive 422.

Rules:
- Backslash anywhere → "use forward slashes" (catches \photos,
  photos\2024, \\server\share — Windows-typed entries land in the
  component-pattern bucket and never fire).
- Drive-letter prefix (Z:, Z:/...) → "relative to library root" —
  excludes are root-relative, not absolute system paths.
- Multi-segment name with no leading slash (photos/2024) →
  "did you mean /photos/2024?" — the common "I forgot the slash"
  typo, today silently stored as a component pattern that never hits.
- `..` segments in a path entry → "doesn't normalise". base.join()
  doesn't canonicalise, so the resulting prefix never matches.
- Bare "/" → "almost certainly a typo" for the library root.

Trailing slashes on path entries are stripped silently. Eight new
tests cover each rejection plus the trailing-slash normalisation
and the all-or-nothing failure mode of normalize_excluded_dirs_input
(one bad entry aborts the whole patch rather than silently applying
N-1 of N changes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 09:02:29 -04:00
Cameron Cordes
ce9fa94cb4 libraries: surface globals, normalise excluded_dirs on write
Two follow-ups to the PATCH endpoint:

1. GET /libraries now returns ``global_excluded_dirs`` alongside the
   library list — the union-with-globals semantics is invisible from
   the per-library row alone, and the admin UI needs to show what's
   already being skipped before the operator adds entries that would
   duplicate.

2. PATCH /libraries/{id} canonicalises the excluded_dirs string on
   write via the new ``normalize_excluded_dirs_input``: trims per
   entry, drops empties, dedupes preserving first-occurrence order,
   comma-joins without inner whitespace. Empty / whitespace-only →
   NULL. Round-trip stable so re-saving an entry produces an
   identical row.

Five new tests cover the empty / whitespace, trim, dedup, round-trip,
and overlap-with-globals cases. effective_excluded_dirs continues to
keep overlapping entries between globals and per-library on purpose —
PathExcluder accepts repeats and there's no behavioural reason to
dedupe at merge time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 08:58:04 -04:00
Cameron Cordes
b3124437ec libraries: PATCH /libraries/{id} with live-apply
Adds an HTTP mutation surface for `libraries.enabled` and
`libraries.excluded_dirs`, replacing the SQL-only workflow noted in
CLAUDE.md. Apollo's Settings panel calls this from the LIBRARIES
section so the operator no longer has to ssh + sqlite3 to flip a
library off or edit its excludes.

Live-apply (no restart) via a new `live_libraries: Arc<RwLock<Vec<
Library>>>` field on AppState. The existing immutable `libraries`
Vec stays for hot-path handlers that only need stable id → root_path
lookups, avoiding a 19-call-site refactor. The watcher and
cleanup_orphaned_playlists now take the lock instead of a Vec
snapshot and re-read at the top of each tick, so `enabled` /
`excluded_dirs` changes are picked up within one
WATCH_QUICK_INTERVAL_SECONDS. The GET /libraries handler also reads
through the live view.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 08:47:35 -04:00
74bf693878 Merge pull request 'feature/date-backfill-null-only' (#93) from feature/date-backfill-null-only into master
Reviewed-on: #93
2026-05-12 18:42:21 +00:00
Cameron Cordes
2d56047497 Drop fs_time from date-backfill eligibility
The drain queried `date_taken IS NULL OR date_taken_source = 'fs_time'`
ORDER BY id ASC LIMIT 500 every watcher tick. The resolver is
deterministic on file bytes + filename + fs metadata, so any row that
landed on fs_time once landed there again on every retry — the drain
spun on the same lowest-id rows in perpetuity, never advancing to
rows 501+ while still logging more_remain=true.

Side effect: 500 auto-commit UPDATEs per tick sustained the SQLite
write lock long enough that other writers on separate DAO connections
hit the 5s busy_timeout. Manifested as intermittent 500s on
PATCH /image/faces/{id} that succeeded on retry.

Narrow the partial index and query predicate to `date_taken IS NULL`.
If exiftool installs or a new filename regex lands, an operator can
re-resolve fs_time rows out-of-band rather than re-introducing the
steady-state churn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:37:36 -04:00
Cameron Cordes
3427c2916c Log 500-return paths in PATCH /image/faces/{id}
The four 500-return paths in update_face_handler returned e.to_string()
in the body but never logged. When a face PATCH failed with a 16-byte
body and no log entry, the cause (SQLITE_BUSY from cross-DAO writer
contention exhausting the 5s busy_timeout) was invisible. Surface the
full anyhow chain via {:#} on each path so the diesel cause is in the
log even when the response body only shows the top-level context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:37:26 -04:00
6a3e37b7dc Merge pull request 'feature/split-main-rs' (#92) from feature/split-main-rs into master
Reviewed-on: #92
2026-05-12 17:02:06 +00:00
Cameron Cordes
9f8a69fc6d Split main.rs: extract watcher loop into src/watcher.rs
main.rs drops from 1200 → 346 lines (90% smaller than the pre-branch
3542). What's left is the startup wiring it was always meant to be:
.env, migrations, AppState construction, route registration, server
bind. The four background-loop functions move into src/watcher.rs:

- watch_files (310 lines) — quick/full scan tick, per-library probe,
  backfill drain dispatch, missing-file scan, back-ref refresh,
  orphan GC.
- process_new_files (351 lines) — file walk → EXIF write →
  face-candidate build → HLS / preview-clip queueing →
  reconciliation. The "biggest untested chunk" from the earlier
  audit.
- cleanup_orphaned_playlists (167 lines) — separate slower-tick
  thread.
- playlist_needs_generation — small mtime-comparison helper.

Plus 4 unit tests for playlist_needs_generation (covers missing
playlist, newer playlist, newer video, video-missing-metadata
fallback).

main.rs's imports correspondingly shrink — Addr, HashSet, WalkDir,
Utc, InsertImageExif, and the bulk of video::actors all leave with
the watcher. CLAUDE.md updated to reflect the new module layout
(layered architecture box + module map for the face-detection
section).

cargo test --bin image-api: 329 passing (no regression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:54:37 -04:00
Cameron Cordes
bdb69c7d37 Split main.rs: extract HTTP handlers into src/handlers/
main.rs drops from 2935 → 1200 lines, freed for startup wiring +
the watcher. The 16 route handlers move into three domain-grouped
files under src/handlers/:

- handlers/favorites.rs (128 lines): favorites, put_add_favorite,
  delete_favorite.

- handlers/video.rs (665 lines): generate_video, stream_video,
  get_video_part, get_video_preview, get_preview_status. The 5
  pre-existing get_preview_status integration tests move with the
  handler (still pass against TestPreviewDao + AppState::test_state).

- handlers/image.rs (1003 lines): get_image (with the
  hash/library-scoped/bare-legacy thumb lookup), upload_image,
  get_file_metadata, set_image_gps, get_full_exif, set_image_date,
  clear_image_date. Helpers (create_circular_thumbnail,
  build_metadata_response_for_date_mutation) and request structs
  (SetGpsRequest, SetDateRequest, ClearDateRequest, UploadQuery)
  travel with them.

main.rs's import block shrinks from ~50 lines to ~22 as everything
HTTP-specific (NamedFile, mp::Multipart, BytesMut, Span, KeyValue,
StreamExt, …) moves with the handlers. The is_video_file wrapper
also goes — remaining callers in watch_files / cleanup use
file_types::is_video_file directly.

cargo test --bin image-api: 325 passing (no regression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:38:17 -04:00
Cameron Cordes
bec9857426 Split main.rs: extract backfill drains and thumbnails into modules
main.rs drops from 3542 → ~2930 lines by moving:

- src/backfill.rs (new): backfill_unhashed_backlog,
  backfill_missing_date_taken, backfill_missing_content_hashes,
  build_face_candidates, process_face_backlog. Now unit-tested for
  the first time — 5 tests covering cap behavior, library-id
  filtering, missing-on-disk skip, and the video/unhashed/scanned
  filters on face-candidate selection.

- src/thumbnails.rs (new): unsupported_thumbnail_sentinel,
  generate_image_thumbnail, create_thumbnails, update_media_counts,
  is_image, is_video, plus the IMAGE_GAUGE / VIDEO_GAUGE Prometheus
  metrics. Replaces the no-op stubs that used to live in lib.rs.
  4 new unit tests for the sentinel path math and the
  walker-counts-images-vs-videos smoke path.

Supporting:
- SqliteExifDao::from_shared (test-only) so an SqliteExifDao and
  SqliteFaceDao can share one in-memory connection — required to
  test build_face_candidates against the real join.
- files.rs / video/{mod,actors}.rs import from crate::thumbnails::*
  instead of the now-removed stubs in lib.rs.

cargo test --bin image-api: 325 passing (was 314).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:22:02 -04:00
05ec5d0c70 Merge pull request 'feature/knowledge-curation' (#91) from feature/knowledge-curation into master
Reviewed-on: #91
2026-05-12 15:40:55 +00:00