ImageApi

Author	SHA1	Message	Date
Cameron Cordes	9f8a69fc6d	Split main.rs: extract watcher loop into src/watcher.rs main.rs drops from 1200 → 346 lines (90% smaller than the pre-branch 3542). What's left is the startup wiring it was always meant to be: .env, migrations, AppState construction, route registration, server bind. The four background-loop functions move into src/watcher.rs: - watch_files (310 lines) — quick/full scan tick, per-library probe, backfill drain dispatch, missing-file scan, back-ref refresh, orphan GC. - process_new_files (351 lines) — file walk → EXIF write → face-candidate build → HLS / preview-clip queueing → reconciliation. The "biggest untested chunk" from the earlier audit. - cleanup_orphaned_playlists (167 lines) — separate slower-tick thread. - playlist_needs_generation — small mtime-comparison helper. Plus 4 unit tests for playlist_needs_generation (covers missing playlist, newer playlist, newer video, video-missing-metadata fallback). main.rs's imports correspondingly shrink — Addr, HashSet, WalkDir, Utc, InsertImageExif, and the bulk of video::actors all leave with the watcher. CLAUDE.md updated to reflect the new module layout (layered architecture box + module map for the face-detection section). cargo test --bin image-api: 329 passing (no regression). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 12:54:37 -04:00
Cameron Cordes	bdb69c7d37	Split main.rs: extract HTTP handlers into src/handlers/ main.rs drops from 2935 → 1200 lines, freed for startup wiring + the watcher. The 16 route handlers move into three domain-grouped files under src/handlers/: - handlers/favorites.rs (128 lines): favorites, put_add_favorite, delete_favorite. - handlers/video.rs (665 lines): generate_video, stream_video, get_video_part, get_video_preview, get_preview_status. The 5 pre-existing get_preview_status integration tests move with the handler (still pass against TestPreviewDao + AppState::test_state). - handlers/image.rs (1003 lines): get_image (with the hash/library-scoped/bare-legacy thumb lookup), upload_image, get_file_metadata, set_image_gps, get_full_exif, set_image_date, clear_image_date. Helpers (create_circular_thumbnail, build_metadata_response_for_date_mutation) and request structs (SetGpsRequest, SetDateRequest, ClearDateRequest, UploadQuery) travel with them. main.rs's import block shrinks from ~50 lines to ~22 as everything HTTP-specific (NamedFile, mp::Multipart, BytesMut, Span, KeyValue, StreamExt, …) moves with the handlers. The is_video_file wrapper also goes — remaining callers in watch_files / cleanup use file_types::is_video_file directly. cargo test --bin image-api: 325 passing (no regression). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 12:38:17 -04:00
Cameron Cordes	bec9857426	Split main.rs: extract backfill drains and thumbnails into modules main.rs drops from 3542 → ~2930 lines by moving: - src/backfill.rs (new): backfill_unhashed_backlog, backfill_missing_date_taken, backfill_missing_content_hashes, build_face_candidates, process_face_backlog. Now unit-tested for the first time — 5 tests covering cap behavior, library-id filtering, missing-on-disk skip, and the video/unhashed/scanned filters on face-candidate selection. - src/thumbnails.rs (new): unsupported_thumbnail_sentinel, generate_image_thumbnail, create_thumbnails, update_media_counts, is_image, is_video, plus the IMAGE_GAUGE / VIDEO_GAUGE Prometheus metrics. Replaces the no-op stubs that used to live in lib.rs. 4 new unit tests for the sentinel path math and the walker-counts-images-vs-videos smoke path. Supporting: - SqliteExifDao::from_shared (test-only) so an SqliteExifDao and SqliteFaceDao can share one in-memory connection — required to test build_face_candidates against the real join. - files.rs / video/{mod,actors}.rs import from crate::thumbnails::* instead of the now-removed stubs in lib.rs. cargo test --bin image-api: 325 passing (was 314). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 12:22:02 -04:00
cameron	25233904aa	Merge pull request 'personas: elevate to server with per-persona fact scoping' (#88) from feature/persona-knowledge-segmentation into master Reviewed-on: #88	2026-05-10 03:44:26 +00:00
Cameron Cordes	108bbeb029	date-override: union semantics across libraries + slash forms The date-override path used to look up `image_exif` strictly by `(library_id, rel_path)` with only the forward-slash form, while `/image/metadata`'s `get_exif` falls back across libraries and tries both slash forms. A photo whose row sat under a different library_id than its filesystem-resolved one — or whose rel_path was stored with backslashes — rendered fine in the modal but 404'd on save. `set_manual_date_taken` / `clear_manual_date_taken` now share a `locate_image_exif_row` helper that mirrors `get_exif`'s union semantics (scoped lookup first, library-agnostic fallback by rel_path in both slash forms), then update by primary key so the write hits exactly the row read. Inner anyhow errors are logged with `(library_id, rel_path)` so the next failure mode is debuggable. Handler-side: `resolve_library_param` errors no longer silently fall back to the primary library (which would have masked the original bug with a different "row not found"); a malformed library param now returns 400. New `DbErrorKind::NotFound` lets the handler distinguish genuine misses (404) from real DB failures (500). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 21:21:25 -04:00
Cameron Cordes	3e2f36a748	personas: elevate to server with per-persona fact scoping Move personas off the mobile client into ImageApi as first-class records, and scope entity_facts by persona so each one builds its own voice over a shared entity graph. The new include_all_memories flag lets a persona opt back into the full hive-mind pool for human browsing of /knowledge/*; agentic generation always stays in-voice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 17:59:20 -04:00
Cameron Cordes	b42acbb3f3	fmt: cargo fmt sweep across drifted files No behavior change — purely whitespace/line-break cleanup that had accumulated since the last format run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 16:42:41 -04:00
Cameron Cordes	2a273a3ed9	thumbnails: stop video failures from re-logging every watcher tick generate_video_thumbnail used .output().expect(...), which only catches spawn failure — non-zero ffmpeg exits were silently discarded. With no thumbnail and no .unsupported sentinel left behind, the watcher re-detected the file as missing every quick-scan tick and re-logged "New file detected (missing thumbnail)" forever. Mirror the image branch: return io::Result, check status.success(), and write the sentinel from create_thumbnails on failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 16:41:24 -04:00
Cameron Cordes	16d6586b7d	exif: GET /image/exif/full — exiftool dump for the DETAILS modal The curated `image_exif` columns are a small slice of what exiftool can read (camera/lens/GPS/capture/dates). Apollo's DETAILS modal wants to surface everything — white balance, metering, MakerNotes, IPTC, ICC profile, Composite tags, the lot — for an operator inspecting a photo's provenance. `read_full_exif_via_exiftool(path)` shells out to `exiftool -j -G -n`: JSON output, group-prefixed keys (`EXIF:Make`, `MakerNotes:LensInfo`), numeric values (callers can reformat). Spawned via web::block to keep it off the actix worker — RAW with rich MakerNotes can take a few seconds. The endpoint is on-demand only; the indexer / file watcher does NOT call it. Falls back to 503 with a clear message when exiftool isn't on PATH so Apollo can render an "install exiftool" hint. Multi-library union resolution mirrors set_image_gps / get_file_metadata. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 19:42:41 -04:00
Cameron Cordes	832b50d587	image_exif: manual date_taken override (set/clear endpoints) Add `POST /image/exif/date` and `POST /image/exif/date/clear` so an operator can correct a row whose canonical-date waterfall landed on the wrong value (camera clock reset, fs_time fallback for a copied-from-backup file, etc). New `original_date_taken` / `original_date_taken_source` columns snapshot the prior value on first override so revert is lossless. The waterfall source set is now `'exif' | 'exiftool' | 'filename' | 'fs_time' | 'manual'`. The existing `idx_image_exif_date_backfill` partial index already filters to `date_taken IS NULL OR date_taken_source = 'fs_time'`, so manual rows are naturally excluded from the per-tick drain — no index change needed. `ExifMetadata` now exposes `date_taken_source` + originals so a UI can render "manually set; was X via filename". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 19:26:43 -04:00
Cameron Cordes	54e0635a98	date_backfill: per-tick drain for unresolved date_taken rows Adds two ExifDao methods (`get_rows_needing_date_backfill` / `backfill_date_taken`) and a `backfill_missing_date_taken` watcher pass that runs on every tick alongside `backfill_unhashed_backlog`. The drain queries the partial index for rows where `date_taken IS NULL` or `date_taken_source = 'fs_time'`, batches up to `DATE_BACKFILL_MAX_PER_TICK` paths (default 500), and feeds them through `date_resolver::resolve_dates_batch` — a single exiftool subprocess covers the whole tick. Rows that newly resolve to `exiftool` / `filename` / `fs_time` get persisted via `backfill_date_taken` (touches only `date_taken` + `date_taken_source` so EXIF / hash / perceptual columns survive). `filename`-sourced rows are intentionally not re-resolved — the regex is authoritative when it matches and re-running exiftool wouldn't change the answer. Files that have disappeared from disk are skipped so a ghost row doesn't loop through the drain forever; the missing-file scan in `library_maintenance` retires those separately. Comes with two DAO unit tests (eligibility filter + column-isolation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 16:03:03 -04:00
Cameron Cordes	2d14291733	ingest: stamp canonical date_taken on every InsertImageExif Wires `date_resolver::resolve_date_taken` into the three call sites that build `InsertImageExif`: - `process_new_files` (file watcher) — every newly-registered file gets the resolver's verdict so videos and EXIF-stripped images land with a real date instead of NULL. - Upload handler — same waterfall on the post-multipart-write path. - GPS-write handler — re-runs the waterfall after exiftool writes GPS and re-reads the EXIF, in case a previously fs_time-sourced row now has a real EXIF date to upgrade to. This is a behavior change vs. the pre-rewrite `/memories` request-time priority: EXIF now beats filename when both are present. A photo named `Screenshot_2014-06-01.png` whose EXIF `DateTime` is 2021 now appears under 2021. The reverse case (no EXIF, parseable filename) is unchanged and continues to surface the filename date with `date_taken_source = 'filename'`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 16:00:14 -04:00
Cameron Cordes	79e258eccd	date_resolver: canonical date_taken waterfall with exiftool fallback New module that consolidates the four-step ingest waterfall: kamadak-exif (already in process via the caller's prior result) → exiftool fallback → filename regex → earliest_fs_time. Each step is tagged with a `DateSource` so the caller can persist provenance. The exiftool fallback is what makes videos and MakerNote-hosted dates land at all — kamadak-exif can't read QuickTime/MP4 or Nikon-style sub-IFDs. Single-file mode shells out per call; batch mode pipes paths on stdin via `-@ -` and fans the result through one subprocess so the upcoming per-tick drain doesn't pay startup cost per row. The `exiftool` PATH check is cached in a `OnceLock` to keep the drain short-circuited on deploys without exiftool installed. `SubSecDateTimeOriginal` and `ContentCreateDate` are pulled alongside the standard tags to capture iPhone's sub-second precision and Apple's preferred capture-time tag respectively. `FileModifyDate` is deliberately not in the tag list — it's a filesystem-derived value the resolver already covers via the `fs_time` step, and pulling it through exiftool would mask "no real EXIF date" with a misleading `source = exiftool` row. Module is registered in both `lib.rs` and `main.rs` (sibling-module pattern the rest of the bin uses); no callers wired in yet — that lands in the next commit. Comes with 9 unit tests covering JSON parsing edge cases, source-priority short-circuiting, and the fs_time-when-no-exif path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 15:59:02 -04:00
Cameron Cordes	84326501a9	image_exif: add date_taken_source column New nullable TEXT column tracks which step of the canonical-date waterfall (kamadak-exif → exiftool → filename → fs_time) populated `date_taken`. Lets a later per-tick drain re-resolve weak sources (`fs_time`) once stronger ones become available, and gives the UI/debug surface a way to answer "why does this photo show up under this date?". Adds the column at all `InsertImageExif` construction sites with `None` placeholders (the resolver wiring lands in a follow-up commit), and extends the `update_exif` SET tuple so the column survives the GPS-write re-read path. Partial index `idx_image_exif_date_backfill` is created for the upcoming drain query. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 15:57:49 -04:00
Cameron Cordes	7ca888e95d	duplicates: filter low-entropy hashes + dHash double-check, fix backfill loop The perceptual cluster was producing one giant first group that contained hundreds of unrelated images. Two causes: - Solid-colour images (skies, black frames, monochrome scans) all hash to near-zero pHashes that Hamming-distance-zero to each other. - Single-link clustering on pHash alone is too permissive — a chain of weakly-similar images all collapses into one cluster. Fixed by skipping hashes outside the popcount [8, 56] band (uniform content) and requiring dHash agreement within threshold before unioning a candidate edge from the BK-tree. Two new tests pin both invariants. Backfill bin fixed separately: decode-failed rows kept phash_64=NULL and got re-pulled by every batch, infinite-looping on a queue of unbreakable formats. Persist a 0/0 sentinel on decode failure so the row leaves the candidate set; the all-zero hash is excluded from clustering by the same entropy filter so it doesn't pollute results. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 18:08:05 -04:00
Cameron Cordes	7584cd8792	duplicates: perceptual hash + soft-mark resolution + upload 409 Adds pHash + dHash columns alongside the existing blake3 content_hash so near-duplicates (re-encoded, resized, format-converted copies) become queryable. /duplicates/{exact,perceptual} return groups; /duplicates/{resolve,unresolve} flip a duplicate_of_hash soft-mark on losing rows and union perceptual-only tag sets onto the survivor. The default /photos listing filters duplicate_of_hash IS NULL so demoted siblings stop cluttering the grid; include_duplicates=true opts back in for Apollo's review modal. Upload now hashes bytes pre-write and returns 409 with the canonical sibling when a file's bytes already exist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 17:36:01 -04:00
Cameron Cordes	fb4df4b195	style: cargo fmt sweep Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 19:01:00 -04:00
Cameron Cordes	814066551e	multi-library: per-library excluded_dirs Adds a nullable comma-separated TEXT column to the libraries table. Effective excludes for a walk = (env-var globals) ∪ (library.excluded_dirs). Empty / NULL = no library-specific extras; the global env var still applies. Migration (2026-05-01-110000_libraries_excluded_dirs) ALTER TABLE libraries ADD COLUMN excluded_dirs TEXT. NULL on every existing row — no behavior change on upgrade. Library struct + helpers (libraries.rs) - Library gains excluded_dirs: Vec<String>, parsed from the column by parse_excluded_dirs_column (drops empties / whitespace, matches the env-var parser). - Library::effective_excluded_dirs(globals) returns the union. - From<LibraryRow> hydrates the field on AppState construction so /libraries surfaces it. Watcher / walkers / memories Every per-library walker now consults the effective set: - process_new_files (file-watch ingest, RAW/EXIF/face) - process_face_backlog (filter_excluded inherits) - create_thumbnails (startup + new-file branch) - update_media_counts (Prometheus gauge) - cleanup_orphaned_playlists (per-library source-existence check) - memories endpoint (PathExcluder) Effective set is computed once per per-library iteration in the watcher tick and threaded through; called functions retain their flat &[String] signature (no per-library awareness needed inside the walker primitives). Use case: mount a parent directory while a sibling library covers a child subtree, and exclude the child subtree from the parent so the libraries don't double-walk / double-write image_exif. With hash-keyed derived data (Branches B/C), the duplication-avoidance is the only cost prevented — face / tag / insight sharing was already correct via content_hash. Tests: 228 pass (226 from previous + 2 new in libraries::tests: parse_excluded_dirs_column edge cases, effective_excluded_dirs_unions_global_and_per_library). 
CLAUDE.md gains a "Per-library excludes" subsection of the multi-library data model. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 19:54:17 +00:00
Cameron Cordes	3598bb2cfe	multi-library: operator kill switch via libraries.enabled A small follow-up to Branches A/B/C. Adds a nullable-default-1 boolean column to the `libraries` table that controls whether the watcher considers the library at all. Useful for staging a new mount before committing to ingest, and as a maintenance kill switch when a library needs to be quiet without being unmounted. Migration (2026-05-01-100000_libraries_enabled_flag) ALTER TABLE libraries ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT 1. Existing rows stay enabled — no behavior change on upgrade. Watcher gate (main.rs) At the top of the per-library loop, if !lib.enabled { continue; } — runs BEFORE the availability probe. Disabled libraries don't enter the health map, don't get probed, don't get ingest, don't get any maintenance pass. The initial sweep before the loop's first sleep also skips disabled libraries. Orphan-GC consensus (library_maintenance.rs) all_libraries_online filters disabled libraries out of the consensus check — they're treated as out-of-scope, not as blockers. Otherwise flipping enabled=false would permanently halt orphan GC for the rest of the system, which is the opposite of the intended kill-switch semantics. Cross-library duplicates: safe by construction. Hash-keyed derived data (face_detections, tagged_photo with hash, photo_insights with hash) is anchored by ANY image_exif row carrying the hash. Disabling a library does NOT delete its image_exif rows, so a hash referenced by a disabled library's row stays anchored — derived data survives. collect_orphan_hashes deliberately doesn't filter image_exif by library.enabled for exactly this reason. No HTTP endpoint. Library mutation is rare-enough infra work that a SQL toggle is fine, and a public mutation endpoint without a role / permission story would be poorly-prioritized exposure for a single-user tool. Documented in CLAUDE.md. 
Tests: 226 pass (225 from Branch C + 1 new all_libraries_online_treats_disabled_as_out_of_scope, which proves that even an explicit Stale entry on a disabled library doesn't block the consensus). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 19:10:24 +00:00
Cameron Cordes	263e27e108	multi-library: handoff + orphan GC with two-tick consensus Branch C of the multi-library data-model rollout. Implements the operational maintenance pipeline pinned in CLAUDE.md → "Multi-library data model" / "Library availability and safety". Branches A and B land first; this branch builds on top. New module: src/library_maintenance.rs Three idempotent passes the watcher runs every tick after the per-library ingest loop: 1. Missing-file scan (per online library) For each Online library, load a paginated page of image_exif rows (IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE, default 500), stat() each one, and delete rows whose source file is NotFound. Permission/IO errors are skipped, never deleted. Capped at IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK (default 200) per library per tick — so a pathological mount that returns NotFound for everything can't wipe the table in one cycle. Cursor advances across ticks, wraps on partial-page returns, and naturally cycles through the entire library over many minutes. Skipped wholesale for Stale libraries via the existing probe gate. 2. Back-ref refresh (DB-only) For face_detections / tagged_photo / photo_insights: any hash-keyed row whose (library_id, rel_path) no longer matches an image_exif row, but whose content_hash does, is repointed at a surviving image_exif location. Pure SQL with EXISTS guards so rows whose hash is fully orphaned are left alone (the orphan GC handles those). Idempotent; no availability gate needed. This is what makes a recent → archive move invisible to readers: when pass 1 retires the lib-A row, pass 2 pivots tags / faces / insights to lib-B's surviving path before any client notices. 3. Orphan GC (destructive) Hash-keyed derived rows whose content_hash has no image_exif referent are GC-eligible. Two-tick consensus: a hash must be observed orphaned on two consecutive ticks AND every library must be Online for both. 
A single Stale tick within the window cancels all pending deletes (they remain marked but won't be promoted) — they're re-evaluated next tick. The pending set lives in OrphanGcState (in-memory); a watcher restart resets it, which can only delay a delete, never cause one. Hashes that re-appear in image_exif between ticks are "revived" from the pending set (handles transient share unmount / remount). Two new ExifDao methods: - list_rel_paths_for_library_page(library_id, limit, offset) for the paginated missing-file scan. - (count_for_library landed in Branch A.) Watcher wiring (main.rs) Per-library: missing-file scan inside the existing per-library loop, after process_new_files, gated by the same probe check that already protects ingest. After the loop: reconcile (Branch B), back-ref refresh, then run_orphan_gc. The maintenance connection is opened once per tick (image_api::database::connect), used by all three DB-only passes, and dropped at end of tick. CLAUDE.md gains a "Maintenance pipeline" subsection that describes the three passes and their interaction with the existing availability-and-safety policy. Tests: 225 pass (217 from Branch B + 8 new in library_maintenance covering back-ref refresh including the fully-orphaned no-op case, two-tick GC consensus, Stale-tick consensus reset, image_exif re-appearance revival, multi-table delete, and the all_libraries_online helper). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:27:53 +00:00
Cameron Cordes	48cac8c285	multi-library: hash-keyed tagged_photo + photo_insights with reconciliation Branch B of the multi-library data-model rollout. tagged_photo and photo_insights now follow the bytes (content_hash), not the path, matching the policy pinned in CLAUDE.md "Multi-library data model". Branch A's availability probe and EXIF scoping land first; this branch builds on top. Migration (2026-05-01-000000_hash_keyed_derived_data) Adds nullable content_hash columns to tagged_photo and photo_insights, with partial indexes on the non-null subset to keep the index small during the transitional window. The migration backfills from image_exif: * tagged_photo joins on rel_path alone (no library_id available); * photo_insights joins on (library_id, rel_path), unambiguous. Rows whose image_exif hash isn't known yet stay null and the runtime reconciliation pass populates them as the hash backlog drains. Insert-time population TagDao::tag_file looks up image_exif.content_hash by rel_path before inserting; the hash is written into the new column. InsightDao::store_insight does the same scoped to (library_id, rel_path). Caller-supplied hash on InsertPhotoInsight wins; otherwise the DAO does the lookup. Both paths fall back to None if the hash isn't known yet — reconciliation backfills. Reconciliation (database/reconcile.rs) Three idempotent passes the watcher runs once per tick after the per-library backfill loop: 1. tagged_photo NULL hashes → populate from image_exif by rel_path. 2. photo_insights NULL hashes → populate by (library_id, rel_path). 3. photo_insights scalar merge — when multiple is_current rows share a content_hash, keep the earliest generated_at as current; demote the rest. Demoted rows keep their data so /insights/history is unaffected; only the "current" pointer narrows to one per hash. No filesystem dependency, so reconcile doesn't need the availability gate; runs every tick. Logs once when something changed, debug otherwise. 
Tags are set-valued under the policy (union on read, already DISTINCT in queries), so there is no analogous tag-collapse pass — duplicate (tag_id, content_hash) rows across libraries are harmless. Read paths are unchanged in this branch — lookup_tags_batch's existing rel_path-via-hash-sibling expansion still produces the correct merge. A follow-up can simplify reads to use the new column directly for performance. Tests: 217 pass (212 pre-existing + 5 new in reconcile covering NULL-fill, hash-not-yet-known no-op, library scoping on insights, earliest-wins collapse, idempotency). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 14:52:16 +00:00
Cameron Cordes	48ed7be5d9	libraries: initial availability sweep before watcher's first sleep new_health_map seeds every library as Online, and the watcher's tick loop sleeps WATCH_QUICK_INTERVAL_SECONDS (default 60s) before its first probe — meaning /libraries reported the optimistic default for up to a minute after boot, even when a share was clearly unmounted. Run the same refresh_health pass once at the top of the watcher thread before entering the sleep loop. /libraries is then truthful within milliseconds of the watcher thread starting (effectively from the first HTTP request, since the watcher spawns well before the server binds). The per-tick gate inside the loop is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 14:33:45 +00:00
Cameron Cordes	eea1bf3181	multi-library: availability probe + scoped EXIF queries + collision fixes Branch A of the multi-library data-model rollout. Three threads of correctness/safety work that ship together because the new mount needs all three before it can land: 1. Library availability probe (libraries.rs, state.rs, main.rs) New LibraryHealth (Online | Stale { reason, since }) and a shared LibraryHealthMap on AppState. Probe checks root_path exists + is_dir + readable + non-empty (relative to a "had_data" signal so fresh mounts aren't downgraded). The watcher tick begins with a refresh_health() per library; stale libraries skip ingest, the hash backfill, and face-detection backlog drains for that tick. The orphaned-playlist cleanup also gates on every library being online — a missing source on a stale library is indistinguishable from a transient unmount, and the cleanup is destructive. /libraries now returns each library with its current health state. Logs only on Online↔Stale transitions so a long outage doesn't spam. New ExifDao::count_for_library is the "had_data" signal. 2. EXIF queries scoped by library_id (database/mod.rs, files.rs, main.rs, tags.rs) query_by_exif gains an Option<i32> library filter; /photos and /photos/exif now pass it. Without this, an EXIF-filtered request scoped to ?library=N returned cross-library results because the handler resolved the library but didn't push it through to SQL. get_exif_batch gains the same option. The watcher's per-library ingest, face-candidate build, and content-hash backfill all scope to their library; the union-mode /photos date-sort path and the library-agnostic tag fan-out (lookup_tags_batch, by design) keep using None. 3. Derivative-path collision fixes (content_hash.rs, main.rs) New content_hash::library_scoped_legacy_path helper: <derivative_dir>/<library_id>/<rel_path>. 
Thumbnail generation (startup walk + watcher needs-thumb check) and serving now use it; serving falls back to the bare-legacy mirrored path so pre-multi-library deployments keep working without regeneration. Without this, lib2 with the same rel_path as lib1 would have its thumbnail request short-circuit to lib1's image. Orphaned-playlist cleanup walks every library when checking for the source video (was: BASE_PATH only). Without this, mounting a 2nd library and waiting 24h would delete every playlist whose source lived only in the 2nd library. The HLS playlist write path collision (filename-only basename, not rel_path) is left as a known issue with a TODO at the call site — the actor-pipeline rewrite belongs in Branch B/C. Tests: 212 pass (cargo test --lib). New tests cover the probe states (online / missing root / non-dir / empty-with-prior-data), refresh_health transitions, query_by_exif scoping, get_exif_batch keying on (library_id, rel_path), library_scoped_legacy_path, and count_for_library. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 14:12:49 +00:00
Cameron Cordes	f50655fb21	indexer: apply EXCLUDED_DIRS to remaining WalkDir callers Audit follow-up to `5bf4956`. The same `@eaDir` pruning that protects the indexer also needs to protect the other walks under library roots: - `create_thumbnails` walks every file in every library to generate thumbnails. Without EXCLUDED_DIRS, it would generate thumbnails of Synology's `SYNOFILE_THUMB_*.jpg` thumbnails (thumbnails of thumbnails). - `update_media_counts` walks for the prometheus IMAGE / VIDEO gauges. Without EXCLUDED_DIRS, the gauges over-count by however many phantom `@eaDir` images live alongside the real photos. - `cleanup_orphaned_playlists` walks BASE_PATH searching for source videos by filename. EXCLUDED_DIRS isn't a behavior change for typical Synology mounts (no .mp4 in @eaDir), but it's a correctness win for any operator-defined exclude that happens to contain video. Refactor: add `walk_library_files(base, excluded_dirs) -> Vec<DirEntry>` to file_scan.rs as the shared primitive. `enumerate_indexable_files` now layers media-type + mtime filters on top of it. One new test covers the lower-level helper (returns all extensions, prunes excluded subtrees). `generate_video_gifs` (currently `#[allow(dead_code)]`, not reachable from main) gets the `update_media_counts` signature update and reads EXCLUDED_DIRS from env so a future revival isn't broken — but its WalkDir walk stays raw because the dual lib/bin compile makes the file_scan module path non-trivial there. Tagged with a comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 20:21:17 +00:00
Cameron Cordes	5bf49568f1	indexer: prune EXCLUDED_DIRS at WalkDir time, extract enumerate_indexable_files Synology drops `@eaDir/.../SYNOFILE_THUMB_*.jpg` files alongside every photo. The face-detect pipeline already filters those out via `face_watch::filter_excluded`, but the filter runs after the indexer has already inserted rows into `image_exif`. Result: phantom rows whose content_hash never matches a `face_detections` row, so the anti-join in `list_unscanned_candidates` returns them every tick. They're filtered out at runtime, no marker is written, and the cycle repeats forever — log spam, wrong stats denominator, and on a real Synology library the phantom rows balloon into the hundreds of thousands. Move the exclusion to the WalkDir pass, where filter_entry can prune whole subtrees instead of walking and discarding leaves. Extract the pre-existing 30-line walker chain in main.rs::process_new_files into `file_scan::enumerate_indexable_files` so it's testable in isolation. Six tests cover the bug (eadir prune), nested patterns, absolute-under-base syntax, non-media filtering, modified_since semantics, and forward-slash rel_path normalization. Out of scope (other WalkDir callers in main.rs that don't yet apply EXCLUDED_DIRS — thumbnail gen at 1309, media scan at 1377, video playlist scan at 1685, and two nested walks at 1709 / 1743): separate audit PR. Operator note: existing phantom rows still need a one-shot cleanup — DELETE FROM face_detections WHERE content_hash IN ( SELECT content_hash FROM image_exif WHERE rel_path LIKE '%/@eaDir/%' ); DELETE FROM image_exif WHERE rel_path LIKE '%/@eaDir/%' OR rel_path LIKE '@eaDir/%'; Run before attaching a fresh Synology-sourced library. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 19:29:37 +00:00
Cameron Cordes	1971eeccd6	faces: drain backfill + detection backlog every tick, not just full scans Symptom: ImageApi restart, then ~60 minutes of silence — no face_watch lines at all. Cause: backfill + face-detection candidate build were both gated inside process_new_files, which during quick scans (every 60s) only walks files modified in the last interval. The pre-existing unhashed / unscanned backlog never entered the candidate set, so it only drained on the full-scan path (default once per hour). Surfaced as "scan stuck at 1101/13118" — most of those rows were waiting on the next full scan. Two new per-tick passes that work directly off the DB: (1) backfill_unhashed_backlog uses ExifDao::get_rows_missing_hash to pull unhashed rows in id order, capped (FACE_HASH_BACKFILL_MAX_PER_TICK default 2000), and writes content_hash for each. No filesystem walk — the walk was the gating filter that hid the backlog. (2) process_face_backlog uses a new FaceDao::list_unscanned_candidates (LEFT-anti-join on content_hash via raw SQL, GROUP BY hash so duplicates fire one detect call) to pull a capped batch of hashed-but-unscanned rows (FACE_BACKLOG_MAX_PER_TICK default 64) and runs the existing face_watch detection pipeline on them. Both run only when face_client.is_enabled(). The cap on (2) is small because each candidate is a real Apollo round-trip — 64/tick at 60s quick interval ≈ 64 detections/min, which paces an 8-core CPU inference comfortably while keeping a steady flow visible in logs. process_new_files's own backfill stays in place for the same-tick flow (a brand-new upload gets hashed AND face-scanned in the tick where it's discovered) but is now belt-and-suspenders. Test backstop pinning the new DAO method's filter contract: only hashed, unscanned, in-library rows are returned; scanned rows, unhashed rows, and other-library rows are filtered out.	2026-04-30 01:46:49 +00:00
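The filter contract of `list_unscanned_candidates` (hashed, not yet scanned, one row per hash, capped) can be illustrated in memory. This is only a sketch under assumptions: the commit does this with raw SQL (an anti-join plus GROUP BY), and `ExifRow` and `unscanned_candidates` are hypothetical names that ignore the library filter.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down stand-in for an image_exif row.
#[derive(Debug, Clone, PartialEq)]
struct ExifRow {
    rel_path: String,
    content_hash: Option<String>,
}

/// Returns at most `cap` rows that are hashed, whose hash has no
/// face_detections row yet, and whose hash has not already been picked
/// in this batch (so duplicates fire one detect call).
fn unscanned_candidates(rows: &[ExifRow], scanned: &HashSet<String>, cap: usize) -> Vec<ExifRow> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut out = Vec::new();
    for row in rows {
        if out.len() >= cap {
            break;
        }
        // Unhashed rows are left for the hash backfill pass.
        let Some(hash) = row.content_hash.as_deref() else {
            continue;
        };
        // Anti-join: skip anything that already has a detection or marker row.
        if scanned.contains(hash) {
            continue;
        }
        // "GROUP BY hash": keep only the first row per content hash.
        if seen.insert(hash.to_string()) {
            out.push(row.clone());
        }
    }
    out
}
```

With the commit's defaults, a cap of 64 per tick at a 60 s quick-scan interval gives the roughly 64 detections per minute quoted in the message.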
Cameron Cordes	16abacf4c5	faces: backfill no longer stalls on chronic-error files at the front The content-hash backfill capped at 500/tick AND counted errors against that cap. So a pocket of files that errored every time (vanished mid-scan, permission denied, unreadable) at the head of the exif_records iteration order burned the entire budget every tick and the rest of the backlog never advanced — surfacing as a face-scan stuck at e.g. 44% with no progress. Without a content_hash, those photos never become face-detection candidates, so it looks like detection is broken when really it's the prerequisite hash that isn't filling. Two fixes: - Cap on successes only. Errors still get counted and logged but don't burn the per-tick budget; the loop keeps moving past them to the working files behind. Errors are bounded by the unhashed backlog size (each record walked at most once per tick), so this can't run away. - Always log the unhashed backlog count when non-zero. Previously "stuck at 44%" looked silent from the outside; now every tick surfaces "backfilled N/M; K still need backfill" so an operator can tell backfill is making progress (or isn't). Also bumps the default cap from 500 to 2000. Hashing is cheap (blake3 + one DB UPDATE), and 500 was conservative for a personal-scale library where 10k+ unhashed files is a normal first-run state.	2026-04-30 00:03:26 +00:00
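A minimal sketch of the "cap on successes only" rule, assuming a hypothetical `backfill` function that takes the unhashed paths and a hashing closure (the real code works against ExifDao rows and blake3, neither of which is modeled here):

```rust
/// Per-tick summary, mirroring the "backfilled N/M; K still need backfill" log.
#[derive(Debug, PartialEq)]
struct BackfillStats {
    hashed: usize,
    errors: usize,
    remaining: usize,
}

/// Walks the unhashed backlog once. Only successful hashes count against
/// `cap`; failures are tallied but do not consume the budget, so a pocket
/// of chronically failing files at the front cannot stall the rest.
fn backfill<F>(unhashed: &[&str], cap: usize, mut hash_file: F) -> BackfillStats
where
    F: FnMut(&str) -> Result<String, String>,
{
    let mut hashed = 0;
    let mut errors = 0;
    for &path in unhashed {
        if hashed >= cap {
            break;
        }
        match hash_file(path) {
            Ok(_) => hashed += 1,
            Err(_) => errors += 1,
        }
    }
    BackfillStats {
        hashed,
        errors,
        // Files that errored still need a hash, so only successes shrink the backlog.
        remaining: unhashed.len() - hashed,
    }
}
```

The loop is bounded by the backlog length, since each record is visited at most once per call, which matches the commit's argument that errors cannot run away.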
Cameron Cordes	a24fac5511	faces: backfill missing content_hash from the file watcher Photos indexed before content-hashing landed (or where the hash compute failed silently on insert) end up in image_exif with NULL content_hash. build_face_candidates keys on content_hash, so those rows would never become face candidates without backfill — symptom: face detection logs nothing despite photos being in the library and the watcher running. The dedicated `backfill_hashes` binary already handles this; this commit lets the watcher self-heal during full scans so the deploy 'just works' for face recognition without operator action. Idempotent — subsequent scans see populated hashes and no-op. Bounded per tick by FACE_HASH_BACKFILL_MAX_PER_TICK (default 500) so a watcher tick on a 50k-photo legacy library doesn't blake3 every file in one shot. For very large backlogs the dedicated binary is still faster (no DAO mutex contention with the watcher loop). Only runs when face_client.is_enabled(), so legacy deploys without APOLLO_FACE_API_BASE_URL keep the same behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 20:41:08 +00:00
Cameron Cordes	23f4941471	faces: surface enabled/disabled state + per-tick candidate count Manual deploy debugging: 'Saved thumbnail' logs were visible (boot-time thumbnail backfill) but no face_watch logs were appearing, with no obvious way to tell whether the integration was disabled, hadn't reached a full scan yet, or had simply seen no new files. Two log lines: - watch_files startup: 'Face detection: ENABLED' / 'DISABLED (set APOLLO_FACE_API_BASE_URL or APOLLO_API_BASE_URL to enable)' so you can tell at a glance whether the env wired through. - process_new_files (debug-level): 'face_watch: scan tick — N image file(s) walked, M candidate(s) (library 'main', modified_since=...)' so an empty-candidate scan is distinguishable from a misconfigured or skipped one without bumping log level for the rest of the watcher. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 20:19:17 +00:00
Cameron Cordes	1859399759	faces: phase 4 — people-tag bootstrap + auto-bind on detection Wires the existing string people-tags into the new persons table and auto-binds new detections to a same-named person when the photo carries exactly one matching tag. ImageApi has no notion of which tags are people-tags today (purely a user mental model), so this is operator- confirmed: the suggester surfaces candidates with a heuristic flag, the operator confirms, then bootstrap creates persons rows. Auto-bind follows on every detection thereafter. New endpoints: GET /tags/people-bootstrap-candidates Per case-insensitive name group: display name (most-frequent capitalization), normalized lowercase, summed usage_count, looks_like_person heuristic flag, already_exists check against the persons table. Sorted persons-likely-first then by count. POST /persons/bootstrap Body: {names: [string]}. Idempotent — pre-fetches the existing- name set so a duplicate request reports per-row "already exists" instead of 409-ing each insert. Created rows get created_from_tag=true; failed rows surface in `skipped` with a reason. looks_like_person heuristic — conservative on purpose because the operator confirms in the UI: - 1–2 whitespace-separated words - Each word starts uppercase, no digits anywhere - Single-word names not on a small denylist (cat, christmas, beach, sunset, untagged, ...). Two-word names skip the denylist so "Sarah Smith" is never false-rejected. FaceDao additions: - find_persons_by_names_ci — bulk lowercase-name → person_id lookup via sql_query (Diesel's BoxedSelectStatement + LOWER() doesn't play well with the type system). - person_reference_embedding — L2-normalized mean of a person's detected embeddings, filtered by model_version so a future buffalo_xl row can never contaminate an in-flight buffalo_l auto- bind decision. Returns None when the person has no faces yet. 
- assign_face_to_person — sets face_detections.person_id and, only when persons.cover_face_id is NULL, claims this face as cover. The UI's hand-picked cover survives later auto-binds. - decode_embedding_bytes / cosine_similarity helpers — pub(crate) so face_watch can decode the wire bytes once and feed them through the cosine threshold. Auto-bind in face_watch::process_one: After every successful detect, for each newly-stored auto face we pull the photo's tags, look up which (if any) map to existing persons, and: - skip when zero or multiple distinct persons are matched (multi-match is genuinely ambiguous; cluster suggester handles it) - on first face for a person: bind unconditionally so bootstrap can ever produce a usable reference - thereafter: bind iff cosine(new_emb, person_ref) >= FACE_AUTOBIND_MIN_COS (default 0.4, env-tunable to 0..=1) The reference embedding comes from person_reference_embedding under the same model_version as the candidate, so a model upgrade never silently re-anchors a person's centroid. Plumbing: watch_files now constructs its own SqliteTagDao alongside the other watcher DAOs and threads it through process_new_files → run_face_detection_pass → process_one. The handler-side TagDao registration in main.rs already covers bootstrap_candidates_handler; no extra app_data wiring needed. Tests: 8 new (faces.rs): - looks_like_person accepts/rejects/two-word-skips-denylist (3) - cosine_similarity on identical / orthogonal / opposite / mismatch / zero / empty inputs - decode_embedding_bytes round-trip + size validation - find_persons_by_names_ci groups case + handles empty input - person_reference_embedding filters by model_version (buffalo_l ref must not include buffalo_xl rows) - assign_face_to_person sets cover when unset, doesn't overwrite cargo test --lib: 179 / 0; fmt + clippy clean for new code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:55:01 +00:00
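The auto-bind rule described above can be sketched as follows. The function names echo the commit, but the signatures, the `i32` person ids, and the choice to return 0.0 for mismatched, empty, or zero-norm inputs are assumptions made for illustration.

```rust
/// Cosine similarity of two embeddings. Returning 0.0 for mismatched
/// lengths, empty input, or a zero vector is an assumption of this sketch.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Decides whether a new face should be bound to a person.
/// - zero or several distinct matched persons: skip (ambiguous)
/// - no reference embedding yet (first face): bind unconditionally
/// - otherwise: bind only if the cosine reaches `min_cos`
fn auto_bind(
    matched_persons: &[i32],
    new_emb: &[f32],
    person_ref: Option<&[f32]>,
    min_cos: f32,
) -> Option<i32> {
    let mut distinct = matched_persons.to_vec();
    distinct.sort_unstable();
    distinct.dedup();
    if distinct.len() != 1 {
        return None;
    }
    let person = distinct[0];
    match person_ref {
        None => Some(person),
        Some(reference) if cosine_similarity(new_emb, reference) >= min_cos => Some(person),
        Some(_) => None,
    }
}
```

In the commit the threshold comes from FACE_AUTOBIND_MIN_COS (default 0.4) and the reference embedding is computed per model_version; both are simply passed in as parameters here.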
Cameron Cordes	4dee7b6f73	faces: phase 3 — file-watch hook drives auto detection Wire face detection into ImageApi's existing scan loop so new uploads pick up faces automatically and the initial backlog grinds through on full-scan ticks. No new job system; Phase 2's already_scanned check makes the work implicitly idempotent (one face_detections row per content_hash, including no_faces / failed marker rows). face_watch.rs (new): - run_face_detection_pass(library, excluded_dirs, face_client, face_dao, candidates) — sync entry point. Builds a per-pass tokio runtime and fans out detect calls bounded by FACE_DETECT_CONCURRENCY (default 8). The watcher thread itself stays sync. - filter_excluded — applies the same PathExcluder /memories uses, so @eaDir / .thumbnails / EXCLUDED_DIRS-listed paths skip detection before we burn a detect call (and Apollo's GPU memory) on junk. - read_image_bytes_for_detect — RAW/HEIC route through extract_embedded_jpeg_preview because opencv-python-headless can't decode either; everything else gets a plain std::fs::read so EXIF orientation reaches Apollo's exif_transpose intact. - process_one — translates Apollo's response into the Phase 2 marker contract: faces[] empty → no_faces; FaceDetectError::Permanent → failed (don't retry); Transient → no marker (next scan retries); success with N faces → N detected rows with the embeddings unpacked. main.rs (process_new_files + watch_files): - watch_files now also takes face_client + excluded_dirs; the watcher thread builds a SqliteFaceDao the same way it builds ExifDao / PreviewDao. - After the EXIF write loop, build_face_candidates queries image_exif for the just-walked image paths' content_hashes (covers new uploads and pre-existing backlog), filters out anything already_scanned, and hands the rest to face_watch::run_face_detection_pass. - Bypassed wholesale when face_client.is_enabled() is false — keeps the watcher usable on legacy deploys where Apollo isn't configured. 
Tests: 5 face_watch unit tests cover the parts that don't need a real Apollo: - filter_excluded drops dir-component patterns (@eaDir) without matching substring file names (eaDir-not-a-thing.jpg keeps). - filter_excluded drops absolute-under-base subtrees (/private). - empty EXCLUDED_DIRS short-circuits cleanly. - read_image_bytes_for_detect passes JPEG bytes through verbatim (orientation must reach Apollo unmodified). - read_image_bytes_for_detect falls through to plain read when a RAW-extension file has no embedded preview, so Apollo gets a chance to 422 and we mark failed rather than infinitely-retrying. cargo test --lib: 170 / 0; fmt and clippy clean for new code. End-to-end (drop a photo → face_detections row appears) needs Apollo running and is deferred to deploy-time verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:21:19 +00:00
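The marker contract that `process_one` implements can be written as a small mapping. The variant names follow the commit messages, but their payloads, the `Marker` type, and treating `Disabled` like `Transient` (no marker written) are assumptions of this sketch.

```rust
/// Error classes named in the commit; modeled here as unit variants.
#[derive(Debug, PartialEq)]
enum FaceDetectError {
    Permanent,
    Transient,
    Disabled,
}

/// Hypothetical representation of what gets written to face_detections.
#[derive(Debug, PartialEq)]
enum Marker {
    /// N rows with status='detected', one per face.
    Detected(usize),
    NoFaces,
    Failed,
}

/// Maps a detect outcome (face count or error) to the marker to persist.
/// `None` means "write nothing", so the next scan retries the file.
fn marker_for(result: Result<usize, FaceDetectError>) -> Option<Marker> {
    match result {
        Ok(0) => Some(Marker::NoFaces),
        Ok(n) => Some(Marker::Detected(n)),
        Err(FaceDetectError::Permanent) => Some(Marker::Failed),
        // Assumption: a disabled client behaves like a transient failure here.
        Err(FaceDetectError::Transient) | Err(FaceDetectError::Disabled) => None,
    }
}
```

Writing a marker for permanent failures but not for transient ones is what keeps the scan idempotent without retrying undecodable files forever.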
Cameron Cordes	860169032b	faces: phase 2 — schema + manual face/person CRUD Land the persistence model and HTTP surface for local face recognition. Inference still lives in Apollo (Phase 1); this side adds the data home plus every endpoint Apollo's UI and FileViewer-React will consume. Schema (new migration 2026-04-29-000000_add_faces): - persons: visual identities. Optional entity_id bridges to the existing knowledge-graph entities table; auto-bridging is left to the management UI (we don't muddy LLM provenance from face rows). UNIQUE(name COLLATE NOCASE) so 'alice' / 'Alice' fold to one row. - face_detections: keyed on content_hash (cross-library dedup), with status='detected' carrying bbox + 512-d embedding BLOB, and 'no_faces' / 'failed' marker rows that tell Phase 3's file watcher not to re-scan. Marker invariant enforced via CHECK; partial UNIQUE on content_hash WHERE status='no_faces' guards against double-marks. Schema regenerated with `diesel print-schema` against a clean migration run; joinables added for face_detections → libraries / persons and persons → entities. face_client.rs (sibling of apollo_client.rs): - reqwest multipart, 60 s timeout (CPU inference on a backlog can be slow; bounded threadpool on Apollo serializes calls anyway). - FaceDetectError::{Permanent, Transient, Disabled} — Phase 3 keys its marker-row decision on this. 422 → mark failed, 5xx → defer. - APOLLO_FACE_API_BASE_URL falls back to APOLLO_API_BASE_URL when unset; both unset = is_enabled() false, callers no-op. faces.rs (DAO + handlers): - SqliteFaceDao implements the full FaceDao trait; person face counts go through sql_query because diesel's BoxedSelectStatement + group_by trips trait-resolver recursion. - merge_persons re-points face rows in a transaction, copies notes when target's are empty, deletes src. 
- manual POST /image/faces resolves content_hash through image_exif, crops the user-drawn bbox with 10% padding (detector wants context around ears/jaw), POSTs the crop to face_client.embed for a real ArcFace vector, then inserts source='manual'. - Cluster-suggest (Phase 6) gets its data from GET /faces/embeddings — base64-encoded paged BLOBs so Apollo's DBSCAN can stream them without ImageApi pre-aggregating. Endpoints registered alongside add_*_services in main.rs: GET /faces/stats?library= GET /faces/embeddings?library=&unassigned=&limit=&offset= GET /image/faces?path=&library= POST /image/faces (manual create via embed) PATCH /image/faces/{id} DELETE /image/faces/{id} GET /persons?library= POST /persons GET /persons/{id} PATCH /persons/{id} DELETE /persons/{id}?cascade=set_null|delete (set_null default) POST /persons/{id}/merge GET /persons/{id}/faces?library= The file-watch hook (Phase 3) and the rerun-on-one-photo handler (Phase 6) live behind the FaceDao methods marked dead_code today — they're called only when those phases land. Same shape for the trait methods that aren't reached by Phase 2 routes. Tests: 3 DAO unit tests cover person CRUD + case-insensitive uniqueness, marker-row idempotency (mark_status is a no-op when any row exists), and merge re-pointing faces. Cargo.toml: reqwest gains the `multipart` feature. cargo build / cargo test --lib / cargo fmt / cargo clippy --all-targets all clean for the new code; the two pre-existing test_path_excluder failures and the pre-existing sort_by clippy warnings are unrelated and present on master. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:03:42 +00:00
Cameron Cordes	57fb0bcd3c	EXIF GPS write: POST /image/exif/gps via exiftool New endpoint accepts {path, library, latitude, longitude} and shells out to exiftool to write GPSLatitude/GPSLongitude (with N/S, E/W refs) into the file's EXIF in place. After the write, the handler re-extracts EXIF and updates the image_exif row so the DB stays in sync — the response carries the updated metadata block in one round-trip. Falls through to store_exif if the row is missing. `exif::write_gps` is the small helper. `-overwrite_original` so no .orig sidecar is left behind. Validates lat/lon range + supports_exif before spawning exiftool. Format support matches the existing read path (JPEG / TIFF / RAW / HEIF / PNG / WebP) — videos still need a different writer and aren't covered. Apollo's "+ PIN" carousel button (separate commit on the Apollo side) calls this through /api/photos/exif/gps. Drive-by: cargo fmt one-line collapse on apollo_client.rs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 22:25:40 +00:00
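The validation and argument construction described above might look roughly like this. `gps_exiftool_args` is a hypothetical helper, not the commit's `exif::write_gps`; it only builds the exiftool argument list (using the `-overwrite_original` flag and the GPSLatitude / GPSLatitudeRef / GPSLongitude / GPSLongitudeRef tags) and does not spawn the process or check `supports_exif`.

```rust
/// Validates the coordinate range and builds exiftool arguments that write
/// the absolute values plus hemisphere references in place.
/// Treating 0.0 as N / E is an assumption of this sketch.
fn gps_exiftool_args(path: &str, lat: f64, lon: f64) -> Result<Vec<String>, String> {
    // NaN fails both range checks because `contains` is false for it.
    if !(-90.0..=90.0).contains(&lat) {
        return Err(format!("latitude out of range: {lat}"));
    }
    if !(-180.0..=180.0).contains(&lon) {
        return Err(format!("longitude out of range: {lon}"));
    }
    let lat_ref = if lat >= 0.0 { "N" } else { "S" };
    let lon_ref = if lon >= 0.0 { "E" } else { "W" };
    Ok(vec![
        "-overwrite_original".to_string(), // no .orig sidecar left behind
        format!("-GPSLatitude={}", lat.abs()),
        format!("-GPSLatitudeRef={lat_ref}"),
        format!("-GPSLongitude={}", lon.abs()),
        format!("-GPSLongitudeRef={lon_ref}"),
        path.to_string(),
    ])
}
```

The returned vector could be handed to `std::process::Command::new("exiftool").args(...)`; rejecting bad input before that point avoids spawning a process that is certain to fail.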
Cameron Cordes	00b3c80141	RAW: try IFD0 + IFD1 for embedded preview, serve at full size The thumbnail pipeline's embedded-JPEG extractor only checked IFD1 (THUMBNAIL), which on many Nikon NEFs is missing or zero-length even when IFD0 (PRIMARY) carries a perfectly good 1-2 MP reduced-resolution preview the camera writes for in-body review. The previous behavior produced black thumbs on disk: the buggy IFD1 pointer resolved to a short byte sequence that happened to satisfy the SOI sanity check, image::load_from_memory accepted it, and the resize path quietly wrote a black JPEG. Now both IFDs are checked and the larger valid JPEG wins. Format- agnostic: applies to every TIFF-based RAW (NEF / ARW / CR2 / DNG / RAF / ORF / RW2 / PEF / SRW / TIFF). is_tiff_raw is now pub so main.rs can gate its full-size handler on it. Also extends the /image handler so size=full requests for RAW formats serve the embedded preview as image/jpeg instead of NamedFile-streaming the original RAW bytes - browsers can't decode a .nef container, so <img src=...> would otherwise land as a broken image. Falls through to NamedFile if no preview is present, preserving the historical behavior for callers that genuinely want the original bytes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 16:52:10 +00:00
Cameron Cordes	7621282419	Thumb orientation + library filter on /photos/exif Two follow-ups on the same feature branch: 1. Bake EXIF orientation into generated thumbnails. The `image` crate doesn't apply Orientation on load, and `save_with_format(..Jpeg)` drops EXIF — so portrait phone shots ended up sideways in any client that displays the cached thumb directly (no EXIF tag for the browser to compensate from). New `exif::read_orientation` reads the tag cheaply (no full EXIF parse) and `exif::apply_orientation` does the rotate/flip via image's existing `rotate90/180/270` + `fliph/flipv`. Applied in both branches of `generate_image_thumbnail` (RAW embedded- JPEG path and the regular `image::open` path). Existing thumbnails in the cache are still wrong-orientation; wipe the thumb dir or run a one-off backfill once this lands. 2. Optional `library` query param on `/photos/exif`. Accepts numeric id or name (same shape as `/image?library=...`), resolved via the existing `resolve_library_param` helper so a bad value 400s before we touch the DAO. Filter is applied post-query in the handler rather than pushed into `query_by_exif` to keep the DAO trait (and its test mocks) unchanged. Cheap enough at typical library counts; can be moved into SQL later if it ever isn't. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 17:29:36 -04:00
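How the eight EXIF Orientation values map onto rotate and flip operations can be shown on a tiny pixel grid. This is a self-contained sketch: the commit applies the `image` crate's `rotate90/180/270` and `fliph/flipv` to real images, whereas the `Grid` type and the functions below are stand-ins written for illustration.

```rust
/// Row-major grid of pixel values standing in for an image.
type Grid = Vec<Vec<u8>>;

/// Rotates 90 degrees clockwise: the left column becomes the top row.
fn rotate90(g: &Grid) -> Grid {
    let h = g.len();
    let w = if h == 0 { 0 } else { g[0].len() };
    (0..w)
        .map(|r| (0..h).map(|c| g[h - 1 - c][r]).collect())
        .collect()
}

/// Mirrors left-to-right.
fn fliph(g: &Grid) -> Grid {
    g.iter().map(|row| row.iter().rev().copied().collect()).collect()
}

/// Mirrors top-to-bottom.
fn flipv(g: &Grid) -> Grid {
    g.iter().rev().cloned().collect()
}

fn rotate180(g: &Grid) -> Grid {
    fliph(&flipv(g))
}

fn rotate270(g: &Grid) -> Grid {
    rotate90(&rotate180(g))
}

/// Applies the transform that makes pixels upright for an EXIF
/// Orientation value (1..=8); unknown values leave the grid unchanged.
fn apply_orientation(g: &Grid, orientation: u8) -> Grid {
    match orientation {
        2 => fliph(g),
        3 => rotate180(g),
        4 => flipv(g),
        5 => fliph(&rotate90(g)), // transpose
        6 => rotate90(g),
        7 => fliph(&rotate270(g)), // transverse
        8 => rotate270(g),
        _ => g.clone(),
    }
}
```

Once the rotation is baked into the pixels, the saved JPEG no longer needs the Orientation tag, which is why dropping EXIF on save stops being a problem for cached thumbnails.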
Cameron Cordes	c6f82ebaba	Batch EXIF endpoint: GET /photos/exif Adds a single round-trip projection of `image_exif` for every photo whose `date_taken` falls in `[date_from, date_to]`. Wraps the existing `ExifDao::query_by_exif` DAO method which already handles the SQL filter in one query against the covering index — the only missing piece was HTTP plumbing. Designed for window-scoped consumers like Apollo's photo-to-track matcher, which currently does N+1 (one `/photos` listing + one `/image/metadata` per photo). Because `/image/metadata` serializes on `Data<Mutex<dyn ExifDao>>`, that pattern can take 10s+ for windows with hundreds of photos. The new endpoint takes one mutex acquisition for the whole batch. Response shape: { photos: [ { file_path, library_id, library_name, camera_model, width, height, gps_latitude, gps_longitude, date_taken } ], total: N } Two notes on scope: - Photos with NULL `date_taken` are excluded by `query_by_exif`'s semantics. Filename-extracted dates are not synthesized here; rare callers that need that fallback can still hit `/image/metadata`. - GPS columns are stored as f32 in image_exif to keep row size small; the JSON shape widens to f64 so clients don't have to know about the on-disk precision. Library names are pre-mapped from `app_state.libraries` once and stamped on each row, avoiding an O(rows × libraries) linear scan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 16:38:53 -04:00
Cameron	13b9d54861	fix(scan): quiet startup scans & thumbnail RAW/HEIC Three recurring issues on every full scan: 1. Video playlist scans re-enqueued every file only to reject it as AlreadyExists. Pre-filter in ScanDirectoryMessage and QueueVideosMessage so we skip videos whose .m3u8 already exists, and demote the leaked AlreadyExists log to debug. 2. image crate was built with only jpeg/png features, so webp/tiff/avif files logged "format not supported" every scan. Enable those features. 3. RAW (ARW/NEF/CR2/...) and HEIC thumbnails weren't generated, so the scan kept retrying them. Try the file's embedded JPEG preview via kamadak-exif first (fast, pure-Rust, works on Sony ARW where ffmpeg's TIFF decoder fails). Fall back to ffmpeg for HEIC/HEIF and RAWs with no preview. Anything still undecodable gets a <thumb>.unsupported sentinel so future scans skip it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:47:13 -04:00
Cameron	079cd4c5b9	feat(ai): streaming chat endpoint with live tool events Add LlmClient::chat_with_tools_stream and SSE endpoint POST /insights/chat/stream that emits text deltas, tool_call / tool_result pairs, truncated notice, and a terminal done frame as the agentic loop runs. - Ollama: parses NDJSON from /api/chat stream, accumulates content deltas, emits Done with tool_calls from the final chunk. - OpenRouter: parses OpenAI-compatible SSE, reassembles tool_call argument deltas by index, asks for stream_options.include_usage. - InsightChatService spawns the loop on a tokio task, feeds events through an mpsc channel, persists training_messages at the end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 16:57:41 -04:00
Cameron	65ab10e9a8	feat(ai): chat rewind + ollama metrics logging Rewind: POST /insights/chat/rewind truncates training_messages at a given rendered index, dropping the target message plus any preceding tool-call scaffolding. The initial user prompt is protected. Metrics: log prompt_eval_count/duration and eval_count/duration from every Ollama chat response, rendered as tokens + ms + tok/s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 15:16:32 -04:00
Cameron	0b9528f61e	feat(ai): chat continuation for photo insights (server v1) Adds POST /insights/chat and GET /insights/chat/history. Replays the stored agentic conversation through the same backend the insight was generated with (or a per-turn override), runs a short tool-calling loop, and persists the extended history in append or amend mode. Backend switching: same-backend or hybrid->local replay verbatim; local->hybrid is rejected in v1 (would require on-the-fly vision description rewrite). Per-(library, file) async mutex serialises concurrent turns. Soft context budget drops oldest tool_call+result pairs when the serialized history exceeds num_ctx - 2048 tokens. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 13:00:27 -04:00
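The soft context budget can be sketched as a loop that drops the oldest tool_call + result pair until the history fits in `num_ctx - 2048`. Everything here is hypothetical: the `Msg` enum, the bytes-divided-by-four token estimate, and `trim_to_budget` are illustrative and say nothing about how the service actually counts tokens or stores messages.

```rust
/// Hypothetical, simplified chat message.
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    User(String),
    Assistant(String),
    ToolCall(String),
    ToolResult(String),
}

/// Crude token estimate (about 4 bytes per token); an assumption of this sketch.
fn estimate_tokens(m: &Msg) -> usize {
    let text = match m {
        Msg::User(s) | Msg::Assistant(s) | Msg::ToolCall(s) | Msg::ToolResult(s) => s,
    };
    (text.len() + 3) / 4
}

/// Drops the oldest adjacent ToolCall/ToolResult pair until the estimated
/// size fits the budget, or until no such pair is left. User and assistant
/// messages are never removed.
fn trim_to_budget(history: &mut Vec<Msg>, num_ctx: usize) {
    let budget = num_ctx.saturating_sub(2048);
    while history.iter().map(estimate_tokens).sum::<usize>() > budget {
        let oldest_pair = history
            .windows(2)
            .position(|w| matches!(w, [Msg::ToolCall(_), Msg::ToolResult(_)]));
        let Some(i) = oldest_pair else {
            break; // nothing droppable remains; the budget is only "soft"
        };
        history.drain(i..i + 2);
    }
}
```

Removing call and result together keeps the remaining transcript well-formed, since a tool result without its call (or the reverse) would confuse a replay.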
Cameron	e2eefbd156	feat(ai): curated OpenRouter model picker for hybrid backend Add OPENROUTER_ALLOWED_MODELS env var and GET /insights/openrouter/models endpoint returning the curated list verbatim. Drop the live capability precheck in hybrid mode — trust the operator's allowlist; bad ids surface as a chat-call error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:36:19 -04:00
Cameron	3027a3ffda	perf: DB-backed recursive /photos + watcher reconciliation Recursive listings now query image_exif instead of walking disk, taking union-mode /photos from ~17s to sub-second on a 10k-file library. The watcher's full scan prunes stale image_exif rows so the DB stays in parity with the filesystem when files are deleted externally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 01:55:07 +00:00
Cameron	586b735af5	feat: include per-photo library id in /photos response Adds a parallel `photo_libraries: Vec<i32>` array alongside `photos` in `PhotosResponse` so clients can render per-thumbnail badges. Populated with the scoped library id at the two main return sites; left empty for `/favorites` since favorites are library-agnostic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 01:55:07 +00:00
Cameron	c2ee3996be	chore: apply cargo fmt + clippy cleanup across crate Silence forward-looking dead_code on unused DAO modules, annotate individual placeholder items, rewrite tautological assert!(true/false) in token tests as panic! arms, and pick up fmt drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 01:55:07 +00:00
Cameron	a0f3bfab5f	fix: validate gps-summary path against every library The /photos/gps-summary handler validated the incoming path against the primary library's root with new_file=false, which requires the path to exist on disk. For a viewer opened on a file from a non-primary library, tapping the GPS link produced activePath = <folder from lib 2>, the primary-only check failed, and the server 400'd — so the map came up empty. Validation here is purely a traversal guard (the DAO does a prefix LIKE against rel_path), so we now accept the path as long as any configured library can resolve it without escaping its root. Also applies cargo fmt drift on files touched this session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 01:55:07 +00:00
Cameron	e6ee38edec	fix: resolve media across libraries for video, metadata, and insights The /video/generate and /image/metadata handlers assumed files live under the resolved library only, which broke when a mobile client passed no library (union mode) but the file lived in a non-primary library. Both now fall back to scanning every configured library for an existing file. InsightGenerator held a single base_path, so vision-model loads and filename-date fallbacks failed for non-primary libraries. It now takes Vec<Library> and probes each root in resolve_full_path. /image/metadata responses now carry library_id/library_name so the mobile viewer can surface which library a file belongs to. Thumbnail generation at startup is now spawned on a background thread so the HTTP server can accept traffic while large libraries backfill. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 01:55:07 +00:00
Cameron	2d942a9926	feat: content-hash-aware tag/insight sharing + library scoping Tags and insights now follow content across libraries via content_hash lookups on the read path, so the same file indexed at different rel_paths in multiple libraries shares its annotations. Recursive tag search scopes hits to the selected library by checking each tagged rel_path against the library's disk (with a content-hash sibling fallback so tags attached under one library's rel_path still match a content-equivalent file in another). The /image and /image/metadata handlers fall back across libraries when the file isn't under the resolved one, so union-mode search results (which carry no library attribution in the response) still serve correctly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-21 01:55:07 +00:00
Cameron	c01a0479b7	fix: honor library param in /image, /photos, /memories The Phase 3 plumbing accepted `library=` but didn't actually route requests through the scoped library once it was resolved. Three concrete bugs surfaced when testing against a second mounted library: - `/image` always resolved paths against AppState.base_path (primary), so thumbnails for non-primary libraries 400'd when their rel_paths didn't exist under primary. Now resolves against the scoped library and defaults to primary when the param is omitted. - `/memories` walked the scoped library correctly but its helper functions hardcoded `library_id: PRIMARY_LIBRARY_ID` on every MemoryItem, causing clients to route thumbnails back to primary regardless of which library the memory actually came from. - `/photos` non-recursive listing delegated to a `RealFileSystem` constructed from AppState.base_path at startup, so walks always hit primary even when `library=2` was passed. The non-primary path now uses list_files against the scoped library's root; primary still goes through FileSystemAccess to preserve the existing test mock plumbing. Also adds `library` to ThumbnailRequest so the /image query param is actually parsed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-21 01:55:07 +00:00
Cameron	0aaea91cc2	feat: add content_hash backfill + register every media file Adds blake3 content hashing as the basis for derivative dedup (thumbnails, HLS) across libraries. Computed inline by the watcher on ingest and by a new `backfill_hashes` binary for historical rows. Key changes: - `content_hash` and `size_bytes` are now populated on new image_exif rows; a new ExifDao surface (`get_rows_missing_hash`, `backfill_content_hash`, `find_by_content_hash`) supports backfill and future hash-keyed lookups. - The watcher now registers every image/video in image_exif, not just files with parseable EXIF. EXIF becomes optional enrichment; videos and other non-EXIF files still get a hashed row. This also makes DB-indexed sort/filter cover the full library. - `/image` thumbnail serve dual-looks up hash-keyed path first, then falls back to the legacy mirrored layout. - Upload flow accepts `?library=` query param + hashes uploaded files. - Store_exif logs the underlying Diesel error on insert failure so constraint violations surface instead of hiding behind a generic InsertError. - New migration normalizes rel_path separators to forward slash across all tables, deduplicating any rows that collide after normalization. Fixes spurious UNIQUE violations from mixed backslash/forward-slash paths on Windows ingest. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-21 01:55:07 +00:00
Cameron	ce5b337582	feat: make file watcher, thumbnails, and upload library-aware `watch_files` and `create_thumbnails` now iterate every configured library, tagging rows with the correct `library_id`. `process_new_files` takes a `&Library` so InsertImageExif no longer hardcodes the primary library. Upload accepts an optional `library` query param to pick a target library; omitted still defaults to primary for backwards compatibility. Hash-keyed thumbnail/HLS storage with dual-lookup fallback is deferred to Phase 5, where it's bundled with the content hash backfill that actually makes the hash-keyed paths meaningful. Until hashes are populated, the legacy mirrored layout is a no-op to change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-21 01:55:07 +00:00

180 Commits