ImageApi

Author	SHA1	Message	Date
Cameron Cordes	58f010f302	docs(claude): pin excluded_dirs entry-form syntax The two entry shapes for libraries.excluded_dirs / EXCLUDED_DIRS are not symmetric: - /sub/path → multi-segment, library-root-anchored, recursive - name → single component anywhere in the tree Without this pinned, a reasonable read of the column doc would be "any path-like string works" — but a multi-segment string without a leading slash silently never matches (the no-slash form scans path components for exact string equality, and components are slash-free). No code change; just documentation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 20:05:58 +00:00
Cameron Cordes	814066551e	multi-library: per-library excluded_dirs Adds a nullable comma-separated TEXT column to the libraries table. Effective excludes for a walk = (env-var globals) ∪ (library.excluded_dirs). Empty / NULL = no library-specific extras; the global env var still applies. Migration (2026-05-01-110000_libraries_excluded_dirs) ALTER TABLE libraries ADD COLUMN excluded_dirs TEXT. NULL on every existing row — no behavior change on upgrade. Library struct + helpers (libraries.rs) - Library gains excluded_dirs: Vec<String>, parsed from the column by parse_excluded_dirs_column (drops empties / whitespace, matches the env-var parser). - Library::effective_excluded_dirs(globals) returns the union. - From<LibraryRow> hydrates the field on AppState construction so /libraries surfaces it. Watcher / walkers / memories Every per-library walker now consults the effective set: - process_new_files (file-watch ingest, RAW/EXIF/face) - process_face_backlog (filter_excluded inherits) - create_thumbnails (startup + new-file branch) - update_media_counts (Prometheus gauge) - cleanup_orphaned_playlists (per-library source-existence check) - memories endpoint (PathExcluder) Effective set is computed once per per-library iteration in the watcher tick and threaded through; called functions retain their flat &[String] signature (no per-library awareness needed inside the walker primitives). Use case: mount a parent directory while a sibling library covers a child subtree, and exclude the child subtree from the parent so the libraries don't double-walk / double-write image_exif. With hash-keyed derived data (Branches B/C), the duplication-avoidance is the only cost prevented — face / tag / insight sharing was already correct via content_hash. Tests: 228 pass (226 from previous + 2 new in libraries::tests: parse_excluded_dirs_column edge cases, effective_excluded_dirs_unions_global_and_per_library). CLAUDE.md gains a "Per-library excludes" subsection of the multi-library data model. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 19:54:17 +00:00
Cameron Cordes	3598bb2cfe	multi-library: operator kill switch via libraries.enabled A small follow-up to Branches A/B/C. Adds a NOT NULL DEFAULT 1 boolean column to the `libraries` table that controls whether the watcher considers the library at all. Useful for staging a new mount before committing to ingest, and as a maintenance kill switch when a library needs to be quiet without being unmounted. Migration (2026-05-01-100000_libraries_enabled_flag) ALTER TABLE libraries ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT 1. Existing rows stay enabled — no behavior change on upgrade. Watcher gate (main.rs) At the top of the per-library loop, if !lib.enabled { continue; } — runs BEFORE the availability probe. Disabled libraries don't enter the health map, don't get probed, don't get ingest, don't get any maintenance pass. The initial sweep before the loop's first sleep also skips disabled libraries. Orphan-GC consensus (library_maintenance.rs) all_libraries_online filters disabled libraries out of the consensus check — they're treated as out-of-scope, not as blockers. Otherwise flipping enabled=false would permanently halt orphan GC for the rest of the system, which is the opposite of the intended kill-switch semantics. Cross-library duplicates: safe by construction. Hash-keyed derived data (face_detections, tagged_photo with hash, photo_insights with hash) is anchored by ANY image_exif row carrying the hash. Disabling a library does NOT delete its image_exif rows, so a hash referenced by a disabled library's row stays anchored — derived data survives. collect_orphan_hashes deliberately doesn't filter image_exif by library.enabled for exactly this reason. No HTTP endpoint. Library mutation is rare-enough infra work that a SQL toggle is fine, and a public mutation endpoint without a role / permission story would be poorly-prioritized exposure for a single-user tool. Documented in CLAUDE.md. Tests: 226 pass (225 from Branch C + 1 new all_libraries_online_treats_disabled_as_out_of_scope, which proves that even an explicit Stale entry on a disabled library doesn't block the consensus). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 19:10:24 +00:00
Cameron Cordes	5f247be1f1	docs(claude): note in-place edit gap as future Branch D The maintenance pipeline added in Branch C assumes (library_id, rel_path) bytes are stable for as long as the file lives at that path. In-place edits (crop, re-export to same name) bypass process_new_files's already-indexed check, so the row's content_hash stays pinned to the original bytes — tags / faces / insights remain attached to that hash silently. Document the gap and the proposed shape of the fix: - Stale-content detection pass: compare last_modified / size_bytes to fs::metadata, re-hash on mismatch, update image_exif. - "Content branched" semantics on hash change: faces re-run, tags migrate forward (user intent survives a crop), insights migrate + flag for re-generation, favorites follow path. - Apollo derived.db cache invalidation belongs in the same design cycle, not after. Captured here so the design intent is clear before someone hits the case in real life. No code change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:53:08 +00:00
Cameron Cordes	263e27e108	multi-library: handoff + orphan GC with two-tick consensus Branch C of the multi-library data-model rollout. Implements the operational maintenance pipeline pinned in CLAUDE.md → "Multi-library data model" / "Library availability and safety". Branches A and B land first; this branch builds on top. New module: src/library_maintenance.rs Three idempotent passes the watcher runs every tick after the per-library ingest loop: 1. Missing-file scan (per online library) For each Online library, load a paginated page of image_exif rows (IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE, default 500), stat() each one, and delete rows whose source file is NotFound. Permission/IO errors are skipped, never deleted. Capped at IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK (default 200) per library per tick — so a pathological mount that returns NotFound for everything can't wipe the table in one cycle. Cursor advances across ticks, wraps on partial-page returns, and naturally cycles through the entire library over many minutes. Skipped wholesale for Stale libraries via the existing probe gate. 2. Back-ref refresh (DB-only) For face_detections / tagged_photo / photo_insights: any hash-keyed row whose (library_id, rel_path) no longer matches an image_exif row, but whose content_hash does, is repointed at a surviving image_exif location. Pure SQL with EXISTS guards so rows whose hash is fully orphaned are left alone (the orphan GC handles those). Idempotent; no availability gate needed. This is what makes a recent → archive move invisible to readers: when pass 1 retires the lib-A row, pass 2 pivots tags / faces / insights to lib-B's surviving path before any client notices. 3. Orphan GC (destructive) Hash-keyed derived rows whose content_hash has no image_exif referent are GC-eligible. Two-tick consensus: a hash must be observed orphaned on two consecutive ticks AND every library must be Online for both. A single Stale tick within the window cancels all pending deletes (they remain marked but won't be promoted) — they're re-evaluated next tick. The pending set lives in OrphanGcState (in-memory); a watcher restart resets it, which can only delay a delete, never cause one. Hashes that re-appear in image_exif between ticks are "revived" from the pending set (handles transient share unmount / remount). Two new ExifDao methods: - list_rel_paths_for_library_page(library_id, limit, offset) for the paginated missing-file scan. - (count_for_library landed in Branch A.) Watcher wiring (main.rs) Per-library: missing-file scan inside the existing per-library loop, after process_new_files, gated by the same probe check that already protects ingest. After the loop: reconcile (Branch B), back-ref refresh, then run_orphan_gc. The maintenance connection is opened once per tick (image_api::database::connect), used by all three DB-only passes, and dropped at end of tick. CLAUDE.md gains a "Maintenance pipeline" subsection that describes the three passes and their interaction with the existing availability-and-safety policy. Tests: 225 pass (217 from Branch B + 8 new in library_maintenance covering back-ref refresh including the fully-orphaned no-op case, two-tick GC consensus, Stale-tick consensus reset, image_exif re-appearance revival, multi-table delete, and the all_libraries_online helper). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:27:53 +00:00
Cameron Cordes	2f91891459	docs(claude): pin multi-library data model + availability/safety policy Adds a "Multi-library data model" section that classifies each table as intrinsic-to-bytes (hash-keyed), user-intent-about-a-photo (hash-keyed), or library-administrative ((library_id, rel_path)). Spells out merge semantics on read (union for set-valued, earliest-wins for scalar), write attribution (binds to bytes, not to current library), the transitional-state rules for hash-less rows, library handoff behavior on archive moves, and orphan GC. Adds a "Library availability and safety" subsection: every watcher tick begins with a presence probe; destructive paths (move-handoff re-keying, orphan GC) require both/all libraries online and confirmed-clean for two consecutive ticks. A NAS reboot, USB pull, or VPN drop must never trigger destruction — the worst case is that derived-data work pauses until the share returns. The face_detections table is referenced as the existing reference implementation of the policy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 14:11:42 +00:00
Cameron Cordes	323097c650	faces: count distinct content_hash in stats total_photos face_detections is keyed on content_hash (one row per unique bytes, shared across libraries / duplicate paths) but total_photos was COUNT(*) over image_exif rows. A file present at multiple rel_paths or across libraries inflated the denominator without inflating the numerator, leaving a permanent gap (e.g. 1101/1103 with nothing actually pending detection). Switch total_photos to COUNT(DISTINCT content_hash) so numerator and denominator live in the same domain. Exclude rows with NULL content_hash from the count — they're held in the hash-backfill backlog, not the detection backlog, and counting them pins the bar below 100% for the duration of that pass. CLAUDE.md: document the stats domain rule next to the rest of the face-detection notes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 22:41:20 +00:00
Cameron Cordes	96c539764c	docs: face detection system section + per-tick backlog drain env vars CLAUDE.md gets an "Important Patterns → Face detection system" entry covering the schema (why content_hash and not (library_id, rel_path)), the file-watch hook + per-tick backlog drains, auto-bind on tag-name match, manual-face create with EXIF orientation handling, and the rerun-preserves-manual-rows contract. README's face section adds the two new env vars (FACE_BACKLOG_MAX_PER_TICK and FACE_HASH_BACKFILL_MAX_PER_TICK) shipped this cycle so operators know they're tunable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 14:06:42 +00:00
Cameron Cordes	860169032b	faces: phase 2 — schema + manual face/person CRUD Land the persistence model and HTTP surface for local face recognition. Inference still lives in Apollo (Phase 1); this side adds the data home plus every endpoint Apollo's UI and FileViewer-React will consume. Schema (new migration 2026-04-29-000000_add_faces): - persons: visual identities. Optional entity_id bridges to the existing knowledge-graph entities table; auto-bridging is left to the management UI (we don't muddy LLM provenance from face rows). UNIQUE(name COLLATE NOCASE) so 'alice' / 'Alice' fold to one row. - face_detections: keyed on content_hash (cross-library dedup), with status='detected' carrying bbox + 512-d embedding BLOB, and 'no_faces' / 'failed' marker rows that tell Phase 3's file watcher not to re-scan. Marker invariant enforced via CHECK; partial UNIQUE on content_hash WHERE status='no_faces' guards against double-marks. Schema regenerated with `diesel print-schema` against a clean migration run; joinables added for face_detections → libraries / persons and persons → entities. face_client.rs (sibling of apollo_client.rs): - reqwest multipart, 60 s timeout (CPU inference on a backlog can be slow; bounded threadpool on Apollo serializes calls anyway). - FaceDetectError::{Permanent, Transient, Disabled} — Phase 3 keys its marker-row decision on this. 422 → mark failed, 5xx → defer. - APOLLO_FACE_API_BASE_URL falls back to APOLLO_API_BASE_URL when unset; both unset = is_enabled() false, callers no-op. faces.rs (DAO + handlers): - SqliteFaceDao implements the full FaceDao trait; person face counts go through sql_query because diesel's BoxedSelectStatement + group_by trips trait-resolver recursion. - merge_persons re-points face rows in a transaction, copies notes when target's are empty, deletes src. - manual POST /image/faces resolves content_hash through image_exif, crops the user-drawn bbox with 10% padding (detector wants context around ears/jaw), POSTs the crop to face_client.embed for a real ArcFace vector, then inserts source='manual'. - Cluster-suggest (Phase 6) gets its data from GET /faces/embeddings — base64-encoded paged BLOBs so Apollo's DBSCAN can stream them without ImageApi pre-aggregating. Endpoints registered alongside add_*_services in main.rs: GET /faces/stats?library= GET /faces/embeddings?library=&unassigned=&limit=&offset= GET /image/faces?path=&library= POST /image/faces (manual create via embed) PATCH /image/faces/{id} DELETE /image/faces/{id} GET /persons?library= POST /persons GET /persons/{id} PATCH /persons/{id} DELETE /persons/{id}?cascade=set_null|delete (set_null default) POST /persons/{id}/merge GET /persons/{id}/faces?library= The file-watch hook (Phase 3) and the rerun-on-one-photo handler (Phase 6) live behind the FaceDao methods marked dead_code today — they're called only when those phases land. Same shape for the trait methods that aren't reached by Phase 2 routes. Tests: 3 DAO unit tests cover person CRUD + case-insensitive uniqueness, marker-row idempotency (mark_status is a no-op when any row exists), and merge re-pointing faces. Cargo.toml: reqwest gains the `multipart` feature. cargo build / cargo test --lib / cargo fmt / cargo clippy --all-targets all clean for the new code; the two pre-existing test_path_excluder failures and the pre-existing sort_by clippy warnings are unrelated and present on master. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:03:42 +00:00
Cameron Cordes	4ae7be35e9	Apollo Places: enrich insights with personal place name + notes Optional integration with the sibling Apollo project's user-defined Places (name + lat/lon + radius_m + description + category). When APOLLO_API_BASE_URL is set, the per-photo location resolver folds the most-specific containing Place into the LLM prompt's location string — "Home (My house in Cambridge) — near Cambridge, MA" rather than the city name alone. Smallest-radius wins; Apollo sorts server-side via /api/places/contains, so the carousel badge in Apollo and the prompt string here always agree. Adds an agentic tool `get_personal_place_at(latitude, longitude)` that the LLM can call during chat continuation. Tool description tells the model the call returns the user's free-text notes, not just a name. Deliberately narrow — no enumerate-all variant, lat/lon required. Unset APOLLO_API_BASE_URL = legacy Nominatim-only path, tool is not registered. 5 s timeout; all errors degrade silently. Tests: 5 unit tests for compose_location_string (Apollo only, Nominatim only, both, both-with-description, neither). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 19:11:12 +00:00
Cameron Cordes	6521a328bf	RAW preview: exiftool fallback for MakerNote / SubIFD previews kamadak-exif's In::PRIMARY / In::THUMBNAIL only address IFD0 and IFD1. On modern Nikon NEFs the full-res review JPEG lives in the MakerNote's PreviewIFD (and many Canon CR2s / DNGs put theirs in a SubIFD chain) — both unreachable through the existing reader, so the previous patch still produced no preview for those files and the pipeline fell through to ffmpeg, which writes black frames when it can't decode the RAW. Add a slow-path layer in extract_embedded_jpeg_preview that shells out to exiftool for PreviewImage / JpgFromRaw / OtherImage (one process per tag). All candidates from both layers are pooled and the largest valid JPEG wins. exiftool not on PATH degrades to fast-path-only behavior rather than breaking — the fallback is a strict superset. Documented the new optional dependency in README.md and CLAUDE.md with install commands for apt / brew / winget / choco. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 17:13:36 +00:00
Cameron	d5f944c7b6	chore(bins): retire unused migrate_exif Single-library hardcoded (library_id=1) and missing content_hash/size_bytes backfill, so the watcher's full-scan path subsumes everything it does. Removed the binary and its CLAUDE.md reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 23:55:06 -04:00
Cameron	d43f5fc991	docs: document OLLAMA_REQUEST_TIMEOUT_SECONDS env var Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 13:54:23 -04:00
Cameron	e51cd564a3	docs: chat continuation endpoints + env vars Document the four new chat endpoints, SSE event shape, backend routing rules, rewind semantics, amend mode, and the AGENTIC_CHAT_MAX_ITERATIONS cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 17:32:43 -04:00
Cameron	e2eefbd156	feat(ai): curated OpenRouter model picker for hybrid backend Add OPENROUTER_ALLOWED_MODELS env var and GET /insights/openrouter/models endpoint returning the curated list verbatim. Drop the live capability precheck in hybrid mode — trust the operator's allowlist; bad ids surface as a chat-call error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:36:19 -04:00
Cameron	0d05033b38	Add comprehensive testing for preview clip and status handling - Implement unit tests for PreviewClipRequest/PreviewStatusRequest serialization and deserialization. - Add tests for PreviewDao (insert, update, batch retrieval, and status-based queries). - Extend Actix-web integration tests for `/video/preview/status` endpoint scenarios. - Introduce in-memory TestPreviewDao for mock database interactions. - Update README with new config parameters for preview clips.	2026-02-26 10:06:21 -05:00
Cameron	cf52d4ab76	Add Insights Model Discovery and Fallback Handling	2026-01-03 20:27:34 -05:00
Cameron	6c543ffa68	Add CLAUDE.md documentation for Claude Code Comprehensive guide covering build commands, architecture overview, database patterns, file processing pipeline, API structure, and development workflows. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-24 16:07:03 -05:00
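
Commits 58f010f302 and 814066551e describe the exclude entries only in prose, so the following is a minimal sketch of the two entry shapes and the global/per-library union, not the repository's code. The function names, signatures, and the choice to deduplicate the union are assumptions made for illustration; the real `parse_excluded_dirs_column`, `Library::effective_excluded_dirs`, and `PathExcluder` may differ.

```rust
use std::path::Path;

/// Hypothetical parser for the comma-separated column; NULL arrives as `None`.
/// Trims whitespace and drops empty entries, as the commit message describes.
fn parse_excluded_dirs(column: Option<&str>) -> Vec<String> {
    column
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Union of the env-var globals and the per-library extras.
/// Assumption: globals come first and duplicates are dropped.
fn effective_excluded_dirs(globals: &[String], per_library: &[String]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for entry in globals.iter().chain(per_library) {
        if !out.contains(entry) {
            out.push(entry.clone());
        }
    }
    out
}

/// Returns true when `rel_path` (relative to the library root) is excluded.
fn is_excluded(rel_path: &str, excludes: &[String]) -> bool {
    let rel = Path::new(rel_path.trim_start_matches('/'));
    excludes.iter().any(|entry| match entry.strip_prefix('/') {
        // Leading slash: anchored at the library root, covers the whole subtree.
        Some(anchored) => rel.starts_with(anchored),
        // No leading slash: exact equality against each single path component,
        // so an entry that itself contains a slash can never match.
        None => rel
            .components()
            .any(|c| c.as_os_str().to_str() == Some(entry.as_str())),
    })
}
```

Under these assumptions, `/archive/2019` excludes `archive/2019/trip/a.jpg` but not `other/archive/2019/a.jpg`, and a bare `archive/2019` entry matches nothing, which is the asymmetry the documentation commit pins down.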

18 Commits
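
The two-tick consensus rule from commit 263e27e108 can be sketched as a small in-memory state machine. This is an illustration under assumptions, not the contents of `library_maintenance.rs`: only the `OrphanGcState` name comes from the commit message, while the fields, the `tick` method, and the use of `String` hashes are hypothetical. It also interprets "a single Stale tick cancels pending deletes" as requiring two fresh consecutive all-online ticks afterwards.

```rust
use std::collections::HashSet;

/// In-memory GC state; a restart resets it, which can only delay a delete.
#[derive(Default)]
struct OrphanGcState {
    /// Hashes observed orphaned on the previous tick.
    pending: HashSet<String>,
    /// Whether every enabled library was online on the previous tick.
    prev_all_online: bool,
}

impl OrphanGcState {
    /// Feeds one tick's observation and returns the hashes safe to delete now.
    fn tick(&mut self, orphaned_now: &HashSet<String>, all_online: bool) -> Vec<String> {
        let mut to_delete: Vec<String> = Vec::new();
        if all_online && self.prev_all_online {
            // Orphaned on both ticks, and all libraries online on both ticks.
            to_delete = orphaned_now.intersection(&self.pending).cloned().collect();
            to_delete.sort();
        }
        // Hashes missing from `orphaned_now` are "revived" by not carrying over;
        // hashes promoted to deletion leave the pending set as well.
        self.pending = orphaned_now
            .iter()
            .filter(|h| !to_delete.contains(*h))
            .cloned()
            .collect();
        self.prev_all_online = all_online;
        to_delete
    }
}
```

Keeping the state to one pending set plus one flag means the worst outcome of a lost or reset state is a postponed delete, which matches the safety goal stated in the commit message.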