Branch C of the multi-library data-model rollout. Implements the
operational maintenance pipeline pinned in CLAUDE.md → "Multi-library
data model" / "Library availability and safety". Branches A and B
land first; this branch builds on top.
New module: src/library_maintenance.rs
Three idempotent passes the watcher runs every tick after the
per-library ingest loop:
1. Missing-file scan (per online library)
For each Online library, load a paginated page of image_exif rows
(IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE, default 500), stat() each one,
and delete rows whose source file is NotFound. Permission/IO
errors are skipped, never deleted. Capped at
IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK (default 200) per library
per tick — so a pathological mount that returns NotFound for
everything can't wipe the table in one cycle. Cursor advances
across ticks, wraps on partial-page returns, and naturally cycles
through the entire library over many minutes. Skipped wholesale
for Stale libraries via the existing probe gate.
2. Back-ref refresh (DB-only)
For face_detections / tagged_photo / photo_insights: any
hash-keyed row whose (library_id, rel_path) no longer matches an
image_exif row, but whose content_hash does, is repointed at a
surviving image_exif location. Pure SQL with EXISTS guards so
rows whose hash is fully orphaned are left alone (the orphan GC
handles those). Idempotent; no availability gate needed.
This is what makes a recent → archive move invisible to readers:
when pass 1 retires the lib-A row, pass 2 pivots tags / faces /
insights to lib-B's surviving path before any client notices.
3. Orphan GC (destructive)
Hash-keyed derived rows whose content_hash has no image_exif
referent are GC-eligible. Two-tick consensus: a hash must be
observed orphaned on two consecutive ticks AND every library must
be Online for both. A single Stale tick within the window cancels
all pending deletes (they remain marked but won't be promoted) —
they're re-evaluated next tick. The pending set lives in
OrphanGcState (in-memory); a watcher restart resets it, which can
only delay a delete, never cause one. Hashes that re-appear in
image_exif between ticks are "revived" from the pending set
(handles transient share unmount / remount).
One new ExifDao method:
- list_rel_paths_for_library_page(library_id, limit, offset) for
the paginated missing-file scan.
- (count_for_library landed in Branch A.)
Watcher wiring (main.rs)
Per-library: missing-file scan inside the existing per-library
loop, after process_new_files, gated by the same probe check that
already protects ingest. After the loop: reconcile (Branch B),
back-ref refresh, then run_orphan_gc. The maintenance connection is
opened once per tick (image_api::database::connect), used by all
three DB-only passes, and dropped at end of tick.
CLAUDE.md gains a "Maintenance pipeline" subsection that describes
the three passes and their interaction with the existing
availability-and-safety policy.
Tests: 225 pass (217 from Branch B + 8 new in library_maintenance
covering back-ref refresh including the fully-orphaned no-op case,
two-tick GC consensus, Stale-tick consensus reset, image_exif
re-appearance revival, multi-table delete, and the
all_libraries_online helper).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
An Actix-web REST API for serving images and videos from a filesystem with automatic thumbnail generation, EXIF extraction, tag organization, and a memories feature for browsing photos by date. Uses SQLite/Diesel ORM for data persistence and ffmpeg for video processing.
Development Commands
Building & Running
# Build for development
cargo build
# Build for release (uses thin LTO optimization)
cargo build --release
# Run the server (requires .env file with DATABASE_URL, BASE_PATH, THUMBNAILS, VIDEO_PATH, BIND_URL, SECRET_KEY)
cargo run
# Run with specific log level
RUST_LOG=debug cargo run
Testing
# Run all tests (requires BASE_PATH in .env)
cargo test
# Run specific test
cargo test test_name
# Run tests with output
cargo test -- --nocapture
Database Migrations
# Install diesel CLI (one-time setup)
cargo install diesel_cli --no-default-features --features sqlite
# Create new migration
diesel migration generate migration_name
# Run migrations (also runs automatically on app startup)
diesel migration run
# Revert last migration
diesel migration revert
# Regenerate schema.rs after manual migration changes
diesel print-schema > src/database/schema.rs
Code Quality
# Format code
cargo fmt
# Run clippy linter
cargo clippy
# Fix automatically fixable issues
cargo fix
Utility Binaries
# Two-phase cleanup: resolve missing files and validate file types
cargo run --bin cleanup_files -- --base-path /path/to/media --database-url ./database.db
Architecture Overview
Core Components
Layered Architecture:
- HTTP Layer (main.rs): Route handlers for images, videos, metadata, tags, favorites, memories
- Auth Layer (auth.rs): JWT token validation, Claims extraction via FromRequest trait
- Service Layer (files.rs, exif.rs, memories.rs): Business logic for file operations and EXIF extraction
- DAO Layer (database/mod.rs): Trait-based data access (ExifDao, UserDao, FavoriteDao, TagDao)
- Database Layer: Diesel ORM with SQLite, schema in database/schema.rs
Async Actor System (Actix):
- StreamActor: Manages ffmpeg video processing lifecycle
- VideoPlaylistManager: Scans directories and queues videos
- PlaylistGenerator: Creates HLS playlists for video streaming
Database Schema & Patterns
Tables:
- users: Authentication (id, username, password_hash)
- favorites: User-specific favorites (userid, path)
- tags: Custom labels with timestamps
- tagged_photo: Many-to-many photo-tag relationships
- image_exif: Rich metadata (file_path + 16 EXIF fields: camera, GPS, dates, exposure settings)
DAO Pattern:
All database access goes through trait-based DAOs (e.g., ExifDao, SqliteExifDao). Connection pooling uses Arc<Mutex<SqliteConnection>>. All DB operations are traced with OpenTelemetry in release builds.
Key DAO Methods:
- store_exif(), get_exif(), get_exif_batch(): EXIF CRUD operations
- query_by_exif(): Complex filtering by camera, GPS bounds, date ranges
- Batch operations minimize DB hits during file watching
Multi-library data model
ImageApi supports more than one library (a library = a (name, root_path)
row in the libraries table that maps to a mounted directory tree). The
same bytes may exist under more than one library (the typical case is an
"active" library plus an "archive" library that ingests files as they age
out), and the data model is designed so that both derived data and
user-managed data follow the bytes, not the path.
The principle. A photo's identity is its content_hash (blake3, see
src/content_hash.rs). Anything we compute from or attach to a photo is
keyed on that hash so it survives:
- the same file appearing in a second library (backup / archive / mirror),
- the file moving between libraries (recent → archive handoff),
- the file moving within a library (re-organized rel_path),
- intra-library duplicates (same bytes at two paths).
Table classification. Three categories drive the keying decision:
| Category | Key | Rationale | Tables |
|---|---|---|---|
| Intrinsic to bytes | content_hash | Rerunning is wasted work (or LLM cost) | face_detections ✓, image_exif (target), photo_insights (target), video_preview_clips (target) |
| User intent about a photo | content_hash | "Tag this photo" means the bytes, not a path | tagged_photo (target), favorites (target) |
| Library administrative | (library_id, rel_path) | Tied to a specific filesystem location | libraries, entity_photo_links, the rel_path back-ref columns on hash-keyed tables |
✓ = already implemented this way. (target) = today still keyed on
(library_id, rel_path) and slated for migration. The migration adds a
nullable content_hash column, populates it from image_exif where
known, and read paths fall back to rel_path while the hash is null.
Carrying a rel_path even when hash-keyed. Hash-keyed tables retain
(library_id, rel_path) columns as a denormalized back-reference, not
as the key. This lets a single query answer "what is at this path right
now" without joining through image_exif, and supports the path-only
endpoints that predate the hash. face_detections is the reference
implementation: hash is the truth, path is a hint.
Merge semantics on read. When the same hash has rows under more than one library:
- Set-valued data (tags, favorites, faces, entity links) → union.
- Scalar data (current insight, EXIF row, video preview clip) → earliest generated_at / created_time wins. The historical lib1 row beats a re-generated lib2 row, so the user's curated insight isn't shadowed by a re-run on archive ingest.
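The two merge rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's read path: `LibraryRow`, `merged_tags`, and `merged_insight` are hypothetical names, and timestamps are modelled as plain Unix seconds.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-library view of the derived data attached to one content_hash.
#[derive(Debug, Clone)]
struct LibraryRow {
    tags: Vec<String>,
    /// (generated_at as Unix seconds, insight text); None if no insight exists.
    insight: Option<(i64, String)>,
}

/// Set-valued data: union across every library row that shares the hash.
fn merged_tags(rows: &[LibraryRow]) -> BTreeSet<String> {
    rows.iter().flat_map(|r| r.tags.iter().cloned()).collect()
}

/// Scalar data: the row with the earliest generated_at wins.
fn merged_insight(rows: &[LibraryRow]) -> Option<&(i64, String)> {
    rows.iter()
        .filter_map(|r| r.insight.as_ref())
        .min_by_key(|(generated_at, _)| *generated_at)
}
```

With one row holding a curated insight from time 100 and another holding a re-run from time 200, `merged_insight` returns the curated one, while `merged_tags` returns the deduplicated union of both tag lists.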
Write attribution. A new tag/favorite/insight created while viewing under lib2 binds to the bytes, not to lib2 — so it shows up under lib1 too. This is by design, but it's the most surprising rule on first encounter; clients should not assume tags are library-scoped.
Hash-less rows (transitional state). During and immediately after a
new mount, image_exif.content_hash is being populated by
backfill_unhashed_backlog (capped per tick). Rules during this window:
- Writes: if the hash is known, write hash-keyed. If not, write (library_id, rel_path)-keyed and let the reconciliation job collapse duplicates once the hash lands.
- Reads: prefer hash key, fall back to (library_id, rel_path).
- Reconciliation: a one-shot pass after every backfill tick collapses rows that now share a hash, applying the merge semantics above. It is idempotent, so it is safe to re-run.
Library handoff (recent → archive). When a file moves between
libraries (e.g. operator moves ~/photos/2024/IMG.nef to the archive
mount), the file watcher sees the disappearance under lib1 and the
appearance under lib2. Hash-keyed rows don't need migration; the
(library_id, rel_path) back-ref columns are updated to point to the new
location. Library administrative rows (entity_photo_links,
(library_id, rel_path) rows in image_exif for hash-less items) are
re-keyed by the move detector, which matches a disappearance to an
appearance by content_hash within a configurable window.
Orphans (source deleted while a copy survives). When the only
image_exif row for a hash is deleted (file removed from disk), the
hash-keyed derived rows survive as long as another image_exif row
references the same hash. If the last reference is gone, derived rows
are eligible for GC (deferred — the GC job runs on a slow schedule so
that a brief unmount or rename doesn't wipe history).
Stats and counts. When reporting "how many photos do you have," count
DISTINCT content_hash over image_exif, not row count. Faces stats
already does this (FaceDao::stats in src/faces.rs); other counters
should follow suit. Numerator and denominator must live in the same
domain — see the face-stats commentary below for the cautionary tale.
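As a rough illustration of the counting rule, the sketch below counts distinct hashes over an in-memory stand-in for image_exif rows. `ExifRow` and `distinct_photo_count` are hypothetical names; the real counters run in SQL. Skipping hash-less rows mirrors what the face-stats commentary describes for total_photos.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory stand-in for an image_exif row.
struct ExifRow {
    /// None while the row is still waiting for the hash backfill.
    content_hash: Option<String>,
}

/// Counts photos by distinct content_hash, ignoring rows not yet hashed.
fn distinct_photo_count(rows: &[ExifRow]) -> usize {
    rows.iter()
        .filter_map(|r| r.content_hash.as_deref())
        .collect::<HashSet<&str>>()
        .len()
}
```

Two rows that share a hash (a duplicate file, or the same file in two libraries) contribute one photo, which is why a plain row count overstates the total.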
Per-library scoping when the user asks for it. A request scoped to
?library=N filters the image_exif view to that library, and the
hash-keyed derived data is joined through that view. The user sees only
photos that have a copy under lib N, but the derived data attached to
those photos is the merged hash-keyed view. This is the answer to "show
me archive photos with their original tags."
Library availability and safety. Libraries can be on network shares
or removable media; the file watcher must not interpret a temporary
unavailability as a mass-deletion event. Every tick begins with a
presence probe per library: the library is considered online iff
its root_path exists, is readable, and a top-level scan returns at
least one expected entry (or matches a recent file-count high-water
mark within a tolerance). The probe result gates which actions are safe
to run on that library this tick:
| Action | Requires online? |
|---|---|
| Quick / full scan ingest of new files | yes |
| EXIF / face / insight backlog drains | yes — but the work runs against any online library |
| Move-handoff detection (lib1 disappearance ↔ lib2 appearance match) | both libraries online |
| (library_id, rel_path) re-keying on detected move | both libraries online |
| Orphan GC of hash-keyed derived data | all libraries that have ever held the hash must be online and confirmed-clean for two consecutive ticks |
| Reads / serving | always allowed; falls back to whichever library is online |
A library that fails the probe enters a "stale" state: writes scoped to
it are paused, its rows are flagged stale (not deleted) in
/libraries status, and the watcher logs at warn once per
state-transition (not per tick). A library that recovers re-enters the
online set automatically; no operator action required for transient
outages. The intent is that pulling a USB drive, rebooting a NAS, or
losing a VPN never triggers a destructive code path — the worst case is
that derived-data work pauses until the share returns.
The same rule constrains the move-handoff matcher: a disappearance under lib1 only counts as a "move" if there is a matching appearance under another online library within the window. A bare disappearance with no matching appearance is treated as "unavailable-or-deleted, defer judgment" — it does not re-key any rows and does not enqueue GC.
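The gating table can be read as a small decision function. The sketch below is a simplification under stated assumptions: `Probe`, `Action`, and `is_safe` are hypothetical names, libraries are identified by their index in the probe slice, and the orphan-GC arm checks that every library is online rather than only those that have held a given hash. The two-tick confirmation is not modelled here.

```rust
/// Result of the per-tick presence probe for one library (hypothetical type).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Probe {
    Online,
    Stale,
}

/// Actions from the gating table, with the libraries they touch (hypothetical type).
#[derive(Clone, Copy, Debug)]
enum Action {
    Ingest { library: usize },
    MoveHandoff { from: usize, to: usize },
    OrphanGc,
    Read,
}

/// Returns true when `action` is safe to run this tick given the probe results.
fn is_safe(action: Action, probes: &[Probe]) -> bool {
    // An unknown library index is treated as not online.
    let online = |i: usize| probes.get(i) == Some(&Probe::Online);
    match action {
        Action::Ingest { library } => online(library),
        Action::MoveHandoff { from, to } => online(from) && online(to),
        // Simplification: every library must be online.
        Action::OrphanGc => probes.iter().all(|p| *p == Probe::Online),
        // Reads are always allowed.
        Action::Read => true,
    }
}
```

The useful property is that every destructive or re-keying arm fails closed: a single Stale probe is enough to turn it off, while reads keep working.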
Maintenance pipeline (src/library_maintenance.rs). The watcher
runs three maintenance passes per tick that together implement the
move/handoff and orphan rules:
1. Missing-file scan: per online library, paginated. A page of image_exif rows is loaded (IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE, default 500), each row's (root_path/rel_path) is stat()-ed, and confirmed-not-found rows are deleted from image_exif (capped at IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK, default 200). Permission/IO errors are skipped, never deleted; only NotFound triggers a deletion. The cursor wraps every time a partial page comes back, so the whole library is swept across consecutive ticks. Skipped wholesale for Stale libraries via the per-library probe gate at the top of the loop iteration.
2. Back-ref refresh: DB-only. For face_detections, tagged_photo, and photo_insights: any hash-keyed row whose (library_id, rel_path) no longer matches an image_exif row but whose content_hash does is repointed at the surviving image_exif location. Idempotent SQL; no health gate needed. This is what makes the recent → archive handoff invisible to read paths: when the missing-file scan retires the lib-A row, tags/faces/insights pivot to lib-B's path before any user notices.
3. Orphan GC: destructive. Hash-keyed derived rows whose content_hash no longer has any image_exif row are eligible. Two-tick consensus: a hash must be observed orphaned on two consecutive ticks AND every library must be online for both. A single Stale tick within the window cancels all pending deletes. The pending set is held in memory (OrphanGcState); a restart resets it, which only delays a delete, never causes one. Tags, faces, and insights for orphaned hashes are deleted in one batch per tick.
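The missing-file scan's per-row decision, delete cap, and cursor wrap can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in src/library_maintenance.rs: `ScanOutcome`, `classify`, `missing_in_page`, and `next_offset` are hypothetical names, the cursor is modelled as a row offset, and one page per library per tick is assumed so that the per-tick cap can be applied to a single page.

```rust
use std::io::ErrorKind;
use std::path::Path;

/// Outcome of stat()-ing one image_exif row's source file (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum ScanOutcome {
    Present,
    Missing,
    Skipped,
}

/// Only a NotFound error counts as missing; any other error is skipped.
fn classify(path: &Path) -> ScanOutcome {
    match std::fs::metadata(path) {
        Ok(_) => ScanOutcome::Present,
        Err(e) if e.kind() == ErrorKind::NotFound => ScanOutcome::Missing,
        Err(_) => ScanOutcome::Skipped,
    }
}

/// Returns the rel_paths in this page whose files are confirmed missing,
/// stopping once `delete_cap` deletions have been collected.
fn missing_in_page(root: &Path, rel_paths: &[String], delete_cap: usize) -> Vec<String> {
    rel_paths
        .iter()
        .filter(|rel| classify(&root.join(rel)) == ScanOutcome::Missing)
        .take(delete_cap)
        .cloned()
        .collect()
}

/// Advances the scan cursor; a partial page means the end of the library
/// was reached, so the next tick starts again from the beginning.
fn next_offset(offset: usize, page_size: usize, returned: usize) -> usize {
    if returned < page_size {
        0
    } else {
        offset + returned
    }
}
```

Because the filter is lazy and `take` caps the result, a mount that reports NotFound for every row still yields at most `delete_cap` deletions for the page.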
A backup library that briefly disappears, then returns within two
ticks, never loses any derived data. A move from lib-A to lib-B
without disappearance flips through pass 1 (lib-A row retired) and
pass 2 (back-refs follow), with pass 3 noting nothing because the
hash is still present in image_exif (lib-B's row).
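The two-tick consensus for orphan GC can be modelled as a small state machine. The sketch below is one possible reading of the rules above, not the actual OrphanGcState: `PendingOrphans` and `tick` are hypothetical names, the caller is assumed to supply the set of currently orphaned hashes and whether every library passed its probe, and the returned hashes are only candidates for deletion (the delete itself is out of scope).

```rust
use std::collections::HashSet;

/// Hypothetical in-memory pending set, loosely modelled on the described OrphanGcState.
#[derive(Default)]
struct PendingOrphans {
    pending: HashSet<String>,
    prev_all_online: bool,
}

impl PendingOrphans {
    /// Runs one tick and returns the hashes that are safe to delete now.
    /// `orphaned` holds hashes with no image_exif row at this tick.
    fn tick(&mut self, orphaned: &HashSet<String>, all_online: bool) -> Vec<String> {
        // Revival: a hash that re-appeared in image_exif leaves the pending set.
        self.pending.retain(|h| orphaned.contains(h));

        // Promotion needs this tick and the previous one to be fully online.
        let window_clean = all_online && self.prev_all_online;
        let mut promoted: Vec<String> = if window_clean {
            self.pending.drain().collect()
        } else {
            Vec::new() // pending hashes stay marked but are not promoted
        };
        promoted.sort();

        // First-time observations wait for the next tick.
        for h in orphaned {
            if !promoted.contains(h) {
                self.pending.insert(h.clone());
            }
        }
        self.prev_all_online = all_online;
        promoted
    }
}
```

Since the state starts empty with `prev_all_online` false, a restart can only postpone a promotion, which matches the "delays a delete, never causes one" property.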
File Processing Pipeline
Thumbnail Generation:
- Startup scan: Rayon parallel walk of BASE_PATH
- Creates 200x200 thumbnails in THUMBNAILS directory (mirrors source structure)
- Videos: extracts frame at 3-second mark via ffmpeg
- Images: uses image crate for JPEG/PNG processing
- RAW formats (NEF/CR2/ARW/DNG/etc.): the image crate can't decode RAW pixel data, so the pipeline pulls an embedded JPEG preview instead. Fast path is exif::read_jpeg_at_ifd against IFD0 (PRIMARY) and IFD1 (THUMBNAIL), which covers most older bodies and DNGs. Slow-path fallback shells out to exiftool for PreviewImage / JpgFromRaw / OtherImage, which reaches MakerNote / SubIFD-hosted previews kamadak-exif can't see (e.g. Nikon's PreviewIFD, where modern Nikon bodies store the full-res review JPEG). All candidates are pooled and the largest valid JPEG wins. See src/exif.rs::extract_embedded_jpeg_preview.
File Watching: Runs in background thread with two-tier strategy:
- Quick scan (default 60s): Recently modified files only
- Full scan (default 3600s): Comprehensive directory check
- Batch queries EXIF DB to detect new files
- Configurable via WATCH_QUICK_INTERVAL_SECONDS and WATCH_FULL_INTERVAL_SECONDS
EXIF Extraction:
- Uses kamadak-exif crate
- Supports: JPEG, TIFF, RAW (NEF, CR2, CR3), HEIF/HEIC, PNG, WebP
- Extracts: camera make/model, lens, dimensions, GPS coordinates, focal length, aperture, shutter speed, ISO, date taken
- Triggered on upload and during file watching
File Upload Behavior:
If file exists, appends timestamp to filename (photo_1735124234.jpg) to preserve history without overwrites.
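A minimal sketch of that renaming rule follows, assuming the timestamp is Unix seconds inserted between the file stem and the extension. `upload_target` is a hypothetical helper, and the existence check is passed in as a closure so the example stays independent of the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Picks the path an upload is written to. If `exists(requested)` is true,
/// `_<unix_seconds>` is appended to the file stem so nothing is overwritten.
fn upload_target(requested: &Path, unix_seconds: u64, exists: impl Fn(&Path) -> bool) -> PathBuf {
    if !exists(requested) {
        return requested.to_path_buf();
    }
    // Fall back to a generic stem when the name is missing or not valid UTF-8.
    let stem = requested
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("upload");
    let name = match requested.extension().and_then(|e| e.to_str()) {
        Some(ext) => format!("{}_{}.{}", stem, unix_seconds, ext),
        None => format!("{}_{}", stem, unix_seconds),
    };
    requested.with_file_name(name)
}
```

In real use the closure would typically be `|p| p.exists()` and the timestamp would come from the system clock.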
Authentication Flow
Login:
- POST /login with username/password
- Verify with bcrypt::verify() against password_hash
- Generate JWT with claims: { sub: user_id, exp: 5_days_from_now }
- Sign with HS256 using SECRET_KEY environment variable
Authorization:
All protected endpoints extract Claims via FromRequest trait implementation. Token passed as Authorization: Bearer <token> header.
API Structure
Key Endpoint Patterns:
// Image serving & upload
GET /image?path=...&size=...&format=...
POST /image (multipart file upload)
// Metadata & EXIF
GET /image/metadata?path=...
// Advanced search with filters
GET /photos?path=...&recursive=true&sort=DateTakenDesc&camera_make=Canon&gps_lat=...&gps_lon=...&gps_radius_km=10&date_from=...&date_to=...&tag_ids=1,2,3&media_type=Photo
// Video streaming (HLS)
POST /video/generate (creates .m3u8 playlist + .ts segments)
GET /video/stream?path=... (serves playlist)
// Tags
GET /image/tags/all
POST /image/tags (add tag to file)
DELETE /image/tags (remove tag from file)
POST /image/tags/batch (bulk tag updates)
// Memories (week-based grouping)
GET /memories?path=...&recursive=true
// AI Insights
POST /insights/generate (non-agentic single-shot)
POST /insights/generate/agentic (tool-calling loop; body: { file_path, backend?, model?, ... })
GET /insights?path=...&library=...
GET /insights/models (local Ollama models + capabilities)
GET /insights/openrouter/models (curated OpenRouter allowlist)
POST /insights/rate (thumbs up/down for training data)
// Insight Chat Continuation
POST /insights/chat (single-turn reply, non-streaming)
POST /insights/chat/stream (SSE: text / tool_call / tool_result / truncated / done)
GET /insights/chat/history?path=... (rendered transcript with tool invocations)
POST /insights/chat/rewind (truncate transcript at a rendered index)
Request Types:
- FilesRequest: Supports complex filtering (tags, EXIF fields, GPS radius, date ranges)
- SortType: Shuffle, NameAsc/Desc, TagCountAsc/Desc, DateTakenAsc/Desc
Important Patterns
Service Builder Pattern:
Routes are registered via composable ServiceBuilder trait in service.rs. Allows modular feature addition.
Path Validation:
Always use is_valid_full_path(&base_path, &requested_path, check_exists) to prevent directory traversal attacks.
File Type Detection:
Centralized in file_types.rs with constants IMAGE_EXTENSIONS and VIDEO_EXTENSIONS. Provides both Path and DirEntry variants for performance.
OpenTelemetry Tracing:
All database operations and HTTP handlers wrapped in spans. In release builds, exports to OTLP endpoint via OTLP_OTLS_ENDPOINT. Debug builds use basic logger.
Memory Exclusion:
PathExcluder in memories.rs filters out directories from memories API via EXCLUDED_DIRS environment variable (comma-separated paths or substring patterns). The same excluder is applied to face-detection candidates (face_watch::filter_excluded) so junk directories like @eaDir / .thumbnails don't burn detect calls on Apollo.
Face detection system
ImageApi owns the face data; Apollo (sibling repo) hosts the insightface inference service. Inference is triggered automatically by the file watcher and persisted into two tables:
- persons(id, name UNIQUE COLLATE NOCASE, cover_face_id, entity_id, created_from_tag, notes, ...): operator-managed; name is the user-visible identity.
- face_detections(id, library_id, content_hash, rel_path, bbox_*, embedding BLOB, confidence, source, person_id, status, model_version, ...): keyed on content_hash so a photo duplicated across libraries is detected once. Marker rows for status IN ('no_faces','failed') carry NULL bbox/embedding (CHECK constraint enforces this).
Why content_hash and not (library_id, rel_path): ties face data to the bytes, not the path. A backup mount that copies files from the primary library naturally inherits the existing detections without re-running inference. This is the reference implementation of the multi-library data model — see "Multi-library data model" above.
File-watch hook (src/main.rs::process_new_files): for each photo with a populated content_hash, check FaceDao::already_scanned(hash); if not, send bytes (or embedded JPEG preview for RAW via exif::extract_embedded_jpeg_preview) to Apollo's /api/internal/faces/detect. K=FACE_DETECT_CONCURRENCY (default 8) parallel calls per scan tick; Apollo serializes them via its single-worker GPU pool. face_watch.rs is the Tokio orchestration layer.
Per-tick backlog drain (also src/main.rs): two passes that run on every watcher tick regardless of quick-vs-full scan:
- backfill_unhashed_backlog: populates image_exif.content_hash for photos that arrived before the hash field was retroactive. Capped by FACE_HASH_BACKFILL_MAX_PER_TICK (default 2000); errors don't burn the cap.
- process_face_backlog: runs detection on photos that have a hash but no face_detections row. Capped by FACE_BACKLOG_MAX_PER_TICK (default 64). Selected via a SQL anti-join (FaceDao::list_unscanned_candidates); videos and EXCLUDED_DIRS paths filtered out client-side via face_watch::filter_excluded so they never reach Apollo.
Auto-bind on detection: when a photo carries a tag whose name matches a persons.name (case-insensitive), the new face binds automatically iff cosine similarity to the person's existing-face mean is ≥ FACE_AUTOBIND_MIN_COS (default 0.4). Persons with no existing faces bind unconditionally and the new face becomes the cover.
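The auto-bind threshold check can be illustrated as follows. This is a sketch under stated assumptions rather than the project's implementation: `cosine`, `mean_embedding`, and `should_autobind` are hypothetical names, embeddings are plain `f32` slices, and a mismatch in vector length or a zero-norm vector is treated as "do not bind".

```rust
/// Cosine similarity; None when lengths differ, inputs are empty, or a norm is zero.
fn cosine(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        return None;
    }
    Some(dot / (na * nb))
}

/// Element-wise mean of a person's existing face embeddings.
fn mean_embedding(faces: &[Vec<f32>]) -> Option<Vec<f32>> {
    let first = faces.first()?;
    let mut mean = vec![0.0f32; first.len()];
    for face in faces {
        if face.len() != mean.len() {
            return None;
        }
        for (m, v) in mean.iter_mut().zip(face) {
            *m += *v;
        }
    }
    let n = faces.len() as f32;
    mean.iter_mut().for_each(|m| *m /= n);
    Some(mean)
}

/// Persons with no existing faces bind unconditionally; otherwise the new face
/// must reach `min_cos` against the mean of the existing faces.
fn should_autobind(new_face: &[f32], existing: &[Vec<f32>], min_cos: f32) -> bool {
    if existing.is_empty() {
        return true;
    }
    match mean_embedding(existing).and_then(|m| cosine(new_face, &m)) {
        Some(similarity) => similarity >= min_cos,
        None => false,
    }
}
```

For example, a new face `[1.0, 0.0]` against existing faces `[1.0, 0.0]` and `[1.0, 1.0]` compares to the mean `[1.0, 0.5]`, giving a similarity of roughly 0.89, which clears the 0.4 default.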
Manual face create (POST /image/faces): crops the image to the user-supplied bbox, applies EXIF orientation via exif::apply_orientation (the image crate hands raw pre-rotation pixels — without this, manually-drawn bboxes never resolved a face on re-detection), pads to ~50% of bbox dims (RetinaFace anchor scales need ~50% face-fill at det_size=640), then calls Apollo's embed endpoint. A force flag lets the operator save a face the detector couldn't see (e.g. profile shots, occluded faces) — the row gets a zero-vector embedding so it's manually-bound only and won't participate in clustering.
Rerun preserves manual rows (POST /image/faces/{id}/rerun): only source='auto' rows are deleted before re-running detection. already_scanned returns true on ANY row, so a photo whose only faces are manually drawn never auto-redetects.
Stats domain — content_hash, not file rows (FaceDao::stats in src/faces.rs): total_photos counts DISTINCT content_hash over image_exif (filtered to image extensions, content_hash IS NOT NULL), and so do scanned / with_faces / no_faces / failed over face_detections. Numerator and denominator must live in the same domain — face_detections is keyed on content_hash, so the same JPEG present at two rel_paths or in two libraries scans once. Counting image_exif rows in the denominator inflated total by one per duplicate file and produced a permanent gap (e.g. 1101/1103 with nothing actually pending). Hash-less rows are excluded from total_photos while they sit in the backfill_unhashed_backlog queue; otherwise the bar pins below 100% for the duration of that backfill even though those rows aren't pending detection yet — they're pending hashing.
Module map:
- src/faces.rs: FaceDao trait + SqliteFaceDao impl, route handlers for /faces/*, /image/faces/*, /persons/*. Mirror of tags.rs layout.
- src/face_watch.rs: Tokio orchestration for the file-watch detect pass; filter_excluded (PathExcluder + image-extension filter), read_image_bytes_for_detect (RAW preview fallback).
- src/ai/face_client.rs: HTTP client for Apollo's inference. Configured by APOLLO_FACE_API_BASE_URL, falls back to APOLLO_API_BASE_URL. Both unset → feature disabled, file-watch hook is a no-op.
- migrations/2026-04-29-000000_add_faces/: schema.
Startup Sequence
- Load .env file
- Run embedded Diesel migrations
- Spawn file watcher thread
- Create initial thumbnails (parallel scan)
- Generate video GIF thumbnails
- Initialize AppState with Actix actors
- Set up Prometheus metrics (imageserver_image_total, imageserver_video_total)
- Scan directory for videos and queue HLS processing
- Start HTTP server on BIND_URL + localhost:8088
Testing Patterns
Tests require BASE_PATH environment variable. Many integration tests create temporary directories and files.
When testing database code:
- Use in-memory SQLite: DATABASE_URL=":memory:"
- Run migrations in test setup
- Clean up with DROP TABLE or use #[serial] from serial_test crate if parallel tests conflict
Common Gotchas
EXIF Date Parsing: Multiple formats supported (EXIF DateTime, ISO8601, Unix timestamp). Fallback chain attempts multiple parsers.
Video Processing:
ffmpeg processes run asynchronously via actors. Use StreamActor to track completion. HLS segments written to VIDEO_PATH.
File Extensions:
Extension detection is case-insensitive. Use file_types.rs helpers rather than manual string matching.
Migration Workflow:
After creating a migration, manually edit the SQL, then regenerate schema.rs with diesel print-schema. Migrations auto-run on startup via embedded_migrations!() macro.
Path Absolutization:
Use path-absolutize crate's .absolutize() method when converting user-provided paths to ensure they're within BASE_PATH.
Required Environment Variables
DATABASE_URL=./database.db # SQLite database path
BASE_PATH=/path/to/media # Root media directory
THUMBNAILS=/path/to/thumbnails # Thumbnail storage
VIDEO_PATH=/path/to/video/hls # HLS playlist output
GIFS_DIRECTORY=/path/to/gifs # Video GIF thumbnails
BIND_URL=0.0.0.0:8080 # Server binding
CORS_ALLOWED_ORIGINS=http://localhost:3000
SECRET_KEY=your-secret-key-here # JWT signing secret
RUST_LOG=info # Log level
EXCLUDED_DIRS=/private,/archive # Comma-separated paths to exclude from memories
Optional:
WATCH_QUICK_INTERVAL_SECONDS=60 # Quick scan interval
WATCH_FULL_INTERVAL_SECONDS=3600 # Full scan interval
OTLP_OTLS_ENDPOINT=http://... # OpenTelemetry collector (release builds)
# AI Insights Configuration
OLLAMA_PRIMARY_URL=http://desktop:11434 # Primary Ollama server (e.g., desktop)
OLLAMA_FALLBACK_URL=http://server:11434 # Fallback Ollama server (optional, always-on)
OLLAMA_PRIMARY_MODEL=nemotron-3-nano:30b # Model for primary server (default: nemotron-3-nano:30b)
OLLAMA_FALLBACK_MODEL=llama3.2:3b # Model for fallback server (optional, uses primary if not set)
OLLAMA_REQUEST_TIMEOUT_SECONDS=120 # Per-request generation timeout (default 120). Increase for slow CPU-offloaded models.
SMS_API_URL=http://localhost:8000 # SMS message API endpoint (default: localhost:8000)
SMS_API_TOKEN=your-api-token # SMS API authentication token (optional)
# Apollo Places integration (optional). When set, photo-insight enrichment
# folds the user's personal place name (Home, Work, Cabin, ...) into the
# location string fed to the LLM, and the agentic loop gains a
# `get_personal_place_at` tool. Unset = legacy Nominatim-only path.
APOLLO_API_BASE_URL=http://apollo.lan:8000 # Base URL of the sibling Apollo backend
# Face inference (optional). Apollo also hosts the insightface inference
# service; ImageApi calls it from the file-watch hook (Phase 3) and from
# the manual face-create endpoint. Falls back to APOLLO_API_BASE_URL when
# unset (typical single-Apollo deploy). Both unset = feature disabled.
APOLLO_FACE_API_BASE_URL=http://apollo.lan:8000 # Override if face service runs separately
FACE_AUTOBIND_MIN_COS=0.4 # Phase 3: cosine-sim floor for tag-name auto-bind
FACE_DETECT_CONCURRENCY=8 # Phase 3: per-scan-tick parallel detect calls
FACE_DETECT_TIMEOUT_SEC=60 # reqwest client timeout (CPU inference can be slow)
# OpenRouter (Hybrid Backend) - keeps embeddings + vision local, routes chat to OpenRouter
OPENROUTER_API_KEY=sk-or-... # Required to enable hybrid backend
OPENROUTER_DEFAULT_MODEL=anthropic/claude-sonnet-4 # Used when client doesn't pick a model
OPENROUTER_ALLOWED_MODELS=openai/gpt-4o-mini,anthropic/claude-haiku-4-5,google/gemini-2.5-flash
# Curated allowlist exposed to clients via
# GET /insights/openrouter/models. Empty = no picker.
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1 # Override base URL (optional)
OPENROUTER_EMBEDDING_MODEL=openai/text-embedding-3-small # Optional, embeddings stay local today
OPENROUTER_HTTP_REFERER=https://your-site.example # Optional attribution header
OPENROUTER_APP_TITLE=ImageApi # Optional attribution header
# Insight Chat Continuation
AGENTIC_CHAT_MAX_ITERATIONS=6 # Cap on tool-calling iterations per chat turn (default 6)
AI Insights Fallback Behavior:
- Primary server is tried first with its configured model (5-second connection timeout)
- On connection failure, automatically falls back to secondary server with its model (if configured)
- If OLLAMA_FALLBACK_MODEL is not set, the fallback server uses the same model as the primary
- Total request timeout defaults to 120 seconds (OLLAMA_REQUEST_TIMEOUT_SECONDS) to accommodate slow LLM inference
- Logs indicate which server and model were used (info level) and failover attempts (warn level)
- Backwards compatible: OLLAMA_URL and OLLAMA_MODEL still supported as fallbacks
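The fallback order described above can be sketched without any HTTP code. Everything here is illustrative: `Target`, `resolve_targets`, and `generate_with_fallback` are hypothetical names, the actual request is abstracted into a closure, and any error from the closure is treated as a connection failure, which is a simplification of the real behaviour.

```rust
/// One server/model pair to try (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Target {
    url: String,
    model: String,
}

/// Builds the ordered list of targets; the fallback reuses the primary model
/// when no fallback model is configured.
fn resolve_targets(
    primary_url: &str,
    primary_model: &str,
    fallback_url: Option<&str>,
    fallback_model: Option<&str>,
) -> Vec<Target> {
    let mut targets = vec![Target {
        url: primary_url.to_string(),
        model: primary_model.to_string(),
    }];
    if let Some(url) = fallback_url {
        targets.push(Target {
            url: url.to_string(),
            model: fallback_model.unwrap_or(primary_model).to_string(),
        });
    }
    targets
}

/// Tries each target in order and returns the first success,
/// or the last error if every target failed.
fn generate_with_fallback<F>(targets: &[Target], mut call: F) -> Result<String, String>
where
    F: FnMut(&Target) -> Result<String, String>,
{
    let mut last_err = String::from("no servers configured");
    for target in targets {
        match call(target) {
            Ok(text) => return Ok(text),
            Err(e) => last_err = format!("{} ({}): {}", target.url, target.model, e),
        }
    }
    Err(last_err)
}
```

Keeping target resolution separate from the retry loop makes the "same model on fallback" rule testable on its own.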
Model Discovery:
The OllamaClient provides methods to query available models:
- OllamaClient::list_models(url) - Returns list of all models on a server
- OllamaClient::is_model_available(url, model_name) - Checks if a specific model exists
This allows runtime verification of model availability before generating insights.
Hybrid Backend (OpenRouter):
- Per-request opt-in via backend=hybrid on POST /insights/generate/agentic.
- Local Ollama still describes the image (vision); the description is inlined into the chat prompt and the agentic loop runs on OpenRouter.
- request.model (if provided) overrides OPENROUTER_DEFAULT_MODEL for that call. The mobile picker reads from OPENROUTER_ALLOWED_MODELS.
- No live capability precheck: the operator-curated allowlist is trusted. A bad model id surfaces as a chat-call error.
- GET /insights/openrouter/models returns { models, default_model, configured } for client picker UIs.
Insight Chat Continuation:
After an agentic insight is generated, the full Vec<ChatMessage> transcript is
stored in photo_insights.training_messages and can be continued via the
chat endpoints. The PhotoInsightResponse.has_training_messages flag tells
clients whether chat is available for a given insight.
- POST /insights/chat runs one turn of the agentic loop against the replayed history. Body: { file_path, library?, user_message, model?, backend?, num_ctx?, temperature?, top_p?, top_k?, min_p?, max_iterations?, amend? }.
- POST /insights/chat/stream is the SSE variant: same request body, response is text/event-stream with events: iteration_start, text (delta), tool_call, tool_result, truncated, done, plus a server-emitted error_message on failure. Preferred by the mobile client for live tool-chip updates.
- GET /insights/chat/history?path=...&library=... returns the rendered transcript. Each assistant message carries a tools: [{name, arguments, result, result_truncated?}] array with the tool invocations that led up to it. Tool results over 2000 chars are truncated with result_truncated: true.
- POST /insights/chat/rewind truncates the transcript at a given rendered index (drops that message + any tool-call scaffolding that preceded it + all later turns). Index 0 is protected. Used for "try again from here" flows.
Backend routing rules (matches agentic-insight generation):
- Stored backend on the insight row is authoritative by default.
- request.backend may override per-turn. local -> hybrid is rejected in v1 (would require on-the-fly visual-description rewrite); hybrid -> local replays verbatim since the description is already inlined as text.
- request.model overrides the chat model (an Ollama id in local mode, an OpenRouter id in hybrid mode).
Persistence:
- Append mode (default): re-serialize the full history and UPDATE the same row's training_messages.
- Amend mode (amend: true): regenerate the title, insert a new insight row via store_insight (auto-flips prior rows' is_current=false). Response surfaces the new row's id as amended_insight_id.
Per-(library_id, file_path) async mutex (AppState.insight_chat.chat_locks)
serialises concurrent turns on the same insight so the JSON blob doesn't race.
Context management is a soft bound: if the serialized history exceeds
num_ctx - 2048 tokens (cheap 4-byte/token heuristic), the oldest
assistant-tool_call + tool_result pairs are dropped until under budget. The
initial user message (with any images) and system prompt are always preserved.
The truncated event / flag is surfaced to the client when a drop occurred.
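The soft bound can be sketched as a loop that drops the oldest tool-call/tool-result pair until the estimate fits. This is an illustration under stated assumptions: `Msg`, `estimated_tokens`, and `trim_history` are hypothetical names, the token estimate divides the total text bytes by four instead of measuring the serialized JSON, and a pair is recognised only when a tool call is immediately followed by its result.

```rust
/// Simplified chat message (hypothetical type; the real transcript is richer).
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    System(String),
    User(String),
    Assistant(String),
    ToolCall(String),
    ToolResult(String),
}

impl Msg {
    fn text(&self) -> &str {
        match self {
            Msg::System(t)
            | Msg::User(t)
            | Msg::Assistant(t)
            | Msg::ToolCall(t)
            | Msg::ToolResult(t) => t.as_str(),
        }
    }
}

/// Cheap estimate: roughly four bytes per token.
fn estimated_tokens(history: &[Msg]) -> usize {
    history.iter().map(|m| m.text().len()).sum::<usize>() / 4
}

/// Drops the oldest ToolCall + ToolResult pairs until the history fits in
/// `num_ctx - 2048` tokens. Returns true if anything was dropped.
fn trim_history(history: &mut Vec<Msg>, num_ctx: usize) -> bool {
    let budget = num_ctx.saturating_sub(2048);
    let mut dropped = false;
    while estimated_tokens(history) > budget {
        let oldest_pair = history
            .windows(2)
            .position(|w| matches!(w, [Msg::ToolCall(_), Msg::ToolResult(_)]));
        match oldest_pair {
            Some(i) => {
                history.drain(i..i + 2);
                dropped = true;
            }
            // Nothing droppable remains; system and user messages are kept.
            None => break,
        }
    }
    dropped
}
```

The boolean return value plays the role of the truncated flag: the caller can surface it to the client whenever a drop occurred. Because only tool pairs are ever removed, the system prompt and the initial user message survive even when the history cannot be brought under budget.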
Configurable env:
- AGENTIC_CHAT_MAX_ITERATIONS: cap on tool-calling iterations per turn (default 6). Per-request max_iterations is clamped to this cap.
Apollo Places integration (optional):
The sibling Apollo project (personal location-history viewer) owns
user-defined Places: name + lat/lon + radius_m + description (+ optional category). When APOLLO_API_BASE_URL is set, ImageApi queries
/api/places/contains?lat=&lon= to enrich the LLM prompt's location
string. See src/ai/apollo_client.rs and src/ai/insight_generator.rs:
- Auto-enrichment (always on when configured): the per-photo location resolver folds the most-specific containing Place ("Home — near Cambridge, MA" or "Home (My house in Cambridge) — near Cambridge, MA" when a description is set) into the location field of combine_contexts. Smallest-radius wins: Apollo sorts server-side, this code takes [0].
- Agentic tool get_personal_place_at(latitude, longitude): registered alongside reverse_geocode only when apollo_enabled() returns true. Returns "- Name [category]: description (radius N m)" lines, smallest radius first. The tool is deliberately narrow, with no enumerate-all variant; auto-enrichment covers the photo-context path and the agentic tool covers ad-hoc lat/lon questions in chat continuation.
Failure modes degrade silently to the legacy Nominatim path: 5 s timeout,
errors logged at warn, empty results returned. Apollo's routes are
unauthenticated (single-user, LAN-trust); add JWT auth here + on Apollo's
side if exposing beyond a trusted network.
Dependencies of Note
Rust crates
- actix-web: HTTP framework
- diesel: ORM for SQLite
- jsonwebtoken: JWT implementation
- kamadak-exif: EXIF parsing
- image: Thumbnail generation
- walkdir: Directory traversal
- rayon: Parallel processing
- opentelemetry: Distributed tracing
- bcrypt: Password hashing
- infer: Magic number file type detection
External binaries (must be on PATH)
- ffmpeg: video thumbnail extraction (StreamActor, HLS pipeline) and the HEIF/HEIC/NEF/ARW thumbnail fallback in generate_image_thumbnail_ffmpeg. Required for any deploy that holds video or HEIF files.
- exiftool: optional but strongly recommended for RAW-heavy libraries. The thumbnail pipeline shells out to it as the slow-path fallback for embedded preview extraction (Nikon MakerNote PreviewIFD, Canon SubIFDs, etc.; anything kamadak-exif's IFD0/IFD1 readers can't reach). Without exiftool installed, RAWs whose preview lives outside IFD0/IFD1 will fall through to ffmpeg, which often produces black thumbnails. Install via package manager: apt install libimage-exiftool-perl, brew install exiftool, winget install OliverBetz.ExifTool, or choco install exiftool.