CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

An Actix-web REST API for serving images and videos from a filesystem with automatic thumbnail generation, EXIF extraction, tag organization, and a memories feature for browsing photos by date. Uses SQLite/Diesel ORM for data persistence and ffmpeg for video processing.

Development Commands

Building & Running

# Build for development
cargo build

# Build for release (uses thin LTO optimization)
cargo build --release

# Run the server (requires .env file with DATABASE_URL, BASE_PATH, THUMBNAILS, VIDEO_PATH, BIND_URL, SECRET_KEY)
cargo run

# Run with specific log level
RUST_LOG=debug cargo run

Testing

# Run all tests (requires BASE_PATH in .env)
cargo test

# Run specific test
cargo test test_name

# Run tests with output
cargo test -- --nocapture

Database Migrations

# Install diesel CLI (one-time setup)
cargo install diesel_cli --no-default-features --features sqlite

# Create new migration
diesel migration generate migration_name

# Run migrations (also runs automatically on app startup)
diesel migration run

# Revert last migration
diesel migration revert

# Regenerate schema.rs after manual migration changes
diesel print-schema > src/database/schema.rs

Code Quality

# Format code
cargo fmt

# Run clippy linter
cargo clippy

# Fix automatically fixable issues
cargo fix

Utility Binaries

# Two-phase cleanup: resolve missing files and validate file types
cargo run --bin cleanup_files -- --base-path /path/to/media --database-url ./database.db

Architecture Overview

Core Components

Layered Architecture:

  • HTTP Layer (main.rs): Route handlers for images, videos, metadata, tags, favorites, memories
  • Auth Layer (auth.rs): JWT token validation, Claims extraction via FromRequest trait
  • Service Layer (files.rs, exif.rs, memories.rs): Business logic for file operations and EXIF extraction
  • DAO Layer (database/mod.rs): Trait-based data access (ExifDao, UserDao, FavoriteDao, TagDao)
  • Database Layer: Diesel ORM with SQLite, schema in database/schema.rs

Async Actor System (Actix):

  • StreamActor: Manages ffmpeg video processing lifecycle
  • VideoPlaylistManager: Scans directories and queues videos
  • PlaylistGenerator: Creates HLS playlists for video streaming

Database Schema & Patterns

Tables:

  • users: Authentication (id, username, password_hash)
  • favorites: User-specific favorites (userid, path)
  • tags: Custom labels with timestamps
  • tagged_photo: Many-to-many photo-tag relationships
  • image_exif: Rich metadata (file_path + 16 EXIF fields: camera, GPS, dates, exposure settings)

DAO Pattern: All database access goes through trait-based DAOs (e.g., ExifDao, SqliteExifDao). Connection pooling uses Arc<Mutex<SqliteConnection>>. All DB operations are traced with OpenTelemetry in release builds.

Key DAO Methods:

  • store_exif(), get_exif(), get_exif_batch(): EXIF CRUD operations
  • query_by_exif(): Complex filtering by camera, GPS bounds, date ranges
  • Batch operations minimize DB hits during file watching
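
A minimal sketch of that shape — a trait the handlers depend on plus a SQLite-backed impl holding the shared connection. Field types and method signatures here are illustrative, not the exact ones in database/mod.rs:

use std::sync::{Arc, Mutex};
use diesel::prelude::*;

// Illustrative record type; the real structs live in the database module.
pub struct ImageExif {
    pub file_path: String,
    pub camera_make: Option<String>,
    pub date_taken: Option<String>,
}

// Handlers depend on the trait, so tests can swap in a fake implementation.
pub trait ExifDao: Send + Sync {
    fn store_exif(&self, exif: &ImageExif) -> QueryResult<usize>;
    fn get_exif(&self, file_path: &str) -> QueryResult<Option<ImageExif>>;
    fn get_exif_batch(&self, paths: &[String]) -> QueryResult<Vec<ImageExif>>;
}

// SQLite impl shares one connection behind Arc<Mutex<...>> as described above.
pub struct SqliteExifDao {
    pub conn: Arc<Mutex<SqliteConnection>>,
}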

File Processing Pipeline

Thumbnail Generation:

  1. Startup scan: Rayon parallel walk of BASE_PATH
  2. Creates 200x200 thumbnails in THUMBNAILS directory (mirrors source structure)
  3. Videos: extracts frame at 3-second mark via ffmpeg
  4. Images: uses image crate for JPEG/PNG processing
  5. RAW formats (NEF/CR2/ARW/DNG/etc.): the image crate can't decode RAW pixel data, so the pipeline pulls an embedded JPEG preview instead. Fast path is exif::read_jpeg_at_ifd against IFD0 (PRIMARY) and IFD1 (THUMBNAIL) — covers most older bodies and DNGs. Slow-path fallback shells out to exiftool for PreviewImage / JpgFromRaw / OtherImage, which reaches MakerNote / SubIFD-hosted previews kamadak-exif can't see (e.g. Nikon's PreviewIFD, where modern Nikon bodies store the full-res review JPEG). All candidates are pooled and the largest valid JPEG wins. See src/exif.rs::extract_embedded_jpeg_preview.
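
A condensed sketch of that fast-path / slow-path flow. The fast path is shown as a comment because read_jpeg_at_ifd is project-internal; the exiftool invocation is standard exiftool usage, but treat the whole function as illustrative rather than the code in src/exif.rs:

use std::path::Path;
use std::process::Command;

// Pool every preview candidate (IFD0/IFD1 via kamadak-exif, then exiftool's
// named preview tags) and return the largest blob that looks like a JPEG.
fn extract_preview_sketch(raw_path: &Path) -> Option<Vec<u8>> {
    let mut candidates: Vec<Vec<u8>> = Vec::new();

    // Fast path (project-internal): candidates from IFD0 (PRIMARY) and
    // IFD1 (THUMBNAIL), e.g. via exif::read_jpeg_at_ifd.

    // Slow path: exiftool reaches MakerNote / SubIFD-hosted previews.
    for tag in ["-PreviewImage", "-JpgFromRaw", "-OtherImage"] {
        if let Ok(out) = Command::new("exiftool").args(["-b", tag]).arg(raw_path).output() {
            if out.status.success() && !out.stdout.is_empty() {
                candidates.push(out.stdout);
            }
        }
    }

    // Largest valid JPEG (SOI marker 0xFFD8) wins.
    candidates
        .into_iter()
        .filter(|b| b.starts_with(&[0xFF, 0xD8]))
        .max_by_key(|b| b.len())
}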

File Watching: Runs in background thread with two-tier strategy:

  • Quick scan (default 60s): Recently modified files only
  • Full scan (default 3600s): Comprehensive directory check
  • Batch queries EXIF DB to detect new files
  • Configurable via WATCH_QUICK_INTERVAL_SECONDS and WATCH_FULL_INTERVAL_SECONDS
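
Roughly, the watcher thread behaves like the sketch below (intervals are the documented defaults; the scan helpers are placeholders, not the real functions in main.rs):

use std::time::{Duration, Instant};

// Two-tier loop: a quick scan every tick, a full directory walk once the
// full interval has elapsed. New-file detection batches EXIF DB lookups.
fn watcher_loop_sketch(quick_secs: u64, full_secs: u64) {
    let mut last_full = Instant::now();
    loop {
        std::thread::sleep(Duration::from_secs(quick_secs));
        if last_full.elapsed() >= Duration::from_secs(full_secs) {
            // full scan: comprehensive directory check (placeholder)
            last_full = Instant::now();
        } else {
            // quick scan: recently modified files only (placeholder)
        }
        // batch-query the EXIF DB for the scanned paths to find new files
    }
}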

EXIF Extraction:

  • Uses kamadak-exif crate
  • Supports: JPEG, TIFF, RAW (NEF, CR2, CR3), HEIF/HEIC, PNG, WebP
  • Extracts: camera make/model, lens, dimensions, GPS coordinates, focal length, aperture, shutter speed, ISO, date taken
  • Triggered on upload and during file watching

File Upload Behavior: If the target file already exists, a timestamp is appended to the filename (photo_1735124234.jpg) so uploads preserve history instead of overwriting.
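
A minimal sketch of that collision handling, assuming the suffix is a Unix timestamp as in the example name above (the helper name is illustrative):

use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

// photo.jpg -> photo_1735124234.jpg when photo.jpg already exists on disk.
fn dedup_upload_path(target: &Path) -> PathBuf {
    if !target.exists() {
        return target.to_path_buf();
    }
    let stem = target.file_stem().and_then(|s| s.to_str()).unwrap_or("upload");
    let ext = target.extension().and_then(|s| s.to_str()).unwrap_or("bin");
    let ts = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
    target.with_file_name(format!("{stem}_{ts}.{ext}"))
}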

Authentication Flow

Login:

  1. POST /login with username/password
  2. Verify with bcrypt::verify() against password_hash
  3. Generate JWT with claims: { sub: user_id, exp: 5_days_from_now }
  4. Sign with HS256 using SECRET_KEY environment variable

Authorization: All protected endpoints extract Claims via FromRequest trait implementation. Token passed as Authorization: Bearer <token> header.
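
A minimal sketch of the token side of that flow with jsonwebtoken (HS256 via Header::default()); the real Claims struct and FromRequest impl live in auth.rs, so treat anything beyond the sub/exp fields as an assumption:

use jsonwebtoken::{decode, encode, Algorithm, DecodingKey, EncodingKey, Header, Validation};
use serde::{Deserialize, Serialize};
use std::time::{SystemTime, UNIX_EPOCH};

#[derive(Serialize, Deserialize)]
struct Claims {
    sub: i32,   // user id
    exp: usize, // expiry (seconds since epoch), 5 days out per the flow above
}

fn issue_token(user_id: i32, secret: &str) -> jsonwebtoken::errors::Result<String> {
    let now = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs() as usize;
    let claims = Claims { sub: user_id, exp: now + 5 * 24 * 3600 };
    encode(&Header::default(), &claims, &EncodingKey::from_secret(secret.as_bytes()))
}

fn verify_token(token: &str, secret: &str) -> jsonwebtoken::errors::Result<Claims> {
    decode::<Claims>(token, &DecodingKey::from_secret(secret.as_bytes()), &Validation::new(Algorithm::HS256))
        .map(|data| data.claims)
}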

API Structure

Key Endpoint Patterns:

// Image serving & upload
GET  /image?path=...&size=...&format=...
POST /image (multipart file upload)

// Metadata & EXIF
GET /image/metadata?path=...

// Advanced search with filters
GET /photos?path=...&recursive=true&sort=DateTakenDesc&camera_make=Canon&gps_lat=...&gps_lon=...&gps_radius_km=10&date_from=...&date_to=...&tag_ids=1,2,3&media_type=Photo

// Video streaming (HLS)
POST /video/generate (creates .m3u8 playlist + .ts segments)
GET  /video/stream?path=... (serves playlist)

// Tags
GET    /image/tags/all
POST   /image/tags (add tag to file)
DELETE /image/tags (remove tag from file)
POST   /image/tags/batch (bulk tag updates)

// Memories (week-based grouping)
GET /memories?path=...&recursive=true

// AI Insights
POST /insights/generate              (non-agentic single-shot)
POST /insights/generate/agentic      (tool-calling loop; body: { file_path, backend?, model?, ... })
GET  /insights?path=...&library=...
GET  /insights/models                (local Ollama models + capabilities)
GET  /insights/openrouter/models     (curated OpenRouter allowlist)
POST /insights/rate                  (thumbs up/down for training data)

// Insight Chat Continuation
POST /insights/chat                  (single-turn reply, non-streaming)
POST /insights/chat/stream           (SSE: text / tool_call / tool_result / truncated / done)
GET  /insights/chat/history?path=... (rendered transcript with tool invocations)
POST /insights/chat/rewind           (truncate transcript at a rendered index)

Request Types:

  • FilesRequest: Supports complex filtering (tags, EXIF fields, GPS radius, date ranges)
  • SortType: Shuffle, NameAsc/Desc, TagCountAsc/Desc, DateTakenAsc/Desc

Important Patterns

Service Builder Pattern: Routes are registered via composable ServiceBuilder trait in service.rs. Allows modular feature addition.

Path Validation: Always use is_valid_full_path(&base_path, &requested_path, check_exists) to prevent directory traversal attacks.
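
Illustrative only — the real check is is_valid_full_path in the repo — but the guard boils down to something like this with path-absolutize:

use path_absolutize::Absolutize;
use std::path::Path;

// Absolutize the user-supplied path and require it to stay under BASE_PATH.
// absolutize() is lexical, so ../ traversal is rejected without touching disk.
fn is_within_base(base_path: &Path, requested: &Path) -> bool {
    match requested.absolutize() {
        Ok(abs) => abs.starts_with(base_path),
        Err(_) => false,
    }
}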

File Type Detection: Centralized in file_types.rs with constants IMAGE_EXTENSIONS and VIDEO_EXTENSIONS. Provides both Path and DirEntry variants for performance.

OpenTelemetry Tracing: All database operations and HTTP handlers wrapped in spans. In release builds, exports to OTLP endpoint via OTLP_OTLS_ENDPOINT. Debug builds use basic logger.

Memory Exclusion: PathExcluder in memories.rs filters out directories from memories API via EXCLUDED_DIRS environment variable (comma-separated paths or substring patterns). The same excluder is applied to face-detection candidates (face_watch::filter_excluded) so junk directories like @eaDir / .thumbnails don't burn detect calls on Apollo.

Face detection system

ImageApi owns the face data; Apollo (sibling repo) hosts the insightface inference service. Inference is triggered automatically by the file watcher and persisted into two tables:

  • persons(id, name UNIQUE COLLATE NOCASE, cover_face_id, entity_id, created_from_tag, notes, ...) — operator-managed, name is the user-visible identity.
  • face_detections(id, library_id, content_hash, rel_path, bbox_*, embedding BLOB, confidence, source, person_id, status, model_version, ...) — keyed on content_hash so a photo duplicated across libraries is detected once. Marker rows for status IN ('no_faces','failed') carry NULL bbox/embedding (CHECK constraint enforces this).

Why content_hash and not (library_id, rel_path): ties face data to the bytes, not the path. A backup mount that copies files from the primary library naturally inherits the existing detections without re-running inference.

File-watch hook (src/main.rs::process_new_files): for each photo with a populated content_hash, check FaceDao::already_scanned(hash) and, if it returns false, send the bytes (or the embedded JPEG preview for RAW, via exif::extract_embedded_jpeg_preview) to Apollo's /api/internal/faces/detect. At most FACE_DETECT_CONCURRENCY (default 8) detect calls run in parallel per scan tick; Apollo serializes them via its single-worker GPU pool. face_watch.rs is the Tokio orchestration layer.
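
A self-contained sketch of the per-photo decision (the trait names are stand-ins for the real FaceDao and Apollo client; the actual orchestration, concurrency, and error handling live in face_watch.rs and main.rs):

// Skip anything already scanned (marker rows count), otherwise ship the
// image bytes -- or the RAW file's embedded JPEG preview -- to Apollo.
trait FaceScanStore {
    fn already_scanned(&self, content_hash: &str) -> Result<bool, String>;
}

trait FaceDetectClient {
    fn detect(&self, content_hash: &str, image_bytes: &[u8]) -> Result<(), String>;
}

fn maybe_detect_faces(
    store: &dyn FaceScanStore,
    apollo: &dyn FaceDetectClient,
    content_hash: &str,
    image_bytes: &[u8],
) -> Result<(), String> {
    if store.already_scanned(content_hash)? {
        return Ok(());
    }
    apollo.detect(content_hash, image_bytes)
}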

Per-tick backlog drain (also src/main.rs): two passes that run on every watcher tick regardless of quick-vs-full scan:

  • backfill_unhashed_backlog — populates image_exif.content_hash for photos ingested before the hash column existed. Capped by FACE_HASH_BACKFILL_MAX_PER_TICK (default 2000); errors don't burn the cap (see the sketch after this list).
  • process_face_backlog — runs detection on photos that have a hash but no face_detections row. Capped by FACE_BACKLOG_MAX_PER_TICK (default 64). Selected via a SQL anti-join (FaceDao::list_unscanned_candidates); videos and EXCLUDED_DIRS paths filtered out client-side via face_watch::filter_excluded so they never reach Apollo.
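
A sketch of the cap semantics for the backfill pass — only successful writes count against the per-tick budget (the function and closure names are illustrative; the real passes live in src/main.rs):

// Hash backfill per tick: stop after `cap` successful hash writes;
// failures are logged and skipped so they never consume the budget.
fn backfill_hashes_sketch<F>(unhashed_paths: &[String], cap: usize, mut hash_and_store: F) -> usize
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut done = 0;
    for path in unhashed_paths {
        if done >= cap {
            break;
        }
        match hash_and_store(path) {
            Ok(()) => done += 1, // success burns the cap
            Err(e) => eprintln!("hash backfill failed for {path}: {e}"), // errors don't
        }
    }
    done
}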

Auto-bind on detection: when a photo carries a tag whose name matches a persons.name (case-insensitive), the new face binds automatically iff cosine similarity to the person's existing-face mean is ≥ FACE_AUTOBIND_MIN_COS (default 0.4). Persons with no existing faces bind unconditionally and the new face becomes the cover.
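
The bind decision reduces to a cosine check against the person's mean embedding; a self-contained sketch of that rule (the real code lives in the faces module):

// Bind a new face to a person iff cosine similarity to the mean of that
// person's existing embeddings clears FACE_AUTOBIND_MIN_COS. A person with
// no faces yet binds unconditionally (and the new face becomes the cover).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn should_autobind(new_face: &[f32], existing: &[Vec<f32>], min_cos: f32) -> bool {
    if existing.is_empty() {
        return true;
    }
    let mut mean = vec![0.0f32; new_face.len()];
    for emb in existing {
        for (m, v) in mean.iter_mut().zip(emb) {
            *m += v / existing.len() as f32;
        }
    }
    cosine(new_face, &mean) >= min_cos
}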

Manual face create (POST /image/faces): crops the image to the user-supplied bbox, applies EXIF orientation via exif::apply_orientation (the image crate hands raw pre-rotation pixels — without this, manually-drawn bboxes never resolved a face on re-detection), pads to ~50% of bbox dims (RetinaFace anchor scales need ~50% face-fill at det_size=640), then calls Apollo's embed endpoint. A force flag lets the operator save a face the detector couldn't see (e.g. profile shots, occluded faces) — the row gets a zero-vector embedding so it's manually-bound only and won't participate in clustering.
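
A sketch of the bbox padding step only (the ~50% figure comes from the paragraph above; the exact padding math, clamping, and crop in the real handler — done with the image crate after orientation is applied — may differ):

// Pad the user-drawn bbox so the face fills roughly half of the crop sent
// to Apollo's embed endpoint, clamped to the orientation-corrected image.
fn pad_bbox(x: u32, y: u32, w: u32, h: u32, img_w: u32, img_h: u32) -> (u32, u32, u32, u32) {
    let pad_x = w / 2;
    let pad_y = h / 2;
    let nx = x.saturating_sub(pad_x);
    let ny = y.saturating_sub(pad_y);
    let nw = (w + 2 * pad_x).min(img_w.saturating_sub(nx));
    let nh = (h + 2 * pad_y).min(img_h.saturating_sub(ny));
    (nx, ny, nw, nh)
}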

Rerun preserves manual rows (POST /image/faces/{id}/rerun): only source='auto' rows are deleted before re-running detection. already_scanned returns true on ANY row, so a photo whose only faces are manually drawn never auto-redetects.

Module map:

  • src/faces.rs — FaceDao trait + SqliteFaceDao impl, route handlers for /faces/*, /image/faces/*, /persons/*. Mirror of tags.rs layout.
  • src/face_watch.rs — Tokio orchestration for the file-watch detect pass; filter_excluded (PathExcluder + image-extension filter), read_image_bytes_for_detect (RAW preview fallback).
  • src/ai/face_client.rs — HTTP client for Apollo's inference. Configured by APOLLO_FACE_API_BASE_URL, falls back to APOLLO_API_BASE_URL. Both unset → feature disabled, file-watch hook is a no-op.
  • migrations/2026-04-29-000000_add_faces/ — schema.

Startup Sequence

  1. Load .env file
  2. Run embedded Diesel migrations
  3. Spawn file watcher thread
  4. Create initial thumbnails (parallel scan)
  5. Generate video GIF thumbnails
  6. Initialize AppState with Actix actors
  7. Set up Prometheus metrics (imageserver_image_total, imageserver_video_total)
  8. Scan directory for videos and queue HLS processing
  9. Start HTTP server on BIND_URL + localhost:8088

Testing Patterns

Tests require BASE_PATH environment variable. Many integration tests create temporary directories and files.

When testing database code:

  • Use in-memory SQLite: DATABASE_URL=":memory:"
  • Run migrations in test setup
  • Clean up with DROP TABLE or use #[serial] from serial_test crate if parallel tests conflict
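
A minimal sketch of that setup, assuming the diesel 2.x MigrationHarness API (the repo's embedded-migrations invocation may differ) and serial_test for the conflicting-tests case:

use diesel::prelude::*;
use diesel_migrations::{embed_migrations, EmbeddedMigrations, MigrationHarness};

const MIGRATIONS: EmbeddedMigrations = embed_migrations!("migrations");

#[test]
#[serial_test::serial] // only needed if parallel tests would share state
fn exif_roundtrip_sketch() {
    // In-memory SQLite: a fresh database per test, nothing to clean up.
    let mut conn = SqliteConnection::establish(":memory:").expect("connect");
    conn.run_pending_migrations(MIGRATIONS).expect("run migrations");
    // ... exercise the DAO under test against `conn` ...
}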

Common Gotchas

EXIF Date Parsing: Multiple formats supported (EXIF DateTime, ISO8601, Unix timestamp). Fallback chain attempts multiple parsers.

Video Processing: ffmpeg processes run asynchronously via actors. Use StreamActor to track completion. HLS segments written to VIDEO_PATH.

File Extensions: Extension detection is case-insensitive. Use file_types.rs helpers rather than manual string matching.

Migration Workflow: After creating a migration, manually edit the SQL, then regenerate schema.rs with diesel print-schema. Migrations auto-run on startup via embedded_migrations!() macro.

Path Absolutization: Use path-absolutize crate's .absolutize() method when converting user-provided paths to ensure they're within BASE_PATH.

Required Environment Variables

DATABASE_URL=./database.db              # SQLite database path
BASE_PATH=/path/to/media               # Root media directory
THUMBNAILS=/path/to/thumbnails         # Thumbnail storage
VIDEO_PATH=/path/to/video/hls          # HLS playlist output
GIFS_DIRECTORY=/path/to/gifs           # Video GIF thumbnails
BIND_URL=0.0.0.0:8080                  # Server binding
CORS_ALLOWED_ORIGINS=http://localhost:3000
SECRET_KEY=your-secret-key-here        # JWT signing secret
RUST_LOG=info                          # Log level
EXCLUDED_DIRS=/private,/archive        # Comma-separated paths to exclude from memories

Optional:

WATCH_QUICK_INTERVAL_SECONDS=60        # Quick scan interval
WATCH_FULL_INTERVAL_SECONDS=3600       # Full scan interval
OTLP_OTLS_ENDPOINT=http://...          # OpenTelemetry collector (release builds)

# AI Insights Configuration
OLLAMA_PRIMARY_URL=http://desktop:11434        # Primary Ollama server (e.g., desktop)
OLLAMA_FALLBACK_URL=http://server:11434        # Fallback Ollama server (optional, always-on)
OLLAMA_PRIMARY_MODEL=nemotron-3-nano:30b       # Model for primary server (default: nemotron-3-nano:30b)
OLLAMA_FALLBACK_MODEL=llama3.2:3b              # Model for fallback server (optional, uses primary if not set)
OLLAMA_REQUEST_TIMEOUT_SECONDS=120             # Per-request generation timeout (default 120). Increase for slow CPU-offloaded models.
SMS_API_URL=http://localhost:8000              # SMS message API endpoint (default: localhost:8000)
SMS_API_TOKEN=your-api-token                   # SMS API authentication token (optional)

# Apollo Places integration (optional). When set, photo-insight enrichment
# folds the user's personal place name (Home, Work, Cabin, ...) into the
# location string fed to the LLM, and the agentic loop gains a
# `get_personal_place_at` tool. Unset = legacy Nominatim-only path.
APOLLO_API_BASE_URL=http://apollo.lan:8000     # Base URL of the sibling Apollo backend

# Face inference (optional). Apollo also hosts the insightface inference
# service; ImageApi calls it from the file-watch hook (Phase 3) and from
# the manual face-create endpoint. Falls back to APOLLO_API_BASE_URL when
# unset (typical single-Apollo deploy). Both unset = feature disabled.
APOLLO_FACE_API_BASE_URL=http://apollo.lan:8000 # Override if face service runs separately
FACE_AUTOBIND_MIN_COS=0.4                       # Phase 3: cosine-sim floor for tag-name auto-bind
FACE_DETECT_CONCURRENCY=8                       # Phase 3: per-scan-tick parallel detect calls
FACE_DETECT_TIMEOUT_SEC=60                      # reqwest client timeout (CPU inference can be slow)

# OpenRouter (Hybrid Backend) - keeps embeddings + vision local, routes chat to OpenRouter
OPENROUTER_API_KEY=sk-or-...                   # Required to enable hybrid backend
OPENROUTER_DEFAULT_MODEL=anthropic/claude-sonnet-4   # Used when client doesn't pick a model
OPENROUTER_ALLOWED_MODELS=openai/gpt-4o-mini,anthropic/claude-haiku-4-5,google/gemini-2.5-flash
                                                # Curated allowlist exposed to clients via
                                                # GET /insights/openrouter/models. Empty = no picker.
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1     # Override base URL (optional)
OPENROUTER_EMBEDDING_MODEL=openai/text-embedding-3-small  # Optional, embeddings stay local today
OPENROUTER_HTTP_REFERER=https://your-site.example    # Optional attribution header
OPENROUTER_APP_TITLE=ImageApi                  # Optional attribution header

# Insight Chat Continuation
AGENTIC_CHAT_MAX_ITERATIONS=6                  # Cap on tool-calling iterations per chat turn (default 6)

AI Insights Fallback Behavior:

  • Primary server is tried first with its configured model (5-second connection timeout)
  • On connection failure, automatically falls back to secondary server with its model (if configured)
  • If OLLAMA_FALLBACK_MODEL is not set, the fallback server uses the same model as the primary
  • Total request timeout is 120 seconds to accommodate slow LLM inference
  • Logs indicate which server and model were used (info level) and failover attempts (warn level)
  • Backwards compatible: OLLAMA_URL and OLLAMA_MODEL still supported as fallbacks

Model Discovery: The OllamaClient provides methods to query available models:

  • OllamaClient::list_models(url) - Returns list of all models on a server
  • OllamaClient::is_model_available(url, model_name) - Checks if a specific model exists

This allows runtime verification of model availability before generating insights.

Hybrid Backend (OpenRouter):

  • Per-request opt-in via backend=hybrid on POST /insights/generate/agentic.
  • Local Ollama still describes the image (vision); the description is inlined into the chat prompt and the agentic loop runs on OpenRouter.
  • request.model (if provided) overrides OPENROUTER_DEFAULT_MODEL for that call. The mobile picker reads from OPENROUTER_ALLOWED_MODELS.
  • No live capability precheck — the operator-curated allowlist is trusted. A bad model id surfaces as a chat-call error.
  • GET /insights/openrouter/models returns { models, default_model, configured } for client picker UIs.

Insight Chat Continuation:

After an agentic insight is generated, the full Vec<ChatMessage> transcript is stored in photo_insights.training_messages and can be continued via the chat endpoints. The PhotoInsightResponse.has_training_messages flag tells clients whether chat is available for a given insight.

  • POST /insights/chat runs one turn of the agentic loop against the replayed history. Body: { file_path, library?, user_message, model?, backend?, num_ctx?, temperature?, top_p?, top_k?, min_p?, max_iterations?, amend? }.
  • POST /insights/chat/stream is the SSE variant — same request body, response is text/event-stream with events: iteration_start, text (delta), tool_call, tool_result, truncated, done, plus a server-emitted error_message on failure. Preferred by the mobile client for live tool-chip updates.
  • GET /insights/chat/history?path=...&library=... returns the rendered transcript. Each assistant message carries a tools: [{name, arguments, result, result_truncated?}] array with the tool invocations that led up to it. Tool results over 2000 chars are truncated with result_truncated: true.
  • POST /insights/chat/rewind truncates the transcript at a given rendered index (drops that message + any tool-call scaffolding that preceded it + all later turns). Index 0 is protected. Used for "try again from here" flows.

Backend routing rules (matches agentic-insight generation):

  • Stored backend on the insight row is authoritative by default.
  • request.backend may override per-turn. local -> hybrid is rejected in v1 (would require on-the-fly visual-description rewrite); hybrid -> local replays verbatim since the description is already inlined as text.
  • request.model overrides the chat model (an Ollama id in local mode, an OpenRouter id in hybrid mode).

Persistence:

  • Append mode (default): re-serialize the full history and UPDATE the same row's training_messages.
  • Amend mode (amend: true): regenerate the title, insert a new insight row via store_insight (auto-flips prior rows' is_current=false). Response surfaces the new row's id as amended_insight_id.

Per-(library_id, file_path) async mutex (AppState.insight_chat.chat_locks) serialises concurrent turns on the same insight so the JSON blob doesn't race.

Context management is a soft bound: if the serialized history exceeds num_ctx - 2048 tokens (cheap 4-byte/token heuristic), the oldest assistant-tool_call + tool_result pairs are dropped until under budget. The initial user message (with any images) and system prompt are always preserved. The truncated event / flag is surfaced to the client when a drop occurred.
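
A sketch of that budget check with the 4-bytes-per-token heuristic (the message representation is simplified to strings; the real history is the Vec<ChatMessage> transcript and the protected prefix is the system prompt plus the initial user message):

// Soft context bound: drop the oldest assistant tool_call / tool_result pair
// (right after the protected prefix) until the serialized history fits in
// num_ctx - 2048 estimated tokens. Returns whether anything was dropped.
fn estimated_tokens(serialized: &str) -> usize {
    serialized.len() / 4
}

fn enforce_context_budget(history: &mut Vec<String>, num_ctx: usize) -> bool {
    let budget = num_ctx.saturating_sub(2048);
    let protected = 2; // system prompt + initial user message (with any images)
    let mut truncated = false;
    while estimated_tokens(&history.join("\n")) > budget && history.len() > protected + 2 {
        history.drain(protected..protected + 2); // oldest tool_call + tool_result pair
        truncated = true;
    }
    truncated
}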

Configurable env:

  • AGENTIC_CHAT_MAX_ITERATIONS — cap on tool-calling iterations per turn (default 6). Per-request max_iterations is clamped to this cap.

Apollo Places integration (optional):

The sibling Apollo project (personal location-history viewer) owns user-defined Places: name + lat/lon + radius_m + description (+ optional category). When APOLLO_API_BASE_URL is set, ImageApi queries /api/places/contains?lat=&lon= to enrich the LLM prompt's location string. See src/ai/apollo_client.rs and src/ai/insight_generator.rs:

  • Auto-enrichment (always on when configured): the per-photo location resolver folds the most-specific containing Place ("Home — near Cambridge, MA" or "Home (My house in Cambridge) — near Cambridge, MA" when a description is set) into the location field of combine_contexts. Smallest-radius wins — Apollo sorts server-side, this code takes [0].
  • Agentic tool get_personal_place_at(latitude, longitude): registered alongside reverse_geocode only when apollo_enabled() returns true. Returns "- Name [category]: description (radius N m)" lines, smallest radius first. The tool is deliberately narrow — no enumerate-all variant; auto-enrichment covers the photo-context path and the agentic tool covers ad-hoc lat/lon questions in chat continuation.

Failure modes degrade silently to the legacy Nominatim path: 5 s timeout, errors logged at warn, empty results returned. Apollo's routes are unauthenticated (single-user, LAN-trust); add JWT auth here + on Apollo's side if exposing beyond a trusted network.

Dependencies of Note

Rust crates

  • actix-web: HTTP framework
  • diesel: ORM for SQLite
  • jsonwebtoken: JWT implementation
  • kamadak-exif: EXIF parsing
  • image: Thumbnail generation
  • walkdir: Directory traversal
  • rayon: Parallel processing
  • opentelemetry: Distributed tracing
  • bcrypt: Password hashing
  • infer: Magic number file type detection

External binaries (must be on PATH)

  • ffmpeg — video thumbnail extraction (StreamActor, HLS pipeline) and the HEIF/HEIC/NEF/ARW thumbnail fallback in generate_image_thumbnail_ffmpeg. Required for any deploy that holds video or HEIF files.
  • exiftool — optional but strongly recommended for RAW-heavy libraries. The thumbnail pipeline shells out to it as the slow-path fallback for embedded preview extraction (Nikon MakerNote PreviewIFD, Canon SubIFDs, etc. — anything kamadak-exif's IFD0/IFD1 readers can't reach). Without exiftool installed, RAWs whose preview lives outside IFD0/IFD1 will fall through to ffmpeg, which often produces black thumbnails. Install via package manager: apt install libimage-exiftool-perl, brew install exiftool, winget install OliverBetz.ExifTool, or choco install exiftool.