Land the persistence model and HTTP surface for local face recognition.
Inference still lives in Apollo (Phase 1); this side adds the data home
plus every endpoint Apollo's UI and FileViewer-React will consume.
Schema (new migration 2026-04-29-000000_add_faces):
- persons: visual identities. Optional entity_id bridges to the
existing knowledge-graph entities table; auto-bridging is left to
the management UI (we don't muddy LLM provenance from face rows).
UNIQUE(name COLLATE NOCASE) so 'alice' / 'Alice' fold to one row.
- face_detections: keyed on content_hash (cross-library dedup), with
status='detected' carrying bbox + 512-d embedding BLOB, and
'no_faces' / 'failed' marker rows that tell Phase 3's file watcher
not to re-scan. Marker invariant enforced via CHECK; partial UNIQUE
on content_hash WHERE status='no_faces' guards against double-marks.
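A minimal sketch of how a 512-d embedding could round-trip through a BLOB column. The little-endian f32 layout and both function names are assumptions for illustration; the source only states that the BLOB holds a 512-d embedding.

```rust
/// Number of dimensions in one face embedding, per the schema notes.
const EMBEDDING_DIM: usize = 512;

/// Serialize an embedding as little-endian f32 bytes.
/// Assumption: the on-disk layout is not specified in the source.
fn embedding_to_blob(embedding: &[f32]) -> Vec<u8> {
    embedding.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Parse a BLOB back into an embedding. Returns None unless the BLOB is
/// exactly EMBEDDING_DIM * 4 bytes long.
fn blob_to_embedding(blob: &[u8]) -> Option<Vec<f32>> {
    if blob.len() != EMBEDDING_DIM * 4 {
        return None;
    }
    Some(
        blob.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Rejecting BLOBs of the wrong length keeps a truncated row from silently producing a shorter vector.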
Schema regenerated with `diesel print-schema` against a clean migration
run; joinables added for face_detections → libraries / persons and
persons → entities.
face_client.rs (sibling of apollo_client.rs):
- reqwest multipart, 60 s timeout (CPU inference on a backlog can be
slow; bounded threadpool on Apollo serializes calls anyway).
- FaceDetectError::{Permanent, Transient, Disabled} — Phase 3 keys
its marker-row decision on this. 422 → mark failed, 5xx → defer.
- APOLLO_FACE_API_BASE_URL falls back to APOLLO_API_BASE_URL when
unset; both unset = is_enabled() false, callers no-op.
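The status-to-error mapping described above might look roughly like the sketch below. MarkerAction, classify_status, and marker_action are hypothetical names. The handling of statuses other than 2xx, 422, and 5xx is an assumption, because the notes only cover 422 and 5xx.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FaceDetectError {
    Permanent,
    Transient,
    Disabled,
}

/// What a caller such as the Phase 3 watcher might do next (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MarkerAction {
    MarkFailed,
    Defer,
    Skip,
}

/// Map an HTTP status to an error class; None means success.
fn classify_status(status: u16) -> Option<FaceDetectError> {
    match status {
        200..=299 => None,
        422 => Some(FaceDetectError::Permanent),
        500..=599 => Some(FaceDetectError::Transient),
        // Assumption: anything else is retried rather than marked failed.
        _ => Some(FaceDetectError::Transient),
    }
}

/// Decide whether to write a 'failed' marker row, retry later, or do nothing.
fn marker_action(err: FaceDetectError) -> MarkerAction {
    match err {
        FaceDetectError::Permanent => MarkerAction::MarkFailed,
        FaceDetectError::Transient => MarkerAction::Defer,
        FaceDetectError::Disabled => MarkerAction::Skip,
    }
}
```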
faces.rs (DAO + handlers):
- SqliteFaceDao implements the full FaceDao trait; person face counts
go through sql_query because diesel's BoxedSelectStatement +
group_by trips trait-resolver recursion.
- merge_persons re-points face rows in a transaction, copies notes
when target's are empty, deletes src.
- manual POST /image/faces resolves content_hash through image_exif,
crops the user-drawn bbox with 10% padding (detector wants context
around ears/jaw), POSTs the crop to face_client.embed for a real
ArcFace vector, then inserts source='manual'.
- Cluster-suggest (Phase 6) gets its data from
GET /faces/embeddings — base64-encoded paged BLOBs so Apollo's
DBSCAN can stream them without ImageApi pre-aggregating.
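The 10% padding step for manual faces can be sketched as follows. This assumes the padding is 10% of the box's own width and height on each side, clamped to the image bounds; BBox and pad_bbox are hypothetical names, not the actual handler code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BBox {
    x: u32,
    y: u32,
    w: u32,
    h: u32,
}

/// Grow a box by `pad_frac` of its width/height on every side, then clamp
/// the result so it stays inside an image of size img_w x img_h.
fn pad_bbox(b: BBox, img_w: u32, img_h: u32, pad_frac: f32) -> BBox {
    let pad_x = (b.w as f32 * pad_frac).round() as u32;
    let pad_y = (b.h as f32 * pad_frac).round() as u32;
    // saturating_sub keeps the top-left corner from underflowing below 0.
    let x0 = b.x.saturating_sub(pad_x);
    let y0 = b.y.saturating_sub(pad_y);
    let x1 = b.x.saturating_add(b.w).saturating_add(pad_x).min(img_w);
    let y1 = b.y.saturating_add(b.h).saturating_add(pad_y).min(img_h);
    BBox {
        x: x0,
        y: y0,
        w: x1.saturating_sub(x0),
        h: y1.saturating_sub(y0),
    }
}
```

For a 200x100 box at (100, 100) in a 1000x1000 image, this yields a 240x120 crop at (80, 90).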
Endpoints registered alongside add_*_services in main.rs:
GET /faces/stats?library=
GET /faces/embeddings?library=&unassigned=&limit=&offset=
GET /image/faces?path=&library=
POST /image/faces (manual create via embed)
PATCH /image/faces/{id}
DELETE /image/faces/{id}
GET /persons?library=
POST /persons
GET /persons/{id}
PATCH /persons/{id}
DELETE /persons/{id}?cascade=set_null|delete (set_null default)
POST /persons/{id}/merge
GET /persons/{id}/faces?library=
The file-watch hook (Phase 3) and the rerun-on-one-photo handler
(Phase 6) live behind the FaceDao methods marked dead_code today —
they're called only when those phases land. Same shape for the trait
methods that aren't reached by Phase 2 routes.
Tests: 3 DAO unit tests cover person CRUD + case-insensitive uniqueness,
marker-row idempotency (mark_status is a no-op when any row exists),
and merge re-pointing faces.
Cargo.toml: reqwest gains the `multipart` feature.
cargo build / cargo test --lib / cargo fmt / cargo clippy --all-targets
all clean for the new code; the two pre-existing test_path_excluder
failures and the pre-existing sort_by clippy warnings are unrelated and
present on master.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Image API
This is an Actix-web server for serving images and videos from a filesystem.
Upon first run it will generate thumbnails for all images and videos at BASE_PATH.
Features
- Automatic thumbnail generation for images and videos
- EXIF data extraction and storage for photos
- File watching with NFS support (polling-based)
- Video streaming with HLS
- Tag-based organization
- Memories API for browsing photos by date
- Video Wall - Auto-generated short preview clips for videos, served via a grid view
- AI-Powered Photo Insights - Generate contextual insights from photos using LLMs
- RAG-based Context Retrieval - Semantic search over daily conversation summaries
- Automatic Daily Summaries - LLM-generated summaries of daily conversations with embeddings
External Dependencies
ffmpeg (required)
ffmpeg must be on PATH. It is used for:
- HLS video streaming — transcoding/segmenting source videos into .m3u8 + .ts playlists
- Video thumbnails — extracting a frame at the 3-second mark
- Video preview clips — short looping previews for the Video Wall
- HEIC / HEIF thumbnails — decoding Apple's HEIC format (your ffmpeg build must include
libheif; most modern builds do)
The builds used in development (the gyan.dev full build on Windows and distro ffmpeg
packages on Linux) work fine. If HEIC thumbnails silently fail, run
ffmpeg -formats | grep heif to confirm HEIF support.
RAW photo thumbnails
RAW formats (ARW, NEF, CR2, CR3, DNG, RAF, ORF, RW2, PEF, SRW, TIFF) are thumbnailed by reading an embedded JPEG preview out of the TIFF container — no external RAW decoder (libraw / dcraw) is involved. The pipeline tries two layers in order and keeps the largest valid JPEG:
- Fast path (no extra dependency): kamadak-exif reads JPEGInterchangeFormat from IFD0 / IFD1 directly. Covers older bodies and most DNGs.
- exiftool fallback (recommended for RAW-heavy libraries): shells out to extract PreviewImage / JpgFromRaw / OtherImage, which reaches MakerNote and SubIFD-hosted previews kamadak-exif can't see (e.g. Nikon's PreviewIFD, where modern Nikon bodies stash the full-res review JPEG). If exiftool isn't on PATH this layer is skipped silently and only the fast-path result is used.
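A rough sketch of the "keep the largest valid JPEG" rule across the two layers. Validity is reduced here to a check for the JPEG start-of-image marker, which is an assumption; the real pipeline may validate more strictly. Both function names are hypothetical.

```rust
/// Cheap validity check: a JPEG stream starts with the bytes FF D8 FF.
/// Assumption: the actual pipeline's notion of "valid" is not documented.
fn looks_like_jpeg(bytes: &[u8]) -> bool {
    bytes.starts_with(&[0xFF, 0xD8, 0xFF])
}

/// Given the previews produced by each layer, return the largest one that
/// passes the validity check, or None if no candidate is usable.
fn pick_largest_jpeg(candidates: Vec<Vec<u8>>) -> Option<Vec<u8>> {
    candidates
        .into_iter()
        .filter(|c| looks_like_jpeg(c))
        .max_by_key(|c| c.len())
}
```

Returning None is the case where the caller would fall through to ffmpeg.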
Install exiftool via your package manager:
- macOS: brew install exiftool
- Linux (Debian/Ubuntu): apt install libimage-exiftool-perl
- Windows: winget install OliverBetz.ExifTool or choco install exiftool
Files where neither layer produces a valid preview fall back to ffmpeg. Anything
that still can't be decoded is marked with a <thumb>.unsupported sentinel in
the thumbnail directory so we don't retry it every scan. Delete those sentinels
(and any cached black thumbnails) to force retries after a tooling upgrade.
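To make the sentinel naming concrete, here is a minimal sketch of deriving a thumbnail path and its .unsupported sentinel. It assumes thumbnails mirror the source tree under BASE_PATH and keep the original extension, as the THUMBNAILS description in the Environment section states; the function names are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Mirror a source file under the thumbnail root, keeping its extension.
/// Returns None when the source is not located under `base`.
fn thumb_path(base: &Path, thumbs: &Path, source: &Path) -> Option<PathBuf> {
    let rel = source.strip_prefix(base).ok()?;
    Some(thumbs.join(rel))
}

/// Append ".unsupported" to the full thumbnail path (extension included),
/// e.g. foo.arw becomes foo.arw.unsupported.
fn sentinel_path(thumb: &Path) -> PathBuf {
    let mut s = thumb.as_os_str().to_os_string();
    s.push(".unsupported");
    PathBuf::from(s)
}
```

A scan could then skip any source whose sentinel path exists on disk.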
Environment
The API needs a handful of environment variables to run; those marked (optional) have defaults.
Define them in a .env file in the binary's directory or in a directory above it.
- DATABASE_URL is a path or url to a database (currently only SQLite is tested)
- BASE_PATH is the root from which you want to serve images and videos
- THUMBNAILS is a path where generated thumbnails should be stored. Thumbnails mirror the source tree under BASE_PATH and keep the source's original extension (e.g. foo.arw or bar.mp4), though the file contents are always JPEG bytes (browsers content-sniff). Files that can't be thumbnailed by the image crate, ffmpeg, or an embedded RAW preview get a zero-byte <thumb_path>.unsupported sentinel in this directory so subsequent scans skip them. Delete the *.unsupported files to force retries (for example after upgrading ffmpeg or adding libheif)
- VIDEO_PATH is a path where HLS playlists and video parts should be stored
- GIFS_DIRECTORY is a path where generated video GIF thumbnails should be stored
- BIND_URL is the url and port to bind to (typically your own IP address)
- SECRET_KEY is the hopefully random string to sign Tokens with
- RUST_LOG is one of off, error, warn, info, debug, trace, from least to most noisy [error is default]
- EXCLUDED_DIRS is a comma separated list of directories to exclude from the Memories API
- PREVIEW_CLIPS_DIRECTORY (optional) is a path where generated video preview clips should be stored [default: preview_clips]
- WATCH_QUICK_INTERVAL_SECONDS (optional) is the interval in seconds for quick file scans [default: 60]
- WATCH_FULL_INTERVAL_SECONDS (optional) is the interval in seconds for full file scans [default: 3600]
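The optional interval variables could be read along these lines. This is a sketch, not the server's actual parsing: falling back to the default on unparsable or zero values is an assumption, and both function names are hypothetical.

```rust
/// Parse an optional interval in seconds, using `default` when the value is
/// missing, not a number, or zero (assumed behavior).
fn interval_secs(raw: Option<&str>, default: u64) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}

/// Read WATCH_QUICK_INTERVAL_SECONDS from the process environment,
/// with the documented default of 60 seconds.
fn watch_quick_interval() -> u64 {
    interval_secs(
        std::env::var("WATCH_QUICK_INTERVAL_SECONDS").ok().as_deref(),
        60,
    )
}
```

Keeping the parsing in a pure function makes the defaulting rules easy to unit test without touching the environment.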
AI Insights Configuration (Optional)
The following environment variables configure AI-powered photo insights and daily conversation summaries:
Ollama Configuration
- OLLAMA_PRIMARY_URL - Primary Ollama server URL [default: http://localhost:11434]
  - Example: http://desktop:11434 (your main/powerful server)
- OLLAMA_FALLBACK_URL - Fallback Ollama server URL (optional)
  - Example: http://server:11434 (always-on backup server)
- OLLAMA_PRIMARY_MODEL - Model to use on primary server [default: nemotron-3-nano:30b]
  - Example: nemotron-3-nano:30b, llama3.2:3b, etc.
- OLLAMA_FALLBACK_MODEL - Model to use on fallback server (optional)
  - If not set, uses OLLAMA_PRIMARY_MODEL on fallback server
Legacy Variables (still supported):
- OLLAMA_URL - Used if OLLAMA_PRIMARY_URL not set
- OLLAMA_MODEL - Used if OLLAMA_PRIMARY_MODEL not set
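The precedence between the current, legacy, and default values can be sketched as two small pure functions. The names are hypothetical, and the server's actual lookup code may differ.

```rust
/// Resolve a setting: the new variable wins, then the legacy one,
/// then the documented default.
fn resolve_setting(primary: Option<&str>, legacy: Option<&str>, default: &str) -> String {
    primary.or(legacy).unwrap_or(default).to_string()
}

/// The fallback server reuses the primary model when no fallback model is set.
fn fallback_model<'a>(fallback: Option<&'a str>, primary_model: &'a str) -> &'a str {
    fallback.unwrap_or(primary_model)
}
```

For example, with only OLLAMA_URL set, the primary URL resolves to the legacy value rather than http://localhost:11434.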
OpenRouter Configuration (Hybrid Backend)
The hybrid agentic backend keeps embeddings + vision local (Ollama) while routing
chat + tool-calling to OpenRouter. Enabled per-request when the client sends
backend=hybrid.
- OPENROUTER_API_KEY - OpenRouter API key. Required to enable the hybrid backend.
- OPENROUTER_DEFAULT_MODEL - Model id used when the client doesn't specify one [default: anthropic/claude-sonnet-4]
  - Example: openai/gpt-4o-mini, google/gemini-2.5-flash
- OPENROUTER_ALLOWED_MODELS - Comma-separated curated allowlist exposed to clients via GET /insights/openrouter/models. The mobile picker shows only these. Empty/unset = no picker, server default is used.
  - Example: openai/gpt-4o-mini,anthropic/claude-haiku-4-5,google/gemini-2.5-flash
- OPENROUTER_BASE_URL - Override base URL [default: https://openrouter.ai/api/v1]
- OPENROUTER_EMBEDDING_MODEL - Embedding model for OpenRouter [default: openai/text-embedding-3-small]. Only used if/when embeddings are routed through OpenRouter (currently embeddings stay local).
- OPENROUTER_HTTP_REFERER - Optional HTTP-Referer for OpenRouter attribution
- OPENROUTER_APP_TITLE - Optional X-Title for OpenRouter attribution
Capability checks are skipped for the curated allowlist — bad model ids surface as a 4xx from the chat call. Pick tool-capable models.
SMS API Configuration
- SMS_API_URL - URL to SMS message API [default: http://localhost:8000]
  - Used to fetch conversation data for context in insights
- SMS_API_TOKEN - Authentication token for SMS API (optional)
Agentic Insight Generation
- AGENTIC_MAX_ITERATIONS - Maximum tool-call iterations per agentic insight request [default: 10]
  - Controls how many times the model can invoke tools before being forced to produce a final answer
  - Increase for more thorough context gathering; decrease to limit response time
Insight Chat Continuation
After an agentic insight is generated, the conversation can be continued. Endpoints:
- POST /insights/chat: single-turn reply (non-streaming)
- POST /insights/chat/stream: SSE variant with live text deltas and tool_call / tool_result events. Mobile client uses this.
- GET /insights/chat/history?path=...&library=...: rendered transcript; each assistant message carries a tools: [{name, arguments, result}] array
- POST /insights/chat/rewind: truncate transcript at a rendered index (drops that message + any preceding tool scaffolding + later turns). Used for "try again from here" flows. The initial user message is protected.
Amend mode (amend: true in the chat request body) regenerates the insight's
title and inserts a new row instead of appending to the existing transcript,
so you can rewrite the saved summary from within chat.
- AGENTIC_CHAT_MAX_ITERATIONS - Cap on tool-calling iterations per chat turn [default: 6]
  - Per-request max_iterations (when sent by the client) is clamped to this cap
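The clamping rule can be illustrated with a one-line helper. Using the cap itself when the client sends no max_iterations is an assumption; effective_iterations is a hypothetical name.

```rust
/// Effective iteration budget for one chat turn: the client's request,
/// never above the server-side cap. With no request, the cap is used
/// (assumed default behavior).
fn effective_iterations(requested: Option<u32>, cap: u32) -> u32 {
    requested.map_or(cap, |n| n.min(cap))
}
```

With the default cap of 6, a client asking for 50 iterations gets 6, and a client asking for 3 gets 3.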
Fallback Behavior
- Primary server is tried first with 5-second connection timeout
- On failure, automatically falls back to secondary server (if configured)
- Total request timeout is 120 seconds to accommodate LLM inference
- Logs indicate which server/model was used and any failover attempts
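The failover order above can be sketched with the HTTP call abstracted behind a closure. generate_with_fallback and the constant names are hypothetical; the two timeouts mirror the documented values and would be handed to whatever HTTP client performs the call.

```rust
use std::time::Duration;

/// Documented timeouts, shown here for reference only.
const CONNECT_TIMEOUT: Duration = Duration::from_secs(5);
const REQUEST_TIMEOUT: Duration = Duration::from_secs(120);

/// Try the primary server first; on error, try the fallback if one is
/// configured. Returns the URL that answered together with the response.
/// `call` stands in for the real request to an Ollama server.
fn generate_with_fallback<E>(
    primary: &str,
    fallback: Option<&str>,
    mut call: impl FnMut(&str) -> Result<String, E>,
) -> Result<(String, String), E> {
    match call(primary) {
        Ok(text) => Ok((primary.to_string(), text)),
        Err(primary_err) => match fallback {
            // The fallback's own error is returned if it also fails.
            Some(url) => call(url).map(|text| (url.to_string(), text)),
            None => Err(primary_err),
        },
    }
}
```

Returning the URL alongside the text gives the caller what it needs to log which server handled the request.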
Daily Summary Generation
Daily conversation summaries are generated automatically on server startup. Configure in src/main.rs:
- Date range for summary generation
- Contacts to process
- Model version used for embeddings: nomic-embed-text:v1.5
Apollo + Face Recognition (Optional)
Apollo (sibling project) hosts both the Places API and the local insightface inference service. Both integrations are optional and degrade gracefully when unset.
- APOLLO_API_BASE_URL - Base URL of the sibling Apollo backend.
  - When set, photo-insight enrichment folds the user's personal place name (Home, Work, Cabin, ...) into the location string, and the agentic loop gains a get_personal_place_at tool. Unset = legacy Nominatim-only path.
- APOLLO_FACE_API_BASE_URL - Base URL for the face-detection service.
  - Falls back to APOLLO_API_BASE_URL when unset (typical single-Apollo deploy). Both unset = face feature disabled (file-watch hook and manual-face endpoints short-circuit silently).
- FACE_AUTOBIND_MIN_COS (Phase 3) - Cosine-sim floor for auto-binding a detected face to an existing same-named person via people-tag bootstrap [default: 0.4].
- FACE_DETECT_CONCURRENCY (Phase 3) - Per-scan-tick concurrent detect calls fired by the file watcher [default: 8]. Apollo serializes them via its single-worker GPU pool.
- FACE_DETECT_TIMEOUT_SEC - reqwest client timeout per detect call [default: 60]. CPU inference on a backlog can take many seconds.
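To show what the FACE_AUTOBIND_MIN_COS floor means in practice, here is a minimal sketch of a cosine-similarity check between two embeddings. Both function names are hypothetical, and treating a similarity exactly at the floor as a match is an assumption.

```rust
/// Cosine similarity of two equal-length vectors. Returns None when the
/// lengths differ, a vector is empty, or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Auto-bind only when the similarity reaches the configured floor.
fn should_autobind(similarity: f32, min_cos: f32) -> bool {
    similarity >= min_cos
}
```

Raising the floor above the default of 0.4 makes auto-binding stricter, while lowering it allows more matches at the cost of more false positives.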