# Image API

This is an Actix-web server for serving images and videos from a filesystem.
Upon first run it will generate thumbnails for all images and videos at `BASE_PATH`.

## Features

- Automatic thumbnail generation for images and videos
- EXIF data extraction and storage for photos
- File watching with NFS support (polling-based)
- Video streaming with HLS
- Tag-based organization
- Memories API for browsing photos by date
- **Video Wall** - Auto-generated short preview clips for videos, served via a grid view
- **AI-Powered Photo Insights** - Generate contextual insights from photos using LLMs
- **RAG-based Context Retrieval** - Semantic search over daily conversation summaries
- **Automatic Daily Summaries** - LLM-generated summaries of daily conversations with embeddings

## External Dependencies

### ffmpeg (required)

`ffmpeg` must be on `PATH`. It is used for:

- **HLS video streaming** - transcoding/segmenting source videos into `.m3u8` + `.ts` playlists
- **Video thumbnails** - extracting a frame at the 3-second mark
- **Video preview clips** - short looping previews for the Video Wall
- **HEIC / HEIF thumbnails** - decoding Apple's HEIC format (your ffmpeg build must include `libheif`; most modern builds do)

The `gyan.dev` full build on Windows and distro `ffmpeg` packages on Linux are used in development and work fine. If HEIC thumbnails silently fail, check `ffmpeg -formats | grep heif` to confirm HEIF support.

### RAW photo thumbnails

RAW formats (ARW, NEF, CR2, CR3, DNG, RAF, ORF, RW2, PEF, SRW, TIFF) are thumbnailed by reading an embedded JPEG preview out of the TIFF container; no external RAW decoder (libraw / dcraw) is involved. The pipeline tries two layers in order and keeps the largest valid JPEG:

1. **Fast path (no extra dependency)** - `kamadak-exif` reads `JPEGInterchangeFormat` from IFD0 / IFD1 directly. Covers older bodies and most DNGs.
2. **`exiftool` fallback (recommended for RAW-heavy libraries)** - shells out to extract `PreviewImage` / `JpgFromRaw` / `OtherImage`, which reaches MakerNote and SubIFD-hosted previews that `kamadak-exif` can't see (e.g. Nikon's `PreviewIFD`, where modern Nikon bodies stash the full-res review JPEG). If `exiftool` isn't on `PATH`, this layer is skipped silently and only the fast-path result is used.

Install `exiftool` via your package manager:

- macOS: `brew install exiftool`
- Linux (Debian/Ubuntu): `apt install libimage-exiftool-perl`
- Windows: `winget install OliverBetz.ExifTool` or `choco install exiftool`

Files where neither layer produces a valid preview fall back to ffmpeg. Anything that still can't be decoded is marked with a `.unsupported` sentinel in the thumbnail directory so we don't retry it every scan. Delete those sentinels (and any cached black thumbnails) to force retries after a tooling upgrade.

## Environment

The API requires a handful of environment variables. Define them in a `.env` file located in the binary's directory or in a directory above it.

- `DATABASE_URL` is a path or URL to a database (currently only SQLite is tested)
- `BASE_PATH` is the root from which you want to serve images and videos
- `THUMBNAILS` is a path where generated thumbnails should be stored. Thumbnails mirror the source tree under `BASE_PATH` and keep the source's original extension (e.g. `foo.arw` or `bar.mp4`), though the file contents are always JPEG bytes (browsers content-sniff). Files that can't be thumbnailed by the `image` crate, ffmpeg, or an embedded RAW preview get a zero-byte `.unsupported` sentinel in this directory so subsequent scans skip them. Delete the `*.unsupported` files to force retries (for example after upgrading ffmpeg or adding libheif)
- `VIDEO_PATH` is a path where HLS playlists and video parts should be stored
- `GIFS_DIRECTORY` is a path where generated video GIF thumbnails should be stored
- `BIND_URL` is the URL and port to bind to (typically your own IP address)
- `SECRET_KEY` is the *hopefully* random string used to sign tokens
- `RUST_LOG` is one of `off, error, warn, info, debug, trace`, from least to most noisy [default: `error`]
- `EXCLUDED_DIRS` is a comma-separated list of directories to exclude from the Memories API
- `PREVIEW_CLIPS_DIRECTORY` (optional) is a path where generated video preview clips should be stored [default: `preview_clips`]
- `WATCH_QUICK_INTERVAL_SECONDS` (optional) is the interval in seconds for quick file scans [default: 60]
- `WATCH_FULL_INTERVAL_SECONDS` (optional) is the interval in seconds for full file scans [default: 3600]

### AI Insights Configuration (Optional)

The following environment variables configure AI-powered photo insights and daily conversation summaries:

#### Ollama Configuration

- `OLLAMA_PRIMARY_URL` - Primary Ollama server URL [default: `http://localhost:11434`]
  - Example: `http://desktop:11434` (your main/powerful server)
- `OLLAMA_FALLBACK_URL` - Fallback Ollama server URL (optional)
  - Example: `http://server:11434` (always-on backup server)
- `OLLAMA_PRIMARY_MODEL` - Model to use on primary server [default: `nemotron-3-nano:30b`]
  - Example: `nemotron-3-nano:30b`, `llama3.2:3b`, etc.
- `OLLAMA_FALLBACK_MODEL` - Model to use on fallback server (optional)
  - If not set, uses `OLLAMA_PRIMARY_MODEL` on fallback server

**Legacy Variables** (still supported):

- `OLLAMA_URL` - Used if `OLLAMA_PRIMARY_URL` is not set
- `OLLAMA_MODEL` - Used if `OLLAMA_PRIMARY_MODEL` is not set

#### OpenRouter Configuration (Hybrid Backend)

The hybrid agentic backend keeps embeddings + vision local (Ollama) while routing chat + tool-calling to OpenRouter. It is enabled per-request when the client sends `backend=hybrid`.

- `OPENROUTER_API_KEY` - OpenRouter API key. Required to enable the hybrid backend.
- `OPENROUTER_DEFAULT_MODEL` - Model id used when the client doesn't specify one [default: `anthropic/claude-sonnet-4`]
  - Example: `openai/gpt-4o-mini`, `google/gemini-2.5-flash`
- `OPENROUTER_ALLOWED_MODELS` - Comma-separated curated allowlist exposed to clients via `GET /insights/openrouter/models`. The mobile picker shows only these. Empty/unset = no picker, server default is used.
  - Example: `openai/gpt-4o-mini,anthropic/claude-haiku-4-5,google/gemini-2.5-flash`
- `OPENROUTER_BASE_URL` - Override base URL [default: `https://openrouter.ai/api/v1`]
- `OPENROUTER_EMBEDDING_MODEL` - Embedding model for OpenRouter [default: `openai/text-embedding-3-small`]. Only used if/when embeddings are routed through OpenRouter (currently embeddings stay local).
- `OPENROUTER_HTTP_REFERER` - Optional `HTTP-Referer` for OpenRouter attribution
- `OPENROUTER_APP_TITLE` - Optional `X-Title` for OpenRouter attribution

Capability checks are skipped for the curated allowlist; bad model ids surface as a 4xx from the chat call. Pick tool-capable models.

#### SMS API Configuration

- `SMS_API_URL` - URL to SMS message API [default: `http://localhost:8000`]
  - Used to fetch conversation data for context in insights
- `SMS_API_TOKEN` - Authentication token for SMS API (optional)

#### Agentic Insight Generation

- `AGENTIC_MAX_ITERATIONS` - Maximum tool-call iterations per agentic insight request [default: `10`]
  - Controls how many times the model can invoke tools before being forced to produce a final answer
  - Increase for more thorough context gathering; decrease to limit response time

#### Insight Chat Continuation

After an agentic insight is generated, the conversation can be continued. Endpoints:

- `POST /insights/chat` - single-turn reply (non-streaming)
- `POST /insights/chat/stream` - SSE variant with live `text` deltas and `tool_call` / `tool_result` events. The mobile client uses this.
- `GET /insights/chat/history?path=...&library=...` - rendered transcript; each assistant message carries a `tools: [{name, arguments, result}]` array
- `POST /insights/chat/rewind` - truncate transcript at a rendered index (drops that message + any preceding tool scaffolding + later turns). Used for "try again from here" flows. The initial user message is protected.

Amend mode (`amend: true` in the chat request body) regenerates the insight's title and inserts a new row instead of appending to the existing transcript, so you can rewrite the saved summary from within chat.

- `AGENTIC_CHAT_MAX_ITERATIONS` - Cap on tool-calling iterations per chat turn [default: `6`]
  - Per-request `max_iterations` (when sent by the client) is clamped to this cap

#### Fallback Behavior

- Primary server is tried first with a 5-second connection timeout
- On failure, automatically falls back to secondary server (if configured)
- Total request timeout is 120 seconds to accommodate LLM inference
- Logs indicate which server/model was used and any failover attempts

#### Daily Summary Generation

Daily conversation summaries are generated automatically on server startup. Configure in `src/main.rs`:

- Date range for summary generation
- Contacts to process
- Model version used for embeddings: `nomic-embed-text:v1.5`

### Apollo + Face Recognition (Optional)

Apollo (sibling project) hosts both the Places API and the local insightface inference service. Both integrations are optional and degrade gracefully when unset.

- `APOLLO_API_BASE_URL` - Base URL of the sibling Apollo backend.
  - When set, photo-insight enrichment folds the user's personal place name (Home, Work, Cabin, ...) into the location string, and the agentic loop gains a `get_personal_place_at` tool. Unset = legacy Nominatim-only path.
- `APOLLO_FACE_API_BASE_URL` - Base URL for the face-detection service.
  - Falls back to `APOLLO_API_BASE_URL` when unset (typical single-Apollo deploy). Both unset = face feature disabled (file-watch hook and manual-face endpoints short-circuit silently).
- `FACE_AUTOBIND_MIN_COS` (Phase 3) - Cosine-sim floor for auto-binding a detected face to an existing same-named person via people-tag bootstrap [default: `0.4`].
- `FACE_DETECT_CONCURRENCY` (Phase 3) - Per-scan-tick concurrent detect calls fired by the file watcher [default: `8`]. Apollo serializes them via its single-worker GPU pool.
- `FACE_DETECT_TIMEOUT_SEC` - reqwest client timeout per detect call [default: `60`]. CPU inference on a backlog can take many seconds.
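
The fallback and default rules for the face-recognition variables above can be sketched in Rust as follows. This is a minimal illustration, not the server's actual code: the `FaceSettings` struct, the `face_settings` and `parse_or` functions, and the choice to treat empty or unparsable values as unset are all assumptions made for the example. Only the variable names and default values come from this README.

```rust
use std::time::Duration;

/// Hypothetical settings struct; the field names are illustrative only.
#[derive(Debug, PartialEq)]
pub struct FaceSettings {
    pub base_url: String,
    pub autobind_min_cos: f32,
    pub detect_concurrency: usize,
    pub detect_timeout: Duration,
}

/// Parse an optional raw value, using `default` when it is unset or
/// cannot be parsed (assumption: bad values fall back to the default).
fn parse_or<T: std::str::FromStr>(raw: Option<String>, default: T) -> T {
    raw.and_then(|v| v.trim().parse::<T>().ok()).unwrap_or(default)
}

/// Build the settings from any key -> value lookup.
/// Returns `None` when neither base URL is set, i.e. the face feature is disabled.
pub fn face_settings<F>(lookup: F) -> Option<FaceSettings>
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: an empty string counts as unset.
    let non_empty = |key: &str| lookup(key).filter(|v| !v.trim().is_empty());

    // APOLLO_FACE_API_BASE_URL wins; otherwise fall back to APOLLO_API_BASE_URL.
    let base_url = non_empty("APOLLO_FACE_API_BASE_URL")
        .or_else(|| non_empty("APOLLO_API_BASE_URL"))?;

    Some(FaceSettings {
        base_url,
        // Defaults below are the ones documented in this README.
        autobind_min_cos: parse_or(lookup("FACE_AUTOBIND_MIN_COS"), 0.4),
        detect_concurrency: parse_or(lookup("FACE_DETECT_CONCURRENCY"), 8),
        detect_timeout: Duration::from_secs(parse_or(lookup("FACE_DETECT_TIMEOUT_SEC"), 60)),
    })
}
```

Passing the lookup as a closure keeps the sketch testable without touching the process environment. Reading the real environment would look like `face_settings(|key| std::env::var(key).ok())`.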