kamadak-exif's In::PRIMARY / In::THUMBNAIL only address IFD0 and IFD1. On modern Nikon NEFs the full-res review JPEG lives in the MakerNote's PreviewIFD (and many Canon CR2s / DNGs put theirs in a SubIFD chain) — both unreachable through the existing reader, so the previous patch still produced no preview for those files and the pipeline fell through to ffmpeg, which writes black frames when it can't decode the RAW. Add a slow-path layer in extract_embedded_jpeg_preview that shells out to exiftool for PreviewImage / JpgFromRaw / OtherImage (one process per tag). All candidates from both layers are pooled and the largest valid JPEG wins. exiftool not on PATH degrades to fast-path-only behavior rather than breaking — the fallback is a strict superset. Documented the new optional dependency in README.md and CLAUDE.md with install commands for apt / brew / winget / choco. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Image API
This is an Actix-web server for serving images and videos from a filesystem.
Upon first run it will generate thumbnails for all images and videos at `BASE_PATH`.

## Features
- Automatic thumbnail generation for images and videos
- EXIF data extraction and storage for photos
- File watching with NFS support (polling-based)
- Video streaming with HLS
- Tag-based organization
- Memories API for browsing photos by date
- **Video Wall** - Auto-generated short preview clips for videos, served via a grid view
- **AI-Powered Photo Insights** - Generate contextual insights from photos using LLMs
- **RAG-based Context Retrieval** - Semantic search over daily conversation summaries
- **Automatic Daily Summaries** - LLM-generated summaries of daily conversations with embeddings

## External Dependencies

### ffmpeg (required)
`ffmpeg` must be on `PATH`. It is used for:
- **HLS video streaming**: transcoding/segmenting source videos into `.m3u8` + `.ts` playlists
- **Video thumbnails**: extracting a frame at the 3-second mark
- **Video preview clips**: short looping previews for the Video Wall
- **HEIC / HEIF thumbnails**: decoding Apple's HEIC format (your ffmpeg build must include
  `libheif`; most modern builds do)

Builds known to work in development: the `gyan.dev` full build on Windows and distro
`ffmpeg` packages on Linux. If HEIC thumbnails silently fail, run
`ffmpeg -formats | grep heif` to confirm HEIF support.

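As a rough illustration of the video-thumbnail step, the sketch below builds an ffmpeg argument list that seeks to the 3-second mark and writes a single frame. The helper name `video_thumbnail_args` and the exact flag set are assumptions made for this example; the server's real invocation may differ.

```rust
use std::ffi::OsString;
use std::path::Path;

/// Build ffmpeg arguments that grab one frame at the 3-second mark.
/// Hypothetical helper: the flag choice is an assumption, not the server's exact command.
pub fn video_thumbnail_args(input: &Path, output: &Path) -> Vec<OsString> {
    vec![
        OsString::from("-y"), // overwrite an existing output file
        OsString::from("-ss"), // seek position, given before -i for fast input seeking
        OsString::from("3"),
        OsString::from("-i"),
        input.as_os_str().to_owned(),
        OsString::from("-frames:v"), // stop after a single video frame
        OsString::from("1"),
        output.as_os_str().to_owned(),
    ]
}
```

The resulting vector can be handed to `std::process::Command::new("ffmpeg").args(...)`. Keeping argument construction separate from process spawning makes the command easy to inspect without ffmpeg installed.
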
### RAW photo thumbnails
RAW formats (ARW, NEF, CR2, CR3, DNG, RAF, ORF, RW2, PEF, SRW, TIFF) are thumbnailed
by reading an embedded JPEG preview out of the TIFF container; no external RAW
decoder (libraw / dcraw) is involved. The pipeline runs two layers in order, pools
their candidates, and keeps the largest valid JPEG:

1. **Fast path (no extra dependency)**: `kamadak-exif` reads
   `JPEGInterchangeFormat` from IFD0 / IFD1 directly. Covers older bodies and
   most DNGs.
2. **`exiftool` fallback (recommended for RAW-heavy libraries)**: shells out
   to extract `PreviewImage` / `JpgFromRaw` / `OtherImage`, which reaches
   MakerNote and SubIFD-hosted previews kamadak-exif can't see (e.g. Nikon's
   `PreviewIFD`, where modern Nikon bodies stash the full-res review JPEG).
   If `exiftool` isn't on `PATH` this layer is skipped silently and only the
   fast-path result is used.

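The two layers above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the function names are hypothetical, and the validity check (JPEG start and end markers only) is an assumption; the real pipeline may decode the image instead.

```rust
use std::path::Path;
use std::process::Command;

/// Cheap validity check: starts with SOI (FF D8) and ends with EOI (FF D9).
/// Assumption: a marker check is enough for this sketch.
pub fn looks_like_jpeg(bytes: &[u8]) -> bool {
    bytes.len() >= 4 && bytes.starts_with(&[0xFF, 0xD8]) && bytes.ends_with(&[0xFF, 0xD9])
}

/// Pool candidates from every layer and keep the largest one that passes the check.
pub fn largest_valid_jpeg(candidates: Vec<Vec<u8>>) -> Option<Vec<u8>> {
    candidates
        .into_iter()
        .filter(|c| looks_like_jpeg(c))
        .max_by_key(|c| c.len())
}

/// Ask exiftool for one binary tag (`exiftool -b -<Tag> <file>`).
/// Returns None when exiftool cannot be spawned, fails, or prints nothing.
pub fn exiftool_tag(path: &Path, tag: &str) -> Option<Vec<u8>> {
    let out = Command::new("exiftool")
        .arg("-b")
        .arg(format!("-{}", tag))
        .arg(path)
        .output()
        .ok()?; // spawn error (e.g. not on PATH): skip this layer
    if out.status.success() && !out.stdout.is_empty() {
        Some(out.stdout)
    } else {
        None
    }
}

/// One exiftool process per tag, collecting whatever comes back.
pub fn exiftool_candidates(path: &Path) -> Vec<Vec<u8>> {
    ["PreviewImage", "JpgFromRaw", "OtherImage"]
        .iter()
        .filter_map(|tag| exiftool_tag(path, tag))
        .collect()
}
```

A caller would append the fast-path bytes (if any) to the result of `exiftool_candidates` and pass the combined list to `largest_valid_jpeg`. Because a missing `exiftool` simply yields an empty list, the fast-path result is still used on its own.
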
Install `exiftool` via your package manager:
- macOS: `brew install exiftool`
- Linux (Debian/Ubuntu): `apt install libimage-exiftool-perl`
- Windows: `winget install OliverBetz.ExifTool` or `choco install exiftool`

Files where neither layer produces a valid preview fall back to ffmpeg. Anything
that still can't be decoded is marked with a `<thumb>.unsupported` sentinel in
the thumbnail directory so we don't retry it every scan. Delete those sentinels
(and any cached black thumbnails) to force retries after a tooling upgrade.

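To make the sentinel convention concrete, here is a small sketch of how a `<thumb>.unsupported` path could be derived and how sentinels could be cleared in bulk. All three function names are hypothetical, and the recursive walk is an assumption about how the thumbnail directory is laid out.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// `<thumb>.unsupported`: append the suffix to the full thumbnail path,
/// keeping the original extension (e.g. `foo.arw.unsupported`).
pub fn sentinel_path(thumb: &Path) -> PathBuf {
    let mut name: OsString = thumb.as_os_str().to_owned();
    name.push(".unsupported");
    PathBuf::from(name)
}

/// True when the path's final extension is `unsupported`.
pub fn is_sentinel(path: &Path) -> bool {
    path.extension().map_or(false, |ext| ext == "unsupported")
}

/// Recursively delete sentinels under `dir`; returns how many were removed.
pub fn clear_sentinels(dir: &Path) -> std::io::Result<usize> {
    let mut removed = 0;
    for entry in std::fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            removed += clear_sentinels(&path)?;
        } else if is_sentinel(&path) {
            std::fs::remove_file(&path)?;
            removed += 1;
        }
    }
    Ok(removed)
}
```

Running something like `clear_sentinels` against the thumbnail directory after installing `exiftool` or upgrading ffmpeg has the same effect as deleting the `*.unsupported` files by hand. Cached black thumbnails are ordinary files and still need to be removed separately.
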
## Environment
A handful of environment variables are required to run the API.
Define them in a `.env` file located next to the binary or in a directory above it.

- `DATABASE_URL` is a path or URL to a database (currently only SQLite is tested)
- `BASE_PATH` is the root from which you want to serve images and videos
- `THUMBNAILS` is a path where generated thumbnails should be stored. Thumbnails
  mirror the source tree under `BASE_PATH` and keep the source's original
  extension (e.g. `foo.arw` or `bar.mp4`), though the file contents are always
  JPEG bytes; this works because browsers content-sniff. Files that can't be
  thumbnailed by the `image` crate, ffmpeg, or an embedded RAW preview get a zero-byte
  `<thumb_path>.unsupported` sentinel in this directory so subsequent scans
  skip them. Delete the `*.unsupported` files to force retries (for example
  after upgrading ffmpeg or adding libheif)
- `VIDEO_PATH` is a path where HLS playlists and video parts should be stored
- `GIFS_DIRECTORY` is a path where generated video GIF thumbnails should be stored
- `BIND_URL` is the URL and port to bind to (typically your own IP address)
- `SECRET_KEY` is the *hopefully* random string used to sign tokens
- `RUST_LOG` is one of `off, error, warn, info, debug, trace`, from least to most noisy [default: `error`]
- `EXCLUDED_DIRS` is a comma-separated list of directories to exclude from the Memories API
- `PREVIEW_CLIPS_DIRECTORY` (optional) is a path where generated video preview clips should be stored [default: `preview_clips`]
- `WATCH_QUICK_INTERVAL_SECONDS` (optional) is the interval in seconds for quick file scans [default: 60]
- `WATCH_FULL_INTERVAL_SECONDS` (optional) is the interval in seconds for full file scans [default: 3600]

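The optional scan intervals and their defaults could be read along these lines. This is only a sketch: `parse_seconds`, `WatchIntervals`, and `watch_intervals_from_env` are hypothetical names, and falling back to the default on an unparsable value is an assumption (the server might reject it instead).

```rust
use std::env;

/// Parse an optional numeric setting, falling back to `default` when the
/// value is unset or not a valid number (assumed behavior).
pub fn parse_seconds(raw: Option<&str>, default: u64) -> u64 {
    raw.and_then(|s| s.trim().parse().ok()).unwrap_or(default)
}

/// Hypothetical holder for the two watcher intervals.
pub struct WatchIntervals {
    pub quick_secs: u64,
    pub full_secs: u64,
}

/// Read both intervals from the process environment with the documented defaults.
pub fn watch_intervals_from_env() -> WatchIntervals {
    WatchIntervals {
        quick_secs: parse_seconds(env::var("WATCH_QUICK_INTERVAL_SECONDS").ok().as_deref(), 60),
        full_secs: parse_seconds(env::var("WATCH_FULL_INTERVAL_SECONDS").ok().as_deref(), 3600),
    }
}
```

Separating the parsing from the environment lookup keeps the default handling testable without mutating process-wide state.
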
### AI Insights Configuration (Optional)

The following environment variables configure AI-powered photo insights and daily conversation summaries:

#### Ollama Configuration
- `OLLAMA_PRIMARY_URL` - Primary Ollama server URL [default: `http://localhost:11434`]
  - Example: `http://desktop:11434` (your main/powerful server)
- `OLLAMA_FALLBACK_URL` - Fallback Ollama server URL (optional)
  - Example: `http://server:11434` (always-on backup server)
- `OLLAMA_PRIMARY_MODEL` - Model to use on primary server [default: `nemotron-3-nano:30b`]
  - Example: `nemotron-3-nano:30b`, `llama3.2:3b`, etc.
- `OLLAMA_FALLBACK_MODEL` - Model to use on fallback server (optional)
  - If not set, uses `OLLAMA_PRIMARY_MODEL` on fallback server

**Legacy Variables** (still supported):
- `OLLAMA_URL` - Used if `OLLAMA_PRIMARY_URL` not set
- `OLLAMA_MODEL` - Used if `OLLAMA_PRIMARY_MODEL` not set

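The precedence between the new variables, the legacy ones, and the defaults can be sketched as below. The `OllamaConfig` struct and both function names are hypothetical; the lookup is passed in as a closure so the example does not depend on the real process environment.

```rust
pub const DEFAULT_URL: &str = "http://localhost:11434";
pub const DEFAULT_MODEL: &str = "nemotron-3-nano:30b";

/// First of: new variable, legacy variable, built-in default.
pub fn resolve(primary: Option<String>, legacy: Option<String>, default: &str) -> String {
    primary.or(legacy).unwrap_or_else(|| default.to_string())
}

/// Hypothetical container for the resolved settings.
#[derive(Debug, PartialEq)]
pub struct OllamaConfig {
    pub primary_url: String,
    pub primary_model: String,
    pub fallback_url: Option<String>,
    pub fallback_model: Option<String>,
}

/// `get` stands in for an environment lookup such as `|k| std::env::var(k).ok()`.
pub fn ollama_config(get: impl Fn(&str) -> Option<String>) -> OllamaConfig {
    let primary_url = resolve(get("OLLAMA_PRIMARY_URL"), get("OLLAMA_URL"), DEFAULT_URL);
    let primary_model = resolve(get("OLLAMA_PRIMARY_MODEL"), get("OLLAMA_MODEL"), DEFAULT_MODEL);
    let fallback_url = get("OLLAMA_FALLBACK_URL");
    // A fallback model only matters when a fallback server is configured;
    // if OLLAMA_FALLBACK_MODEL is unset, reuse the primary model.
    let fallback_model = fallback_url
        .as_ref()
        .map(|_| get("OLLAMA_FALLBACK_MODEL").unwrap_or_else(|| primary_model.clone()));
    OllamaConfig { primary_url, primary_model, fallback_url, fallback_model }
}
```

In a running server the call would look like `ollama_config(|k| std::env::var(k).ok())`.
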
#### OpenRouter Configuration (Hybrid Backend)
The hybrid agentic backend keeps embeddings + vision local (Ollama) while routing
chat + tool-calling to OpenRouter. Enabled per-request when the client sends
`backend=hybrid`.

- `OPENROUTER_API_KEY` - OpenRouter API key. Required to enable the hybrid backend.
- `OPENROUTER_DEFAULT_MODEL` - Model id used when the client doesn't specify one
  [default: `anthropic/claude-sonnet-4`]
  - Example: `openai/gpt-4o-mini`, `google/gemini-2.5-flash`
- `OPENROUTER_ALLOWED_MODELS` - Comma-separated curated allowlist exposed to
  clients via `GET /insights/openrouter/models`. The mobile picker shows only
  these. Empty/unset = no picker, server default is used.
  - Example: `openai/gpt-4o-mini,anthropic/claude-haiku-4-5,google/gemini-2.5-flash`
- `OPENROUTER_BASE_URL` - Override base URL [default: `https://openrouter.ai/api/v1`]
- `OPENROUTER_EMBEDDING_MODEL` - Embedding model for OpenRouter
  [default: `openai/text-embedding-3-small`]. Only used if/when embeddings are
  routed through OpenRouter (currently embeddings stay local).
- `OPENROUTER_HTTP_REFERER` - Optional `HTTP-Referer` for OpenRouter attribution
- `OPENROUTER_APP_TITLE` - Optional `X-Title` for OpenRouter attribution

Capability checks are skipped for the curated allowlist, so bad model ids surface
as a 4xx from the chat call. Pick tool-capable models.

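For illustration, the comma-separated allowlist could be parsed as shown here. `parse_allowed_models` and `show_picker` are hypothetical names, and trimming whitespace around entries is an assumption rather than documented behavior.

```rust
/// Split an `OPENROUTER_ALLOWED_MODELS`-style value into model ids.
/// Unset, empty, or whitespace-only input yields an empty list.
pub fn parse_allowed_models(raw: Option<&str>) -> Vec<String> {
    raw.unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// An empty allowlist means no picker: the server default model is used.
pub fn show_picker(allowed: &[String]) -> bool {
    !allowed.is_empty()
}
```

Note that nothing here validates the ids, which matches the statement above that a bad id only surfaces when the chat call fails.
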
#### SMS API Configuration
- `SMS_API_URL` - URL to SMS message API [default: `http://localhost:8000`]
  - Used to fetch conversation data for context in insights
- `SMS_API_TOKEN` - Authentication token for SMS API (optional)

#### Agentic Insight Generation
- `AGENTIC_MAX_ITERATIONS` - Maximum tool-call iterations per agentic insight request [default: `10`]
  - Controls how many times the model can invoke tools before being forced to produce a final answer
  - Increase for more thorough context gathering; decrease to limit response time

#### Insight Chat Continuation
After an agentic insight is generated, the conversation can be continued. Endpoints:
- `POST /insights/chat`: single-turn reply (non-streaming)
- `POST /insights/chat/stream`: SSE variant with live `text` deltas and
  `tool_call` / `tool_result` events. The mobile client uses this.
- `GET /insights/chat/history?path=...&library=...`: rendered transcript;
  each assistant message carries a `tools: [{name, arguments, result}]` array
- `POST /insights/chat/rewind`: truncate the transcript at a rendered index
  (drops that message + any preceding tool scaffolding + later turns). Used
  for "try again from here" flows. The initial user message is protected.

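The rewind semantics can be illustrated with a toy transcript model. Everything in this sketch is an assumption: the `Msg` enum, the idea that only user and assistant text messages are "rendered", and the `rewind` function itself are hypothetical stand-ins for whatever the server stores.

```rust
/// Hypothetical stored transcript entry.
#[derive(Debug, Clone, PartialEq)]
pub enum Msg {
    User(String),
    Assistant(String),  // visible reply
    ToolCall(String),   // tool request (scaffolding, not rendered)
    ToolResult(String), // tool output (scaffolding, not rendered)
}

fn is_rendered(m: &Msg) -> bool {
    matches!(m, Msg::User(_) | Msg::Assistant(_))
}

/// Truncate at the message shown at `rendered_index`, also dropping the tool
/// scaffolding directly before it and every later turn.
pub fn rewind(transcript: &mut Vec<Msg>, rendered_index: usize) -> Result<(), &'static str> {
    if rendered_index == 0 {
        return Err("the initial user message is protected");
    }
    // Map the rendered index back to a position in the stored transcript.
    let mut cut = transcript
        .iter()
        .enumerate()
        .filter(|(_, m)| is_rendered(m))
        .map(|(i, _)| i)
        .nth(rendered_index)
        .ok_or("rendered index out of range")?;
    // Walk back over tool scaffolding that led up to the dropped message.
    while cut > 0 && !is_rendered(&transcript[cut - 1]) {
        cut -= 1;
    }
    transcript.truncate(cut);
    Ok(())
}
```

Rewinding to an assistant reply therefore leaves the transcript ending on the user message that prompted it, which is the state a "try again from here" flow needs.
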
Amend mode (`amend: true` in the chat request body) regenerates the insight's
title and inserts a new row instead of appending to the existing transcript,
so you can rewrite the saved summary from within chat.

- `AGENTIC_CHAT_MAX_ITERATIONS` - Cap on tool-calling iterations per chat turn [default: `6`]
  - Per-request `max_iterations` (when sent by the client) is clamped to this cap

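The clamping rule amounts to a one-line function. `effective_iterations` is a hypothetical name, and using the cap itself when the client sends no `max_iterations` is an assumption.

```rust
/// Default for AGENTIC_CHAT_MAX_ITERATIONS as documented above.
pub const DEFAULT_CHAT_CAP: u32 = 6;

/// Clamp a client-supplied `max_iterations` to the server-side cap.
/// Assumption: with no client value, the cap itself is used.
pub fn effective_iterations(requested: Option<u32>, cap: u32) -> u32 {
    requested.map_or(cap, |n| n.min(cap))
}
```

A client asking for 50 iterations against the default cap would get 6, while a request for 3 is honored as-is.
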
#### Fallback Behavior
- The primary server is tried first with a 5-second connection timeout
- On failure, requests automatically fall back to the secondary server (if configured)
- Total request timeout is 120 seconds to accommodate LLM inference
- Logs indicate which server/model was used and any failover attempts

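The failover order can be sketched without any HTTP client by passing the request in as a closure. `Target`, `with_fallback`, and the two constants are hypothetical names; only the 5-second and 120-second values come from the list above.

```rust
use std::time::Duration;

pub const CONNECT_TIMEOUT: Duration = Duration::from_secs(5);
pub const REQUEST_TIMEOUT: Duration = Duration::from_secs(120);

/// Hypothetical server/model pair.
#[derive(Debug, Clone, PartialEq)]
pub struct Target {
    pub url: String,
    pub model: String,
}

/// Try the primary target, then the fallback if one is configured.
/// `call` stands in for the real request (which would apply the timeouts above).
/// Returns the value together with the target that produced it, for logging.
pub fn with_fallback<T, E>(
    primary: &Target,
    fallback: Option<&Target>,
    mut call: impl FnMut(&Target) -> Result<T, E>,
) -> Result<(T, Target), E> {
    match call(primary) {
        Ok(v) => Ok((v, primary.clone())),
        Err(primary_err) => match fallback {
            Some(fb) => call(fb).map(|v| (v, fb.clone())),
            None => Err(primary_err),
        },
    }
}
```

When both targets fail, this sketch returns the fallback's error; reporting the primary's error as well would be a reasonable variation.
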
#### Daily Summary Generation
Daily conversation summaries are generated automatically on server startup. Configure in `src/main.rs`:
- Date range for summary generation
- Contacts to process
- Model version used for embeddings: `nomic-embed-text:v1.5`