Cameron Cordes 48cac8c285 multi-library: hash-keyed tagged_photo + photo_insights with reconciliation
Branch B of the multi-library data-model rollout. tagged_photo and
photo_insights now follow the bytes (content_hash), not the path,
matching the policy pinned in CLAUDE.md "Multi-library data model".
Branch A's availability probe and EXIF scoping land first; this
branch builds on top.

Migration (2026-05-01-000000_hash_keyed_derived_data)

  Adds nullable content_hash columns to tagged_photo and photo_insights,
  with partial indexes on the non-null subset to keep the index small
  during the transitional window. The migration backfills from
  image_exif:
    * tagged_photo joins on rel_path alone (no library_id available);
    * photo_insights joins on (library_id, rel_path), unambiguous.
  Rows whose image_exif hash isn't known yet stay null and the runtime
  reconciliation pass populates them as the hash backlog drains.
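  Sketched as SQLite DDL — column and index names here are illustrative,
  not copied from the real migration in the timestamped directory above:

```sql
-- Add nullable hash columns; partial indexes cover only the non-null subset,
-- keeping the indexes small during the transitional window.
ALTER TABLE tagged_photo ADD COLUMN content_hash TEXT;
ALTER TABLE photo_insights ADD COLUMN content_hash TEXT;

CREATE INDEX idx_tagged_photo_hash ON tagged_photo (content_hash)
  WHERE content_hash IS NOT NULL;
CREATE INDEX idx_photo_insights_hash ON photo_insights (content_hash)
  WHERE content_hash IS NOT NULL;

-- Backfill from image_exif; rows whose hash isn't known yet stay NULL.
-- tagged_photo has no library_id, so it joins on rel_path alone.
UPDATE tagged_photo
SET content_hash = (SELECT e.content_hash FROM image_exif e
                    WHERE e.rel_path = tagged_photo.rel_path)
WHERE content_hash IS NULL;

UPDATE photo_insights
SET content_hash = (SELECT e.content_hash FROM image_exif e
                    WHERE e.library_id = photo_insights.library_id
                      AND e.rel_path = photo_insights.rel_path)
WHERE content_hash IS NULL;
```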

Insert-time population

  TagDao::tag_file looks up image_exif.content_hash by rel_path before
  inserting; the hash is written into the new column.
  InsightDao::store_insight does the same scoped to (library_id,
  rel_path). Caller-supplied hash on InsertPhotoInsight wins; otherwise
  the DAO does the lookup. Both paths fall back to None if the hash
  isn't known yet — reconciliation backfills.
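  The precedence can be sketched as a tiny helper (names are illustrative,
  not the actual DAO API):

```rust
/// Caller-supplied hash wins; otherwise consult a lookup (standing in for the
/// DAO's image_exif query); otherwise None, leaving the row for reconciliation.
fn resolve_content_hash(
    caller_hash: Option<String>,
    lookup: impl FnOnce() -> Option<String>,
) -> Option<String> {
    caller_hash.or_else(lookup)
}

fn main() {
    // Caller-supplied hash on InsertPhotoInsight wins over the lookup.
    assert_eq!(
        resolve_content_hash(Some("abc".into()), || Some("def".into())),
        Some("abc".to_string())
    );
    // No caller hash: the DAO lookup result is used.
    assert_eq!(
        resolve_content_hash(None, || Some("def".into())),
        Some("def".to_string())
    );
    // Hash not known yet: stays None; reconciliation backfills later.
    assert_eq!(resolve_content_hash(None, || None), None);
    println!("ok");
}
```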

Reconciliation (database/reconcile.rs)

  Three idempotent passes the watcher runs once per tick after the
  per-library backfill loop:
    1. tagged_photo NULL hashes → populate from image_exif by rel_path.
    2. photo_insights NULL hashes → populate by (library_id, rel_path).
    3. photo_insights scalar merge — when multiple is_current rows
       share a content_hash, keep the earliest generated_at as
       current; demote the rest. Demoted rows keep their data so
       /insights/history is unaffected; only the "current" pointer
       narrows to one per hash.

  No filesystem dependency, so reconcile doesn't need the availability
  gate; runs every tick. Logs once when something changed, debug
  otherwise.

  Tags are set-valued under the policy (union on read, already
  DISTINCT in queries), so there is no analogous tag-collapse pass —
  duplicate (tag_id, content_hash) rows across libraries are
  harmless.
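  The pass-3 merge rule, sketched over in-memory rows (types and names are
  illustrative; the real pass runs against the database):

```rust
use std::collections::HashMap;

#[derive(Debug)]
struct InsightRow {
    content_hash: String,
    generated_at: i64, // epoch seconds
    is_current: bool,
}

/// Pass 3 sketch: among is_current rows sharing a content_hash, keep only the
/// earliest generated_at as current and demote the rest. Demoted rows keep
/// their data, and a second run changes nothing (idempotent).
fn collapse_current(rows: &mut [InsightRow]) {
    let mut earliest: HashMap<String, i64> = HashMap::new();
    for r in rows.iter().filter(|r| r.is_current) {
        earliest
            .entry(r.content_hash.clone())
            .and_modify(|t| *t = (*t).min(r.generated_at))
            .or_insert(r.generated_at);
    }
    for r in rows.iter_mut().filter(|r| r.is_current) {
        if earliest[&r.content_hash] != r.generated_at {
            r.is_current = false;
        }
    }
}

fn main() {
    let mut rows = vec![
        InsightRow { content_hash: "h".into(), generated_at: 200, is_current: true },
        InsightRow { content_hash: "h".into(), generated_at: 100, is_current: true },
        InsightRow { content_hash: "g".into(), generated_at: 300, is_current: true },
    ];
    collapse_current(&mut rows);
    // Earliest "h" row stays current; the later duplicate is demoted.
    assert!(!rows[0].is_current);
    assert!(rows[1].is_current);
    assert!(rows[2].is_current);
    // Idempotent: a second pass changes nothing.
    collapse_current(&mut rows);
    assert!(rows[1].is_current && !rows[0].is_current);
    println!("ok");
}
```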

Read paths are unchanged in this branch — lookup_tags_batch's
existing rel_path-via-hash-sibling expansion still produces the
correct merge. A follow-up can simplify reads to use the new column
directly for performance.

Tests: 217 pass (212 pre-existing + 5 new in reconcile covering
NULL-fill, hash-not-yet-known no-op, library scoping on insights,
earliest-wins collapse, idempotency).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:52:16 +00:00

Image API

This is an Actix-web server for serving images and videos from a filesystem. On first run it generates thumbnails for all images and videos under BASE_PATH.

Features

  • Automatic thumbnail generation for images and videos
  • EXIF data extraction and storage for photos
  • File watching with NFS support (polling-based)
  • Video streaming with HLS
  • Tag-based organization
  • Memories API for browsing photos by date
  • Video Wall - Auto-generated short preview clips for videos, served via a grid view
  • AI-Powered Photo Insights - Generate contextual insights from photos using LLMs
  • RAG-based Context Retrieval - Semantic search over daily conversation summaries
  • Automatic Daily Summaries - LLM-generated summaries of daily conversations with embeddings

External Dependencies

ffmpeg (required)

ffmpeg must be on PATH. It is used for:

  • HLS video streaming — transcoding/segmenting source videos into .m3u8 + .ts playlists
  • Video thumbnails — extracting a frame at the 3-second mark
  • Video preview clips — short looping previews for the Video Wall
  • HEIC / HEIF thumbnails — decoding Apple's HEIC format (your ffmpeg build must include libheif; most modern builds do)

Builds used in development: the gyan.dev full build on Windows and distro ffmpeg packages on Linux both work fine. If HEIC thumbnails silently fail, run ffmpeg -formats | grep heif to confirm HEIF support.

RAW photo thumbnails

RAW formats (ARW, NEF, CR2, CR3, DNG, RAF, ORF, RW2, PEF, SRW, TIFF) are thumbnailed by reading an embedded JPEG preview out of the TIFF container — no external RAW decoder (libraw / dcraw) is involved. The pipeline tries two layers in order and keeps the largest valid JPEG:

  1. Fast path (no extra dependency) — kamadak-exif reads JPEGInterchangeFormat from IFD0 / IFD1 directly. Covers older bodies and most DNGs.
  2. exiftool fallback (recommended for RAW-heavy libraries) — shells out to extract PreviewImage / JpgFromRaw / OtherImage, which reaches MakerNote and SubIFD-hosted previews kamadak-exif can't see (e.g. Nikon's PreviewIFD, where modern Nikon bodies stash the full-res review JPEG). If exiftool isn't on PATH this layer is skipped silently and only the fast-path result is used.

Install exiftool via your package manager:

  • macOS: brew install exiftool
  • Linux (Debian/Ubuntu): apt install libimage-exiftool-perl
  • Windows: winget install OliverBetz.ExifTool or choco install exiftool

Files where neither layer produces a valid preview fall back to ffmpeg. Anything that still can't be decoded is marked with a <thumb>.unsupported sentinel in the thumbnail directory so we don't retry it every scan. Delete those sentinels (and any cached black thumbnails) to force retries after a tooling upgrade.
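The "keep the largest valid JPEG" rule reduces to filtering candidates by JPEG markers and taking the longest. A minimal sketch (the real validation may be stricter):

```rust
/// A candidate preview counts as a valid JPEG if it carries the SOI (FF D8)
/// and EOI (FF D9) markers. (A real pipeline may validate more thoroughly.)
fn is_valid_jpeg(bytes: &[u8]) -> bool {
    bytes.len() >= 4 && bytes.starts_with(&[0xFF, 0xD8]) && bytes.ends_with(&[0xFF, 0xD9])
}

/// Keep the largest valid JPEG among the extracted candidates
/// (fast-path IFD preview, exiftool previews, ...).
fn best_preview(candidates: Vec<Vec<u8>>) -> Option<Vec<u8>> {
    candidates.into_iter().filter(|c| is_valid_jpeg(c)).max_by_key(|c| c.len())
}

fn main() {
    let small = vec![0xFF, 0xD8, 0x01, 0xFF, 0xD9];
    let large = vec![0xFF, 0xD8, 0x01, 0x02, 0x03, 0xFF, 0xD9];
    let junk = vec![0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07];
    // The larger of the two valid JPEGs wins; junk is ignored.
    assert_eq!(best_preview(vec![small, junk, large.clone()]), Some(large));
    // No valid candidates: fall through to the ffmpeg layer.
    assert_eq!(best_preview(vec![vec![0u8; 3]]), None);
    println!("ok");
}
```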

Environment

The API requires a handful of environment variables to run. Define them in a .env file next to the binary or in any directory above it.

  • DATABASE_URL is a path or URL to a database (currently only SQLite is tested)
  • BASE_PATH is the root from which you want to serve images and videos
  • THUMBNAILS is a path where generated thumbnails should be stored. Thumbnails mirror the source tree under BASE_PATH and keep the source's original extension (e.g. foo.arw or bar.mp4), though the file contents are always JPEG bytes — browsers content-sniff. Files that can't be thumbnailed by the image crate, ffmpeg, or an embedded RAW preview get a zero-byte <thumb_path>.unsupported sentinel in this directory so subsequent scans skip them. Delete the *.unsupported files to force retries (for example after upgrading ffmpeg or adding libheif)
  • VIDEO_PATH is a path where HLS playlists and video parts should be stored
  • GIFS_DIRECTORY is a path where generated video GIF thumbnails should be stored
  • BIND_URL is the url and port to bind to (typically your own IP address)
  • SECRET_KEY is a secret (ideally random) string used to sign tokens
  • RUST_LOG is one of off, error, warn, info, debug, trace, from least to most noisy [default: error]
  • EXCLUDED_DIRS is a comma separated list of directories to exclude from the Memories API
  • PREVIEW_CLIPS_DIRECTORY (optional) is a path where generated video preview clips should be stored [default: preview_clips]
  • WATCH_QUICK_INTERVAL_SECONDS (optional) is the interval in seconds for quick file scans [default: 60]
  • WATCH_FULL_INTERVAL_SECONDS (optional) is the interval in seconds for full file scans [default: 3600]
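An example .env with the required variables (all values are placeholders for a typical Linux deploy):

```
DATABASE_URL=/srv/image-api/images.db
BASE_PATH=/srv/photos
THUMBNAILS=/srv/image-api/thumbnails
VIDEO_PATH=/srv/image-api/hls
GIFS_DIRECTORY=/srv/image-api/gifs
BIND_URL=192.168.1.10:8080
SECRET_KEY=change-me-to-something-random
RUST_LOG=info
EXCLUDED_DIRS=private,screenshots
```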

AI Insights Configuration (Optional)

The following environment variables configure AI-powered photo insights and daily conversation summaries:

Ollama Configuration

  • OLLAMA_PRIMARY_URL - Primary Ollama server URL [default: http://localhost:11434]
    • Example: http://desktop:11434 (your main/powerful server)
  • OLLAMA_FALLBACK_URL - Fallback Ollama server URL (optional)
    • Example: http://server:11434 (always-on backup server)
  • OLLAMA_PRIMARY_MODEL - Model to use on primary server [default: nemotron-3-nano:30b]
    • Example: nemotron-3-nano:30b, llama3.2:3b, etc.
  • OLLAMA_FALLBACK_MODEL - Model to use on fallback server (optional)
    • If not set, uses OLLAMA_PRIMARY_MODEL on fallback server

Legacy Variables (still supported):

  • OLLAMA_URL - Used if OLLAMA_PRIMARY_URL not set
  • OLLAMA_MODEL - Used if OLLAMA_PRIMARY_MODEL not set

OpenRouter Configuration (Hybrid Backend)

The hybrid agentic backend keeps embeddings + vision local (Ollama) while routing chat + tool-calling to OpenRouter. Enabled per-request when the client sends backend=hybrid.

  • OPENROUTER_API_KEY - OpenRouter API key. Required to enable the hybrid backend.
  • OPENROUTER_DEFAULT_MODEL - Model id used when the client doesn't specify one [default: anthropic/claude-sonnet-4]
    • Example: openai/gpt-4o-mini, google/gemini-2.5-flash
  • OPENROUTER_ALLOWED_MODELS - Comma-separated curated allowlist exposed to clients via GET /insights/openrouter/models. The mobile picker shows only these. Empty/unset = no picker, server default is used.
    • Example: openai/gpt-4o-mini,anthropic/claude-haiku-4-5,google/gemini-2.5-flash
  • OPENROUTER_BASE_URL - Override base URL [default: https://openrouter.ai/api/v1]
  • OPENROUTER_EMBEDDING_MODEL - Embedding model for OpenRouter [default: openai/text-embedding-3-small]. Only used if/when embeddings are routed through OpenRouter (currently embeddings stay local).
  • OPENROUTER_HTTP_REFERER - Optional HTTP-Referer for OpenRouter attribution
  • OPENROUTER_APP_TITLE - Optional X-Title for OpenRouter attribution

Capability checks are skipped for the curated allowlist — bad model ids surface as a 4xx from the chat call. Pick tool-capable models.
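An example hybrid-backend configuration (the key is a placeholder; the model ids repeat the examples above):

```
OPENROUTER_API_KEY=your-openrouter-key
OPENROUTER_DEFAULT_MODEL=anthropic/claude-sonnet-4
OPENROUTER_ALLOWED_MODELS=openai/gpt-4o-mini,anthropic/claude-haiku-4-5,google/gemini-2.5-flash
OPENROUTER_HTTP_REFERER=https://example.com
OPENROUTER_APP_TITLE=Image API
```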

SMS API Configuration

  • SMS_API_URL - URL to SMS message API [default: http://localhost:8000]
    • Used to fetch conversation data for context in insights
  • SMS_API_TOKEN - Authentication token for SMS API (optional)

Agentic Insight Generation

  • AGENTIC_MAX_ITERATIONS - Maximum tool-call iterations per agentic insight request [default: 10]
    • Controls how many times the model can invoke tools before being forced to produce a final answer
    • Increase for more thorough context gathering; decrease to limit response time

Insight Chat Continuation

After an agentic insight is generated, the conversation can be continued. Endpoints:

  • POST /insights/chat — single-turn reply (non-streaming)
  • POST /insights/chat/stream — SSE variant with live text deltas and tool_call / tool_result events. Mobile client uses this.
  • GET /insights/chat/history?path=...&library=... — rendered transcript; each assistant message carries a tools: [{name, arguments, result}] array
  • POST /insights/chat/rewind — truncate transcript at a rendered index (drops that message + any preceding tool scaffolding + later turns). Used for "try again from here" flows. The initial user message is protected.

Amend mode (amend: true in the chat request body) regenerates the insight's title and inserts a new row instead of appending to the existing transcript, so you can rewrite the saved summary from within chat.
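An illustrative amend-mode request body (only amend: true is documented above; the other field names are assumptions mirroring the history endpoint's query parameters):

```json
{
  "path": "2024/05/beach.jpg",
  "library": "main",
  "message": "Rewrite this as a two-sentence summary.",
  "amend": true
}
```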

  • AGENTIC_CHAT_MAX_ITERATIONS - Cap on tool-calling iterations per chat turn [default: 6]
    • Per-request max_iterations (when sent by the client) is clamped to this cap

Fallback Behavior

  • Primary server is tried first with 5-second connection timeout
  • On failure, automatically falls back to secondary server (if configured)
  • Total request timeout is 120 seconds to accommodate LLM inference
  • Logs indicate which server/model was used and any failover attempts
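The failover order amounts to trying backends in sequence and taking the first success. A sketch (illustrative; the real client also applies the timeouts above):

```rust
/// Try each backend attempt in order; return the first success together with
/// the index of the backend that served it (0 = primary, 1 = fallback).
fn with_failover<T, E>(attempts: &[Box<dyn Fn() -> Result<T, E>>]) -> Option<(usize, T)> {
    for (i, attempt) in attempts.iter().enumerate() {
        if let Ok(value) = attempt() {
            return Some((i, value));
        }
    }
    None
}

fn main() {
    // Primary fails to connect; the fallback answers.
    let attempts: Vec<Box<dyn Fn() -> Result<&'static str, &'static str>>> = vec![
        Box::new(|| Err("primary: connect timeout")),
        Box::new(|| Ok("fallback response")),
    ];
    assert_eq!(with_failover(&attempts), Some((1, "fallback response")));
    println!("ok");
}
```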

Daily Summary Generation

Daily conversation summaries are generated automatically on server startup. Configure in src/main.rs:

  • Date range for summary generation
  • Contacts to process
  • Model version used for embeddings: nomic-embed-text:v1.5

Apollo + Face Recognition (Optional)

Apollo (sibling project) hosts both the Places API and the local insightface inference service. Both integrations are optional and degrade gracefully when unset.

  • APOLLO_API_BASE_URL - Base URL of the sibling Apollo backend.
    • When set, photo-insight enrichment folds the user's personal place name (Home, Work, Cabin, ...) into the location string, and the agentic loop gains a get_personal_place_at tool. Unset = legacy Nominatim-only path.
  • APOLLO_FACE_API_BASE_URL - Base URL for the face-detection service.
    • Falls back to APOLLO_API_BASE_URL when unset (typical single-Apollo deploy). Both unset = face feature disabled (file-watch hook and manual-face endpoints short-circuit silently).
  • FACE_AUTOBIND_MIN_COS (Phase 3) - Cosine-sim floor for auto-binding a detected face to an existing same-named person via people-tag bootstrap [default: 0.4].
  • FACE_DETECT_CONCURRENCY (Phase 3) - Per-scan-tick concurrent detect calls fired by the file watcher [default: 8]. Apollo serializes them via its single-worker GPU pool.
  • FACE_DETECT_TIMEOUT_SEC - reqwest client timeout per detect call [default: 60]. CPU inference on a backlog can take many seconds.
  • FACE_BACKLOG_MAX_PER_TICK - Cap on the per-tick backlog drain (photos with a content_hash but no face_detections row) [default: 64]. Runs every watcher tick regardless of quick-vs-full scan, so the unscanned set drains independently of the file walk.
  • FACE_HASH_BACKFILL_MAX_PER_TICK - Cap on the per-tick content_hash backfill (photos that were registered before the hash field was populated retroactively) [default: 2000]. Errors don't burn the cap; only successful hashes count.
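An example single-Apollo configuration (the URL is a placeholder; the FACE_* values shown are the documented defaults):

```
APOLLO_API_BASE_URL=http://apollo:9000
FACE_AUTOBIND_MIN_COS=0.4
FACE_DETECT_CONCURRENCY=8
FACE_DETECT_TIMEOUT_SEC=60
FACE_BACKLOG_MAX_PER_TICK=64
FACE_HASH_BACKFILL_MAX_PER_TICK=2000
```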