Adds a "Multi-library data model" section that classifies each table as intrinsic-to-bytes (hash-keyed), user-intent-about-a-photo (hash-keyed), or library-administrative ((library_id, rel_path)). Spells out merge semantics on read (union for set-valued, earliest-wins for scalar), write attribution (binds to bytes, not to current library), the transitional-state rules for hash-less rows, library handoff behavior on archive moves, and orphan GC. Adds a "Library availability and safety" subsection: every watcher tick begins with a presence probe; destructive paths (move-handoff re-keying, orphan GC) require both/all libraries online and confirmed-clean for two consecutive ticks. A NAS reboot, USB pull, or VPN drop must never trigger destruction — the worst case is that derived-data work pauses until the share returns. The face_detections table is referenced as the existing reference implementation of the policy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
597 lines
32 KiB
Markdown
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

An Actix-web REST API for serving images and videos from a filesystem with automatic thumbnail generation, EXIF extraction, tag organization, and a memories feature for browsing photos by date. Uses SQLite/Diesel ORM for data persistence and ffmpeg for video processing.

## Development Commands

### Building & Running
```bash
# Build for development
cargo build

# Build for release (uses thin LTO optimization)
cargo build --release

# Run the server (requires .env file with DATABASE_URL, BASE_PATH, THUMBNAILS, VIDEO_PATH, BIND_URL, SECRET_KEY)
cargo run

# Run with specific log level
RUST_LOG=debug cargo run
```

### Testing
```bash
# Run all tests (requires BASE_PATH in .env)
cargo test

# Run specific test
cargo test test_name

# Run tests with output
cargo test -- --nocapture
```

### Database Migrations
```bash
# Install diesel CLI (one-time setup)
cargo install diesel_cli --no-default-features --features sqlite

# Create new migration
diesel migration generate migration_name

# Run migrations (also runs automatically on app startup)
diesel migration run

# Revert last migration
diesel migration revert

# Regenerate schema.rs after manual migration changes
diesel print-schema > src/database/schema.rs
```

### Code Quality
```bash
# Format code
cargo fmt

# Run clippy linter
cargo clippy

# Fix automatically fixable issues
cargo fix
```

### Utility Binaries
```bash
# Two-phase cleanup: resolve missing files and validate file types
cargo run --bin cleanup_files -- --base-path /path/to/media --database-url ./database.db
```

## Architecture Overview

### Core Components

**Layered Architecture:**
- **HTTP Layer** (`main.rs`): Route handlers for images, videos, metadata, tags, favorites, memories
- **Auth Layer** (`auth.rs`): JWT token validation, Claims extraction via FromRequest trait
- **Service Layer** (`files.rs`, `exif.rs`, `memories.rs`): Business logic for file operations and EXIF extraction
- **DAO Layer** (`database/mod.rs`): Trait-based data access (ExifDao, UserDao, FavoriteDao, TagDao)
- **Database Layer**: Diesel ORM with SQLite, schema in `database/schema.rs`

**Async Actor System (Actix):**
- `StreamActor`: Manages ffmpeg video processing lifecycle
- `VideoPlaylistManager`: Scans directories and queues videos
- `PlaylistGenerator`: Creates HLS playlists for video streaming

### Database Schema & Patterns

**Tables:**
- `users`: Authentication (id, username, password_hash)
- `favorites`: User-specific favorites (userid, path)
- `tags`: Custom labels with timestamps
- `tagged_photo`: Many-to-many photo-tag relationships
- `image_exif`: Rich metadata (file_path + 16 EXIF fields: camera, GPS, dates, exposure settings)

**DAO Pattern:**
All database access goes through trait-based DAOs (e.g., `ExifDao`, `SqliteExifDao`). Connections are shared as `Arc<Mutex<SqliteConnection>>`, a mutex-guarded connection rather than a true connection pool. All DB operations are traced with OpenTelemetry in release builds.

**Key DAO Methods:**
- `store_exif()`, `get_exif()`, `get_exif_batch()`: EXIF CRUD operations
- `query_by_exif()`: Complex filtering by camera, GPS bounds, date ranges
- Batch operations minimize DB hits during file watching

### Multi-library data model

ImageApi supports more than one library (a library = a `(name, root_path)`
row in the `libraries` table that maps to a mounted directory tree). The
same bytes may exist under more than one library (the typical case is an
"active" library plus an "archive" library that ingests files as they age
out), and the data model is designed so that both derived data and
user-managed data follow the **bytes**, not the path.

**The principle.** A photo's identity is its `content_hash` (blake3, see
`src/content_hash.rs`). Anything we compute from or attach to a photo is
keyed on that hash so it survives:
- the same file appearing in a second library (backup / archive / mirror),
- the file moving between libraries (recent → archive handoff),
- the file moving within a library (re-organized rel_path),
- intra-library duplicates (same bytes at two paths).

**Table classification.** Three categories drive the keying decision:

| Category | Key | Rationale | Tables |
|---|---|---|---|
| Intrinsic to bytes | `content_hash` | Rerunning is wasted work (or LLM cost) | `face_detections` ✓, `image_exif` (target), `photo_insights` (target), `video_preview_clips` (target) |
| User intent about a photo | `content_hash` | "Tag this photo" means the bytes, not a path | `tagged_photo` (target), `favorites` (target) |
| Library administrative | `(library_id, rel_path)` | Tied to a specific filesystem location | `libraries`, `entity_photo_links`, the `rel_path` back-ref columns on hash-keyed tables |

✓ = already implemented this way. *(target)* = today still keyed on
`(library_id, rel_path)` and slated for migration. The migration adds a
nullable `content_hash` column and populates it from `image_exif` where
known; read paths fall back to `rel_path` while the hash is still null.

**Carrying a `rel_path` even when hash-keyed.** Hash-keyed tables retain
`(library_id, rel_path)` columns as a denormalized **back-reference**, not
as the key. This lets a single query answer "what is at this path right
now" without joining through `image_exif`, and supports the path-only
endpoints that predate the hash. `face_detections` is the reference
implementation: hash is the truth, path is a hint.

**Merge semantics on read.** When the same hash has rows under more than
one library:
- Set-valued data (tags, favorites, faces, entity links) → **union**.
- Scalar data (current insight, EXIF row, video preview clip) → earliest
  `generated_at` / `created_time` wins. The historical lib1 row beats a
  re-generated lib2 row, so the user's curated insight isn't shadowed by
  a re-run on archive ingest.

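The two merge rules can be sketched in plain Rust as follows. This is a minimal illustration, not the actual read path: `ScalarRow`, `merge_set_valued` and `merge_scalar` are hypothetical names, timestamps are assumed to be unix seconds, and the tie-break on `library_id` is an assumption added so the result is deterministic.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-library scalar row for one content_hash (insight, EXIF row, preview clip).
#[derive(Debug, Clone, PartialEq)]
pub struct ScalarRow {
    pub library_id: i32,
    /// Unix seconds; stands in for `generated_at` / `created_time`.
    pub generated_at: i64,
    pub value: String,
}

/// Set-valued data (tags, favorites, faces, entity links): union across libraries.
pub fn merge_set_valued(per_library: &[Vec<String>]) -> BTreeSet<String> {
    per_library.iter().flatten().cloned().collect()
}

/// Scalar data: the earliest row wins, so a historical lib1 row beats a re-generated lib2 row.
/// Ties fall back to the lower library_id (an assumption, for determinism).
pub fn merge_scalar(rows: &[ScalarRow]) -> Option<&ScalarRow> {
    rows.iter().min_by_key(|r| (r.generated_at, r.library_id))
}
```

Returning `Option` keeps the "no row under any library" case explicit instead of inventing a default.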
**Write attribution.** A new tag/favorite/insight created while viewing
under lib2 binds to the bytes, not to lib2, so it shows up under lib1
too. This is by design, but it's the most surprising rule on first
encounter; clients should not assume tags are library-scoped.

**Hash-less rows (transitional state).** During and immediately after a
new mount, `image_exif.content_hash` is being populated by
`backfill_unhashed_backlog` (capped per tick). Rules during this window:
- Writes: if the hash is known, write hash-keyed. If not, write
  `(library_id, rel_path)`-keyed and let the reconciliation job collapse
  duplicates once the hash lands.
- Reads: prefer the hash key, fall back to `(library_id, rel_path)`.
- Reconciliation: a pass that runs after every backfill tick collapses
  rows that now share a hash, applying the merge semantics above. It is
  idempotent, so it is safe to re-run.

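The reconciliation rule for set-valued rows can be modelled as an idempotent fold. This is a minimal sketch under stated assumptions. `RowKey` and `reconcile` are hypothetical names, and an in-memory map stands in for the database. The `hash_of` lookup plays the role of the freshly backfilled `image_exif.content_hash` values. Scalar rows would use earliest-wins instead of union.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Hypothetical row key: rows start path-keyed and become hash-keyed once the hash is known.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum RowKey {
    Hash(String),
    Path { library_id: i32, rel_path: String },
}

/// Collapse path-keyed set-valued rows (e.g. tags) into their hash key when the hash is known.
/// Already hash-keyed rows pass through, so running this on its own output changes nothing.
pub fn reconcile(
    rows: &BTreeMap<RowKey, BTreeSet<String>>,
    hash_of: &HashMap<(i32, String), String>,
) -> BTreeMap<RowKey, BTreeSet<String>> {
    let mut out: BTreeMap<RowKey, BTreeSet<String>> = BTreeMap::new();
    for (key, values) in rows {
        let target = match key {
            RowKey::Path { library_id, rel_path } => hash_of
                .get(&(*library_id, rel_path.clone()))
                .map(|h| RowKey::Hash(h.clone()))
                // Hash still unknown: keep the row path-keyed until a later tick.
                .unwrap_or_else(|| key.clone()),
            RowKey::Hash(_) => key.clone(),
        };
        // Set-valued merge semantics: union into whatever is already under the target key.
        out.entry(target).or_default().extend(values.iter().cloned());
    }
    out
}
```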
**Library handoff (recent → archive).** When a file moves between
libraries (e.g. operator moves `~/photos/2024/IMG.nef` to the archive
mount), the file watcher sees the disappearance under lib1 and the
appearance under lib2. Hash-keyed rows don't need migration; their
`(library_id, rel_path)` back-ref columns are updated to point to the new
location. Library administrative rows (`entity_photo_links`, and
`(library_id, rel_path)` rows in `image_exif` for hash-less items) are
re-keyed by the move detector, which matches a disappearance to an
appearance by `content_hash` within a configurable window.

**Orphans (source deleted while a copy survives).** When an
`image_exif` row for a hash is deleted (file removed from disk), the
hash-keyed derived rows survive **as long as another `image_exif` row
references the same hash**. Once the last reference is gone, derived rows
become eligible for GC (deferred: the GC job runs on a slow schedule so
that a brief unmount or rename doesn't wipe history).

**Stats and counts.** When reporting "how many photos do you have," count
`DISTINCT content_hash` over `image_exif`, not row count. Faces stats
already does this (`FaceDao::stats` in `src/faces.rs`); other counters
should follow suit. Numerator and denominator must live in the same
domain; see the face-stats commentary below for the cautionary tale.

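The following sketch shows why the distinct-hash rule differs from a row count. `ExifRow` and `distinct_photo_count` are hypothetical names for illustration. The function mirrors `SELECT COUNT(DISTINCT content_hash) FROM image_exif`: in SQL, `COUNT(DISTINCT ...)` ignores NULLs, so hash-less rows drop out here too.

```rust
use std::collections::HashSet;

/// Hypothetical projection of an `image_exif` row, reduced to what counting needs.
pub struct ExifRow<'a> {
    pub library_id: i32,
    pub rel_path: &'a str,
    pub content_hash: Option<&'a str>,
}

/// "How many photos": distinct non-null hashes, not rows.
/// The same bytes in two libraries (or at two paths) count once; hash-less rows are skipped.
pub fn distinct_photo_count(rows: &[ExifRow]) -> usize {
    rows.iter()
        .filter_map(|r| r.content_hash)
        .collect::<HashSet<_>>()
        .len()
}
```

For example, an active copy plus an archive copy of the same JPEG yields two rows but one photo.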
**Per-library scoping when the user asks for it.** A request scoped to
`?library=N` filters the `image_exif` view to that library, and the
hash-keyed derived data is joined through that view. The user sees only
photos that have a copy under lib N, but the derived data attached to
those photos is the merged hash-keyed view. This is the answer to "show
me archive photos with their original tags."

**Library availability and safety.** Libraries can be on network shares
or removable media; the file watcher must not interpret a temporary
unavailability as a mass-deletion event. Every tick begins with a
**presence probe** per library: the library is considered online iff
its `root_path` exists, is readable, and a top-level scan returns at
least one expected entry (or matches a recent file-count high-water
mark within a tolerance). The probe result gates which actions are safe
to run on that library this tick:

| Action | Requires online? |
|---|---|
| Quick / full scan ingest of new files | yes |
| EXIF / face / insight backlog drains | yes, but the work runs against any online library |
| Move-handoff detection (lib1 disappearance ↔ lib2 appearance match) | **both** libraries online |
| `(library_id, rel_path)` re-keying on detected move | **both** libraries online |
| Orphan GC of hash-keyed derived data | all libraries that have *ever* held the hash must be online and confirmed-clean for two consecutive ticks |
| Reads / serving | always allowed; falls back to whichever library is online |

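The gating table can be expressed as a single predicate. This is a sketch, not the watcher's real types. `LibraryStatus`, `Action` and `is_allowed` are hypothetical names. `clean_ticks` is a simplification that assumes the watcher tracks, per library, how many consecutive ticks it has been online and confirmed-clean for the GC candidate being evaluated.

```rust
/// Hypothetical per-library probe history kept by the watcher.
#[derive(Debug, Clone, Copy, Default)]
pub struct LibraryStatus {
    pub online: bool,
    /// Consecutive ticks online and confirmed-clean for the GC candidate being evaluated.
    pub clean_ticks: u32,
}

/// Actions from the availability table; library references are indexes into the status slice.
pub enum Action<'a> {
    ScanIngest(usize),
    BacklogDrain,
    MoveRekey { from: usize, to: usize },
    OrphanGc { ever_held: &'a [usize] },
    Serve,
}

pub fn is_allowed(action: &Action, libs: &[LibraryStatus]) -> bool {
    // Unknown library indexes are treated as offline, never as "safe".
    let online = |i: &usize| libs.get(*i).map_or(false, |l| l.online);
    match action {
        Action::ScanIngest(lib) => online(lib),
        // Backlog work can run against any online library.
        Action::BacklogDrain => libs.iter().any(|l| l.online),
        // Detection and re-keying need both ends of the move online.
        Action::MoveRekey { from, to } => online(from) && online(to),
        // Every library that ever held the hash: online and clean for two consecutive ticks.
        Action::OrphanGc { ever_held } => {
            !ever_held.is_empty()
                && ever_held
                    .iter()
                    .all(|i| libs.get(*i).map_or(false, |l| l.online && l.clean_ticks >= 2))
        }
        Action::Serve => true,
    }
}
```

Treating an empty `ever_held` list as "not allowed" is a defensive assumption. It prevents GC from running on missing history.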
A library that fails the probe enters a "stale" state: writes scoped to
it are paused, it is reported as stale in `/libraries` status (its rows
are flagged, not deleted), and the watcher logs at `warn` once per
state transition (not per tick). A library that recovers re-enters the
online set automatically; no operator action is required for transient
outages. The intent is that pulling a USB drive, rebooting a NAS, or
losing a VPN never triggers a destructive code path; the worst case is
that derived-data work pauses until the share returns.

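Here is a minimal sketch of the presence probe and the log-once-per-transition rule, using only `std::fs`. `probe_online`, `LibState` and `next_state` are hypothetical names. The high-water handling is an assumption: the mark is taken to be a remembered top-level entry count, and `tolerance` a fraction that the current count may fall below it.

```rust
use std::fs;
use std::path::Path;

/// Presence probe: root exists, is readable, and a top-level scan looks populated.
pub fn probe_online(root: &Path, high_water: Option<usize>, tolerance: f64) -> bool {
    // read_dir fails if the root is missing, unreadable, or not a directory.
    let entries = match fs::read_dir(root) {
        Ok(rd) => rd.filter_map(Result::ok).count(),
        Err(_) => return false,
    };
    match high_water {
        // With a remembered mark, require the count to be within tolerance of it.
        Some(hw) if hw > 0 => entries as f64 >= hw as f64 * (1.0 - tolerance),
        // Otherwise at least one entry must be visible: an empty mount point reads as offline.
        _ => entries > 0,
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LibState {
    Online,
    Stale,
}

/// Returns the new state plus whether to log it; only state changes are logged,
/// so a long outage produces one warning instead of one per tick.
pub fn next_state(prev: LibState, probe_ok: bool) -> (LibState, bool) {
    let next = if probe_ok { LibState::Online } else { LibState::Stale };
    (next, next != prev)
}
```

Counting an empty directory as offline matters for unmounted network shares. The bare mount point usually still exists, but it is empty.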
The same rule constrains the move-handoff matcher: a disappearance
under lib1 only counts as a "move" if there is a matching appearance
under another **online** library within the window. A bare
disappearance with no matching appearance is treated as
"unavailable-or-deleted, defer judgment": it does not re-key any rows
and does not enqueue GC.

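The matcher's decision can be sketched as a pure function. The names are hypothetical: `Event`, `Verdict` and `classify` are not from the codebase. Timestamps are assumed to be unix seconds. Choosing the closest-in-time candidate when several copies match is an assumption; the rules above do not specify it.

```rust
use std::collections::HashSet;

/// Hypothetical watcher event for a file disappearing or appearing.
#[derive(Debug, Clone)]
pub struct Event {
    pub library_id: i32,
    pub rel_path: String,
    pub hash: Option<String>,
    pub at: i64,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// Safe to re-key `(library_id, rel_path)` rows to this location.
    Move { to_library: i32, to_path: String },
    /// "Unavailable-or-deleted, defer judgment": re-key nothing, enqueue no GC.
    Defer,
}

pub fn classify(gone: &Event, appeared: &[Event], online: &HashSet<i32>, window_secs: i64) -> Verdict {
    // Without a hash there is nothing to match on.
    let hash = match gone.hash.as_deref() {
        Some(h) => h,
        None => return Verdict::Defer,
    };
    // Both sides of a move must be online.
    if !online.contains(&gone.library_id) {
        return Verdict::Defer;
    }
    appeared
        .iter()
        .filter(|a| a.library_id != gone.library_id && online.contains(&a.library_id))
        .filter(|a| a.hash.as_deref() == Some(hash))
        .filter(|a| (a.at - gone.at).abs() <= window_secs)
        // Several matching copies: prefer the appearance closest in time.
        .min_by_key(|a| (a.at - gone.at).abs())
        .map(|a| Verdict::Move { to_library: a.library_id, to_path: a.rel_path.clone() })
        .unwrap_or(Verdict::Defer)
}
```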
### File Processing Pipeline

**Thumbnail Generation:**
1. Startup scan: Rayon parallel walk of BASE_PATH
2. Creates 200x200 thumbnails in THUMBNAILS directory (mirrors source structure)
3. Videos: extracts frame at 3-second mark via ffmpeg
4. Images: uses `image` crate for JPEG/PNG processing
5. RAW formats (NEF/CR2/ARW/DNG/etc.): the `image` crate can't decode RAW
   pixel data, so the pipeline pulls an embedded JPEG preview instead. Fast
   path is `exif::read_jpeg_at_ifd` against IFD0 (PRIMARY) and IFD1
   (THUMBNAIL), which covers most older bodies and DNGs. Slow-path fallback shells
   out to **`exiftool`** for `PreviewImage` / `JpgFromRaw` / `OtherImage`,
   which reaches MakerNote / SubIFD-hosted previews kamadak-exif can't see
   (e.g. Nikon's `PreviewIFD`, where modern Nikon bodies store the full-res
   review JPEG). All candidates are pooled and the largest valid JPEG wins.
   See `src/exif.rs::extract_embedded_jpeg_preview`.

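The final "pool candidates, largest valid JPEG wins" step can be sketched as follows. This is an illustration only. `looks_like_jpeg` and `pick_preview` are hypothetical names. The validity check only looks for the JPEG SOI (`FF D8`) and EOI (`FF D9`) markers. That check is an assumption; the real `extract_embedded_jpeg_preview` may validate differently, for example by tolerating padding after EOI.

```rust
/// Minimal validity check: SOI marker at the start, EOI marker at the end.
pub fn looks_like_jpeg(bytes: &[u8]) -> bool {
    bytes.len() >= 4 && bytes.starts_with(&[0xFF, 0xD8]) && bytes.ends_with(&[0xFF, 0xD9])
}

/// Candidates from every source (IFD0, IFD1, exiftool tags) are pooled;
/// invalid blobs are discarded and the largest remaining one is returned.
pub fn pick_preview(candidates: Vec<Vec<u8>>) -> Option<Vec<u8>> {
    candidates
        .into_iter()
        .filter(|c| looks_like_jpeg(c))
        .max_by_key(|c| c.len())
}
```

Picking by size rather than by source order means a full-res review JPEG from exiftool beats a small IFD1 thumbnail, without hard-coding which source wins.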
**File Watching:**
Runs in a background thread with a two-tier strategy:
- **Quick scan** (default 60s): Recently modified files only
- **Full scan** (default 3600s): Comprehensive directory check
- Batch queries EXIF DB to detect new files
- Configurable via `WATCH_QUICK_INTERVAL_SECONDS` and `WATCH_FULL_INTERVAL_SECONDS`

**EXIF Extraction:**
- Uses `kamadak-exif` crate
- Supports: JPEG, TIFF, RAW (NEF, CR2, CR3), HEIF/HEIC, PNG, WebP
- Extracts: camera make/model, lens, dimensions, GPS coordinates, focal length, aperture, shutter speed, ISO, date taken
- Triggered on upload and during file watching

**File Upload Behavior:**
If the file already exists, a timestamp is appended to the filename (`photo_1735124234.jpg`) to preserve history without overwrites.

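The naming rule can be sketched as below. `non_clobbering_path` is a hypothetical helper, not the upload handler itself. The existence check is injected so the rule can be tested without touching disk. In real use it would be something like `|p| p.exists()`, with a unix-seconds timestamp.

```rust
use std::path::{Path, PathBuf};

/// If `dest` exists, insert `_<unix_ts>` before the extension:
/// `photo.jpg` becomes `photo_1735124234.jpg`. Otherwise return `dest` unchanged.
pub fn non_clobbering_path(dest: &Path, unix_ts: u64, exists: impl Fn(&Path) -> bool) -> PathBuf {
    if !exists(dest) {
        return dest.to_path_buf();
    }
    let stem = dest
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_default();
    let name = match dest.extension() {
        Some(ext) => format!("{}_{}.{}", stem, unix_ts, ext.to_string_lossy()),
        None => format!("{}_{}", stem, unix_ts),
    };
    // Keep the parent directory; only the file name changes.
    dest.with_file_name(name)
}
```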
### Authentication Flow

**Login:**
1. POST `/login` with username/password
2. Verify with `bcrypt::verify()` against password_hash
3. Generate JWT with claims: `{ sub: user_id, exp: 5_days_from_now }`
4. Sign with HS256 using `SECRET_KEY` environment variable

**Authorization:**
All protected endpoints extract `Claims` via `FromRequest` trait implementation. Token passed as `Authorization: Bearer <token>` header.

### API Structure

**Key Endpoint Patterns:**

```text
// Image serving & upload
GET /image?path=...&size=...&format=...
POST /image (multipart file upload)

// Metadata & EXIF
GET /image/metadata?path=...

// Advanced search with filters
GET /photos?path=...&recursive=true&sort=DateTakenDesc&camera_make=Canon&gps_lat=...&gps_lon=...&gps_radius_km=10&date_from=...&date_to=...&tag_ids=1,2,3&media_type=Photo

// Video streaming (HLS)
POST /video/generate (creates .m3u8 playlist + .ts segments)
GET /video/stream?path=... (serves playlist)

// Tags
GET /image/tags/all
POST /image/tags (add tag to file)
DELETE /image/tags (remove tag from file)
POST /image/tags/batch (bulk tag updates)

// Memories (week-based grouping)
GET /memories?path=...&recursive=true

// AI Insights
POST /insights/generate (non-agentic single-shot)
POST /insights/generate/agentic (tool-calling loop; body: { file_path, backend?, model?, ... })
GET /insights?path=...&library=...
GET /insights/models (local Ollama models + capabilities)
GET /insights/openrouter/models (curated OpenRouter allowlist)
POST /insights/rate (thumbs up/down for training data)

// Insight Chat Continuation
POST /insights/chat (single-turn reply, non-streaming)
POST /insights/chat/stream (SSE: text / tool_call / tool_result / truncated / done)
GET /insights/chat/history?path=... (rendered transcript with tool invocations)
POST /insights/chat/rewind (truncate transcript at a rendered index)
```

**Request Types:**
- `FilesRequest`: Supports complex filtering (tags, EXIF fields, GPS radius, date ranges)
- `SortType`: Shuffle, NameAsc/Desc, TagCountAsc/Desc, DateTakenAsc/Desc

### Important Patterns

**Service Builder Pattern:**
Routes are registered via the composable `ServiceBuilder` trait in `service.rs`, which allows modular feature addition.

**Path Validation:**
Always use `is_valid_full_path(&base_path, &requested_path, check_exists)` to prevent directory traversal attacks.

**File Type Detection:**
Centralized in `file_types.rs` with constants `IMAGE_EXTENSIONS` and `VIDEO_EXTENSIONS`. Provides both `Path` and `DirEntry` variants for performance.

**OpenTelemetry Tracing:**
All database operations and HTTP handlers are wrapped in spans. In release builds, spans are exported to the OTLP endpoint set by `OTLP_OTLS_ENDPOINT`. Debug builds use a basic logger.

**Memory Exclusion:**
`PathExcluder` in `memories.rs` filters out directories from memories API via `EXCLUDED_DIRS` environment variable (comma-separated paths or substring patterns). The same excluder is applied to face-detection candidates (`face_watch::filter_excluded`) so junk directories like `@eaDir` / `.thumbnails` don't burn detect calls on Apollo.

### Face detection system

ImageApi owns the face data; Apollo (sibling repo) hosts the insightface inference service. Inference is triggered automatically by the file watcher and persisted into two tables:

- `persons(id, name UNIQUE COLLATE NOCASE, cover_face_id, entity_id, created_from_tag, notes, ...)` — operator-managed, name is the user-visible identity.
- `face_detections(id, library_id, content_hash, rel_path, bbox_*, embedding BLOB, confidence, source, person_id, status, model_version, ...)` — keyed on `content_hash` so a photo duplicated across libraries is detected once. Marker rows for `status IN ('no_faces','failed')` carry NULL bbox/embedding (CHECK constraint enforces this).

**Why content_hash and not (library_id, rel_path):** ties face data to the bytes, not the path. A backup mount that copies files from the primary library naturally inherits the existing detections without re-running inference. This is the reference implementation of the multi-library data model — see "Multi-library data model" above.

**File-watch hook** (`src/main.rs::process_new_files`): for each photo with a populated `content_hash`, check `FaceDao::already_scanned(hash)`; if not yet scanned, send bytes (or embedded JPEG preview for RAW via `exif::extract_embedded_jpeg_preview`) to Apollo's `/api/internal/faces/detect`. K=`FACE_DETECT_CONCURRENCY` (default 8) parallel calls per scan tick; Apollo serializes them via its single-worker GPU pool. `face_watch.rs` is the Tokio orchestration layer.

**Per-tick backlog drain** (also `src/main.rs`): two passes that run on every watcher tick regardless of quick-vs-full scan:
- `backfill_unhashed_backlog` — retroactively populates `image_exif.content_hash` for photos that arrived before the hash was computed at ingest. Capped by `FACE_HASH_BACKFILL_MAX_PER_TICK` (default 2000); errors don't burn the cap.
- `process_face_backlog` — runs detection on photos that have a hash but no `face_detections` row. Capped by `FACE_BACKLOG_MAX_PER_TICK` (default 64). Selected via a SQL anti-join (`FaceDao::list_unscanned_candidates`); videos and EXCLUDED_DIRS paths filtered out client-side via `face_watch::filter_excluded` so they never reach Apollo.

**Auto-bind on detection:** when a photo carries a tag whose name matches a `persons.name` (case-insensitive), the new face binds automatically iff cosine similarity to the person's existing-face mean is ≥ `FACE_AUTOBIND_MIN_COS` (default 0.4). Persons with no existing faces bind unconditionally and the new face becomes the cover.

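The auto-bind decision can be sketched as a mean-embedding cosine test. `cosine` and `should_autobind` are hypothetical names. The sketch averages raw embeddings exactly as given. The text above does not say whether zero-vector (`force`-created) faces are excluded from the mean, so that handling is left out here.

```rust
/// Cosine similarity; returns 0.0 when either vector has zero length (e.g. a zero-vector embedding).
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Bind iff cosine(new face, mean of the person's existing faces) >= `min_cos`
/// (`FACE_AUTOBIND_MIN_COS`, default 0.4). A person with no faces binds unconditionally.
pub fn should_autobind(new_face: &[f32], existing: &[Vec<f32>], min_cos: f32) -> bool {
    if existing.is_empty() {
        return true;
    }
    // Element-wise mean of the existing embeddings.
    let mut mean = vec![0.0f32; new_face.len()];
    for e in existing {
        for (m, v) in mean.iter_mut().zip(e) {
            *m += v;
        }
    }
    for m in &mut mean {
        *m /= existing.len() as f32;
    }
    cosine(new_face, &mean) >= min_cos
}
```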
**Manual face create** (`POST /image/faces`): crops the image to the user-supplied bbox, applies EXIF orientation via `exif::apply_orientation` (the `image` crate hands raw pre-rotation pixels — without this, manually-drawn bboxes never resolved a face on re-detection), pads to ~50% of bbox dims (RetinaFace anchor scales need ~50% face-fill at det_size=640), then calls Apollo's embed endpoint. A `force` flag lets the operator save a face the detector couldn't see (e.g. profile shots, occluded faces) — the row gets a zero-vector embedding so it's manually-bound only and won't participate in clustering.

**Rerun preserves manual rows** (`POST /image/faces/{id}/rerun`): only `source='auto'` rows are deleted before re-running detection. `already_scanned` returns true on ANY row, so a photo whose only faces are manually drawn never auto-redetects.

**Stats domain — content_hash, not file rows** (`FaceDao::stats` in `src/faces.rs`): `total_photos` counts `DISTINCT content_hash` over `image_exif` (filtered to image extensions, `content_hash IS NOT NULL`), and so do `scanned` / `with_faces` / `no_faces` / `failed` over `face_detections`. Numerator and denominator must live in the same domain — `face_detections` is keyed on content_hash, so the same JPEG present at two rel_paths or in two libraries scans once. Counting `image_exif` rows in the denominator inflated total by one per duplicate file and produced a permanent gap (e.g. 1101/1103 with nothing actually pending). Hash-less rows are excluded from total_photos while they sit in the `backfill_unhashed_backlog` queue; otherwise the bar pins below 100% for the duration of that backfill even though those rows aren't pending detection yet — they're pending hashing.

Module map:
- `src/faces.rs` — `FaceDao` trait + `SqliteFaceDao` impl, route handlers for `/faces/*`, `/image/faces/*`, `/persons/*`. Mirror of `tags.rs` layout.
- `src/face_watch.rs` — Tokio orchestration for the file-watch detect pass; `filter_excluded` (PathExcluder + image-extension filter), `read_image_bytes_for_detect` (RAW preview fallback).
- `src/ai/face_client.rs` — HTTP client for Apollo's inference. Configured by `APOLLO_FACE_API_BASE_URL`, falls back to `APOLLO_API_BASE_URL`. Both unset → feature disabled, file-watch hook is a no-op.
- `migrations/2026-04-29-000000_add_faces/` — schema.

### Startup Sequence

1. Load `.env` file
2. Run embedded Diesel migrations
3. Spawn file watcher thread
4. Create initial thumbnails (parallel scan)
5. Generate video GIF thumbnails
6. Initialize AppState with Actix actors
7. Set up Prometheus metrics (`imageserver_image_total`, `imageserver_video_total`)
8. Scan directory for videos and queue HLS processing
9. Start HTTP server on `BIND_URL` + localhost:8088

## Testing Patterns

Tests require the `BASE_PATH` environment variable. Many integration tests create temporary directories and files.

When testing database code:
- Use in-memory SQLite: `DATABASE_URL=":memory:"`
- Run migrations in test setup
- Clean up with `DROP TABLE` or use `#[serial]` from `serial_test` crate if parallel tests conflict

## Common Gotchas

**EXIF Date Parsing:**
Multiple formats supported (EXIF DateTime, ISO8601, Unix timestamp). Fallback chain attempts multiple parsers.

**Video Processing:**
ffmpeg processes run asynchronously via actors. Use `StreamActor` to track completion. HLS segments are written to `VIDEO_PATH`.

**File Extensions:**
Extension detection is case-insensitive. Use `file_types.rs` helpers rather than manual string matching.

**Migration Workflow:**
After creating a migration, manually edit the SQL, then regenerate `schema.rs` with `diesel print-schema`. Migrations auto-run on startup via the `embed_migrations!()` macro.

**Path Absolutization:**
Use `path-absolutize` crate's `.absolutize()` method when converting user-provided paths to ensure they're within `BASE_PATH`.

## Required Environment Variables

```bash
DATABASE_URL=./database.db # SQLite database path
BASE_PATH=/path/to/media # Root media directory
THUMBNAILS=/path/to/thumbnails # Thumbnail storage
VIDEO_PATH=/path/to/video/hls # HLS playlist output
GIFS_DIRECTORY=/path/to/gifs # Video GIF thumbnails
BIND_URL=0.0.0.0:8080 # Server binding
CORS_ALLOWED_ORIGINS=http://localhost:3000
SECRET_KEY=your-secret-key-here # JWT signing secret
RUST_LOG=info # Log level
EXCLUDED_DIRS=/private,/archive # Comma-separated paths to exclude from memories
```

Optional:
```bash
WATCH_QUICK_INTERVAL_SECONDS=60 # Quick scan interval
WATCH_FULL_INTERVAL_SECONDS=3600 # Full scan interval
OTLP_OTLS_ENDPOINT=http://... # OpenTelemetry collector (release builds)

# AI Insights Configuration
OLLAMA_PRIMARY_URL=http://desktop:11434 # Primary Ollama server (e.g., desktop)
OLLAMA_FALLBACK_URL=http://server:11434 # Fallback Ollama server (optional, always-on)
OLLAMA_PRIMARY_MODEL=nemotron-3-nano:30b # Model for primary server (default: nemotron-3-nano:30b)
OLLAMA_FALLBACK_MODEL=llama3.2:3b # Model for fallback server (optional, uses primary if not set)
OLLAMA_REQUEST_TIMEOUT_SECONDS=120 # Per-request generation timeout (default 120). Increase for slow CPU-offloaded models.
SMS_API_URL=http://localhost:8000 # SMS message API endpoint (default: localhost:8000)
SMS_API_TOKEN=your-api-token # SMS API authentication token (optional)

# Apollo Places integration (optional). When set, photo-insight enrichment
# folds the user's personal place name (Home, Work, Cabin, ...) into the
# location string fed to the LLM, and the agentic loop gains a
# `get_personal_place_at` tool. Unset = legacy Nominatim-only path.
APOLLO_API_BASE_URL=http://apollo.lan:8000 # Base URL of the sibling Apollo backend

# Face inference (optional). Apollo also hosts the insightface inference
# service; ImageApi calls it from the file-watch hook (Phase 3) and from
# the manual face-create endpoint. Falls back to APOLLO_API_BASE_URL when
# unset (typical single-Apollo deploy). Both unset = feature disabled.
APOLLO_FACE_API_BASE_URL=http://apollo.lan:8000 # Override if face service runs separately
FACE_AUTOBIND_MIN_COS=0.4 # Phase 3: cosine-sim floor for tag-name auto-bind
FACE_DETECT_CONCURRENCY=8 # Phase 3: per-scan-tick parallel detect calls
FACE_DETECT_TIMEOUT_SEC=60 # reqwest client timeout (CPU inference can be slow)

# OpenRouter (Hybrid Backend) - keeps embeddings + vision local, routes chat to OpenRouter
OPENROUTER_API_KEY=sk-or-... # Required to enable hybrid backend
OPENROUTER_DEFAULT_MODEL=anthropic/claude-sonnet-4 # Used when client doesn't pick a model
OPENROUTER_ALLOWED_MODELS=openai/gpt-4o-mini,anthropic/claude-haiku-4-5,google/gemini-2.5-flash
# Curated allowlist exposed to clients via
# GET /insights/openrouter/models. Empty = no picker.
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1 # Override base URL (optional)
OPENROUTER_EMBEDDING_MODEL=openai/text-embedding-3-small # Optional, embeddings stay local today
OPENROUTER_HTTP_REFERER=https://your-site.example # Optional attribution header
OPENROUTER_APP_TITLE=ImageApi # Optional attribution header

# Insight Chat Continuation
AGENTIC_CHAT_MAX_ITERATIONS=6 # Cap on tool-calling iterations per chat turn (default 6)
```

**AI Insights Fallback Behavior:**
- Primary server is tried first with its configured model (5-second connection timeout)
- On connection failure, automatically falls back to secondary server with its model (if configured)
- If `OLLAMA_FALLBACK_MODEL` is not set, the primary server's model is used on fallback
- Total request timeout defaults to 120 seconds (`OLLAMA_REQUEST_TIMEOUT_SECONDS`) to accommodate slow LLM inference
- Logs indicate which server and model were used (info level) and failover attempts (warn level)
- Backwards compatible: `OLLAMA_URL` and `OLLAMA_MODEL` still supported as fallbacks

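The failover order can be sketched with the HTTP call abstracted away. `CallError` and `generate_with_fallback` are hypothetical names. The `call` closure stands in for the real Ollama request and returns the generated text. Only connection failures trigger failover, matching the rules above. The function returns the URL that actually answered, so the caller can log which server was used.

```rust
/// Hypothetical error split: only `Connect` triggers failover.
#[derive(Debug)]
pub enum CallError {
    Connect(String),
    Other(String),
}

/// Try the primary, then the fallback on connection failure. The fallback model
/// defaults to the primary model when unset (as with `OLLAMA_FALLBACK_MODEL`).
/// Returns (url that answered, generated text).
pub fn generate_with_fallback<F>(
    primary_url: &str,
    primary_model: &str,
    fallback: Option<(&str, Option<&str>)>,
    mut call: F,
) -> Result<(String, String), CallError>
where
    F: FnMut(&str, &str) -> Result<String, CallError>,
{
    match call(primary_url, primary_model) {
        Ok(text) => Ok((primary_url.to_string(), text)),
        Err(CallError::Connect(e)) => match fallback {
            Some((url, model)) => {
                let model = model.unwrap_or(primary_model);
                call(url, model).map(|text| (url.to_string(), text))
            }
            // No fallback configured: surface the original connection error.
            None => Err(CallError::Connect(e)),
        },
        // Non-connection errors (e.g. a bad model) are not retried elsewhere.
        Err(other) => Err(other),
    }
}
```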
**Model Discovery:**
The `OllamaClient` provides methods to query available models:
- `OllamaClient::list_models(url)` - Returns list of all models on a server
- `OllamaClient::is_model_available(url, model_name)` - Checks if a specific model exists

This allows runtime verification of model availability before generating insights.

**Hybrid Backend (OpenRouter):**
- Per-request opt-in via `backend=hybrid` on `POST /insights/generate/agentic`.
- Local Ollama still describes the image (vision); the description is inlined
  into the chat prompt and the agentic loop runs on OpenRouter.
- `request.model` (if provided) overrides `OPENROUTER_DEFAULT_MODEL` for that
  call. The mobile picker reads from `OPENROUTER_ALLOWED_MODELS`.
- No live capability precheck — the operator-curated allowlist is trusted.
  A bad model id surfaces as a chat-call error.
- `GET /insights/openrouter/models` returns `{ models, default_model, configured }`
  for client picker UIs.

**Insight Chat Continuation:**

After an agentic insight is generated, the full `Vec<ChatMessage>` transcript is
stored in `photo_insights.training_messages` and can be continued via the
chat endpoints. The `PhotoInsightResponse.has_training_messages` flag tells
clients whether chat is available for a given insight.

- `POST /insights/chat` runs one turn of the agentic loop against the replayed
  history. Body: `{ file_path, library?, user_message, model?, backend?, num_ctx?,
  temperature?, top_p?, top_k?, min_p?, max_iterations?, amend? }`.
- `POST /insights/chat/stream` is the SSE variant — same request body, response
  is `text/event-stream` with events: `iteration_start`, `text` (delta), `tool_call`,
  `tool_result`, `truncated`, `done`, plus a server-emitted `error_message` on
  failure. Preferred by the mobile client for live tool-chip updates.
- `GET /insights/chat/history?path=...&library=...` returns the rendered
  transcript. Each assistant message carries a `tools: [{name, arguments, result,
  result_truncated?}]` array with the tool invocations that led up to it. Tool
  results over 2000 chars are truncated with `result_truncated: true`.
- `POST /insights/chat/rewind` truncates the transcript at a given rendered
  index (drops that message + any tool-call scaffolding that preceded it + all
  later turns). Index 0 is protected. Used for "try again from here" flows.

Backend routing rules (matches agentic-insight generation):
- Stored `backend` on the insight row is authoritative by default.
- `request.backend` may override per-turn. `local -> hybrid` is rejected in
  v1 (would require on-the-fly visual-description rewrite); `hybrid -> local`
  replays verbatim since the description is already inlined as text.
- `request.model` overrides the chat model (an Ollama id in local mode, an
  OpenRouter id in hybrid mode).

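The routing rules reduce to a small match. This sketch uses hypothetical names throughout: `Backend` and `resolve_backend` are illustrative, and so is the error string. It shows the stored value acting as the default and a per-turn override being honoured, except for the one rejected transition.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Local,
    Hybrid,
}

/// Stored backend is authoritative unless the request overrides it;
/// local -> hybrid is rejected (no inlined visual description to replay).
pub fn resolve_backend(stored: Backend, requested: Option<Backend>) -> Result<Backend, String> {
    match (stored, requested) {
        (_, None) => Ok(stored),
        (Backend::Local, Some(Backend::Hybrid)) => {
            Err("local -> hybrid is not supported in v1".to_string())
        }
        // hybrid -> local replays verbatim; same-backend overrides are no-ops.
        (_, Some(b)) => Ok(b),
    }
}
```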
Persistence:
- Append mode (default): re-serialize the full history and `UPDATE` the same
  row's `training_messages`.
- Amend mode (`amend: true`): regenerate the title, insert a new insight row
  via `store_insight` (auto-flips prior rows' `is_current=false`). Response
  surfaces the new row's id as `amended_insight_id`.

Per-`(library_id, file_path)` async mutex (`AppState.insight_chat.chat_locks`)
serialises concurrent turns on the same insight so the JSON blob doesn't race.

Context management is a soft bound: if the serialized history exceeds
`num_ctx - 2048` tokens (cheap 4-byte/token heuristic), the oldest
assistant-tool_call + tool_result pairs are dropped until under budget. The
initial user message (with any images) and system prompt are always preserved.
The `truncated` event / flag is surfaced to the client when a drop occurred.

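The soft-bound trimming loop can be sketched as follows. `Role`, `Msg` and `trim_history` are hypothetical stand-ins for the real `ChatMessage` handling. The token estimate sums message content bytes divided by 4. This is an approximation of the "serialized history" estimate, since it ignores JSON overhead and image payloads.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    System,
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone)]
pub struct Msg {
    pub role: Role,
    pub content: String,
    pub has_tool_calls: bool,
}

/// Cheap estimate: ~4 bytes per token over message contents.
fn est_tokens(msgs: &[Msg]) -> usize {
    msgs.iter().map(|m| m.content.len()).sum::<usize>() / 4
}

/// Drop the oldest assistant tool_call message and the tool results that follow it,
/// until the estimate fits `num_ctx - 2048`. The system prompt and first user message
/// are never touched. Returns true if anything was dropped (the `truncated` flag).
pub fn trim_history(msgs: &mut Vec<Msg>, num_ctx: usize) -> bool {
    let budget = num_ctx.saturating_sub(2048);
    let mut truncated = false;
    while est_tokens(msgs) > budget {
        let first_user = match msgs.iter().position(|m| m.role == Role::User) {
            Some(i) => i,
            None => break,
        };
        // Oldest droppable pair starts after the protected prefix.
        let start = match msgs
            .iter()
            .skip(first_user + 1)
            .position(|m| m.role == Role::Assistant && m.has_tool_calls)
        {
            Some(off) => first_user + 1 + off,
            None => break, // nothing left to drop: the bound is soft, so give up
        };
        let mut end = start + 1;
        while end < msgs.len() && msgs[end].role == Role::Tool {
            end += 1;
        }
        msgs.drain(start..end);
        truncated = true;
    }
    truncated
}
```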
Configurable env:
- `AGENTIC_CHAT_MAX_ITERATIONS` — cap on tool-calling iterations per turn
  (default 6). Per-request `max_iterations` is clamped to this cap.

**Apollo Places integration (optional):**

The sibling Apollo project (personal location-history viewer) owns
user-defined Places: `name + lat/lon + radius_m + description (+ optional
category)`. When `APOLLO_API_BASE_URL` is set, ImageApi queries
`/api/places/contains?lat=&lon=` to enrich the LLM prompt's location
string. See `src/ai/apollo_client.rs` and `src/ai/insight_generator.rs`:

- **Auto-enrichment** (always on when configured): the per-photo location
  resolver folds the most-specific containing Place ("Home — near
  Cambridge, MA" or "Home (My house in Cambridge) — near Cambridge, MA"
  when a description is set) into the location field of `combine_contexts`.
  Smallest-radius wins — Apollo sorts server-side, this code takes `[0]`.
- **Agentic tool** `get_personal_place_at(latitude, longitude)`: registered
  alongside `reverse_geocode` only when `apollo_enabled()` returns true.
  Returns "- Name [category]: description (radius N m)" lines, smallest
  radius first. The tool is **deliberately narrow** — no enumerate-all
  variant; auto-enrichment covers the photo-context path and the agentic
  tool covers ad-hoc lat/lon questions in chat continuation.

Failure modes degrade silently to the legacy Nominatim path: 5 s timeout,
errors logged at `warn`, empty results returned. Apollo's routes are
unauthenticated (single-user, LAN-trust); add JWT auth here + on Apollo's
side if exposing beyond a trusted network.

## Dependencies of Note

### Rust crates

- **actix-web**: HTTP framework
- **diesel**: ORM for SQLite
- **jsonwebtoken**: JWT implementation
- **kamadak-exif**: EXIF parsing
- **image**: Thumbnail generation
- **walkdir**: Directory traversal
- **rayon**: Parallel processing
- **opentelemetry**: Distributed tracing
- **bcrypt**: Password hashing
- **infer**: Magic number file type detection

### External binaries (must be on `PATH`)

- **`ffmpeg`** — video thumbnail extraction (`StreamActor`, HLS pipeline) and
  the HEIF/HEIC/NEF/ARW thumbnail fallback in `generate_image_thumbnail_ffmpeg`.
  Required for any deploy that holds video or HEIF files.
- **`exiftool`** — optional but strongly recommended for RAW-heavy libraries.
  The thumbnail pipeline shells out to it as the slow-path fallback for
  embedded preview extraction (Nikon MakerNote `PreviewIFD`, Canon SubIFDs,
  etc. — anything kamadak-exif's IFD0/IFD1 readers can't reach). Without
  exiftool installed, RAWs whose preview lives outside IFD0/IFD1 will fall
  through to ffmpeg, which often produces black thumbnails. Install via
  package manager: `apt install libimage-exiftool-perl`,
  `brew install exiftool`, `winget install OliverBetz.ExifTool`, or
  `choco install exiftool`.