The maintenance pipeline added in Branch C assumes (library_id,
rel_path) bytes are stable for as long as the file lives at that
path. In-place edits (crop, re-export to same name) bypass
process_new_files's already-indexed check, so the row's
content_hash stays pinned to the original bytes — tags / faces /
insights remain attached to that hash silently.
Document the gap and the proposed shape of the fix:
- Stale-content detection pass: compare last_modified / size_bytes
to fs::metadata, re-hash on mismatch, update image_exif.
- "Content branched" semantics on hash change: faces re-run, tags
migrate forward (user intent survives a crop), insights migrate
+ flag for re-generation, favorites follow path.
- Apollo derived.db cache invalidation belongs in the same design
cycle, not after.
Captured here so the design intent is clear before someone hits the
case in real life. No code change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

An Actix-web REST API for serving images and videos from a filesystem with automatic thumbnail generation, EXIF extraction, tag organization, and a memories feature for browsing photos by date. Uses SQLite/Diesel ORM for data persistence and ffmpeg for video processing.

## Development Commands

### Building & Running
```bash
# Build for development
cargo build

# Build for release (uses thin LTO optimization)
cargo build --release

# Run the server (requires .env file with DATABASE_URL, BASE_PATH, THUMBNAILS, VIDEO_PATH, BIND_URL, SECRET_KEY)
cargo run

# Run with specific log level
RUST_LOG=debug cargo run
```

### Testing
```bash
# Run all tests (requires BASE_PATH in .env)
cargo test

# Run specific test
cargo test test_name

# Run tests with output
cargo test -- --nocapture
```

### Database Migrations
```bash
# Install diesel CLI (one-time setup)
cargo install diesel_cli --no-default-features --features sqlite

# Create new migration
diesel migration generate migration_name

# Run migrations (also runs automatically on app startup)
diesel migration run

# Revert last migration
diesel migration revert

# Regenerate schema.rs after manual migration changes
diesel print-schema > src/database/schema.rs
```

### Code Quality
```bash
# Format code
cargo fmt

# Run clippy linter
cargo clippy

# Fix automatically fixable issues
cargo fix
```

### Utility Binaries
```bash
# Two-phase cleanup: resolve missing files and validate file types
cargo run --bin cleanup_files -- --base-path /path/to/media --database-url ./database.db
```

## Architecture Overview

### Core Components

**Layered Architecture:**
- **HTTP Layer** (`main.rs`): Route handlers for images, videos, metadata, tags, favorites, memories
- **Auth Layer** (`auth.rs`): JWT token validation, Claims extraction via FromRequest trait
- **Service Layer** (`files.rs`, `exif.rs`, `memories.rs`): Business logic for file operations and EXIF extraction
- **DAO Layer** (`database/mod.rs`): Trait-based data access (ExifDao, UserDao, FavoriteDao, TagDao)
- **Database Layer**: Diesel ORM with SQLite, schema in `database/schema.rs`

**Async Actor System (Actix):**
- `StreamActor`: Manages ffmpeg video processing lifecycle
- `VideoPlaylistManager`: Scans directories and queues videos
- `PlaylistGenerator`: Creates HLS playlists for video streaming

### Database Schema & Patterns

**Tables:**
- `users`: Authentication (id, username, password_hash)
- `favorites`: User-specific favorites (userid, path)
- `tags`: Custom labels with timestamps
- `tagged_photo`: Many-to-many photo-tag relationships
- `image_exif`: Rich metadata (file_path + 16 EXIF fields: camera, GPS, dates, exposure settings)

**DAO Pattern:**
All database access goes through trait-based DAOs (e.g., `ExifDao`, `SqliteExifDao`). Connection pooling uses `Arc<Mutex<SqliteConnection>>`. All DB operations are traced with OpenTelemetry in release builds.

**Key DAO Methods:**
- `store_exif()`, `get_exif()`, `get_exif_batch()`: EXIF CRUD operations
- `query_by_exif()`: Complex filtering by camera, GPS bounds, date ranges
- Batch operations minimize DB hits during file watching

### Multi-library data model

ImageApi supports more than one library (a library = a `(name, root_path)`
row in the `libraries` table that maps to a mounted directory tree). The
same bytes may exist under more than one library — typical case is an
"active" library plus an "archive" library that ingests files as they age
out — and the data model is designed so that derived data follows the
**bytes**, not the path, and user-managed data does the same.

**The principle.** A photo's identity is its `content_hash` (blake3, see
`src/content_hash.rs`). Anything we compute from or attach to a photo is
keyed on that hash so it survives:
- the same file appearing in a second library (backup / archive / mirror),
- the file moving between libraries (recent → archive handoff),
- the file moving within a library (re-organized rel_path),
- intra-library duplicates (same bytes at two paths).

**Table classification.** Three categories drive the keying decision:

| Category | Key | Rationale | Tables |
|---|---|---|---|
| Intrinsic to bytes | `content_hash` | Rerunning is wasted work (or LLM cost) | `face_detections` ✓, `image_exif` (target), `photo_insights` (target), `video_preview_clips` (target) |
| User intent about a photo | `content_hash` | "Tag this photo" means the bytes, not a path | `tagged_photo` (target), `favorites` (target) |
| Library administrative | `(library_id, rel_path)` | Tied to a specific filesystem location | `libraries`, `entity_photo_links`, the `rel_path` back-ref columns on hash-keyed tables |

✓ = already implemented this way. *(target)* = today still keyed on
`(library_id, rel_path)` and slated for migration. The migration adds a
nullable `content_hash` column, populates it from `image_exif` where
known, and read paths fall back to rel_path while the hash is null.

**Carrying a `rel_path` even when hash-keyed.** Hash-keyed tables retain
`(library_id, rel_path)` columns as a denormalized **back-reference**, not
as the key. This lets a single query answer "what is at this path right
now" without joining through `image_exif`, and supports the path-only
endpoints that predate the hash. `face_detections` is the reference
implementation: hash is the truth, path is a hint.

**Merge semantics on read.** When the same hash has rows under more than
one library:
- Set-valued data (tags, favorites, faces, entity links) → **union**.
- Scalar data (current insight, EXIF row, video preview clip) → earliest
  `generated_at` / `created_time` wins. The historical lib1 row beats a
  re-generated lib2 row, so the user's curated insight isn't shadowed by
  a re-run on archive ingest.
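The two merge rules can be sketched in a few lines of Rust. This is an illustration only: `InsightRow`, `merge_tag_ids`, and `pick_insight` are hypothetical names, and the real DAOs presumably do this in SQL rather than in memory.

```rust
use std::collections::BTreeSet;

/// Hypothetical scalar row: one library's copy of the current insight.
#[derive(Debug, Clone, PartialEq)]
pub struct InsightRow {
    pub library_id: i32,
    pub generated_at: i64, // Unix seconds
    pub summary: String,
}

/// Set-valued data: union the tag ids found under every library.
pub fn merge_tag_ids(per_library: &[Vec<i32>]) -> BTreeSet<i32> {
    per_library.iter().flatten().copied().collect()
}

/// Scalar data: the row with the earliest `generated_at` wins.
pub fn pick_insight(rows: &[InsightRow]) -> Option<&InsightRow> {
    rows.iter().min_by_key(|r| r.generated_at)
}
```

With a lib1 row generated at `100` and a lib2 re-run generated at `200`, `pick_insight` returns the lib1 row, which is the "historical row wins" behavior described above.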

**Write attribution.** A new tag/favorite/insight created while viewing
under lib2 binds to the bytes, not to lib2 — so it shows up under lib1
too. This is by design, but it's the most surprising rule on first
encounter; clients should not assume tags are library-scoped.

**Hash-less rows (transitional state).** During and immediately after a
new mount, `image_exif.content_hash` is being populated by
`backfill_unhashed_backlog` (capped per tick). Rules during this window:
- Writes: if the hash is known, write hash-keyed. If not, write
  `(library_id, rel_path)`-keyed and let the reconciliation job collapse
  duplicates once the hash lands.
- Reads: prefer hash key, fall back to `(library_id, rel_path)`.
- Reconciliation: a one-shot pass after every backfill tick collapses
  rows that now share a hash, applying the merge semantics above.
  Idempotent — safe to re-run.
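A minimal in-memory sketch of the three transitional rules, assuming a hypothetical `TagStore` that stands in for a table mid-migration (the real tables, column names, and reconciliation job are not shown in this document):

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a table that is being migrated to hash keys.
#[derive(Default)]
pub struct TagStore {
    by_hash: HashMap<String, Vec<String>>,
    by_path: HashMap<(i32, String), Vec<String>>,
}

impl TagStore {
    /// Write rule: hash-keyed when the hash is known, path-keyed otherwise.
    pub fn add_tag(&mut self, hash: Option<&str>, library_id: i32, rel_path: &str, tag: &str) {
        match hash {
            Some(h) => self.by_hash.entry(h.to_string()).or_default().push(tag.to_string()),
            None => self
                .by_path
                .entry((library_id, rel_path.to_string()))
                .or_default()
                .push(tag.to_string()),
        }
    }

    /// Read rule: prefer the hash key, fall back to `(library_id, rel_path)`.
    pub fn tags_for(&self, hash: Option<&str>, library_id: i32, rel_path: &str) -> Vec<String> {
        if let Some(rows) = hash.and_then(|h| self.by_hash.get(h)) {
            return rows.clone();
        }
        self.by_path
            .get(&(library_id, rel_path.to_string()))
            .cloned()
            .unwrap_or_default()
    }

    /// Reconciliation: once the hash lands, fold path-keyed rows into the
    /// hash key (set union). A second call finds nothing left to move.
    pub fn reconcile(&mut self, hash: &str, library_id: i32, rel_path: &str) {
        if let Some(rows) = self.by_path.remove(&(library_id, rel_path.to_string())) {
            let merged = self.by_hash.entry(hash.to_string()).or_default();
            for tag in rows {
                if !merged.contains(&tag) {
                    merged.push(tag);
                }
            }
        }
    }
}
```

Calling `reconcile` twice for the same row leaves the store unchanged after the first call, which is the idempotence property the reconciliation pass relies on.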

**Library handoff (recent → archive).** When a file moves between
libraries (e.g. operator moves `~/photos/2024/IMG.nef` to the archive
mount), the file watcher sees the disappearance under lib1 and the
appearance under lib2. Hash-keyed rows don't need migration; the
`(library_id, rel_path)` back-ref columns are updated to point to the new
location. Library administrative rows (`entity_photo_links`,
`(library_id, rel_path)` rows in `image_exif` for hash-less items) are
re-keyed by the move detector, which matches a disappearance to an
appearance by `content_hash` within a configurable window.

**Orphans (source deleted while a copy survives).** When an
`image_exif` row for a hash is deleted (file removed from disk), the
hash-keyed derived rows survive **as long as another `image_exif` row
references the same hash**. If the last reference is gone, derived rows
are eligible for GC (deferred — the GC job runs on a slow schedule so
that a brief unmount or rename doesn't wipe history).

**Stats and counts.** When reporting "how many photos do you have," count
`DISTINCT content_hash` over `image_exif`, not row count. Faces stats
already does this (`FaceDao::stats` in `src/faces.rs`); other counters
should follow suit. Numerator and denominator must live in the same
domain — see the face-stats commentary below for the cautionary tale.
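As a small illustration of counting in the hash domain, the sketch below counts distinct hashes over an in-memory list of rows. `ExifRow` and `total_photos` are hypothetical; the real counter is a SQL `COUNT(DISTINCT content_hash)`. Hash-less rows are skipped here, matching the face-stats behavior described later in this file.

```rust
use std::collections::HashSet;

/// Hypothetical subset of an `image_exif` row.
pub struct ExifRow {
    pub library_id: i32,
    pub rel_path: String,
    pub content_hash: Option<String>,
}

/// Count photos by distinct hash, not by row; rows without a hash are ignored.
pub fn total_photos(rows: &[ExifRow]) -> usize {
    rows.iter()
        .filter_map(|r| r.content_hash.as_deref())
        .collect::<HashSet<_>>()
        .len()
}
```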

**Per-library scoping when the user asks for it.** A request scoped to
`?library=N` filters the `image_exif` view to that library, and the
hash-keyed derived data is joined through that view. The user sees only
photos that have a copy under lib N, but the derived data attached to
those photos is the merged hash-keyed view. This is the answer to "show
me archive photos with their original tags."

**Library availability and safety.** Libraries can be on network shares
or removable media; the file watcher must not interpret a temporary
unavailability as a mass-deletion event. Every tick begins with a
**presence probe** per library: the library is considered online iff
its `root_path` exists, is readable, and a top-level scan returns at
least one expected entry (or matches a recent file-count high-water
mark within a tolerance). The probe result gates which actions are safe
to run on that library this tick:

| Action | Requires online? |
|---|---|
| Quick / full scan ingest of new files | yes |
| EXIF / face / insight backlog drains | yes — but the work runs against any online library |
| Move-handoff detection (lib1 disappearance ↔ lib2 appearance match) | **both** libraries online |
| `(library_id, rel_path)` re-keying on detected move | **both** libraries online |
| Orphan GC of hash-keyed derived data | all libraries that have *ever* held the hash must be online and confirmed-clean for two consecutive ticks |
| Reads / serving | always allowed; falls back to whichever library is online |
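The gating table can be read as a pure function of the action and this tick's probe results. The following is a sketch under simplifying assumptions: `Action` and `is_safe` are hypothetical names, libraries are identified by index, and the orphan-GC rule is reduced to "every library online this tick" (the two-consecutive-tick requirement is not modeled here).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Ingest { library: usize },
    BacklogDrain { library: usize },
    /// Covers both move detection and the re-keying that follows it.
    MoveHandoff { from: usize, to: usize },
    OrphanGc,
    Read,
}

/// `online[i]` is this tick's probe result for library `i`.
pub fn is_safe(action: Action, online: &[bool]) -> bool {
    // Unknown library indexes are treated as offline.
    let up = |i: usize| online.get(i).copied().unwrap_or(false);
    match action {
        Action::Ingest { library } | Action::BacklogDrain { library } => up(library),
        Action::MoveHandoff { from, to } => up(from) && up(to),
        // Simplification: require every library to be online this tick.
        Action::OrphanGc => online.iter().all(|&b| b),
        // Reads are always allowed; they serve from whichever copy is online.
        Action::Read => true,
    }
}
```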

A library that fails the probe enters a "stale" state: writes scoped to
it are paused, its rows are flagged stale (not deleted) in
`/libraries` status, and the watcher logs at `warn` once per
state-transition (not per tick). A library that recovers re-enters the
online set automatically; no operator action required for transient
outages. The intent is that pulling a USB drive, rebooting a NAS, or
losing a VPN never triggers a destructive code path — the worst case is
that derived-data work pauses until the share returns.

The same rule constrains the move-handoff matcher: a disappearance
under lib1 only counts as a "move" if there is a matching appearance
under another **online** library within the window. A bare
disappearance with no matching appearance is treated as
"unavailable-or-deleted, defer judgment" — it does not re-key any rows
and does not enqueue GC.

**Maintenance pipeline (`src/library_maintenance.rs`).** The watcher
runs three maintenance passes per tick that together implement the
move/handoff and orphan rules:

1. **Missing-file scan** — per online library, paginated. A page of
   `image_exif` rows is loaded (`IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE`,
   default 500), each row's `(root_path/rel_path)` is `stat()`-ed,
   and confirmed-not-found rows are deleted from `image_exif`
   (capped at `IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK`, default 200).
   Permission/IO errors are skipped, never deleted — only `NotFound`
   triggers a deletion. The cursor wraps every time a partial page
   comes back, so the whole library is swept across consecutive ticks.
   Skipped wholesale for Stale libraries via the per-library probe
   gate at the top of the loop iteration.

2. **Back-ref refresh** — DB-only. For `face_detections`,
   `tagged_photo`, and `photo_insights`: any hash-keyed row whose
   `(library_id, rel_path)` no longer matches an `image_exif` row
   *but whose `content_hash` does* is repointed at the surviving
   `image_exif` location. Idempotent SQL; no health gate needed.
   This is what makes the recent → archive handoff invisible to
   read paths: when the missing-file scan retires the lib-A row,
   tags/faces/insights pivot to lib-B's path before any user
   notices.

3. **Orphan GC** — destructive. Hash-keyed derived rows whose
   `content_hash` no longer has any `image_exif` row are eligible.
   Two-tick consensus: a hash must be observed orphaned on two
   consecutive ticks AND every library must be online for both. A
   single Stale tick within the window cancels all pending deletes.
   The pending set is held in memory (`OrphanGcState`) — restart
   resets it, which only delays a delete, never causes one. Tags,
   faces, and insights for orphaned hashes are deleted in one batch
   per tick.
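The two-tick consensus in pass 3 can be modeled as a small state machine. The sketch below borrows the `OrphanGcState` name from the text, but its fields and the `tick` method are assumptions made for illustration; the real type in `src/library_maintenance.rs` may be shaped differently.

```rust
use std::collections::HashSet;

/// Hypothetical shape: hashes seen orphaned on the previous tick.
#[derive(Default)]
pub struct OrphanGcState {
    pending: HashSet<String>,
}

impl OrphanGcState {
    /// Feed one tick's observations; returns the hashes that may be deleted now.
    pub fn tick(&mut self, orphaned_now: &HashSet<String>, all_online: bool) -> Vec<String> {
        if !all_online {
            // A single Stale tick cancels every pending delete.
            self.pending.clear();
            return Vec::new();
        }
        // Confirmed: orphaned on the previous tick and again on this one.
        let mut confirmed: Vec<String> =
            self.pending.intersection(orphaned_now).cloned().collect();
        confirmed.sort();
        // First-time sightings wait one more tick; confirmed hashes are consumed.
        self.pending = orphaned_now
            .iter()
            .filter(|h| !confirmed.contains(*h))
            .cloned()
            .collect();
        confirmed
    }
}
```

Because `pending` lives only in memory, a restart starts from an empty set, so the earliest possible delete moves one tick later and nothing is deleted early.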

A backup library that briefly disappears, then returns within two
ticks, never loses any derived data. A move from lib-A to lib-B
without disappearance flips through pass 1 (lib-A row retired) and
pass 2 (back-refs follow), with pass 3 noting nothing because the
hash is still present in `image_exif` (lib-B's row).
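Pass 1's "only `NotFound` triggers a deletion" rule maps directly onto `std::io::ErrorKind`. A minimal sketch, assuming hypothetical helpers `probe` and `rows_to_delete` (pagination and the cursor are left out):

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

#[derive(Debug, PartialEq)]
pub enum Presence {
    Present,
    /// Confirmed `NotFound`: the only outcome that may delete a row.
    Missing,
    /// Permission or other IO error: skip the row, never delete it.
    Unknown,
}

pub fn probe(path: &Path) -> Presence {
    match fs::metadata(path) {
        Ok(_) => Presence::Present,
        Err(e) if e.kind() == ErrorKind::NotFound => Presence::Missing,
        Err(_) => Presence::Unknown,
    }
}

/// Returns the rel_paths to delete this tick, honoring the per-tick cap.
pub fn rows_to_delete(root: &Path, rel_paths: &[&str], cap: usize) -> Vec<String> {
    rel_paths
        .iter()
        .filter(|rel| probe(&root.join(rel)) == Presence::Missing)
        .take(cap)
        .map(|rel| rel.to_string())
        .collect()
}
```

Note that a missing library root also makes every file under it report `NotFound`, which is why the per-library probe gate has to run before this pass.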

**Known gap: in-place content changes (future Branch D).** The
maintenance pipeline assumes a `(library_id, rel_path)`'s bytes are
stable for as long as the file exists at that path. If a user edits
a file in place (crop, re-export) without renaming, the watcher's
quick scan walks the file (mtime is recent) but `process_new_files`
short-circuits because `(library_id, rel_path)` already has an
`image_exif` row — no re-hash, no re-EXIF, no face redetection. The
row's `content_hash` keeps pointing at the original bytes. Tags /
faces / insights stay attached to the original hash and continue to
display because the rel_path back-ref still resolves; new faces
introduced by the edit are never detected.

The right place to fix this is a **stale-content detection pass**
that compares `image_exif.last_modified` / `size_bytes` to
`fs::metadata` for rows the quick scan would otherwise skip. On
mismatch, recompute the hash, update `image_exif`, and apply the
"content branched" semantics:
- **Faces** re-run (faces are fully derived from bytes).
- **Tags** migrate to the new hash (user intent — "this photo is
  vacation" survives a crop).
- **Insights** migrate forward as a starting point and are flagged
  for re-generation.
- **Favorites** (when migrated to hash-keyed) follow the path /
  user intent.
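Since Branch D is not implemented, the following is only a sketch of what the mismatch check might look like. `RecordedStat`, `current_stat`, and `needs_rehash` are hypothetical, and the assumption that `last_modified` is stored as Unix seconds is mine, not the document's.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Hypothetical subset of an `image_exif` row.
pub struct RecordedStat {
    pub last_modified: i64, // Unix seconds (assumed storage format)
    pub size_bytes: u64,
}

/// Read the same two fields from the filesystem.
pub fn current_stat(path: &Path) -> io::Result<RecordedStat> {
    let meta = fs::metadata(path)?;
    let mtime = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs() as i64)
        .unwrap_or(0);
    Ok(RecordedStat { last_modified: mtime, size_bytes: meta.len() })
}

/// True when the file on disk no longer matches the recorded stat,
/// i.e. the row is a candidate for re-hashing.
pub fn needs_rehash(recorded: &RecordedStat, path: &Path) -> io::Result<bool> {
    let now = current_stat(path)?;
    Ok(now.last_modified != recorded.last_modified || now.size_bytes != recorded.size_bytes)
}
```

A mismatch only says the bytes may have changed; the hash comparison after re-hashing is what decides whether the content actually branched.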

The interesting case is the operator who keeps an unedited copy in
the archive library and edits the local copy: post-detection, the
archive copy stays on the original hash, the local copy branches to
the new hash, and the two histories cleanly split. Apollo's
`derived.db` cache will need an invalidation hook for the changed
hash — design it alongside Branch D.

### File Processing Pipeline

**Thumbnail Generation:**
1. Startup scan: Rayon parallel walk of BASE_PATH
2. Creates 200x200 thumbnails in THUMBNAILS directory (mirrors source structure)
3. Videos: extracts frame at 3-second mark via ffmpeg
4. Images: uses `image` crate for JPEG/PNG processing
5. RAW formats (NEF/CR2/ARW/DNG/etc.): the `image` crate can't decode RAW
   pixel data, so the pipeline pulls an embedded JPEG preview instead. Fast
   path is `exif::read_jpeg_at_ifd` against IFD0 (PRIMARY) and IFD1
   (THUMBNAIL) — covers most older bodies and DNGs. Slow-path fallback shells
   out to **`exiftool`** for `PreviewImage` / `JpgFromRaw` / `OtherImage`,
   which reaches MakerNote / SubIFD-hosted previews kamadak-exif can't see
   (e.g. Nikon's `PreviewIFD`, where modern Nikon bodies store the full-res
   review JPEG). All candidates are pooled and the largest valid JPEG wins.
   See `src/exif.rs::extract_embedded_jpeg_preview`.

**File Watching:**
Runs in background thread with two-tier strategy:
- **Quick scan** (default 60s): Recently modified files only
- **Full scan** (default 3600s): Comprehensive directory check
- Batch queries EXIF DB to detect new files
- Configurable via `WATCH_QUICK_INTERVAL_SECONDS` and `WATCH_FULL_INTERVAL_SECONDS`

**EXIF Extraction:**
- Uses `kamadak-exif` crate
- Supports: JPEG, TIFF, RAW (NEF, CR2, CR3), HEIF/HEIC, PNG, WebP
- Extracts: camera make/model, lens, dimensions, GPS coordinates, focal length, aperture, shutter speed, ISO, date taken
- Triggered on upload and during file watching

**File Upload Behavior:**
If file exists, appends timestamp to filename (`photo_1735124234.jpg`) to preserve history without overwrites.
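The renaming rule can be sketched as a pure function. `non_clobbering_path` is a hypothetical helper, not the handler's actual code; the existence check is passed in as a closure so the logic can be exercised without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Returns `requested` if it is free, otherwise `<stem>_<unix_ts>.<ext>`
/// in the same directory.
pub fn non_clobbering_path(
    requested: &Path,
    unix_ts: u64,
    exists: impl Fn(&Path) -> bool,
) -> PathBuf {
    if !exists(requested) {
        return requested.to_path_buf();
    }
    let stem = requested
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("upload");
    let name = match requested.extension().and_then(|e| e.to_str()) {
        Some(ext) => format!("{stem}_{unix_ts}.{ext}"),
        None => format!("{stem}_{unix_ts}"),
    };
    requested.with_file_name(name)
}
```

In a real handler the closure would typically be `|p| p.exists()` and `unix_ts` the current Unix time in seconds.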

### Authentication Flow

**Login:**
1. POST `/login` with username/password
2. Verify with `bcrypt::verify()` against password_hash
3. Generate JWT with claims: `{ sub: user_id, exp: 5_days_from_now }`
4. Sign with HS256 using `SECRET_KEY` environment variable

**Authorization:**
All protected endpoints extract `Claims` via `FromRequest` trait implementation. Token passed as `Authorization: Bearer <token>` header.

### API Structure

**Key Endpoint Patterns:**

```rust
// Image serving & upload
GET /image?path=...&size=...&format=...
POST /image (multipart file upload)

// Metadata & EXIF
GET /image/metadata?path=...

// Advanced search with filters
GET /photos?path=...&recursive=true&sort=DateTakenDesc&camera_make=Canon&gps_lat=...&gps_lon=...&gps_radius_km=10&date_from=...&date_to=...&tag_ids=1,2,3&media_type=Photo

// Video streaming (HLS)
POST /video/generate (creates .m3u8 playlist + .ts segments)
GET /video/stream?path=... (serves playlist)

// Tags
GET /image/tags/all
POST /image/tags (add tag to file)
DELETE /image/tags (remove tag from file)
POST /image/tags/batch (bulk tag updates)

// Memories (week-based grouping)
GET /memories?path=...&recursive=true

// AI Insights
POST /insights/generate (non-agentic single-shot)
POST /insights/generate/agentic (tool-calling loop; body: { file_path, backend?, model?, ... })
GET /insights?path=...&library=...
GET /insights/models (local Ollama models + capabilities)
GET /insights/openrouter/models (curated OpenRouter allowlist)
POST /insights/rate (thumbs up/down for training data)

// Insight Chat Continuation
POST /insights/chat (single-turn reply, non-streaming)
POST /insights/chat/stream (SSE: text / tool_call / tool_result / truncated / done)
GET /insights/chat/history?path=... (rendered transcript with tool invocations)
POST /insights/chat/rewind (truncate transcript at a rendered index)
```

**Request Types:**
- `FilesRequest`: Supports complex filtering (tags, EXIF fields, GPS radius, date ranges)
- `SortType`: Shuffle, NameAsc/Desc, TagCountAsc/Desc, DateTakenAsc/Desc

### Important Patterns

**Service Builder Pattern:**
Routes are registered via composable `ServiceBuilder` trait in `service.rs`. Allows modular feature addition.

**Path Validation:**
Always use `is_valid_full_path(&base_path, &requested_path, check_exists)` to prevent directory traversal attacks.

**File Type Detection:**
Centralized in `file_types.rs` with constants `IMAGE_EXTENSIONS` and `VIDEO_EXTENSIONS`. Provides both `Path` and `DirEntry` variants for performance.

**OpenTelemetry Tracing:**
All database operations and HTTP handlers wrapped in spans. In release builds, exports to OTLP endpoint via `OTLP_OTLS_ENDPOINT`. Debug builds use basic logger.

**Memory Exclusion:**
`PathExcluder` in `memories.rs` filters out directories from memories API via `EXCLUDED_DIRS` environment variable (comma-separated paths or substring patterns). The same excluder is applied to face-detection candidates (`face_watch::filter_excluded`) so junk directories like `@eaDir` / `.thumbnails` don't burn detect calls on Apollo.

### Face detection system

ImageApi owns the face data; Apollo (sibling repo) hosts the insightface inference service. Inference is triggered automatically by the file watcher and persisted into two tables:

- `persons(id, name UNIQUE COLLATE NOCASE, cover_face_id, entity_id, created_from_tag, notes, ...)` — operator-managed, name is the user-visible identity.
- `face_detections(id, library_id, content_hash, rel_path, bbox_*, embedding BLOB, confidence, source, person_id, status, model_version, ...)` — keyed on `content_hash` so a photo duplicated across libraries is detected once. Marker rows for `status IN ('no_faces','failed')` carry NULL bbox/embedding (CHECK constraint enforces this).

**Why content_hash and not (library_id, rel_path):** ties face data to the bytes, not the path. A backup mount that copies files from the primary library naturally inherits the existing detections without re-running inference. This is the reference implementation of the multi-library data model — see "Multi-library data model" above.

**File-watch hook** (`src/main.rs::process_new_files`): for each photo with a populated `content_hash`, check `FaceDao::already_scanned(hash)`; if not, send bytes (or embedded JPEG preview for RAW via `exif::extract_embedded_jpeg_preview`) to Apollo's `/api/internal/faces/detect`. K=`FACE_DETECT_CONCURRENCY` (default 8) parallel calls per scan tick; Apollo serializes them via its single-worker GPU pool. `face_watch.rs` is the Tokio orchestration layer.

**Per-tick backlog drain** (also `src/main.rs`): two passes that run on every watcher tick regardless of quick-vs-full scan:
- `backfill_unhashed_backlog` — populates `image_exif.content_hash` for photos that arrived before the hash field was retroactive. Capped by `FACE_HASH_BACKFILL_MAX_PER_TICK` (default 2000); errors don't burn the cap.
- `process_face_backlog` — runs detection on photos that have a hash but no `face_detections` row. Capped by `FACE_BACKLOG_MAX_PER_TICK` (default 64). Selected via a SQL anti-join (`FaceDao::list_unscanned_candidates`); videos and EXCLUDED_DIRS paths filtered out client-side via `face_watch::filter_excluded` so they never reach Apollo.

**Auto-bind on detection:** when a photo carries a tag whose name matches a `persons.name` (case-insensitive), the new face binds automatically iff cosine similarity to the person's existing-face mean is ≥ `FACE_AUTOBIND_MIN_COS` (default 0.4). Persons with no existing faces bind unconditionally and the new face becomes the cover.
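A sketch of the similarity gate, assuming embeddings are plain `f32` vectors of equal length and that the "existing-face mean" is the element-wise average of the person's embeddings. The function names are hypothetical, and the real implementation may normalize embeddings differently.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Element-wise mean of the person's existing embeddings, if any.
pub fn mean_embedding(faces: &[Vec<f32>]) -> Option<Vec<f32>> {
    let first = faces.first()?;
    let mut mean = vec![0.0f32; first.len()];
    for face in faces {
        for (m, v) in mean.iter_mut().zip(face) {
            *m += v / faces.len() as f32;
        }
    }
    Some(mean)
}

/// `min_cos` plays the role of `FACE_AUTOBIND_MIN_COS`.
pub fn should_autobind(candidate: &[f32], existing: &[Vec<f32>], min_cos: f32) -> bool {
    match mean_embedding(existing) {
        // A person with no faces yet binds unconditionally.
        None => true,
        Some(mean) => cosine(candidate, &mean) >= min_cos,
    }
}
```

The tag-name match against `persons.name` is a precondition that happens before this check and is not modeled here.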

**Manual face create** (`POST /image/faces`): crops the image to the user-supplied bbox, applies EXIF orientation via `exif::apply_orientation` (the `image` crate hands raw pre-rotation pixels — without this, manually-drawn bboxes never resolved a face on re-detection), pads to ~50% of bbox dims (RetinaFace anchor scales need ~50% face-fill at det_size=640), then calls Apollo's embed endpoint. A `force` flag lets the operator save a face the detector couldn't see (e.g. profile shots, occluded faces) — the row gets a zero-vector embedding so it's manually-bound only and won't participate in clustering.

**Rerun preserves manual rows** (`POST /image/faces/{id}/rerun`): only `source='auto'` rows are deleted before re-running detection. `already_scanned` returns true on ANY row, so a photo whose only faces are manually drawn never auto-redetects.

**Stats domain — content_hash, not file rows** (`FaceDao::stats` in `src/faces.rs`): `total_photos` counts `DISTINCT content_hash` over `image_exif` (filtered to image extensions, `content_hash IS NOT NULL`), and so do `scanned` / `with_faces` / `no_faces` / `failed` over `face_detections`. Numerator and denominator must live in the same domain — `face_detections` is keyed on content_hash, so the same JPEG present at two rel_paths or in two libraries scans once. Counting `image_exif` rows in the denominator inflated total by one per duplicate file and produced a permanent gap (e.g. 1101/1103 with nothing actually pending). Hash-less rows are excluded from total_photos while they sit in the `backfill_unhashed_backlog` queue; otherwise the bar pins below 100% for the duration of that backfill even though those rows aren't pending detection yet — they're pending hashing.

Module map:
- `src/faces.rs` — `FaceDao` trait + `SqliteFaceDao` impl, route handlers for `/faces/*`, `/image/faces/*`, `/persons/*`. Mirror of `tags.rs` layout.
- `src/face_watch.rs` — Tokio orchestration for the file-watch detect pass; `filter_excluded` (PathExcluder + image-extension filter), `read_image_bytes_for_detect` (RAW preview fallback).
- `src/ai/face_client.rs` — HTTP client for Apollo's inference. Configured by `APOLLO_FACE_API_BASE_URL`, falls back to `APOLLO_API_BASE_URL`. Both unset → feature disabled, file-watch hook is a no-op.
- `migrations/2026-04-29-000000_add_faces/` — schema.

### Startup Sequence

1. Load `.env` file
2. Run embedded Diesel migrations
3. Spawn file watcher thread
4. Create initial thumbnails (parallel scan)
5. Generate video GIF thumbnails
6. Initialize AppState with Actix actors
7. Set up Prometheus metrics (`imageserver_image_total`, `imageserver_video_total`)
8. Scan directory for videos and queue HLS processing
9. Start HTTP server on `BIND_URL` + localhost:8088

## Testing Patterns

Tests require `BASE_PATH` environment variable. Many integration tests create temporary directories and files.

When testing database code:
- Use in-memory SQLite: `DATABASE_URL=":memory:"`
- Run migrations in test setup
- Clean up with `DROP TABLE` or use `#[serial]` from `serial_test` crate if parallel tests conflict

## Common Gotchas

**EXIF Date Parsing:**
Multiple formats supported (EXIF DateTime, ISO8601, Unix timestamp). Fallback chain attempts multiple parsers.

**Video Processing:**
ffmpeg processes run asynchronously via actors. Use `StreamActor` to track completion. HLS segments written to `VIDEO_PATH`.

**File Extensions:**
Extension detection is case-insensitive. Use `file_types.rs` helpers rather than manual string matching.

**Migration Workflow:**
After creating a migration, manually edit the SQL, then regenerate `schema.rs` with `diesel print-schema`. Migrations auto-run on startup via the `embed_migrations!()` macro.

**Path Absolutization:**
Use `path-absolutize` crate's `.absolutize()` method when converting user-provided paths to ensure they're within `BASE_PATH`.

## Required Environment Variables

```bash
DATABASE_URL=./database.db # SQLite database path
BASE_PATH=/path/to/media # Root media directory
THUMBNAILS=/path/to/thumbnails # Thumbnail storage
VIDEO_PATH=/path/to/video/hls # HLS playlist output
GIFS_DIRECTORY=/path/to/gifs # Video GIF thumbnails
BIND_URL=0.0.0.0:8080 # Server binding
CORS_ALLOWED_ORIGINS=http://localhost:3000
SECRET_KEY=your-secret-key-here # JWT signing secret
RUST_LOG=info # Log level
EXCLUDED_DIRS=/private,/archive # Comma-separated paths to exclude from memories
```

Optional:
```bash
WATCH_QUICK_INTERVAL_SECONDS=60 # Quick scan interval
WATCH_FULL_INTERVAL_SECONDS=3600 # Full scan interval
OTLP_OTLS_ENDPOINT=http://... # OpenTelemetry collector (release builds)

# AI Insights Configuration
OLLAMA_PRIMARY_URL=http://desktop:11434 # Primary Ollama server (e.g., desktop)
OLLAMA_FALLBACK_URL=http://server:11434 # Fallback Ollama server (optional, always-on)
OLLAMA_PRIMARY_MODEL=nemotron-3-nano:30b # Model for primary server (default: nemotron-3-nano:30b)
OLLAMA_FALLBACK_MODEL=llama3.2:3b # Model for fallback server (optional, uses primary if not set)
OLLAMA_REQUEST_TIMEOUT_SECONDS=120 # Per-request generation timeout (default 120). Increase for slow CPU-offloaded models.
SMS_API_URL=http://localhost:8000 # SMS message API endpoint (default: localhost:8000)
SMS_API_TOKEN=your-api-token # SMS API authentication token (optional)

# Apollo Places integration (optional). When set, photo-insight enrichment
# folds the user's personal place name (Home, Work, Cabin, ...) into the
# location string fed to the LLM, and the agentic loop gains a
# `get_personal_place_at` tool. Unset = legacy Nominatim-only path.
APOLLO_API_BASE_URL=http://apollo.lan:8000 # Base URL of the sibling Apollo backend

# Face inference (optional). Apollo also hosts the insightface inference
# service; ImageApi calls it from the file-watch hook (Phase 3) and from
# the manual face-create endpoint. Falls back to APOLLO_API_BASE_URL when
# unset (typical single-Apollo deploy). Both unset = feature disabled.
APOLLO_FACE_API_BASE_URL=http://apollo.lan:8000 # Override if face service runs separately
FACE_AUTOBIND_MIN_COS=0.4 # Phase 3: cosine-sim floor for tag-name auto-bind
FACE_DETECT_CONCURRENCY=8 # Phase 3: per-scan-tick parallel detect calls
FACE_DETECT_TIMEOUT_SEC=60 # reqwest client timeout (CPU inference can be slow)

# OpenRouter (Hybrid Backend) - keeps embeddings + vision local, routes chat to OpenRouter
OPENROUTER_API_KEY=sk-or-... # Required to enable hybrid backend
OPENROUTER_DEFAULT_MODEL=anthropic/claude-sonnet-4 # Used when client doesn't pick a model
OPENROUTER_ALLOWED_MODELS=openai/gpt-4o-mini,anthropic/claude-haiku-4-5,google/gemini-2.5-flash
# Curated allowlist exposed to clients via
# GET /insights/openrouter/models. Empty = no picker.
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1 # Override base URL (optional)
OPENROUTER_EMBEDDING_MODEL=openai/text-embedding-3-small # Optional, embeddings stay local today
OPENROUTER_HTTP_REFERER=https://your-site.example # Optional attribution header
OPENROUTER_APP_TITLE=ImageApi # Optional attribution header

# Insight Chat Continuation
AGENTIC_CHAT_MAX_ITERATIONS=6 # Cap on tool-calling iterations per chat turn (default 6)
```

**AI Insights Fallback Behavior:**
- Primary server is tried first with its configured model (5-second connection timeout)
- On connection failure, automatically falls back to secondary server with its model (if configured)
- If `OLLAMA_FALLBACK_MODEL` not set, uses same model as primary server on fallback
- Total request timeout defaults to 120 seconds (`OLLAMA_REQUEST_TIMEOUT_SECONDS`) to accommodate slow LLM inference
- Logs indicate which server and model was used (info level) and failover attempts (warn level)
- Backwards compatible: `OLLAMA_URL` and `OLLAMA_MODEL` still supported as fallbacks
|
|
|
|
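The fallback order above can be sketched as follows. This is a minimal illustration and not the actual `OllamaClient` code: `Server`, `pick_fallback_model`, and `generate_with_fallback` are hypothetical names, the `call` closure stands in for the real HTTP request, and the timeouts are left out.

```rust
/// Hypothetical description of one Ollama server (not the real config type).
#[derive(Debug, Clone, PartialEq)]
pub struct Server {
    pub url: String,
    pub model: String,
}

/// Model for the fallback server: its own model if configured,
/// otherwise the primary server's model.
pub fn pick_fallback_model(primary_model: &str, fallback_model: Option<&str>) -> String {
    fallback_model.unwrap_or(primary_model).to_string()
}

/// Try the primary server first; on error, try the fallback (if any).
/// `call` is a stand-in for the real request and returns `Err` on failure.
/// On success, the server that answered is returned alongside the text.
pub fn generate_with_fallback<F>(
    primary: &Server,
    fallback: Option<&Server>,
    mut call: F,
) -> Result<(String, Server), String>
where
    F: FnMut(&Server) -> Result<String, String>,
{
    match call(primary) {
        Ok(text) => Ok((text, primary.clone())),
        Err(primary_err) => match fallback {
            Some(server) => {
                // Stand-in for the warn-level failover log line.
                eprintln!("primary {} failed ({}); trying fallback", primary.url, primary_err);
                call(server).map(|text| (text, server.clone()))
            }
            None => Err(primary_err),
        },
    }
}
```
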
**Model Discovery:**
The `OllamaClient` provides methods to query available models:
- `OllamaClient::list_models(url)` - Returns list of all models on a server
- `OllamaClient::is_model_available(url, model_name)` - Checks if a specific model exists

This allows runtime verification of model availability before generating insights.

**Hybrid Backend (OpenRouter):**
- Per-request opt-in via `backend=hybrid` on `POST /insights/generate/agentic`.
- Local Ollama still describes the image (vision); the description is inlined
  into the chat prompt and the agentic loop runs on OpenRouter.
- `request.model` (if provided) overrides `OPENROUTER_DEFAULT_MODEL` for that
  call. The mobile picker reads from `OPENROUTER_ALLOWED_MODELS`.
- No live capability precheck: the operator-curated allowlist is trusted.
  A bad model id surfaces as a chat-call error.
- `GET /insights/openrouter/models` returns `{ models, default_model, configured }`
  for client picker UIs.

**Insight Chat Continuation:**

After an agentic insight is generated, the full `Vec<ChatMessage>` transcript is
stored in `photo_insights.training_messages` and can be continued via the
chat endpoints. The `PhotoInsightResponse.has_training_messages` flag tells
clients whether chat is available for a given insight.

- `POST /insights/chat` runs one turn of the agentic loop against the replayed
  history. Body: `{ file_path, library?, user_message, model?, backend?, num_ctx?,
  temperature?, top_p?, top_k?, min_p?, max_iterations?, amend? }`.
- `POST /insights/chat/stream` is the SSE variant: same request body, response
  is `text/event-stream` with events: `iteration_start`, `text` (delta), `tool_call`,
  `tool_result`, `truncated`, `done`, plus a server-emitted `error_message` on
  failure. Preferred by the mobile client for live tool-chip updates.
- `GET /insights/chat/history?path=...&library=...` returns the rendered
  transcript. Each assistant message carries a `tools: [{name, arguments, result,
  result_truncated?}]` array with the tool invocations that led up to it. Tool
  results over 2000 chars are truncated with `result_truncated: true`.
- `POST /insights/chat/rewind` truncates the transcript at a given rendered
  index (drops that message + any tool-call scaffolding that preceded it + all
  later turns). Index 0 is protected. Used for "try again from here" flows.

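The 2000-character truncation applied to tool results in the history view might look like the sketch below. `RenderedTool` and `render_tool` are hypothetical names that mirror the `tools` array fields. The sketch assumes "chars" means Unicode scalar values and that `arguments` is carried as a string; the real code may count bytes or use a JSON value instead.

```rust
/// Limit taken from the history endpoint description above.
pub const MAX_TOOL_RESULT_CHARS: usize = 2000;

/// Hypothetical rendered view of one tool invocation.
#[derive(Debug, Clone, PartialEq)]
pub struct RenderedTool {
    pub name: String,
    pub arguments: String,
    pub result: String,
    pub result_truncated: bool,
}

/// Build the rendered entry, cutting the result at the limit and
/// flagging the cut so clients can show a "truncated" hint.
pub fn render_tool(name: &str, arguments: &str, raw_result: &str) -> RenderedTool {
    // Assumption: the limit counts chars (Unicode scalar values), not bytes.
    let truncated = raw_result.chars().count() > MAX_TOOL_RESULT_CHARS;
    let result: String = if truncated {
        raw_result.chars().take(MAX_TOOL_RESULT_CHARS).collect()
    } else {
        raw_result.to_string()
    };
    RenderedTool {
        name: name.to_string(),
        arguments: arguments.to_string(),
        result,
        result_truncated: truncated,
    }
}
```
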
Backend routing rules (matches agentic-insight generation):
- Stored `backend` on the insight row is authoritative by default.
- `request.backend` may override per-turn. `local -> hybrid` is rejected in
  v1 (would require on-the-fly visual-description rewrite); `hybrid -> local`
  replays verbatim since the description is already inlined as text.
- `request.model` overrides the chat model (an Ollama id in local mode, an
  OpenRouter id in hybrid mode).

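As a rough sketch of the backend routing rules, the per-turn decision reduces to a small match. The `Backend` enum and `resolve_backend` function are illustrative names and the error type is a plain string; the real handler may model this differently.

```rust
/// Illustrative backend selector (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Local,
    Hybrid,
}

/// Decide which backend serves this chat turn.
/// `stored` is the backend recorded on the insight row,
/// `requested` is the optional per-turn override from the request body.
pub fn resolve_backend(stored: Backend, requested: Option<Backend>) -> Result<Backend, String> {
    match (stored, requested) {
        // No override: the stored backend is authoritative.
        (_, None) => Ok(stored),
        // Rejected in v1: the transcript would need a visual-description rewrite.
        (Backend::Local, Some(Backend::Hybrid)) => {
            Err("switching local -> hybrid is not supported in v1".to_string())
        }
        // Same backend, or hybrid -> local (description is already inlined text).
        (_, Some(backend)) => Ok(backend),
    }
}
```
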
Persistence:
- Append mode (default): re-serialize the full history and `UPDATE` the same
  row's `training_messages`.
- Amend mode (`amend: true`): regenerate the title, insert a new insight row
  via `store_insight` (auto-flips prior rows' `is_current=false`). Response
  surfaces the new row's id as `amended_insight_id`.

Per-`(library_id, file_path)` async mutex (`AppState.insight_chat.chat_locks`)
serialises concurrent turns on the same insight so the JSON blob doesn't race.

Context management is a soft bound: if the serialized history exceeds
`num_ctx - 2048` tokens (cheap 4-byte/token heuristic), the oldest
assistant-tool_call + tool_result pairs are dropped until under budget. The
initial user message (with any images) and system prompt are always preserved.
The `truncated` event / flag is surfaced to the client when a drop occurred.

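A minimal sketch of that trimming pass, under stated assumptions: `Msg`, `Role`, and `trim_history` are hypothetical stand-ins for the real `ChatMessage` types, the size estimate sums message content bytes rather than measuring the serialized JSON, and a tool-call pair is taken to be an assistant message immediately followed by its tool result.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    System,
    User,
    Assistant,
    Tool,
}

/// Hypothetical, simplified chat message.
#[derive(Debug, Clone, PartialEq)]
pub struct Msg {
    pub role: Role,
    pub content: String,
    /// True for assistant messages that only carry tool calls.
    pub has_tool_calls: bool,
}

/// Cheap estimate: roughly 4 bytes per token.
fn estimate_tokens(history: &[Msg]) -> usize {
    history.iter().map(|m| m.content.len()).sum::<usize>() / 4
}

/// Drop the oldest (assistant tool_call, tool result) pairs until the
/// estimate fits in `num_ctx - 2048`. Returns true if anything was dropped.
/// System and user messages are never removed, so the system prompt and
/// the initial user message always survive.
pub fn trim_history(history: &mut Vec<Msg>, num_ctx: usize) -> bool {
    let budget = num_ctx.saturating_sub(2048);
    let mut truncated = false;
    while estimate_tokens(history.as_slice()) > budget {
        let oldest_pair = history.windows(2).position(|w| {
            w[0].role == Role::Assistant && w[0].has_tool_calls && w[1].role == Role::Tool
        });
        match oldest_pair {
            Some(i) => {
                history.drain(i..i + 2);
                truncated = true;
            }
            // Soft bound: nothing left to drop, so send the history as-is.
            None => break,
        }
    }
    truncated
}
```
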
Configurable env:
- `AGENTIC_CHAT_MAX_ITERATIONS`: cap on tool-calling iterations per turn
  (default 6). Per-request `max_iterations` is clamped to this cap.

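The clamping rule can be sketched as below. `parse_cap` and `effective_iterations` are hypothetical helper names, and two behaviours are assumptions rather than documented facts: an unparsable env value falls back to the default, and a request without `max_iterations` runs with the full cap. A caller would pass something like `std::env::var("AGENTIC_CHAT_MAX_ITERATIONS").ok().as_deref()` to `parse_cap`.

```rust
/// Default from the env documentation above.
pub const DEFAULT_MAX_ITERATIONS: u32 = 6;

/// Parse the raw env value, if any.
/// Assumption: missing or invalid values fall back to the default.
pub fn parse_cap(raw: Option<&str>) -> u32 {
    raw.and_then(|s| s.trim().parse::<u32>().ok())
        .unwrap_or(DEFAULT_MAX_ITERATIONS)
}

/// Clamp the per-request value to the configured cap.
/// Assumption: an absent request value means "use the cap".
pub fn effective_iterations(requested: Option<u32>, cap: u32) -> u32 {
    requested.map_or(cap, |n| n.min(cap))
}
```
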
**Apollo Places integration (optional):**

The sibling Apollo project (personal location-history viewer) owns
user-defined Places: `name + lat/lon + radius_m + description (+ optional
category)`. When `APOLLO_API_BASE_URL` is set, ImageApi queries
`/api/places/contains?lat=&lon=` to enrich the LLM prompt's location
string. See `src/ai/apollo_client.rs` and `src/ai/insight_generator.rs`:

- **Auto-enrichment** (always on when configured): the per-photo location
  resolver folds the most-specific containing Place ("Home — near
  Cambridge, MA" or "Home (My house in Cambridge) — near Cambridge, MA"
  when a description is set) into the location field of `combine_contexts`.
  Smallest-radius wins: Apollo sorts server-side, this code takes `[0]`.
- **Agentic tool** `get_personal_place_at(latitude, longitude)`: registered
  alongside `reverse_geocode` only when `apollo_enabled()` returns true.
  Returns "- Name [category]: description (radius N m)" lines, smallest
  radius first. The tool is **deliberately narrow**: there is no enumerate-all
  variant; auto-enrichment covers the photo-context path and the agentic
  tool covers ad-hoc lat/lon questions in chat continuation.

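The auto-enrichment label format can be illustrated with a short sketch. `Place` and `location_label` are hypothetical names, not the types in `src/ai/apollo_client.rs`, and treating an empty description like a missing one is an assumption. The input slice is taken to be already sorted smallest-radius first, as Apollo returns it.

```rust
/// Hypothetical subset of a Place as returned by Apollo.
#[derive(Debug, Clone, PartialEq)]
pub struct Place {
    pub name: String,
    pub description: Option<String>,
    pub radius_m: f64,
}

/// Fold the most specific containing Place into the location string.
/// `places` is assumed to be sorted smallest-radius first by the server,
/// so the first entry wins. With no Place, the Nominatim string is used as-is.
pub fn location_label(places: &[Place], nominatim: &str) -> String {
    match places.first() {
        Some(place) => match place.description.as_deref() {
            // Assumption: an empty description is treated like a missing one.
            Some(desc) if !desc.is_empty() => {
                format!("{} ({}) — near {}", place.name, desc, nominatim)
            }
            _ => format!("{} — near {}", place.name, nominatim),
        },
        None => nominatim.to_string(),
    }
}
```
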
Failure modes degrade silently to the legacy Nominatim path: 5 s timeout,
errors logged at `warn`, empty results returned. Apollo's routes are
unauthenticated (single-user, LAN-trust); add JWT auth here + on Apollo's
side if exposing beyond a trusted network.

## Dependencies of Note

### Rust crates

- **actix-web**: HTTP framework
- **diesel**: ORM for SQLite
- **jsonwebtoken**: JWT implementation
- **kamadak-exif**: EXIF parsing
- **image**: Thumbnail generation
- **walkdir**: Directory traversal
- **rayon**: Parallel processing
- **opentelemetry**: Distributed tracing
- **bcrypt**: Password hashing
- **infer**: Magic number file type detection

### External binaries (must be on `PATH`)

- **`ffmpeg`**: video thumbnail extraction (`StreamActor`, HLS pipeline) and
  the HEIF/HEIC/NEF/ARW thumbnail fallback in `generate_image_thumbnail_ffmpeg`.
  Required for any deploy that holds video or HEIF files.
- **`exiftool`**: optional but strongly recommended for RAW-heavy libraries.
  The thumbnail pipeline shells out to it as the slow-path fallback for
  embedded preview extraction (Nikon MakerNote `PreviewIFD`, Canon SubIFDs,
  etc.; anything kamadak-exif's IFD0/IFD1 readers can't reach). Without
  exiftool installed, RAWs whose preview lives outside IFD0/IFD1 will fall
  through to ffmpeg, which often produces black thumbnails. Install via
  package manager: `apt install libimage-exiftool-perl`,
  `brew install exiftool`, `winget install OliverBetz.ExifTool`, or
  `choco install exiftool`.