Compare commits
1 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | a48744c7ad | | |
@@ -1,3 +0,0 @@
|
||||
[target.x86_64-unknown-linux-gnu]
|
||||
linker = "/usr/bin/gcc"
|
||||
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
|
||||
@@ -2,12 +2,8 @@
|
||||
database/target
|
||||
*.db
|
||||
*.db.bak
|
||||
*.db-shm
|
||||
*.db-wal
|
||||
.env
|
||||
/tmp
|
||||
/docs
|
||||
/specs
|
||||
|
||||
# Default ignored files
|
||||
.idea/shelf/
|
||||
|
||||
@@ -76,10 +76,7 @@ cargo run --bin cleanup_files -- --base-path /path/to/media --database-url ./dat
|
||||
### Core Components
|
||||
|
||||
**Layered Architecture:**
|
||||
- **Startup wiring** (`main.rs`): only ~350 lines — env load, migrations, AppState, route registration, server bind. Background jobs are kicked off here but defined elsewhere.
|
||||
- **HTTP Layer** (`handlers/{image,video,favorites}.rs`, `files.rs`, `tags.rs`, `faces.rs`, `memories.rs`, `ai/handlers.rs`): the route handlers, grouped by domain.
|
||||
- **Background loops** (`watcher.rs`): the file-watcher tick (`watch_files`, `process_new_files`) and the orphaned-playlist cleanup (`cleanup_orphaned_playlists`). Per-tick drains are factored into `backfill.rs` (`backfill_unhashed_backlog`, `backfill_missing_date_taken`, `backfill_missing_content_hashes`, `process_face_backlog`, `build_face_candidates`).
|
||||
- **Thumbnails** (`thumbnails.rs`): generation pipeline + the `IMAGE_GAUGE` / `VIDEO_GAUGE` Prometheus metrics.
|
||||
- **HTTP Layer** (`main.rs`): Route handlers for images, videos, metadata, tags, favorites, memories
|
||||
- **Auth Layer** (`auth.rs`): JWT token validation, Claims extraction via FromRequest trait
|
||||
- **Service Layer** (`files.rs`, `exif.rs`, `memories.rs`): Business logic for file operations and EXIF extraction
|
||||
- **DAO Layer** (`database/mod.rs`): Trait-based data access (ExifDao, UserDao, FavoriteDao, TagDao)
|
||||
@@ -107,242 +104,6 @@ All database access goes through trait-based DAOs (e.g., `ExifDao`, `SqliteExifD
|
||||
- `query_by_exif()`: Complex filtering by camera, GPS bounds, date ranges
|
||||
- Batch operations minimize DB hits during file watching
|
||||
|
||||
### Multi-library data model
|
||||
|
||||
ImageApi supports more than one library (a library = a `(name, root_path)`
|
||||
row in the `libraries` table that maps to a mounted directory tree). The
|
||||
same bytes may exist under more than one library — typical case is an
|
||||
"active" library plus an "archive" library that ingests files as they age
|
||||
out — and the data model is designed so that derived data follows the
|
||||
**bytes**, not the path, and user-managed data does the same.
|
||||
|
||||
**The principle.** A photo's identity is its `content_hash` (blake3, see
|
||||
`src/content_hash.rs`). Anything we compute from or attach to a photo is
|
||||
keyed on that hash so it survives:
|
||||
- the same file appearing in a second library (backup / archive / mirror),
|
||||
- the file moving between libraries (recent → archive handoff),
|
||||
- the file moving within a library (re-organized rel_path),
|
||||
- intra-library duplicates (same bytes at two paths).
|
||||
|
||||
**Table classification.** Three categories drive the keying decision:
|
||||
|
||||
| Category | Key | Rationale | Tables |
|
||||
|---|---|---|---|
|
||||
| Intrinsic to bytes | `content_hash` | Rerunning is wasted work (or LLM cost) | `face_detections` ✓, `image_exif` (target), `photo_insights` (target), `video_preview_clips` (target) |
|
||||
| User intent about a photo | `content_hash` | "Tag this photo" means the bytes, not a path | `tagged_photo` (target), `favorites` (target) |
|
||||
| Library administrative | `(library_id, rel_path)` | Tied to a specific filesystem location | `libraries`, `entity_photo_links`, the `rel_path` back-ref columns on hash-keyed tables |
|
||||
|
||||
✓ = already implemented this way. *(target)* = today still keyed on
|
||||
`(library_id, rel_path)` and slated for migration. The migration adds a
|
||||
nullable `content_hash` column, populates it from `image_exif` where
|
||||
known, and read paths fall back to rel_path while the hash is null.
|
||||
|
||||
**Carrying a `rel_path` even when hash-keyed.** Hash-keyed tables retain
|
||||
`(library_id, rel_path)` columns as a denormalized **back-reference**, not
|
||||
as the key. This lets a single query answer "what is at this path right
|
||||
now" without joining through `image_exif`, and supports the path-only
|
||||
endpoints that predate the hash. `face_detections` is the reference
|
||||
implementation: hash is the truth, path is a hint.
|
||||
|
||||
**Merge semantics on read.** When the same hash has rows under more than
|
||||
one library:
|
||||
- Set-valued data (tags, favorites, faces, entity links) → **union**.
|
||||
- Scalar data (current insight, EXIF row, video preview clip) → earliest
|
||||
`generated_at` / `created_time` wins. The historical lib1 row beats a
|
||||
re-generated lib2 row, so the user's curated insight isn't shadowed by
|
||||
a re-run on archive ingest.
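The two merge rules above can be sketched in a few lines. This is a minimal illustration rather than the project's code: `InsightRow`, `merge_tags`, and `pick_insight` are hypothetical names, and timestamps are assumed to be Unix seconds.

```rust
use std::collections::BTreeSet;

/// Hypothetical scalar row: one insight for a given content hash in one library.
#[derive(Debug, Clone, PartialEq)]
pub struct InsightRow {
    pub library_id: i32,
    pub generated_at: i64, // assumed: Unix seconds
    pub text: String,
}

/// Set-valued data: the union of the tags attached under every library.
pub fn merge_tags(per_library: &[Vec<String>]) -> BTreeSet<String> {
    per_library.iter().flatten().cloned().collect()
}

/// Scalar data: the row with the earliest `generated_at` wins.
pub fn pick_insight(rows: &[InsightRow]) -> Option<&InsightRow> {
    rows.iter().min_by_key(|r| r.generated_at)
}
```

With one row per library for the same hash, `pick_insight` returns the older row, so a regenerated archive-side insight does not shadow the curated one.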
|
||||
|
||||
**Write attribution.** A new tag/favorite/insight created while viewing
|
||||
under lib2 binds to the bytes, not to lib2 — so it shows up under lib1
|
||||
too. This is by design, but it's the most surprising rule on first
|
||||
encounter; clients should not assume tags are library-scoped.
|
||||
|
||||
**Hash-less rows (transitional state).** During and immediately after a
|
||||
new mount, `image_exif.content_hash` is being populated by
|
||||
`backfill_unhashed_backlog` (capped per tick). Rules during this window:
|
||||
- Writes: if the hash is known, write hash-keyed. If not, write
|
||||
`(library_id, rel_path)`-keyed and let the reconciliation job collapse
|
||||
duplicates once the hash lands.
|
||||
- Reads: prefer hash key, fall back to `(library_id, rel_path)`.
|
||||
- Reconciliation: a one-shot pass after every backfill tick collapses
|
||||
rows that now share a hash, applying the merge semantics above.
|
||||
Idempotent — safe to re-run.
|
||||
|
||||
**Library handoff (recent → archive).** When a file moves between
|
||||
libraries (e.g. operator moves `~/photos/2024/IMG.nef` to the archive
|
||||
mount), the file watcher sees the disappearance under lib1 and the
|
||||
appearance under lib2. Hash-keyed rows don't need migration; the
|
||||
`(library_id, rel_path)` back-ref columns are updated to point to the new
|
||||
location. Library administrative rows (`entity_photo_links`,
|
||||
`(library_id, rel_path)` rows in `image_exif` for hash-less items) are
|
||||
re-keyed by the move detector, which matches a disappearance to an
|
||||
appearance by `content_hash` within a configurable window.
|
||||
|
||||
**Orphans (source deleted while a copy survives).** When an
|
||||
`image_exif` row for a hash is deleted (file removed from disk), the
|
||||
hash-keyed derived rows survive **as long as another `image_exif` row
|
||||
references the same hash**. If the last reference is gone, derived rows
|
||||
are eligible for GC (deferred — the GC job runs on a slow schedule so
|
||||
that a brief unmount or rename doesn't wipe history).
|
||||
|
||||
**Stats and counts.** When reporting "how many photos do you have," count
|
||||
`DISTINCT content_hash` over `image_exif`, not row count. Faces stats
|
||||
already does this (`FaceDao::stats` in `src/faces.rs`); other counters
|
||||
should follow suit. Numerator and denominator must live in the same
|
||||
domain — see the face-stats commentary below for the cautionary tale.
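As a rough illustration of the counting rule, the sketch below counts photos by distinct hash over an in-memory stand-in for `image_exif`. `ExifRow` and `count_distinct_photos` are hypothetical; the real counters run in SQL.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory stand-in for an `image_exif` row.
pub struct ExifRow {
    pub library_id: i32,
    pub rel_path: String,
    pub content_hash: Option<String>, // None while the hash backfill is pending
}

/// Count photos by distinct content hash. Hash-less rows are skipped,
/// mirroring how SQL `COUNT(DISTINCT ...)` ignores NULLs.
pub fn count_distinct_photos(rows: &[ExifRow]) -> usize {
    rows.iter()
        .filter_map(|r| r.content_hash.as_deref())
        .collect::<HashSet<&str>>()
        .len()
}
```

The same bytes under two libraries, or at two paths in one library, contribute one photo to the total.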
|
||||
|
||||
**Per-library scoping when the user asks for it.** A request scoped to
|
||||
`?library=N` filters the `image_exif` view to that library, and the
|
||||
hash-keyed derived data is joined through that view. The user sees only
|
||||
photos that have a copy under lib N, but the derived data attached to
|
||||
those photos is the merged hash-keyed view. This is the answer to "show
|
||||
me archive photos with their original tags."
|
||||
|
||||
**Operator kill switch (`libraries.enabled`).** Setting `enabled=0` on a
|
||||
library is a hard pause: the watcher skips it entirely — before the
|
||||
probe, before ingest, before any maintenance pass — and the orphan-GC
|
||||
all-online consensus check filters disabled libraries out (they don't
|
||||
keep the GC window closed). Reads / serving are unaffected; nothing
|
||||
prevents `/image?path=...` from resolving against a disabled library's
|
||||
root if the file is on disk. The existing `image_exif` rows for a
|
||||
disabled library are **not deleted** — they continue to anchor
|
||||
hash-keyed derived data, so cross-library duplicates survive the
|
||||
disable. Toggle via SQL; there is intentionally no HTTP endpoint for
|
||||
library mutation (single-user tool, no role / permission story).
|
||||
Typical workflows: stage a new mount with `enabled=0` then flip to `1`;
|
||||
quiet a flaky NAS during maintenance without disturbing the rest of
|
||||
the system.
|
||||
|
||||
**Per-library excludes (`libraries.excluded_dirs`).** A
|
||||
comma-separated column, same shape as the global `EXCLUDED_DIRS` env
|
||||
var, that's applied **in union** with the env-var globals when a
|
||||
walker scans this library. Use case: mount a parent directory as a
|
||||
new library while a sibling library covers a child subtree, and
|
||||
exclude that child subtree from the parent so the two libraries
|
||||
don't double-walk and double-write `image_exif`. Two entry forms
|
||||
(parsed by `memories::PathExcluder`):
|
||||
- `/sub/path` — leading slash flags it as a path under the library
|
||||
root. Joins to root + matches by `path.starts_with(...)`. Works
|
||||
at any depth (`/photos`, `/media/2024/raw`).
|
||||
- `name` — no leading slash flags it as a component name to skip
|
||||
anywhere in the tree (`@eaDir`, `.thumbnails`). Single segment
|
||||
only — `media/photos/a` without a leading slash never matches
|
||||
anything. Hash-keyed derived
|
||||
data (faces, tags, insights) is unaffected either way — those
|
||||
follow the bytes — but `image_exif` row count, walker CPU, and
|
||||
thumbnail disk usage all drop to 1× instead of 2× for the overlap.
|
||||
Affects: file-watch ingest (`process_new_files`), thumbnail
|
||||
generation, media-count gauges, the orphaned-playlist cleanup walk,
|
||||
and the `/memories` endpoint. The face-detection backlog drain
|
||||
inherits via `face_watch::filter_excluded`. NULL = no extras (only
|
||||
the global env var applies).
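The two entry forms could be parsed and matched roughly as follows. This is a sketch under stated assumptions, not the actual `memories::PathExcluder`: the `Excluder` type and its methods are hypothetical, and only the behavior described above is modeled (leading slash means a prefix under the library root; no slash means a single component name).

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Hypothetical excluder modeling the two entry forms described above.
pub struct Excluder {
    prefixes: Vec<PathBuf>, // `/sub/path` entries, joined onto the library root
    names: Vec<String>,     // bare `name` entries, matched against single components
}

impl Excluder {
    /// Parse a comma-separated list such as "/media/2024/raw,@eaDir".
    pub fn parse(root: &Path, spec: &str) -> Self {
        let mut prefixes = Vec::new();
        let mut names = Vec::new();
        for entry in spec.split(',').map(str::trim).filter(|e| !e.is_empty()) {
            if entry.starts_with('/') {
                // Strip the leading slash so `join` appends instead of replacing the root.
                prefixes.push(root.join(entry.trim_start_matches('/')));
            } else {
                names.push(entry.to_string());
            }
        }
        Excluder { prefixes, names }
    }

    /// True if `path` sits under an excluded prefix or contains an excluded component.
    pub fn is_excluded(&self, path: &Path) -> bool {
        self.prefixes.iter().any(|p| path.starts_with(p))
            || path
                .components()
                .any(|c| self.names.iter().any(|n| c.as_os_str() == OsStr::new(n)))
    }
}
```

Because bare names are compared against one path component at a time, a multi-segment entry without a leading slash can never match, which is the behavior the text describes. `Path::starts_with` compares whole components, so `/media/2024/raw` does not exclude `/media/2024/rawfiles`.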
|
||||
|
||||
**Library availability and safety.** Libraries can be on network shares
|
||||
or removable media; the file watcher must not interpret a temporary
|
||||
unavailability as a mass-deletion event. Every tick begins with a
|
||||
**presence probe** per library: the library is considered online iff
|
||||
its `root_path` exists, is readable, and a top-level scan returns at
|
||||
least one expected entry (or matches a recent file-count high-water
|
||||
mark within a tolerance). The probe result gates which actions are safe
|
||||
to run on that library this tick:
|
||||
|
||||
| Action | Requires online? |
|
||||
|---|---|
|
||||
| Quick / full scan ingest of new files | yes |
|
||||
| EXIF / face / insight backlog drains | yes — but the work runs against any online library |
|
||||
| Move-handoff detection (lib1 disappearance ↔ lib2 appearance match) | **both** libraries online |
|
||||
| `(library_id, rel_path)` re-keying on detected move | **both** libraries online |
|
||||
| Orphan GC of hash-keyed derived data | all libraries that have *ever* held the hash must be online and confirmed-clean for two consecutive ticks |
|
||||
| Reads / serving | always allowed; falls back to whichever library is online |
|
||||
|
||||
A library that fails the probe enters a "stale" state: writes scoped to
|
||||
it are paused, its rows are flagged stale (not deleted) in
|
||||
`/libraries` status, and the watcher logs at `warn` once per
|
||||
state-transition (not per tick). A library that recovers re-enters the
|
||||
online set automatically; no operator action required for transient
|
||||
outages. The intent is that pulling a USB drive, rebooting a NAS, or
|
||||
losing a VPN never triggers a destructive code path — the worst case is
|
||||
that derived-data work pauses until the share returns.
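A simplified version of the presence probe might look like the sketch below. It checks only that the root is a readable directory with at least one entry; the high-water-mark tolerance mentioned above is left out, and `Presence` and `probe` are hypothetical names.

```rust
use std::fs;
use std::path::Path;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Presence {
    Online,
    Stale,
}

/// Simplified probe: the root must be a readable directory with at least one entry.
/// `read_dir` fails if the path is missing, not a directory, or unreadable.
pub fn probe(root: &Path) -> Presence {
    match fs::read_dir(root) {
        Ok(mut entries) => match entries.next() {
            Some(Ok(_)) => Presence::Online,
            _ => Presence::Stale, // empty mount point or unreadable first entry
        },
        Err(_) => Presence::Stale,
    }
}

/// True only when the state changed, so a caller can log once per transition.
pub fn transitioned(prev: Presence, now: Presence) -> bool {
    prev != now
}
```

An empty mount point (for example, a directory left behind after a share is unmounted) reports `Stale` here, which is the case the "at least one expected entry" rule is meant to catch.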
|
||||
|
||||
The same rule constrains the move-handoff matcher: a disappearance
|
||||
under lib1 only counts as a "move" if there is a matching appearance
|
||||
under another **online** library within the window. A bare
|
||||
disappearance with no matching appearance is treated as
|
||||
"unavailable-or-deleted, defer judgment" — it does not re-key any rows
|
||||
and does not enqueue GC.
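The matching rule can be illustrated with a small in-memory matcher. Everything here is hypothetical (`Event`, `match_moves`, timestamps in Unix seconds); it only shows the shape of the rule: same hash, different library, both libraries online, inside the window.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical watcher observation: a hash appeared or disappeared in a library.
pub struct Event {
    pub library_id: i32,
    pub content_hash: String,
    pub at: i64, // assumed: Unix seconds
}

/// Pair each disappearance with an appearance of the same hash in another
/// online library within `window_secs`. Returns (from_lib, to_lib, hash).
pub fn match_moves(
    gone: &[Event],
    seen: &[Event],
    online: &HashSet<i32>,
    window_secs: i64,
) -> Vec<(i32, i32, String)> {
    let mut by_hash: HashMap<&str, Vec<&Event>> = HashMap::new();
    for e in seen {
        by_hash.entry(e.content_hash.as_str()).or_default().push(e);
    }
    let mut moves = Vec::new();
    for g in gone {
        if !online.contains(&g.library_id) {
            continue; // source library is stale: defer judgment
        }
        let hit = by_hash.get(g.content_hash.as_str()).and_then(|candidates| {
            candidates.iter().find(|a| {
                a.library_id != g.library_id
                    && online.contains(&a.library_id)
                    && (a.at - g.at).abs() <= window_secs
            })
        });
        if let Some(a) = hit {
            moves.push((g.library_id, a.library_id, g.content_hash.clone()));
        }
    }
    moves
}
```

A disappearance with no match produces no entry, so nothing is re-keyed and nothing is queued for GC.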
|
||||
|
||||
**Maintenance pipeline (`src/library_maintenance.rs`).** The watcher
|
||||
runs three maintenance passes per tick that together implement the
|
||||
move/handoff and orphan rules:
|
||||
|
||||
1. **Missing-file scan** — per online library, paginated. A page of
|
||||
`image_exif` rows is loaded (`IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE`,
|
||||
default 500), each row's `(root_path/rel_path)` is `stat()`-ed,
|
||||
and confirmed-not-found rows are deleted from `image_exif`
|
||||
(capped at `IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK`, default 200).
|
||||
Permission/IO errors are skipped, never deleted — only `NotFound`
|
||||
triggers a deletion. The cursor wraps every time a partial page
|
||||
comes back, so the whole library is swept across consecutive ticks.
|
||||
Skipped wholesale for Stale libraries via the per-library probe
|
||||
gate at the top of the loop iteration.
|
||||
|
||||
2. **Back-ref refresh** — DB-only. For `face_detections`,
|
||||
`tagged_photo`, and `photo_insights`: any hash-keyed row whose
|
||||
`(library_id, rel_path)` no longer matches an `image_exif` row
|
||||
*but whose `content_hash` does* is repointed at the surviving
|
||||
`image_exif` location. Idempotent SQL; no health gate needed.
|
||||
This is what makes the recent → archive handoff invisible to
|
||||
read paths: when the missing-file scan retires the lib-A row,
|
||||
tags/faces/insights pivot to lib-B's path before any user
|
||||
notices.
|
||||
|
||||
3. **Orphan GC** — destructive. Hash-keyed derived rows whose
|
||||
`content_hash` no longer has any `image_exif` row are eligible.
|
||||
Two-tick consensus: a hash must be observed orphaned on two
|
||||
consecutive ticks AND every library must be online for both. A
|
||||
single Stale tick within the window cancels all pending deletes.
|
||||
The pending set is held in memory (`OrphanGcState`) — restart
|
||||
resets it, which only delays a delete, never causes one. Tags,
|
||||
faces, and insights for orphaned hashes are deleted in one batch
|
||||
per tick.
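The two-tick consensus in pass 3 can be sketched as a small state machine. The text names `OrphanGcState`; the `PendingOrphans` type below is a hypothetical stand-in written for illustration and makes no claim about how the real state is structured.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory pending set, reset on restart.
#[derive(Default)]
pub struct PendingOrphans {
    seen_last_tick: HashSet<String>,
}

impl PendingOrphans {
    /// Feed one tick's observation; returns the hashes that are now safe to delete.
    pub fn tick(&mut self, orphaned_now: HashSet<String>, all_online: bool) -> Vec<String> {
        if !all_online {
            // A single stale tick cancels every pending delete.
            self.seen_last_tick.clear();
            return Vec::new();
        }
        // Orphaned on this tick AND on the previous one: consensus reached.
        let mut confirmed: Vec<String> = orphaned_now
            .intersection(&self.seen_last_tick)
            .cloned()
            .collect();
        confirmed.sort();
        // Everything else waits for the next tick.
        self.seen_last_tick = orphaned_now
            .into_iter()
            .filter(|h| !confirmed.contains(h))
            .collect();
        confirmed
    }
}
```

Losing this state on restart only empties `seen_last_tick`, so the worst outcome is one extra tick before a delete, never a premature one.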
|
||||
|
||||
A backup library that briefly disappears, then returns within two
|
||||
ticks, never loses any derived data. A move from lib-A to lib-B
|
||||
without disappearance flips through pass 1 (lib-A row retired) and
|
||||
pass 2 (back-refs follow), with pass 3 noting nothing because the
|
||||
hash is still present in `image_exif` (lib-B's row).
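Pass 1's rule that only `NotFound` triggers a deletion maps directly onto `std::io::ErrorKind`. The sketch below is illustrative; `ScanVerdict` and `check_row` are hypothetical names.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
pub enum ScanVerdict {
    Present,
    Missing, // confirmed not found: eligible for deletion
    Skipped, // permission or IO error: never delete on these
}

/// Stat one `image_exif` row's file and classify the result.
pub fn check_row(root: &Path, rel_path: &str) -> ScanVerdict {
    match fs::metadata(root.join(rel_path)) {
        Ok(_) => ScanVerdict::Present,
        Err(e) if e.kind() == ErrorKind::NotFound => ScanVerdict::Missing,
        Err(_) => ScanVerdict::Skipped,
    }
}
```

Treating every non-`NotFound` error as `Skipped` keeps a flaky share or a permissions hiccup from being read as a deletion.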
|
||||
|
||||
**Known gap: in-place content changes (future Branch D).** The
|
||||
maintenance pipeline assumes a `(library_id, rel_path)`'s bytes are
|
||||
stable for as long as the file exists at that path. If a user edits
|
||||
a file in place (crop, re-export) without renaming, the watcher's
|
||||
quick scan walks the file (mtime is recent) but `process_new_files`
|
||||
short-circuits because `(library_id, rel_path)` already has an
|
||||
`image_exif` row — no re-hash, no re-EXIF, no face redetection. The
|
||||
row's `content_hash` keeps pointing at the original bytes. Tags /
|
||||
faces / insights stay attached to the original hash and continue to
|
||||
display because the rel_path back-ref still resolves; new faces
|
||||
introduced by the edit are never detected.
|
||||
|
||||
The right place to fix this is a **stale-content detection pass**
|
||||
that compares `image_exif.last_modified` / `size_bytes` to
|
||||
`fs::metadata` for rows the quick scan would otherwise skip. On
|
||||
mismatch, recompute the hash, update `image_exif`, and apply the
|
||||
"content branched" semantics:
|
||||
- **Faces** re-run (faces are fully derived from bytes).
|
||||
- **Tags** migrate to the new hash (user intent — "this photo is
|
||||
vacation" survives a crop). Insights migrate forward as a
|
||||
starting point and are flagged for re-generation.
|
||||
- **Favorites** (when migrated to hash-keyed) follow the path /
|
||||
user intent.
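The comparison step of such a stale-content pass might look like the sketch below. It assumes `last_modified` is stored as Unix seconds and `size_bytes` as a byte count; `StoredStat` and `content_may_have_changed` are hypothetical names, and the re-hash and migration steps are not shown.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Hypothetical snapshot of what `image_exif` recorded at ingest.
pub struct StoredStat {
    pub last_modified: i64, // assumed: Unix seconds
    pub size_bytes: u64,
}

/// Cheap pre-check: true if size or mtime differs from the stored values,
/// which means the file should be re-hashed to confirm an in-place edit.
pub fn content_may_have_changed(stored: &StoredStat, path: &Path) -> io::Result<bool> {
    let meta = fs::metadata(path)?;
    let mtime = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs() as i64)
        .unwrap_or(0); // pre-epoch mtime: treat as 0
    Ok(meta.len() != stored.size_bytes || mtime != stored.last_modified)
}
```

A mismatch is only a hint that the bytes changed. The hash comparison that follows is what decides whether the content actually branched.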
|
||||
|
||||
The interesting case is the operator who keeps an unedited copy in
|
||||
the archive library and edits the local copy: post-detection, the
|
||||
archive copy stays on the original hash, the local copy branches to
|
||||
the new hash, and the two histories cleanly split. Apollo's
|
||||
`derived.db` cache will need an invalidation hook for the changed
|
||||
hash — design it alongside Branch D.
|
||||
|
||||
### File Processing Pipeline
|
||||
|
||||
**Thumbnail Generation:**
|
||||
@@ -367,60 +128,6 @@ Runs in background thread with two-tier strategy:
|
||||
- Batch queries EXIF DB to detect new files
|
||||
- Configurable via `WATCH_QUICK_INTERVAL_SECONDS` and `WATCH_FULL_INTERVAL_SECONDS`
|
||||
|
||||
**Canonical date_taken pipeline (`src/date_resolver.rs`).** Every row's
|
||||
`image_exif.date_taken` is populated at ingest by a four-step waterfall;
|
||||
which step won is recorded in `image_exif.date_taken_source` so the
|
||||
per-tick drain can re-resolve weak entries when better tools become
|
||||
available, and so the UI/debug surface can answer "why did this photo
|
||||
land on this date?". Order:
|
||||
|
||||
1. **`exif`** — kamadak-exif `DateTime` / `DateTimeOriginal`. Fast,
|
||||
in-process, image-only.
|
||||
2. **`exiftool`** — shell-out fallback for tags kamadak can't reach:
|
||||
QuickTime/MP4 (`MediaCreateDate`, `TrackCreateDate`, `CreateDate`),
|
||||
Apple's `ContentCreateDate`, MakerNote sub-IFDs. Required for
|
||||
videos to land a real date. Single-file at ingest; the per-tick
|
||||
drain feeds the whole batch through one `exiftool -@ -` subprocess.
|
||||
Degrades silently when `exiftool` isn't on PATH (resolver caches the
|
||||
"available" check via `OnceLock`).
|
||||
3. **`filename`** — `extract_date_from_filename` in `memories.rs`
|
||||
matches screenshot, chat-export, and timestamp-named patterns.
|
||||
4. **`fs_time`** — `earliest_fs_time(metadata)` (earlier of created /
|
||||
modified). Last resort.
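The waterfall's precedence can be sketched with `Option` chaining. This illustration takes already-extracted candidates for simplicity, whereas the real resolver would presumably run later steps only when earlier ones fail. `Candidates`, `DateSource`, and `resolve_date` are hypothetical names; the source strings match the four step names above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DateSource {
    Exif,
    Exiftool,
    Filename,
    FsTime,
}

impl DateSource {
    /// The label recorded alongside the date (mirrors the four step names).
    pub fn as_str(self) -> &'static str {
        match self {
            DateSource::Exif => "exif",
            DateSource::Exiftool => "exiftool",
            DateSource::Filename => "filename",
            DateSource::FsTime => "fs_time",
        }
    }
}

/// Hypothetical per-file candidates (Unix seconds), one per waterfall step.
#[derive(Default)]
pub struct Candidates {
    pub exif: Option<i64>,
    pub exiftool: Option<i64>,
    pub filename: Option<i64>,
    pub fs_time: Option<i64>,
}

/// First step that yields a date wins; the winning step is returned with it.
pub fn resolve_date(c: &Candidates) -> Option<(i64, DateSource)> {
    c.exif
        .map(|t| (t, DateSource::Exif))
        .or_else(|| c.exiftool.map(|t| (t, DateSource::Exiftool)))
        .or_else(|| c.filename.map(|t| (t, DateSource::Filename)))
        .or_else(|| c.fs_time.map(|t| (t, DateSource::FsTime)))
}
```

When both an EXIF date and a filename date are present, the EXIF value is returned and the filename is never consulted.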
|
||||
|
||||
Notable behavior change vs. the pre-2026-05 request-time logic:
|
||||
**EXIF beats filename when both are present.** A photo named
|
||||
`Screenshot_2014-06-01.png` whose EXIF `DateTime` is 2021 now appears
|
||||
under 2021, not 2014 — on the theory that EXIF is more reliable than
|
||||
import-named filenames. The reverse case (no EXIF, filename has a
|
||||
date) is unchanged.
|
||||
|
||||
The `backfill_missing_date_taken` drain (`src/backfill.rs`) runs every
|
||||
watcher tick alongside `backfill_unhashed_backlog` (also `src/backfill.rs`). It loads up to
|
||||
`DATE_BACKFILL_MAX_PER_TICK` rows (default 500) where
|
||||
`date_taken IS NULL` (backed by the `idx_image_exif_date_backfill`
|
||||
partial index), runs the waterfall batch via `resolve_dates_batch`,
|
||||
and writes results via the `backfill_date_taken` DAO method (touches
|
||||
only `date_taken` + `date_taken_source` so EXIF / hash / perceptual
|
||||
columns are preserved). Resolved rows — including the ones the
|
||||
waterfall could only resolve via `fs_time` — are not re-eligible:
|
||||
the resolver is deterministic on file bytes + filename + fs metadata,
|
||||
so re-running on the same inputs lands on the same source every time.
|
||||
An earlier version included `date_taken_source = 'fs_time'` in the
|
||||
eligibility predicate, but with `ORDER BY id ASC LIMIT 500` it spun on
|
||||
the same lowest-id rows in perpetuity and held the SQLite write lock
|
||||
long enough to starve face-PATCH writers (5s busy_timeout → 500). If
|
||||
a stronger tool comes online (exiftool install, new filename regex),
|
||||
re-resolve out-of-band rather than re-introducing the steady-state
|
||||
eligibility.
|
||||
|
||||
`/memories` is a single SQL query against this column
|
||||
(`get_memories_in_window` in `src/database/mod.rs`), using
|
||||
`strftime('%m-%d' | '%W' | '%m', date_taken, 'unixepoch', tz)` for
|
||||
calendar matching with the client's timezone offset. The pre-rewrite
|
||||
version stat'd every row and walked the entire library tree — at
|
||||
~14k photos this took 10–15 s; the rewrite is single-digit ms.
|
||||
|
||||
**EXIF Extraction:**
|
||||
- Uses `kamadak-exif` crate
|
||||
- Supports: JPEG, TIFF, RAW (NEF, CR2, CR3), HEIF/HEIC, PNG, WebP
|
||||
@@ -512,11 +219,11 @@ ImageApi owns the face data; Apollo (sibling repo) hosts the insightface inferen
|
||||
- `persons(id, name UNIQUE COLLATE NOCASE, cover_face_id, entity_id, created_from_tag, notes, ...)` — operator-managed, name is the user-visible identity.
|
||||
- `face_detections(id, library_id, content_hash, rel_path, bbox_*, embedding BLOB, confidence, source, person_id, status, model_version, ...)` — keyed on `content_hash` so a photo duplicated across libraries is detected once. Marker rows for `status IN ('no_faces','failed')` carry NULL bbox/embedding (CHECK constraint enforces this).
|
||||
|
||||
**Why content_hash and not (library_id, rel_path):** ties face data to the bytes, not the path. A backup mount that copies files from the primary library naturally inherits the existing detections without re-running inference. This is the reference implementation of the multi-library data model — see "Multi-library data model" above.
|
||||
**Why content_hash and not (library_id, rel_path):** ties face data to the bytes, not the path. A backup mount that copies files from the primary library naturally inherits the existing detections without re-running inference.
|
||||
|
||||
**File-watch hook** (`src/watcher.rs::process_new_files`): for each photo with a populated `content_hash`, check `FaceDao::already_scanned(hash)`; if not, send bytes (or embedded JPEG preview for RAW via `exif::extract_embedded_jpeg_preview`) to Apollo's `/api/internal/faces/detect`. K=`FACE_DETECT_CONCURRENCY` (default 8) parallel calls per scan tick; Apollo serializes them via its single-worker GPU pool. `face_watch.rs` is the Tokio orchestration layer.
|
||||
**File-watch hook** (`src/main.rs::process_new_files`): for each photo with a populated `content_hash`, check `FaceDao::already_scanned(hash)`; if not, send bytes (or embedded JPEG preview for RAW via `exif::extract_embedded_jpeg_preview`) to Apollo's `/api/internal/faces/detect`. K=`FACE_DETECT_CONCURRENCY` (default 8) parallel calls per scan tick; Apollo serializes them via its single-worker GPU pool. `face_watch.rs` is the Tokio orchestration layer.
|
||||
|
||||
**Per-tick backlog drain** (`src/backfill.rs`): two passes that run on every watcher tick regardless of quick-vs-full scan:
|
||||
**Per-tick backlog drain** (also `src/main.rs`): two passes that run on every watcher tick regardless of quick-vs-full scan:
|
||||
- `backfill_unhashed_backlog` — populates `image_exif.content_hash` for photos that arrived before the hash field was retroactive. Capped by `FACE_HASH_BACKFILL_MAX_PER_TICK` (default 2000); errors don't burn the cap.
|
||||
- `process_face_backlog` — runs detection on photos that have a hash but no `face_detections` row. Capped by `FACE_BACKLOG_MAX_PER_TICK` (default 64). Selected via a SQL anti-join (`FaceDao::list_unscanned_candidates`); videos and EXCLUDED_DIRS paths filtered out client-side via `face_watch::filter_excluded` so they never reach Apollo.
|
||||
|
||||
@@ -526,13 +233,9 @@ ImageApi owns the face data; Apollo (sibling repo) hosts the insightface inferen
|
||||
|
||||
**Rerun preserves manual rows** (`POST /image/faces/{id}/rerun`): only `source='auto'` rows are deleted before re-running detection. `already_scanned` returns true on ANY row, so a photo whose only faces are manually drawn never auto-redetects.
|
||||
|
||||
**Stats domain — content_hash, not file rows** (`FaceDao::stats` in `src/faces.rs`): `total_photos` counts `DISTINCT content_hash` over `image_exif` (filtered to image extensions, `content_hash IS NOT NULL`), and so do `scanned` / `with_faces` / `no_faces` / `failed` over `face_detections`. Numerator and denominator must live in the same domain — `face_detections` is keyed on content_hash, so the same JPEG present at two rel_paths or in two libraries scans once. Counting `image_exif` rows in the denominator inflated total by one per duplicate file and produced a permanent gap (e.g. 1101/1103 with nothing actually pending). Hash-less rows are excluded from total_photos while they sit in the `backfill_unhashed_backlog` queue; otherwise the bar pins below 100% for the duration of that backfill even though those rows aren't pending detection yet — they're pending hashing.
|
||||
|
||||
Module map:
|
||||
- `src/faces.rs` — `FaceDao` trait + `SqliteFaceDao` impl, route handlers for `/faces/*`, `/image/faces/*`, `/persons/*`. Mirror of `tags.rs` layout.
|
||||
- `src/face_watch.rs` — Tokio orchestration for the file-watch detect pass; `filter_excluded` (PathExcluder + image-extension filter), `read_image_bytes_for_detect` (RAW preview fallback).
|
||||
- `src/backfill.rs` — per-tick drains (unhashed-hash, date_taken, face-backlog, etc.) called from `watcher::watch_files` and `watcher::process_new_files`.
|
||||
- `src/watcher.rs` — the watcher loop itself and `process_new_files` (file walk → EXIF write → face-candidate build).
|
||||
- `src/ai/face_client.rs` — HTTP client for Apollo's inference. Configured by `APOLLO_FACE_API_BASE_URL`, falls back to `APOLLO_API_BASE_URL`. Both unset → feature disabled, file-watch hook is a no-op.
|
||||
- `migrations/2026-04-29-000000_add_faces/` — schema.
|
||||
|
||||
@@ -593,7 +296,6 @@ Optional:
|
||||
```bash
|
||||
WATCH_QUICK_INTERVAL_SECONDS=60 # Quick scan interval
|
||||
WATCH_FULL_INTERVAL_SECONDS=3600 # Full scan interval
|
||||
DATE_BACKFILL_MAX_PER_TICK=500 # Cap on canonical-date drain per watcher tick
|
||||
OTLP_OTLS_ENDPOINT=http://... # OpenTelemetry collector (release builds)
|
||||
|
||||
# AI Insights Configuration
|
||||
@@ -670,12 +372,7 @@ clients whether chat is available for a given insight.
|
||||
|
||||
- `POST /insights/chat` runs one turn of the agentic loop against the replayed
|
||||
history. Body: `{ file_path, library?, user_message, model?, backend?, num_ctx?,
|
||||
temperature?, top_p?, top_k?, min_p?, max_iterations?, system_prompt?, amend? }`.
|
||||
`system_prompt` is a per-turn override: in append mode (default) it's applied
|
||||
ephemerally — the original system message is restored before persistence so
|
||||
the stored transcript keeps its baked persona. In amend mode the override
|
||||
stays in place and becomes the new insight row's system message. Mirrors the
|
||||
internal `annotate_system_with_budget` swap-and-restore pattern.
|
||||
temperature?, top_p?, top_k?, min_p?, max_iterations?, amend? }`.
|
||||
- `POST /insights/chat/stream` is the SSE variant — same request body, response
|
||||
is `text/event-stream` with events: `iteration_start`, `text` (delta), `tool_call`,
|
||||
`tool_result`, `truncated`, `done`, plus a server-emitted `error_message` on
|
||||
|
||||
Generated
+869
-1106
File diff suppressed because it is too large
+1
-11
@@ -9,9 +9,6 @@ edition = "2024"
|
||||
[profile.release]
|
||||
lto = "thin"
|
||||
|
||||
[profile.dev]
|
||||
debug = "line-tables-only"
|
||||
|
||||
[dependencies]
|
||||
actix = "0.13.1"
|
||||
actix-web = "4"
|
||||
@@ -26,7 +23,7 @@ jsonwebtoken = "9.3.0"
|
||||
serde = "1"
|
||||
serde_json = "1"
|
||||
diesel = { version = "2.2.10", features = ["sqlite"] }
|
||||
libsqlite3-sys = "0.35"
|
||||
libsqlite3-sys = { version = "0.35", features = ["bundled"] }
|
||||
diesel_migrations = "2.2.0"
|
||||
chrono = "0.4"
|
||||
clap = { version = "4.5", features = ["derive"] }
|
||||
@@ -62,12 +59,5 @@ ical = "0.11"
|
||||
scraper = "0.20"
|
||||
base64 = "0.22"
|
||||
blake3 = "1.5"
|
||||
image_hasher = "3.0"
|
||||
bk-tree = "0.5"
|
||||
async-trait = "0.1"
|
||||
indicatif = "0.17"
|
||||
|
||||
# Windows lacks system sqlite3, so re-enable the bundled C build there.
|
||||
# Linux/macOS use the system library (faster builds, smaller binary).
|
||||
[target.'cfg(windows)'.dependencies]
|
||||
libsqlite3-sys = { version = "0.35", features = ["bundled"] }
|
||||
|
||||
@@ -1 +0,0 @@
|
||||
DROP INDEX IF EXISTS idx_tags_name_nocase;
|
||||
@@ -1,28 +0,0 @@
|
||||
-- Tags only enforced uniqueness in application code (the add_tag handler
|
||||
-- looks up by name before inserting). The schema itself accepted dupes,
|
||||
-- so a divergent code path could land two tags with the same name. Now
|
||||
-- that we expose a rename endpoint we want a hard guarantee: case-
|
||||
-- insensitive UNIQUE on tags.name.
|
||||
|
||||
-- Pre-flight: collapse exact-name duplicates (case-insensitive) onto the
|
||||
-- lowest-id row before adding the constraint, otherwise the index
|
||||
-- creation fails on any DB that ever produced dupes. On a clean DB this
|
||||
-- is a no-op.
|
||||
UPDATE tagged_photo
|
||||
SET tag_id = (
|
||||
SELECT MIN(t2.id) FROM tags t2
|
||||
WHERE LOWER(t2.name) = LOWER((SELECT name FROM tags WHERE id = tagged_photo.tag_id))
|
||||
)
|
||||
WHERE tag_id IN (
|
||||
SELECT t.id FROM tags t
|
||||
WHERE t.id <> (
|
||||
SELECT MIN(t2.id) FROM tags t2 WHERE LOWER(t2.name) = LOWER(t.name)
|
||||
)
|
||||
);
|
||||
|
||||
DELETE FROM tags
|
||||
WHERE id <> (
|
||||
SELECT MIN(t2.id) FROM tags t2 WHERE LOWER(t2.name) = LOWER(tags.name)
|
||||
);
|
||||
|
||||
CREATE UNIQUE INDEX idx_tags_name_nocase ON tags (name COLLATE NOCASE);
|
||||
@@ -1,5 +0,0 @@
|
||||
DROP INDEX IF EXISTS idx_photo_insights_content_hash;
|
||||
ALTER TABLE photo_insights DROP COLUMN content_hash;
|
||||
|
||||
DROP INDEX IF EXISTS idx_tagged_photo_content_hash;
|
||||
ALTER TABLE tagged_photo DROP COLUMN content_hash;
|
||||
@@ -1,64 +0,0 @@
|
||||
-- Phase B of the multi-library data-model rollout: add a nullable
|
||||
-- `content_hash` column to derived/user-intent tables that should follow
|
||||
-- the bytes rather than the path. Reads will prefer hash-key joins and
|
||||
-- fall back to rel_path while the column is null. A separate
|
||||
-- reconciliation pass collapses duplicates as the column populates.
|
||||
--
|
||||
-- See CLAUDE.md → "Multi-library data model" for the policy. The
|
||||
-- reference implementation is `face_detections`, which has been
|
||||
-- hash-keyed since it was introduced.
|
||||
--
|
||||
-- Tables in this migration:
|
||||
-- * tagged_photo — user-intent (tags follow the bytes)
|
||||
-- * photo_insights — intrinsic to bytes (LLM-generated description)
|
||||
--
|
||||
-- favorites is the natural third candidate but its DAO is barely used in
|
||||
-- v1 and the row count is tiny; deferring lets this migration stay
|
||||
-- focused on the high-volume tables that drive cross-library overhead.
|
||||
|
||||
-- ---------------------------------------------------------------------------
|
||||
-- tagged_photo
|
||||
-- ---------------------------------------------------------------------------
|
||||
ALTER TABLE tagged_photo ADD COLUMN content_hash TEXT;
|
||||
|
||||
-- Backfill: for each tagged_photo row, find the content_hash for its
|
||||
-- rel_path. tagged_photo doesn't carry a library_id, so a rel_path that
|
||||
-- exists under multiple libraries with different content is genuinely
|
||||
-- ambiguous — we take the first matching image_exif row. The
|
||||
-- reconciliation pass at runtime cleans up any rows that resolve
|
||||
-- differently once a hash is known per library.
|
||||
UPDATE tagged_photo
|
||||
SET content_hash = (
|
||||
SELECT content_hash FROM image_exif
|
||||
WHERE image_exif.rel_path = tagged_photo.rel_path
|
||||
AND image_exif.content_hash IS NOT NULL
|
||||
LIMIT 1
|
||||
)
|
||||
WHERE content_hash IS NULL;
|
||||
|
||||
-- Hash-key index. Partial (only non-null rows) to keep the index small
|
||||
-- during the transitional window where most rows are still null.
|
||||
CREATE INDEX idx_tagged_photo_content_hash
|
||||
ON tagged_photo (content_hash)
|
||||
WHERE content_hash IS NOT NULL;
|
||||
|
||||
-- ---------------------------------------------------------------------------
|
||||
-- photo_insights
|
||||
-- ---------------------------------------------------------------------------
|
||||
ALTER TABLE photo_insights ADD COLUMN content_hash TEXT;
|
||||
|
||||
-- Backfill keyed on (library_id, rel_path) — photo_insights already
|
||||
-- carries library_id, so the resolution is unambiguous.
|
||||
UPDATE photo_insights
|
||||
SET content_hash = (
|
||||
SELECT content_hash FROM image_exif
|
||||
WHERE image_exif.library_id = photo_insights.library_id
|
||||
AND image_exif.rel_path = photo_insights.rel_path
|
||||
AND image_exif.content_hash IS NOT NULL
|
||||
LIMIT 1
|
||||
)
|
||||
WHERE content_hash IS NULL;
|
||||
|
||||
CREATE INDEX idx_photo_insights_content_hash
|
||||
ON photo_insights (content_hash)
|
||||
WHERE content_hash IS NOT NULL;
|
||||
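The header comment of this migration says reads prefer hash-key joins and fall back to `rel_path` while `content_hash` is still NULL. A minimal sketch of that key selection follows; the `JoinKey` enum and `join_key` function are hypothetical names for illustration and are not taken from the repository.

```rust
/// Hypothetical lookup key: which column a read would join on.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum JoinKey {
    /// Row already has a content hash: follow the bytes.
    ContentHash(String),
    /// Column still NULL: fall back to the path.
    RelPath(String),
}

/// Prefer the content hash when it is populated, otherwise use rel_path.
fn join_key(content_hash: Option<&str>, rel_path: &str) -> JoinKey {
    match content_hash {
        Some(hash) => JoinKey::ContentHash(hash.to_owned()),
        None => JoinKey::RelPath(rel_path.to_owned()),
    }
}
```

Rows that resolve to `JoinKey::RelPath` are the ones the reconciliation pass would revisit once a hash is known.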
@@ -1,2 +0,0 @@
|
||||
-- Requires SQLite 3.35+ for ALTER TABLE DROP COLUMN.
|
||||
ALTER TABLE libraries DROP COLUMN enabled;
|
||||
@@ -1,14 +0,0 @@
|
||||
-- Operator-controlled kill switch for a library. When `enabled = 0` the
|
||||
-- watcher tick skips that library entirely — before the availability
|
||||
-- probe, before ingest, before any maintenance pass — and the orphan-GC
|
||||
-- all-online check treats it as out-of-scope rather than as a blocker.
|
||||
--
|
||||
-- The intended workflow is staging a new mount: insert with enabled=0,
|
||||
-- verify the row appears in /libraries with enabled=false, then UPDATE
|
||||
-- to 1 to start ingest. Same toggle works as a maintenance kill switch
|
||||
-- after the fact ("don't keep probing this NAS while I'm rebooting it").
|
||||
--
|
||||
-- Default 1 so every existing library stays running on upgrade — no
|
||||
-- behavior change without an explicit flip.
|
||||
|
||||
ALTER TABLE libraries ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT 1;
|
||||
@@ -1,2 +0,0 @@
|
||||
-- Requires SQLite 3.35+ for ALTER TABLE DROP COLUMN.
|
||||
ALTER TABLE libraries DROP COLUMN excluded_dirs;
|
||||
@@ -1,14 +0,0 @@
|
||||
-- Per-library excluded directories.
|
||||
--
|
||||
-- The global EXCLUDED_DIRS env var is the right knob for excludes that
|
||||
-- every library shares (Synology @eaDir, .thumbnails, etc.). It's a
|
||||
-- poor fit for "exclude this subtree from THIS library only", which
|
||||
-- is the natural need when mounting a parent directory while
|
||||
-- another library already covers a child subtree underneath.
|
||||
--
|
||||
-- This column is parsed comma-separated, same shape as the env var,
|
||||
-- and the watcher / memories / thumbnail walks each apply
|
||||
-- (env_globals ∪ library.excluded_dirs) when scanning the library.
|
||||
-- NULL = no extra excludes; the global env var still applies.
|
||||
|
||||
ALTER TABLE libraries ADD COLUMN excluded_dirs TEXT;
|
||||
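The comment above describes the column as comma-separated, in the same shape as the env var, with each walk applying the union of global and per-library excludes. A rough sketch of that union is given here. Both function names are hypothetical, and trimming whitespace and dropping empty entries are assumptions rather than documented behavior.

```rust
use std::collections::BTreeSet;

/// Split a comma-separated exclude list.
/// Assumption: entries are trimmed and empty entries are dropped.
fn parse_excluded_dirs(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Hypothetical helper: env_globals ∪ library.excluded_dirs.
/// `library_excluded` is None when the column is NULL, in which case
/// only the global list applies.
fn effective_excludes(env_globals: &str, library_excluded: Option<&str>) -> BTreeSet<String> {
    let mut set: BTreeSet<String> = parse_excluded_dirs(env_globals).into_iter().collect();
    if let Some(raw) = library_excluded {
        set.extend(parse_excluded_dirs(raw));
    }
    set
}
```

A set is used so that a directory listed both globally and per library is only checked once.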
@@ -1,8 +0,0 @@
|
||||
DROP INDEX IF EXISTS idx_image_exif_duplicate_of_hash;
|
||||
DROP INDEX IF EXISTS idx_image_exif_dhash;
|
||||
DROP INDEX IF EXISTS idx_image_exif_phash;
|
||||
|
||||
ALTER TABLE image_exif DROP COLUMN duplicate_decided_at;
|
||||
ALTER TABLE image_exif DROP COLUMN duplicate_of_hash;
|
||||
ALTER TABLE image_exif DROP COLUMN dhash_64;
|
||||
ALTER TABLE image_exif DROP COLUMN phash_64;
|
||||
@@ -1,41 +0,0 @@
|
||||
-- Adds perceptual-hash signals + soft-mark resolution state to image_exif so
|
||||
-- the duplicates surface in Apollo can group near-duplicates (re-encoded,
|
||||
-- resized, format-converted copies) and let the user demote losers without
|
||||
-- touching the file on disk. Image-only for v1: phash_64/dhash_64 are NULL
|
||||
-- on videos and on images that fail to decode. See Apollo CLAUDE.md →
|
||||
-- Duplicate detection / Caching layer for the policy.
|
||||
--
|
||||
-- Soft-mark columns are media-type-agnostic — when video perceptual hashing
|
||||
-- arrives, it lives in a separate hash-keyed companion table and reuses the
|
||||
-- same duplicate_of_hash / duplicate_decided_at machinery.
|
||||
|
||||
-- pHash (DCT, 64-bit) packed as i64 for fast XOR + popcount Hamming.
|
||||
ALTER TABLE image_exif ADD COLUMN phash_64 BIGINT;
|
||||
|
||||
-- dHash (gradient, 64-bit). Cheap, robust to compression/resize. Stored
|
||||
-- alongside pHash so the query layer can fall back if either is null.
|
||||
ALTER TABLE image_exif ADD COLUMN dhash_64 BIGINT;
|
||||
|
||||
-- When non-null, this row is a soft-marked duplicate of the row whose
|
||||
-- content_hash matches. The duplicate file stays on disk; the default
|
||||
-- /photos listing filters it out. /photos?include_duplicates=true opts
|
||||
-- back in (the Apollo duplicates modal uses this).
|
||||
ALTER TABLE image_exif ADD COLUMN duplicate_of_hash TEXT;
|
||||
|
||||
-- Unix seconds of the resolve. Distinguishes "never reviewed" from
|
||||
-- "reviewed and resolved" for the Apollo include_resolved toggle.
|
||||
ALTER TABLE image_exif ADD COLUMN duplicate_decided_at BIGINT;
|
||||
|
||||
-- Partial indexes — the columns are NULL for the vast majority of rows
|
||||
-- during the transitional window and forever for videos / decode failures.
|
||||
CREATE INDEX idx_image_exif_phash
|
||||
ON image_exif (phash_64)
|
||||
WHERE phash_64 IS NOT NULL;
|
||||
|
||||
CREATE INDEX idx_image_exif_dhash
|
||||
ON image_exif (dhash_64)
|
||||
WHERE dhash_64 IS NOT NULL;
|
||||
|
||||
CREATE INDEX idx_image_exif_duplicate_of_hash
|
||||
ON image_exif (duplicate_of_hash)
|
||||
WHERE duplicate_of_hash IS NOT NULL;
|
||||
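The migration stores both perceptual hashes as 64-bit integers so that comparison reduces to XOR plus popcount. A small sketch of that comparison is shown here. The function names and the `max_distance` threshold are assumptions for illustration; the source does not say which threshold the query layer uses.

```rust
/// Hamming distance between two 64-bit perceptual hashes packed as i64:
/// XOR leaves a 1 bit wherever the hashes differ, count_ones tallies them.
fn hamming_distance(a: i64, b: i64) -> u32 {
    (a ^ b).count_ones()
}

/// Hypothetical near-duplicate check. A NULL hash (video or decode
/// failure) carries no signal, so it never matches.
fn is_near_duplicate(a: Option<i64>, b: Option<i64>, max_distance: u32) -> bool {
    match (a, b) {
        (Some(x), Some(y)) => hamming_distance(x, y) <= max_distance,
        _ => false,
    }
}
```

Because the columns are signed, a hash with the top bit set is stored as a negative number; the XOR and popcount are unaffected by the sign.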
@@ -1,2 +0,0 @@
|
||||
DROP INDEX IF EXISTS idx_image_exif_date_backfill;
|
||||
ALTER TABLE image_exif DROP COLUMN date_taken_source;
|
||||
@@ -1,24 +0,0 @@
|
||||
-- Tracks where a row's `date_taken` was sourced so the canonical-date
|
||||
-- waterfall (kamadak-exif → exiftool → filename → earliest_fs_time) is
|
||||
-- visible to debugging and to the per-tick backfill drain that re-runs
|
||||
-- weak sources once stronger ones become available (e.g. exiftool gets
|
||||
-- installed on a deploy that didn't have it). See CLAUDE.md → Memories
|
||||
-- canonical-date pipeline.
|
||||
--
|
||||
-- Values:
|
||||
-- 'exif' — kamadak-exif read DateTime/DateTimeOriginal directly
|
||||
-- 'exiftool' — exiftool fallback caught a video / MakerNote / QuickTime tag
|
||||
-- 'filename' — extract_date_from_filename matched a known pattern
|
||||
-- 'fs_time' — fell through to earliest_fs_time(metadata)
|
||||
--
|
||||
-- NULL when `date_taken` itself is NULL (no source resolved the date).
|
||||
ALTER TABLE image_exif ADD COLUMN date_taken_source TEXT;
|
||||
|
||||
-- Partial index for the per-tick backfill drain: targets rows that need
|
||||
-- re-resolution (no date yet, or only the weakest source resolved it).
|
||||
-- Filename-sourced rows are intentionally excluded — the regex is
|
||||
-- authoritative when it matches and re-running exiftool wouldn't change
|
||||
-- the answer.
|
||||
CREATE INDEX idx_image_exif_date_backfill
|
||||
ON image_exif (library_id, id)
|
||||
WHERE date_taken IS NULL OR date_taken_source = 'fs_time';
|
||||
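The waterfall order and the partial-index predicate described in this migration can be sketched as two small functions. This is an illustration under assumptions, not the repository's resolver: `resolve_date` and `needs_backfill` are hypothetical names, and each source is reduced to an already-extracted optional timestamp.

```rust
/// Walk the sources strongest-first and label the first one that
/// produced a date: exif, then exiftool, then filename, then fs_time.
fn resolve_date(
    exif: Option<i64>,
    exiftool: Option<i64>,
    filename: Option<i64>,
    fs_time: Option<i64>,
) -> Option<(i64, &'static str)> {
    exif.map(|t| (t, "exif"))
        .or(exiftool.map(|t| (t, "exiftool")))
        .or(filename.map(|t| (t, "filename")))
        .or(fs_time.map(|t| (t, "fs_time")))
}

/// Mirrors the index predicate in this migration:
/// date_taken IS NULL OR date_taken_source = 'fs_time'.
fn needs_backfill(date_taken: Option<i64>, source: Option<&str>) -> bool {
    date_taken.is_none() || source == Some("fs_time")
}
```

Filename-sourced rows fall outside `needs_backfill`, which matches the comment's point that they are deliberately excluded from the drain.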
@@ -1,9 +0,0 @@
|
||||
-- Reverting this migration is a no-op: the labels we wrote in `up.sql`
|
||||
-- are correct under any state of the schema (every dated row was indeed
|
||||
-- exif-sourced before the resolver landed), and there's no signal that
|
||||
-- distinguishes "labelled by this migration" from "labelled by the
|
||||
-- ingest path post-resolver". Clearing them would break the drain's
|
||||
-- eligibility filter again.
|
||||
--
|
||||
-- The companion migration `2026-05-06-000000_add_date_taken_source` is
|
||||
-- the one to revert if you need to remove the column entirely.
|
||||
@@ -1,20 +0,0 @@
|
||||
-- Backfill `date_taken_source` for rows that pre-date the canonical-date
|
||||
-- pipeline. Before the resolver landed, `image_exif.date_taken` could
|
||||
-- only be populated via `exif::extract_exif_from_path` (kamadak-exif)
|
||||
-- on the file-watcher, upload, or GPS-write paths. The resolver column
|
||||
-- migration added `date_taken_source` defaulting to NULL, so every
|
||||
-- historical row with a date is currently unlabelled — and the
|
||||
-- per-tick drain skips them because its eligibility predicate is
|
||||
-- `date_taken IS NULL OR date_taken_source = 'fs_time'`.
|
||||
--
|
||||
-- Label them `'exif'` once and let the drain take over from here. Safe
|
||||
-- because every code path that wrote `date_taken` prior to the
|
||||
-- resolver was a kamadak-exif read — there was no other source.
|
||||
--
|
||||
-- Idempotent: re-running this migration on a DB that has already been
|
||||
-- backfilled is a no-op (the WHERE clause matches nothing the second
|
||||
-- time around).
|
||||
UPDATE image_exif
|
||||
SET date_taken_source = 'exif'
|
||||
WHERE date_taken IS NOT NULL
|
||||
AND date_taken_source IS NULL;
|
||||
@@ -1,2 +0,0 @@
|
||||
ALTER TABLE image_exif DROP COLUMN original_date_taken_source;
|
||||
ALTER TABLE image_exif DROP COLUMN original_date_taken;
|
||||
@@ -1,15 +0,0 @@
|
||||
-- Manual date_taken override: when an operator overrides a row's date via
|
||||
-- POST /image/exif/date, the prior `(date_taken, date_taken_source)` is
|
||||
-- snapshotted into these columns and the live columns hold the new value
|
||||
-- with `date_taken_source = 'manual'`. POST /image/exif/date/clear restores
|
||||
-- the pair and nulls the originals.
|
||||
--
|
||||
-- The waterfall source-name set is now:
|
||||
-- 'exif' | 'exiftool' | 'filename' | 'fs_time' | 'manual'
|
||||
--
|
||||
-- The `idx_image_exif_date_backfill` partial index already filters to
|
||||
-- `date_taken IS NULL OR date_taken_source = 'fs_time'`, so 'manual' rows
|
||||
-- are naturally excluded from the per-tick backfill drain — no index
|
||||
-- change needed.
|
||||
ALTER TABLE image_exif ADD COLUMN original_date_taken BIGINT;
|
||||
ALTER TABLE image_exif ADD COLUMN original_date_taken_source TEXT;
|
||||
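The snapshot and restore behavior described for the two endpoints can be modeled on an in-memory copy of the four columns. The struct and method names are hypothetical. Keeping the first snapshot when a row is overridden twice is an assumption; the migration comment does not say what happens on a repeated override.

```rust
/// In-memory stand-in for the four date columns on an image_exif row.
#[derive(Debug, Clone, PartialEq)]
struct DateColumns {
    date_taken: Option<i64>,
    date_taken_source: Option<String>,
    original_date_taken: Option<i64>,
    original_date_taken_source: Option<String>,
}

impl DateColumns {
    /// Override the date, snapshotting the prior pair.
    /// Assumption: a second override keeps the first snapshot.
    fn apply_manual(&mut self, new_date: i64) {
        if self.date_taken_source.as_deref() != Some("manual") {
            self.original_date_taken = self.date_taken;
            self.original_date_taken_source = self.date_taken_source.take();
        }
        self.date_taken = Some(new_date);
        self.date_taken_source = Some("manual".to_owned());
    }

    /// Restore the snapshotted pair and null the originals.
    fn clear_manual(&mut self) {
        if self.date_taken_source.as_deref() == Some("manual") {
            self.date_taken = self.original_date_taken.take();
            self.date_taken_source = self.original_date_taken_source.take();
        }
    }
}
```

After `apply_manual` followed by `clear_manual`, the row is back to its pre-override state with both `original_*` fields set to `None`.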
@@ -1,43 +0,0 @@
|
||||
-- Drop the persona-scoping column on entity_facts via the table-rebuild
|
||||
-- dance for SQLite-version portability (matches the pattern in
|
||||
-- 2026-04-20-000000_add_backend_to_insights/down.sql).
|
||||
DROP INDEX IF EXISTS idx_entity_facts_persona;
|
||||
|
||||
CREATE TABLE entity_facts_backup AS
|
||||
SELECT id, subject_entity_id, predicate, object_entity_id, object_value,
|
||||
source_photo, source_insight_id, confidence, status, created_at
|
||||
FROM entity_facts;
|
||||
|
||||
DROP TABLE entity_facts;
|
||||
|
||||
CREATE TABLE entity_facts (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
|
||||
subject_entity_id INTEGER NOT NULL,
|
||||
predicate TEXT NOT NULL,
|
||||
object_entity_id INTEGER,
|
||||
object_value TEXT,
|
||||
source_photo TEXT,
|
||||
source_insight_id INTEGER,
|
||||
confidence REAL NOT NULL DEFAULT 0.6,
|
||||
status TEXT NOT NULL DEFAULT 'active',
|
||||
created_at BIGINT NOT NULL,
|
||||
CONSTRAINT fk_ef_subject FOREIGN KEY (subject_entity_id) REFERENCES entities(id) ON DELETE CASCADE,
|
||||
CONSTRAINT fk_ef_object FOREIGN KEY (object_entity_id) REFERENCES entities(id) ON DELETE SET NULL,
|
||||
CONSTRAINT fk_ef_insight FOREIGN KEY (source_insight_id) REFERENCES photo_insights(id) ON DELETE SET NULL,
|
||||
CHECK (object_entity_id IS NOT NULL OR object_value IS NOT NULL)
|
||||
);
|
||||
|
||||
INSERT INTO entity_facts
|
||||
SELECT id, subject_entity_id, predicate, object_entity_id, object_value,
|
||||
source_photo, source_insight_id, confidence, status, created_at
|
||||
FROM entity_facts_backup;
|
||||
|
||||
DROP TABLE entity_facts_backup;
|
||||
|
||||
CREATE INDEX idx_entity_facts_subject ON entity_facts(subject_entity_id);
|
||||
CREATE INDEX idx_entity_facts_predicate ON entity_facts(predicate);
|
||||
CREATE INDEX idx_entity_facts_status ON entity_facts(status);
|
||||
CREATE INDEX idx_entity_facts_source_photo ON entity_facts(source_photo);
|
||||
|
||||
DROP INDEX IF EXISTS idx_personas_user;
|
||||
DROP TABLE IF EXISTS personas;
|
||||
@@ -1,64 +0,0 @@
|
||||
-- Personas live server-side now (mobile previously stored them in
|
||||
-- AsyncStorage only). Each user gets the three built-ins seeded; custom
|
||||
-- personas land here too via POST /personas or POST /personas/migrate.
|
||||
--
|
||||
-- `entity_facts` gains a persona_id so each persona accumulates its own
|
||||
-- voice over a shared entity graph (entities themselves stay unscoped).
|
||||
-- Existing rows backfill to 'default' via the column DEFAULT — that
|
||||
-- becomes the historical baseline. The `include_all_memories` flag on
|
||||
-- personas lets any persona opt back into reading the full pool.
|
||||
|
||||
CREATE TABLE personas (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
|
||||
user_id INTEGER NOT NULL,
|
||||
persona_id TEXT NOT NULL,
|
||||
name TEXT NOT NULL,
|
||||
system_prompt TEXT NOT NULL,
|
||||
is_built_in BOOLEAN NOT NULL DEFAULT FALSE,
|
||||
include_all_memories BOOLEAN NOT NULL DEFAULT FALSE,
|
||||
created_at BIGINT NOT NULL,
|
||||
updated_at BIGINT NOT NULL,
|
||||
UNIQUE(user_id, persona_id),
|
||||
CONSTRAINT fk_personas_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_personas_user ON personas(user_id);
|
||||
|
||||
-- Seed built-ins for every existing user. System prompts copied verbatim
|
||||
-- from FileViewer-React/hooks/usePersonas.tsx so server and client agree
|
||||
-- on the canonical voice for each built-in.
|
||||
INSERT INTO personas (user_id, persona_id, name, system_prompt, is_built_in, created_at, updated_at)
|
||||
SELECT
|
||||
u.id,
|
||||
'default',
|
||||
'Default Assistant',
|
||||
'You are my long-term memory assistant. Use only the information provided. Do not invent details. Respond in 3–6 sentences in third person, leading with the most concrete moment from the photo and the surrounding context. Plain prose, no headings.',
|
||||
TRUE,
|
||||
strftime('%s', 'now') * 1000,
|
||||
strftime('%s', 'now') * 1000
|
||||
FROM users u
|
||||
UNION ALL
|
||||
SELECT
|
||||
u.id,
|
||||
'journal',
|
||||
'Personal Journal',
|
||||
'You are a personal journal writer. Write in first person, present tense, with warmth and reflection — focusing on emotions and meaningful moments. Use only the information provided; do not invent details. Aim for 4–8 sentences in a single flowing paragraph, no headings.',
|
||||
TRUE,
|
||||
strftime('%s', 'now') * 1000,
|
||||
strftime('%s', 'now') * 1000
|
||||
FROM users u
|
||||
UNION ALL
|
||||
SELECT
|
||||
u.id,
|
||||
'factual',
|
||||
'Factual Reporter',
|
||||
'You are a factual memory recorder. Be precise, objective, and concise. Lead with the date and place, then list what / when / who in 2–4 short sentences. Use only the information provided; if a detail is unknown, say so rather than guessing.',
|
||||
TRUE,
|
||||
strftime('%s', 'now') * 1000,
|
||||
strftime('%s', 'now') * 1000
|
||||
FROM users u;
|
||||
|
||||
-- Persona scoping on facts only. Entities and entity_photo_links stay
|
||||
-- shared (real-world referents and shared photo ↔ entity associations).
|
||||
ALTER TABLE entity_facts ADD COLUMN persona_id TEXT NOT NULL DEFAULT 'default';
|
||||
CREATE INDEX idx_entity_facts_persona ON entity_facts(persona_id);
|
||||
@@ -1,47 +0,0 @@
|
||||
-- Reverse 2026-05-10-000000_entity_facts_persona_fk: drop the
|
||||
-- composite FK and the user_id column via the same rebuild pattern.
|
||||
|
||||
DROP INDEX IF EXISTS idx_entity_facts_user_persona;
|
||||
DROP INDEX IF EXISTS idx_entity_facts_persona;
|
||||
DROP INDEX IF EXISTS idx_entity_facts_source_photo;
|
||||
DROP INDEX IF EXISTS idx_entity_facts_status;
|
||||
DROP INDEX IF EXISTS idx_entity_facts_predicate;
|
||||
DROP INDEX IF EXISTS idx_entity_facts_subject;
|
||||
|
||||
ALTER TABLE entity_facts RENAME TO entity_facts_old;
|
||||
|
||||
CREATE TABLE entity_facts (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
|
||||
subject_entity_id INTEGER NOT NULL,
|
||||
predicate TEXT NOT NULL,
|
||||
object_entity_id INTEGER,
|
||||
object_value TEXT,
|
||||
source_photo TEXT,
|
||||
source_insight_id INTEGER,
|
||||
confidence REAL NOT NULL DEFAULT 0.6,
|
||||
status TEXT NOT NULL DEFAULT 'active',
|
||||
created_at BIGINT NOT NULL,
|
||||
persona_id TEXT NOT NULL DEFAULT 'default',
|
||||
CONSTRAINT fk_ef_subject FOREIGN KEY (subject_entity_id) REFERENCES entities(id) ON DELETE CASCADE,
|
||||
CONSTRAINT fk_ef_object FOREIGN KEY (object_entity_id) REFERENCES entities(id) ON DELETE SET NULL,
|
||||
CONSTRAINT fk_ef_insight FOREIGN KEY (source_insight_id) REFERENCES photo_insights(id) ON DELETE SET NULL,
|
||||
CHECK (object_entity_id IS NOT NULL OR object_value IS NOT NULL)
|
||||
);
|
||||
|
||||
INSERT INTO entity_facts
|
||||
(id, subject_entity_id, predicate, object_entity_id, object_value,
|
||||
source_photo, source_insight_id, confidence, status, created_at,
|
||||
persona_id)
|
||||
SELECT
|
||||
id, subject_entity_id, predicate, object_entity_id, object_value,
|
||||
source_photo, source_insight_id, confidence, status, created_at,
|
||||
persona_id
|
||||
FROM entity_facts_old;
|
||||
|
||||
DROP TABLE entity_facts_old;
|
||||
|
||||
CREATE INDEX idx_entity_facts_subject ON entity_facts(subject_entity_id);
|
||||
CREATE INDEX idx_entity_facts_predicate ON entity_facts(predicate);
|
||||
CREATE INDEX idx_entity_facts_status ON entity_facts(status);
|
||||
CREATE INDEX idx_entity_facts_source_photo ON entity_facts(source_photo);
|
||||
CREATE INDEX idx_entity_facts_persona ON entity_facts(persona_id);
|
||||
@@ -1,82 +0,0 @@
|
||||
-- Add a real foreign key from entity_facts to personas. Until now,
|
||||
-- entity_facts.persona_id was a free-form string with no integrity
|
||||
-- guarantee — deleting a persona orphaned its facts, which then sat
|
||||
-- forever in the readable-only-via-PersonaFilter::All hive-mind view.
|
||||
--
|
||||
-- personas is keyed (user_id, persona_id) so the FK has to be
|
||||
-- composite. That requires entity_facts to carry user_id too, which
|
||||
-- has the side benefit of fixing multi-user fact leakage on the read
|
||||
-- path (without it, two users with the same 'default' persona would
|
||||
-- see each other's default-scoped facts).
|
||||
--
|
||||
-- SQLite can't ALTER TABLE to add an FK; the table-rebuild dance is
|
||||
-- the only way. Pattern matches 2026-05-09's down.sql and the older
|
||||
-- 2026-04-20-000000 migration.
|
||||
|
||||
DROP INDEX IF EXISTS idx_entity_facts_subject;
|
||||
DROP INDEX IF EXISTS idx_entity_facts_predicate;
|
||||
DROP INDEX IF EXISTS idx_entity_facts_status;
|
||||
DROP INDEX IF EXISTS idx_entity_facts_source_photo;
|
||||
DROP INDEX IF EXISTS idx_entity_facts_persona;
|
||||
|
||||
ALTER TABLE entity_facts RENAME TO entity_facts_old;
|
||||
|
||||
CREATE TABLE entity_facts (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
|
||||
subject_entity_id INTEGER NOT NULL,
|
||||
predicate TEXT NOT NULL,
|
||||
object_entity_id INTEGER,
|
||||
object_value TEXT,
|
||||
source_photo TEXT,
|
||||
source_insight_id INTEGER,
|
||||
confidence REAL NOT NULL DEFAULT 0.6,
|
||||
status TEXT NOT NULL DEFAULT 'active',
|
||||
created_at BIGINT NOT NULL,
|
||||
persona_id TEXT NOT NULL DEFAULT 'default',
|
||||
user_id INTEGER NOT NULL DEFAULT 1,
|
||||
CONSTRAINT fk_ef_subject FOREIGN KEY (subject_entity_id) REFERENCES entities(id) ON DELETE CASCADE,
|
||||
CONSTRAINT fk_ef_object FOREIGN KEY (object_entity_id) REFERENCES entities(id) ON DELETE SET NULL,
|
||||
CONSTRAINT fk_ef_insight FOREIGN KEY (source_insight_id) REFERENCES photo_insights(id) ON DELETE SET NULL,
|
||||
CONSTRAINT fk_ef_persona FOREIGN KEY (user_id, persona_id) REFERENCES personas(user_id, persona_id) ON DELETE CASCADE,
|
||||
CHECK (object_entity_id IS NOT NULL OR object_value IS NOT NULL)
|
||||
);
|
||||
|
||||
-- Backfill: assign each legacy fact to the user that owns the matching
|
||||
-- persona. Built-ins are seeded per-user with the same persona_id
|
||||
-- string for everyone, so MIN(user_id) deterministically picks the
|
||||
-- earliest registered user (typically user 1, the operator). Custom
|
||||
-- persona_ids exist for at most one user, so MIN is unambiguous there too.
|
||||
-- Falls back to user_id=1 when no matching persona row exists; in that
|
||||
-- case the fk_ef_persona FK above would still fail, but legacy rows shouldn't be in
|
||||
-- that state because 2026-05-09 ADD COLUMN defaulted persona_id to
|
||||
-- 'default', which is seeded for every user.
|
||||
INSERT INTO entity_facts
|
||||
(id, subject_entity_id, predicate, object_entity_id, object_value,
|
||||
source_photo, source_insight_id, confidence, status, created_at,
|
||||
persona_id, user_id)
|
||||
SELECT
|
||||
old.id,
|
||||
old.subject_entity_id,
|
||||
old.predicate,
|
||||
old.object_entity_id,
|
||||
old.object_value,
|
||||
old.source_photo,
|
||||
old.source_insight_id,
|
||||
old.confidence,
|
||||
old.status,
|
||||
old.created_at,
|
||||
old.persona_id,
|
||||
COALESCE(
|
||||
(SELECT MIN(p.user_id) FROM personas p WHERE p.persona_id = old.persona_id),
|
||||
1
|
||||
)
|
||||
FROM entity_facts_old old;
|
||||
|
||||
DROP TABLE entity_facts_old;
|
||||
|
||||
CREATE INDEX idx_entity_facts_subject ON entity_facts(subject_entity_id);
|
||||
CREATE INDEX idx_entity_facts_predicate ON entity_facts(predicate);
|
||||
CREATE INDEX idx_entity_facts_status ON entity_facts(status);
|
||||
CREATE INDEX idx_entity_facts_source_photo ON entity_facts(source_photo);
|
||||
CREATE INDEX idx_entity_facts_persona ON entity_facts(persona_id);
|
||||
CREATE INDEX idx_entity_facts_user_persona ON entity_facts(user_id, persona_id);
|
||||
@@ -1,5 +0,0 @@
|
||||
-- SQLite can drop columns since 3.35 (March 2021); embedded
|
||||
-- libsqlite3-sys is well past that. Drop in reverse insert order so
|
||||
-- a partial down still leaves the schema valid.
|
||||
ALTER TABLE entity_facts DROP COLUMN valid_until;
|
||||
ALTER TABLE entity_facts DROP COLUMN valid_from;
|
||||
@@ -1,25 +0,0 @@
|
||||
-- Add valid-time columns to entity_facts.
|
||||
--
|
||||
-- entity_facts already has created_at — *transaction time*, the
|
||||
-- moment WE recorded the fact. That's not the same as the real-world
|
||||
-- period the fact was true. "Cameron is_in_relationship_with X" was
|
||||
-- only true during a window; recording it in 2026 doesn't make it
|
||||
-- true today. Without the distinction, every former relationship,
|
||||
-- former job, former address reads as currently-true.
|
||||
--
|
||||
-- Adding two BIGINT NULL columns: valid_from / valid_until (unix
|
||||
-- seconds). NULL means "unbounded on that side" — `valid_from IS
|
||||
-- NULL` reads as "always-true-back-to-the-beginning",
|
||||
-- `valid_until IS NULL` as "still-true-now-or-unknown". Both NULL =
|
||||
-- temporal validity unknown (current state of all legacy rows).
|
||||
--
|
||||
-- Conflict detection refines accordingly: same-predicate facts with
|
||||
-- different objects stop flagging when their intervals are disjoint
|
||||
-- ("lives_in NYC 2018-2020" and "lives_in SF 2020-present" are both
|
||||
-- valid, just at different times).
|
||||
|
||||
ALTER TABLE entity_facts ADD COLUMN valid_from BIGINT;
|
||||
ALTER TABLE entity_facts ADD COLUMN valid_until BIGINT;
|
||||
|
||||
-- Optional partial index for time-bounded scans. Skipped for now —
|
||||
-- conflict detection runs per-entity (small N) and doesn't need it.
|
||||
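The disjoint-interval rule for conflict detection can be sketched as a single overlap test with NULL treated as unbounded. The function name is hypothetical. Treating intervals as half-open, so that a fact ending in 2020 and one starting in 2020 do not overlap, is an assumption made so the "lives_in NYC / lives_in SF" example from the comment comes out disjoint.

```rust
/// Do two valid-time intervals overlap?
/// None means "unbounded on that side", matching the NULL semantics above.
/// Assumption: intervals are half-open, [from, until).
fn intervals_overlap(
    a_from: Option<i64>,
    a_until: Option<i64>,
    b_from: Option<i64>,
    b_until: Option<i64>,
) -> bool {
    let a_from = a_from.unwrap_or(i64::MIN);
    let a_until = a_until.unwrap_or(i64::MAX);
    let b_from = b_from.unwrap_or(i64::MIN);
    let b_until = b_until.unwrap_or(i64::MAX);
    a_from < b_until && b_from < a_until
}
```

Two legacy rows with both bounds NULL overlap under this test, so they would still be flagged; only facts with known, disjoint windows stop conflicting.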
@@ -1,2 +0,0 @@
|
||||
DROP INDEX IF EXISTS idx_entity_facts_superseded_by;
|
||||
ALTER TABLE entity_facts DROP COLUMN superseded_by;
|
||||
@@ -1,31 +0,0 @@
|
||||
-- Add a supersession pointer to entity_facts.
|
||||
--
|
||||
-- Status alone is a one-way trapdoor: 'rejected' loses the link
|
||||
-- between the rejected fact and the one that replaced it. For
|
||||
-- evolving facts (Cameron's relationship, employer, address) the
|
||||
-- curator wants to *replace* a stale fact with a new one and keep
|
||||
-- the history readable: "from 2018 until 2022 this was true, then
|
||||
-- it became this other thing".
|
||||
--
|
||||
-- A nullable INTEGER column pointing at another entity_facts.id —
|
||||
-- no FK constraint because SQLite can't ALTER ADD COLUMN with REFs;
|
||||
-- the DAO's delete_fact clears dangling pointers in the same
|
||||
-- transaction as the parent delete to keep the column honest.
|
||||
--
|
||||
-- A status of 'superseded' on the old fact (alongside the existing
|
||||
-- active / reviewed / rejected) signals "replaced by a newer
|
||||
-- claim". Read paths already filter 'rejected' out of the active
|
||||
-- view; the curation UI will treat 'superseded' the same way for
|
||||
-- conflict detection so they don't keep flagging.
|
||||
--
|
||||
-- Pairs with the valid-time columns from 2026-05-10-000100: the
|
||||
-- supersede action auto-stamps the old fact's `valid_until` from
|
||||
-- the new fact's `valid_from`, closing the interval cleanly.
|
||||
|
||||
ALTER TABLE entity_facts ADD COLUMN superseded_by INTEGER;
|
||||
|
||||
-- Helpful index for "show me what superseded this fact" walks
|
||||
-- (rare today; cheap to add now while the table is small).
|
||||
CREATE INDEX idx_entity_facts_superseded_by
|
||||
ON entity_facts(superseded_by)
|
||||
WHERE superseded_by IS NOT NULL;
|
||||
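The supersede action and the dangling-pointer cleanup described in the comment can be sketched on an in-memory `Fact`. The struct is a trimmed, hypothetical stand-in for an `entity_facts` row, and both function names are illustrative. Leaving `valid_until` untouched when the new fact has no `valid_from` is an assumption.

```rust
/// Trimmed, hypothetical stand-in for an entity_facts row.
#[derive(Debug, Clone)]
struct Fact {
    id: i64,
    status: String,
    valid_from: Option<i64>,
    valid_until: Option<i64>,
    superseded_by: Option<i64>,
}

/// Mark `old` as replaced by `new` and close its validity interval.
fn supersede(old: &mut Fact, new: &Fact) {
    old.status = "superseded".to_owned();
    old.superseded_by = Some(new.id);
    // Assumption: only stamp valid_until when the new fact has a start.
    if let Some(from) = new.valid_from {
        old.valid_until = Some(from);
    }
}

/// With no FK on superseded_by, a delete must clear pointers by hand.
fn clear_dangling_pointers(facts: &mut [Fact], deleted_id: i64) {
    for fact in facts.iter_mut() {
        if fact.superseded_by == Some(deleted_id) {
            fact.superseded_by = None;
        }
    }
}
```

In the real DAO the cleanup is said to run in the same transaction as the delete; the slice loop here only models the effect.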
@@ -1,4 +0,0 @@
|
||||
DROP INDEX IF EXISTS idx_entity_facts_created_by_backend;
|
||||
DROP INDEX IF EXISTS idx_entity_facts_created_by_model;
|
||||
ALTER TABLE entity_facts DROP COLUMN created_by_backend;
|
||||
ALTER TABLE entity_facts DROP COLUMN created_by_model;
|
||||
@@ -1,30 +0,0 @@
|
||||
-- Track which model + backend generated each fact so the curator
|
||||
-- can audit which configurations produce trustworthy knowledge.
|
||||
--
|
||||
-- photo_insights already carries `model_version` + `backend`, and
|
||||
-- entity_facts.source_insight_id links to it — but:
|
||||
-- 1. source_insight_id is only set after an insight is stored
|
||||
-- (post-loop), so chat-continuation facts and facts whose insight
|
||||
-- was regenerated lose the link.
|
||||
-- 2. JOINing for every read is more friction than just embedding the
|
||||
-- provenance on the fact row itself.
|
||||
-- 3. Manual facts (POST /knowledge/facts) have no insight at all and
|
||||
-- need to record "manual" as their provenance.
|
||||
--
|
||||
-- Two nullable TEXT columns are enough for the audit use case: model
|
||||
-- (e.g. "qwen2.5:7b", "anthropic/claude-sonnet-4") and backend
|
||||
-- ("local", "hybrid", "manual"). Pre-existing rows leave both NULL —
|
||||
-- legacy facts predate this tracking and can't be back-filled
|
||||
-- reliably from training_messages without burning compute.
|
||||
|
||||
ALTER TABLE entity_facts ADD COLUMN created_by_model TEXT;
|
||||
ALTER TABLE entity_facts ADD COLUMN created_by_backend TEXT;
|
||||
|
||||
-- Indexes are cheap and useful for "show me all facts from model X"
|
||||
-- audit queries — partial so the legacy NULL rows don't bloat them.
|
||||
CREATE INDEX idx_entity_facts_created_by_model
|
||||
ON entity_facts(created_by_model)
|
||||
WHERE created_by_model IS NOT NULL;
|
||||
CREATE INDEX idx_entity_facts_created_by_backend
|
||||
ON entity_facts(created_by_backend)
|
||||
WHERE created_by_backend IS NOT NULL;
|
||||
@@ -1 +0,0 @@
|
||||
ALTER TABLE personas DROP COLUMN reviewed_only_facts;
|
||||
@@ -1,16 +0,0 @@
|
||||
-- Per-persona toggle: when true, agent reads only see facts whose
|
||||
-- status is exactly 'reviewed' (human-verified). When false (the
|
||||
-- default), agent reads see 'active' OR 'reviewed' — everything not
|
||||
-- rejected or superseded.
|
||||
--
|
||||
-- The mobile app surfaces this as "Strict mode" on the persona
|
||||
-- editor: useful when you want a persona's chat to be grounded
|
||||
-- exclusively on the curated subset, e.g. for tasks where
|
||||
-- hallucinated agent claims are particularly costly.
|
||||
--
|
||||
-- Note: this is separate from `include_all_memories` (which unions
|
||||
-- across personas for hive-mind reads). Reviewed-only operates on
|
||||
-- the status axis; include_all_memories operates on the persona-
|
||||
-- scope axis. They compose freely.
|
||||
|
||||
ALTER TABLE personas ADD COLUMN reviewed_only_facts BOOLEAN NOT NULL DEFAULT 0;
|
||||
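The status filter that the toggle controls is small enough to write out. This sketch assumes the four status strings named in these migrations ('active', 'reviewed', 'rejected', 'superseded'); the function name is hypothetical.

```rust
/// Would an agent read see a fact with this status?
/// Strict mode (reviewed_only_facts = true) admits only 'reviewed';
/// the default admits 'active' or 'reviewed'.
fn visible_to_agent(status: &str, reviewed_only_facts: bool) -> bool {
    if reviewed_only_facts {
        status == "reviewed"
    } else {
        matches!(status, "active" | "reviewed")
    }
}
```

The check is independent of persona scope, which is why it composes with `include_all_memories` as the comment notes.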
@@ -1,5 +0,0 @@
|
||||
ALTER TABLE personas DROP COLUMN allow_agent_corrections;
|
||||
DROP INDEX IF EXISTS idx_entity_facts_last_modified_at;
|
||||
ALTER TABLE entity_facts DROP COLUMN last_modified_at;
|
||||
ALTER TABLE entity_facts DROP COLUMN last_modified_by_backend;
|
||||
ALTER TABLE entity_facts DROP COLUMN last_modified_by_model;
|
||||
@@ -1,30 +0,0 @@
|
||||
-- Three coupled changes for agent self-correction safety:
|
||||
--
|
||||
-- 1. `entity_facts.last_modified_by_*` + `last_modified_at` track who
|
||||
-- most recently mutated each fact. `created_by_*` from migration
|
||||
-- 2026-05-10-000300 records who first wrote the row; this records
|
||||
-- who last *changed* it. Separate columns so the create vs update
|
||||
-- audit is independently grep-able ("show me every fact gpt-5
|
||||
-- altered last week" stays a single index scan).
|
||||
--
|
||||
-- 2. `personas.allow_agent_corrections` is the gate for the new
|
||||
-- agent-side `update_fact` / `supersede_fact` tools. Default OFF —
|
||||
-- a fresh persona's agent can create but can't alter or replace.
|
||||
-- Operator opts in per-persona after the model has earned trust,
|
||||
-- typically via the strict-mode flow (curate, then ratchet up
|
||||
-- agent autonomy as confidence rises). Parallel in shape to
|
||||
-- `reviewed_only_facts` from 2026-05-10-000400; they compose.
|
||||
--
|
||||
-- 3. Index on `last_modified_at` (partial, NOT NULL) for the
|
||||
-- audit-feed reads in the curation UI ("show recent agent edits
|
||||
-- sorted newest first").
|
||||
|
||||
ALTER TABLE entity_facts ADD COLUMN last_modified_by_model TEXT;
|
||||
ALTER TABLE entity_facts ADD COLUMN last_modified_by_backend TEXT;
|
||||
ALTER TABLE entity_facts ADD COLUMN last_modified_at BIGINT;
|
||||
|
||||
CREATE INDEX idx_entity_facts_last_modified_at
|
||||
ON entity_facts(last_modified_at)
|
||||
WHERE last_modified_at IS NOT NULL;
|
||||
|
||||
ALTER TABLE personas ADD COLUMN allow_agent_corrections BOOLEAN NOT NULL DEFAULT 0;
|
||||
@@ -1,6 +0,0 @@
|
||||
-- Irreversible: we collapsed multiple raw entity_type strings to
|
||||
-- canonical forms and don't have a per-row record of the original.
|
||||
-- The down migration is intentionally a no-op (the rewritten values
|
||||
-- are still semantically correct), and the up migration is safe to
|
||||
-- re-run because every UPDATE is conditional on `!= canonical`.
|
||||
SELECT 1;
|
||||
@@ -1,43 +0,0 @@
|
||||
-- Canonicalize `entities.entity_type` so legacy rows from before
|
||||
-- `normalize_entity_type` landed in upsert_entity stop polluting
|
||||
-- client-side filters. Mirrors the synonym map in
|
||||
-- `src/database/knowledge_dao.rs::normalize_entity_type`:
|
||||
-- person ← person | people | human | individual | contact
|
||||
-- place ← place | location | venue | site | area | landmark
|
||||
-- event ← event | occasion | activity | celebration
|
||||
-- thing ← thing | object | item | product
|
||||
-- Types outside the synonym set (e.g. "friend", "family") are not
|
||||
-- recognized as canonical and get a lowercase+trim pass instead, so
|
||||
-- at minimum case variants collapse.
|
||||
--
|
||||
-- `UPDATE OR IGNORE` skips rows that would violate UNIQUE(name,
|
||||
-- entity_type) after the rewrite. Two rows like ("Sarah", "person")
|
||||
-- + ("Sarah", "Person") would otherwise collide — the duplicate
|
||||
-- survives unchanged so the curator can merge it via the curation
|
||||
-- UI rather than have the migration silently delete data.
|
||||
|
||||
UPDATE OR IGNORE entities
|
||||
SET entity_type = 'person'
|
||||
WHERE LOWER(TRIM(entity_type)) IN ('person', 'people', 'human', 'individual', 'contact')
|
||||
AND entity_type != 'person';
|
||||
|
||||
UPDATE OR IGNORE entities
|
||||
SET entity_type = 'place'
|
||||
WHERE LOWER(TRIM(entity_type)) IN ('place', 'location', 'venue', 'site', 'area', 'landmark')
|
||||
AND entity_type != 'place';
|
||||
|
||||
UPDATE OR IGNORE entities
|
||||
SET entity_type = 'event'
|
||||
WHERE LOWER(TRIM(entity_type)) IN ('event', 'occasion', 'activity', 'celebration')
|
||||
AND entity_type != 'event';
|
||||
|
||||
UPDATE OR IGNORE entities
|
||||
SET entity_type = 'thing'
|
||||
WHERE LOWER(TRIM(entity_type)) IN ('thing', 'object', 'item', 'product')
|
||||
AND entity_type != 'thing';
|
||||
|
||||
-- Anything left ("Friend" vs "friend") gets a lowercase+trim sweep
|
||||
-- so at least case variants of the same custom type collapse.
|
||||
UPDATE OR IGNORE entities
|
||||
SET entity_type = LOWER(TRIM(entity_type))
|
||||
WHERE entity_type != LOWER(TRIM(entity_type));
|
||||
@@ -1,5 +0,0 @@
|
||||
DROP INDEX IF EXISTS idx_image_exif_date_backfill;
|
||||
|
||||
CREATE INDEX idx_image_exif_date_backfill
|
||||
ON image_exif (library_id, id)
|
||||
WHERE date_taken IS NULL OR date_taken_source = 'fs_time';
|
||||
@@ -1,18 +0,0 @@
|
||||
-- Narrow the date-backfill partial index to NULL-only rows.
|
||||
--
|
||||
-- The original index (2026-05-06-000000_add_date_taken_source) also matched
|
||||
-- `date_taken_source = 'fs_time'` so the drain could "re-resolve weak
|
||||
-- entries when better tools become available." In practice the resolver
|
||||
-- is deterministic on file bytes + filename + fs metadata: a row that
|
||||
-- landed on fs_time once will land on fs_time again on every subsequent
|
||||
-- tick. With `ORDER BY id ASC LIMIT 500`, the drain spun on the same
|
||||
-- lowest-id fs_time rows in perpetuity, never advancing, while hammering
|
||||
-- the SQLite write lock once per row and starving other writers (face
|
||||
-- PATCHes were hitting busy_timeout and returning 500). Drop fs_time
|
||||
-- from the eligibility set; if exiftool / a new filename pattern ever
|
||||
-- comes online, a one-shot operator command can re-resolve.
|
||||
DROP INDEX IF EXISTS idx_image_exif_date_backfill;
|
||||
|
||||
CREATE INDEX idx_image_exif_date_backfill
|
||||
ON image_exif (library_id, id)
|
||||
WHERE date_taken IS NULL;
|
||||
@@ -1,392 +0,0 @@
|
||||
# Insight Chat improvements — design
|
||||
|
||||
**Date:** 2026-05-07
|
||||
**Branch:** `feature/insight-chat-improvements` (in both `ImageApi/` and `FileViewer-React/`)
|
||||
**Scope:** ImageApi photo-anchored insight + chat surface, plus the
|
||||
FileViewer-React client. Apollo's free/visit chat is **not** in this cycle.
|
||||
|
||||
## Problem
|
||||
|
||||
Three concrete gaps in today's insight + chat surface:
|
||||
|
||||
1. **Tool drift.** ImageApi exposes 13 tools to the LLM. Some are gated on
|
||||
`apollo_enabled` / `has_vision`, but several optional ones
|
||||
(`search_rag`, `get_calendar_events`, `get_location_history`) are
|
||||
registered unconditionally even when their backing tables are empty.
|
||||
Descriptions vary in quality and a couple have outright bugs.
|
||||
2. **Inconsistent / incomplete tool descriptions.** Tools like
|
||||
`search_messages` describe their selection rules but omit useful
|
||||
examples; `store_fact` doesn't show the `object_entity_id` vs
|
||||
`object_value` choice; `get_sms_messages` accepts a `days_radius`
|
||||
parameter that the backing client silently ignores. The LLM is being
|
||||
instructed against a slightly wrong reality.
|
||||
3. **System prompt fights the persona.** Today's generation prompt
|
||||
prepends the user's `custom_system_prompt` and then immediately asserts
|
||||
`"You are a personal photo memory assistant..."`. The user message
|
||||
demands `"a detailed insight with a title and summary"`. Both
|
||||
contradict whatever voice / shape / POV the persona just established.
|
||||
On chat continuation the persona is baked into the stored transcript at
|
||||
generation time and can't be changed without regenerating.
|
||||
|
||||
## Goals
|
||||
|
||||
- Tool catalog is **representative** — every tool registered for a turn is
|
||||
backed by data the user actually has.
|
||||
- Tool descriptions are **concise but complete**, with examples for any
|
||||
tool whose param choice has multiple modes or non-obvious interactions.
|
||||
- Persona / system prompt is **authoritative** for voice, length, and
|
||||
shape — both at generation and during chat continuation.
|
||||
- Per-turn system prompt overrides on chat work without surprising
|
||||
side-effects on the stored transcript outside `amend` mode.
|
||||
|
||||
## Non-goals
|
||||
|
||||
- Apollo backend / frontend changes. Separate cycle.
|
||||
- Refactoring the `generate_photo_title` post-hoc title flow. Already
|
||||
takes `custom_system_prompt`.
|
||||
- Tool consolidation (e.g. merging `search_messages` + `get_sms_messages`).
|
||||
Considered and deferred — keeps blast radius small.
|
||||
- Removing knowledge-memory tools (`recall_*` / `store_*`). Audit
|
||||
confirmed they have a live read path via `knowledge.rs` HTTP routes.
|
||||
- Persisting persona changes to the stored transcript outside `amend`
|
||||
mode. Deliberate — re-opens use the persona currently active in the
|
||||
client, not a sticky historical setting.
|
||||
|
||||
---
|
||||
|
||||
## Design
|
||||
|
||||
### A. System prompt — generation
|
||||
|
||||
Today (`insight_generator.rs:3305–3326`):
|
||||
|
||||
```
[custom_system_prompt if any] +
"You are a personal photo memory assistant helping to reconstruct..." +
{owner_id_note} +
{fewshot_block} +
"IMPORTANT INSTRUCTIONS:
1. You MUST call multiple tools...
2. When calling get_sms_messages and search_rag...
3. Use recall_facts_for_photo...
...
8. You have a hard budget of {max_iterations} iterations..."
```
|
||||
|
||||
The first concatenation is the bug: `custom` claims one identity, the
|
||||
next line asserts another.
|
||||
|
||||
**New structure** — two named blocks, in order:
|
||||
|
||||
```
[Identity / voice / format block]   ← persona-controlled (or neutral default)
[Procedural block]                  ← always identity-free
```
|
||||
|
||||
**Identity block:**
|
||||
- When `custom_system_prompt` is supplied: use that string verbatim, no
|
||||
pre/append.
|
||||
- When not: a neutral default that doesn't fight a future persona.
|
||||
Working text: `"You are reconstructing a memory from a photo. Use the
|
||||
gathered context to write a thoughtful summary; you decide voice,
|
||||
length, and shape."`
|
||||
|
||||
**Procedural block** — identity-free, always emitted:
|
||||
|
||||
```
|
||||
Tool-use guidance:
|
||||
- You have a budget of {max_iterations} tool-calling iterations.
|
||||
- Call tools to gather context BEFORE writing your final answer; don't
|
||||
answer after one or two calls.
|
||||
- When calling get_sms_messages or search_rag, make at least one call
|
||||
WITHOUT a contact filter — surrounding events matter even when a
|
||||
contact is known.
|
||||
- Use recall_facts_for_photo + recall_entities to load any prior
|
||||
knowledge about subjects in the photo.
|
||||
- When you identify people / places / events / things, use store_entity
|
||||
+ store_fact to grow the persistent memory.
|
||||
- A tool returning no results is informative; continue with the others.
|
||||
|
||||
{owner_id_note if applicable}
|
||||
{fewshot_block if applicable}
|
||||
```
|
||||
|
||||
Differences from today's "IMPORTANT INSTRUCTIONS" block: removed the
|
||||
"you are a personal photo memory assistant" framing and the explicit
|
||||
"at least 5 tool calls" floor (replaced with the softer "don't answer
|
||||
after one or two"). Few-shot stays — it's pattern-of-tool-use, not
|
||||
identity.
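
The two-block assembly described above can be sketched as follows. This is a minimal illustration, not the code in `insight_generator.rs`: the function name `build_system_prompt`, the constant `DEFAULT_IDENTITY`, and the rule that a blank custom prompt counts as absent are all assumptions made for the example. The default text is the working text quoted in this spec.

```rust
/// Neutral identity used when no persona prompt is supplied
/// (working text from this spec; the constant name is hypothetical).
pub const DEFAULT_IDENTITY: &str = "You are reconstructing a memory from a photo. Use the gathered context to write a thoughtful summary; you decide voice, length, and shape.";

/// Build the system prompt as [identity block] + [procedural block].
///
/// `procedural` is assumed to be the already-rendered, identity-free
/// tool-use guidance (budget, owner note, few-shot examples).
pub fn build_system_prompt(custom_system_prompt: Option<&str>, procedural: &str) -> String {
    // Assumption of this sketch: a whitespace-only custom prompt is
    // treated as "not supplied". A real prompt is used verbatim.
    let identity = custom_system_prompt
        .filter(|s| !s.trim().is_empty())
        .unwrap_or(DEFAULT_IDENTITY);

    // Identity first, procedure second; nothing is prepended to or
    // appended onto the persona text itself.
    format!("{identity}\n\n{procedural}")
}
```

With a persona supplied, the result begins with the persona string unchanged. Without one, it begins with the neutral default and never contains the old "personal photo memory assistant" framing.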
|
||||
|
||||
### B. User message — generation
|
||||
|
||||
Today (line 3357):
|
||||
|
||||
```
{visual_block}Please analyze this photo and gather any relevant context
from the surrounding weeks.

Photo file path: {file_path}
Date taken: {date}
{contact_info}
{gps_info}
{tags_info}

Use the available tools to gather more context about this moment
(messages, calendar events, location history, etc.), then write a
detailed insight with a title and summary.
```
|
||||
|
||||
Problems: the trailing line bakes in output shape ("title and
|
||||
summary"), and the title from the resulting response is **discarded
|
||||
anyway** — `generate_photo_title` (line 3494) regenerates the title
|
||||
post-hoc from the summary. So the prompt is constraining voice for no
|
||||
data-model benefit.
|
||||
|
||||
**New payload** — context-only, no output prescription:
|
||||
|
||||
```
{visual_block}Photo file path: {file_path}
Date taken: {date}
{contact_info}
{gps_info}
{tags_info}

Gather context with the available tools, then respond.
```
|
||||
|
||||
The persona owns shape. If a user wants "title-then-paragraph" output,
|
||||
their persona prompt says so.
|
||||
|
||||
### C. System prompt — chat continuation
|
||||
|
||||
Add `system_prompt: Option<String>` to `ChatTurnRequest` (and to its
|
||||
HTTP wrapper `ChatTurnHttpRequest`). It carries through both the
|
||||
non-streaming `chat_turn` and the streaming `chat_turn_stream`.
|
||||
|
||||
**Append mode (default, `amend=false`)** — ephemeral
|
||||
swap-and-restore, mirroring the existing `annotate_system_with_budget`
|
||||
pattern:
|
||||
|
||||
1. Load stored transcript.
2. If `system_prompt` is `Some(s)`:
   - If first message is a `system` role: stash original content,
     replace with `s`.
   - Else: prepend a synthetic ephemeral system message with `s` (note
     it's synthetic so the restore step pops it rather than rewriting).
3. Run `annotate_system_with_budget` on top (existing per-turn budget
   note appends to whatever's there now).
4. Run the agentic loop.
5. **Before persistence**, restore the original system content (or pop
   the synthetic one). Run `restore_system_content` for the budget
   annotation as today.
6. Save.
|
||||
|
||||
Result: the model sees the override; the stored transcript is
|
||||
unchanged outside the model's actual reply.
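
The swap-and-restore steps above could look roughly like this. It is a sketch under assumptions: the `Message` struct, the `SystemSwap` enum, and the function names `apply_override` / `restore_override` are invented for illustration and are not the types used by the chat service. The budget annotation step is left out.

```rust
/// Simplified transcript message (hypothetical shape for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Records what the restore step has to undo.
pub enum SystemSwap {
    /// The first message was a system message; holds its original content.
    Replaced(String),
    /// No system message existed, so a synthetic one was prepended.
    Synthetic,
}

/// Install the per-turn override before running the agentic loop.
pub fn apply_override(messages: &mut Vec<Message>, override_prompt: &str) -> SystemSwap {
    let has_system = messages.first().map_or(false, |m| m.role == "system");
    if has_system {
        // Stash the stored system content and swap in the override.
        let original = std::mem::replace(&mut messages[0].content, override_prompt.to_string());
        SystemSwap::Replaced(original)
    } else {
        // Prepend an ephemeral system message; restore will pop it.
        messages.insert(
            0,
            Message { role: "system".to_string(), content: override_prompt.to_string() },
        );
        SystemSwap::Synthetic
    }
}

/// Undo the override before persistence (append mode only).
pub fn restore_override(messages: &mut Vec<Message>, swap: SystemSwap) {
    match swap {
        SystemSwap::Replaced(original) => {
            if let Some(first) = messages.first_mut() {
                first.content = original;
            }
        }
        SystemSwap::Synthetic => {
            if !messages.is_empty() {
                messages.remove(0);
            }
        }
    }
}
```

Returning a `SystemSwap` value makes the restore step explicit: the caller cannot forget whether the system message was rewritten or synthesized, and in amend mode it can simply skip the restore call.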
|
||||
|
||||
**Amend mode (`amend=true`)**:
|
||||
|
||||
- If `system_prompt` is supplied: the override stays in place during
|
||||
the serialization for the new insight row. The new row's
|
||||
`training_messages` system message is the override. `is_current=false`
|
||||
flips on prior rows as today.
|
||||
- If not supplied: behaves as today (stored transcript's system message
|
||||
carries forward unchanged).
|
||||
|
||||
### D. FileViewer-React — client wiring
|
||||
|
||||
`hooks/useInsightChat.tsx`:
|
||||
- `SendTurnOptions` gains `systemPromptOverride?: string | null`.
|
||||
- Inside `sendTurn`, before issuing the streaming POST:
|
||||
1. Read the active persona's `systemPrompt` from AsyncStorage
|
||||
(already loaded for generation flows — reuse the same accessor).
|
||||
2. If a one-shot `systemPromptOverride` is set, append as a suffix
|
||||
(`${persona}\n\n${override}`) so persona voice survives + override
|
||||
tweaks the turn.
|
||||
3. Include the resulting string as `system_prompt` on the request body.
|
||||
- No history-load change. The history endpoint still returns the stored
|
||||
transcript.
|
||||
|
||||
`components/InsightChatModal.tsx`:
|
||||
- Add a small "Style note" composer affordance — a one-shot text input
|
||||
that, when filled, becomes the `systemPromptOverride` for the next
|
||||
send. Cleared after send.
|
||||
- The existing persona chip continues to open `PersonaManagerModal`.
|
||||
|
||||
`hooks/usePersonas.tsx` and the bundled defaults:
|
||||
- Built-in `assistant` and `journal` prompts get audited and rewritten
|
||||
to **explicitly state voice / shape / length** — since the framework
|
||||
no longer guarantees a default shape, the persona must.
|
||||
|
||||
### E. Tool catalog — gating
|
||||
|
||||
Widen `build_tool_definitions` from `(has_vision: bool, apollo_enabled:
|
||||
bool)` to a single `ToolGateOpts` struct:
|
||||
|
||||
```rust
pub struct ToolGateOpts {
    pub has_vision: bool,
    pub apollo_enabled: bool,
    pub daily_summaries_present: bool,
    pub calendar_present: bool,
    pub location_history_present: bool,
}
```
|
||||
|
||||
The chat / generation services compute the three new fields lazily per
|
||||
turn via `SELECT 1 FROM <table> LIMIT 1` (cheap; cached for the turn's
|
||||
duration). Lazy because operators import data after launch and we don't
|
||||
want to require a restart for the LLM to discover its new capabilities.
|
||||
|
||||
Per-tool gating:
|
||||
|
||||
| Tool | Existing gate | New gate |
|---|---|---|
| `describe_photo` | `has_vision` | unchanged |
| `get_personal_place_at` | `apollo_enabled` | unchanged |
| `get_calendar_events` | none | `calendar_present` |
| `get_location_history` | none | `location_history_present` |
| `search_rag` | none | `daily_summaries_present` |
|
||||
All other tools always-on. (`get_sms_messages` and `search_messages`
|
||||
fail informatively if SMS-API is unreachable; not worth a startup probe
|
||||
since intermittent failures are the same shape.)
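
The gating table can be read as a simple filter over tool names. The sketch below assumes a hypothetical helper `enabled_tool_names` that returns names only; the real `build_tool_definitions` returns full tool definitions, and the always-on list here is a subset of tools named in this spec, not the complete catalog.

```rust
/// Same fields as the struct in this section; `Default` (all false)
/// is added here only to keep the example short.
#[derive(Debug, Clone, Copy, Default)]
pub struct ToolGateOpts {
    pub has_vision: bool,
    pub apollo_enabled: bool,
    pub daily_summaries_present: bool,
    pub calendar_present: bool,
    pub location_history_present: bool,
}

/// Return the names of tools that would be registered for a turn.
/// Hypothetical helper: illustrates the gating rule only.
pub fn enabled_tool_names(opts: ToolGateOpts) -> Vec<&'static str> {
    // Always-on tools (illustrative subset).
    let mut tools = vec![
        "get_sms_messages",
        "search_messages",
        "store_fact",
        "get_current_datetime",
        "reverse_geocode",
        "get_file_tags",
    ];

    // One (tool, gate) pair per row of the gating table.
    let gated = [
        ("describe_photo", opts.has_vision),
        ("get_personal_place_at", opts.apollo_enabled),
        ("get_calendar_events", opts.calendar_present),
        ("get_location_history", opts.location_history_present),
        ("search_rag", opts.daily_summaries_present),
    ];
    for (name, enabled) in gated {
        if enabled {
            tools.push(name);
        }
    }
    tools
}
```

Keeping the gates in one table-shaped array makes it straightforward to unit-test that a false flag removes exactly one tool.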
|
||||
|
||||
### F. Tool descriptions — convention
|
||||
|
||||
Every description follows:
|
||||
|
||||
1. One sentence: **what** + **when to call**.
|
||||
2. Param semantics worth knowing (units, ranges, mode behavior,
|
||||
precedence).
|
||||
3. **Example invocation** for tools with multiple modes, optional bands,
|
||||
or non-obvious parameter interactions.
|
||||
4. Cross-references when relevant: `prefer X when both apply`.
|
||||
|
||||
Banned: all-caps section headers inside descriptions
|
||||
(`"CONTENT search"`, `"TIME-BASED fetch"`); persona-prescriptive language
|
||||
(`"you are a..."`); behavioral references to other tools by description
|
||||
rather than name.
|
||||
|
||||
Tools getting examples: `search_messages`, `search_rag`, `store_fact`,
|
||||
`get_sms_messages`. Trivial tools (`get_current_datetime`,
|
||||
`reverse_geocode`, `get_file_tags`) skip the example.
|
||||
|
||||
Sample (`search_messages`):
|
||||
|
||||
> Search SMS/MMS message bodies. Modes: `fts5` (keyword + phrase + prefix
|
||||
> + AND/OR/NOT + NEAR proximity), `semantic` (embedding similarity,
|
||||
> requires generated embeddings), `hybrid` (RRF merge, recommended;
|
||||
> degrades to `fts5` when embeddings absent). Optional `start_ts` /
|
||||
> `end_ts` (real-UTC unix seconds) and `contact_id` filters. For pure
|
||||
> date / contact browsing without keywords, prefer `get_sms_messages`.
|
||||
>
|
||||
> Examples:
|
||||
> - `{query: "trader joe's"}` — phrase across all time.
|
||||
> - `{query: "dinner", contact_id: 42, start_ts: 1700000000, end_ts: 1700604800}`
|
||||
> — keyword within a contact and a week.
|
||||
> - `{query: "NEAR(meeting work, 5)"}` — proximity search.
|
||||
|
||||
### G. SMS tool fixes
|
||||
|
||||
#### `get_sms_messages` — honor `days_radius`
|
||||
|
||||
Today: `sms_client::fetch_messages_for_contact(contact, center_ts)`
|
||||
hardcodes `Duration::days(4)` (lines 31–37). The tool accepts
|
||||
`days_radius` and silently ignores it.
|
||||
|
||||
**Fix:** widen the signature to
|
||||
`fetch_messages_for_contact(contact, center_ts, days_radius)`. Tool
|
||||
plumbs through. Default 4 retained for back-compat.
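
One way to express the window math with the default retained is shown below. The helper name `sms_window` and the `Option<i64>` parameter are assumptions for illustration. Clamping to a minimum radius of one day follows the `window_for_radius` helper that appears in the `sms_client` hunk of this diff.

```rust
/// Radius used when the caller does not supply `days_radius`.
const DEFAULT_DAYS_RADIUS: i64 = 4;
const SECS_PER_DAY: i64 = 86_400;

/// Compute the `(start_ts, end_ts)` unix-second window centered on
/// `center_ts`. Hypothetical helper for this sketch.
pub fn sms_window(center_ts: i64, days_radius: Option<i64>) -> (i64, i64) {
    // Fall back to the historical 4-day radius, then clamp to at least
    // one day so the window is never zero-width or inverted.
    let radius = days_radius.unwrap_or(DEFAULT_DAYS_RADIUS).max(1);
    let span = radius * SECS_PER_DAY;
    (center_ts - span, center_ts + span)
}
```

For a radius of `N` days the window spans `2N` days in total, which is the property the unit test in the testing section checks.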
|
||||
|
||||
#### `search_messages` — add date and contact_id filters
|
||||
|
||||
Today: ImageApi's `search_messages` only forwards `query`, `mode`,
|
||||
`limit` to SMS-API.
|
||||
|
||||
**Fix:** add `start_ts`, `end_ts`, `contact_id` parameters.
|
||||
- `contact_id` forwards directly to SMS-API
|
||||
(`/api/messages/search/?contact_id=`).
|
||||
- `start_ts` / `end_ts` are not natively accepted by SMS-API's search
|
||||
endpoint. Apply client-side post-filter on the response (Apollo's
|
||||
pattern: `chat_tools.py:670–680`). Bump the SMS-API `limit` to a
|
||||
larger fetch pool when a date filter is supplied so in-window matches
|
||||
aren't lost to out-of-window FTS rank.
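
The client-side post-filter could be sketched as below. Everything here is illustrative: `SearchHit` is a stand-in for the real hit type, inclusive bounds are an assumption, and the 5x multiplier and cap of 500 in `fetch_limit` are placeholder numbers, not values taken from this spec or from Apollo.

```rust
/// Minimal stand-in for a search hit (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct SearchHit {
    pub id: i64,
    pub timestamp: i64,
    pub body: String,
}

/// Limit to request from SMS-API. When a date filter is present the
/// pool is widened so in-window matches survive the post-filter.
/// The 5x factor and the 500 cap are illustrative assumptions.
pub fn fetch_limit(requested: usize, has_date_filter: bool) -> usize {
    if has_date_filter {
        requested.saturating_mul(5).min(500).max(requested)
    } else {
        requested
    }
}

/// Keep hits inside `[start_ts, end_ts]` (inclusive; either bound may
/// be absent), preserving rank order, then truncate to `limit`.
pub fn filter_by_window(
    hits: Vec<SearchHit>,
    start_ts: Option<i64>,
    end_ts: Option<i64>,
    limit: usize,
) -> Vec<SearchHit> {
    hits.into_iter()
        .filter(|h| start_ts.map_or(true, |s| h.timestamp >= s))
        .filter(|h| end_ts.map_or(true, |e| h.timestamp <= e))
        .take(limit)
        .collect()
}
```

The caller would request `fetch_limit(limit, start_ts.is_some() || end_ts.is_some())` hits from the search endpoint and then pass the response through `filter_by_window` with the original `limit`.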
|
||||
|
||||
---
|
||||
|
||||
## Implementation sequencing
|
||||
|
||||
Each step is independently mergeable.
|
||||
|
||||
### ImageApi PRs
|
||||
|
||||
1. **Split system-prompt assembly + neutralize user message.** Two
|
||||
named blocks; user message context-only. Default identity string
|
||||
added. Tests: golden snapshots of the resulting `system_content`
|
||||
with and without `custom_system_prompt`.
|
||||
2. **`system_prompt` field on chat request + swap/restore + amend
|
||||
persistence.** Mirrors `annotate_system_with_budget` pattern. Tests:
|
||||
round-trip system content unchanged in append mode; persisted in
|
||||
amend mode.
|
||||
3. **`fetch_messages_for_contact` honors `days_radius`.** Tool wires
|
||||
the param through. Tests: window math at the client level.
|
||||
4. **`ToolGateOpts` + per-tool description rewrites.** Description
|
||||
text changes are the bulk of the diff but no behavior change beyond
|
||||
gating.
|
||||
|
||||
### FileViewer-React PR
|
||||
|
||||
5. **Chat hook sends `system_prompt`; modal gets style-note input;
|
||||
built-in personas updated to specify shape.** The
|
||||
`useInsightChat.sendTurn` call site picks up the persona and
|
||||
includes it on every chat turn body. Style-note input is a one-shot
|
||||
suffix.
|
||||
|
||||
## Testing & verification
|
||||
|
||||
**Automated:**
|
||||
- Unit (Rust): swap-and-restore round-trip preserves stored transcript.
|
||||
- Unit (Rust): amend mode persists override into new insight row.
|
||||
- Unit (Rust): `fetch_messages_for_contact(days_radius=N)` produces a
|
||||
window of `2N` days centered on `center_ts`.
|
||||
- Unit (Rust): `build_tool_definitions(opts)` excludes gated tools when
|
||||
the corresponding flag is false.
|
||||
|
||||
**Manual:**
|
||||
- Run a chat turn against an existing insight without `system_prompt` →
|
||||
output unchanged from baseline.
|
||||
- Same insight, with override → output reflects new voice.
|
||||
- Re-open chat → original baked persona still authoritative (override
|
||||
was ephemeral).
|
||||
- Regenerate an insight with the journal persona → model's voice
|
||||
matches journal style; no "memory assistant" framing leaks through.
|
||||
- Toggle data presence (delete every row from `calendar_events` so the
presence probe finds nothing) → tool drops from the catalog on the
next turn.
|
||||
|
||||
## Risks
|
||||
|
||||
- **Default identity wording matters.** A too-neutral default ("Use the
|
||||
gathered context to write a summary") might produce flatter output
|
||||
than today's "personal photo memory assistant" framing for users
|
||||
who never set a persona. Mitigation: tune the default with a small
|
||||
set of test photos before merging.
|
||||
- **Persona-suffix style notes can contradict persona voice.** A user
|
||||
who picks `journal` (first person, warm) and adds the style note
|
||||
"respond in bullet points" will get a tonal collision. Acceptable —
|
||||
the user expressed a per-turn intent and we honor it. Document the
|
||||
composition rule in the persona-manager UI.
|
||||
- **Lazy data-presence probes add a per-turn `SELECT 1`.** Negligible
|
||||
on SQLite (sub-millisecond) but adds up across many turns. Cache the
|
||||
result for the turn's duration; don't re-probe per-tool.
|
||||
|
||||
## Open questions
|
||||
|
||||
None blocking. Items deferred to a possible follow-up cycle:
|
||||
|
||||
- Apollo parity for the same per-turn override pattern (already
present; it just needs RN client wiring on the photo path, which is
already proxied).
|
||||
- Tool consolidation (`search_messages` + `get_sms_messages` →
|
||||
single `search_messages` with optional date filter, Apollo-style).
|
||||
Considered and deferred — separate spec.
|
||||
@@ -383,10 +383,7 @@ mod tests {
|
||||
// body cap and rejected normal-size photos before they reached
|
||||
// the backend.
|
||||
assert!(is_transient(&classify_error_response(408, "")));
|
||||
assert!(is_transient(&classify_error_response(
|
||||
413,
|
||||
"<html>nginx</html>"
|
||||
)));
|
||||
assert!(is_transient(&classify_error_response(413, "<html>nginx</html>")));
|
||||
assert!(is_transient(&classify_error_response(429, "{}")));
|
||||
}
|
||||
|
||||
|
||||
+13
-71
@@ -48,11 +48,6 @@ pub struct GeneratePhotoInsightRequest {
|
||||
/// falls back to `DEFAULT_FEWSHOT_INSIGHT_IDS`.
|
||||
#[serde(default)]
|
||||
pub fewshot_insight_ids: Option<Vec<i32>>,
|
||||
/// Active persona id for this generation. New facts are tagged with
|
||||
/// it (`entity_facts.persona_id`); recall during the agentic loop is
|
||||
/// scoped to it. Defaults to `"default"` when absent.
|
||||
#[serde(default)]
|
||||
pub persona_id: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Deserialize)]
|
||||
@@ -305,14 +300,11 @@ pub async fn get_all_insights_handler(
|
||||
#[post("/insights/generate/agentic")]
|
||||
pub async fn generate_agentic_insight_handler(
|
||||
http_request: HttpRequest,
|
||||
claims: Claims,
|
||||
_claims: Claims,
|
||||
request: web::Json<GeneratePhotoInsightRequest>,
|
||||
insight_generator: web::Data<InsightGenerator>,
|
||||
insight_dao: web::Data<std::sync::Mutex<Box<dyn InsightDao>>>,
|
||||
) -> impl Responder {
|
||||
// Service tokens (sub: "service:apollo") fall through to user_id=1
|
||||
// — the operator convention. Mobile/web clients have a numeric sub.
|
||||
let user_id = claims.sub.parse::<i32>().unwrap_or(1);
|
||||
let parent_context = extract_context_from_request(&http_request);
|
||||
let tracer = global_tracer();
|
||||
let mut span = tracer.start_with_context("http.insights.generate_agentic", &parent_context);
|
||||
@@ -384,13 +376,6 @@ pub async fn generate_agentic_insight_handler(
|
||||
.collect()
|
||||
};
|
||||
|
||||
let persona_id = request
|
||||
.persona_id
|
||||
.clone()
|
||||
.filter(|s| !s.trim().is_empty())
|
||||
.unwrap_or_else(|| "default".to_string());
|
||||
span.set_attribute(KeyValue::new("persona_id", persona_id.clone()));
|
||||
|
||||
let result = insight_generator
|
||||
.generate_agentic_insight_for_photo(
|
||||
&normalized_path,
|
||||
@@ -405,8 +390,6 @@ pub async fn generate_agentic_insight_handler(
|
||||
request.backend.clone(),
|
||||
fewshot_examples,
|
||||
fewshot_ids,
|
||||
user_id,
|
||||
persona_id,
|
||||
)
|
||||
.await;
|
||||
|
||||
@@ -657,23 +640,8 @@ pub struct ChatTurnHttpRequest {
|
||||
pub min_p: Option<f32>,
|
||||
#[serde(default)]
|
||||
pub max_iterations: Option<usize>,
|
||||
/// Per-turn system-prompt override. Ephemeral in append mode,
|
||||
/// persisted in amend / regenerate mode. See ChatTurnRequest for
|
||||
/// semantics. Also seeds the bootstrap path when no insight exists.
|
||||
#[serde(default)]
|
||||
pub system_prompt: Option<String>,
|
||||
/// Active persona id for this turn. New facts/recalls scope to it.
|
||||
/// Defaults to `"default"` when missing.
|
||||
#[serde(default)]
|
||||
pub persona_id: Option<String>,
|
||||
#[serde(default)]
|
||||
pub amend: bool,
|
||||
/// When true, force the bootstrap path even if an insight already
|
||||
/// exists: flip the existing row(s) to `is_current=false` and create
|
||||
/// a new insight row from this turn. Takes precedence over `amend`.
|
||||
/// Collapses to a normal bootstrap when no insight exists.
|
||||
#[serde(default)]
|
||||
pub regenerate: bool,
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize)]
|
||||
@@ -696,7 +664,7 @@ pub struct ChatTurnHttpResponse {
|
||||
#[post("/insights/chat")]
|
||||
pub async fn chat_turn_handler(
|
||||
http_request: HttpRequest,
|
||||
claims: Claims,
|
||||
_claims: Claims,
|
||||
request: web::Json<ChatTurnHttpRequest>,
|
||||
app_state: web::Data<AppState>,
|
||||
) -> impl Responder {
|
||||
@@ -715,14 +683,8 @@ pub async fn chat_turn_handler(
|
||||
}
|
||||
};
|
||||
|
||||
// Service-token claims (sub: "service:apollo") fall through to
|
||||
// user_id=1 — the operator convention. Mobile/web clients have a
|
||||
// numeric sub. Required for the entity_facts composite FK.
|
||||
let user_id = claims.sub.parse::<i32>().unwrap_or(1);
|
||||
|
||||
let chat_req = ChatTurnRequest {
|
||||
library_id: library.id,
|
||||
user_id,
|
||||
file_path: request.file_path.clone(),
|
||||
user_message: request.user_message.clone(),
|
||||
model: request.model.clone(),
|
||||
@@ -733,10 +695,7 @@ pub async fn chat_turn_handler(
|
||||
top_k: request.top_k,
|
||||
min_p: request.min_p,
|
||||
max_iterations: request.max_iterations,
|
||||
system_prompt: request.system_prompt.clone(),
|
||||
persona_id: request.persona_id.clone(),
|
||||
amend: request.amend,
|
||||
regenerate: request.regenerate,
|
||||
};
|
||||
|
||||
match app_state.insight_chat.chat_turn(chat_req).await {
|
||||
@@ -874,18 +833,15 @@ pub async fn chat_history_handler(
|
||||
query: web::Query<ChatHistoryQuery>,
|
||||
app_state: web::Data<AppState>,
|
||||
) -> impl Responder {
|
||||
// library_id scopes the lookup so a regenerate on this library
|
||||
// isn't shadowed by an untouched is_current=true row in another
|
||||
// library for the same rel_path. load_history falls back to the
|
||||
// cross-library lookup when the scoped one misses, so a photo
|
||||
// with no insight in this library but one in another still
|
||||
// surfaces (the "show this photo's primary insight" merge case).
|
||||
let library = libraries::resolve_library_param(&app_state, query.library.as_deref())
|
||||
// library param parsed for parity with other insight endpoints, even
|
||||
// though load_history currently keys on file_path alone (matches the
|
||||
// existing get_insight DAO contract).
|
||||
let _library = libraries::resolve_library_param(&app_state, query.library.as_deref())
|
||||
.ok()
|
||||
.flatten()
|
||||
.unwrap_or_else(|| app_state.primary_library());
|
||||
|
||||
match app_state.insight_chat.load_history(library.id, &query.path) {
|
||||
match app_state.insight_chat.load_history(&query.path) {
|
||||
Ok(view) => HttpResponse::Ok().json(ChatHistoryHttpResponse {
|
||||
messages: view
|
||||
.messages
|
||||
@@ -927,7 +883,7 @@ pub async fn chat_history_handler(
|
||||
/// Returns `text/event-stream` with one event per chat stream event.
|
||||
#[post("/insights/chat/stream")]
|
||||
pub async fn chat_stream_handler(
|
||||
claims: Claims,
|
||||
_claims: Claims,
|
||||
request: web::Json<ChatTurnHttpRequest>,
|
||||
app_state: web::Data<AppState>,
|
||||
) -> HttpResponse {
|
||||
@@ -941,12 +897,8 @@ pub async fn chat_stream_handler(
|
||||
}
|
||||
};
|
||||
|
||||
// Service-token sub falls through to user_id=1 (see chat_turn_handler).
|
||||
let user_id = claims.sub.parse::<i32>().unwrap_or(1);
|
||||
|
||||
let chat_req = ChatTurnRequest {
|
||||
library_id: library.id,
|
||||
user_id,
|
||||
file_path: request.file_path.clone(),
|
||||
user_message: request.user_message.clone(),
|
||||
model: request.model.clone(),
|
||||
@@ -957,10 +909,7 @@ pub async fn chat_stream_handler(
|
||||
top_k: request.top_k,
|
||||
min_p: request.min_p,
|
||||
max_iterations: request.max_iterations,
|
||||
system_prompt: request.system_prompt.clone(),
|
||||
persona_id: request.persona_id.clone(),
|
||||
amend: request.amend,
|
||||
regenerate: request.regenerate,
|
||||
};
|
||||
|
||||
let service = app_state.insight_chat.clone();
|
||||
@@ -1012,9 +961,8 @@ fn render_sse_frame(ev: &ChatStreamEvent) -> String {
|
||||
tool_calls_made,
|
||||
iterations_used,
|
||||
truncated,
|
||||
prompt_tokens,
|
||||
eval_tokens,
|
||||
num_ctx,
|
||||
prompt_eval_count,
|
||||
eval_count,
|
||||
amended_insight_id,
|
||||
backend_used,
|
||||
model_used,
|
||||
@@ -1024,20 +972,14 @@ fn render_sse_frame(ev: &ChatStreamEvent) -> String {
|
||||
"tool_calls_made": tool_calls_made,
|
||||
"iterations_used": iterations_used,
|
||||
"truncated": truncated,
|
||||
"prompt_tokens": prompt_tokens,
|
||||
"eval_tokens": eval_tokens,
|
||||
"num_ctx": num_ctx,
|
||||
"prompt_eval_count": prompt_eval_count,
|
||||
"eval_count": eval_count,
|
||||
"amended_insight_id": amended_insight_id,
|
||||
"backend": backend_used,
|
||||
"model": model_used,
|
||||
}),
|
||||
),
|
||||
// Apollo's frontend SSE consumer (and its free-chat backend, which
|
||||
// is the de-facto convention) listens for `error_message`. Emitting
|
||||
// `error` here meant any failure on the photo-chat path (e.g.
|
||||
// "no insight found for path") was silently dropped, leaving an
|
||||
// empty assistant bubble with no clue why the turn died.
|
||||
ChatStreamEvent::Error(msg) => ("error_message", serde_json::json!({ "message": msg })),
|
||||
ChatStreamEvent::Error(msg) => ("error", serde_json::json!({ "message": msg })),
|
||||
};
|
||||
let data = serde_json::to_string(&payload).unwrap_or_else(|_| "{}".to_string());
|
||||
format!("event: {}\ndata: {}\n\n", event_name, data)
|
||||
|
||||
+152
-971
File diff suppressed because it is too large
+307
-1083
File diff suppressed because it is too large
@@ -8,7 +8,6 @@ pub mod llm_client;
|
||||
pub mod ollama;
|
||||
pub mod openrouter;
|
||||
pub mod sms_client;
|
||||
pub mod tag_client;
|
||||
|
||||
// strip_summary_boilerplate is used by binaries (test_daily_summary), not the library
|
||||
#[allow(unused_imports)]
|
||||
|
||||
+19
-97
@@ -20,36 +20,31 @@ impl SmsApiClient {
|
||||
}
|
||||
}
|
||||
|
||||
/// Compute a `[start, end]` unix-second window of `2 * radius_days`
|
||||
/// centered on `center_ts`. `radius_days < 1` is clamped to 1 to avoid
|
||||
/// degenerate zero-width windows.
|
||||
pub(crate) fn window_for_radius(center_ts: i64, radius_days: i64) -> (i64, i64) {
|
||||
let r = radius_days.max(1);
|
||||
let span = r * 86400;
|
||||
(center_ts - span, center_ts + span)
|
||||
}
|
||||
|
||||
/// Fetch messages for a specific contact within ±`radius_days` of the
|
||||
/// given timestamp. Falls back to all contacts when no messages found
|
||||
/// for the named contact. Sorted by proximity to the center timestamp.
|
||||
/// Fetch messages for a specific contact within ±4 days of the given timestamp
|
||||
/// Falls back to all contacts if no messages found for the specific contact
|
||||
/// Messages are sorted by proximity to the center timestamp
|
||||
pub async fn fetch_messages_for_contact(
|
||||
&self,
|
||||
contact: Option<&str>,
|
||||
center_timestamp: i64,
|
||||
radius_days: i64,
|
||||
) -> Result<Vec<SmsMessage>> {
|
||||
let effective_radius = radius_days.max(1);
|
||||
let (start_ts, end_ts) = Self::window_for_radius(center_timestamp, radius_days);
|
||||
use chrono::Duration;
|
||||
|
||||
// Calculate ±4 days range around the center timestamp
|
||||
let center_dt = chrono::DateTime::from_timestamp(center_timestamp, 0)
|
||||
.ok_or_else(|| anyhow::anyhow!("Invalid timestamp"))?;
|
||||
|
||||
let start_dt = center_dt - Duration::days(4);
|
||||
let end_dt = center_dt + Duration::days(4);
|
||||
|
||||
let start_ts = start_dt.timestamp();
|
||||
let end_ts = end_dt.timestamp();
|
||||
|
||||
// If contact specified, try fetching for that contact first
|
||||
if let Some(contact_name) = contact {
|
||||
log::info!(
|
||||
"Fetching SMS for contact: {} (±{} days from {})",
|
||||
"Fetching SMS for contact: {} (±4 days from {})",
|
||||
contact_name,
|
||||
effective_radius,
|
||||
center_dt.format("%Y-%m-%d %H:%M:%S")
|
||||
);
|
||||
let messages = self
|
||||
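The windowing and "sorted by proximity" behavior described in the doc comments of this hunk can be sketched in isolation. This is an illustrative sketch only: `Msg` and `messages_near` are hypothetical stand-ins, not the real `SmsMessage` type or the SMS-API call, and the diff does not show where the real sort happens.

```rust
/// Hypothetical stand-in for the real `SmsMessage`; only the timestamp matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct Msg {
    pub timestamp: i64,
    pub body: String,
}

const SECS_PER_DAY: i64 = 86_400;

/// Same arithmetic as `window_for_radius` in the diff: clamp the radius to
/// at least one day, then extend that many days on either side.
pub fn window_for_radius(center_ts: i64, radius_days: i64) -> (i64, i64) {
    let span = radius_days.max(1) * SECS_PER_DAY;
    (center_ts - span, center_ts + span)
}

/// Keep messages inside the window and order them nearest-first.
/// (Assumed client-side behavior, for illustration only.)
pub fn messages_near(mut msgs: Vec<Msg>, center_ts: i64, radius_days: i64) -> Vec<Msg> {
    let (start, end) = window_for_radius(center_ts, radius_days);
    msgs.retain(|m| (start..=end).contains(&m.timestamp));
    // Stable sort: messages equally distant keep their input order.
    msgs.sort_by_key(|m| (m.timestamp - center_ts).abs());
    msgs
}
```

With a radius of 4 days, a message 60 seconds before the center sorts ahead of one 3 days after it, and a message 10 days away is dropped.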
@@ -73,8 +68,7 @@ impl SmsApiClient {
|
||||
|
||||
// Fallback to all contacts
|
||||
log::info!(
|
||||
"Fetching all SMS messages (±{} days from {})",
|
||||
effective_radius,
|
||||
"Fetching all SMS messages (±4 days from {})",
|
||||
center_dt.format("%Y-%m-%d %H:%M:%S")
|
||||
);
|
||||
self.fetch_messages(start_ts, end_ts, None, Some(center_timestamp))
|
||||
@@ -257,45 +251,23 @@ impl SmsApiClient {
|
||||
}
|
||||
|
||||
/// Search message bodies via the Django side's FTS5 / semantic / hybrid
|
||||
/// endpoint. `params.mode` selects the ranking strategy:
|
||||
/// endpoint. `mode` selects the ranking strategy:
|
||||
/// - "fts5" keyword-only, supports phrase / prefix / boolean / NEAR
|
||||
/// - "semantic" embedding similarity
|
||||
/// - "hybrid" both merged via reciprocal rank fusion (recommended)
|
||||
///
|
||||
/// All of `contact_id`, `date_from` / `date_to` (unix seconds), `is_mms`,
|
||||
/// `has_media`, and `offset` are pushed to SMS-API server-side so the
|
||||
/// filtered+paginated result set is exact rather than a client-side
|
||||
/// over-fetch.
|
||||
pub async fn search_messages(
|
||||
&self,
|
||||
query: &str,
|
||||
params: &SmsSearchParams<'_>,
|
||||
mode: &str,
|
||||
limit: usize,
|
||||
) -> Result<Vec<SmsSearchHit>> {
|
||||
let mut url = format!(
|
||||
let url = format!(
|
||||
"{}/api/messages/search/?q={}&mode={}&limit={}",
|
||||
self.base_url,
|
||||
urlencoding::encode(query),
|
||||
urlencoding::encode(params.mode),
|
||||
params.limit,
|
||||
urlencoding::encode(mode),
|
||||
limit
|
||||
);
|
||||
if let Some(cid) = params.contact_id {
|
||||
url.push_str(&format!("&contact_id={}", cid));
|
||||
}
|
||||
if let Some(off) = params.offset {
|
||||
url.push_str(&format!("&offset={}", off));
|
||||
}
|
||||
if let Some(from) = params.date_from {
|
||||
url.push_str(&format!("&date_from={}", from));
|
||||
}
|
||||
if let Some(to) = params.date_to {
|
||||
url.push_str(&format!("&date_to={}", to));
|
||||
}
|
||||
if let Some(is_mms) = params.is_mms {
|
||||
url.push_str(&format!("&is_mms={}", is_mms));
|
||||
}
|
||||
if let Some(has_media) = params.has_media {
|
||||
url.push_str(&format!("&has_media={}", has_media));
|
||||
}
|
||||
|
||||
let mut request = self.client.get(&url);
|
||||
if let Some(token) = &self.token {
|
||||
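The optional-filter URL construction in this hunk can be tried out without the HTTP client. The sketch below is an assumption-laden approximation: `SearchFilters`, `build_search_url`, and the hand-rolled `percent_encode` (standing in for the `urlencoding` crate) are hypothetical names, and only the query-parameter names and their order are taken from the diff.

```rust
/// Hypothetical mirror of the optional filters shown in the diff.
#[derive(Debug, Clone, Default)]
pub struct SearchFilters {
    pub contact_id: Option<i64>,
    pub offset: Option<usize>,
    pub date_from: Option<i64>,
    pub date_to: Option<i64>,
    pub is_mms: Option<bool>,
    pub has_media: Option<bool>,
}

/// Minimal percent-encoder (stand-in for the `urlencoding` crate): keeps
/// RFC 3986 unreserved characters and escapes every other byte.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Build the search URL, appending only the filters that are set.
pub fn build_search_url(base: &str, query: &str, mode: &str, limit: usize, f: &SearchFilters) -> String {
    let mut pairs: Vec<(&str, String)> = vec![
        ("q", percent_encode(query)),
        ("mode", percent_encode(mode)),
        ("limit", limit.to_string()),
    ];
    if let Some(v) = f.contact_id { pairs.push(("contact_id", v.to_string())); }
    if let Some(v) = f.offset { pairs.push(("offset", v.to_string())); }
    if let Some(v) = f.date_from { pairs.push(("date_from", v.to_string())); }
    if let Some(v) = f.date_to { pairs.push(("date_to", v.to_string())); }
    if let Some(v) = f.is_mms { pairs.push(("is_mms", v.to_string())); }
    if let Some(v) = f.has_media { pairs.push(("has_media", v.to_string())); }
    let qs: Vec<String> = pairs.iter().map(|(k, v)| format!("{k}={v}")).collect();
    format!("{}/api/messages/search/?{}", base, qs.join("&"))
}
```

Collecting key/value pairs first keeps the "append only when `Some`" rule in one place, which is one possible alternative to the chain of `push_str` calls in the diff.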
@@ -398,30 +370,6 @@ pub struct SmsSearchHit {
|
||||
/// Present for semantic / hybrid modes; absent for fts5.
|
||||
#[serde(default)]
|
||||
pub similarity_score: Option<f32>,
|
||||
/// SMS-API-generated excerpt around the match, wrapped in `<mark>` tags.
|
||||
/// For MMS messages that only matched via attachment text / filename
|
||||
/// (empty `body`), the snippet is the only meaningful preview.
|
||||
#[serde(default)]
|
||||
pub snippet: Option<String>,
|
||||
}
|
||||
|
||||
/// Optional filter / paging knobs for [`SmsApiClient::search_messages`].
/// All fields except `mode` and `limit` map 1:1 to the same-named SMS-API
/// query params (added in the 2026-05 search-enhancements release).
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct SmsSearchParams<'a> {
|
||||
pub mode: &'a str,
|
||||
pub limit: usize,
|
||||
pub contact_id: Option<i64>,
|
||||
/// Unix-seconds inclusive lower bound on `date`.
|
||||
pub date_from: Option<i64>,
|
||||
/// Unix-seconds inclusive upper bound on `date`.
|
||||
pub date_to: Option<i64>,
|
||||
/// `Some(true)` = MMS only, `Some(false)` = SMS only, `None` = both.
|
||||
pub is_mms: Option<bool>,
|
||||
/// `Some(true)` = only messages with image/video/audio attachments.
|
||||
pub has_media: Option<bool>,
|
||||
pub offset: Option<usize>,
|
||||
}
|
||||
|
||||
#[derive(Deserialize)]
|
||||
@@ -431,29 +379,3 @@ struct SmsSearchResponse {
|
||||
#[serde(default)]
|
||||
search_method: String,
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn window_for_radius_produces_2n_day_span() {
|
||||
let center: i64 = 1_700_000_000;
|
||||
let (start, end) = SmsApiClient::window_for_radius(center, 7);
|
||||
assert_eq!(end - start, 14 * 86400);
|
||||
assert_eq!(start + 7 * 86400, center);
|
||||
assert_eq!(end - 7 * 86400, center);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn window_for_radius_clamps_zero_to_one() {
|
||||
let (start, end) = SmsApiClient::window_for_radius(100_000, 0);
|
||||
assert_eq!(end - start, 2 * 86400);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn window_for_radius_clamps_negative_to_one() {
|
||||
let (start, end) = SmsApiClient::window_for_radius(100_000, -7);
|
||||
assert_eq!(end - start, 2 * 86400);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,319 +0,0 @@
|
||||
//! Thin async HTTP client for Apollo's `/api/internal/tags/*` endpoints.
|
||||
//!
|
||||
//! Apollo hosts the RAM++ auto-tag inference service alongside insightface.
|
||||
//! This client is the ImageApi side — shove image bytes through `/auto` and
|
||||
//! get back a list of `(name, confidence)` predictions over RAM++'s
|
||||
//! ~4585-tag vocabulary.
|
||||
//!
|
||||
//! Mirrors `face_client.rs` shape: optional base URL (None = disabled), one
|
||||
//! reqwest client with a generous timeout because GPU inference under a
|
||||
//! backlog can queue server-side (Apollo's threadpool is bounded to 1
|
||||
//! worker on CUDA).
|
||||
//!
|
||||
//! Configured via `APOLLO_TAG_API_BASE_URL`, falling back to
|
||||
//! `APOLLO_API_BASE_URL` when the dedicated var is unset (single-Apollo
|
||||
//! deploys are the common case). Both unset → `is_enabled()` returns false
|
||||
//! and the probe binary / future backlog drain no-op.
|
||||
//!
|
||||
//! Wire format: multipart/form-data with `file=<bytes>` and `meta=<json>`.
|
||||
//! `meta` carries `{content_hash, library_id, rel_path, threshold?}` —
|
||||
//! Apollo logs the path/lib for traceability and reads `threshold` to
|
||||
//! override the engine default for that call (the probe binary uses this
|
||||
//! to sweep without restarting Apollo).
|
||||
//!
|
||||
//! Error mapping (reflected in [`TagDetectError`]):
//! - 422 `decode_failed` → permanent: ImageApi marks `status='failed'` and
//!   doesn't retry until a manual rerun.
//! - 200 with `tags:[]` → `status='no_tags'` marker (success-with-zero).
//! - 503 `cuda_oom` / `engine_unavailable` → defer-and-retry: no marker
//!   written.
//! - Any other 5xx / network error → defer.
|
||||
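The configuration precedence described in the module comment (dedicated variable first, shared variable as fallback, both unset means disabled) can be expressed as a pure function, which makes it testable without touching the process environment. `resolve_base_url` is a hypothetical helper for illustration and is not part of the diff.

```rust
/// Pure version of the precedence rule: the dedicated value wins, the shared
/// one is the fallback, and blank strings count as unset. The trailing slash
/// is trimmed so later URL building does not double it up.
pub fn resolve_base_url(dedicated: Option<&str>, shared: Option<&str>) -> Option<String> {
    dedicated
        .filter(|s| !s.trim().is_empty())
        .or_else(|| shared.filter(|s| !s.trim().is_empty()))
        .map(|s| s.trim_end_matches('/').to_string())
}
```

A caller could feed it `std::env::var("APOLLO_TAG_API_BASE_URL").ok().as_deref()` and the matching lookup for `APOLLO_API_BASE_URL`; a `None` result corresponds to the disabled client.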
|
||||
use anyhow::{Context, Result};
|
||||
use reqwest::Client;
|
||||
use serde::{Deserialize, Serialize};
|
||||
use std::time::Duration;
|
||||
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
pub struct TagMeta {
|
||||
pub content_hash: String,
|
||||
pub library_id: i32,
|
||||
pub rel_path: String,
|
||||
/// Per-call threshold override. Apollo's engine default (0.68 for
|
||||
/// ram_plus_swin_large_14m) is used when unset. The probe binary
|
||||
/// uses this to sweep without restarting Apollo.
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub threshold: Option<f32>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Deserialize)]
|
||||
pub struct TagPrediction {
|
||||
pub name: String,
|
||||
pub confidence: f32,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Deserialize)]
|
||||
pub struct TagResponse {
|
||||
pub model_version: String,
|
||||
pub duration_ms: i64,
|
||||
pub threshold: f32,
|
||||
pub tags: Vec<TagPrediction>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Deserialize)]
|
||||
#[allow(dead_code)] // Reported by Apollo; load_error consumed by future health probe
|
||||
pub struct TagHealth {
|
||||
pub loaded: bool,
|
||||
pub device: String,
|
||||
pub model_version: String,
|
||||
pub image_size: i32,
|
||||
pub threshold: f32,
|
||||
#[serde(default)]
|
||||
pub load_error: Option<String>,
|
||||
}
|
||||
|
||||
/// Distinguishes permanent failures (don't retry) from transient ones
|
||||
/// (defer and retry on next scan tick). Mirrors `FaceDetectError` so the
|
||||
/// future backlog drain can use the same marker-row decision tree.
|
||||
#[derive(Debug)]
|
||||
pub enum TagDetectError {
|
||||
/// Apollo refused the bytes for a reason that won't change on retry
|
||||
/// (decode failure, zero-dim image). Mark `status='failed'`.
|
||||
Permanent(anyhow::Error),
|
||||
/// Apollo couldn't process this turn but might next time (CUDA OOM,
|
||||
/// engine not loaded yet, network hiccup). Don't mark anything.
|
||||
Transient(anyhow::Error),
|
||||
/// Feature is disabled (no APOLLO_TAG_API_BASE_URL / APOLLO_API_BASE_URL).
|
||||
/// Caller should silently no-op.
|
||||
Disabled,
|
||||
}
|
||||
|
||||
impl std::fmt::Display for TagDetectError {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
match self {
|
||||
TagDetectError::Permanent(e) => write!(f, "permanent: {e}"),
|
||||
TagDetectError::Transient(e) => write!(f, "transient: {e}"),
|
||||
TagDetectError::Disabled => write!(f, "tag client disabled"),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl std::error::Error for TagDetectError {}
|
||||
|
||||
#[derive(Clone)]
|
||||
pub struct TagClient {
|
||||
client: Client,
|
||||
/// `None` → disabled. Trailing slash trimmed at construction so url
|
||||
/// building doesn't double up.
|
||||
base_url: Option<String>,
|
||||
}
|
||||
|
||||
impl TagClient {
|
||||
pub fn new(base_url: Option<String>) -> Self {
|
||||
// 60 s timeout: GPU inference is fast (~50–150 ms on RTX-class
|
||||
// hardware) but Apollo's 1-worker threadpool means a backlog drain
|
||||
// queues server-side. 60 s is enough headroom for a small queue
|
||||
// depth without surfacing a false transient.
|
||||
let timeout_secs = std::env::var("TAG_DETECT_TIMEOUT_SEC")
|
||||
.ok()
|
||||
.and_then(|s| s.parse::<u64>().ok())
|
||||
.unwrap_or(60);
|
||||
let client = Client::builder()
|
||||
.timeout(Duration::from_secs(timeout_secs))
|
||||
.build()
|
||||
.expect("reqwest client build");
|
||||
Self {
|
||||
client,
|
||||
base_url: base_url.map(|u| u.trim_end_matches('/').to_string()),
|
||||
}
|
||||
}
|
||||
|
||||
/// Construct a client from the standard env vars. APOLLO_TAG_API_BASE_URL
|
||||
/// wins; falls back to APOLLO_API_BASE_URL. Both unset → disabled.
|
||||
pub fn from_env() -> Self {
|
||||
let base = std::env::var("APOLLO_TAG_API_BASE_URL")
|
||||
.ok()
|
||||
.filter(|s| !s.trim().is_empty())
|
||||
.or_else(|| {
|
||||
std::env::var("APOLLO_API_BASE_URL")
|
||||
.ok()
|
||||
.filter(|s| !s.trim().is_empty())
|
||||
});
|
||||
Self::new(base)
|
||||
}
|
||||
|
||||
pub fn is_enabled(&self) -> bool {
|
||||
self.base_url.is_some()
|
||||
}
|
||||
|
||||
/// Run RAM++ auto-tagging over `bytes`. Empty `tags[]` is the no-tags
|
||||
/// signal — caller writes a marker row in the persistence phase.
|
||||
pub async fn auto_tag(
|
||||
&self,
|
||||
bytes: Vec<u8>,
|
||||
meta: TagMeta,
|
||||
) -> std::result::Result<TagResponse, TagDetectError> {
|
||||
let Some(base) = self.base_url.as_deref() else {
|
||||
return Err(TagDetectError::Disabled);
|
||||
};
|
||||
let url = format!("{}/api/internal/tags/auto", base);
|
||||
self.post_multipart(&url, bytes, &meta).await
|
||||
}
|
||||
|
||||
/// Engine reachability + device/model report.
|
||||
#[allow(dead_code)] // consumed by future startup probe
|
||||
pub async fn health(&self) -> Result<TagHealth> {
|
||||
let base = self.base_url.as_deref().context("tag client disabled")?;
|
||||
let url = format!("{}/api/internal/tags/health", base);
|
||||
let resp = self.client.get(&url).send().await?.error_for_status()?;
|
||||
let body: TagHealth = resp.json().await?;
|
||||
Ok(body)
|
||||
}
|
||||
|
||||
async fn post_multipart(
|
||||
&self,
|
||||
url: &str,
|
||||
bytes: Vec<u8>,
|
||||
meta: &TagMeta,
|
||||
) -> std::result::Result<TagResponse, TagDetectError> {
|
||||
let meta_json = serde_json::to_string(meta)
|
||||
.map_err(|e| TagDetectError::Permanent(anyhow::anyhow!("meta serialize: {e}")))?;
|
||||
let form = reqwest::multipart::Form::new()
|
||||
.text("meta", meta_json)
|
||||
.part(
|
||||
"file",
|
||||
reqwest::multipart::Part::bytes(bytes)
|
||||
.file_name(meta.rel_path.clone())
|
||||
.mime_str("application/octet-stream")
|
||||
.unwrap_or_else(|_| reqwest::multipart::Part::bytes(Vec::new())),
|
||||
);
|
||||
|
||||
let resp = match self.client.post(url).multipart(form).send().await {
|
||||
Ok(r) => r,
|
||||
Err(e) if e.is_timeout() || e.is_connect() => {
|
||||
return Err(TagDetectError::Transient(anyhow::anyhow!(
|
||||
"tag client network: {e}"
|
||||
)));
|
||||
}
|
||||
Err(e) => {
|
||||
return Err(TagDetectError::Transient(anyhow::anyhow!(
|
||||
"tag client request: {e}"
|
||||
)));
|
||||
}
|
||||
};
|
||||
|
||||
let status = resp.status();
|
||||
if status.is_success() {
|
||||
let body: TagResponse = resp.json().await.map_err(|e| {
|
||||
TagDetectError::Transient(anyhow::anyhow!("tag response decode: {e}"))
|
||||
})?;
|
||||
return Ok(body);
|
||||
}
|
||||
|
||||
let body_text = resp.text().await.unwrap_or_default();
|
||||
Err(classify_error_response(status.as_u16(), &body_text))
|
||||
}
|
||||
}
|
||||
|
||||
/// Pulled out as a pure function so the marker-row contract is unit-testable
|
||||
/// without spinning up an HTTP server. Behavior matches face_client::classify
|
||||
/// so the future backlog drain can share the same retry policy.
|
||||
fn classify_error_response(status: u16, body_text: &str) -> TagDetectError {
|
||||
let detail_code = serde_json::from_str::<serde_json::Value>(body_text)
|
||||
.ok()
|
||||
.and_then(|v| {
|
||||
v.get("detail")
|
||||
.and_then(|d| d.as_str().map(str::to_string))
|
||||
.or_else(|| {
|
||||
v.get("detail")
|
||||
.and_then(|d| d.get("code"))
|
||||
.and_then(|c| c.as_str())
|
||||
.map(str::to_string)
|
||||
})
|
||||
})
|
||||
.unwrap_or_default();
|
||||
|
||||
if status == 422 {
|
||||
return TagDetectError::Permanent(anyhow::anyhow!(
|
||||
"tag detect 422 {}: {}",
|
||||
detail_code,
|
||||
body_text
|
||||
));
|
||||
}
|
||||
if status == 503 {
|
||||
return TagDetectError::Transient(anyhow::anyhow!(
|
||||
"tag detect 503 {}: {}",
|
||||
detail_code,
|
||||
body_text
|
||||
));
|
||||
}
|
||||
// 408 / 413 / 429 are operator-fixable infra issues — defer so the
|
||||
// next pass retries naturally once the proxy is fixed (see
|
||||
// face_client::classify_error_response for the cautionary tale).
|
||||
if matches!(status, 408 | 413 | 429) {
|
||||
return TagDetectError::Transient(anyhow::anyhow!(
|
||||
"tag detect {} {}: {}",
|
||||
status,
|
||||
detail_code,
|
||||
body_text
|
||||
));
|
||||
}
|
||||
if (400..500).contains(&status) {
|
||||
TagDetectError::Permanent(anyhow::anyhow!(
|
||||
"tag detect {} {}: {}",
|
||||
status,
|
||||
detail_code,
|
||||
body_text
|
||||
))
|
||||
} else {
|
||||
TagDetectError::Transient(anyhow::anyhow!(
|
||||
"tag detect {} {}: {}",
|
||||
status,
|
||||
detail_code,
|
||||
body_text
|
||||
))
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
fn is_permanent(e: &TagDetectError) -> bool {
|
||||
matches!(e, TagDetectError::Permanent(_))
|
||||
}
|
||||
fn is_transient(e: &TagDetectError) -> bool {
|
||||
matches!(e, TagDetectError::Transient(_))
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_422_decode_failed_is_permanent() {
|
||||
let e = classify_error_response(422, r#"{"detail":"decode_failed: bad bytes"}"#);
|
||||
assert!(is_permanent(&e));
|
||||
assert!(format!("{e}").contains("decode_failed"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_503_cuda_oom_is_transient() {
|
||||
let e = classify_error_response(
|
||||
503,
|
||||
r#"{"detail":{"code":"cuda_oom","error":"out of memory"}}"#,
|
||||
);
|
||||
assert!(is_transient(&e));
|
||||
assert!(format!("{e}").contains("cuda_oom"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_5xx_is_transient_other_4xx_is_permanent() {
|
||||
assert!(is_transient(&classify_error_response(500, "")));
|
||||
assert!(is_permanent(&classify_error_response(400, "{}")));
|
||||
assert!(is_permanent(&classify_error_response(404, "{}")));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_infra_4xx_is_transient() {
|
||||
assert!(is_transient(&classify_error_response(408, "")));
|
||||
assert!(is_transient(&classify_error_response(413, "<html>")));
|
||||
assert!(is_transient(&classify_error_response(429, "{}")));
|
||||
}
|
||||
}
|
||||
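The module comment ties each outcome to a marker-row decision, but the persistence step itself is described as future work. A minimal sketch of that decision table, assuming a simplified `TagOutcome` enum (hypothetical) and an assumed `"tagged"` status for the non-empty success case, which the source does not name:

```rust
/// Hypothetical outcome of one auto-tag call, reduced to what a
/// persistence step would need.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TagOutcome {
    /// 200 response carrying this many predictions.
    Tags(usize),
    /// Failure that will not change on retry (for example 422).
    Permanent,
    /// Failure that may clear up later (for example 503 or a network error).
    Transient,
    /// No base URL configured.
    Disabled,
}

/// Marker status to persist, or `None` to write nothing and retry later.
pub fn marker_status(outcome: TagOutcome) -> Option<&'static str> {
    match outcome {
        TagOutcome::Tags(0) => Some("no_tags"),
        TagOutcome::Tags(_) => Some("tagged"), // assumed name; not in the source
        TagOutcome::Permanent => Some("failed"),
        TagOutcome::Transient | TagOutcome::Disabled => None,
    }
}
```

Returning `None` for transient and disabled outcomes matches the "no marker written" rule, so the row stays eligible for the next pass.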
-721
@@ -1,721 +0,0 @@
|
||||
//! Per-tick drains the watcher runs alongside ingest.
//!
//! These passes were previously inlined in `main.rs`; they exist because
//! a quick scan only walks recently-modified files, so any backlog of
//! rows missing a `content_hash` / `date_taken` / face detection
//! wouldn't otherwise drain except during the once-an-hour full scan.
//! Each function is bounded per call by a `*_PER_TICK` env-var cap.
|
||||
|
||||
use std::collections::HashMap;
|
||||
use std::path::PathBuf;
|
||||
use std::sync::{Arc, Mutex};
|
||||
|
||||
use log::{debug, info, warn};
|
||||
|
||||
use crate::content_hash;
|
||||
use crate::database::ExifDao;
|
||||
use crate::date_resolver;
|
||||
use crate::face_watch;
|
||||
use crate::faces;
|
||||
use crate::file_types;
|
||||
use crate::libraries;
|
||||
use crate::tags;
|
||||
|
||||
/// Compute and persist content_hash for image_exif rows where it's NULL.
|
||||
///
|
||||
/// Bounded per call by `FACE_HASH_BACKFILL_MAX_PER_TICK` (default 2000)
|
||||
/// so a watcher tick on a large legacy library doesn't block for hours
|
||||
/// blake3-ing every photo at once. Subsequent scans pick up the rest.
|
||||
/// For 50k+ libraries the dedicated `cargo run --bin backfill_hashes`
|
||||
/// is still faster (it doesn't fight a watcher loop for the DAO mutex).
|
||||
///
|
||||
/// Drains unhashed image_exif rows by querying them directly, independent
|
||||
/// of the filesystem walk. Quick scans only walk recently-modified files,
|
||||
/// so a backlog of pre-existing unhashed rows never enters
|
||||
/// `process_new_files`'s candidate set — left alone, it would only drain
|
||||
/// on full scans (default once an hour). Calling this every tick keeps
|
||||
/// the face-detection backlog moving regardless.
|
||||
///
|
||||
/// Returns the number of rows successfully backfilled this pass.
|
||||
pub fn backfill_unhashed_backlog(
|
||||
context: &opentelemetry::Context,
|
||||
library: &libraries::Library,
|
||||
exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
|
||||
) -> usize {
|
||||
let cap: i64 = dotenv::var("FACE_HASH_BACKFILL_MAX_PER_TICK")
|
||||
.ok()
|
||||
.and_then(|s| s.parse().ok())
|
||||
.filter(|n: &i64| *n > 0)
|
||||
.unwrap_or(2000);
|
||||
|
||||
// Fetch up to cap+1 rows so we can tell "more remain" without a
|
||||
// separate count query. Across libraries — there's no per-library
|
||||
// filter on get_rows_missing_hash today — but we only ever update
|
||||
// rows whose library_id matches the caller's library, so other
|
||||
// libraries' rows just get skipped here and picked up on the next
|
||||
// library's tick. Negligible cost given the cap.
|
||||
let rows: Vec<(i32, String)> = {
|
||||
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
|
||||
dao.get_rows_missing_hash(context, cap + 1)
|
||||
.unwrap_or_default()
|
||||
};
|
||||
if rows.is_empty() {
|
||||
return 0;
|
||||
}
|
||||
|
||||
let more_than_cap = rows.len() as i64 > cap;
|
||||
let base_path = std::path::Path::new(&library.root_path);
|
||||
|
||||
let mut backfilled = 0usize;
|
||||
let mut errors = 0usize;
|
||||
let mut skipped_other_lib = 0usize;
|
||||
for (lib_id, rel_path) in rows.iter().take(cap as usize) {
|
||||
if *lib_id != library.id {
|
||||
skipped_other_lib += 1;
|
||||
continue;
|
||||
}
|
||||
let abs = base_path.join(rel_path);
|
||||
if !abs.exists() {
|
||||
// File walked away — the watcher's reconciliation pass will
|
||||
// remove the orphan exif row eventually.
|
||||
continue;
|
||||
}
|
||||
match content_hash::compute(&abs) {
|
||||
Ok(id) => {
|
||||
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
|
||||
if let Err(e) = dao.backfill_content_hash(
|
||||
context,
|
||||
library.id,
|
||||
rel_path,
|
||||
&id.content_hash,
|
||||
id.size_bytes,
|
||||
) {
|
||||
warn!(
|
||||
"face_watch: backfill_content_hash failed for {}: {:?}",
|
||||
rel_path, e
|
||||
);
|
||||
errors += 1;
|
||||
} else {
|
||||
backfilled += 1;
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
debug!(
|
||||
"face_watch: hash compute failed for {} ({:?})",
|
||||
abs.display(),
|
||||
e
|
||||
);
|
||||
errors += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if backfilled > 0 || errors > 0 || more_than_cap {
|
||||
info!(
|
||||
"face_watch: backfill pass for library '{}': hashed {} ({} error(s), {} skipped to other libraries; {} cap, more_remain={})",
|
||||
library.name, backfilled, errors, skipped_other_lib, cap, more_than_cap
|
||||
);
|
||||
}
|
||||
backfilled
|
||||
}
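The "fetch `cap + 1` rows" trick used above avoids a separate count query: if the over-fetched result is longer than the cap, more rows remain. A small generic sketch of that idea, with `split_at_cap` as a hypothetical helper name:

```rust
/// Split an over-fetched result set into the rows to process this tick and a
/// flag saying whether more remain. `rows` is expected to come from a query
/// limited to `cap + 1`.
pub fn split_at_cap<T>(mut rows: Vec<T>, cap: usize) -> (Vec<T>, bool) {
    let more_remain = rows.len() > cap;
    rows.truncate(cap);
    (rows, more_remain)
}
```

The function in the diff gets the same effect with `rows.len() as i64 > cap` followed by `.take(cap as usize)`.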
|
||||
|
||||
/// Drain image_exif rows whose `date_taken` was never resolved or was
|
||||
/// resolved by the weakest fallback (`fs_time`). Runs the canonical-date
|
||||
/// waterfall — exiftool batch (one subprocess for the whole tick's
|
||||
/// rows) → filename regex → earliest_fs_time — and persists each
|
||||
/// resolution with its source tag. Capped per tick by
|
||||
/// `DATE_BACKFILL_MAX_PER_TICK` (default 500) so a 14k-row library
|
||||
/// drains over a few quick-scan ticks without blocking the watcher.
|
||||
///
|
||||
/// kamadak-exif is intentionally skipped here: the row already has a
|
||||
/// NULL date_taken because the ingest path's kamadak-exif call returned
|
||||
/// nothing, and re-running it would just produce the same answer.
|
||||
/// exiftool is the meaningful new attempt — it handles videos and
|
||||
/// MakerNote-hosted dates kamadak can't reach.
|
||||
pub fn backfill_missing_date_taken(
|
||||
context: &opentelemetry::Context,
|
||||
library: &libraries::Library,
|
||||
exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
|
||||
) -> usize {
|
||||
let cap: i64 = dotenv::var("DATE_BACKFILL_MAX_PER_TICK")
|
||||
.ok()
|
||||
.and_then(|s| s.parse().ok())
|
||||
.filter(|n: &i64| *n > 0)
|
||||
.unwrap_or(500);
|
||||
|
||||
let rows: Vec<(i32, String)> = {
|
||||
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
|
||||
dao.get_rows_needing_date_backfill(context, library.id, cap + 1)
|
||||
.unwrap_or_default()
|
||||
};
|
||||
if rows.is_empty() {
|
||||
return 0;
|
||||
}
|
||||
|
||||
let more_than_cap = rows.len() as i64 > cap;
|
||||
let base_path = std::path::Path::new(&library.root_path);
|
||||
|
||||
// Build absolute paths and drop rows whose files no longer exist —
|
||||
// the missing-file scan in library_maintenance retires deleted rows
|
||||
// separately. Without this filter, NULL-date rows for missing files
|
||||
// would loop through the drain forever (no source can resolve them).
|
||||
let mut existing: Vec<(String, PathBuf)> = Vec::with_capacity(rows.len());
|
||||
for (_, rel_path) in rows.iter().take(cap as usize) {
|
||||
let abs = base_path.join(rel_path);
|
||||
if abs.exists() {
|
||||
existing.push((rel_path.clone(), abs));
|
||||
}
|
||||
}
|
||||
if existing.is_empty() {
|
||||
return 0;
|
||||
}
|
||||
|
||||
// One exiftool subprocess for the whole batch; the resolver falls
|
||||
// through to filename / fs_time per file when exiftool can't supply
|
||||
// a date (or isn't installed at all).
|
||||
let paths: Vec<PathBuf> = existing.iter().map(|(_, p)| p.clone()).collect();
|
||||
let resolved = date_resolver::resolve_dates_batch(&paths, &HashMap::new());
|
||||
|
||||
let mut backfilled = 0usize;
|
||||
let mut unresolved = 0usize;
|
||||
let mut by_source: HashMap<&'static str, usize> = HashMap::new();
|
||||
{
|
||||
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
|
||||
for (rel_path, abs) in &existing {
|
||||
let Some(rd) = resolved.get(abs).copied() else {
|
||||
unresolved += 1;
|
||||
continue;
|
||||
};
|
||||
match dao.backfill_date_taken(
|
||||
context,
|
||||
library.id,
|
||||
rel_path,
|
||||
rd.timestamp,
|
||||
rd.source.as_str(),
|
||||
) {
|
||||
Ok(()) => {
|
||||
backfilled += 1;
|
||||
*by_source.entry(rd.source.as_str()).or_insert(0) += 1;
|
||||
}
|
||||
Err(e) => {
|
||||
warn!(
|
||||
"date_backfill: update failed for lib {} {}: {:?}",
|
||||
library.id, rel_path, e
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if backfilled > 0 || unresolved > 0 || more_than_cap {
|
||||
info!(
|
||||
"date_backfill: library '{}': resolved {} ({:?}), {} unresolved, cap={}, more_remain={}",
|
||||
library.name, backfilled, by_source, unresolved, cap, more_than_cap
|
||||
);
|
||||
}
|
||||
backfilled
|
||||
}
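The canonical-date waterfall described above (exiftool, then filename, then filesystem time) is a first-match-wins fallback chain. The sketch below shows only that ordering. `DateSource` and `resolve_date` are hypothetical, and of the three tag strings only `fs_time` appears in the source; the real `date_resolver` API is not visible in this diff.

```rust
/// Where a resolved date came from. Only `fs_time` is named in the source;
/// the other two tag strings are assumptions for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DateSource {
    Exiftool,
    Filename,
    FsTime,
}

impl DateSource {
    pub fn as_str(self) -> &'static str {
        match self {
            DateSource::Exiftool => "exiftool",
            DateSource::Filename => "filename",
            DateSource::FsTime => "fs_time",
        }
    }
}

/// Return the first available timestamp in priority order, with its source.
pub fn resolve_date(
    exiftool: Option<i64>,
    filename: Option<i64>,
    fs_time: Option<i64>,
) -> Option<(i64, DateSource)> {
    exiftool
        .map(|t| (t, DateSource::Exiftool))
        .or_else(|| filename.map(|t| (t, DateSource::Filename)))
        .or_else(|| fs_time.map(|t| (t, DateSource::FsTime)))
}
```

Persisting the source tag next to the timestamp is what lets a later pass revisit rows that were resolved only by the weakest fallback.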
|
||||
|
||||
/// Per-tick face-detection drain. Pulls a capped batch of hashed-but-
/// unscanned image_exif rows directly via the FaceDao anti-join and
/// hands them to the existing detection pass. Runs on every tick (not
/// just full scans) so the backlog moves at quick-scan cadence.
|
||||
pub fn process_face_backlog(
|
||||
context: &opentelemetry::Context,
|
||||
library: &libraries::Library,
|
||||
face_client: &crate::ai::face_client::FaceClient,
|
||||
face_dao: &Arc<Mutex<Box<dyn faces::FaceDao>>>,
|
||||
tag_dao: &Arc<Mutex<Box<dyn tags::TagDao>>>,
|
||||
excluded_dirs: &[String],
|
||||
) {
|
||||
let cap: i64 = dotenv::var("FACE_BACKLOG_MAX_PER_TICK")
|
||||
.ok()
|
||||
.and_then(|s| s.parse().ok())
|
||||
.filter(|n: &i64| *n > 0)
|
||||
.unwrap_or(64);
|
||||
|
||||
let rows: Vec<(String, String)> = {
|
||||
let mut dao = face_dao.lock().expect("face dao");
|
||||
match dao.list_unscanned_candidates(context, library.id, cap) {
|
||||
Ok(r) => r,
|
||||
Err(e) => {
|
||||
warn!(
|
||||
"face_watch: list_unscanned_candidates failed for library '{}': {:?}",
|
||||
library.name, e
|
||||
);
|
||||
return;
|
||||
}
|
||||
}
|
||||
};
|
||||
if rows.is_empty() {
|
||||
return;
|
||||
}
|
||||
|
||||
info!(
|
||||
"face_watch: backlog drain — running detection on {} candidate(s) for library '{}' (cap={})",
|
||||
rows.len(),
|
||||
library.name,
|
||||
cap
|
||||
);
|
||||
|
||||
let candidates: Vec<face_watch::FaceCandidate> = rows
|
||||
.into_iter()
|
||||
.map(|(rel_path, content_hash)| face_watch::FaceCandidate {
|
||||
rel_path,
|
||||
content_hash,
|
||||
})
|
||||
.collect();
|
||||
|
||||
face_watch::run_face_detection_pass(
|
||||
library,
|
||||
excluded_dirs,
|
||||
face_client,
|
||||
Arc::clone(face_dao),
|
||||
Arc::clone(tag_dao),
|
||||
candidates,
|
||||
);
|
||||
}
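Each drain in this file reads its per-tick cap the same way: parse the variable, reject non-positive values, and fall back to a default. A sketch of that rule as a pure function, assuming a hypothetical `parse_cap` helper that takes the raw string rather than reading the environment itself:

```rust
/// Parse a per-tick cap. Missing, unparsable, zero, or negative values all
/// fall back to `default`.
pub fn parse_cap(raw: Option<&str>, default: i64) -> i64 {
    raw.and_then(|s| s.parse::<i64>().ok())
        .filter(|n| *n > 0)
        .unwrap_or(default)
}
```

For example, `parse_cap(std::env::var("FACE_BACKLOG_MAX_PER_TICK").ok().as_deref(), 64)` would reproduce the default used above.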
|
||||
|
||||
/// Compute content_hash for any image rows the walker just touched
/// whose stored EXIF row is still hash-less. Called from
/// `process_new_files` so freshly-ingested files don't have to wait for
/// the next standalone `backfill_unhashed_backlog` tick before face
/// detection can key on their bytes.
///
/// Cap is on **successes only**. An earlier version counted errors too,
/// so a pocket of chronically-unhashable files at the front of the
/// table (vanished mid-scan, permission denied, etc.) burned the budget
/// every tick and the rest of the backlog never advanced.
|
||||
pub fn backfill_missing_content_hashes(
|
||||
context: &opentelemetry::Context,
|
||||
files: &[(PathBuf, String)],
|
||||
library: &libraries::Library,
|
||||
exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
|
||||
) {
|
||||
let image_paths: Vec<String> = files
|
||||
.iter()
|
||||
.filter(|(p, _)| !file_types::is_video_file(p))
|
||||
.map(|(_, rel)| rel.clone())
|
||||
.collect();
|
||||
if image_paths.is_empty() {
|
||||
return;
|
||||
}
|
||||
|
||||
let exif_records = {
|
||||
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
|
||||
dao.get_exif_batch(context, Some(library.id), &image_paths)
|
||||
.unwrap_or_default()
|
||||
};
|
||||
// Cheap lookup back from rel_path → absolute file_path so
|
||||
// content_hash::compute can read the bytes.
|
||||
let path_by_rel: HashMap<String, &PathBuf> =
|
||||
files.iter().map(|(p, rel)| (rel.clone(), p)).collect();
|
||||
|
||||
let cap: usize = dotenv::var("FACE_HASH_BACKFILL_MAX_PER_TICK")
|
||||
.ok()
|
||||
.and_then(|s| s.parse().ok())
|
||||
.filter(|n: &usize| *n > 0)
|
||||
.unwrap_or(2000);
|
||||
|
||||
// Count the unhashed backlog up front so we can surface "still needs
|
||||
// backfill: N" in the log — without it, a face-scan that's stuck at
|
||||
// 44% looks stalled when really it's chipping through hashes.
|
||||
let unhashed_total = exif_records
|
||||
.iter()
|
||||
.filter(|r| r.content_hash.is_none())
|
||||
.count();
|
||||
|
||||
let mut backfilled = 0usize;
|
||||
let mut errors = 0usize;
|
||||
for record in &exif_records {
|
||||
if backfilled >= cap {
|
||||
break;
|
||||
}
|
||||
if record.content_hash.is_some() {
|
||||
continue;
|
||||
}
|
||||
let Some(file_path) = path_by_rel.get(&record.file_path) else {
|
||||
// Walked file went missing between the directory scan and now;
|
||||
// next tick will retry naturally.
|
||||
continue;
|
||||
};
|
||||
match content_hash::compute(file_path) {
|
||||
Ok(id) => {
|
||||
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
|
||||
if let Err(e) = dao.backfill_content_hash(
|
||||
context,
|
||||
library.id,
|
||||
&record.file_path,
|
||||
&id.content_hash,
|
||||
id.size_bytes,
|
||||
) {
|
||||
warn!(
|
||||
"face_watch: backfill_content_hash failed for {}: {:?}",
|
||||
record.file_path, e
|
||||
);
|
||||
errors += 1;
|
||||
} else {
|
||||
backfilled += 1;
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
debug!(
|
||||
"face_watch: hash compute failed for {} ({:?})",
|
||||
file_path.display(),
|
||||
e
|
||||
);
|
||||
errors += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
// Always log when there's an unhashed backlog so an operator
|
||||
// looking at "scan stuck at 44%" can see backfill is running and
|
||||
// how much remains. Quiet only when there's nothing to do.
|
||||
if unhashed_total > 0 || backfilled > 0 || errors > 0 {
|
||||
let remaining = unhashed_total.saturating_sub(backfilled);
|
||||
info!(
|
||||
"face_watch: backfilled {}/{} content_hash for library '{}' ({} error(s); {} still need backfill; cap={})",
|
||||
backfilled, unhashed_total, library.name, errors, remaining, cap
|
||||
);
|
||||
}
|
||||
}
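The "cap is on successes only" rule is easy to get wrong, so here is the loop shape in isolation. `drain_with_success_cap` is a hypothetical generic helper; the function in the diff inlines the same check as `if backfilled >= cap { break; }`.

```rust
/// Process items until `cap` of them have succeeded. Failures are counted
/// but do not consume the budget, so a run of bad items at the front cannot
/// starve the rest of the backlog.
/// Returns `(successes, errors)`.
pub fn drain_with_success_cap<T, E>(
    items: impl IntoIterator<Item = T>,
    cap: usize,
    mut work: impl FnMut(T) -> Result<(), E>,
) -> (usize, usize) {
    let (mut ok, mut errors) = (0usize, 0usize);
    for item in items {
        if ok >= cap {
            break;
        }
        match work(item) {
            Ok(()) => ok += 1,
            Err(_) => errors += 1,
        }
    }
    (ok, errors)
}
```

With a cap of 2 over the items 1 through 6, where odd items fail, the loop visits 1, 2, 3, and 4 before stopping: two successes and two errors.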
|
||||
|
||||
/// Build the face-detection candidate list for a scan tick.
///
/// Returns `(rel_path, content_hash)` for every image file that has a
/// content_hash recorded in image_exif but no row in face_detections
/// yet. Re-querying image_exif here picks up rows the EXIF write loop
/// just inserted alongside any pre-existing rows the watcher walked
/// over, which covers both new uploads and the initial backlog scan.
pub fn build_face_candidates(
    context: &opentelemetry::Context,
    library: &libraries::Library,
    files: &[(PathBuf, String)],
    exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
    face_dao: &Arc<Mutex<Box<dyn faces::FaceDao>>>,
) -> Vec<face_watch::FaceCandidate> {
    // Restrict to image files; videos aren't face-scanned in v1 (kamadak
    // doesn't even register them in image_exif).
    let image_paths: Vec<String> = files
        .iter()
        .filter(|(p, _)| !file_types::is_video_file(p))
        .map(|(_, rel)| rel.clone())
        .collect();
    if image_paths.is_empty() {
        return Vec::new();
    }

    let exif_records = {
        let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
        dao.get_exif_batch(context, Some(library.id), &image_paths)
            .unwrap_or_default()
    };
    // rel_path → content_hash (only rows with a hash; without one we have
    // nothing to key face data against).
    let mut hash_by_path: HashMap<String, String> = HashMap::with_capacity(exif_records.len());
    for record in exif_records {
        if let Some(h) = record.content_hash {
            hash_by_path.insert(record.file_path, h);
        }
    }

    let mut candidates = Vec::new();
    let mut dao = face_dao.lock().expect("face dao");
    for rel_path in image_paths {
        let Some(hash) = hash_by_path.get(&rel_path) else {
            continue;
        };
        match dao.already_scanned(context, hash) {
            Ok(true) => continue,
            Ok(false) => candidates.push(face_watch::FaceCandidate {
                rel_path,
                content_hash: hash.clone(),
            }),
            Err(e) => {
                warn!("face_watch: already_scanned errored for {}: {:?}", hash, e);
            }
        }
    }
    candidates
}

#[cfg(test)]
mod tests {
    use super::*;

    use std::fs;
    use std::sync::{Arc, Mutex};

    use diesel::prelude::*;
    use tempfile::TempDir;

    use crate::database::models::{InsertImageExif, InsertLibrary};
    use crate::database::test::in_memory_db_connection;
    use crate::database::{ExifDao, SqliteExifDao, schema};
    use crate::faces::{FaceDao, SqliteFaceDao};
    use crate::libraries::Library;

    fn ctx() -> opentelemetry::Context {
        opentelemetry::Context::new()
    }

    /// Build a tempdir-backed library + DAOs sharing a single in-memory
    /// SQLite connection (so cross-table joins like
    /// `list_unscanned_candidates` see consistent state).
    fn setup() -> (
        TempDir,
        Library,
        Arc<Mutex<diesel::SqliteConnection>>,
        Arc<Mutex<Box<dyn ExifDao>>>,
        Arc<Mutex<Box<dyn FaceDao>>>,
    ) {
        let tmp = TempDir::new().expect("tempdir");
        let mut conn = in_memory_db_connection();
        // Migration seeds library id=1 with a placeholder root; rewrite it
        // to point at the tempdir so `<root>/<rel_path>` resolves to real
        // files this test creates.
        diesel::update(schema::libraries::table.filter(schema::libraries::id.eq(1)))
            .set(schema::libraries::root_path.eq(tmp.path().to_string_lossy().to_string()))
            .execute(&mut conn)
            .expect("rewrite library 1 root");
        // Add a second library so cross-library skip cases have somewhere
        // to put their rows.
        diesel::insert_into(schema::libraries::table)
            .values(InsertLibrary {
                name: "other",
                root_path: "/tmp/other-test-lib",
                created_at: 0,
                enabled: true,
                excluded_dirs: None,
            })
            .execute(&mut conn)
            .expect("seed second library");

        let library = Library {
            id: 1,
            name: "main".to_string(),
            root_path: tmp.path().to_string_lossy().to_string(),
            enabled: true,
            excluded_dirs: Vec::new(),
        };
        let shared = Arc::new(Mutex::new(conn));
        let exif_dao: Arc<Mutex<Box<dyn ExifDao>>> = Arc::new(Mutex::new(Box::new(
            SqliteExifDao::from_shared(Arc::clone(&shared)),
        )));
        let face_dao: Arc<Mutex<Box<dyn FaceDao>>> = Arc::new(Mutex::new(Box::new(
            SqliteFaceDao::from_connection(Arc::clone(&shared)),
        )));
        (tmp, library, shared, exif_dao, face_dao)
    }

    fn insert_exif(
        exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
        lib_id: i32,
        rel: &str,
        content_hash: Option<&str>,
    ) {
        let mut dao = exif_dao.lock().unwrap();
        dao.store_exif(
            &ctx(),
            InsertImageExif {
                library_id: lib_id,
                file_path: rel.to_string(),
                camera_make: None,
                camera_model: None,
                lens_model: None,
                width: None,
                height: None,
                orientation: None,
                gps_latitude: None,
                gps_longitude: None,
                gps_altitude: None,
                focal_length: None,
                aperture: None,
                shutter_speed: None,
                iso: None,
                date_taken: None,
                created_time: 0,
                last_modified: 0,
                content_hash: content_hash.map(|s| s.to_string()),
                size_bytes: None,
                phash_64: None,
                dhash_64: None,
                date_taken_source: None,
            },
        )
        .expect("insert");
    }

    fn write_image(root: &std::path::Path, rel: &str, bytes: &[u8]) {
        let abs = root.join(rel);
        if let Some(parent) = abs.parent() {
            fs::create_dir_all(parent).expect("mkdir");
        }
        fs::write(abs, bytes).expect("write file");
    }

    #[test]
    fn backfill_unhashed_backlog_hashes_missing_rows_in_this_library() {
        let (tmp, library, _conn, exif_dao, _face_dao) = setup();
        write_image(tmp.path(), "a.jpg", b"alpha-bytes");
        write_image(tmp.path(), "b.jpg", b"bravo-bytes");
        insert_exif(&exif_dao, 1, "a.jpg", None);
        insert_exif(&exif_dao, 1, "b.jpg", None);

        let backfilled = backfill_unhashed_backlog(&ctx(), &library, &exif_dao);
        assert_eq!(backfilled, 2);

        let mut dao = exif_dao.lock().unwrap();
        let rows = dao
            .get_exif_batch(&ctx(), Some(1), &["a.jpg".to_string(), "b.jpg".to_string()])
            .unwrap();
        assert_eq!(rows.len(), 2);
        for r in rows {
            assert!(
                r.content_hash.is_some(),
                "row {} should have a hash",
                r.file_path
            );
        }
    }

    #[test]
    fn backfill_unhashed_backlog_skips_other_libraries_and_missing_files() {
        let (tmp, library, _conn, exif_dao, _face_dao) = setup();
        write_image(tmp.path(), "exists.jpg", b"hello");
        // Row for this library whose file is missing on disk:
        insert_exif(&exif_dao, 1, "ghost.jpg", None);
        insert_exif(&exif_dao, 1, "exists.jpg", None);
        // Row in the other library; must be skipped (different lib_id).
        insert_exif(&exif_dao, 2, "other.jpg", None);

        let backfilled = backfill_unhashed_backlog(&ctx(), &library, &exif_dao);
        assert_eq!(backfilled, 1, "only the existing in-library file hashes");

        let mut dao = exif_dao.lock().unwrap();
        let other = dao
            .get_exif_batch(&ctx(), Some(2), &["other.jpg".to_string()])
            .unwrap();
        assert_eq!(other.len(), 1);
        assert!(
            other[0].content_hash.is_none(),
            "other-library row must remain unhashed"
        );
        let ghost = dao
            .get_exif_batch(&ctx(), Some(1), &["ghost.jpg".to_string()])
            .unwrap();
        assert_eq!(ghost.len(), 1);
        assert!(
            ghost[0].content_hash.is_none(),
            "missing-on-disk row stays unhashed (reconciliation removes it later)"
        );
    }

    #[test]
    fn backfill_unhashed_backlog_respects_per_tick_cap() {
        // The cap is driven by an env var that the function re-reads on
        // every call, so we can set it just for this test and unset it
        // before returning.
        // On serialization: tests in the same binary share the process
        // env, but each backfill call re-reads the variable, and we only
        // care that the cap shape (success count <= cap, more_remain
        // logged) holds.
        unsafe {
            std::env::set_var("FACE_HASH_BACKFILL_MAX_PER_TICK", "2");
        }
        let (tmp, library, _conn, exif_dao, _face_dao) = setup();
        for i in 0..5 {
            let rel = format!("img_{}.jpg", i);
            write_image(tmp.path(), &rel, format!("bytes-{}", i).as_bytes());
            insert_exif(&exif_dao, 1, &rel, None);
        }

        let backfilled = backfill_unhashed_backlog(&ctx(), &library, &exif_dao);
        assert_eq!(backfilled, 2, "cap=2 must bound the per-tick successes");
        unsafe {
            std::env::remove_var("FACE_HASH_BACKFILL_MAX_PER_TICK");
        }
    }

    #[test]
    fn backfill_missing_content_hashes_skips_videos_and_hashed_rows() {
        let (tmp, library, _conn, exif_dao, _face_dao) = setup();
        // Two image rows (one already hashed, one not), one video.
        write_image(tmp.path(), "fresh.jpg", b"fresh-pixels");
        write_image(tmp.path(), "already.jpg", b"already-pixels");
        write_image(tmp.path(), "clip.mp4", b"video-bytes");
        insert_exif(&exif_dao, 1, "fresh.jpg", None);
        insert_exif(&exif_dao, 1, "already.jpg", Some("pre-existing-hash"));
        insert_exif(&exif_dao, 1, "clip.mp4", None);

        let files: Vec<(PathBuf, String)> = vec![
            (tmp.path().join("fresh.jpg"), "fresh.jpg".to_string()),
            (tmp.path().join("already.jpg"), "already.jpg".to_string()),
            (tmp.path().join("clip.mp4"), "clip.mp4".to_string()),
        ];
        backfill_missing_content_hashes(&ctx(), &files, &library, &exif_dao);

        let mut dao = exif_dao.lock().unwrap();
        let rows = dao
            .get_exif_batch(
                &ctx(),
                Some(1),
                &[
                    "fresh.jpg".to_string(),
                    "already.jpg".to_string(),
                    "clip.mp4".to_string(),
                ],
            )
            .unwrap();
        let by_path: HashMap<String, Option<String>> = rows
            .into_iter()
            .map(|r| (r.file_path, r.content_hash))
            .collect();
        assert!(
            by_path["fresh.jpg"].is_some(),
            "fresh image must get a hash"
        );
        assert_eq!(
            by_path["already.jpg"].as_deref(),
            Some("pre-existing-hash"),
            "already-hashed image left untouched"
        );
        assert!(
            by_path["clip.mp4"].is_none(),
            "video skipped (not face-scanned, no hash needed via this path)"
        );
    }

    #[test]
    fn build_face_candidates_filters_videos_unhashed_and_already_scanned() {
        let (tmp, library, _conn, exif_dao, face_dao) = setup();

        // Seed image_exif with: hashed unscanned, hashed scanned, unhashed,
        // and a video. Files don't need to exist on disk: the function
        // doesn't read them, only the DB rows.
        insert_exif(&exif_dao, 1, "fresh.jpg", Some("hash-fresh"));
        insert_exif(&exif_dao, 1, "scanned.jpg", Some("hash-scanned"));
        insert_exif(&exif_dao, 1, "unhashed.jpg", None);
        insert_exif(&exif_dao, 1, "clip.mp4", Some("hash-video"));
        // Mark `scanned.jpg`'s hash as already detected.
        {
            let mut dao = face_dao.lock().unwrap();
            dao.mark_status(&ctx(), 1, "hash-scanned", "scanned.jpg", "no_faces", "test")
                .expect("mark scanned");
        }

        let files: Vec<(PathBuf, String)> = vec![
            (tmp.path().join("fresh.jpg"), "fresh.jpg".to_string()),
            (tmp.path().join("scanned.jpg"), "scanned.jpg".to_string()),
            (tmp.path().join("unhashed.jpg"), "unhashed.jpg".to_string()),
            (tmp.path().join("clip.mp4"), "clip.mp4".to_string()),
        ];
        let candidates = build_face_candidates(&ctx(), &library, &files, &exif_dao, &face_dao);

        assert_eq!(
            candidates.len(),
            1,
            "exactly fresh.jpg should be a candidate"
        );
        assert_eq!(candidates[0].rel_path, "fresh.jpg");
        assert_eq!(candidates[0].content_hash, "hash-fresh");
    }
}
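
Reviewer note: the filtering that `build_face_candidates` performs (drop videos, drop rows without a content hash, drop hashes that already have a detection row) can be sketched without any DAO. This is only an illustration under stated assumptions: `Candidate`, `is_video`, and `select_candidates` are hypothetical names, the extension list is a guess, and the two DB lookups are replaced by an in-memory map and set.

```rust
use std::collections::{HashMap, HashSet};
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `face_watch::FaceCandidate`.
#[derive(Debug, PartialEq)]
pub struct Candidate {
    pub rel_path: String,
    pub content_hash: String,
}

/// Assumed extension check. The real code calls `file_types::is_video_file`,
/// whose extension list is not shown in this diff.
fn is_video(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| matches!(ext.to_ascii_lowercase().as_str(), "mp4" | "mov" | "mkv"))
        .unwrap_or(false)
}

/// `hash_by_path` models the image_exif lookup (rel_path -> content_hash),
/// `scanned` models the set of hashes that already have a detection row.
pub fn select_candidates(
    files: &[(PathBuf, String)],
    hash_by_path: &HashMap<String, String>,
    scanned: &HashSet<String>,
) -> Vec<Candidate> {
    files
        .iter()
        // Videos are never face-scanned.
        .filter(|(abs, _)| !is_video(abs))
        .filter_map(|(_, rel)| {
            // No hash means nothing to key face data against.
            let hash = hash_by_path.get(rel)?;
            // Already scanned: skip.
            if scanned.contains(hash) {
                return None;
            }
            Some(Candidate {
                rel_path: rel.clone(),
                content_hash: hash.clone(),
            })
        })
        .collect()
}
```

Fed the same four files as the last test above (one fresh, one scanned, one unhashed, one video), the sketch keeps only the fresh image.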
@@ -1,243 +0,0 @@
//! Backfill `image_exif.phash_64` + `dhash_64` for image rows that
//! were ingested before perceptual hashing was wired into the watcher.
//!
//! The watcher computes perceptual hashes for new images as they're
//! ingested, so this binary is a one-shot for the historical backlog.
//! Idempotent: only rows with a non-null content_hash and a null
//! phash are processed, so re-runs are safe and pick up where they
//! left off (e.g. after a crash or interrupt).
//!
//! Image-only by design: `get_rows_missing_perceptual_hash` filters by
//! file extension at the DB layer so videos and other non-decodable
//! media are skipped without round-tripping `image_hasher`. Files that
//! can't be opened (missing on disk, permission errors) are quietly
//! left as null and counted as "missing"; on next run, if the file is
//! restored, the row will surface again.

use std::path::Path;
use std::sync::{Arc, Mutex};
use std::time::Instant;

use clap::Parser;
use log::{error, warn};
use rayon::prelude::*;

use image_api::bin_progress;
use image_api::database::{ExifDao, SqliteExifDao, connect};
use image_api::libraries::{self, Library};
use image_api::perceptual_hash;

#[derive(Parser, Debug)]
#[command(name = "backfill_perceptual_hash")]
#[command(about = "Compute pHash + dHash for image_exif rows missing one")]
struct Args {
    /// Max rows to hash per batch. The process loops until no rows remain.
    #[arg(long, default_value_t = 256)]
    batch_size: i64,

    /// Rayon parallelism override. 0 uses the default thread pool size.
    #[arg(long, default_value_t = 0)]
    parallelism: usize,

    /// Dry-run: log what would be hashed without writing to the DB.
    #[arg(long)]
    dry_run: bool,
}

fn main() -> anyhow::Result<()> {
    env_logger::init();
    dotenv::dotenv().ok();

    let args = Args::parse();
    if args.parallelism > 0 {
        rayon::ThreadPoolBuilder::new()
            .num_threads(args.parallelism)
            .build_global()
            .expect("Unable to configure rayon thread pool");
    }

    let base_path = dotenv::var("BASE_PATH").ok();
    let mut seed_conn = connect();
    if let Some(base) = base_path.as_deref() {
        libraries::seed_or_patch_from_env(&mut seed_conn, base);
    }
    let libs = libraries::load_all(&mut seed_conn);
    drop(seed_conn);
    if libs.is_empty() {
        anyhow::bail!("No libraries configured; cannot backfill perceptual hashes");
    }
    let libs_by_id: std::collections::HashMap<i32, Library> =
        libs.into_iter().map(|lib| (lib.id, lib)).collect();
    println!(
        "Configured libraries: {}",
        libs_by_id
            .values()
            .map(|l| format!("{} -> {}", l.name, l.root_path))
            .collect::<Vec<_>>()
            .join(", ")
    );

    let dao: Arc<Mutex<Box<dyn ExifDao>>> = Arc::new(Mutex::new(Box::new(SqliteExifDao::new())));
    let ctx = opentelemetry::Context::new();

    let mut total_hashed = 0u64;
    let mut total_missing = 0u64;
    let mut total_decode_failures = 0u64;
    let mut total_errors = 0u64;
    let start = Instant::now();

    let pb = bin_progress::spinner("perceptual-hashing");

    loop {
        let rows = {
            let mut guard = dao.lock().expect("Unable to lock ExifDao");
            guard
                .get_rows_missing_perceptual_hash(&ctx, args.batch_size)
                .map_err(|e| anyhow::anyhow!("DB error: {:?}", e))?
        };
        if rows.is_empty() {
            break;
        }
        let batch_size = rows.len();
        pb.set_message(format!(
            "batch of {} (hashed={} decode_fail={} missing={} errors={})",
            batch_size, total_hashed, total_decode_failures, total_missing, total_errors
        ));

        // Compute perceptual hashes in parallel. The work is CPU-bound
        // and the DAO lock is not held while decoding (the guard above
        // is dropped before this point). rayon's default thread pool
        // matches the host's logical-core count, which is the right
        // ceiling for image_hasher's DCT pass.
        let results: Vec<(i32, String, FilePerceptualResult)> = rows
            .into_par_iter()
            .map(|(library_id, rel_path)| {
                let abs = libs_by_id
                    .get(&library_id)
                    .map(|lib| Path::new(&lib.root_path).join(&rel_path));
                match abs {
                    Some(abs_path) if abs_path.exists() => {
                        match perceptual_hash::compute(&abs_path) {
                            Some(id) => (library_id, rel_path, FilePerceptualResult::Ok(id)),
                            None => (library_id, rel_path, FilePerceptualResult::DecodeFailed),
                        }
                    }
                    Some(_) => (library_id, rel_path, FilePerceptualResult::MissingOnDisk),
                    None => {
                        warn!("Row refers to unknown library_id {}", library_id);
                        (library_id, rel_path, FilePerceptualResult::MissingOnDisk)
                    }
                }
            })
            .collect();

        // Persist sequentially; SQLite writes serialize anyway.
        if !args.dry_run {
            let mut guard = dao.lock().expect("Unable to lock ExifDao");
            for (library_id, rel_path, result) in &results {
                match result {
                    FilePerceptualResult::Ok(id) => {
                        match guard.backfill_perceptual_hash(
                            &ctx,
                            *library_id,
                            rel_path,
                            Some(id.phash_64),
                            Some(id.dhash_64),
                        ) {
                            Ok(_) => {
                                total_hashed += 1;
                                pb.inc(1);
                            }
                            Err(e) => {
                                pb.println(format!("persist error for {}: {:?}", rel_path, e));
                                total_errors += 1;
                            }
                        }
                    }
                    FilePerceptualResult::DecodeFailed => {
                        // Persist phash_64=0/dhash_64=0 as a "tried,
                        // unhashable" sentinel so this row leaves the
                        // `phash_64 IS NULL` candidate set and the
                        // backfill doesn't infinite-loop on a queue of
                        // undecodable formats (HEIC, RAW, CMYK JPEGs,
                        // truncated bytes). The all-zero hash is
                        // explicitly excluded from clustering by
                        // is_informative_hash in duplicates.rs, so it
                        // won't pollute group output; it just becomes
                        // invisible to the duplicate finder.
                        log::debug!(
                            "perceptual decode failed for {} (lib {}); marking unhashable",
                            rel_path,
                            library_id
                        );
                        match guard.backfill_perceptual_hash(
                            &ctx,
                            *library_id,
                            rel_path,
                            Some(0),
                            Some(0),
                        ) {
                            Ok(_) => {
                                total_decode_failures += 1;
                            }
                            Err(e) => {
                                pb.println(format!(
                                    "persist error (decode-fail sentinel) for {}: {:?}",
                                    rel_path, e
                                ));
                                total_errors += 1;
                            }
                        }
                    }
                    FilePerceptualResult::MissingOnDisk => {
                        total_missing += 1;
                    }
                }
            }
        } else {
            for (_, rel_path, result) in &results {
                match result {
                    FilePerceptualResult::Ok(id) => {
                        pb.println(format!(
                            "[dry-run] {} -> phash={:016x} dhash={:016x}",
                            rel_path, id.phash_64, id.dhash_64
                        ));
                        total_hashed += 1;
                        pb.inc(1);
                    }
                    FilePerceptualResult::DecodeFailed => {
                        total_decode_failures += 1;
                    }
                    FilePerceptualResult::MissingOnDisk => {
                        total_missing += 1;
                    }
                }
            }
            pb.println(format!(
                "[dry-run] processed one batch of {}. Stopping — a real run would continue \
                 until no NULL phash_64 image rows remain.",
                results.len()
            ));
            break;
        }
    }

    pb.finish_and_clear();
    println!(
        "Done. hashed={}, decode_failed={}, skipped (missing on disk)={}, errors={}, elapsed={:.1}s",
        total_hashed,
        total_decode_failures,
        total_missing,
        total_errors,
        start.elapsed().as_secs_f64()
    );
    if total_errors > 0 {
        error!("Backfill completed with {} persist errors", total_errors);
    }
    Ok(())
}

enum FilePerceptualResult {
    Ok(perceptual_hash::PerceptualIdentity),
    DecodeFailed,
    MissingOnDisk,
}
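
Reviewer note: the decode-failure sentinel described in the comment above can be modelled in memory to show why it lets the batch loop terminate. This is a minimal sketch under assumptions, not the binary's code: `Row`, `drain`, and `is_informative` are hypothetical names, the table is a slice, and `is_informative` only guesses at what `is_informative_hash` in `duplicates.rs` checks. Rows whose files are missing on disk are deliberately left out of the model.

```rust
/// Hypothetical in-memory stand-in for an image_exif row.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub rel_path: String,
    pub phash_64: Option<i64>,
}

/// Assumption: the all-zero hash is treated as "tried, unhashable".
pub fn is_informative(hash: i64) -> bool {
    hash != 0
}

/// Repeatedly fetch up to `batch_size` rows with a NULL hash and fill them in.
/// `hash_fn` returns `None` when the file cannot be decoded.
/// Returns `(hashed, unhashable)` counts.
pub fn drain<F>(rows: &mut [Row], batch_size: usize, hash_fn: F) -> (usize, usize)
where
    F: Fn(&str) -> Option<i64>,
{
    let (mut hashed, mut unhashable) = (0usize, 0usize);
    loop {
        // Model of the `phash_64 IS NULL LIMIT batch_size` query.
        let batch: Vec<usize> = rows
            .iter()
            .enumerate()
            .filter(|(_, r)| r.phash_64.is_none())
            .map(|(i, _)| i)
            .take(batch_size)
            .collect();
        if batch.is_empty() {
            break;
        }
        for i in batch {
            let outcome = hash_fn(&rows[i].rel_path);
            match outcome {
                Some(h) => {
                    rows[i].phash_64 = Some(h);
                    hashed += 1;
                }
                None => {
                    // Sentinel write: the row leaves the NULL candidate set,
                    // so the next fetch cannot return it again.
                    rows[i].phash_64 = Some(0);
                    unhashable += 1;
                }
            }
        }
    }
    (hashed, unhashable)
}
```

Without the sentinel write in the `None` arm, a batch made up entirely of undecodable files would be fetched again on every pass.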
@@ -14,7 +14,6 @@ use image_api::database::{
    SqliteInsightDao, SqliteKnowledgeDao, SqliteLocationHistoryDao, SqliteSearchHistoryDao,
    connect,
};
use image_api::faces::{FaceDao, SqliteFaceDao};
use image_api::file_types::{IMAGE_EXTENSIONS, VIDEO_EXTENSIONS};
use image_api::libraries::{self, Library};
use image_api::tags::{SqliteTagDao, TagDao};
@@ -183,11 +182,6 @@ async fn main() -> anyhow::Result<()> {
        Arc::new(Mutex::new(Box::new(SqliteTagDao::default())));
    let knowledge_dao: Arc<Mutex<Box<dyn KnowledgeDao>>> =
        Arc::new(Mutex::new(Box::new(SqliteKnowledgeDao::new())));
    let face_dao: Arc<Mutex<Box<dyn FaceDao>>> =
        Arc::new(Mutex::new(Box::new(SqliteFaceDao::new())));
    let persona_dao: Arc<Mutex<Box<dyn image_api::database::PersonaDao>>> = Arc::new(Mutex::new(
        Box::new(image_api::database::SqlitePersonaDao::new()),
    ));

    // Pass the full library set so `resolve_full_path` probes every root,
    // even when --library restricts the walk. A rel_path shared across
@@ -204,9 +198,7 @@ async fn main() -> anyhow::Result<()> {
        location_dao,
        search_dao,
        tag_dao,
        face_dao,
        knowledge_dao,
        persona_dao,
        all_libs.clone(),
    );

@@ -339,8 +331,6 @@ async fn main() -> anyhow::Result<()> {
None,
Vec::new(),
Vec::new(),
1, // operator user_id — populate_knowledge is single-user offline tool
"default".to_string(),
)
.await
{

@@ -1,250 +0,0 @@
//! Probe binary for RAM++ auto-tagging.
//!
//! No DB writes. Walks a library's `image_exif` rows, sends a sample
//! through Apollo's `/api/internal/tags/auto`, and prints `(path, tags)`
//! to stdout so the operator can eyeball whether the model's vocabulary
//! and threshold defaults are appropriate for this library before
//! committing to the persistence phase (new table, per-tick drain, UI).
//!
//! Usage:
//!   cargo run --release --bin probe_auto_tags -- \
//!     --library 1 --limit 50 --threshold 0.7
//!
//! Env: standard ImageApi `.env`. Requires either
//! `APOLLO_TAG_API_BASE_URL` or `APOLLO_API_BASE_URL` to be set
//! (otherwise the client is disabled and the probe bails).

use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex};
use std::time::Instant;

use clap::Parser;
use log::{info, warn};

use image_api::ai::tag_client::{TagClient, TagDetectError, TagMeta};
use image_api::database::{ExifDao, SqliteExifDao, connect};
use image_api::exif;
use image_api::file_types;
use image_api::libraries::{self, Library};

#[derive(Parser, Debug)]
#[command(name = "probe_auto_tags")]
#[command(about = "Print RAM++ auto-tags for a sample of image_exif rows")]
struct Args {
    /// Library id to sample from.
    #[arg(long)]
    library: i32,

    /// Max files to probe. The binary scans more rows internally because
    /// non-image rows (videos, junk) are skipped client-side.
    #[arg(long, default_value_t = 25)]
    limit: usize,

    /// Per-call threshold sent to Apollo. Overrides the engine default.
    /// Lower = more tags per photo, more noise. 0.5–0.75 is the useful
    /// sweep range for ram_plus_swin_large_14m.
    #[arg(long, default_value_t = 0.65)]
    threshold: f32,

    /// Offset into the library's rel_path listing (sorted by id ASC).
    /// Bump on re-runs to sample a different slice.
    #[arg(long, default_value_t = 0)]
    offset: i64,

    /// How many DB rows to scan before giving up on hitting the limit.
    /// Useful when a library is mostly videos.
    #[arg(long, default_value_t = 2000)]
    max_scan: i64,
}

/// Mirror of `face_watch::read_image_bytes_for_detect`, which is
/// `pub(crate)` and so can't be imported across the bin boundary. The
/// probe is throwaway code, so inlining is cleaner than changing the
/// visibility.
fn read_image_bytes(path: &Path) -> std::io::Result<Vec<u8>> {
    if file_types::needs_ffmpeg_thumbnail(path)
        && let Some(preview) = exif::extract_embedded_jpeg_preview(path)
    {
        return Ok(preview);
    }
    std::fs::read(path)
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    env_logger::init();
    dotenv::dotenv().ok();

    let args = Args::parse();

    let client = TagClient::from_env();
    if !client.is_enabled() {
        anyhow::bail!(
            "TagClient disabled: set APOLLO_TAG_API_BASE_URL or APOLLO_API_BASE_URL in .env"
        );
    }

    // Quick health probe so we fail fast on a misconfig before grinding
    // through a thousand rows.
    match client.health().await {
        Ok(h) => info!(
            "tag engine: loaded={} device={} model={} threshold_default={}",
            h.loaded, h.device, h.model_version, h.threshold
        ),
        Err(e) => warn!("health probe failed (continuing): {e}"),
    }

    let mut seed_conn = connect();
    if let Some(base) = dotenv::var("BASE_PATH").ok().as_deref() {
        libraries::seed_or_patch_from_env(&mut seed_conn, base);
    }
    let libs = libraries::load_all(&mut seed_conn);
    drop(seed_conn);
    let lib: Library = libs
        .into_iter()
        .find(|l| l.id == args.library)
        .ok_or_else(|| anyhow::anyhow!("library id {} not found", args.library))?;
    info!("probing library #{} ({}) at {}", lib.id, lib.name, lib.root_path);

    let dao: Arc<Mutex<Box<dyn ExifDao>>> = Arc::new(Mutex::new(Box::new(SqliteExifDao::new())));
    let ctx = opentelemetry::Context::new();

    // Paginate through (id, rel_path) for this library, filter to images
    // on disk, take `limit`. Page size is tuned so we don't slam the DB
    // when a library is video-heavy.
    const PAGE: i64 = 500;
    let mut offset = args.offset;
    let mut scanned: i64 = 0;
    let mut probed = 0usize;
    let mut ok_count = 0usize;
    let mut empty_count = 0usize;
    let mut perm_fail = 0usize;
    let mut transient_fail = 0usize;
    let started = Instant::now();
    let root = PathBuf::from(&lib.root_path);

    'outer: loop {
        if scanned >= args.max_scan {
            warn!(
                "scan cap ({}) reached before hitting limit ({}); bump --max-scan to scan deeper",
                args.max_scan, args.limit
            );
            break;
        }
        let rows = {
            let mut guard = dao.lock().expect("dao lock");
            guard
                .list_rel_paths_for_library_page(&ctx, lib.id, PAGE, offset)
                .map_err(|e| anyhow::anyhow!("list rel_paths: {:?}", e))?
        };
        if rows.is_empty() {
            info!("no more rows after offset {}", offset);
            break;
        }
        offset += rows.len() as i64;
        scanned += rows.len() as i64;

        for (_id, rel_path) in rows {
            if probed >= args.limit {
                break 'outer;
            }
            let abs = root.join(&rel_path);
            // Skip non-images and videos at the path level: the same
            // logic the face backlog drain uses, just inlined.
            if !file_types::is_image_file(&abs) {
                continue;
            }
            if !abs.exists() {
                continue;
            }
            let bytes = match read_image_bytes(&abs) {
                Ok(b) => b,
                Err(e) => {
                    warn!("read {rel_path}: {e}");
                    continue;
                }
            };
            // The probe doesn't need a real content_hash; Apollo only
            // logs it. Pass an empty marker so we don't trip on no-hash
            // image_exif rows.
            let meta = TagMeta {
                content_hash: String::new(),
                library_id: lib.id,
                rel_path: rel_path.clone(),
                threshold: Some(args.threshold),
            };

            let call_start = Instant::now();
            match client.auto_tag(bytes, meta).await {
                Ok(resp) => {
                    probed += 1;
                    if resp.tags.is_empty() {
                        empty_count += 1;
                        println!(
                            "[{:>3}] (no tags) {}ms {}",
                            probed, resp.duration_ms, rel_path
                        );
                    } else {
                        ok_count += 1;
                        let preview = resp
                            .tags
                            .iter()
                            .map(|t| format!("{}({:.2})", t.name, t.confidence))
                            .collect::<Vec<_>>()
                            .join(", ");
                        println!(
                            "[{:>3}] {} tags {}ms {}\n {}",
                            probed,
                            resp.tags.len(),
                            resp.duration_ms,
                            rel_path,
                            preview
                        );
                    }
                }
                Err(TagDetectError::Permanent(e)) => {
                    probed += 1;
                    perm_fail += 1;
                    println!(
                        "[{:>3}] PERMANENT FAIL ({:>4}ms) {}\n {}",
                        probed,
                        call_start.elapsed().as_millis(),
                        rel_path,
                        e
                    );
                }
                Err(TagDetectError::Transient(e)) => {
                    probed += 1;
                    transient_fail += 1;
                    println!(
                        "[{:>3}] TRANSIENT FAIL ({:>4}ms) {}\n {}",
                        probed,
                        call_start.elapsed().as_millis(),
                        rel_path,
                        e
                    );
                }
                Err(TagDetectError::Disabled) => {
                    anyhow::bail!("tag client became disabled mid-run; impossible");
                }
            }
        }
    }

    let elapsed = started.elapsed();
    println!();
    println!("── summary ───────────────────────────────────────");
    println!("scanned rows : {scanned}");
    println!("probed files : {probed}");
    println!(" with tags : {ok_count}");
    println!(" empty (no tags) : {empty_count}");
    println!(" permanent failures : {perm_fail}");
    println!(" transient failures : {transient_fail}");
    println!("elapsed : {:.1}s", elapsed.as_secs_f32());
    if probed > 0 {
        println!(
            "throughput : {:.2} photos/s",
            probed as f32 / elapsed.as_secs_f32().max(0.001)
        );
    }
    Ok(())
}
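
Reviewer note: the probe's paging loop has two independent stop conditions, the `--limit` on accepted files and the `--max-scan` cap on rows read. The sketch below reproduces that control flow over an in-memory slice so the interaction is easier to see. It is an illustration under assumptions: `sample_rows` and the `keep` predicate are hypothetical, and the slice stands in for `list_rel_paths_for_library_page`.

```rust
/// Page through `rows`, keeping entries accepted by `keep`, until either
/// `limit` entries are kept or `max_scan` rows have been read.
/// Returns the kept entries and the number of rows scanned.
pub fn sample_rows<F>(
    rows: &[String],
    page: usize,
    start_offset: usize,
    max_scan: usize,
    limit: usize,
    keep: F,
) -> (Vec<String>, usize)
where
    F: Fn(&str) -> bool,
{
    let mut offset = start_offset;
    let mut scanned = 0usize;
    let mut picked = Vec::new();
    'outer: loop {
        // The scan cap is checked once per page, before fetching.
        if scanned >= max_scan {
            break;
        }
        // Stand-in for the paged DB query.
        let start = offset.min(rows.len());
        let end = start.saturating_add(page).min(rows.len());
        let batch = &rows[start..end];
        if batch.is_empty() {
            break;
        }
        // A fetched page counts as scanned in full, even if the limit
        // is reached partway through it.
        offset += batch.len();
        scanned += batch.len();
        for rel_path in batch {
            if picked.len() >= limit {
                break 'outer;
            }
            if !keep(rel_path) {
                continue;
            }
            picked.push(rel_path.clone());
        }
    }
    (picked, scanned)
}
```

Because the cap is checked per page, the number of scanned rows can exceed `max_scan` by up to one page minus one row; with the defaults shown above (page 500, cap 2000) the two line up exactly.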
@@ -53,34 +53,12 @@ pub fn thumbnail_path(thumbs_dir: &Path, hash: &str) -> PathBuf {
/// Hash-keyed HLS output directory: `<video_dir>/<hash[..2]>/<hash>/`.
/// The playlist lives at `playlist.m3u8` inside this directory and its
/// segments are co-located so HLS relative references Just Work.
///
/// Allow-dead until Branch B/C rewires the HLS pipeline to use it; the
/// helper lives here today so Branch A's path layout decisions stay
/// adjacent to thumbnail/legacy ones.
#[allow(dead_code)]
pub fn hls_dir(video_dir: &Path, hash: &str) -> PathBuf {
    let shard = shard_prefix(hash);
    video_dir.join(shard).join(hash)
}

/// Library-scoped legacy mirrored path:
/// `<derivative_dir>/<library_id>/<rel_path>`. Used as the fallback when
/// `content_hash` isn't available; the library prefix prevents the
/// "lib1 wrote `vacation/IMG.jpg` first, lib2 sees thumb_path.exists()
/// and serves the wrong image" failure mode.
///
/// Existing single-library deployments may already have thumbnails at the
/// bare-legacy `<derivative_dir>/<rel_path>` shape; serving code is
/// expected to check both this scoped path and the bare-legacy path so
/// nothing 404s during the transition.
pub fn library_scoped_legacy_path(
    derivative_dir: &Path,
    library_id: i32,
    rel_path: impl AsRef<Path>,
) -> PathBuf {
    derivative_dir.join(library_id.to_string()).join(rel_path)
}

fn shard_prefix(hash: &str) -> &str {
    let end = hash
        .char_indices()
@@ -127,17 +105,4 @@ mod tests {
        let d = hls_dir(video, "1234deadbeef");
        assert_eq!(d, PathBuf::from("/tmp/video/12/1234deadbeef"));
    }

    #[test]
    fn library_scoped_legacy_path_prefixes_with_library_id() {
        let thumbs = Path::new("/tmp/thumbs");
        let p = library_scoped_legacy_path(thumbs, 7, "vacation/IMG.jpg");
        assert_eq!(p, PathBuf::from("/tmp/thumbs/7/vacation/IMG.jpg"));

        // Same rel_path, different library: different output. This is
        // the whole point: lib 1 and lib 2 don't clobber each other.
        let p1 = library_scoped_legacy_path(thumbs, 1, "vacation/IMG.jpg");
        let p2 = library_scoped_legacy_path(thumbs, 2, "vacation/IMG.jpg");
        assert_ne!(p1, p2);
    }
}
|
||||
|
||||
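The path helpers in this hunk can be exercised in isolation. The sketch below is a minimal, std-only approximation: the body of `shard_prefix` is truncated in the diff, so the two-character prefix here is an assumption inferred from the `/tmp/video/12/1234deadbeef` test expectation, and `candidate_thumb_paths` is a hypothetical helper illustrating the "check both the scoped and the bare-legacy path" advice from the doc comment.

```rust
use std::path::{Path, PathBuf};

/// Assumed behaviour (the real body is truncated in the diff): return the
/// first two characters of the hash, or the whole hash if it is shorter.
fn shard_prefix(hash: &str) -> &str {
    // Use char boundaries so slicing never splits a multi-byte character.
    let end = hash
        .char_indices()
        .nth(2)
        .map(|(idx, _)| idx)
        .unwrap_or(hash.len());
    &hash[..end]
}

/// `<video_dir>/<shard>/<hash>`, mirroring the helper in the diff.
fn hls_dir(video_dir: &Path, hash: &str) -> PathBuf {
    video_dir.join(shard_prefix(hash)).join(hash)
}

/// `<derivative_dir>/<library_id>/<rel_path>`, mirroring the helper in the diff.
fn library_scoped_legacy_path(
    derivative_dir: &Path,
    library_id: i32,
    rel_path: impl AsRef<Path>,
) -> PathBuf {
    derivative_dir.join(library_id.to_string()).join(rel_path)
}

/// Hypothetical lookup order for serving code: the library-scoped path
/// first, then the bare-legacy `<derivative_dir>/<rel_path>` shape.
fn candidate_thumb_paths(derivative_dir: &Path, library_id: i32, rel_path: &str) -> [PathBuf; 2] {
    [
        library_scoped_legacy_path(derivative_dir, library_id, rel_path),
        derivative_dir.join(rel_path),
    ]
}
```

A caller would probe the candidates in order and serve the first one that exists; whether the real serving code does exactly this is not shown in the diff.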
@@ -165,15 +165,6 @@ pub struct FilesRequest {
|
||||
/// Optional library filter. Accepts a library id (e.g. "1") or name
|
||||
/// (e.g. "main"). When omitted, results span all libraries.
|
||||
pub library: Option<String>,
|
||||
|
||||
/// When true, include rows soft-marked as duplicates of another file
|
||||
/// (i.e. `image_exif.duplicate_of_hash IS NOT NULL`). Default false —
|
||||
/// the standard /photos listing hides demoted siblings so the grid
|
||||
/// silently shrinks after a resolve. The Apollo duplicates modal
|
||||
/// passes `true` so it can show both survivors and demoted members
|
||||
/// inside a group.
|
||||
#[serde(default)]
|
||||
pub include_duplicates: Option<bool>,
|
||||
}
|
||||
|
||||
#[derive(Copy, Clone, Deserialize, PartialEq, Debug)]
|
||||
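The `include_duplicates` doc comment describes a listing rule that is easy to state as a filter. The following is a rough in-memory sketch, not the actual query: `Row` and `visible_rows` are hypothetical names, and the real implementation presumably applies the condition in SQL against `image_exif.duplicate_of_hash`.

```rust
/// Simplified stand-in for an `image_exif` row (hypothetical; only the
/// fields this sketch needs).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    rel_path: String,
    duplicate_of_hash: Option<String>,
}

/// Hide soft-marked duplicates unless the caller opts in.
/// `None` is treated the same as `Some(false)`.
fn visible_rows(rows: &[Row], include_duplicates: Option<bool>) -> Vec<Row> {
    let include = include_duplicates.unwrap_or(false);
    rows.iter()
        .filter(|r| include || r.duplicate_of_hash.is_none())
        .cloned()
        .collect()
}
```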
@@ -286,16 +277,6 @@ pub struct ExifMetadata {
|
||||
pub gps: Option<GpsCoordinates>,
|
||||
pub capture_settings: Option<CaptureSettings>,
|
||||
pub date_taken: Option<i64>,
|
||||
/// Which step of the canonical-date waterfall populated `date_taken`:
|
||||
/// `"exif" | "exiftool" | "filename" | "fs_time" | "manual"`. NULL when
|
||||
/// `date_taken` itself is NULL.
|
||||
pub date_taken_source: Option<String>,
|
||||
/// When `date_taken_source = "manual"`, the prior `date_taken` snapshot.
|
||||
/// Used by the UI's revert affordance and to label "manually overridden;
|
||||
/// originally X" in the details modal.
|
||||
pub original_date_taken: Option<i64>,
|
||||
/// When `date_taken_source = "manual"`, the prior source.
|
||||
pub original_date_taken_source: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize)]
|
||||
@@ -380,9 +361,6 @@ impl From<ImageExif> for ExifMetadata {
|
||||
None
|
||||
},
|
||||
date_taken: exif.date_taken,
|
||||
date_taken_source: exif.date_taken_source,
|
||||
original_date_taken: exif.original_date_taken,
|
||||
original_date_taken_source: exif.original_date_taken_source,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
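The four date fields above form a small state machine: a manual override snapshots the prior value once, and a revert restores it. The sketch below is one possible reading of those doc comments under stated assumptions; `DateState`, `set_manual`, and `clear_manual` are hypothetical names, and the real handlers may differ in detail.

```rust
/// Hypothetical in-memory view of the four date columns.
#[derive(Debug, Clone, PartialEq, Default)]
struct DateState {
    date_taken: Option<i64>,
    date_taken_source: Option<String>,
    original_date_taken: Option<i64>,
    original_date_taken_source: Option<String>,
}

/// Apply a manual override. The snapshot is taken only on the first
/// override, so repeated edits keep the original waterfall result.
fn set_manual(state: &mut DateState, new_date: i64) {
    let already_manual = state.date_taken_source.as_deref() == Some("manual");
    if !already_manual {
        state.original_date_taken = state.date_taken;
        state.original_date_taken_source = state.date_taken_source.clone();
    }
    state.date_taken = Some(new_date);
    state.date_taken_source = Some("manual".to_string());
}

/// Revert a manual override: restore the snapshot and null it back out.
/// A no-op when no override is active.
fn clear_manual(state: &mut DateState) {
    if state.date_taken_source.as_deref() != Some("manual") {
        return;
    }
    state.date_taken = state.original_date_taken.take();
    state.date_taken_source = state.original_date_taken_source.take();
}
```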
@@ -75,11 +75,6 @@ pub trait DailySummaryDao: Sync + Send {
|
||||
context: &opentelemetry::Context,
|
||||
contact: &str,
|
||||
) -> Result<i64, DbError>;
|
||||
|
||||
/// Cheap presence check — returns true iff at least one daily summary row
|
||||
/// exists. Used by gating logic that only needs "is the table empty?",
|
||||
/// avoiding a `COUNT(*)` full scan on large corpora.
|
||||
fn has_any_summaries(&mut self, context: &opentelemetry::Context) -> Result<bool, DbError>;
|
||||
}
|
||||
|
||||
pub struct SqliteDailySummaryDao {
|
||||
@@ -459,30 +454,6 @@ impl DailySummaryDao for SqliteDailySummaryDao {
|
||||
})
|
||||
.map_err(|_| DbError::new(DbErrorKind::QueryError))
|
||||
}
|
||||
|
||||
    fn has_any_summaries(&mut self, context: &opentelemetry::Context) -> Result<bool, DbError> {
        trace_db_call(context, "query", "has_any_summaries", |_span| {
            let mut conn = self
                .connection
                .lock()
                .expect("Unable to get DailySummaryDao");

            #[derive(QueryableByName)]
            struct ProbeResult {
                #[diesel(sql_type = diesel::sql_types::Integer)]
                #[allow(dead_code)]
                one: i32,
            }

            let rows: Vec<ProbeResult> =
                diesel::sql_query("SELECT 1 as one FROM daily_conversation_summaries LIMIT 1")
                    .load(conn.deref_mut())
                    .map_err(|e| anyhow::anyhow!("Failed to probe daily summaries: {}", e))?;

            Ok(!rows.is_empty())
        })
        .map_err(|_| DbError::new(DbErrorKind::QueryError))
    }
}
|
||||
|
||||
// Helper structs for raw SQL queries
|
||||
|
||||
@@ -21,22 +21,6 @@ pub trait InsightDao: Sync + Send {
|
||||
file_path: &str,
|
||||
) -> Result<Option<PhotoInsight>, DbError>;
|
||||
|
||||
/// Library-scoped variant of `get_insight`. The default `get_insight`
|
||||
/// finds any `is_current=true` row matching `file_path` across
|
||||
/// libraries — fine for the photo-grid metadata fetch (cross-library
|
||||
/// merge), wrong for the chat path: a regenerate on lib1 flips lib1's
|
||||
/// row to `is_current=false` and inserts a new lib1 row, but
|
||||
/// lib2's untouched `is_current=true` row for the same rel_path
|
||||
/// would still satisfy the path-only query and shadow the regen on
|
||||
/// the next history fetch. Always pass a library_id when you have
|
||||
/// one (chat / insight write paths always do).
|
||||
fn get_current_insight_for_library(
|
||||
&mut self,
|
||||
context: &opentelemetry::Context,
|
||||
library_id: i32,
|
||||
file_path: &str,
|
||||
) -> Result<Option<PhotoInsight>, DbError>;
|
||||
|
||||
/// Return the most recent current insight whose rel_path is one of
|
||||
/// `paths`. Used for content-hash sharing: the caller expands a
|
||||
/// single file into all rel_paths with the same content_hash, then
|
||||
@@ -127,30 +111,13 @@ impl InsightDao for SqliteInsightDao {
|
||||
fn store_insight(
|
||||
&mut self,
|
||||
context: &opentelemetry::Context,
|
||||
mut insight: InsertPhotoInsight,
|
||||
insight: InsertPhotoInsight,
|
||||
) -> Result<PhotoInsight, DbError> {
|
||||
trace_db_call(context, "insert", "store_insight", |_span| {
|
||||
use schema::photo_insights::dsl::*;
|
||||
|
||||
let mut connection = self.connection.lock().expect("Unable to get InsightDao");
|
||||
|
||||
// Eagerly populate content_hash so this insight follows the
|
||||
// bytes (CLAUDE.md "Multi-library data model"). Caller-
|
||||
// supplied hash wins; otherwise look it up from image_exif
|
||||
// for the (library_id, rel_path) tuple. None is acceptable —
|
||||
// reconciliation backfills it once the hash lands.
|
||||
if insight.content_hash.is_none() {
|
||||
use schema::image_exif as ie;
|
||||
insight.content_hash = ie::table
|
||||
.filter(ie::library_id.eq(insight.library_id))
|
||||
.filter(ie::rel_path.eq(&insight.file_path))
|
||||
.filter(ie::content_hash.is_not_null())
|
||||
.select(ie::content_hash)
|
||||
.first::<Option<String>>(connection.deref_mut())
|
||||
.ok()
|
||||
.flatten();
|
||||
}
|
||||
|
||||
// Mark all existing insights for this file as no longer current
|
||||
diesel::update(
|
||||
photo_insights
|
||||
@@ -198,33 +165,6 @@ impl InsightDao for SqliteInsightDao {
|
||||
.map_err(|_| DbError::new(DbErrorKind::QueryError))
|
||||
}
|
||||
|
||||
fn get_current_insight_for_library(
|
||||
&mut self,
|
||||
context: &opentelemetry::Context,
|
||||
lib_id: i32,
|
||||
path: &str,
|
||||
) -> Result<Option<PhotoInsight>, DbError> {
|
||||
trace_db_call(
|
||||
context,
|
||||
"query",
|
||||
"get_current_insight_for_library",
|
||||
|_span| {
|
||||
use schema::photo_insights::dsl::*;
|
||||
|
||||
let mut connection = self.connection.lock().expect("Unable to get InsightDao");
|
||||
|
||||
photo_insights
|
||||
.filter(library_id.eq(lib_id))
|
||||
.filter(rel_path.eq(path))
|
||||
.filter(is_current.eq(true))
|
||||
.first::<PhotoInsight>(connection.deref_mut())
|
||||
.optional()
|
||||
.map_err(|_| anyhow::anyhow!("Query error"))
|
||||
},
|
||||
)
|
||||
.map_err(|_| DbError::new(DbErrorKind::QueryError))
|
||||
}
|
||||
|
||||
fn get_insight_for_paths(
|
||||
&mut self,
|
||||
context: &opentelemetry::Context,
|
||||
|
||||
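The shadowing problem described in the `get_current_insight_for_library` doc comment can be reproduced with plain vectors. This is an illustrative sketch only: the free functions below are in-memory stand-ins for the DAO methods, `regenerate` is a hypothetical helper that follows the comment's description of a regenerate (flip that library's rows, insert a new one), and insertion order stands in for whatever row order the database happens to return.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Insight {
    id: i32,
    library_id: i32,
    rel_path: String,
    is_current: bool,
}

/// Path-only lookup: the first current row for `path` in any library.
fn get_insight<'a>(rows: &'a [Insight], path: &str) -> Option<&'a Insight> {
    rows.iter().find(|r| r.is_current && r.rel_path == path)
}

/// Library-scoped lookup: additionally requires a matching `library_id`.
fn get_current_insight_for_library<'a>(
    rows: &'a [Insight],
    lib_id: i32,
    path: &str,
) -> Option<&'a Insight> {
    rows.iter()
        .find(|r| r.is_current && r.library_id == lib_id && r.rel_path == path)
}

/// Hypothetical regenerate: demote this library's rows for `path`, then
/// append a fresh current row.
fn regenerate(rows: &mut Vec<Insight>, lib_id: i32, path: &str, new_id: i32) {
    for r in rows.iter_mut() {
        if r.library_id == lib_id && r.rel_path == path {
            r.is_current = false;
        }
    }
    rows.push(Insight {
        id: new_id,
        library_id: lib_id,
        rel_path: path.to_string(),
        is_current: true,
    });
}
```

After a regenerate on library 1, the path-only lookup can still return library 2's untouched row, while the scoped lookup returns the new row. That is the reason the comment recommends always passing a `library_id` when one is available.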
+15
-1971
File diff suppressed because it is too large
+7
-1409
File diff suppressed because it is too large
+2
-141
@@ -1,6 +1,6 @@
|
||||
use crate::database::schema::{
|
||||
entities, entity_facts, entity_photo_links, favorites, image_exif, libraries, personas,
|
||||
photo_insights, users, video_preview_clips,
|
||||
entities, entity_facts, entity_photo_links, favorites, image_exif, libraries, photo_insights,
|
||||
users, video_preview_clips,
|
||||
};
|
||||
use serde::Serialize;
|
||||
|
||||
@@ -59,16 +59,6 @@ pub struct InsertImageExif {
|
||||
pub last_modified: i64,
|
||||
pub content_hash: Option<String>,
|
||||
pub size_bytes: Option<i64>,
|
||||
/// 64-bit pHash (DCT) packed as i64. NULL for videos and decode failures.
|
||||
pub phash_64: Option<i64>,
|
||||
/// 64-bit dHash (gradient). NULL for videos and decode failures.
|
||||
pub dhash_64: Option<i64>,
|
||||
/// Which step of the canonical-date waterfall populated `date_taken`:
|
||||
/// `"exif"` | `"exiftool"` | `"filename"` | `"fs_time"`. NULL when
|
||||
/// `date_taken` is NULL (no source resolved it). The per-tick backfill
|
||||
/// drain re-resolves rows whose source is `"fs_time"` once exiftool
|
||||
/// has had a chance to run.
|
||||
pub date_taken_source: Option<String>,
|
||||
}
|
||||
|
||||
// Field order matches the post-migration column order in `image_exif`.
|
||||
@@ -96,24 +86,6 @@ pub struct ImageExif {
|
||||
pub last_modified: i64,
|
||||
pub content_hash: Option<String>,
|
||||
pub size_bytes: Option<i64>,
|
||||
pub phash_64: Option<i64>,
|
||||
pub dhash_64: Option<i64>,
|
||||
/// When non-null, this row is a soft-marked duplicate of the file
|
||||
/// whose `content_hash` matches this value. The default `/photos`
|
||||
/// listing filters such rows out.
|
||||
pub duplicate_of_hash: Option<String>,
|
||||
/// Unix seconds at which the resolve was committed.
|
||||
pub duplicate_decided_at: Option<i64>,
|
||||
/// Which step of the canonical-date waterfall populated `date_taken`,
/// or `"manual"` when the operator has set it via POST /image/exif/date.
pub date_taken_source: Option<String>,
|
||||
/// Snapshot of the prior `date_taken` taken on first manual override.
|
||||
/// NULL when no override is active. POST /image/exif/date/clear restores
|
||||
/// `date_taken` from this column and nulls it back out.
|
||||
pub original_date_taken: Option<i64>,
|
||||
/// Snapshot of the prior `date_taken_source` taken on first manual
|
||||
/// override. NULL when no override is active.
|
||||
pub original_date_taken_source: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Insertable)]
|
||||
@@ -136,13 +108,6 @@ pub struct InsertPhotoInsight {
|
||||
/// generation). Used downstream to filter out contaminated rows when
|
||||
/// assembling an unbiased training / evaluation set.
|
||||
pub fewshot_source_ids: Option<String>,
|
||||
/// Bytes-keyed identity. When present, this insight is considered
|
||||
/// to belong to the content rather than the path — see CLAUDE.md
|
||||
/// "Multi-library data model". The DAO populates this from
|
||||
/// `image_exif.content_hash` at insert time when known; rows
|
||||
/// inserted before the hash is available stay null and the
|
||||
/// reconciliation pass backfills them.
|
||||
pub content_hash: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Serialize, Queryable, Clone, Debug)]
|
||||
@@ -161,7 +126,6 @@ pub struct PhotoInsight {
|
||||
/// `"local"` (Ollama with images) | `"hybrid"` (local vision + OpenRouter chat).
|
||||
pub backend: String,
|
||||
pub fewshot_source_ids: Option<String>,
|
||||
pub content_hash: Option<String>,
|
||||
}
|
||||
|
||||
// --- Libraries ---
|
||||
@@ -172,20 +136,6 @@ pub struct LibraryRow {
|
||||
pub name: String,
|
||||
pub root_path: String,
|
||||
pub created_at: i64,
|
||||
/// Operator kill switch. `false` = the watcher skips this library
|
||||
/// entirely (no probe, no ingest, no maintenance) and orphan-GC
|
||||
/// treats it as out-of-scope for the all-online consensus rule.
|
||||
/// Toggle via SQL today — there is intentionally no HTTP endpoint
|
||||
/// for library mutation (see CLAUDE.md "Multi-library data model").
|
||||
pub enabled: bool,
|
||||
/// Per-library excluded paths/patterns, stored comma-separated
|
||||
/// (same shape as the global `EXCLUDED_DIRS` env var). NULL = no
|
||||
/// extra excludes for this library; the global env var still
|
||||
/// applies. The runtime `Library` struct parses this into a
|
||||
/// `Vec<String>` and the walker applies the union of (global,
|
||||
/// library) excludes when scanning. Use case: mount a parent
|
||||
/// directory while another library covers a child subtree.
|
||||
pub excluded_dirs: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Insertable)]
|
||||
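The `excluded_dirs` comment describes two steps: parsing a comma-separated column into a list, and applying the union of global and per-library excludes. A minimal sketch of both follows. The function names are hypothetical, and trimming whitespace and skipping empty entries are assumptions not confirmed by the diff.

```rust
/// Parse a comma-separated exclude list. `None` (SQL NULL) and blank
/// entries yield nothing. Trimming is an assumption of this sketch.
fn parse_excluded_dirs(raw: Option<&str>) -> Vec<String> {
    raw.unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Union of global and per-library excludes, preserving first-seen order
/// and dropping duplicates.
fn effective_excludes(global: &[String], library: &[String]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for entry in global.iter().chain(library.iter()) {
        if !out.contains(entry) {
            out.push(entry.clone());
        }
    }
    out
}
```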
@@ -194,8 +144,6 @@ pub struct InsertLibrary<'a> {
|
||||
pub name: &'a str,
|
||||
pub root_path: &'a str,
|
||||
pub created_at: i64,
|
||||
pub enabled: bool,
|
||||
pub excluded_dirs: Option<&'a str>,
|
||||
}
|
||||
|
||||
// --- Knowledge memory models ---
|
||||
@@ -238,44 +186,6 @@ pub struct InsertEntityFact {
|
||||
pub confidence: f32,
|
||||
pub status: String,
|
||||
pub created_at: i64,
|
||||
/// Which persona authored this fact. Shared entities, persona-tagged
|
||||
/// facts: each persona accumulates its own voice over the same
|
||||
/// real-world referents. Defaults to `'default'` for legacy rows
|
||||
/// (see migration 2026-05-09-000000).
|
||||
pub persona_id: String,
|
||||
/// Author's user_id. Required for the composite FK to
|
||||
/// `personas(user_id, persona_id)` (migration 2026-05-10-000000) and
|
||||
/// for cross-user fact isolation: two users with the same 'default'
|
||||
/// persona must not see each other's facts. Always paired with
|
||||
/// `persona_id` — they're a unit.
|
||||
pub user_id: i32,
|
||||
/// Real-world period the fact is/was true (unix seconds). NULL on
|
||||
/// either side = unbounded — `valid_from IS NULL` reads as
|
||||
/// "always-true-back-to-the-beginning", `valid_until IS NULL` as
|
||||
/// "still-true-now-or-unknown". Distinguishes valid time from
|
||||
/// transaction time (`created_at` is when we recorded the fact,
|
||||
/// not when it was true in the world). See migration
|
||||
/// 2026-05-10-000100.
|
||||
pub valid_from: Option<i64>,
|
||||
pub valid_until: Option<i64>,
|
||||
/// Points at the entity_facts.id that replaced this one. Set by
|
||||
/// the supersede endpoint; status flips to 'superseded' in the
|
||||
/// same transaction. See migration 2026-05-10-000200.
|
||||
pub superseded_by: Option<i32>,
|
||||
/// Provenance for model audit — see migration 2026-05-10-000300.
|
||||
/// `created_by_model` is the LLM identifier (e.g. "qwen2.5:7b",
|
||||
/// "anthropic/claude-sonnet-4") or NULL for legacy / manual rows.
|
||||
/// `created_by_backend` is "local" / "hybrid" / "manual" / NULL.
|
||||
pub created_by_model: Option<String>,
|
||||
pub created_by_backend: Option<String>,
|
||||
/// Audit trail for mutations after creation — see migration
|
||||
/// 2026-05-10-000500. `last_modified_*` stamp on any update
|
||||
/// (status flip, valid-time edit, supersede, manual PATCH);
|
||||
/// `last_modified_at` is unix seconds. NULL on rows that have
|
||||
/// never been touched since creation.
|
||||
pub last_modified_by_model: Option<String>,
|
||||
pub last_modified_by_backend: Option<String>,
|
||||
pub last_modified_at: Option<i64>,
|
||||
}
|
||||
|
||||
#[derive(Serialize, Queryable, Clone, Debug)]
|
||||
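The valid-time semantics of `valid_from` / `valid_until` (NULL on either side means unbounded) reduce to a small predicate. The sketch below assumes a half-open interval, inclusive start and exclusive end; the diff does not say which convention the real queries use, and `FactWindow` and `is_valid_at` are hypothetical names.

```rust
/// Hypothetical view of the two valid-time columns (unix seconds).
struct FactWindow {
    valid_from: Option<i64>,
    valid_until: Option<i64>,
}

/// True when the fact held in the real world at instant `at`.
/// `None` on either side is treated as unbounded.
fn is_valid_at(window: &FactWindow, at: i64) -> bool {
    let after_start = window.valid_from.map_or(true, |from| at >= from);
    let before_end = window.valid_until.map_or(true, |until| at < until);
    after_start && before_end
}
```

Note that `created_at` plays no part in the predicate: it records when the fact was stored (transaction time), not when it was true.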
@@ -290,16 +200,6 @@ pub struct EntityFact {
|
||||
pub confidence: f32,
|
||||
pub status: String,
|
||||
pub created_at: i64,
|
||||
pub persona_id: String,
|
||||
pub user_id: i32,
|
||||
pub valid_from: Option<i64>,
|
||||
pub valid_until: Option<i64>,
|
||||
pub superseded_by: Option<i32>,
|
||||
pub created_by_model: Option<String>,
|
||||
pub created_by_backend: Option<String>,
|
||||
pub last_modified_by_model: Option<String>,
|
||||
pub last_modified_by_backend: Option<String>,
|
||||
pub last_modified_at: Option<i64>,
|
||||
}
|
||||
|
||||
#[derive(Insertable)]
|
||||
@@ -322,45 +222,6 @@ pub struct EntityPhotoLink {
|
||||
pub role: String,
|
||||
}
|
||||
|
||||
// --- Personas ---
|
||||
|
||||
#[derive(Insertable)]
|
||||
#[diesel(table_name = personas)]
|
||||
pub struct InsertPersona<'a> {
|
||||
pub user_id: i32,
|
||||
pub persona_id: &'a str,
|
||||
pub name: &'a str,
|
||||
pub system_prompt: &'a str,
|
||||
pub is_built_in: bool,
|
||||
pub include_all_memories: bool,
|
||||
pub created_at: i64,
|
||||
pub updated_at: i64,
|
||||
/// "Strict mode" — agent reads only see facts with status =
|
||||
/// 'reviewed' (human-verified). Default false. See migration
|
||||
/// 2026-05-10-000400.
|
||||
pub reviewed_only_facts: bool,
|
||||
/// Gate for the agent's update_fact / supersede_fact tools.
|
||||
/// Default false — fresh personas let the agent create but not
|
||||
/// alter or replace. Operator opts in once a model has earned
|
||||
/// trust. See migration 2026-05-10-000500.
|
||||
pub allow_agent_corrections: bool,
|
||||
}
|
||||
|
||||
#[derive(Serialize, Queryable, Clone, Debug)]
|
||||
pub struct Persona {
|
||||
pub id: i32,
|
||||
pub user_id: i32,
|
||||
pub persona_id: String,
|
||||
pub name: String,
|
||||
pub system_prompt: String,
|
||||
pub is_built_in: bool,
|
||||
pub include_all_memories: bool,
|
||||
pub created_at: i64,
|
||||
pub updated_at: i64,
|
||||
pub reviewed_only_facts: bool,
|
||||
pub allow_agent_corrections: bool,
|
||||
}
|
||||
|
||||
#[derive(Insertable)]
|
||||
#[diesel(table_name = video_preview_clips)]
|
||||
pub struct InsertVideoPreviewClip {
|
||||
|
||||
@@ -1,447 +0,0 @@
|
||||
#![allow(dead_code)]
|
||||
|
||||
use diesel::prelude::*;
|
||||
use diesel::sqlite::SqliteConnection;
|
||||
use std::ops::DerefMut;
|
||||
use std::sync::{Arc, Mutex};
|
||||
|
||||
use crate::database::models::{InsertPersona, Persona};
|
||||
use crate::database::schema;
|
||||
use crate::database::{DbError, DbErrorKind, connect};
|
||||
use crate::otel::trace_db_call;
|
||||
|
||||
/// Patch shape for update_persona. None = leave field alone. Built-ins are
/// allowed to flip `include_all_memories` but should reject name/prompt
/// edits at the handler layer (built-in copy lives in the migration).
pub struct PersonaPatch {
    pub name: Option<String>,
    pub system_prompt: Option<String>,
    pub include_all_memories: Option<bool>,
    pub reviewed_only_facts: Option<bool>,
    pub allow_agent_corrections: Option<bool>,
}

/// One row of a bulk migration upload. Fields named to match the JSON
/// shape the mobile client uploads (`POST /personas/migrate`).
pub struct ImportPersona {
    pub persona_id: String,
    pub name: String,
    pub system_prompt: String,
    pub is_built_in: bool,
    pub created_at: i64,
}

pub trait PersonaDao: Sync + Send {
|
||||
fn list_personas(
|
||||
&mut self,
|
||||
cx: &opentelemetry::Context,
|
||||
user_id: i32,
|
||||
) -> Result<Vec<Persona>, DbError>;
|
||||
|
||||
fn get_persona(
|
||||
&mut self,
|
||||
cx: &opentelemetry::Context,
|
||||
user_id: i32,
|
||||
persona_id: &str,
|
||||
) -> Result<Option<Persona>, DbError>;
|
||||
|
||||
fn create_persona(
|
||||
&mut self,
|
||||
cx: &opentelemetry::Context,
|
||||
user_id: i32,
|
||||
persona_id: &str,
|
||||
name: &str,
|
||||
system_prompt: &str,
|
||||
is_built_in: bool,
|
||||
include_all_memories: bool,
|
||||
) -> Result<Persona, DbError>;
|
||||
|
||||
fn update_persona(
|
||||
&mut self,
|
||||
cx: &opentelemetry::Context,
|
||||
user_id: i32,
|
||||
persona_id: &str,
|
||||
patch: PersonaPatch,
|
||||
) -> Result<Option<Persona>, DbError>;
|
||||
|
||||
fn delete_persona(
|
||||
&mut self,
|
||||
cx: &opentelemetry::Context,
|
||||
user_id: i32,
|
||||
persona_id: &str,
|
||||
) -> Result<bool, DbError>;
|
||||
|
||||
/// Idempotent bulk import. INSERT OR IGNORE on (user_id, persona_id)
|
||||
/// — re-uploading the same set is a no-op. Returns the number of rows
|
||||
/// actually inserted (skipped duplicates don't count).
|
||||
fn bulk_import(
|
||||
&mut self,
|
||||
cx: &opentelemetry::Context,
|
||||
user_id: i32,
|
||||
personas: &[ImportPersona],
|
||||
) -> Result<usize, DbError>;
|
||||
}
|
||||
|
||||
pub struct SqlitePersonaDao {
|
||||
connection: Arc<Mutex<SqliteConnection>>,
|
||||
}
|
||||
|
||||
impl Default for SqlitePersonaDao {
|
||||
fn default() -> Self {
|
||||
Self::new()
|
||||
}
|
||||
}
|
||||
|
||||
impl SqlitePersonaDao {
|
||||
pub fn new() -> Self {
|
||||
Self {
|
||||
connection: Arc::new(Mutex::new(connect())),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn from_connection(conn: Arc<Mutex<SqliteConnection>>) -> Self {
|
||||
Self { connection: conn }
|
||||
}
|
||||
}
|
||||
|
||||
impl PersonaDao for SqlitePersonaDao {
|
||||
fn list_personas(
|
||||
&mut self,
|
||||
cx: &opentelemetry::Context,
|
||||
uid: i32,
|
||||
) -> Result<Vec<Persona>, DbError> {
|
||||
trace_db_call(cx, "query", "list_personas", |_span| {
|
||||
use schema::personas::dsl::*;
|
||||
let mut conn = self.connection.lock().expect("PersonaDao lock");
|
||||
personas
|
||||
.filter(user_id.eq(uid))
|
||||
.order(created_at.asc())
|
||||
.load::<Persona>(conn.deref_mut())
|
||||
.map_err(|e| anyhow::anyhow!("Query error: {}", e))
|
||||
})
|
||||
.map_err(|_| DbError::new(DbErrorKind::QueryError))
|
||||
}
|
||||
|
||||
fn get_persona(
|
||||
&mut self,
|
||||
cx: &opentelemetry::Context,
|
||||
uid: i32,
|
||||
pid: &str,
|
||||
) -> Result<Option<Persona>, DbError> {
|
||||
trace_db_call(cx, "query", "get_persona", |_span| {
|
||||
use schema::personas::dsl::*;
|
||||
let mut conn = self.connection.lock().expect("PersonaDao lock");
|
||||
personas
|
||||
.filter(user_id.eq(uid))
|
||||
.filter(persona_id.eq(pid))
|
||||
.first::<Persona>(conn.deref_mut())
|
||||
.optional()
|
||||
.map_err(|e| anyhow::anyhow!("Query error: {}", e))
|
||||
})
|
||||
.map_err(|_| DbError::new(DbErrorKind::QueryError))
|
||||
}
|
||||
|
||||
fn create_persona(
|
||||
&mut self,
|
||||
cx: &opentelemetry::Context,
|
||||
uid: i32,
|
||||
pid: &str,
|
||||
nm: &str,
|
||||
prompt: &str,
|
||||
builtin: bool,
|
||||
include_all: bool,
|
||||
) -> Result<Persona, DbError> {
|
||||
trace_db_call(cx, "insert", "create_persona", |_span| {
|
||||
use schema::personas::dsl::*;
|
||||
let mut conn = self.connection.lock().expect("PersonaDao lock");
|
||||
let now = chrono::Utc::now().timestamp_millis();
|
||||
|
||||
diesel::insert_into(personas)
|
||||
.values(InsertPersona {
|
||||
user_id: uid,
|
||||
persona_id: pid,
|
||||
name: nm,
|
||||
system_prompt: prompt,
|
||||
is_built_in: builtin,
|
||||
include_all_memories: include_all,
|
||||
created_at: now,
|
||||
updated_at: now,
|
||||
reviewed_only_facts: false,
|
||||
allow_agent_corrections: false,
|
||||
})
|
||||
.execute(conn.deref_mut())
|
||||
.map_err(|e| anyhow::anyhow!("Insert error: {}", e))?;
|
||||
|
||||
personas
|
||||
.filter(user_id.eq(uid))
|
||||
.filter(persona_id.eq(pid))
|
||||
.first::<Persona>(conn.deref_mut())
|
||||
.map_err(|e| anyhow::anyhow!("Query error: {}", e))
|
||||
})
|
||||
.map_err(|_| DbError::new(DbErrorKind::InsertError))
|
||||
}
|
||||
|
||||
fn update_persona(
|
||||
&mut self,
|
||||
cx: &opentelemetry::Context,
|
||||
uid: i32,
|
||||
pid: &str,
|
||||
patch: PersonaPatch,
|
||||
) -> Result<Option<Persona>, DbError> {
|
||||
trace_db_call(cx, "update", "update_persona", |_span| {
|
||||
use schema::personas::dsl::*;
|
||||
let mut conn = self.connection.lock().expect("PersonaDao lock");
|
||||
let now = chrono::Utc::now().timestamp_millis();
|
||||
|
||||
// Apply each field as its own UPDATE — keeps types simple
|
||||
// (Diesel's tuple updates don't compose cleanly across optional
|
||||
// columns) and matches the pattern already in use for entities
|
||||
// (knowledge_dao.rs::update_entity).
|
||||
if let Some(ref new_name) = patch.name {
|
||||
diesel::update(personas.filter(user_id.eq(uid)).filter(persona_id.eq(pid)))
|
||||
.set((name.eq(new_name), updated_at.eq(now)))
|
||||
.execute(conn.deref_mut())
|
||||
.map_err(|e| anyhow::anyhow!("Update name error: {}", e))?;
|
||||
}
|
||||
if let Some(ref new_prompt) = patch.system_prompt {
|
||||
diesel::update(personas.filter(user_id.eq(uid)).filter(persona_id.eq(pid)))
|
||||
.set((system_prompt.eq(new_prompt), updated_at.eq(now)))
|
||||
.execute(conn.deref_mut())
|
||||
.map_err(|e| anyhow::anyhow!("Update prompt error: {}", e))?;
|
||||
}
|
||||
if let Some(new_include_all) = patch.include_all_memories {
|
||||
diesel::update(personas.filter(user_id.eq(uid)).filter(persona_id.eq(pid)))
|
||||
.set((include_all_memories.eq(new_include_all), updated_at.eq(now)))
|
||||
.execute(conn.deref_mut())
|
||||
.map_err(|e| anyhow::anyhow!("Update include_all error: {}", e))?;
|
||||
}
|
||||
if let Some(new_reviewed_only) = patch.reviewed_only_facts {
|
||||
diesel::update(personas.filter(user_id.eq(uid)).filter(persona_id.eq(pid)))
|
||||
.set((
|
||||
reviewed_only_facts.eq(new_reviewed_only),
|
||||
updated_at.eq(now),
|
||||
))
|
||||
.execute(conn.deref_mut())
|
||||
.map_err(|e| anyhow::anyhow!("Update reviewed_only_facts error: {}", e))?;
|
||||
}
|
||||
if let Some(new_allow_corrections) = patch.allow_agent_corrections {
|
||||
diesel::update(personas.filter(user_id.eq(uid)).filter(persona_id.eq(pid)))
|
||||
.set((
|
||||
allow_agent_corrections.eq(new_allow_corrections),
|
||||
updated_at.eq(now),
|
||||
))
|
||||
.execute(conn.deref_mut())
|
||||
.map_err(|e| anyhow::anyhow!("Update allow_agent_corrections error: {}", e))?;
|
||||
}
|
||||
|
||||
personas
|
||||
.filter(user_id.eq(uid))
|
||||
.filter(persona_id.eq(pid))
|
||||
.first::<Persona>(conn.deref_mut())
|
||||
.optional()
|
||||
.map_err(|e| anyhow::anyhow!("Query error: {}", e))
|
||||
})
|
||||
.map_err(|_| DbError::new(DbErrorKind::UpdateError))
|
||||
}
|
||||
|
||||
fn delete_persona(
|
||||
&mut self,
|
||||
cx: &opentelemetry::Context,
|
||||
uid: i32,
|
||||
pid: &str,
|
||||
) -> Result<bool, DbError> {
|
||||
trace_db_call(cx, "delete", "delete_persona", |_span| {
|
||||
use schema::personas::dsl::*;
|
||||
let mut conn = self.connection.lock().expect("PersonaDao lock");
|
||||
let n = diesel::delete(personas.filter(user_id.eq(uid)).filter(persona_id.eq(pid)))
|
||||
.execute(conn.deref_mut())
|
||||
.map_err(|e| anyhow::anyhow!("Delete error: {}", e))?;
|
||||
Ok(n > 0)
|
||||
})
|
||||
.map_err(|_| DbError::new(DbErrorKind::QueryError))
|
||||
}
|
||||
|
||||
fn bulk_import(
|
||||
&mut self,
|
||||
cx: &opentelemetry::Context,
|
||||
uid: i32,
|
||||
rows: &[ImportPersona],
|
||||
) -> Result<usize, DbError> {
|
||||
trace_db_call(cx, "insert", "bulk_import_personas", |_span| {
|
||||
let mut conn = self.connection.lock().expect("PersonaDao lock");
|
||||
let now = chrono::Utc::now().timestamp_millis();
|
||||
let mut inserted = 0usize;
|
||||
|
||||
// INSERT OR IGNORE on the (user_id, persona_id) UNIQUE so
|
||||
// re-running migrate is a no-op for personas already on the
|
||||
// server.
|
||||
for p in rows {
|
||||
let n = diesel::sql_query(
|
||||
"INSERT OR IGNORE INTO personas (user_id, persona_id, name, system_prompt, \
|
||||
is_built_in, include_all_memories, created_at, updated_at) \
|
||||
VALUES (?, ?, ?, ?, ?, 0, ?, ?)",
|
||||
)
|
||||
.bind::<diesel::sql_types::Integer, _>(uid)
|
||||
.bind::<diesel::sql_types::Text, _>(&p.persona_id)
|
||||
.bind::<diesel::sql_types::Text, _>(&p.name)
|
||||
.bind::<diesel::sql_types::Text, _>(&p.system_prompt)
|
||||
.bind::<diesel::sql_types::Bool, _>(p.is_built_in)
|
||||
.bind::<diesel::sql_types::BigInt, _>(p.created_at)
|
||||
.bind::<diesel::sql_types::BigInt, _>(now)
|
||||
.execute(conn.deref_mut())
|
||||
.map_err(|e| anyhow::anyhow!("Insert error: {}", e))?;
|
||||
inserted += n;
|
||||
}
|
||||
Ok(inserted)
|
||||
})
|
||||
.map_err(|_| DbError::new(DbErrorKind::InsertError))
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::database::test::in_memory_db_connection;
|
||||
|
||||
fn dao_with_user(username: &str) -> (SqlitePersonaDao, i32) {
|
||||
use crate::database::schema::users::dsl as u;
|
||||
let conn = Arc::new(Mutex::new(in_memory_db_connection()));
|
||||
diesel::insert_into(u::users)
|
||||
.values((u::username.eq(username), u::password.eq("x")))
|
||||
.execute(conn.lock().unwrap().deref_mut())
|
||||
.unwrap();
|
||||
let user_id: i32 = u::users
|
||||
.filter(u::username.eq(username))
|
||||
.select(u::id)
|
||||
.first(conn.lock().unwrap().deref_mut())
|
||||
.unwrap();
|
||||
(SqlitePersonaDao::from_connection(conn), user_id)
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn create_and_list_round_trip() {
|
||||
let cx = opentelemetry::Context::new();
|
||||
let (mut dao, uid) = dao_with_user("alice");
|
||||
|
||||
// The migration seeds 3 built-ins for any existing user; alice
|
||||
// was created post-migration so she starts empty.
|
||||
let p = dao
|
||||
.create_persona(&cx, uid, "custom-1", "Custom A", "prompt A", false, false)
|
||||
.unwrap();
|
||||
assert_eq!(p.persona_id, "custom-1");
|
||||
assert_eq!(p.user_id, uid);
|
||||
assert!(!p.is_built_in);
|
||||
|
||||
let list = dao.list_personas(&cx, uid).unwrap();
|
||||
assert_eq!(list.len(), 1);
|
||||
assert_eq!(list[0].persona_id, "custom-1");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn unique_constraint_blocks_duplicate_persona_id() {
|
||||
let cx = opentelemetry::Context::new();
|
||||
let (mut dao, uid) = dao_with_user("bob");
|
||||
|
||||
dao.create_persona(&cx, uid, "x", "X", "p", false, false)
|
||||
.unwrap();
|
||||
let err = dao.create_persona(&cx, uid, "x", "X2", "p2", false, false);
|
||||
assert!(
|
||||
err.is_err(),
|
||||
"second insert with same persona_id should fail"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn bulk_import_is_idempotent() {
|
||||
let cx = opentelemetry::Context::new();
|
||||
let (mut dao, uid) = dao_with_user("carol");
|
||||
|
||||
let rows = vec![
|
||||
ImportPersona {
|
||||
persona_id: "custom-a".into(),
|
||||
name: "A".into(),
|
||||
system_prompt: "p1".into(),
|
||||
is_built_in: false,
|
||||
created_at: 1,
|
||||
},
|
||||
ImportPersona {
|
||||
persona_id: "custom-b".into(),
|
||||
name: "B".into(),
|
||||
system_prompt: "p2".into(),
|
||||
is_built_in: false,
|
||||
created_at: 2,
|
||||
},
|
||||
];
|
||||
|
||||
let first = dao.bulk_import(&cx, uid, &rows).unwrap();
|
||||
assert_eq!(first, 2);
|
||||
let second = dao.bulk_import(&cx, uid, &rows).unwrap();
|
||||
assert_eq!(second, 0, "re-import should insert nothing");
|
||||
|
||||
assert_eq!(dao.list_personas(&cx, uid).unwrap().len(), 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn dao_update_does_not_block_built_ins() {
|
||||
// Documenting contract: the DAO is intentionally permissive —
|
||||
// `update_persona` will apply name/system_prompt edits to ANY
|
||||
// row, including built-ins. The guard against editing built-in
|
||||
// identity (name + systemPrompt) lives in the HTTP handler
|
||||
// (src/personas.rs::update_persona). If you find yourself
|
||||
// wanting to add the guard here too, prefer that — defence in
|
||||
// depth — but keep this test passing so anyone who removes
|
||||
// the handler guard gets a failing call site, not silent data
|
||||
// corruption.
|
||||
let cx = opentelemetry::Context::new();
|
||||
let (mut dao, uid) = dao_with_user("eve");
|
||||
|
||||
dao.create_persona(&cx, uid, "default", "Default", "old", true, false)
|
||||
.unwrap();
|
||||
let updated = dao
|
||||
.update_persona(
|
||||
&cx,
|
||||
uid,
|
||||
"default",
|
||||
PersonaPatch {
|
||||
name: Some("Renamed".into()),
|
||||
system_prompt: Some("new prompt".into()),
|
||||
include_all_memories: None,
|
||||
reviewed_only_facts: None,
|
||||
allow_agent_corrections: None,
|
||||
},
|
||||
)
|
||||
.unwrap()
|
||||
.unwrap();
|
||||
assert_eq!(updated.name, "Renamed");
|
||||
assert_eq!(updated.system_prompt, "new prompt");
|
||||
assert!(
|
||||
updated.is_built_in,
|
||||
"is_built_in flag should be unchanged by patch"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn update_toggles_include_all_memories() {
|
||||
let cx = opentelemetry::Context::new();
|
||||
let (mut dao, uid) = dao_with_user("dan");
|
||||
|
||||
dao.create_persona(&cx, uid, "j", "Journal", "p", true, false)
|
||||
.unwrap();
|
||||
let updated = dao
|
||||
.update_persona(
|
||||
&cx,
|
||||
uid,
|
||||
"j",
|
||||
PersonaPatch {
|
||||
name: None,
|
||||
system_prompt: None,
|
||||
include_all_memories: Some(true),
|
||||
reviewed_only_facts: None,
|
||||
allow_agent_corrections: None,
|
||||
},
|
||||
)
|
||||
.unwrap()
|
||||
.unwrap();
|
||||
assert!(updated.include_all_memories);
|
||||
}
|
||||
}
|
||||
@@ -1,382 +0,0 @@
|
||||
//! Reconciliation pass for hash-keyed derived data.
|
||||
//!
|
||||
//! As `backfill_unhashed_backlog` populates `image_exif.content_hash`
|
||||
//! for legacy rows, we want the matching `tagged_photo` and
|
||||
//! `photo_insights` rows — which were inserted before the hash was
|
||||
//! known — to inherit the hash too. Otherwise reads keep falling back
|
||||
//! to the rel_path path even when a hash is now available.
|
||||
//!
|
||||
//! Two passes:
|
||||
//! 1. **Hash backfill** — for every `tagged_photo` / `photo_insights`
|
||||
//! row with NULL `content_hash`, look up the matching
|
||||
//! `image_exif.content_hash` and write it. SQL-only; idempotent;
|
||||
//! a no-op once everything is hashed.
|
||||
//! 2. **Insight scalar merge** — when multiple `photo_insights` rows
|
||||
//! share a `content_hash` with `is_current = true`, only the
|
||||
//! earliest `generated_at` keeps `is_current = true` (per the
|
||||
//! "earliest wins" rule in CLAUDE.md → "Multi-library data
|
||||
//! model"). Others are demoted, not deleted, so they remain
|
||||
//! visible in history endpoints.
|
||||
//!
|
||||
//! Tags are set-valued under the policy (union on read), so there's no
|
||||
//! analogous "collapse" pass — duplicate `(tag_id, content_hash)` rows
|
||||
//! across libraries are harmless and correctly de-duped at read time
|
||||
//! by the existing `DISTINCT` queries.
|
||||
//!
|
||||
//! The pass operates on the database alone — no filesystem access —
|
||||
//! so it doesn't need the library availability gate.
|
||||
|
||||
// The lib doesn't call into this module directly — the watcher (in the
|
||||
// bin) does. Dead-code analysis at the lib level can't see that, so
|
||||
// suppress at the module level. Tests still exercise every function.
|
||||
#![allow(dead_code)]
|
||||
|
||||
use diesel::prelude::*;
|
||||
use diesel::sql_query;
|
||||
use diesel::sqlite::SqliteConnection;
|
||||
use log::{debug, info, warn};
|
||||
|
||||
/// Outcome of a reconciliation tick. Tracked so the watcher can log
|
||||
/// progress when something changed and stay quiet when nothing did.
|
||||
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
|
||||
pub struct ReconcileStats {
|
||||
pub tagged_photo_hashes_filled: usize,
|
||||
pub photo_insights_hashes_filled: usize,
|
||||
pub photo_insights_demoted: usize,
|
||||
}
|
||||
|
||||
impl ReconcileStats {
|
||||
pub fn changed(&self) -> bool {
|
||||
self.tagged_photo_hashes_filled > 0
|
||||
|| self.photo_insights_hashes_filled > 0
|
||||
|| self.photo_insights_demoted > 0
|
||||
}
|
||||
}
|
||||
|
||||
/// Run the reconciliation pass. Idempotent — safe to call on every
|
||||
/// watcher tick. Errors are logged but never propagated; reconciliation
|
||||
/// is best-effort and a transient DB hiccup must not stall the watcher.
|
||||
pub fn run(conn: &mut SqliteConnection) -> ReconcileStats {
|
||||
let mut stats = ReconcileStats::default();
|
||||
|
||||
stats.tagged_photo_hashes_filled = match backfill_tagged_photo_hashes(conn) {
|
||||
Ok(n) => n,
|
||||
Err(e) => {
|
||||
warn!("reconcile: tagged_photo hash backfill failed: {:?}", e);
|
||||
0
|
||||
}
|
||||
};
|
||||
|
||||
stats.photo_insights_hashes_filled = match backfill_photo_insights_hashes(conn) {
|
||||
Ok(n) => n,
|
||||
Err(e) => {
|
||||
warn!("reconcile: photo_insights hash backfill failed: {:?}", e);
|
||||
0
|
||||
}
|
||||
};
|
||||
|
||||
stats.photo_insights_demoted = match collapse_insight_currents(conn) {
|
||||
Ok(n) => n,
|
||||
Err(e) => {
|
||||
warn!("reconcile: photo_insights scalar merge failed: {:?}", e);
|
||||
0
|
||||
}
|
||||
};
|
||||
|
||||
if stats.changed() {
|
||||
info!(
|
||||
"reconcile: filled {} tagged_photo hash(es), {} photo_insights hash(es); demoted {} non-current insight row(s)",
|
||||
stats.tagged_photo_hashes_filled,
|
||||
stats.photo_insights_hashes_filled,
|
||||
stats.photo_insights_demoted,
|
||||
);
|
||||
} else {
|
||||
debug!("reconcile: no changes this tick");
|
||||
}
|
||||
|
||||
stats
|
||||
}
|
||||
|
||||
/// Populate `tagged_photo.content_hash` for any row that still has
|
||||
/// NULL by joining on `rel_path` against `image_exif`. tagged_photo
|
||||
/// doesn't carry `library_id`, so a path that exists under multiple
|
||||
/// libraries with different content is genuinely ambiguous; we pick
|
||||
/// any non-null hash for that path. Same trade-off as the migration
|
||||
/// backfill — see `migrations/2026-05-01-000000_hash_keyed_derived_data`.
|
||||
fn backfill_tagged_photo_hashes(conn: &mut SqliteConnection) -> QueryResult<usize> {
|
||||
sql_query(
|
||||
"UPDATE tagged_photo \
|
||||
SET content_hash = ( \
|
||||
SELECT content_hash FROM image_exif \
|
||||
WHERE image_exif.rel_path = tagged_photo.rel_path \
|
||||
AND image_exif.content_hash IS NOT NULL \
|
||||
LIMIT 1 \
|
||||
) \
|
||||
WHERE content_hash IS NULL \
|
||||
AND EXISTS ( \
|
||||
SELECT 1 FROM image_exif \
|
||||
WHERE image_exif.rel_path = tagged_photo.rel_path \
|
||||
AND image_exif.content_hash IS NOT NULL \
|
||||
)",
|
||||
)
|
||||
.execute(conn)
|
||||
}
|
||||
|
||||
/// Populate `photo_insights.content_hash` from `image_exif`, keyed on
|
||||
/// `(library_id, rel_path)`. Unambiguous because photo_insights carries
|
||||
/// library_id.
|
||||
fn backfill_photo_insights_hashes(conn: &mut SqliteConnection) -> QueryResult<usize> {
|
||||
sql_query(
|
||||
"UPDATE photo_insights \
|
||||
SET content_hash = ( \
|
||||
SELECT content_hash FROM image_exif \
|
||||
WHERE image_exif.library_id = photo_insights.library_id \
|
||||
AND image_exif.rel_path = photo_insights.rel_path \
|
||||
AND image_exif.content_hash IS NOT NULL \
|
||||
LIMIT 1 \
|
||||
) \
|
||||
WHERE content_hash IS NULL \
|
||||
AND EXISTS ( \
|
||||
SELECT 1 FROM image_exif \
|
||||
WHERE image_exif.library_id = photo_insights.library_id \
|
||||
AND image_exif.rel_path = photo_insights.rel_path \
|
||||
AND image_exif.content_hash IS NOT NULL \
|
||||
)",
|
||||
)
|
||||
.execute(conn)
|
||||
}
|
||||
|
||||
/// Scalar-merge step: when multiple rows share a `content_hash` and
|
||||
/// claim `is_current = true`, demote all but the earliest by
|
||||
/// `generated_at` (ties broken by lowest id, deterministic).
|
||||
///
|
||||
/// Demoted rows keep their data — only `is_current` flips. Clients that
|
||||
/// hit `/insights/history` still see the full sequence; only the
|
||||
/// "current" pointer is unique per hash.
|
||||
fn collapse_insight_currents(conn: &mut SqliteConnection) -> QueryResult<usize> {
|
||||
sql_query(
|
||||
"UPDATE photo_insights \
|
||||
SET is_current = 0 \
|
||||
WHERE is_current = 1 \
|
||||
AND content_hash IS NOT NULL \
|
||||
AND id NOT IN ( \
|
||||
SELECT MIN(p2.id) FROM photo_insights p2 \
|
||||
WHERE p2.is_current = 1 \
|
||||
AND p2.content_hash = photo_insights.content_hash \
|
||||
AND p2.generated_at = ( \
|
||||
SELECT MIN(p3.generated_at) FROM photo_insights p3 \
|
||||
WHERE p3.is_current = 1 \
|
||||
AND p3.content_hash = p2.content_hash \
|
||||
) \
|
||||
)",
|
||||
)
|
||||
.execute(conn)
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::database::test::in_memory_db_connection;
|
||||
|
||||
fn ensure_library(conn: &mut SqliteConnection, library_id: i32) {
|
||||
// Migration seeds library id=1; tests that reference id>1 must
|
||||
// create those rows themselves, otherwise FK enforcement (added
|
||||
// in the tags-edit migration) rejects image_exif inserts.
|
||||
diesel::sql_query(
|
||||
"INSERT OR IGNORE INTO libraries (id, name, root_path, created_at) \
|
||||
VALUES (?, 'test-' || ?, '/tmp/test-' || ?, 0)",
|
||||
)
|
||||
.bind::<diesel::sql_types::Integer, _>(library_id)
|
||||
.bind::<diesel::sql_types::Integer, _>(library_id)
|
||||
.bind::<diesel::sql_types::Integer, _>(library_id)
|
||||
.execute(conn)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn insert_image_exif(
|
||||
conn: &mut SqliteConnection,
|
||||
library_id: i32,
|
||||
rel_path: &str,
|
||||
content_hash: Option<&str>,
|
||||
) {
|
||||
use crate::database::schema::image_exif;
|
||||
ensure_library(conn, library_id);
|
||||
diesel::sql_query(
|
||||
"INSERT INTO image_exif (library_id, rel_path, created_time, last_modified, content_hash) \
|
||||
VALUES (?, ?, 0, 0, ?)",
|
||||
)
|
||||
.bind::<diesel::sql_types::Integer, _>(library_id)
|
||||
.bind::<diesel::sql_types::Text, _>(rel_path)
|
||||
.bind::<diesel::sql_types::Nullable<diesel::sql_types::Text>, _>(content_hash)
|
||||
.execute(conn)
|
||||
.unwrap();
|
||||
// Reference the import so the unused-import lint stays quiet.
|
||||
let _ = image_exif::table;
|
||||
}
|
||||
|
||||
fn insert_tagged_photo(conn: &mut SqliteConnection, rel_path: &str, tag_id: i32) {
|
||||
diesel::sql_query(
|
||||
"INSERT INTO tagged_photo (rel_path, tag_id, created_time) VALUES (?, ?, 0)",
|
||||
)
|
||||
.bind::<diesel::sql_types::Text, _>(rel_path)
|
||||
.bind::<diesel::sql_types::Integer, _>(tag_id)
|
||||
.execute(conn)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn insert_tag(conn: &mut SqliteConnection, id: i32, name: &str) {
|
||||
diesel::sql_query("INSERT INTO tags (id, name, created_time) VALUES (?, ?, 0)")
|
||||
.bind::<diesel::sql_types::Integer, _>(id)
|
||||
.bind::<diesel::sql_types::Text, _>(name)
|
||||
.execute(conn)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn insert_insight(
|
||||
conn: &mut SqliteConnection,
|
||||
library_id: i32,
|
||||
rel_path: &str,
|
||||
generated_at: i64,
|
||||
is_current: bool,
|
||||
) -> i32 {
|
||||
ensure_library(conn, library_id);
|
||||
diesel::sql_query(
|
||||
"INSERT INTO photo_insights (library_id, rel_path, title, summary, generated_at, model_version, is_current, backend) \
|
||||
VALUES (?, ?, 't', 's', ?, 'v', ?, 'local')",
|
||||
)
|
||||
.bind::<diesel::sql_types::Integer, _>(library_id)
|
||||
.bind::<diesel::sql_types::Text, _>(rel_path)
|
||||
.bind::<diesel::sql_types::BigInt, _>(generated_at)
|
||||
.bind::<diesel::sql_types::Bool, _>(is_current)
|
||||
.execute(conn)
|
||||
.unwrap();
|
||||
diesel::sql_query("SELECT last_insert_rowid() AS id")
|
||||
.get_result::<TestId>(conn)
|
||||
.map(|r| r.id)
|
||||
.unwrap()
|
||||
}
|
||||
|
||||
#[derive(QueryableByName)]
|
||||
struct TestId {
|
||||
#[diesel(sql_type = diesel::sql_types::Integer)]
|
||||
id: i32,
|
||||
}
|
||||
|
||||
#[derive(QueryableByName, Debug)]
|
||||
struct HashOnly {
|
||||
#[diesel(sql_type = diesel::sql_types::Nullable<diesel::sql_types::Text>)]
|
||||
content_hash: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(QueryableByName, Debug)]
|
||||
struct CurrentRow {
|
||||
#[diesel(sql_type = diesel::sql_types::Integer)]
|
||||
id: i32,
|
||||
#[diesel(sql_type = diesel::sql_types::Bool)]
|
||||
is_current: bool,
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn backfill_fills_tagged_photo_hash_when_image_exif_has_one() {
|
||||
let mut conn = in_memory_db_connection();
|
||||
insert_tag(&mut conn, 1, "vacation");
|
||||
insert_tagged_photo(&mut conn, "trip/IMG.jpg", 1);
|
||||
// No image_exif row yet — backfill no-op.
|
||||
let stats = run(&mut conn);
|
||||
assert_eq!(stats.tagged_photo_hashes_filled, 0);
|
||||
|
||||
// image_exif row appears with a hash; next reconcile fills it.
|
||||
insert_image_exif(&mut conn, 1, "trip/IMG.jpg", Some("hashabc"));
|
||||
let stats = run(&mut conn);
|
||||
assert_eq!(stats.tagged_photo_hashes_filled, 1);
|
||||
|
||||
let row = diesel::sql_query(
|
||||
"SELECT content_hash FROM tagged_photo WHERE rel_path = 'trip/IMG.jpg'",
|
||||
)
|
||||
.get_result::<HashOnly>(&mut conn)
|
||||
.unwrap();
|
||||
assert_eq!(row.content_hash.as_deref(), Some("hashabc"));
|
||||
|
||||
// Idempotent: a second run is a no-op.
|
||||
let stats = run(&mut conn);
|
||||
assert_eq!(stats.tagged_photo_hashes_filled, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn backfill_skips_tagged_photo_when_image_exif_has_no_hash() {
|
||||
let mut conn = in_memory_db_connection();
|
||||
insert_tag(&mut conn, 1, "vacation");
|
||||
insert_tagged_photo(&mut conn, "trip/IMG.jpg", 1);
|
||||
// image_exif exists but its hash is null.
|
||||
insert_image_exif(&mut conn, 1, "trip/IMG.jpg", None);
|
||||
|
||||
let stats = run(&mut conn);
|
||||
assert_eq!(stats.tagged_photo_hashes_filled, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn backfill_fills_photo_insights_hash_scoped_by_library() {
|
||||
let mut conn = in_memory_db_connection();
|
||||
// Row in library 1 only — must not be filled by a hash from
|
||||
// library 2's same-rel_path entry.
|
||||
insert_image_exif(&mut conn, 1, "shared.jpg", Some("hash-lib1"));
insert_image_exif(&mut conn, 2, "shared.jpg", Some("hash-lib2"));
|
||||
let id1 = insert_insight(&mut conn, 1, "shared.jpg", 100, true);
|
||||
|
||||
let stats = run(&mut conn);
|
||||
assert_eq!(stats.photo_insights_hashes_filled, 1);
|
||||
|
||||
let row = diesel::sql_query("SELECT content_hash FROM photo_insights WHERE id = ?")
|
||||
.bind::<diesel::sql_types::Integer, _>(id1)
|
||||
.get_result::<HashOnly>(&mut conn)
|
||||
.unwrap();
|
||||
assert_eq!(row.content_hash.as_deref(), Some("hash-lib1"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn collapse_keeps_earliest_is_current_per_hash() {
|
||||
let mut conn = in_memory_db_connection();
|
||||
// Two libraries, same content_hash via image_exif. Insights
|
||||
// were generated independently in each library, both currently
|
||||
// is_current = true. The earlier one wins.
|
||||
insert_image_exif(&mut conn, 1, "a.jpg", Some("h1"));
|
||||
insert_image_exif(&mut conn, 2, "a.jpg", Some("h1"));
|
||||
let earlier = insert_insight(&mut conn, 1, "a.jpg", 100, true);
|
||||
let later = insert_insight(&mut conn, 2, "a.jpg", 200, true);
|
||||
|
||||
// A single run fills the content_hash first, then collapses duplicates.
|
||||
let stats = run(&mut conn);
|
||||
assert_eq!(stats.photo_insights_hashes_filled, 2);
|
||||
assert_eq!(stats.photo_insights_demoted, 1);
|
||||
|
||||
let rows = diesel::sql_query("SELECT id, is_current FROM photo_insights ORDER BY id")
|
||||
.get_results::<CurrentRow>(&mut conn)
|
||||
.unwrap();
|
||||
let earlier_row = rows.iter().find(|r| r.id == earlier).unwrap();
|
||||
let later_row = rows.iter().find(|r| r.id == later).unwrap();
|
||||
assert!(
|
||||
earlier_row.is_current,
|
||||
"earlier insight should remain current"
|
||||
);
|
||||
assert!(!later_row.is_current, "later insight should be demoted");
|
||||
|
||||
// Idempotent.
|
||||
let stats = run(&mut conn);
|
||||
assert_eq!(stats.photo_insights_demoted, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn collapse_does_not_demote_a_solo_current_row() {
|
||||
let mut conn = in_memory_db_connection();
|
||||
insert_image_exif(&mut conn, 1, "a.jpg", Some("h1"));
|
||||
let solo = insert_insight(&mut conn, 1, "a.jpg", 100, true);
|
||||
|
||||
let stats = run(&mut conn);
|
||||
assert_eq!(stats.photo_insights_demoted, 0);
|
||||
|
||||
let row = diesel::sql_query("SELECT id, is_current FROM photo_insights WHERE id = ?")
|
||||
.bind::<diesel::sql_types::Integer, _>(solo)
|
||||
.get_result::<CurrentRow>(&mut conn)
|
||||
.unwrap();
|
||||
assert!(row.is_current);
|
||||
}
|
||||
}
|
||||
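Review note, not part of the diff: the "earliest wins" rule that `collapse_insight_currents` expresses in SQL can be modeled in memory as a minimal sketch. The `Insight` struct and `collapse_currents` function below are hypothetical stand-ins introduced for illustration; only the selection rule (lowest `generated_at`, ties broken by lowest `id`, rows without a hash left untouched) is taken from the module above.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a `photo_insights` row.
#[derive(Debug, Clone)]
struct Insight {
    id: i32,
    content_hash: Option<String>,
    generated_at: i64,
    is_current: bool,
}

/// Demote every current row except the earliest one per content hash.
/// Returns the number of rows whose `is_current` flag was flipped.
fn collapse_currents(rows: &mut [Insight]) -> usize {
    // Pass 1: find the winning (generated_at, id) pair for each hash.
    // Tuple ordering gives "earliest timestamp, then lowest id".
    let mut winners: HashMap<String, (i64, i32)> = HashMap::new();
    for row in rows.iter().filter(|r| r.is_current) {
        if let Some(hash) = &row.content_hash {
            let key = (row.generated_at, row.id);
            let best = winners.entry(hash.clone()).or_insert(key);
            if key < *best {
                *best = key;
            }
        }
    }

    // Pass 2: demote current rows that are not the winner for their hash.
    // Rows with no hash are skipped, mirroring `content_hash IS NOT NULL`.
    let mut demoted = 0;
    for row in rows.iter_mut().filter(|r| r.is_current) {
        if let Some(hash) = &row.content_hash {
            if winners[hash] != (row.generated_at, row.id) {
                row.is_current = false;
                demoted += 1;
            }
        }
    }
    demoted
}
```

Like the SQL version, a second call over the same rows demotes nothing, because each hash is left with exactly one current row.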
@@ -57,16 +57,6 @@ diesel::table! {
|
||||
confidence -> Float,
|
||||
status -> Text,
|
||||
created_at -> BigInt,
|
||||
persona_id -> Text,
|
||||
user_id -> Integer,
|
||||
valid_from -> Nullable<BigInt>,
|
||||
valid_until -> Nullable<BigInt>,
|
||||
superseded_by -> Nullable<Integer>,
|
||||
created_by_model -> Nullable<Text>,
|
||||
created_by_backend -> Nullable<Text>,
|
||||
last_modified_by_model -> Nullable<Text>,
|
||||
last_modified_by_backend -> Nullable<Text>,
|
||||
last_modified_at -> Nullable<BigInt>,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -131,13 +121,6 @@ diesel::table! {
|
||||
last_modified -> BigInt,
|
||||
content_hash -> Nullable<Text>,
|
||||
size_bytes -> Nullable<BigInt>,
|
||||
phash_64 -> Nullable<BigInt>,
|
||||
dhash_64 -> Nullable<BigInt>,
|
||||
duplicate_of_hash -> Nullable<Text>,
|
||||
duplicate_decided_at -> Nullable<BigInt>,
|
||||
date_taken_source -> Nullable<Text>,
|
||||
original_date_taken -> Nullable<BigInt>,
|
||||
original_date_taken_source -> Nullable<Text>,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -147,8 +130,6 @@ diesel::table! {
|
||||
name -> Text,
|
||||
root_path -> Text,
|
||||
created_at -> BigInt,
|
||||
enabled -> Bool,
|
||||
excluded_dirs -> Nullable<Text>,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -169,22 +150,6 @@ diesel::table! {
|
||||
}
|
||||
}
|
||||
|
||||
diesel::table! {
|
||||
personas (id) {
|
||||
id -> Integer,
|
||||
user_id -> Integer,
|
||||
persona_id -> Text,
|
||||
name -> Text,
|
||||
system_prompt -> Text,
|
||||
is_built_in -> Bool,
|
||||
include_all_memories -> Bool,
|
||||
created_at -> BigInt,
|
||||
updated_at -> BigInt,
|
||||
reviewed_only_facts -> Bool,
|
||||
allow_agent_corrections -> Bool,
|
||||
}
|
||||
}
|
||||
|
||||
diesel::table! {
|
||||
persons (id) {
|
||||
id -> Integer,
|
||||
@@ -213,7 +178,6 @@ diesel::table! {
|
||||
approved -> Nullable<Bool>,
|
||||
backend -> Text,
|
||||
fewshot_source_ids -> Nullable<Text>,
|
||||
content_hash -> Nullable<Text>,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -235,7 +199,6 @@ diesel::table! {
|
||||
rel_path -> Text,
|
||||
tag_id -> Integer,
|
||||
created_time -> BigInt,
|
||||
content_hash -> Nullable<Text>,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -275,7 +238,6 @@ diesel::joinable!(entity_photo_links -> libraries (library_id));
|
||||
diesel::joinable!(face_detections -> libraries (library_id));
|
||||
diesel::joinable!(face_detections -> persons (person_id));
|
||||
diesel::joinable!(image_exif -> libraries (library_id));
|
||||
diesel::joinable!(personas -> users (user_id));
|
||||
diesel::joinable!(persons -> entities (entity_id));
|
||||
diesel::joinable!(photo_insights -> libraries (library_id));
|
||||
diesel::joinable!(tagged_photo -> tags (tag_id));
|
||||
@@ -292,7 +254,6 @@ diesel::allow_tables_to_appear_in_same_query!(
|
||||
image_exif,
|
||||
libraries,
|
||||
location_history,
|
||||
personas,
|
||||
persons,
|
||||
photo_insights,
|
||||
search_history,
|
||||
|
||||
@@ -1,507 +0,0 @@
|
||||
//! Canonical `date_taken` resolution for ingest and the per-tick backfill
|
||||
//! drain.
|
||||
//!
|
||||
//! The waterfall (in order; first hit wins):
|
||||
//!
|
||||
//! 1. **kamadak-exif** — fast in-process EXIF read. Already done by
|
||||
//! `exif::extract_exif_from_path` for image-bearing formats; callers
|
||||
//! pass that result in via `prior_exif_date` so we don't re-parse.
|
||||
//! 2. **exiftool** — shell-out fallback that reaches places kamadak-exif
|
||||
//! can't: QuickTime/MP4 (`MediaCreateDate`, `TrackCreateDate`,
|
||||
//! `CreateDate`), Apple's `ContentCreateDate`, MakerNote sub-IFDs.
|
||||
//! Required for videos to land a real date; degrades silently when
|
||||
//! `exiftool` isn't on PATH.
|
||||
//! 3. **filename regex** — `memories::extract_date_from_filename` covers
|
||||
//! common screenshot / chat-export / timestamp-named patterns.
|
||||
//! 4. **earliest filesystem time** — `utils::earliest_fs_time` picks the
|
||||
//! earlier of created / modified, which on copied-from-backup files is
|
||||
//! a better proxy for content age than either alone.
|
||||
//!
|
||||
//! `DateSource` records which step won so the per-tick drain can re-resolve
|
||||
//! weak sources (`fs_time`) once exiftool becomes available (after a
//! restart, since availability is cached per process), and so the
|
||||
//! UI/debug surface can answer "why does this photo show up under this
|
||||
//! date." Note that the previous `/memories` request-time logic preferred
|
||||
//! filename even when EXIF was present; this resolver inverts that — EXIF
|
||||
//! is authoritative when it exists, on the theory that an EXIF
|
||||
//! `DateTimeOriginal` is more reliable than a filename pattern that may
|
||||
//! reflect import time rather than capture time.
|
||||
|
||||
use std::collections::HashMap;
|
||||
use std::io::Write;
|
||||
use std::path::{Path, PathBuf};
|
||||
use std::process::{Command, Stdio};
|
||||
use std::sync::OnceLock;
|
||||
|
||||
use chrono::{DateTime, Utc};
|
||||
use log::{debug, trace, warn};
|
||||
use serde::Deserialize;
|
||||
|
||||
use crate::utils::earliest_fs_time;
|
||||
|
||||
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
|
||||
pub enum DateSource {
|
||||
/// kamadak-exif read DateTime/DateTimeOriginal directly.
|
||||
Exif,
|
||||
/// exiftool fallback caught a video / MakerNote / QuickTime tag.
|
||||
Exiftool,
|
||||
/// `extract_date_from_filename` matched a known pattern.
|
||||
Filename,
|
||||
/// Fell through to `earliest_fs_time(metadata)`.
|
||||
FsTime,
|
||||
}
|
||||
|
||||
impl DateSource {
|
||||
pub fn as_str(self) -> &'static str {
|
||||
match self {
|
||||
DateSource::Exif => "exif",
|
||||
DateSource::Exiftool => "exiftool",
|
||||
DateSource::Filename => "filename",
|
||||
DateSource::FsTime => "fs_time",
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Copy, Clone, Debug)]
|
||||
pub struct ResolvedDate {
|
||||
pub timestamp: i64,
|
||||
pub source: DateSource,
|
||||
}
|
||||
|
||||
/// Resolve the canonical date for a single file, given an already-extracted
|
||||
/// kamadak-exif date if available. Returns `None` only if every step in the
|
||||
/// waterfall fails — for files that exist on disk this should be vanishingly
|
||||
/// rare (the fs-time fallback alone almost always succeeds).
|
||||
pub fn resolve_date_taken(path: &Path, prior_exif_date: Option<i64>) -> Option<ResolvedDate> {
|
||||
if let Some(ts) = prior_exif_date {
|
||||
return Some(ResolvedDate {
|
||||
timestamp: ts,
|
||||
source: DateSource::Exif,
|
||||
});
|
||||
}
|
||||
if let Some(ts) = exiftool_date_single(path) {
|
||||
return Some(ResolvedDate {
|
||||
timestamp: ts,
|
||||
source: DateSource::Exiftool,
|
||||
});
|
||||
}
|
||||
if let Some(dt) = path
|
||||
.file_name()
|
||||
.and_then(|f| f.to_str())
|
||||
.and_then(crate::memories::extract_date_from_filename)
|
||||
{
|
||||
return Some(ResolvedDate {
|
||||
timestamp: dt.timestamp(),
|
||||
source: DateSource::Filename,
|
||||
});
|
||||
}
|
||||
if let Ok(meta) = std::fs::metadata(path)
|
||||
&& let Some(t) = earliest_fs_time(&meta)
|
||||
{
|
||||
let dt: DateTime<Utc> = t.into();
|
||||
return Some(ResolvedDate {
|
||||
timestamp: dt.timestamp(),
|
||||
source: DateSource::FsTime,
|
||||
});
|
||||
}
|
||||
None
|
||||
}
|
||||
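Review note, not part of the diff: the first-hit-wins shape of `resolve_date_taken` can be sketched generically. The `Source` enum below is a local copy of the four `DateSource` variants, and `first_hit` is a hypothetical helper; the real function calls each step inline rather than through closures.

```rust
/// Local stand-in mirroring the four `DateSource` variants.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum Source {
    Exif,
    Exiftool,
    Filename,
    FsTime,
}

/// Run the steps in order and return the first timestamp produced,
/// tagged with the step that produced it. `find_map` is lazy, so steps
/// after the first hit are never invoked.
fn first_hit(steps: &[(Source, &dyn Fn() -> Option<i64>)]) -> Option<(i64, Source)> {
    steps
        .iter()
        .find_map(|(source, step)| step().map(|ts| (ts, *source)))
}
```

Laziness matters here because the later steps are the expensive ones in the real resolver (a subprocess, then a filesystem stat), so they should only run on a miss.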
|
||||
/// Batch waterfall. exiftool runs once over the whole batch (single
|
||||
/// subprocess); everything else is per-file and runs only on misses.
|
||||
/// `prior_exif_dates` lets the caller pass in already-known kamadak dates
|
||||
/// keyed by path; entries without a prior date fall through to exiftool
|
||||
/// and the rest of the waterfall.
|
||||
///
|
||||
/// The per-tick backfill drain is the primary caller — it loads ~500 rows
|
||||
/// at a time and uses one exiftool subprocess to drain the lot.
|
||||
pub fn resolve_dates_batch(
|
||||
paths: &[PathBuf],
|
||||
prior_exif_dates: &HashMap<PathBuf, i64>,
|
||||
) -> HashMap<PathBuf, ResolvedDate> {
|
||||
let mut out: HashMap<PathBuf, ResolvedDate> = HashMap::new();
|
||||
let mut needs_exiftool: Vec<&Path> = Vec::with_capacity(paths.len());
|
||||
|
||||
for path in paths {
|
||||
if let Some(&ts) = prior_exif_dates.get(path) {
|
||||
out.insert(
|
||||
path.clone(),
|
||||
ResolvedDate {
|
||||
timestamp: ts,
|
||||
source: DateSource::Exif,
|
||||
},
|
||||
);
|
||||
} else {
|
||||
needs_exiftool.push(path.as_path());
|
||||
}
|
||||
}
|
||||
|
||||
if !needs_exiftool.is_empty() {
|
||||
let exiftool_results = exiftool_dates_batch(&needs_exiftool);
|
||||
for path in &needs_exiftool {
|
||||
if let Some(&ts) = exiftool_results.get(*path) {
|
||||
out.insert(
|
||||
path.to_path_buf(),
|
||||
ResolvedDate {
|
||||
timestamp: ts,
|
||||
source: DateSource::Exiftool,
|
||||
},
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
for path in paths {
|
||||
if out.contains_key(path) {
|
||||
continue;
|
||||
}
|
||||
if let Some(dt) = path
|
||||
.file_name()
|
||||
.and_then(|f| f.to_str())
|
||||
.and_then(crate::memories::extract_date_from_filename)
|
||||
{
|
||||
out.insert(
|
||||
path.clone(),
|
||||
ResolvedDate {
|
||||
timestamp: dt.timestamp(),
|
||||
source: DateSource::Filename,
|
||||
},
|
||||
);
|
||||
continue;
|
||||
}
|
||||
if let Ok(meta) = std::fs::metadata(path)
|
||||
&& let Some(t) = earliest_fs_time(&meta)
|
||||
{
|
||||
let dt: DateTime<Utc> = t.into();
|
||||
out.insert(
|
||||
path.clone(),
|
||||
ResolvedDate {
|
||||
timestamp: dt.timestamp(),
|
||||
source: DateSource::FsTime,
|
||||
},
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
out
|
||||
}
|
||||
|
||||
/// Tag priority for exiftool extraction. First positive value wins (zero and negative timestamps are rejected).
|
||||
///
|
||||
/// Photos: `DateTimeOriginal` (original capture) and `SubSecDateTimeOriginal`
|
||||
/// are most authoritative. `CreateDate` is a common alias and a sane fallback.
|
||||
///
|
||||
/// Videos: `MediaCreateDate` / `TrackCreateDate` are the QuickTime/MP4
|
||||
/// timestamps. `ContentCreateDate` is Apple's iOS-set tag; it often
|
||||
/// reflects local capture time on iPhone exports better than the others.
|
||||
///
|
||||
/// Notably absent: `FileModifyDate` / `FileAccessDate` — those are
|
||||
/// filesystem-derived and the resolver covers them via the `fs_time`
|
||||
/// fallback. Letting exiftool pull them here would mask "no real EXIF
|
||||
/// date" with a `source = exiftool` row that's no better than fs_time.
|
||||
const EXIFTOOL_DATE_TAGS: &[&str] = &[
|
||||
"DateTimeOriginal",
|
||||
"SubSecDateTimeOriginal",
|
||||
"CreateDate",
|
||||
"MediaCreateDate",
|
||||
"TrackCreateDate",
|
||||
"ContentCreateDate",
|
||||
];
|
||||
|
||||
/// Cache the "exiftool exists on PATH" check across the process lifetime so
|
||||
/// the per-tick backfill doesn't fork a doomed subprocess every iteration on
|
||||
/// deploys without exiftool installed.
|
||||
fn exiftool_available() -> bool {
|
||||
static AVAIL: OnceLock<bool> = OnceLock::new();
|
||||
*AVAIL.get_or_init(|| {
|
||||
let ok = Command::new("exiftool")
|
||||
.arg("-ver")
|
||||
.stdout(Stdio::null())
|
||||
.stderr(Stdio::null())
|
||||
.status()
|
||||
.map(|s| s.success())
|
||||
.unwrap_or(false);
|
||||
if !ok {
|
||||
warn!("exiftool not on PATH; date_taken waterfall skips that step");
|
||||
}
|
||||
ok
|
||||
})
|
||||
}
|
||||
|
||||
/// One-file exiftool invocation. Used by the upload + GPS-write paths,
|
||||
/// which deal with one file at a time. The batch path uses
|
||||
/// `exiftool_dates_batch` so we don't pay subprocess startup per row.
|
||||
///
|
||||
/// Notably absent: `-fast` / `-fast2`. For QuickTime/MP4 files whose
|
||||
/// `moov` atom sits at the end (non-faststart, common for Snapchat
|
||||
/// exports and any MP4 muxed without `-movflags +faststart`), `-fast2`
|
||||
/// causes exiftool to skip the trailer and return no `CreateDate` /
|
||||
/// `MediaCreateDate`, dropping us to the `fs_time` fallback for files
|
||||
/// that actually have a real capture date. We pre-filter to files that
|
||||
/// kamadak-exif couldn't read, so the JPEG fast-path is already covered
|
||||
/// — paying full-scan cost on the residual is the right trade.
|
||||
fn exiftool_date_single(path: &Path) -> Option<i64> {
|
||||
if !exiftool_available() {
|
||||
return None;
|
||||
}
|
||||
let mut cmd = Command::new("exiftool");
|
||||
cmd.arg("-j").arg("-q").arg("-d").arg("%s");
|
||||
for tag in EXIFTOOL_DATE_TAGS {
|
||||
cmd.arg(format!("-{}", tag));
|
||||
}
|
||||
cmd.arg(path);
|
||||
let output = cmd.output().ok()?;
|
||||
if !output.status.success() {
|
||||
trace!("exiftool exited non-zero for {:?}", path);
|
||||
return None;
|
||||
}
|
||||
parse_exiftool_json(&output.stdout)
|
||||
.into_iter()
|
||||
.next()
|
||||
.map(|(_, ts)| ts)
|
||||
}
|
||||
|
||||
/// Drain a batch via a single exiftool subprocess. Paths are fed on stdin
|
||||
/// via `-@ -`, so the argv stays short regardless of batch size — safe for
|
||||
/// libraries with very long path components.
|
||||
fn exiftool_dates_batch(paths: &[&Path]) -> HashMap<PathBuf, i64> {
|
||||
let mut out = HashMap::new();
|
||||
if paths.is_empty() || !exiftool_available() {
|
||||
return out;
|
||||
}
|
||||
|
||||
let mut cmd = Command::new("exiftool");
|
||||
// No `-fast2` — see exiftool_date_single for the rationale (QuickTime
|
||||
// moov-at-end files miss CreateDate / MediaCreateDate when the trailer
|
||||
// is skipped).
|
||||
cmd.arg("-j").arg("-q").arg("-d").arg("%s");
|
||||
for tag in EXIFTOOL_DATE_TAGS {
|
||||
cmd.arg(format!("-{}", tag));
|
||||
}
|
||||
cmd.arg("-@").arg("-");
|
||||
cmd.stdin(Stdio::piped())
|
||||
.stdout(Stdio::piped())
|
||||
.stderr(Stdio::null());
|
||||
|
||||
let mut child = match cmd.spawn() {
|
||||
Ok(c) => c,
|
||||
Err(e) => {
|
||||
warn!("exiftool batch spawn failed: {}", e);
|
||||
return out;
|
||||
}
|
||||
};
|
||||
|
||||
if let Some(mut stdin) = child.stdin.take() {
|
||||
for p in paths {
|
||||
// exiftool's argfile reader treats each line as one path; OS
|
||||
// path bytes don't always survive a String round-trip, but
|
||||
// every path we get here originated from rel_path / root_path
|
||||
// strings already, so the lossy `display()` conversion is a non-event.
|
||||
if let Err(e) = writeln!(stdin, "{}", p.display()) {
|
||||
warn!("exiftool batch stdin write failed: {}", e);
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let output = match child.wait_with_output() {
|
||||
Ok(o) => o,
|
||||
Err(e) => {
|
||||
warn!("exiftool batch wait failed: {}", e);
|
||||
return out;
|
||||
}
|
||||
};
|
||||
if !output.status.success() {
|
||||
debug!(
|
||||
"exiftool batch exit status {:?}; partial output may still parse",
|
||||
output.status.code()
|
||||
);
|
||||
}
|
||||
for (source, ts) in parse_exiftool_json(&output.stdout) {
|
||||
out.insert(PathBuf::from(source), ts);
|
||||
}
|
||||
out
|
||||
}
|
||||
|
||||
/// One row per input file. exiftool emits any tag we asked for that was
|
||||
/// present, plus the `SourceFile` it was reading. Tags are JSON values
|
||||
/// because `-d %s` returns the timestamp as a *string* of digits, not a
|
||||
/// number, when the date parses; absent tags are simply missing keys.
|
||||
#[derive(Debug, Deserialize)]
|
||||
struct ExiftoolEntry {
|
||||
#[serde(rename = "SourceFile")]
|
||||
source_file: String,
|
||||
#[serde(rename = "DateTimeOriginal")]
|
||||
date_time_original: Option<serde_json::Value>,
|
||||
#[serde(rename = "SubSecDateTimeOriginal")]
|
||||
sub_sec_date_time_original: Option<serde_json::Value>,
|
||||
#[serde(rename = "CreateDate")]
|
||||
create_date: Option<serde_json::Value>,
|
||||
#[serde(rename = "MediaCreateDate")]
|
||||
media_create_date: Option<serde_json::Value>,
|
||||
#[serde(rename = "TrackCreateDate")]
|
||||
track_create_date: Option<serde_json::Value>,
|
||||
#[serde(rename = "ContentCreateDate")]
|
||||
content_create_date: Option<serde_json::Value>,
|
||||
}
|
||||
|
||||
fn parse_exiftool_json(stdout: &[u8]) -> Vec<(String, i64)> {
    let entries: Vec<ExiftoolEntry> = match serde_json::from_slice(stdout) {
        Ok(v) => v,
        Err(e) => {
            // Empty stdout on total failure isn't a parse error worth
            // logging at warn — the caller already noted the non-zero
            // exit status.
            if !stdout.is_empty() {
                warn!("exiftool JSON parse failed: {}", e);
            }
            return Vec::new();
        }
    };

    let mut out = Vec::with_capacity(entries.len());
    for entry in entries {
        // Walk the priority list. exiftool sometimes returns the literal
        // string "0000:00:00 00:00:00" for missing-but-allocated date
        // slots; with `-d %s` that becomes the unix epoch (0). Reject
        // anything <= 0 so we fall through to the next tag.
        let tags = [
            entry.date_time_original.as_ref(),
            entry.sub_sec_date_time_original.as_ref(),
            entry.create_date.as_ref(),
            entry.media_create_date.as_ref(),
            entry.track_create_date.as_ref(),
            entry.content_create_date.as_ref(),
        ];
        let mut chosen: Option<i64> = None;
        for tag in tags.iter().flatten() {
            if let Some(ts) = coerce_to_unix_seconds(tag)
                && ts > 0
            {
                chosen = Some(ts);
                break;
            }
        }
        if let Some(ts) = chosen {
            out.push((entry.source_file, ts));
        }
    }
    out
}

/// `-d %s` should hand us a numeric string, but exiftool's JSON encoder
/// will emit a number when the tag was defined as numeric in its lib —
/// accept both shapes.
fn coerce_to_unix_seconds(v: &serde_json::Value) -> Option<i64> {
    match v {
        serde_json::Value::String(s) => s.trim().parse::<i64>().ok(),
        serde_json::Value::Number(n) => n.as_i64(),
        _ => None,
    }
}

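The tag-priority walk above can be sketched without `serde_json`, assuming each candidate arrives as an optional digit string. The names `parse_unix_seconds` and `first_valid_timestamp` are hypothetical and exist only for this illustration; they are not part of the diff.

```rust
/// Parse a candidate the way a `-d %s` value would arrive: a string of
/// digits, possibly padded with whitespace. Anything else yields `None`.
fn parse_unix_seconds(raw: &str) -> Option<i64> {
    raw.trim().parse::<i64>().ok()
}

/// Return the first candidate, in priority order, that parses to a strictly
/// positive timestamp. `None` entries model tags missing from the output;
/// zero models the "0000:00:00 00:00:00" placeholder and is skipped.
fn first_valid_timestamp(candidates: &[Option<&str>]) -> Option<i64> {
    candidates
        .iter()
        .flatten()
        .filter_map(|raw| parse_unix_seconds(raw))
        .find(|ts| *ts > 0)
}
```

Calling `first_valid_timestamp(&[Some("0"), None, Some("1500000000")])` skips the zero placeholder and the absent tag, then settles on the third candidate, which mirrors the fall-through case exercised by the tests below.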
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parse_exiftool_json_picks_first_priority_tag() {
        let json = br#"[{
            "SourceFile": "/lib/IMG.jpg",
            "DateTimeOriginal": "1500000000",
            "CreateDate": "1400000000"
        }]"#;
        let parsed = parse_exiftool_json(json);
        assert_eq!(parsed, vec![("/lib/IMG.jpg".to_string(), 1500000000)]);
    }

    #[test]
    fn parse_exiftool_json_falls_through_zeros() {
        // exiftool emits "0000:00:00 00:00:00" → unix epoch 0 with -d %s.
        // The resolver should skip those and pick the next tag.
        let json = br#"[{
            "SourceFile": "/lib/clip.mov",
            "DateTimeOriginal": "0",
            "MediaCreateDate": "1500000000"
        }]"#;
        let parsed = parse_exiftool_json(json);
        assert_eq!(parsed, vec![("/lib/clip.mov".to_string(), 1500000000)]);
    }

    #[test]
    fn parse_exiftool_json_accepts_numeric_values() {
        let json = br#"[{
            "SourceFile": "/lib/a.jpg",
            "CreateDate": 1234567890
        }]"#;
        let parsed = parse_exiftool_json(json);
        assert_eq!(parsed, vec![("/lib/a.jpg".to_string(), 1234567890)]);
    }

    #[test]
    fn parse_exiftool_json_emits_nothing_when_no_tag_present() {
        let json = br#"[{"SourceFile": "/lib/no_dates.bin"}]"#;
        let parsed = parse_exiftool_json(json);
        assert!(parsed.is_empty());
    }

    #[test]
    fn parse_exiftool_json_handles_multiple_entries() {
        let json = br#"[
            {"SourceFile": "/lib/a.jpg", "DateTimeOriginal": "100"},
            {"SourceFile": "/lib/b.jpg", "CreateDate": "200"}
        ]"#;
        let parsed = parse_exiftool_json(json);
        assert_eq!(
            parsed,
            vec![
                ("/lib/a.jpg".to_string(), 100),
                ("/lib/b.jpg".to_string(), 200)
            ]
        );
    }

    #[test]
    fn date_source_as_str_round_trip() {
        for src in [
            DateSource::Exif,
            DateSource::Exiftool,
            DateSource::Filename,
            DateSource::FsTime,
        ] {
            assert!(!src.as_str().is_empty());
        }
    }

    #[test]
    fn resolve_uses_prior_exif_when_present() {
        // Path doesn't need to exist when prior_exif_date short-circuits.
        let resolved =
            resolve_date_taken(Path::new("/nonexistent/file.jpg"), Some(1700000000)).unwrap();
        assert_eq!(resolved.timestamp, 1700000000);
        assert_eq!(resolved.source, DateSource::Exif);
    }

    #[test]
    fn resolve_filename_when_no_exif_and_file_missing() {
        // No prior EXIF, no exiftool match (file missing), but the filename
        // pattern still matches so the resolver lands on Filename.
        let resolved = resolve_date_taken(
            Path::new("/nonexistent/Screenshot_2014-06-01-20-44-50.png"),
            None,
        )
        .unwrap();
        assert_eq!(resolved.source, DateSource::Filename);
    }

    #[test]
    fn resolve_fs_time_when_only_metadata_available() {
        let dir = tempfile::tempdir().unwrap();
        let path = dir.path().join("plain.jpg");
        std::fs::File::create(&path).unwrap();
        let resolved = resolve_date_taken(&path, None).unwrap();
        // exiftool may or may not be installed in the test env; either
        // way the file has no EXIF and no filename date, so we should
        // fall to fs_time.
        assert_eq!(resolved.source, DateSource::FsTime);
    }
}
-1368
File diff suppressed because it is too large
-48
@@ -71,53 +71,6 @@ fn read_jpeg_at_ifd(exif: &exif::Exif, path: &Path, ifd: In) -> Option<Vec<u8>>
    Some(buf)
}

/// Shell out to `exiftool -j -G -n <path>` and return the per-file tag map.
///
/// `-j` requests JSON; the response is always an array of one element per
/// input path. `-G` prefixes each key with the group name (`EXIF:Make`,
/// `MakerNotes:LensInfo`, `File:FileSize`, …) so a UI can group the dump.
/// `-n` returns numeric / raw values rather than exiftool's pretty-printed
/// human strings, which keeps the output stable for clients that want to
/// reformat (e.g. divide a focal-length numerator/denominator).
///
/// Returns:
/// - `Ok(Some(value))` — the parsed object for this file.
/// - `Ok(None)` — exiftool ran but the array was empty / not an object.
/// - `Err(_)` — exiftool isn't on PATH, the spawn failed, or its stderr
///   indicates an unsupported file. Caller surfaces a 503 / 422.
///
/// Used by `GET /image/exif/full` to power Apollo's DETAILS modal "FULL
/// EXIF" pane. Per-file shell-out is fine for this on-demand surface;
/// the indexer does NOT call this on the hot path (kamadak-exif covers
/// the indexed columns; exiftool is the slow-path preview helper).
pub fn read_full_exif_via_exiftool(path: &Path) -> Result<Option<serde_json::Value>> {
    let output = Command::new("exiftool")
        .arg("-j")
        .arg("-G")
        .arg("-n")
        .arg(path)
        .output()
        .map_err(|e| anyhow!("exiftool spawn failed (is it on PATH?): {}", e))?;

    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr);
        return Err(anyhow!(
            "exiftool exited with {}: {}",
            output.status,
            stderr.trim()
        ));
    }

    let parsed: serde_json::Value = serde_json::from_slice(&output.stdout)
        .map_err(|e| anyhow!("exiftool returned non-JSON output: {}", e))?;

    // `-j` always wraps the result in an array — pull out the first object.
    let arr = parsed
        .as_array()
        .ok_or_else(|| anyhow!("expected JSON array from exiftool -j"))?;
    Ok(arr.first().cloned())
}

/// Tags exiftool exposes for embedded JPEG previews, in priority order. The
/// largest valid JPEG returned by any of them wins. Different camera makers
/// stash their largest preview under different names: Nikon's full-res
@@ -235,7 +188,6 @@ pub fn write_gps(path: &Path, lat: f64, lon: f64) -> Result<()> {
    let lon_abs = lon.abs();
    let output = Command::new("exiftool")
        .arg("-overwrite_original")
        .arg("-P")
        .arg(format!("-GPSLatitude={}", lat_abs))
        .arg(format!("-GPSLatitudeRef={}", lat_ref))
        .arg(format!("-GPSLongitude={}", lon_abs))
+458
-300
@@ -20,10 +20,9 @@

use crate::Claims;
use crate::ai::face_client::{DetectMeta, FaceClient, FaceDetectError};
use crate::exif;
use crate::database::schema::{face_detections, image_exif, persons};
use crate::error::IntoHttpError;
use crate::exif;
use crate::file_types;
use crate::libraries::{self, Library};
use crate::otel::{extract_context_from_request, global_tracer, trace_db_call};
use crate::state::AppState;
@@ -47,7 +46,7 @@ use std::sync::{Arc, Mutex};
/// Visual identity. The optional `entity_id` bridges this person to an
/// LLM-extracted knowledge-graph entity (textual side). Persons are NOT
/// auto-bridged at creation — only when the user explicitly links them in
/// the management UI.
/// the management UI, or when bootstrap finds an exact-name match.
#[derive(Serialize, Queryable, Clone, Debug)]
pub struct Person {
    pub id: i32,
@@ -100,30 +99,9 @@ pub struct FaceDetectionRow {
    pub created_at: i64,
}

/// SQL fragment restricting an `image_exif.rel_path` (or `face_detections.rel_path`)
/// column to image extensions. Videos register in `image_exif` with a
/// populated `content_hash` but can never produce a `face_detections` row
/// — applying this filter at query time keeps videos out of the per-tick
/// backlog drain (which would otherwise loop forever — `filter_excluded`
/// drops them client-side without writing a marker) and out of the SCANNED
/// stat denominator (so 100% is reachable).
fn image_path_predicate(col: &str) -> String {
    let clauses: Vec<String> = file_types::IMAGE_EXTENSIONS
        .iter()
        .map(|ext| format!("lower({col}) LIKE '%.{ext}'"))
        .collect();
    format!("({})", clauses.join(" OR "))
}

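To make the shape of the generated predicate concrete, here is a standalone sketch of the same function. The three-entry `IMAGE_EXTENSIONS` list is an assumption made for illustration; the real `file_types::IMAGE_EXTENSIONS` is not shown in this diff and likely differs.

```rust
/// Hypothetical stand-in for `file_types::IMAGE_EXTENSIONS`; the real list
/// lives elsewhere in the crate and is not visible here.
const IMAGE_EXTENSIONS: &[&str] = &["jpg", "jpeg", "png"];

/// Build a parenthesised OR-chain of case-insensitive suffix matches on
/// `col`, one `LIKE` clause per extension.
fn image_path_predicate(col: &str) -> String {
    let clauses: Vec<String> = IMAGE_EXTENSIONS
        .iter()
        .map(|ext| format!("lower({col}) LIKE '%.{ext}'"))
        .collect();
    format!("({})", clauses.join(" OR "))
}
```

With that list, `image_path_predicate("rel_path")` yields `(lower(rel_path) LIKE '%.jpg' OR lower(rel_path) LIKE '%.jpeg' OR lower(rel_path) LIKE '%.png')`. Both the column name and the extensions are interpolated into the SQL text rather than bound as parameters, so this pattern is only reasonable when both come from compile-time constants, as they do in the call sites in this diff.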
/// Row shape for `list_unscanned_candidates`'s raw SQL. Diesel's
/// `sql_query` requires a `QueryableByName` row type with explicit
/// column SQL types; using a tuple isn't supported.
#[derive(diesel::QueryableByName, Debug)]
struct CountRow {
    #[diesel(sql_type = diesel::sql_types::BigInt)]
    count: i64,
}

#[derive(diesel::QueryableByName, Debug)]
struct UnscannedRow {
    #[diesel(sql_type = diesel::sql_types::Text)]
@@ -366,10 +344,6 @@ pub struct EmbeddingsQuery {
    pub limit: i64,
    #[serde(default)]
    pub offset: i64,
    /// Restrict to one person's faces. Used by the similar-unassigned
    /// suggester to fetch a centroid pool. When set, takes precedence
    /// over `unassigned` (the more specific filter wins).
    pub person_id: Option<i32>,
}

fn default_unassigned() -> bool {
@@ -433,7 +407,6 @@ pub trait FaceDao: Send + Sync {
        ctx: &opentelemetry::Context,
        library_id: Option<i32>,
        unassigned: bool,
        person_id: Option<i32>,
        limit: i64,
        offset: i64,
    ) -> anyhow::Result<Vec<(FaceDetectionRow, String)>>;
@@ -508,10 +481,6 @@ pub trait FaceDao: Send + Sync {
        into: i32,
    ) -> anyhow::Result<Person>;

    /// Cheap presence probe — returns true iff at least one face has been
    /// detected (excluding marker rows). Used by chat-tool gating.
    fn has_any_faces(&mut self, ctx: &opentelemetry::Context) -> anyhow::Result<bool>;

    /// Resolve `(library_id, rel_path)` → `content_hash` via image_exif.
    /// Returns None when the photo hasn't been EXIF-indexed yet (no row
    /// in image_exif) or when the row exists but content_hash is NULL.
@@ -632,32 +601,26 @@ impl FaceDao for SqliteFaceDao {
            // fire multiple detect calls for the same hash if it lives
            // under several rel_paths in the same library. The
            // anti-join (NOT EXISTS) drains hashes that have no row in
            // face_detections at all. The image-extension predicate
            // keeps videos out of the candidate set; without it they'd
            // be filtered client-side and re-pulled every tick forever
            // because no marker row is written for excluded paths.
            let ext_predicate = image_path_predicate("rel_path");
            let sql = format!(
            // face_detections at all.
            let rows: Vec<(String, String)> = diesel::sql_query(
                "SELECT rel_path, content_hash \
                 FROM image_exif e \
                 WHERE library_id = ? \
                 AND content_hash IS NOT NULL \
                 AND {ext_predicate} \
                 AND NOT EXISTS ( \
                 SELECT 1 FROM face_detections f \
                 WHERE f.content_hash = e.content_hash \
                 ) \
                 GROUP BY content_hash \
                 LIMIT ?"
            );
            let rows: Vec<(String, String)> = diesel::sql_query(sql)
                .bind::<diesel::sql_types::Integer, _>(library_id)
                .bind::<diesel::sql_types::BigInt, _>(limit)
                .load::<UnscannedRow>(conn.deref_mut())
                .with_context(|| "list_unscanned_candidates")?
                .into_iter()
                .map(|r| (r.rel_path, r.content_hash))
                .collect();
                 LIMIT ?",
            )
            .bind::<diesel::sql_types::Integer, _>(library_id)
            .bind::<diesel::sql_types::BigInt, _>(limit)
            .load::<UnscannedRow>(conn.deref_mut())
            .with_context(|| "list_unscanned_candidates")?
            .into_iter()
            .map(|r| (r.rel_path, r.content_hash))
            .collect();
            Ok(rows)
        })
    }
@@ -868,7 +831,6 @@ impl FaceDao for SqliteFaceDao {
        ctx: &opentelemetry::Context,
        library_id: Option<i32>,
        unassigned: bool,
        person_id: Option<i32>,
        limit: i64,
        offset: i64,
    ) -> anyhow::Result<Vec<(FaceDetectionRow, String)>> {
@@ -882,13 +844,7 @@ impl FaceDao for SqliteFaceDao {
            if let Some(lib) = library_id {
                query = query.filter(face_detections::library_id.eq(lib));
            }
            // person_id is the more specific filter — when both it and
            // `unassigned` are supplied, prefer the explicit person id and
            // ignore the IS NULL constraint (which would always return
            // empty for an assigned person).
            if let Some(pid) = person_id {
                query = query.filter(face_detections::person_id.eq(pid));
            } else if unassigned {
            if unassigned {
                query = query.filter(face_detections::person_id.is_null());
            }
            let rows = query
@@ -900,18 +856,14 @@ impl FaceDao for SqliteFaceDao {
            // Pair with the base64-encoded embedding string so the handler
            // doesn't need to know the wire format. Skip rows with NULL
            // embedding (shouldn't happen on detected rows, but defensive).
            // `embedding.take()` moves the bytes out of the row so we can
            // hand the (now-empty-embedding) row plus the encoded string
            // back to the caller without cloning the whole row — at 20k
            // rows × 2 KB that clone was 40 MB of pointless heap traffic
            // per cluster-suggest run.
            use base64::Engine;
            Ok(rows
                .into_iter()
                .filter_map(|mut r| {
                    let bytes = r.embedding.take()?;
                    let b64 = base64::engine::general_purpose::STANDARD.encode(&bytes);
                    Some((r, b64))
                .filter_map(|r| {
                    r.embedding.as_ref().map(|bytes| {
                        let b64 = base64::engine::general_purpose::STANDARD.encode(bytes);
                        (r.clone(), b64)
                    })
                })
                .collect())
        })
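The difference between the two `filter_map` variants in this hunk comes down to `Option::take` versus `clone`. A minimal std-only sketch of the `take()` variant follows, assuming a simplified `Row` type and using hex encoding in place of base64 so that no external crate is needed. `Row` and `pair_with_encoded` are hypothetical names for this illustration only.

```rust
/// Simplified stand-in for `FaceDetectionRow`: only the fields the sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i32,
    embedding: Option<Vec<u8>>,
}

/// Pair each row with a text encoding of its embedding, dropping rows that
/// have none. Hex is used here instead of base64 to stay std-only.
fn pair_with_encoded(rows: Vec<Row>) -> Vec<(Row, String)> {
    rows.into_iter()
        .filter_map(|mut r| {
            // `take()` moves the bytes out and leaves `None` behind, so the
            // row is moved into the tuple without cloning the byte buffer.
            let bytes = r.embedding.take()?;
            let encoded: String = bytes.iter().map(|b| format!("{b:02x}")).collect();
            Some((r, encoded))
        })
        .collect()
}
```

The trade-off is that every returned row carries `embedding: None`, so callers must read the bytes from the encoded string. The `clone()` variant keeps the bytes on the row at the cost of one extra copy per row.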
@@ -1061,42 +1013,14 @@ impl FaceDao for SqliteFaceDao {
                    .first(conn.deref_mut())
                    .with_context(|| "stats: failed")?
            };
            // Image-extension filter mirrors `list_unscanned_candidates` so
            // SCANNED can actually reach 100%: videos sit in `image_exif` but
            // never get a `face_detections` row, so counting them here
            // permanently caps the percentage below 100%.
            //
            // Count DISTINCT content_hash (not rows) so the numerator
            // (`scanned`, also distinct-content_hash) and denominator live
            // in the same domain. Without this, a file present at multiple
            // rel_paths or across libraries inflates total_photos by one
            // per duplicate row while face_detections — keyed on
            // content_hash — counts the bytes once, leaving a permanent
            // gap (e.g. 1101/1103 with nothing actually pending). Rows
            // with NULL content_hash are excluded; they're held in the
            // hash-backfill backlog and counting them would pin the bar
            // below 100% for the duration of that backfill.
            let total_photos: i64 = {
                let ext_predicate = image_path_predicate("rel_path");
                let row: CountRow = if let Some(lib) = library_id {
                    let sql = format!(
                        "SELECT COUNT(DISTINCT content_hash) AS count FROM image_exif \
                         WHERE library_id = ? AND content_hash IS NOT NULL AND {ext_predicate}"
                    );
                    diesel::sql_query(sql)
                        .bind::<diesel::sql_types::Integer, _>(lib)
                        .get_result(conn.deref_mut())
                        .with_context(|| "stats: total_photos")?
                } else {
                    let sql = format!(
                        "SELECT COUNT(DISTINCT content_hash) AS count FROM image_exif \
                         WHERE content_hash IS NOT NULL AND {ext_predicate}"
                    );
                    diesel::sql_query(sql)
                        .get_result(conn.deref_mut())
                        .with_context(|| "stats: total_photos")?
                };
                row.count
                let mut q = image_exif::table.into_boxed();
                if let Some(lib) = library_id {
                    q = q.filter(image_exif::library_id.eq(lib));
                }
                q.select(diesel::dsl::count_star())
                    .first(conn.deref_mut())
                    .with_context(|| "stats: total_photos")?
            };
            let persons_count: i64 = persons::table
                .select(diesel::dsl::count_star())
@@ -1448,19 +1372,6 @@ impl FaceDao for SqliteFaceDao {
        })
    }

    fn has_any_faces(&mut self, ctx: &opentelemetry::Context) -> anyhow::Result<bool> {
        let mut conn = self.connection.lock().expect("face dao lock");
        trace_db_call(ctx, "query", "has_any_faces", |_span| {
            face_detections::table
                .filter(face_detections::status.eq("detected"))
                .select(face_detections::id)
                .first::<i32>(conn.deref_mut())
                .optional()
                .map(|x| x.is_some())
                .with_context(|| "has_any_faces query")
        })
    }

    fn resolve_content_hash(
        &mut self,
        ctx: &opentelemetry::Context,
@@ -1688,10 +1599,18 @@ where
                .route(web::get().to(list_persons_handler::<D>))
                .route(web::post().to(create_person_handler::<D>)),
        )
        .service(
            web::resource("/persons/bootstrap")
                .route(web::post().to(bootstrap_persons_handler::<D>)),
        )
        .service(
            web::resource("/persons/ignore-bucket")
                .route(web::post().to(ignore_bucket_handler::<D>)),
        )
        .service(
            web::resource("/tags/people-bootstrap-candidates")
                .route(web::get().to(bootstrap_candidates_handler::<D>)),
        )
        .service(
            web::resource("/persons/{id}")
                .route(web::get().to(get_person_handler::<D>))
@@ -1706,6 +1625,340 @@ where
        )
}

// ── Bootstrap (Phase 4) ─────────────────────────────────────────────────────

#[derive(Serialize, Debug, Clone)]
pub struct BootstrapCandidate {
    /// Display name — most-frequent capitalization across the case-insensitive
    /// group, or simply the first one seen if it's a tie.
    pub name: String,
    /// Lowercased name; the stable key for grouping and the auto-bind path.
    pub normalized_name: String,
    /// Sum of `tagged_photo` counts across all capitalizations of this name.
    pub usage_count: i64,
    /// Heuristic suggestion; the UI defaults this to checked but the user
    /// confirms before [`bootstrap_persons_handler`] actually creates rows.
    pub looks_like_person: bool,
    /// True when a `persons` row already exists for this name (any case).
    /// The UI hides these — re-running bootstrap is idempotent so it's fine
    /// either way, but the noise isn't worth showing.
    pub already_exists: bool,
}

#[derive(Serialize, Debug)]
pub struct BootstrapCandidatesResponse {
    pub candidates: Vec<BootstrapCandidate>,
}

#[derive(Deserialize, Debug)]
pub struct BootstrapPersonsReq {
    pub names: Vec<String>,
}

#[derive(Serialize, Debug)]
pub struct BootstrapPersonsResponse {
    pub created: Vec<Person>,
    pub skipped: Vec<BootstrapSkipped>,
}

#[derive(Serialize, Debug)]
pub struct BootstrapSkipped {
    pub name: String,
    pub reason: String,
}

/// Hard filter for the bootstrap candidate list. Returns true if the tag
/// could plausibly be a person name; returns false to drop it from the
/// candidates entirely (not just leave looks_like_person=false).
///
/// Rules — all required:
/// - At least 3 characters after trimming. Two-letter tags ("AB", "OK")
///   are almost always abbreviations or markers, not names.
/// - No emoji or symbol-class characters. SQL-side string sort already
///   surfaces those at the top of the tag list; filtering them keeps
///   the candidate UI focused on names rather than chart-junk.
/// - No control characters or null bytes.
pub(crate) fn is_plausible_name_token(raw: &str) -> bool {
    let trimmed = raw.trim();
    if trimmed.chars().count() < 3 {
        return false;
    }
    for c in trimmed.chars() {
        // Letter / mark / decimal-digit / connector-punctuation /
        // dash / apostrophe / period / whitespace are all plausible in a
        // name. Anything else (emoji, symbols, math operators, arrows,
        // box drawing, control codes) disqualifies the whole tag.
        if c.is_alphabetic()
            || c.is_whitespace()
            || matches!(c, '\'' | '-' | '.' | '_' | '\u{2019}')
        {
            continue;
        }
        if c.is_ascii_digit() {
            // Digits don't disqualify here — `looks_like_person` rejects
            // them later, but `is_plausible_name_token` is just about
            // "could this be in the candidate list at all?". A tag like
            // "Sarah2" stays as a candidate (display-flagged not-a-person
            // by looks_like_person) so the operator can still spot and
            // confirm it manually if it's an alias.
            continue;
        }
        return false;
    }
    true
}

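A few sample inputs make the filter's boundaries easier to see. The block below is a condensed restatement of `is_plausible_name_token` (same accept set, written with `all`) so that it compiles on its own. It is a sketch for illustration, not a replacement for the function in the diff.

```rust
/// Condensed restatement of the filter above: at least three characters
/// after trimming, and every character is a letter, whitespace, an ASCII
/// digit, or one of the listed name punctuation marks.
fn is_plausible_name_token(raw: &str) -> bool {
    let trimmed = raw.trim();
    if trimmed.chars().count() < 3 {
        return false;
    }
    trimmed.chars().all(|c| {
        c.is_alphabetic()
            || c.is_whitespace()
            || c.is_ascii_digit()
            || matches!(c, '\'' | '-' | '.' | '_' | '\u{2019}')
    })
}
```

Under this restatement, `"Sarah Smith"`, `"O\u{2019}Brien"` and `"Sarah2"` are accepted, while `"AB"` (too short), `"  ok  "` (too short once trimmed), `"a+b"` (math symbol) and `"party \u{1F389}"` (emoji) are rejected.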
/// Conservative "this tag *might* be a person name" heuristic. False
/// negatives are fine — the operator confirms in the UI before any row
/// is created. False positives are also fine for the same reason; the
/// goal is just to default sensible candidates to checked.
///
/// Rules:
/// - 1–2 whitespace-separated words
/// - Each word starts with an uppercase character
/// - No digits anywhere (rejects "Trip 2018", "2024", etc.)
/// - Single-word names not on a small denylist of common non-person
///   tags (cat, christmas, beach, ...). Two-word names skip the
///   denylist because a real two-word person name is the dominant
///   case ("Sarah Smith") and false-blocking it is worse than false-
///   accepting "Sunset Walk".
pub(crate) fn looks_like_person(raw: &str) -> bool {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return false;
    }
    let words: Vec<&str> = trimmed.split_whitespace().collect();
    if !(1..=2).contains(&words.len()) {
        return false;
    }
    for w in &words {
        let Some(first) = w.chars().next() else {
            return false;
        };
        if !first.is_uppercase() {
            return false;
        }
        if w.chars().any(|c| c.is_ascii_digit()) {
            return false;
        }
    }
    if words.len() == 1 {
        const DENY: &[&str] = &[
            // Pets / animals
            "cat",
            "dog",
            "kitten",
            "puppy",
            "bird",
            "fish",
            "pet",
            "pets",
            // Events / occasions
            "birthday",
            "christmas",
            "halloween",
            "easter",
            "thanksgiving",
            "wedding",
            "anniversary",
            "vacation",
            "holiday",
            "party",
            "trip",
            "graduation",
            "concert",
            // Places (generic)
            "home",
            "work",
            "beach",
            "park",
            "hotel",
            "restaurant",
            "office",
            "house",
            "garden",
            // Subjects / styles
            "food",
            "sunset",
            "sunrise",
            "landscape",
            "portrait",
            "selfie",
            "nature",
            "flowers",
            "flower",
            "snow",
            "rain",
            "sky",
            // Buckets
            "untagged",
            "favorites",
            "favourites",
            "misc",
            "other",
            "random",
        ];
        let lower = trimmed.to_lowercase();
        if DENY.iter().any(|w| *w == lower) {
            return false;
        }
    }
    true
}

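The rules listed in the doc comment can be exercised with a compact restatement of the heuristic. The four-entry `DENY` list below is a shortened, illustrative subset; the function in the diff carries the full list.

```rust
/// Shortened, illustrative denylist; the diff above carries the full one.
const DENY: &[&str] = &["cat", "beach", "christmas", "vacation"];

fn looks_like_person(raw: &str) -> bool {
    let trimmed = raw.trim();
    let words: Vec<&str> = trimmed.split_whitespace().collect();
    // Empty input produces zero words and is rejected here as well.
    if !(1..=2).contains(&words.len()) {
        return false;
    }
    for w in &words {
        // Every word must start uppercase and contain no ASCII digit.
        let starts_upper = w.chars().next().map_or(false, |c| c.is_uppercase());
        if !starts_upper || w.chars().any(|c| c.is_ascii_digit()) {
            return false;
        }
    }
    // Only single-word tags are checked against the denylist.
    if words.len() == 1 {
        let lower = trimmed.to_lowercase();
        if DENY.iter().any(|d| *d == lower) {
            return false;
        }
    }
    true
}
```

Here `"Sarah"`, `"Sarah Smith"` and `"Sunset Walk"` all return `true` (the last being the accepted false positive the doc comment mentions), while `"Beach"`, `"sarah"`, `"Trip 2018"` and `"Mary Jane Watson"` return `false`.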
async fn bootstrap_candidates_handler<D: FaceDao>(
    _: Claims,
    request: HttpRequest,
    face_dao: web::Data<Mutex<D>>,
    tag_dao: web::Data<Mutex<crate::tags::SqliteTagDao>>,
) -> impl Responder {
    use std::collections::HashMap;
    let context = extract_context_from_request(&request);
    let span = global_tracer().start_with_context("faces.bootstrap_candidates", &context);
    let span_context = opentelemetry::Context::current_with_span(span);

    // All tags + their counts. Path filter unused — bootstrap is library-wide.
    let tags_with_counts = {
        let mut td = tag_dao.lock().expect("tag dao lock");
        match crate::tags::TagDao::get_all_tags(&mut *td, &span_context, None) {
            Ok(t) => t,
            Err(e) => return HttpResponse::InternalServerError().body(format!("{:#}", e)),
        }
    };

    // Group by lowercase name. Pick the most-frequent capitalization
    // for the display name (ties broken by first-seen). Filter out
    // short tags and tags carrying non-name characters (emojis, symbols)
    // before grouping — they're noise no operator would tick, so showing
    // them just makes the candidate list harder to scan.
    struct Group {
        display: String,
        display_freq: i64,
        total_count: i64,
    }
    let mut groups: HashMap<String, Group> = HashMap::new();
    for (count, tag) in tags_with_counts {
        if !is_plausible_name_token(&tag.name) {
            continue;
        }
        let lower = tag.name.to_lowercase();
        let g = groups.entry(lower).or_insert_with(|| Group {
            display: tag.name.clone(),
            display_freq: 0,
            total_count: 0,
        });
        g.total_count += count;
        if count > g.display_freq {
            g.display = tag.name.clone();
            g.display_freq = count;
        }
    }

    // Cross-reference against existing persons (bulk one-query lookup).
    let lower_names: Vec<String> = groups.keys().cloned().collect();
    let existing = {
        let mut fd = face_dao.lock().expect("face dao lock");
        match fd.find_persons_by_names_ci(&span_context, &lower_names) {
            Ok(m) => m,
            Err(e) => return HttpResponse::InternalServerError().body(format!("{:#}", e)),
        }
    };

    let mut candidates: Vec<BootstrapCandidate> = groups
        .into_iter()
        .map(|(lower, g)| BootstrapCandidate {
            looks_like_person: looks_like_person(&g.display),
            already_exists: existing.contains_key(&lower),
            name: g.display,
            normalized_name: lower,
            usage_count: g.total_count,
        })
        .collect();
    // Sort: persons-first heuristic by descending count, then alphabetical.
    // Persons-likely candidates surface near the top so the user doesn't
    // scroll past dozens of "vacation"-style tags to find them.
    candidates.sort_by(|a, b| {
        b.looks_like_person
            .cmp(&a.looks_like_person)
            .then(b.usage_count.cmp(&a.usage_count))
            .then(a.normalized_name.cmp(&b.normalized_name))
    });

    HttpResponse::Ok().json(BootstrapCandidatesResponse { candidates })
}

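The grouping step in the handler above (fold capitalizations together, keep the most-used spelling for display, sum the counts) can be isolated into a small std-only sketch. `Candidate` and `group_tags` are hypothetical names, the input is assumed to be plain `(count, name)` pairs, and the `looks_like_person` sort key is left out to keep the example short.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct Candidate {
    name: String,
    normalized_name: String,
    usage_count: i64,
}

/// Group `(count, tag)` pairs case-insensitively, keeping the spelling with
/// the highest single count as the display name and summing all counts.
fn group_tags(tags: &[(i64, &str)]) -> Vec<Candidate> {
    // normalized name -> (display name, best single count, total count)
    let mut groups: HashMap<String, (String, i64, i64)> = HashMap::new();
    for &(count, tag) in tags {
        let entry = groups
            .entry(tag.to_lowercase())
            .or_insert_with(|| (tag.to_string(), 0, 0));
        entry.2 += count;
        // Strict `>` keeps the first-seen spelling on ties.
        if count > entry.1 {
            entry.0 = tag.to_string();
            entry.1 = count;
        }
    }
    let mut out: Vec<Candidate> = groups
        .into_iter()
        .map(|(normalized_name, (name, _, usage_count))| Candidate {
            name,
            normalized_name,
            usage_count,
        })
        .collect();
    // Descending usage, then alphabetical; normalized names are unique, so
    // the order is deterministic despite HashMap's arbitrary iteration order.
    out.sort_by(|a, b| {
        b.usage_count
            .cmp(&a.usage_count)
            .then(a.normalized_name.cmp(&b.normalized_name))
    });
    out
}
```

Given `[(3, "sarah"), (5, "Sarah"), (2, "Beach"), (9, "Tom")]`, the two spellings of Sarah collapse into one candidate displayed as `Sarah` with a usage count of 8, sorted between `Tom` (9) and `Beach` (2).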
async fn bootstrap_persons_handler<D: FaceDao>(
    _: Claims,
    request: HttpRequest,
    body: web::Json<BootstrapPersonsReq>,
    face_dao: web::Data<Mutex<D>>,
) -> impl Responder {
    let context = extract_context_from_request(&request);
    let span = global_tracer().start_with_context("faces.bootstrap_persons", &context);
    let span_context = opentelemetry::Context::current_with_span(span);

    let mut created: Vec<Person> = Vec::new();
    let mut skipped: Vec<BootstrapSkipped> = Vec::new();

    let mut dao = face_dao.lock().expect("face dao lock");

    // Pre-fetch the existing-name set so a duplicate request reports
    // "already exists" (skipped) rather than firing N inserts that all
    // 409 against the UNIQUE COLLATE NOCASE constraint.
    let lower_names: Vec<String> = body.names.iter().map(|n| n.to_lowercase()).collect();
    let existing = match dao.find_persons_by_names_ci(&span_context, &lower_names) {
        Ok(m) => m,
        Err(e) => return HttpResponse::InternalServerError().body(format!("{:#}", e)),
    };

    for name in &body.names {
        let trimmed = name.trim();
        if trimmed.is_empty() {
            skipped.push(BootstrapSkipped {
                name: name.clone(),
                reason: "empty name".into(),
            });
            continue;
        }
        let lower = trimmed.to_lowercase();
        if existing.contains_key(&lower) {
            skipped.push(BootstrapSkipped {
                name: trimmed.to_string(),
                reason: "person already exists".into(),
            });
            continue;
        }
        match dao.create_person(
            &span_context,
            &CreatePersonReq {
                name: trimmed.to_string(),
                notes: None,
                entity_id: None,
                is_ignored: false,
            },
            /*from_tag*/ true,
        ) {
            Ok(p) => created.push(p),
            Err(e) => {
                if is_unique_violation(&e) {
                    // Race with a concurrent create; treat as skipped.
                    skipped.push(BootstrapSkipped {
                        name: trimmed.to_string(),
                        reason: "person already exists".into(),
                    });
                } else {
                    skipped.push(BootstrapSkipped {
                        name: trimmed.to_string(),
                        reason: format!("{:#}", e),
                    });
                }
            }
        }
    }

    HttpResponse::Ok().json(BootstrapPersonsResponse { created, skipped })
}

// ── Stats / list ────────────────────────────────────────────────────────────

#[derive(Deserialize)]
@@ -1802,7 +2055,6 @@ async fn embeddings_handler<D: FaceDao>(
            &span_context,
            query.library,
            query.unassigned,
            query.person_id,
            limit,
            offset,
        )
@@ -2003,12 +2255,6 @@ async fn update_face_handler<D: FaceDao>(
    let mut new_embedding: Option<Vec<u8>> = None;
    if let Some((bx, by, bw, bh)) = bbox_patch {
        if !face_client.is_enabled() {
            warn!(
                "PATCH /image/faces/{}: 503 — face client not enabled \
                 (APOLLO_FACE_API_BASE_URL / APOLLO_API_BASE_URL both unset). \
                 Bbox edit requires Apollo to re-embed.",
                id
            );
            return HttpResponse::ServiceUnavailable()
                .body("face client disabled — bbox edit requires Apollo");
        }
@@ -2018,19 +2264,12 @@ async fn update_face_handler<D: FaceDao>(
            match dao.get_face(&span_context, id) {
                Ok(Some(r)) => r,
                Ok(None) => return HttpResponse::NotFound().finish(),
                Err(e) => {
                    warn!("PATCH /image/faces/{}: 500 — get_face failed: {:#}", id, e);
                    return HttpResponse::InternalServerError().body(e.to_string());
                }
                Err(e) => return HttpResponse::InternalServerError().body(e.to_string()),
            }
        };
        let library = match app_state.library_by_id(current.library_id) {
            Some(l) => l.clone(),
            None => {
                warn!(
                    "PATCH /image/faces/{}: 500 — face row references unknown library_id {}",
                    id, current.library_id
                );
                return HttpResponse::InternalServerError().body(format!(
                    "face row references unknown library_id {}",
                    current.library_id
@@ -2045,7 +2284,8 @@ async fn update_face_handler<D: FaceDao>(
                     "PATCH /image/faces/{}: crop failed for {:?}: {:?}",
                     id, abs_path, e
                 );
-                return HttpResponse::BadRequest().body(format!("cannot crop new bbox: {}", e));
+                return HttpResponse::BadRequest()
+                    .body(format!("cannot crop new bbox: {}", e));
             }
         };
         let meta = DetectMeta {
@@ -2092,20 +2332,11 @@ async fn update_face_handler<D: FaceDao>(
                 );
             }
             Err(FaceDetectError::Transient(e)) => {
-                warn!(
-                    "PATCH /image/faces/{}: 503 — Apollo face client transient \
-                     error during re-embed: {}",
-                    id, e
-                );
                 return HttpResponse::ServiceUnavailable().body(format!("{}", e));
             }
             Err(FaceDetectError::Disabled) => {
-                warn!(
-                    "PATCH /image/faces/{}: 503 — face client became disabled \
-                     mid-flight",
-                    id
-                );
-                return HttpResponse::ServiceUnavailable().body("face client disabled mid-flight");
+                return HttpResponse::ServiceUnavailable()
+                    .body("face client disabled mid-flight");
             }
         }
     }
@@ -2113,14 +2344,7 @@ async fn update_face_handler<D: FaceDao>(
     let mut dao = face_dao.lock().expect("face dao lock");
     let row = match dao.update_face(&span_context, id, person_patch, bbox_patch, new_embedding) {
         Ok(r) => r,
-        Err(e) => {
-            // The full anyhow chain (`{:#}`) shows the diesel cause behind
-            // the short context string we surface in the response body —
-            // SQLITE_BUSY here usually means another DAO's writer held the
-            // lock past `busy_timeout` (5s), which is invisible in `{}`.
-            warn!("PATCH /image/faces/{}: 500 — update_face failed: {:#}", id, e);
-            return HttpResponse::InternalServerError().body(e.to_string());
-        }
+        Err(e) => return HttpResponse::InternalServerError().body(e.to_string()),
     };
     // Hydrate person_name so the response shape matches GET /image/faces
     // — the carousel overlay does an optimistic replace on this row, and
@@ -2128,13 +2352,7 @@ async fn update_face_handler<D: FaceDao>(
     // VFD label off the bbox even though the assignment didn't change.
     match hydrate_face_with_person(&mut *dao, &span_context, row) {
         Ok(joined) => HttpResponse::Ok().json(joined),
-        Err(e) => {
-            warn!(
-                "PATCH /image/faces/{}: 500 — hydrate_face_with_person failed: {:#}",
-                id, e
-            );
-            HttpResponse::InternalServerError().body(e.to_string())
-        }
+        Err(e) => HttpResponse::InternalServerError().body(e.to_string()),
     }
 }
|
||||
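Review note on the `update_face_handler` hunks above: the comment next to the `update_face` error arm says the short `{}` form hides the underlying cause, while the full chain (`{:#}`) exposes it. The sketch below illustrates that difference using only the standard library. `BusyError`, `ContextError` and `format_chain` are hypothetical names invented for this note; the handler itself relies on anyhow's formatting, which this code only imitates by joining each `source()` with a colon.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical low-level cause, standing in for a driver error such as SQLITE_BUSY.
#[derive(Debug)]
struct BusyError;

impl fmt::Display for BusyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "database is locked")
    }
}

impl Error for BusyError {}

// Hypothetical wrapper that adds a short context string on top of a cause.
#[derive(Debug)]
struct ContextError {
    context: &'static str,
    cause: Box<dyn Error + 'static>,
}

impl fmt::Display for ContextError {
    // Plain Display shows only the context, which is why the cause is invisible in `{}`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.context)
    }
}

impl Error for ContextError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.cause)
    }
}

// Walk the `source()` chain and join every level with ": ".
fn format_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut cause = err.source();
    while let Some(inner) = cause {
        out.push_str(": ");
        out.push_str(&inner.to_string());
        cause = inner.source();
    }
    out
}
```

With a `ContextError` wrapping a `BusyError`, `to_string()` yields only the context, whereas `format_chain` also includes the lock failure. That is the information the logging lines in these hunks carry and the response body does not.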
@@ -2487,7 +2705,77 @@ mod tests {
|
||||
);
|
||||
}
|
||||
|
||||
// ── Phase 4: cosine + DAO support ───────────────────────────────────
|
||||
// ── Phase 4: bootstrap heuristic + cosine + DAO support ─────────────
|
||||
|
||||
#[test]
|
||||
fn is_plausible_name_token_filters_short_and_emoji() {
|
||||
// Hard filter applied before grouping — emojis and tags shorter
|
||||
// than 3 chars never make it into the candidate list, regardless
|
||||
// of looks_like_person's later assessment.
|
||||
assert!(is_plausible_name_token("Cameron"));
|
||||
assert!(is_plausible_name_token("Sarah Smith"));
|
||||
assert!(is_plausible_name_token("O'Brien"));
|
||||
assert!(is_plausible_name_token("Jean-Luc"));
|
||||
assert!(is_plausible_name_token("St. James"));
|
||||
assert!(is_plausible_name_token("Renée"));
|
||||
assert!(is_plausible_name_token("José"));
|
||||
// Asian script names — the alphabetic/letter check covers any
|
||||
// script, not just Latin.
|
||||
assert!(is_plausible_name_token("田中太郎"));
|
||||
|
||||
// Below the 3-character floor.
|
||||
assert!(!is_plausible_name_token(""));
|
||||
assert!(!is_plausible_name_token(" "));
|
||||
assert!(!is_plausible_name_token("Bo"));
|
||||
assert!(!is_plausible_name_token("AB"));
|
||||
// Trim before counting — surrounding whitespace doesn't count.
|
||||
assert!(!is_plausible_name_token(" AB "));
|
||||
|
||||
// Emoji / symbol classes get the whole tag dropped.
|
||||
assert!(!is_plausible_name_token("🐱cat"));
|
||||
assert!(!is_plausible_name_token("Heart ❤"));
|
||||
assert!(!is_plausible_name_token("📸Photo"));
|
||||
assert!(!is_plausible_name_token("→ Trip"));
|
||||
assert!(!is_plausible_name_token("★Vacation"));
|
||||
|
||||
// Digits are kept (handled by looks_like_person, not here).
|
||||
assert!(is_plausible_name_token("Trip 2018"));
|
||||
assert!(is_plausible_name_token("2024"));
|
||||
}
|
||||
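Review note: the implementation of `is_plausible_name_token` is not part of this hunk, so the following is only a minimal sketch that satisfies the assertions in the test above. The three-character floor and the "trim before counting" rule come from the test comments; the allowed punctuation set (apostrophe, hyphen, period) is an assumption inferred from the accepted examples.

```rust
// Sketch of a hard pre-filter for tag names, inferred from the test cases.
fn is_plausible_name_token(tag: &str) -> bool {
    let trimmed = tag.trim();

    // Count characters, not bytes, so multi-byte scripts are measured fairly.
    if trimmed.chars().count() < 3 {
        return false;
    }

    // Letters of any script and digits pass; so do inner spaces and a small
    // punctuation set (assumed). Emoji, arrows and other symbols fail here,
    // which drops the whole tag.
    trimmed
        .chars()
        .all(|c| c.is_alphanumeric() || c.is_whitespace() || matches!(c, '\'' | '-' | '.'))
}
```

Digits are deliberately allowed at this stage, matching the test comment that digit handling belongs to `looks_like_person`.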
|
||||
#[test]
|
||||
fn looks_like_person_accepts_typical_names() {
|
||||
assert!(looks_like_person("Cameron"));
|
||||
assert!(looks_like_person("Sarah Smith"));
|
||||
assert!(looks_like_person("Mary Jane"));
|
||||
// Non-ASCII title-cased single word still counts.
|
||||
assert!(looks_like_person("Renée"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn looks_like_person_rejects_obvious_non_people() {
|
||||
// Digits, lowercase, three-or-more words, denylist hits.
|
||||
assert!(!looks_like_person("2018"));
|
||||
assert!(!looks_like_person("Trip 2018"));
|
||||
assert!(!looks_like_person("trip"));
|
||||
assert!(!looks_like_person("Birthday Party Cake"));
|
||||
assert!(!looks_like_person("cat"));
|
||||
assert!(!looks_like_person("Cat")); // denied even when title-cased
|
||||
assert!(!looks_like_person("Christmas"));
|
||||
assert!(!looks_like_person("home"));
|
||||
assert!(!looks_like_person(""));
|
||||
assert!(!looks_like_person(" "));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn looks_like_person_two_words_skips_denylist() {
|
||||
// Two-word names get a pass on the single-word denylist —
|
||||
// "Sunset Walk" is much more likely a real album than a person,
|
||||
// but false-accepting is fine because the operator confirms.
|
||||
// What matters is we don't false-reject "Sarah Smith".
|
||||
assert!(looks_like_person("Sunset Walk"));
|
||||
assert!(looks_like_person("Sarah Smith"));
|
||||
}
|
||||
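Review note: the three `looks_like_person` tests above pin down the heuristic fairly tightly, so here is a hedged sketch that passes them. The rules (no digits, at most two words, title-cased words, single-word denylist) are read off the test comments; the contents of `NON_PERSON_WORDS` are an assumption, limited to the words the tests mention, and the real denylist is presumably longer.

```rust
// Assumed denylist: only the single words the tests reject.
const NON_PERSON_WORDS: &[&str] = &["cat", "christmas", "home", "trip"];

// Sketch of the person-name heuristic implied by the tests.
fn looks_like_person(tag: &str) -> bool {
    let words: Vec<&str> = tag.split_whitespace().collect();

    // Empty or whitespace-only tags, and tags of three or more words, are rejected.
    if words.is_empty() || words.len() > 2 {
        return false;
    }

    // Any digit disqualifies the tag ("2018", "Trip 2018").
    if tag.chars().any(|c| c.is_numeric()) {
        return false;
    }

    // Every word must start with an uppercase letter; this works for non-ASCII too.
    let title_cased = words
        .iter()
        .all(|w| w.chars().next().map_or(false, |c| c.is_uppercase()));
    if !title_cased {
        return false;
    }

    // The denylist applies to single-word tags only, compared case-insensitively.
    if words.len() == 1 {
        let lowered = words[0].to_lowercase();
        if NON_PERSON_WORDS.iter().any(|w| *w == lowered.as_str()) {
            return false;
        }
    }

    true
}
```

Skipping the denylist for two-word tags mirrors the trade-off stated in the last test: a false accept such as "Sunset Walk" is tolerable because the operator confirms candidates, while a false reject of "Sarah Smith" is not.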
|
||||
#[test]
|
||||
fn cosine_similarity_known_vectors() {
|
||||
@@ -2857,39 +3145,6 @@ mod tests {
|
||||
assert_eq!(stats.with_faces, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn stats_total_photos_excludes_videos() {
|
||||
// SCANNED counts content_hashes in face_detections; total_photos
|
||||
// must apply the same image-extension filter as the watcher
|
||||
// backlog query so the percentage can reach 100%. Without this,
|
||||
// videos sit in image_exif but never produce a face_detections
|
||||
// row (Apollo decodes images only) and the bar caps below 100%.
|
||||
let mut dao = fresh_dao();
|
||||
diesel::sql_query(
|
||||
"INSERT OR IGNORE INTO libraries (id, name, root_path, created_at) \
|
||||
VALUES (1, 'main', '/tmp', 0)",
|
||||
)
|
||||
.execute(dao.connection.lock().unwrap().deref_mut())
|
||||
.expect("seed libraries");
|
||||
|
||||
diesel::sql_query(
|
||||
"INSERT INTO image_exif \
|
||||
(library_id, rel_path, content_hash, created_time, last_modified) VALUES \
|
||||
(1, 'a.jpg', 'h-a', 0, 0), \
|
||||
(1, 'b.JPEG', 'h-b', 0, 0), \
|
||||
(1, 'movie.mp4', 'h-mp4', 0, 0), \
|
||||
(1, 'clip.MOV', 'h-mov', 0, 0)",
|
||||
)
|
||||
.execute(dao.connection.lock().unwrap().deref_mut())
|
||||
.expect("seed image_exif");
|
||||
|
||||
let stats = dao.stats(&ctx(), Some(1)).expect("stats");
|
||||
assert_eq!(
|
||||
stats.total_photos, 2,
|
||||
"videos should not count toward total"
|
||||
);
|
||||
}
|
||||
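Review note: `stats_total_photos_excludes_videos` relies on a case-insensitive image-extension filter so that videos do not inflate `total_photos`. The DAO applies that filter in SQL, which this hunk does not show; the sketch below expresses the same rule in memory. `IMAGE_EXTENSIONS`, `has_image_extension` and `total_photos` are hypothetical names, and the extension list is an assumption.

```rust
use std::path::Path;

// Assumed set of image extensions; the project's real list may differ.
const IMAGE_EXTENSIONS: &[&str] = &["jpg", "jpeg", "png", "heic", "webp"];

// True when the path's extension is a known image extension, ignoring ASCII case.
fn has_image_extension(rel_path: &str) -> bool {
    Path::new(rel_path)
        .extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| {
            let lowered = ext.to_ascii_lowercase();
            IMAGE_EXTENSIONS.iter().any(|known| *known == lowered.as_str())
        })
}

// Count only rows that could ever produce a face detection.
fn total_photos(rel_paths: &[&str]) -> usize {
    rel_paths.iter().filter(|p| has_image_extension(**p)).count()
}
```

Applied to the four rows the test seeds (`a.jpg`, `b.JPEG`, `movie.mp4`, `clip.MOV`), the count is 2, which is the value the test asserts.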
|
||||
#[test]
|
||||
fn merge_persons_repoints_faces() {
|
||||
let mut dao = fresh_dao();
|
||||
@@ -2960,87 +3215,6 @@ mod tests {
|
||||
assert_eq!(faces[0].person_id, Some(alice.id));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn list_embeddings_filters_by_person_id() {
|
||||
// Apollo's similar-unassigned suggester relies on this filter to
|
||||
// pull a single person's embeddings without paging the whole
|
||||
// detected set client-side. When person_id is set it must win
|
||||
// over `unassigned=true` (otherwise the IS NULL constraint would
|
||||
// always return an empty set for an assigned person).
|
||||
let mut dao = fresh_dao();
|
||||
diesel::sql_query(
|
||||
"INSERT OR IGNORE INTO libraries (id, name, root_path, created_at) \
|
||||
VALUES (1, 'main', '/tmp', 0)",
|
||||
)
|
||||
.execute(dao.connection.lock().unwrap().deref_mut())
|
||||
.expect("seed libraries");
|
||||
|
||||
let alice = dao
|
||||
.create_person(
|
||||
&ctx(),
|
||||
&CreatePersonReq {
|
||||
name: "Alice".into(),
|
||||
notes: None,
|
||||
entity_id: None,
|
||||
is_ignored: false,
|
||||
},
|
||||
false,
|
||||
)
|
||||
.unwrap();
|
||||
let bob = dao
|
||||
.create_person(
|
||||
&ctx(),
|
||||
&CreatePersonReq {
|
||||
name: "Bob".into(),
|
||||
notes: None,
|
||||
entity_id: None,
|
||||
is_ignored: false,
|
||||
},
|
||||
false,
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let mk_row = |hash: &str, person: Option<i32>| InsertFaceDetectionInput {
|
||||
library_id: 1,
|
||||
content_hash: hash.into(),
|
||||
rel_path: format!("{hash}.jpg"),
|
||||
bbox: Some((0.1, 0.1, 0.2, 0.2)),
|
||||
embedding: Some(vec![0u8; 2048]),
|
||||
confidence: Some(0.9),
|
||||
source: "auto".into(),
|
||||
person_id: person,
|
||||
status: "detected".into(),
|
||||
model_version: "buffalo_l".into(),
|
||||
};
|
||||
dao.store_detection(&ctx(), mk_row("a1", Some(alice.id)))
|
||||
.unwrap();
|
||||
dao.store_detection(&ctx(), mk_row("a2", Some(alice.id)))
|
||||
.unwrap();
|
||||
dao.store_detection(&ctx(), mk_row("b1", Some(bob.id)))
|
||||
.unwrap();
|
||||
dao.store_detection(&ctx(), mk_row("u1", None)).unwrap();
|
||||
|
||||
// person_id=alice returns only alice's two faces — ignoring the
|
||||
// (default-true) `unassigned` filter, which would have selected
|
||||
// u1 only.
|
||||
let alice_rows = dao
|
||||
.list_embeddings(&ctx(), None, true, Some(alice.id), 100, 0)
|
||||
.unwrap();
|
||||
assert_eq!(alice_rows.len(), 2);
|
||||
assert!(
|
||||
alice_rows
|
||||
.iter()
|
||||
.all(|(r, _)| r.person_id == Some(alice.id))
|
||||
);
|
||||
|
||||
// unassigned=true with no person_id behaves as before.
|
||||
let unassigned_rows = dao
|
||||
.list_embeddings(&ctx(), None, true, None, 100, 0)
|
||||
.unwrap();
|
||||
assert_eq!(unassigned_rows.len(), 1);
|
||||
assert_eq!(unassigned_rows[0].0.content_hash, "u1");
|
||||
}
|
||||
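Review note: the precedence rule tested above (a set `person_id` wins over `unassigned=true`) can be stated compactly as a single `match`. This is an in-memory sketch, not the DAO's query; `FaceRow` and `select_embeddings` are hypothetical names, and returning every row when neither filter is set is an assumption the test does not cover.

```rust
// Minimal stand-in for a face row; the real row carries many more columns.
#[derive(Debug, Clone, PartialEq)]
struct FaceRow {
    content_hash: String,
    person_id: Option<i32>,
}

// Filter rows with the same precedence the test describes.
fn select_embeddings(rows: &[FaceRow], unassigned: bool, person_id: Option<i32>) -> Vec<&FaceRow> {
    rows.iter()
        .filter(|row| match person_id {
            // An explicit person wins, even if `unassigned` is true.
            Some(id) => row.person_id == Some(id),
            // No person requested: `unassigned` selects rows without an assignment.
            None if unassigned => row.person_id.is_none(),
            // Neither filter set: keep everything (assumed behaviour).
            None => true,
        })
        .collect()
}
```

Without the first arm, combining `person_id` with the `IS NULL` constraint would always produce an empty set for an assigned person, which is the failure mode the test comment warns about.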
|
||||
// ── crop_image_to_bbox ──────────────────────────────────────────────
|
||||
// Pure helper used by the manual face-create handler. Generate a tiny
|
||||
// image in memory, write it to a temp file, then exercise the bbox
|
||||
@@ -3151,7 +3325,8 @@ mod tests {
|
||||
)
|
||||
.unwrap();
|
||||
let row = seed_library_and_face(&mut dao, Some(p.id));
|
||||
let joined = hydrate_face_with_person(&mut dao, &ctx(), row).expect("hydrate assigned");
|
||||
let joined =
|
||||
hydrate_face_with_person(&mut dao, &ctx(), row).expect("hydrate assigned");
|
||||
assert_eq!(joined.person_id, Some(p.id));
|
||||
assert_eq!(joined.person_name.as_deref(), Some("Alice"));
|
||||
// Bbox + confidence + source must round-trip — these are what
|
||||
@@ -3170,7 +3345,8 @@ mod tests {
|
||||
// previously-assigned row's serialization.
|
||||
let mut dao = fresh_dao();
|
||||
let row = seed_library_and_face(&mut dao, None);
|
||||
let joined = hydrate_face_with_person(&mut dao, &ctx(), row).expect("hydrate unassigned");
|
||||
let joined =
|
||||
hydrate_face_with_person(&mut dao, &ctx(), row).expect("hydrate unassigned");
|
||||
assert!(joined.person_id.is_none());
|
||||
assert!(joined.person_name.is_none());
|
||||
}
|
||||
@@ -3191,12 +3367,7 @@ mod tests {
|
||||
.execute(dao.connection.lock().unwrap().deref_mut())
|
||||
.expect("seed libraries");
|
||||
|
||||
// Seed image_exif: mix of hashed/unhashed/scanned/cross-library,
|
||||
// plus a video and a mixed-case image extension. Videos register
|
||||
// in image_exif but can never produce a face_detections row, so
|
||||
// the SQL must filter them out — otherwise the per-tick backlog
|
||||
// drain re-pulls them every tick (no marker is ever written, so
|
||||
// they loop forever) and the SCANNED stat is permanently capped.
|
||||
// Seed image_exif: mix of hashed/unhashed/scanned/cross-library.
|
||||
diesel::sql_query(
|
||||
"INSERT INTO image_exif \
|
||||
(library_id, rel_path, content_hash, created_time, last_modified) VALUES \
|
||||
@@ -3204,9 +3375,6 @@ mod tests {
|
||||
(1, 'b.jpg', 'h-b', 0, 0), \
|
||||
(1, 'c.jpg', NULL, 0, 0), \
|
||||
(1, 'd.jpg', 'h-d', 0, 0), \
|
||||
(1, 'movie.mp4', 'h-mp4', 0, 0), \
|
||||
(1, 'clip.MOV', 'h-mov', 0, 0), \
|
||||
(1, 'photo.JPG', 'h-jpg-upper', 0, 0), \
|
||||
(2, 'e.jpg', 'h-e', 0, 0)",
|
||||
)
|
||||
.execute(dao.connection.lock().unwrap().deref_mut())
|
||||
@@ -3220,26 +3388,16 @@ mod tests {
|
||||
.list_unscanned_candidates(&ctx(), 1, 10)
|
||||
.expect("list unscanned");
|
||||
|
||||
let hashes: std::collections::HashSet<_> = cands.iter().map(|(_, h)| h.clone()).collect();
|
||||
let hashes: std::collections::HashSet<_> =
|
||||
cands.iter().map(|(_, h)| h.clone()).collect();
|
||||
|
||||
// Should contain a, d, and the upper-case .JPG (image-extension
|
||||
// match is case-insensitive).
|
||||
// Should contain a and d (hashed, unscanned, library 1).
|
||||
assert!(hashes.contains("h-a"), "missing h-a: {:?}", hashes);
|
||||
assert!(hashes.contains("h-d"), "missing h-d: {:?}", hashes);
|
||||
assert!(
|
||||
hashes.contains("h-jpg-upper"),
|
||||
"missing h-jpg-upper: {:?}",
|
||||
hashes
|
||||
);
|
||||
// Should NOT contain b (scanned), c (no hash), e (other library),
|
||||
// or videos (mp4/mov are not image extensions).
|
||||
// Should NOT contain b (scanned), c (no hash), e (other library).
|
||||
assert!(!hashes.contains("h-b"), "expected h-b filtered (scanned)");
|
||||
assert!(
|
||||
!hashes.contains("h-e"),
|
||||
"expected h-e filtered (other library)"
|
||||
);
|
||||
assert!(!hashes.contains("h-mp4"), "expected h-mp4 filtered (video)");
|
||||
assert!(!hashes.contains("h-mov"), "expected h-mov filtered (video)");
|
||||
assert_eq!(cands.len(), 3, "unexpected candidates: {:?}", cands);
|
||||
assert!(!hashes.contains("h-e"), "expected h-e filtered (other library)");
|
||||
assert_eq!(cands.len(), 2, "unexpected candidates: {:?}", cands);
|
||||
}
|
||||
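Review note: the candidate rule checked by this test (hashed, not yet scanned, same library) is sketched below over in-memory rows. The real `list_unscanned_candidates` is a SQL query that this hunk does not show; `ExifRow` and `unscanned_candidates` are hypothetical names, and the image-extension filter that one side of this diff also applies is left out to keep the example short.

```rust
use std::collections::HashSet;

// Minimal stand-in for an image_exif row.
struct ExifRow {
    library_id: i32,
    rel_path: String,
    content_hash: Option<String>,
}

// Return (rel_path, content_hash) pairs that still need a face scan.
fn unscanned_candidates(
    rows: &[ExifRow],
    scanned_hashes: &HashSet<String>,
    library_id: i32,
    limit: usize,
) -> Vec<(String, String)> {
    rows.iter()
        // Only the requested library.
        .filter(|row| row.library_id == library_id)
        // Rows without a content hash cannot be matched to detections yet.
        .filter_map(|row| {
            row.content_hash
                .as_ref()
                .map(|hash| (row.rel_path.clone(), hash.clone()))
        })
        // Skip hashes that already have a detection marker.
        .filter(|(_, hash)| !scanned_hashes.contains(hash))
        .take(limit)
        .collect()
}
```

With the seeded rows from the test and `h-b` marked as scanned, only `h-a` and `h-d` remain for library 1.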
|
||||
}
|
||||
|
||||
+22
-10
@@ -123,7 +123,11 @@ mod tests {
|
||||
"vacation/@eaDir/IMG_0001.jpg/SYNOFILE_THUMB_XL.jpg",
|
||||
"@eaDir/top_level_thumb.jpg",
|
||||
]);
|
||||
let found = enumerate_indexable_files(dir.path(), &["@eaDir".to_string()], None);
|
||||
let found = enumerate_indexable_files(
|
||||
dir.path(),
|
||||
&["@eaDir".to_string()],
|
||||
None,
|
||||
);
|
||||
assert_eq!(rel_paths(&found), vec!["vacation/IMG_0001.jpg".to_string()]);
|
||||
}
|
||||
|
||||
@@ -135,7 +139,11 @@ mod tests {
|
||||
"a/.thumbnails/cached.jpg",
|
||||
"a/b/.thumbnails/nested.jpg",
|
||||
]);
|
||||
let found = enumerate_indexable_files(dir.path(), &[".thumbnails".to_string()], None);
|
||||
let found = enumerate_indexable_files(
|
||||
dir.path(),
|
||||
&[".thumbnails".to_string()],
|
||||
None,
|
||||
);
|
||||
assert_eq!(rel_paths(&found), vec!["a/b/photo.jpg".to_string()]);
|
||||
}
|
||||
|
||||
@@ -143,8 +151,15 @@ mod tests {
|
||||
fn excludes_absolute_under_base() {
|
||||
// Leading-'/' entries are interpreted as paths under the library
|
||||
// root (see PathExcluder::new).
|
||||
let dir = make_tree(&["private/secret.jpg", "public/keep.jpg"]);
|
||||
let found = enumerate_indexable_files(dir.path(), &["/private".to_string()], None);
|
||||
let dir = make_tree(&[
|
||||
"private/secret.jpg",
|
||||
"public/keep.jpg",
|
||||
]);
|
||||
let found = enumerate_indexable_files(
|
||||
dir.path(),
|
||||
&["/private".to_string()],
|
||||
None,
|
||||
);
|
||||
assert_eq!(rel_paths(&found), vec!["public/keep.jpg".to_string()]);
|
||||
}
|
||||
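Review note: the three exclusion tests above imply two kinds of pattern: bare names such as `@eaDir` that match a directory component at any depth, and leading-`/` entries that are treated as paths under the library root. The sketch below models that split. It is not the project's `PathExcluder` (whose source is not in this diff); the field names and `is_excluded` are assumptions.

```rust
use std::path::Path;

// Hypothetical model of the two pattern kinds seen in the tests.
struct PathExcluder {
    root_relative: Vec<String>,   // from patterns with a leading '/'
    component_names: Vec<String>, // bare names matched at any depth
}

impl PathExcluder {
    fn new(patterns: &[String]) -> Self {
        let mut root_relative = Vec::new();
        let mut component_names = Vec::new();
        for pattern in patterns {
            match pattern.strip_prefix('/') {
                // "/private" becomes the root-relative prefix "private".
                Some(rest) => root_relative.push(rest.trim_end_matches('/').to_string()),
                None => component_names.push(pattern.clone()),
            }
        }
        Self { root_relative, component_names }
    }

    // `rel_path` is relative to the library root.
    fn is_excluded(&self, rel_path: &str) -> bool {
        let path = Path::new(rel_path);
        // Path::starts_with compares whole components, so "private" does not match "privateer".
        let under_root = self
            .root_relative
            .iter()
            .any(|prefix| path.starts_with(prefix));
        let by_component = path.components().any(|component| {
            self.component_names
                .iter()
                .any(|name| component.as_os_str().to_str() == Some(name.as_str()))
        });
        under_root || by_component
    }
}
```

Matching on whole components rather than substrings is what keeps `vacation/IMG_0001.jpg` while dropping everything beneath `vacation/@eaDir/`.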
|
||||
@@ -154,14 +169,11 @@ mod tests {
|
||||
"a.jpg",
|
||||
"b.mp4",
|
||||
"c.txt",
|
||||
"d", // no extension
|
||||
"e.jpg.bak", // wrong ext
|
||||
"d", // no extension
|
||||
"e.jpg.bak", // wrong ext
|
||||
]);
|
||||
let found = enumerate_indexable_files(dir.path(), &[], None);
|
||||
assert_eq!(
|
||||
rel_paths(&found),
|
||||
vec!["a.jpg".to_string(), "b.mp4".to_string()]
|
||||
);
|
||||
assert_eq!(rel_paths(&found), vec!["a.jpg".to_string(), "b.mp4".to_string()]);
|
||||
}
|
||||
|
||||
#[test]
|
||||
|
||||
+13
-246
@@ -10,7 +10,6 @@ use std::path::{Path, PathBuf};
|
||||
use std::sync::Mutex;
|
||||
use std::time::SystemTime;
|
||||
|
||||
use crate::AppState;
|
||||
use crate::data::{
|
||||
Claims, ExifBatchRequest, ExifBatchResponse, ExifSummary, FilesRequest, FilterMode, MediaType,
|
||||
PhotosResponse, SortType,
|
||||
@@ -19,8 +18,8 @@ use crate::database::ExifDao;
|
||||
use crate::file_types;
|
||||
use crate::geo::{gps_bounding_box, haversine_distance};
|
||||
use crate::memories::extract_date_from_filename;
|
||||
use crate::thumbnails::create_thumbnails;
|
||||
use crate::utils::earliest_fs_time;
|
||||
use crate::{AppState, create_thumbnails};
|
||||
use actix_web::web::Data;
|
||||
use actix_web::{
|
||||
HttpRequest, HttpResponse,
|
||||
@@ -111,18 +110,11 @@ fn in_memory_date_sort(
|
||||
let total_count = files.len() as i64;
|
||||
let file_paths: Vec<String> = files.iter().map(|f| f.file_name.clone()).collect();
|
||||
|
||||
// Batch fetch EXIF data. When every file in this batch belongs to the
|
||||
// same library, scope the SQL filter to that library so cross-library
|
||||
// duplicates with the same rel_path don't get fetched and discarded.
|
||||
// In genuine union mode (mixed libraries) keep the rel-path-only
|
||||
// lookup; the caller's `(file_path, library_id)` map below picks the
|
||||
// right row.
|
||||
let scope_library = match file_libraries.first() {
|
||||
Some(&first) if file_libraries.iter().all(|&id| id == first) => Some(first),
|
||||
_ => None,
|
||||
};
|
||||
// Batch fetch EXIF data (keyed by rel_path; in union mode a rel_path may
|
||||
// correspond to rows in multiple libraries — pick the date from the one
|
||||
// matching the requesting row's library_id when possible).
|
||||
let exif_rows = exif_dao
|
||||
.get_exif_batch(span_context, scope_library, &file_paths)
|
||||
.get_exif_batch(span_context, &file_paths)
|
||||
.unwrap_or_default();
|
||||
let exif_map: std::collections::HashMap<(String, i32), i64> = exif_rows
|
||||
.into_iter()
|
||||
@@ -317,15 +309,11 @@ pub async fn list_photos<TagD: TagDao, FS: FileSystemAccess>(
|
||||
None
|
||||
};
|
||||
|
||||
// Query EXIF database. When the request named a library, the EXIF
|
||||
// filter must be scoped to it — otherwise camera/date/GPS hits
|
||||
// from other libraries would pollute the result set even though
|
||||
// downstream filesystem walks would never visit those files.
|
||||
// Query EXIF database
|
||||
let mut exif_dao_guard = exif_dao.lock().expect("Unable to get ExifDao");
|
||||
let exif_results = exif_dao_guard
|
||||
.query_by_exif(
|
||||
&span_context,
|
||||
library.map(|l| l.id),
|
||||
req.camera_make.as_deref(),
|
||||
req.camera_model.as_deref(),
|
||||
req.lens_model.as_deref(),
|
||||
@@ -584,10 +572,9 @@ pub async fn list_photos<TagD: TagDao, FS: FileSystemAccess>(
|
||||
} else {
|
||||
Some(trimmed)
|
||||
};
|
||||
let include_duplicates = req.include_duplicates.unwrap_or(false);
|
||||
let rows = {
|
||||
let mut dao = exif_dao.lock().expect("Unable to get ExifDao");
|
||||
dao.list_rel_paths_for_libraries(&span_context, &lib_ids, prefix, include_duplicates)
|
||||
dao.list_rel_paths_for_libraries(&span_context, &lib_ids, prefix)
|
||||
.unwrap_or_else(|e| {
|
||||
warn!("list_rel_paths_for_libraries failed: {:?}", e);
|
||||
Vec::new()
|
||||
@@ -1255,19 +1242,15 @@ pub async fn list_exif_summary(
|
||||
.collect();
|
||||
|
||||
let mut exif_dao_guard = exif_dao.lock().expect("Unable to get ExifDao");
|
||||
match exif_dao_guard.query_by_exif(
|
||||
&cx,
|
||||
library_filter,
|
||||
None,
|
||||
None,
|
||||
None,
|
||||
None,
|
||||
req.date_from,
|
||||
req.date_to,
|
||||
) {
|
||||
match exif_dao_guard.query_by_exif(&cx, None, None, None, None, req.date_from, req.date_to) {
|
||||
Ok(rows) => {
|
||||
let photos: Vec<ExifSummary> = rows
|
||||
.into_iter()
|
||||
// Library filter post-query: keeps the DAO trait (and its
|
||||
// mocks) unchanged. For typical 2–3 library setups the in-
|
||||
// memory pass over a date-bounded result set is negligible;
|
||||
// can be pushed into SQL later if it ever isn't.
|
||||
.filter(|r| library_filter.is_none_or(|id| r.library_id == id))
|
||||
.map(|r| ExifSummary {
|
||||
library_name: library_names.get(&r.library_id).cloned(),
|
||||
file_path: r.file_path,
|
||||
@@ -1476,44 +1459,6 @@ mod tests {
|
||||
|
||||
struct MockExifDao;
|
||||
|
||||
fn mock_exif_row(
|
||||
library_id: i32,
|
||||
rel_path: &str,
|
||||
date_taken: Option<i64>,
|
||||
date_taken_source: Option<String>,
|
||||
) -> crate::database::models::ImageExif {
|
||||
crate::database::models::ImageExif {
|
||||
id: 1,
|
||||
library_id,
|
||||
file_path: rel_path.to_string(),
|
||||
camera_make: None,
|
||||
camera_model: None,
|
||||
lens_model: None,
|
||||
width: None,
|
||||
height: None,
|
||||
orientation: None,
|
||||
gps_latitude: None,
|
||||
gps_longitude: None,
|
||||
gps_altitude: None,
|
||||
focal_length: None,
|
||||
aperture: None,
|
||||
shutter_speed: None,
|
||||
iso: None,
|
||||
date_taken,
|
||||
created_time: 0,
|
||||
last_modified: 0,
|
||||
content_hash: None,
|
||||
size_bytes: None,
|
||||
phash_64: None,
|
||||
dhash_64: None,
|
||||
duplicate_of_hash: None,
|
||||
duplicate_decided_at: None,
|
||||
date_taken_source,
|
||||
original_date_taken: None,
|
||||
original_date_taken_source: None,
|
||||
}
|
||||
}
|
||||
|
||||
impl ExifDao for MockExifDao {
|
||||
fn store_exif(
|
||||
&mut self,
|
||||
@@ -1543,13 +1488,6 @@ mod tests {
|
||||
last_modified: data.last_modified,
|
||||
content_hash: data.content_hash.clone(),
|
||||
size_bytes: data.size_bytes,
|
||||
phash_64: data.phash_64,
|
||||
dhash_64: data.dhash_64,
|
||||
duplicate_of_hash: None,
|
||||
duplicate_decided_at: None,
|
||||
date_taken_source: data.date_taken_source.clone(),
|
||||
original_date_taken: None,
|
||||
original_date_taken_source: None,
|
||||
})
|
||||
}
|
||||
|
||||
@@ -1589,13 +1527,6 @@ mod tests {
|
||||
last_modified: data.last_modified,
|
||||
content_hash: data.content_hash.clone(),
|
||||
size_bytes: data.size_bytes,
|
||||
phash_64: data.phash_64,
|
||||
dhash_64: data.dhash_64,
|
||||
duplicate_of_hash: None,
|
||||
duplicate_decided_at: None,
|
||||
date_taken_source: data.date_taken_source.clone(),
|
||||
original_date_taken: None,
|
||||
original_date_taken_source: None,
|
||||
})
|
||||
}
|
||||
|
||||
@@ -1618,7 +1549,6 @@ mod tests {
|
||||
fn get_exif_batch(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: Option<i32>,
|
||||
_: &[String],
|
||||
) -> Result<Vec<crate::database::models::ImageExif>, DbError> {
|
||||
Ok(Vec::new())
|
||||
@@ -1627,7 +1557,6 @@ mod tests {
|
||||
fn query_by_exif(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: Option<i32>,
|
||||
_: Option<&str>,
|
||||
_: Option<&str>,
|
||||
_: Option<&str>,
|
||||
@@ -1689,64 +1618,6 @@ mod tests {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn get_rows_needing_date_backfill(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: i32,
|
||||
_limit: i64,
|
||||
) -> Result<Vec<(i32, String)>, DbError> {
|
||||
Ok(Vec::new())
|
||||
}
|
||||
|
||||
fn backfill_date_taken(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: i32,
|
||||
_rel_path: &str,
|
||||
_date_taken: i64,
|
||||
_source: &str,
|
||||
) -> Result<(), DbError> {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn set_manual_date_taken(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
library_id: i32,
|
||||
rel_path: &str,
|
||||
date_taken: i64,
|
||||
) -> Result<crate::database::models::ImageExif, DbError> {
|
||||
// Mock — files.rs tests don't exercise the date-override endpoints.
|
||||
// Returning a synthetic row keeps the trait satisfied without
|
||||
// depending on private DbError constructors.
|
||||
Ok(mock_exif_row(
|
||||
library_id,
|
||||
rel_path,
|
||||
Some(date_taken),
|
||||
Some("manual".to_string()),
|
||||
))
|
||||
}
|
||||
|
||||
fn clear_manual_date_taken(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
library_id: i32,
|
||||
rel_path: &str,
|
||||
) -> Result<crate::database::models::ImageExif, DbError> {
|
||||
Ok(mock_exif_row(library_id, rel_path, None, None))
|
||||
}
|
||||
|
||||
fn get_memories_in_window(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: i32,
|
||||
_span_token: &str,
|
||||
_years_back: i32,
|
||||
_tz_offset_minutes: i32,
|
||||
) -> Result<Vec<(String, i64, i64)>, DbError> {
|
||||
Ok(Vec::new())
|
||||
}
|
||||
|
||||
fn find_by_content_hash(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
@@ -1801,7 +1672,6 @@ mod tests {
|
||||
_context: &opentelemetry::Context,
|
||||
_library_ids: &[i32],
|
||||
_path_prefix: Option<&str>,
|
||||
_include_duplicates: bool,
|
||||
) -> Result<Vec<(i32, String)>, DbError> {
|
||||
Ok(vec![])
|
||||
}
|
||||
@@ -1814,109 +1684,6 @@ mod tests {
|
||||
) -> Result<(), DbError> {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn count_for_library(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: i32,
|
||||
) -> Result<i64, DbError> {
|
||||
Ok(0)
|
||||
}
|
||||
|
||||
fn list_rel_paths_for_library_page(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: i32,
|
||||
_limit: i64,
|
||||
_offset: i64,
|
||||
) -> Result<Vec<(i32, String)>, DbError> {
|
||||
Ok(Vec::new())
|
||||
}
|
||||
|
||||
fn get_rows_missing_perceptual_hash(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_limit: i64,
|
||||
) -> Result<Vec<(i32, String)>, DbError> {
|
||||
Ok(Vec::new())
|
||||
}
|
||||
|
||||
fn backfill_perceptual_hash(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: i32,
|
||||
_rel_path: &str,
|
||||
_phash_64: Option<i64>,
|
||||
_dhash_64: Option<i64>,
|
||||
) -> Result<(), DbError> {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn list_duplicates_exact(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: Option<i32>,
|
||||
_include_resolved: bool,
|
||||
) -> Result<Vec<crate::database::DuplicateRow>, DbError> {
|
||||
Ok(Vec::new())
|
||||
}
|
||||
|
||||
fn list_perceptual_candidates(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: Option<i32>,
|
||||
_include_resolved: bool,
|
||||
) -> Result<Vec<crate::database::DuplicateRow>, DbError> {
|
||||
Ok(Vec::new())
|
||||
}
|
||||
|
||||
fn list_image_paths(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: Option<i32>,
|
||||
_include_resolved: bool,
|
||||
) -> Result<Vec<(i32, String)>, DbError> {
|
||||
Ok(Vec::new())
|
||||
}
|
||||
|
||||
fn lookup_duplicate_row(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: i32,
|
||||
_rel_path: &str,
|
||||
) -> Result<Option<crate::database::DuplicateRow>, DbError> {
|
||||
Ok(None)
|
||||
}
|
||||
|
||||
fn set_duplicate_of(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: i32,
|
||||
_rel_path: &str,
|
||||
_survivor_hash: &str,
|
||||
_decided_at: i64,
|
||||
) -> Result<(), DbError> {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn clear_duplicate_of(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_library_id: i32,
|
||||
_rel_path: &str,
|
||||
) -> Result<(), DbError> {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn union_perceptual_tags(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
_survivor_hash: &str,
|
||||
_demoted_hash: &str,
|
||||
_survivor_rel_path: &str,
|
||||
) -> Result<(), DbError> {
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
mod api {
|
||||
|
||||
@@ -1,128 +0,0 @@
|
||||
//! User-favorites endpoints. Favorites are keyed on `(user_id, rel_path)`
|
||||
//! and shared across libraries — a favorite created in lib1 is visible
|
||||
//! under lib2 if the same rel_path resolves there too.
|
||||
|
||||
use std::sync::Mutex;
|
||||
|
||||
use actix_web::{
|
||||
HttpRequest, HttpResponse, Responder, delete, get, put,
|
||||
web::{self, Data},
|
||||
};
|
||||
use log::{error, info, warn};
|
||||
use opentelemetry::trace::{Span, Status, Tracer};
|
||||
|
||||
use crate::data::{AddFavoriteRequest, Claims, PhotosResponse};
|
||||
use crate::database::{DbError, DbErrorKind, FavoriteDao};
|
||||
use crate::otel::{extract_context_from_request, global_tracer};
|
||||
|
||||
#[get("image/favorites")]
|
||||
pub async fn favorites(
|
||||
claims: Claims,
|
||||
request: HttpRequest,
|
||||
favorites_dao: Data<Mutex<Box<dyn FavoriteDao>>>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
let context = extract_context_from_request(&request);
|
||||
let mut span = tracer.start_with_context("get favorites", &context);
|
||||
|
||||
match web::block(move || {
|
||||
favorites_dao
|
||||
.lock()
|
||||
.expect("Unable to get FavoritesDao")
|
||||
.get_favorites(claims.sub.parse::<i32>().unwrap())
|
||||
})
|
||||
.await
|
||||
{
|
||||
Ok(Ok(favorites)) => {
|
||||
let favorites = favorites
|
||||
.into_iter()
|
||||
.map(|favorite| favorite.path)
|
||||
.collect::<Vec<String>>();
|
||||
|
||||
span.set_status(Status::Ok);
|
||||
// Favorites are library-agnostic (shared by rel_path), so we
|
||||
// intentionally leave photo_libraries empty to signal "no badge".
|
||||
HttpResponse::Ok().json(PhotosResponse {
|
||||
photos: favorites,
|
||||
dirs: Vec::new(),
|
||||
photo_libraries: Vec::new(),
|
||||
total_count: None,
|
||||
has_more: None,
|
||||
next_offset: None,
|
||||
})
|
||||
}
|
||||
Ok(Err(e)) => {
|
||||
span.set_status(Status::error(format!("Error getting favorites: {:?}", e)));
|
||||
error!("Error getting favorites: {:?}", e);
|
||||
HttpResponse::InternalServerError().finish()
|
||||
}
|
||||
Err(_) => HttpResponse::InternalServerError().finish(),
|
||||
}
|
||||
}
|
||||
|
||||
#[put("image/favorites")]
|
||||
pub async fn put_add_favorite(
|
||||
claims: Claims,
|
||||
body: web::Json<AddFavoriteRequest>,
|
||||
favorites_dao: Data<Mutex<Box<dyn FavoriteDao>>>,
|
||||
) -> impl Responder {
|
||||
if let Ok(user_id) = claims.sub.parse::<i32>() {
|
||||
let path = body.path.clone();
|
||||
match web::block::<_, Result<usize, DbError>>(move || {
|
||||
favorites_dao
|
||||
.lock()
|
||||
.expect("Unable to get FavoritesDao")
|
||||
.add_favorite(user_id, &path)
|
||||
})
|
||||
.await
|
||||
{
|
||||
Ok(Err(e)) if e.kind == DbErrorKind::AlreadyExists => {
|
||||
warn!("Favorite: {} exists for user: {}", &body.path, user_id);
|
||||
HttpResponse::Ok()
|
||||
}
|
||||
Ok(Err(e)) => {
|
||||
error!("{:?} {}. for user: {}", e, body.path, user_id);
|
||||
HttpResponse::BadRequest()
|
||||
}
|
||||
Ok(Ok(_)) => {
|
||||
info!("Adding favorite \"{}\" for userid: {}", body.path, user_id);
|
||||
HttpResponse::Created()
|
||||
}
|
||||
Err(e) => {
|
||||
error!("Blocking error while inserting favorite: {:?}", e);
|
||||
HttpResponse::InternalServerError()
|
||||
}
|
||||
}
|
||||
} else {
|
||||
error!("Unable to parse sub as i32: {}", claims.sub);
|
||||
HttpResponse::BadRequest()
|
||||
}
|
||||
}
|
||||
|
||||
#[delete("image/favorites")]
|
||||
pub async fn delete_favorite(
|
||||
claims: Claims,
|
||||
body: web::Query<AddFavoriteRequest>,
|
||||
favorites_dao: Data<Mutex<Box<dyn FavoriteDao>>>,
|
||||
) -> impl Responder {
|
||||
if let Ok(user_id) = claims.sub.parse::<i32>() {
|
||||
let path = body.path.clone();
|
||||
web::block(move || {
|
||||
favorites_dao
|
||||
.lock()
|
||||
.expect("Unable to get favorites dao")
|
||||
.remove_favorite(user_id, path);
|
||||
})
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
info!(
|
||||
"Removing favorite \"{}\" for userid: {}",
|
||||
body.path, user_id
|
||||
);
|
||||
HttpResponse::Ok()
|
||||
} else {
|
||||
error!("Unable to parse sub as i32: {}", claims.sub);
|
||||
HttpResponse::BadRequest()
|
||||
}
|
||||
}
|
||||
@@ -1,999 +0,0 @@
|
||||
//! `/image*` endpoints: image serving (with hash/library-scoped/bare
|
||||
//! legacy thumbnail lookup), upload, EXIF metadata read + GPS / date
|
||||
//! mutation, and the full exiftool dump used by Apollo's details modal.
|
||||
|
||||
use std::error::Error;
|
||||
use std::fs::File;
|
||||
use std::io::ErrorKind;
|
||||
use std::io::prelude::*;
|
||||
use std::path::{Path, PathBuf};
|
||||
use std::sync::Mutex;
|
||||
|
||||
use actix_files::NamedFile;
|
||||
use actix_multipart as mp;
|
||||
use actix_web::{
|
||||
HttpRequest, HttpResponse, Responder, get, post,
|
||||
web::{self, BufMut, BytesMut, Data},
|
||||
};
|
||||
use chrono::Utc;
|
||||
use futures::stream::StreamExt;
|
||||
use log::{debug, error, info, trace, warn};
|
||||
use opentelemetry::KeyValue;
|
||||
use opentelemetry::trace::{Span, Status, TraceContextExt, Tracer};
|
||||
use urlencoding::decode;
|
||||
|
||||
use crate::content_hash;
|
||||
use crate::data::{
|
||||
Claims, MetadataResponse, PhotoSize, ThumbnailFormat, ThumbnailRequest, ThumbnailShape,
|
||||
};
|
||||
use crate::database::models::{ImageExif, InsertImageExif};
|
||||
use crate::database::{DbErrorKind, ExifDao};
|
||||
use crate::date_resolver;
|
||||
use crate::exif;
|
||||
use crate::file_types;
|
||||
use crate::files::{RefreshThumbnailsMessage, is_image_or_video, is_valid_full_path};
|
||||
use crate::libraries;
|
||||
use crate::memories;
|
||||
use crate::otel::{extract_context_from_request, global_tracer};
|
||||
use crate::perceptual_hash;
|
||||
use crate::state::AppState;
|
||||
|
||||
#[get("/image")]
|
||||
pub async fn get_image(
|
||||
_claims: Claims,
|
||||
request: HttpRequest,
|
||||
req: web::Query<ThumbnailRequest>,
|
||||
app_state: Data<AppState>,
|
||||
exif_dao: Data<Mutex<Box<dyn ExifDao>>>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
let context = extract_context_from_request(&request);
|
||||
|
||||
let mut span = tracer.start_with_context("get_image", &context);
|
||||
|
||||
// Resolve library from query param; default to primary so clients that
|
||||
// don't yet send `library=` continue to work.
|
||||
let library = match libraries::resolve_library_param(&app_state, req.library.as_deref()) {
|
||||
Ok(Some(lib)) => lib,
|
||||
Ok(None) => app_state.primary_library(),
|
||||
Err(msg) => {
|
||||
span.set_status(Status::error(msg.clone()));
|
||||
return HttpResponse::BadRequest().body(msg);
|
||||
}
|
||||
};
|
||||
|
||||
// Union-mode search returns flat rel_paths with no library attribution,
|
||||
// so clients may request a file under the wrong library. Try the
|
||||
// resolved library first; if the file isn't there, fall back to any
|
||||
// other library holding that rel_path on disk.
|
||||
let resolved = is_valid_full_path(&library.root_path, &req.path, false)
|
||||
.filter(|p| p.exists())
|
||||
.map(|p| (library, p))
|
||||
.or_else(|| {
|
||||
app_state.libraries.iter().find_map(|lib| {
|
||||
if lib.id == library.id {
|
||||
return None;
|
||||
}
|
||||
is_valid_full_path(&lib.root_path, &req.path, false)
|
||||
.filter(|p| p.exists())
|
||||
.map(|p| (lib, p))
|
||||
})
|
||||
});
|
||||
|
||||
if let Some((library, path)) = resolved {
|
||||
let image_size = req.size.unwrap_or(PhotoSize::Full);
|
||||
if image_size == PhotoSize::Thumb {
|
||||
let relative_path = path
|
||||
.strip_prefix(&library.root_path)
|
||||
.expect("Error stripping library root prefix from thumbnail");
|
||||
let relative_path_str = relative_path.to_string_lossy().replace('\\', "/");
|
||||
|
||||
let thumbs = &app_state.thumbnail_path;
|
||||
let bare_legacy_thumb_path = Path::new(&thumbs).join(relative_path);
|
||||
let scoped_legacy_thumb_path = content_hash::library_scoped_legacy_path(
|
||||
Path::new(&thumbs),
|
||||
library.id,
|
||||
relative_path,
|
||||
);
|
||||
|
||||
// Gif thumbnails are a separate lookup (video GIF previews).
|
||||
// Dual-lookup for gif is out of scope; preserve existing flow.
|
||||
if req.format == Some(ThumbnailFormat::Gif) && file_types::is_video_file(&path) {
|
||||
let mut gif_path = Path::new(&app_state.gif_path).join(relative_path);
|
||||
gif_path.set_extension("gif");
|
||||
trace!("Gif thumbnail path: {:?}", gif_path);
|
||||
if let Ok(file) = NamedFile::open(&gif_path) {
|
||||
span.set_status(Status::Ok);
|
||||
return file
|
||||
.use_etag(true)
|
||||
.use_last_modified(true)
|
||||
.prefer_utf8(true)
|
||||
.into_response(&request);
|
||||
}
|
||||
}
|
||||
|
||||
// Lookup chain (most-specific first, falling back as we miss):
|
||||
// 1. hash-keyed (`<thumbs>/<hash[..2]>/<hash>.jpg`) — content
|
||||
// identity, shared across libraries;
|
||||
// 2. library-scoped legacy (`<thumbs>/<lib_id>/<rel_path>`) —
|
||||
// written by current generation when hash isn't known;
|
||||
// 3. bare legacy (`<thumbs>/<rel_path>`) — pre-multi-library
|
||||
// thumbs from the days before library prefixing existed.
|
||||
// Stage (3) goes away once a one-time migration lifts every
|
||||
// bare-legacy file under a library prefix; until then it
|
||||
// prevents needless 404s for already-warmed deployments.
|
||||
let hash_thumb_path: Option<PathBuf> = {
|
||||
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
|
||||
match dao.get_exif(&context, &relative_path_str) {
|
||||
Ok(Some(row)) => row
|
||||
.content_hash
|
||||
.as_deref()
|
||||
.map(|h| content_hash::thumbnail_path(Path::new(thumbs), h)),
|
||||
_ => None,
|
||||
}
|
||||
};
|
||||
let thumb_path = hash_thumb_path
|
||||
.as_ref()
|
||||
.filter(|p| p.exists())
|
||||
.cloned()
|
||||
.or_else(|| {
|
||||
if scoped_legacy_thumb_path.exists() {
|
||||
Some(scoped_legacy_thumb_path.clone())
|
||||
} else {
|
||||
None
|
||||
}
|
||||
})
|
||||
.unwrap_or_else(|| bare_legacy_thumb_path.clone());
|
||||
|
||||
// Handle circular thumbnail request
|
||||
if req.shape == Some(ThumbnailShape::Circle) {
|
||||
match create_circular_thumbnail(&thumb_path, thumbs).await {
|
||||
Ok(circular_path) => {
|
||||
if let Ok(file) = NamedFile::open(&circular_path) {
|
||||
span.set_status(Status::Ok);
|
||||
return file
|
||||
.use_etag(true)
|
||||
.use_last_modified(true)
|
||||
.prefer_utf8(true)
|
||||
.into_response(&request);
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
warn!("Failed to create circular thumbnail: {:?}", e);
|
||||
// Fall through to serve square thumbnail
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
trace!("Thumbnail path: {:?}", thumb_path);
|
||||
if let Ok(file) = NamedFile::open(&thumb_path) {
|
||||
span.set_status(Status::Ok);
|
||||
return file
|
||||
.use_etag(true)
|
||||
.use_last_modified(true)
|
||||
.prefer_utf8(true)
|
||||
.into_response(&request);
|
||||
}
|
||||
}
|
||||
|
||||
// Full-size requests for RAW formats (NEF/CR2/ARW/etc.) can't just
|
||||
// NamedFile-stream the original bytes — browsers won't decode the
|
||||
// RAW container, so a `<img src=...>` lands as a broken image. Serve
|
||||
// the embedded JPEG preview instead (typically the camera's in-body
|
||||
// review JPEG, ~1–2 MP). Falls through to NamedFile if no preview is
|
||||
// available, which preserves the historical behavior for callers
|
||||
// that genuinely want the original bytes.
|
||||
if image_size == PhotoSize::Full && exif::is_tiff_raw(&path) {
|
||||
if let Some(preview) = exif::extract_embedded_jpeg_preview(&path) {
|
||||
span.set_status(Status::Ok);
|
||||
return HttpResponse::Ok()
|
||||
.content_type("image/jpeg")
|
||||
.insert_header(("Cache-Control", "public, max-age=3600"))
|
||||
.body(preview);
|
||||
}
|
||||
}
|
||||
|
||||
if let Ok(file) = NamedFile::open(&path) {
|
||||
span.set_status(Status::Ok);
|
||||
// Enable ETag and Last-Modified validators for full images (no explicit Cache-Control max-age is set on this path)
|
||||
return file
|
||||
.use_etag(true)
|
||||
.use_last_modified(true)
|
||||
.prefer_utf8(true)
|
||||
.into_response(&request);
|
||||
}
|
||||
|
||||
span.set_status(Status::error("Not found"));
|
||||
HttpResponse::NotFound().finish()
|
||||
} else {
|
||||
span.set_status(Status::error("Not found"));
|
||||
error!("Path does not exist in any library: {}", req.path);
|
||||
HttpResponse::NotFound().finish()
|
||||
}
|
||||
}
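The three-stage thumbnail lookup described in the comments of `get_image` above can be sketched in isolation. This is an illustration under stated assumptions, not the repository's implementation: `thumbnail_candidates` and `resolve_thumbnail` are hypothetical helpers, the on-disk layouts are taken from the handler's comment rather than from `content_hash::thumbnail_path` or `content_hash::library_scoped_legacy_path`, and the existence check is injected so the logic can be exercised without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds the candidate locations in lookup order.
/// Layouts are assumed from the handler comment:
///   1. `<thumbs>/<hash[..2]>/<hash>.jpg` (only when a hash is known)
///   2. `<thumbs>/<lib_id>/<rel_path>`
///   3. `<thumbs>/<rel_path>`
fn thumbnail_candidates(
    thumbs: &Path,
    lib_id: i32,
    rel_path: &str,
    content_hash: Option<&str>,
) -> Vec<PathBuf> {
    let mut candidates = Vec::new();
    if let Some(hash) = content_hash {
        // `get(..2)` avoids a panic on hashes shorter than two bytes.
        if let Some(prefix) = hash.get(..2) {
            candidates.push(thumbs.join(prefix).join(format!("{hash}.jpg")));
        }
    }
    candidates.push(thumbs.join(lib_id.to_string()).join(rel_path));
    candidates.push(thumbs.join(rel_path));
    candidates
}

/// Picks the first candidate that exists. As in the handler, the last entry
/// (bare legacy) is the unconditional fallback even when it is missing, so
/// the caller's attempt to open it decides whether the request is a miss.
fn resolve_thumbnail(candidates: &[PathBuf], exists: impl Fn(&Path) -> bool) -> PathBuf {
    let (fallback, preferred) = candidates
        .split_last()
        .expect("candidate list always contains the bare-legacy path");
    preferred
        .iter()
        .find(|p| exists(p.as_path()))
        .unwrap_or(fallback)
        .clone()
}
```

In real use the `exists` argument would simply be `|p| p.exists()`; passing it in keeps the ordering logic testable.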
|
||||
|
||||
async fn create_circular_thumbnail(
|
||||
thumb_path: &Path,
|
||||
thumbs_dir: &str,
|
||||
) -> Result<PathBuf, Box<dyn Error>> {
|
||||
use image::{GenericImageView, ImageBuffer, Rgba};
|
||||
|
||||
// Create circular thumbnails directory
|
||||
let circular_dir = Path::new(thumbs_dir).join("_circular");
|
||||
|
||||
// Get relative path from thumbs_dir to create same structure
|
||||
let relative_to_thumbs = thumb_path.strip_prefix(thumbs_dir)?;
|
||||
let circular_path = circular_dir.join(relative_to_thumbs).with_extension("png");
|
||||
|
||||
// Check if circular thumbnail already exists
|
||||
if circular_path.exists() {
|
||||
return Ok(circular_path);
|
||||
}
|
||||
|
||||
// Create parent directory if needed
|
||||
if let Some(parent) = circular_path.parent() {
|
||||
std::fs::create_dir_all(parent)?;
|
||||
}
|
||||
|
||||
// Load the square thumbnail
|
||||
let img = image::open(thumb_path)?;
|
||||
let (width, height) = img.dimensions();
|
||||
|
||||
// Fixed output size for consistency
|
||||
let output_size = 80u32;
|
||||
let radius = output_size as f32 / 2.0;
|
||||
|
||||
// Calculate crop area to get square center of original image
|
||||
let crop_size = width.min(height);
|
||||
let crop_x = (width - crop_size) / 2;
|
||||
let crop_y = (height - crop_size) / 2;
|
||||
|
||||
// Create a new RGBA image with transparency
|
||||
let output = ImageBuffer::from_fn(output_size, output_size, |x, y| {
|
||||
let dx = x as f32 - radius;
|
||||
let dy = y as f32 - radius;
|
||||
let distance = (dx * dx + dy * dy).sqrt();
|
||||
|
||||
if distance <= radius {
|
||||
// Inside circle - map to cropped source area
|
||||
// Scale from output coordinates to crop coordinates
|
||||
let scale = crop_size as f32 / output_size as f32;
|
||||
let src_x = crop_x + (x as f32 * scale) as u32;
|
||||
let src_y = crop_y + (y as f32 * scale) as u32;
|
||||
let pixel = img.get_pixel(src_x, src_y);
|
||||
Rgba([pixel[0], pixel[1], pixel[2], 255])
|
||||
} else {
|
||||
// Outside circle - transparent
|
||||
Rgba([0, 0, 0, 0])
|
||||
}
|
||||
});
|
||||
|
||||
// Save as PNG (supports transparency)
|
||||
output.save(&circular_path)?;
|
||||
|
||||
Ok(circular_path)
|
||||
}
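The per-pixel arithmetic inside `create_circular_thumbnail` above (centre crop, scale, circular mask) can be isolated into a pure function. A minimal sketch, assuming the same formulae as the closure passed to `ImageBuffer::from_fn`; `circular_source_pixel` is a hypothetical name and the function deliberately avoids the `image` crate so it stays self-contained.

```rust
/// Hypothetical helper mirroring the closure in `create_circular_thumbnail`.
/// For an output pixel `(x, y)` of an `output_size` square, returns the source
/// pixel to sample from a `width` x `height` image, or `None` when the output
/// pixel lies outside the circle and should be fully transparent.
fn circular_source_pixel(
    x: u32,
    y: u32,
    output_size: u32,
    width: u32,
    height: u32,
) -> Option<(u32, u32)> {
    let radius = output_size as f32 / 2.0;
    let dx = x as f32 - radius;
    let dy = y as f32 - radius;
    if (dx * dx + dy * dy).sqrt() > radius {
        return None; // outside the circle
    }

    // Centre-crop the source to a square, then scale output coordinates
    // into that square.
    let crop_size = width.min(height);
    let crop_x = (width - crop_size) / 2;
    let crop_y = (height - crop_size) / 2;
    let scale = crop_size as f32 / output_size as f32;

    Some((
        crop_x + (x as f32 * scale) as u32,
        crop_y + (y as f32 * scale) as u32,
    ))
}
```

For a 200x100 source and the fixed 80-pixel output, the crop is the central 100x100 square, so output pixel (40, 40) samples source pixel (100, 50) and the corner (0, 0) is transparent.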
|
||||
|
||||
#[get("/image/metadata")]
|
||||
pub async fn get_file_metadata(
|
||||
_: Claims,
|
||||
request: HttpRequest,
|
||||
path: web::Query<ThumbnailRequest>,
|
||||
app_state: Data<AppState>,
|
||||
exif_dao: Data<Mutex<Box<dyn ExifDao>>>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
let context = extract_context_from_request(&request);
|
||||
let mut span = tracer.start_with_context("get_file_metadata", &context);
|
||||
let span_context =
|
||||
opentelemetry::Context::new().with_remote_span_context(span.span_context().clone());
|
||||
|
||||
let library = libraries::resolve_library_param(&app_state, path.library.as_deref())
|
||||
.ok()
|
||||
.flatten()
|
||||
.unwrap_or_else(|| app_state.primary_library());
|
||||
|
||||
// Fall back to other libraries if the file isn't under the resolved one,
|
||||
// matching the `/image` handler so union-mode search results resolve.
|
||||
let resolved = is_valid_full_path(&library.root_path, &path.path, false)
|
||||
.filter(|p| p.exists())
|
||||
.map(|p| (library, p))
|
||||
.or_else(|| {
|
||||
app_state.libraries.iter().find_map(|lib| {
|
||||
if lib.id == library.id {
|
||||
return None;
|
||||
}
|
||||
is_valid_full_path(&lib.root_path, &path.path, false)
|
||||
.filter(|p| p.exists())
|
||||
.map(|p| (lib, p))
|
||||
})
|
||||
});
|
||||
|
||||
match resolved
|
||||
.ok_or_else(|| ErrorKind::InvalidData.into())
|
||||
.and_then(|(lib, full_path)| {
|
||||
File::open(&full_path)
|
||||
.and_then(|file| file.metadata())
|
||||
.map(|metadata| (lib, metadata))
|
||||
}) {
|
||||
Ok((resolved_library, metadata)) => {
|
||||
let mut response: MetadataResponse = metadata.into();
|
||||
response.library_id = Some(resolved_library.id);
|
||||
response.library_name = Some(resolved_library.name.clone());
|
||||
|
||||
// Extract date from filename if possible
|
||||
response.filename_date =
|
||||
memories::extract_date_from_filename(&path.path).map(|dt| dt.timestamp());
|
||||
|
||||
// Query EXIF data if available
|
||||
if let Ok(mut dao) = exif_dao.lock()
|
||||
&& let Ok(Some(exif)) = dao.get_exif(&span_context, &path.path)
|
||||
{
|
||||
response.exif = Some(exif.into());
|
||||
}
|
||||
|
||||
span.add_event(
|
||||
"Metadata fetched",
|
||||
vec![KeyValue::new("file", path.path.clone())],
|
||||
);
|
||||
span.set_status(Status::Ok);
|
||||
|
||||
HttpResponse::Ok().json(response)
|
||||
}
|
||||
Err(e) => {
|
||||
let message = format!("Error getting metadata for file '{}': {:?}", path.path, e);
|
||||
error!("{}", message);
|
||||
span.set_status(Status::error(message));
|
||||
|
||||
HttpResponse::InternalServerError().finish()
|
||||
}
|
||||
}
|
||||
}
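The "resolved library first, then any sibling" fallback that `get_image` and `get_file_metadata` both spell out inline could be factored into one helper. The following is a sketch under assumptions, not the repository's code: `Library` here is a hypothetical two-field stand-in, `resolve_with_fallback` is an invented name, and the plain `join` replaces `is_valid_full_path`, so the path-traversal validation the real handlers rely on is NOT reproduced.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the crate's library type.
#[derive(Debug, PartialEq)]
struct Library {
    id: i32,
    root_path: String,
}

/// Tries `preferred` first; if the file is not there, falls back to the first
/// other library that holds `rel_path`. The existence check is injected so the
/// ordering can be tested without a filesystem.
/// NOTE: a real version must validate `rel_path` (as `is_valid_full_path`
/// does) before joining; this sketch skips that step.
fn resolve_with_fallback<'a>(
    libraries: &'a [Library],
    preferred: &'a Library,
    rel_path: &str,
    exists: impl Fn(&Path) -> bool,
) -> Option<(&'a Library, PathBuf)> {
    let candidate = |lib: &'a Library| {
        let full = Path::new(&lib.root_path).join(rel_path);
        exists(&full).then_some((lib, full))
    };

    candidate(preferred).or_else(|| {
        libraries
            .iter()
            .filter(|lib| lib.id != preferred.id)
            .find_map(&candidate)
    })
}
```

A shared helper along these lines would keep the four handlers that repeat this chain from drifting apart.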
|
||||
|
||||
/// Body for `POST /image/exif/gps` — write GPS coordinates into a file's
|
||||
/// EXIF in place. Only `path` + `latitude` + `longitude` are required.
|
||||
/// `library` is optional (falls back to the primary library) and matches
|
||||
/// the convention of the other path-keyed routes.
|
||||
#[derive(serde::Deserialize)]
|
||||
struct SetGpsRequest {
|
||||
path: String,
|
||||
library: Option<String>,
|
||||
latitude: f64,
|
||||
longitude: f64,
|
||||
}
|
||||
|
||||
#[post("/image/exif/gps")]
|
||||
pub async fn set_image_gps(
|
||||
_: Claims,
|
||||
request: HttpRequest,
|
||||
body: web::Json<SetGpsRequest>,
|
||||
app_state: Data<AppState>,
|
||||
exif_dao: Data<Mutex<Box<dyn ExifDao>>>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
let context = extract_context_from_request(&request);
|
||||
let mut span = tracer.start_with_context("set_image_gps", &context);
|
||||
let span_context =
|
||||
opentelemetry::Context::new().with_remote_span_context(span.span_context().clone());
|
||||
|
||||
let library = libraries::resolve_library_param(&app_state, body.library.as_deref())
|
||||
.ok()
|
||||
.flatten()
|
||||
.unwrap_or_else(|| app_state.primary_library());
|
||||
|
||||
// Same fallback as get_file_metadata: union-mode means a file may
|
||||
// resolve under a sibling library.
|
||||
let resolved = is_valid_full_path(&library.root_path, &body.path, false)
|
||||
.filter(|p| p.exists())
|
||||
.map(|p| (library, p))
|
||||
.or_else(|| {
|
||||
app_state.libraries.iter().find_map(|lib| {
|
||||
if lib.id == library.id {
|
||||
return None;
|
||||
}
|
||||
is_valid_full_path(&lib.root_path, &body.path, false)
|
||||
.filter(|p| p.exists())
|
||||
.map(|p| (lib, p))
|
||||
})
|
||||
});
|
||||
|
||||
let (resolved_library, full_path) = match resolved {
|
||||
Some(v) => v,
|
||||
None => {
|
||||
span.set_status(Status::error("file not found"));
|
||||
return HttpResponse::NotFound().body("File not found");
|
||||
}
|
||||
};
|
||||
|
||||
if !exif::supports_exif(&full_path) {
|
||||
return HttpResponse::BadRequest().body("File format does not support EXIF GPS write");
|
||||
}
|
||||
|
||||
if let Err(e) = exif::write_gps(&full_path, body.latitude, body.longitude) {
|
||||
let msg = format!("exiftool write failed: {}", e);
|
||||
error!("{}", msg);
|
||||
span.set_status(Status::error(msg.clone()));
|
||||
return HttpResponse::InternalServerError().body(msg);
|
||||
}
|
||||
|
||||
// Re-read EXIF from disk (the write path doesn't tell us the rest of
|
||||
// the parsed fields back, and we want the DB row to match what
|
||||
// extract_exif_from_path would now produce). Update the existing row
|
||||
// rather than insert — this endpoint is invoked on already-indexed
|
||||
// files only.
|
||||
let extracted = match exif::extract_exif_from_path(&full_path) {
|
||||
Ok(d) => d,
|
||||
Err(e) => {
|
||||
// GPS was written successfully but re-extraction failed; surface
|
||||
// a 500 because the DB will now disagree with disk until the
|
||||
// next file scan rewrites it.
|
||||
let msg = format!("EXIF re-read failed after write: {}", e);
|
||||
error!("{}", msg);
|
||||
return HttpResponse::InternalServerError().body(msg);
|
||||
}
|
||||
};
|
||||
let now = Utc::now().timestamp();
|
||||
let normalized_path = body.path.replace('\\', "/");
|
||||
// Re-run the canonical-date waterfall on every GPS write — exiftool
|
||||
// writing GPS doesn't change the capture date, but if the row was
|
||||
// previously sourced from `fs_time` the re-read may have given us a
|
||||
// real EXIF date this time, and we want to upgrade the source.
|
||||
let resolved_date = date_resolver::resolve_date_taken(&full_path, extracted.date_taken);
|
||||
let insert_exif = InsertImageExif {
|
||||
library_id: resolved_library.id,
|
||||
file_path: normalized_path.clone(),
|
||||
camera_make: extracted.camera_make,
|
||||
camera_model: extracted.camera_model,
|
||||
lens_model: extracted.lens_model,
|
||||
width: extracted.width,
|
||||
height: extracted.height,
|
||||
orientation: extracted.orientation,
|
||||
gps_latitude: extracted.gps_latitude.map(|v| v as f32),
|
||||
gps_longitude: extracted.gps_longitude.map(|v| v as f32),
|
||||
gps_altitude: extracted.gps_altitude.map(|v| v as f32),
|
||||
focal_length: extracted.focal_length.map(|v| v as f32),
|
||||
aperture: extracted.aperture.map(|v| v as f32),
|
||||
shutter_speed: extracted.shutter_speed,
|
||||
iso: extracted.iso,
|
||||
date_taken: resolved_date.map(|r| r.timestamp),
|
||||
// created_time is preserved by update_exif (it doesn't touch the
|
||||
// column); pass any int — it's ignored in the UPDATE statement.
|
||||
created_time: now,
|
||||
last_modified: now,
|
||||
// Hash + size aren't touched in update_exif either, but the file
|
||||
// bytes did change — best-effort recompute so the new hash lands
|
||||
// on the next call to get_exif. Failure here just leaves the old
|
||||
// values in place.
|
||||
content_hash: content_hash::compute(&full_path)
|
||||
.ok()
|
||||
.map(|c| c.content_hash),
|
||||
size_bytes: content_hash::compute(&full_path).ok().map(|c| c.size_bytes),
|
||||
// GPS-update path doesn't touch perceptual hashes either; columns
|
||||
// ignored by update_exif. Compute best-effort so a new file lands
|
||||
// with a usable signal; failure just leaves prior values in place.
|
||||
phash_64: perceptual_hash::compute(&full_path).map(|h| h.phash_64),
|
||||
dhash_64: perceptual_hash::compute(&full_path).map(|h| h.dhash_64),
|
||||
date_taken_source: resolved_date.map(|r| r.source.as_str().to_string()),
|
||||
};
|
||||
|
||||
let updated = {
|
||||
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
|
||||
// If the row doesn't exist yet (file isn't indexed for some reason),
|
||||
// insert instead so the GPS write is at least visible the moment
|
||||
// the watcher catches up.
|
||||
match dao.get_exif(&span_context, &normalized_path) {
|
||||
Ok(Some(_)) => dao.update_exif(&span_context, insert_exif),
|
||||
Ok(None) => dao.store_exif(&span_context, insert_exif),
|
||||
Err(_) => dao.update_exif(&span_context, insert_exif),
|
||||
}
|
||||
};
|
||||
|
||||
match updated {
|
||||
Ok(row) => {
|
||||
// Mirror the file metadata so the client gets the new size /
|
||||
// mtime in the same response and can refresh its cached
|
||||
// metadata block in one round-trip.
|
||||
let fs_meta = std::fs::metadata(&full_path).ok();
|
||||
let mut response: MetadataResponse = match fs_meta {
|
||||
Some(m) => m.into(),
|
||||
None => MetadataResponse {
|
||||
created: None,
|
||||
modified: None,
|
||||
size: 0,
|
||||
exif: None,
|
||||
filename_date: None,
|
||||
library_id: None,
|
||||
library_name: None,
|
||||
},
|
||||
};
|
||||
response.exif = Some(row.into());
|
||||
response.library_id = Some(resolved_library.id);
|
||||
response.library_name = Some(resolved_library.name.clone());
|
||||
response.filename_date =
|
||||
memories::extract_date_from_filename(&body.path).map(|dt| dt.timestamp());
|
||||
span.set_status(Status::Ok);
|
||||
HttpResponse::Ok().json(response)
|
||||
}
|
||||
Err(e) => {
|
||||
let msg = format!("EXIF DB update failed: {:?}", e);
|
||||
error!("{}", msg);
|
||||
span.set_status(Status::error(msg.clone()));
|
||||
HttpResponse::InternalServerError().body(msg)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// `GET /image/exif/full?path=&library=` — full per-file EXIF dump via
|
||||
/// exiftool, for the DETAILS modal's "FULL EXIF" pane. Strictly richer
|
||||
/// than `/image/metadata`'s curated subset (every group exiftool can
|
||||
/// see: EXIF, File, MakerNotes, Composite, ICC_Profile, IPTC, …).
|
||||
///
|
||||
/// On-demand only — the watcher / indexer never calls this. Falls back
|
||||
/// to 503 when exiftool isn't installed (deployer guidance is the same
|
||||
/// as for the RAW preview pipeline: install exiftool for full coverage).
|
||||
#[get("/image/exif/full")]
|
||||
pub async fn get_full_exif(
|
||||
_: Claims,
|
||||
request: HttpRequest,
|
||||
path: web::Query<ThumbnailRequest>,
|
||||
app_state: Data<AppState>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
let context = extract_context_from_request(&request);
|
||||
let mut span = tracer.start_with_context("get_full_exif", &context);
|
||||
|
||||
let library = libraries::resolve_library_param(&app_state, path.library.as_deref())
|
||||
.ok()
|
||||
.flatten()
|
||||
.unwrap_or_else(|| app_state.primary_library());
|
||||
|
||||
// Same union-mode fallback as get_file_metadata — the file may live
|
||||
// under a sibling library when the requested one's path resolves but
|
||||
// doesn't actually contain the bytes.
|
||||
let resolved = is_valid_full_path(&library.root_path, &path.path, false)
|
||||
.filter(|p| p.exists())
|
||||
.map(|p| (library, p))
|
||||
.or_else(|| {
|
||||
app_state.libraries.iter().find_map(|lib| {
|
||||
if lib.id == library.id {
|
||||
return None;
|
||||
}
|
||||
is_valid_full_path(&lib.root_path, &path.path, false)
|
||||
.filter(|p| p.exists())
|
||||
.map(|p| (lib, p))
|
||||
})
|
||||
});
|
||||
|
||||
let (resolved_library, full_path) = match resolved {
|
||||
Some(v) => v,
|
||||
None => {
|
||||
span.set_status(Status::error("file not found"));
|
||||
return HttpResponse::NotFound().body("File not found");
|
||||
}
|
||||
};
|
||||
|
||||
// exiftool spawn is blocking — keep it off the actix worker by
|
||||
// running on the blocking pool. ~50–200 ms typical for a JPEG;
|
||||
// longer for RAW with rich MakerNotes.
|
||||
let exif_result =
|
||||
web::block(move || crate::exif::read_full_exif_via_exiftool(&full_path)).await;
|
||||
|
||||
match exif_result {
|
||||
Ok(Ok(Some(tags))) => {
|
||||
span.set_status(Status::Ok);
|
||||
HttpResponse::Ok().json(serde_json::json!({
|
||||
"library_id": resolved_library.id,
|
||||
"library_name": resolved_library.name,
|
||||
"tags": tags,
|
||||
}))
|
||||
}
|
||||
Ok(Ok(None)) => {
|
||||
// exiftool ran but produced no output for this file — treat as
|
||||
// empty rather than an error so the modal renders "no tags"
|
||||
// gracefully.
|
||||
HttpResponse::Ok().json(serde_json::json!({
|
||||
"library_id": resolved_library.id,
|
||||
"library_name": resolved_library.name,
|
||||
"tags": serde_json::Value::Object(Default::default()),
|
||||
}))
|
||||
}
|
||||
Ok(Err(e)) => {
|
||||
let msg = format!("exiftool failed: {}", e);
|
||||
error!("{}", msg);
|
||||
span.set_status(Status::error(msg.clone()));
|
||||
// 503 — typically "exiftool isn't on PATH" or a transient spawn
|
||||
// failure. Apollo surfaces a hint in the modal.
|
||||
HttpResponse::ServiceUnavailable().body(msg)
|
||||
}
|
||||
Err(e) => {
|
||||
let msg = format!("blocking-pool error: {}", e);
|
||||
error!("{}", msg);
|
||||
span.set_status(Status::error(msg.clone()));
|
||||
HttpResponse::InternalServerError().body(msg)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Body for `POST /image/exif/date` — operator-driven date_taken override.
|
||||
/// `date_taken` is unix seconds (matches `image_exif.date_taken`'s convention
|
||||
/// — naive local reinterpreted as UTC, not real UTC; the Apollo client passes
|
||||
/// through the same value the photo carousel rendered before edit).
|
||||
#[derive(serde::Deserialize)]
|
||||
struct SetDateRequest {
|
||||
path: String,
|
||||
library: Option<String>,
|
||||
date_taken: i64,
|
||||
}
|
||||
|
||||
/// Body for `POST /image/exif/date/clear` — revert a manual override and
|
||||
/// restore the resolver-derived `(date_taken, date_taken_source)` pair from
|
||||
/// the snapshot.
|
||||
#[derive(serde::Deserialize)]
|
||||
struct ClearDateRequest {
|
||||
path: String,
|
||||
library: Option<String>,
|
||||
}
|
||||
|
||||
/// Build a `MetadataResponse` for the date endpoints. Mirrors
|
||||
/// `get_file_metadata`'s shape so the client gets a single source of truth
|
||||
/// after every mutation. Filesystem metadata is best-effort: if the file is
|
||||
/// on a stale mount or moved, the DB-side override still succeeds and the
|
||||
/// response carries `created=None, modified=None, size=0`. The DB row's
|
||||
/// updated EXIF is what matters here.
|
||||
fn build_metadata_response_for_date_mutation(
|
||||
library: &libraries::Library,
|
||||
rel_path: &str,
|
||||
exif: ImageExif,
|
||||
) -> MetadataResponse {
|
||||
let full_path = is_valid_full_path(&library.root_path, &rel_path.to_string(), false);
|
||||
let fs_meta = full_path
|
||||
.as_ref()
|
||||
.filter(|p| p.exists())
|
||||
.and_then(|p| std::fs::metadata(p).ok());
|
||||
let mut response: MetadataResponse = match fs_meta {
|
||||
Some(m) => m.into(),
|
||||
None => MetadataResponse {
|
||||
created: None,
|
||||
modified: None,
|
||||
size: 0,
|
||||
exif: None,
|
||||
filename_date: None,
|
||||
library_id: None,
|
||||
library_name: None,
|
||||
},
|
||||
};
|
||||
response.exif = Some(exif.into());
|
||||
response.library_id = Some(library.id);
|
||||
response.library_name = Some(library.name.clone());
|
||||
response.filename_date =
|
||||
memories::extract_date_from_filename(rel_path).map(|dt| dt.timestamp());
|
||||
response
|
||||
}
|
||||
|
||||
#[post("/image/exif/date")]
|
||||
pub async fn set_image_date(
|
||||
_: Claims,
|
||||
request: HttpRequest,
|
||||
body: web::Json<SetDateRequest>,
|
||||
app_state: Data<AppState>,
|
||||
exif_dao: Data<Mutex<Box<dyn ExifDao>>>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
let context = extract_context_from_request(&request);
|
||||
let mut span = tracer.start_with_context("set_image_date", &context);
|
||||
let span_context =
|
||||
opentelemetry::Context::new().with_remote_span_context(span.span_context().clone());
|
||||
|
||||
let library = match libraries::resolve_library_param(&app_state, body.library.as_deref()) {
|
||||
Ok(Some(lib)) => lib,
|
||||
Ok(None) => app_state.primary_library(),
|
||||
Err(msg) => {
|
||||
span.set_status(Status::error(msg.clone()));
|
||||
return HttpResponse::BadRequest().body(msg);
|
||||
}
|
||||
};
|
||||
|
||||
// Path normalization matches set_image_gps so a Windows-import client
|
||||
// doesn't end up with a backslash variant that misses the row.
|
||||
let normalized_path = body.path.replace('\\', "/");
|
||||
|
||||
let updated = {
|
||||
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
|
||||
dao.set_manual_date_taken(&span_context, library.id, &normalized_path, body.date_taken)
|
||||
};
|
||||
|
||||
match updated {
|
||||
Ok(row) => {
|
||||
span.set_status(Status::Ok);
|
||||
HttpResponse::Ok().json(build_metadata_response_for_date_mutation(
|
||||
&library,
|
||||
&normalized_path,
|
||||
row,
|
||||
))
|
||||
}
|
||||
Err(e) => {
|
||||
let msg = format!("set_manual_date_taken failed: {:?}", e);
|
||||
error!("{}", msg);
|
||||
span.set_status(Status::error(msg.clone()));
|
||||
match e.kind {
|
||||
DbErrorKind::NotFound => HttpResponse::NotFound().body(msg),
|
||||
_ => HttpResponse::InternalServerError().body(msg),
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[post("/image/exif/date/clear")]
|
||||
pub async fn clear_image_date(
|
||||
_: Claims,
|
||||
request: HttpRequest,
|
||||
body: web::Json<ClearDateRequest>,
|
||||
app_state: Data<AppState>,
|
||||
exif_dao: Data<Mutex<Box<dyn ExifDao>>>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
let context = extract_context_from_request(&request);
|
||||
let mut span = tracer.start_with_context("clear_image_date", &context);
|
||||
let span_context =
|
||||
opentelemetry::Context::new().with_remote_span_context(span.span_context().clone());
|
||||
|
||||
let library = match libraries::resolve_library_param(&app_state, body.library.as_deref()) {
|
||||
Ok(Some(lib)) => lib,
|
||||
Ok(None) => app_state.primary_library(),
|
||||
Err(msg) => {
|
||||
span.set_status(Status::error(msg.clone()));
|
||||
return HttpResponse::BadRequest().body(msg);
|
||||
}
|
||||
};
|
||||
|
||||
let normalized_path = body.path.replace('\\', "/");
|
||||
|
||||
let updated = {
|
||||
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
|
||||
dao.clear_manual_date_taken(&span_context, library.id, &normalized_path)
|
||||
};
|
||||
|
||||
match updated {
|
||||
Ok(row) => {
|
||||
span.set_status(Status::Ok);
|
||||
HttpResponse::Ok().json(build_metadata_response_for_date_mutation(
|
||||
&library,
|
||||
&normalized_path,
|
||||
row,
|
||||
))
|
||||
}
|
||||
Err(e) => {
|
||||
let msg = format!("clear_manual_date_taken failed: {:?}", e);
|
||||
error!("{}", msg);
|
||||
span.set_status(Status::error(msg.clone()));
|
||||
match e.kind {
|
||||
DbErrorKind::NotFound => HttpResponse::NotFound().body(msg),
|
||||
_ => HttpResponse::InternalServerError().body(msg),
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(serde::Deserialize)]
|
||||
struct UploadQuery {
|
||||
library: Option<String>,
|
||||
}
|
||||
|
||||
#[post("/image")]
|
||||
pub async fn upload_image(
|
||||
_: Claims,
|
||||
request: HttpRequest,
|
||||
query: web::Query<UploadQuery>,
|
||||
mut payload: mp::Multipart,
|
||||
app_state: Data<AppState>,
|
||||
exif_dao: Data<Mutex<Box<dyn ExifDao>>>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
let context = extract_context_from_request(&request);
|
||||
let mut span = tracer.start_with_context("upload_image", &context);
|
||||
let span_context =
|
||||
opentelemetry::Context::new().with_remote_span_context(span.span_context().clone());
|
||||
|
||||
// Resolve the optional library selector. Absent → primary library
|
||||
// (backwards-compatible with clients that don't yet send `library=`).
|
||||
let target_library =
|
||||
match libraries::resolve_library_param(&app_state, query.library.as_deref()) {
|
||||
Ok(Some(lib)) => lib,
|
||||
Ok(None) => app_state.primary_library(),
|
||||
Err(msg) => {
|
||||
span.set_status(Status::error(msg.clone()));
|
||||
return HttpResponse::BadRequest().body(msg);
|
||||
}
|
||||
};
|
||||
|
||||
let mut file_content: BytesMut = BytesMut::new();
|
||||
let mut file_name: Option<String> = None;
|
||||
let mut file_path: Option<String> = None;
|
||||
|
||||
while let Some(Ok(mut part)) = payload.next().await {
|
||||
if let Some(content_type) = part.content_disposition() {
|
||||
debug!("{:?}", content_type);
|
||||
if let Some(filename) = content_type.get_filename() {
|
||||
debug!("Name (raw): {:?}", filename);
|
||||
// Decode URL-encoded filename (e.g., "file%20name.jpg" -> "file name.jpg")
|
||||
let decoded_filename = decode(filename)
|
||||
.map(|s| s.to_string())
|
||||
.unwrap_or_else(|_| filename.to_string());
|
||||
debug!("Name (decoded): {:?}", decoded_filename);
|
||||
file_name = Some(decoded_filename);
|
||||
|
||||
while let Some(Ok(data)) = part.next().await {
|
||||
file_content.put(data);
|
||||
}
|
||||
} else if content_type.get_name() == Some("path") {
|
||||
while let Some(Ok(data)) = part.next().await {
|
||||
if let Ok(path) = std::str::from_utf8(&data) {
|
||||
file_path = Some(path.to_string())
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let path = file_path.unwrap_or_else(|| target_library.root_path.clone());
|
||||
if !file_content.is_empty() {
|
||||
if file_name.is_none() {
|
||||
span.set_status(Status::error("No filename provided"));
|
||||
return HttpResponse::BadRequest().body("No filename provided");
|
||||
}
|
||||
let full_path = PathBuf::from(&path).join(file_name.unwrap());
|
||||
if let Some(full_path) = is_valid_full_path(
|
||||
&target_library.root_path,
|
||||
&full_path.to_str().unwrap().to_string(),
|
||||
true,
|
||||
) {
|
||||
// Pre-write content-hash check: if these exact bytes already
|
||||
// exist anywhere in any library (and aren't themselves
|
||||
// soft-marked as duplicates), don't write the file. Return
|
||||
// 409 with the canonical sibling so the mobile app can show
|
||||
// a friendly "already in your library" toast.
|
||||
let upload_hash = blake3::Hasher::new()
|
||||
.update(&file_content)
|
||||
.finalize()
|
||||
.to_hex()
|
||||
.to_string();
|
||||
{
|
||||
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
|
||||
if let Ok(Some(existing)) = dao.find_by_content_hash(&span_context, &upload_hash)
|
||||
&& existing.duplicate_of_hash.is_none()
|
||||
{
|
||||
let library_name = libraries::load_all(&mut crate::database::connect())
|
||||
.into_iter()
|
||||
.find(|l| l.id == existing.library_id)
|
||||
.map(|l| l.name);
|
||||
span.set_status(Status::Ok);
|
||||
return HttpResponse::Conflict().json(serde_json::json!({
|
||||
"duplicate_of": {
|
||||
"library_id": existing.library_id,
|
||||
"rel_path": existing.file_path,
|
||||
},
|
||||
"content_hash": upload_hash,
|
||||
"library_name": library_name,
|
||||
}));
|
||||
}
|
||||
}
|
||||
|
||||
let context =
    opentelemetry::Context::new().with_remote_span_context(span.span_context().clone());
// Bind the child span: left unbound, it is dropped as soon as the
// statement ends and never covers the write below.
let file_write_span = tracer
    .span_builder("file write")
    .start_with_context(&tracer, &context);

let uploaded_path = if !full_path.is_file() && is_image_or_video(&full_path) {
    let mut file = File::create(&full_path).unwrap();
    file.write_all(&file_content).unwrap();

    info!("Uploaded: {:?}", full_path);
    full_path
} else {
    // Reached when the target already exists, and also when it is not
    // an image or video; both cases get a timestamped name.
    warn!("File already exists or is not an image/video: {:?}", full_path);

    let new_path = format!(
        "{}/{}_{}.{}",
        full_path.parent().unwrap().to_str().unwrap(),
        full_path.file_stem().unwrap().to_str().unwrap(),
        Utc::now().timestamp(),
        full_path
            .extension()
            .expect("Uploaded file should have an extension")
            .to_str()
            .unwrap()
    );
    info!("Uploaded: {}", new_path);

    let new_path_buf = PathBuf::from(&new_path);
    let mut file = File::create(&new_path_buf).unwrap();
    file.write_all(&file_content).unwrap();
    new_path_buf
};
// End the "file write" span here so it does not also cover EXIF extraction.
drop(file_write_span);
|
||||
|
||||
// Extract and store EXIF data if file supports it
|
||||
if exif::supports_exif(&uploaded_path) {
|
||||
let relative_path = uploaded_path
|
||||
.strip_prefix(&target_library.root_path)
|
||||
.expect("Error stripping library root prefix")
|
||||
.to_str()
|
||||
.unwrap()
|
||||
.replace('\\', "/");
|
||||
|
||||
match exif::extract_exif_from_path(&uploaded_path) {
|
||||
Ok(exif_data) => {
|
||||
let timestamp = Utc::now().timestamp();
|
||||
let (content_hash, size_bytes) = match content_hash::compute(&uploaded_path)
|
||||
{
|
||||
Ok(id) => (Some(id.content_hash), Some(id.size_bytes)),
|
||||
Err(e) => {
|
||||
warn!(
|
||||
"Failed to hash uploaded {}: {:?}",
|
||||
uploaded_path.display(),
|
||||
e
|
||||
);
|
||||
(None, None)
|
||||
}
|
||||
};
|
||||
let perceptual = perceptual_hash::compute(&uploaded_path);
|
||||
let resolved_date =
|
||||
date_resolver::resolve_date_taken(&uploaded_path, exif_data.date_taken);
|
||||
let insert_exif = InsertImageExif {
|
||||
library_id: target_library.id,
|
||||
file_path: relative_path.clone(),
|
||||
camera_make: exif_data.camera_make,
|
||||
camera_model: exif_data.camera_model,
|
||||
lens_model: exif_data.lens_model,
|
||||
width: exif_data.width,
|
||||
height: exif_data.height,
|
||||
orientation: exif_data.orientation,
|
||||
gps_latitude: exif_data.gps_latitude.map(|v| v as f32),
|
||||
gps_longitude: exif_data.gps_longitude.map(|v| v as f32),
|
||||
gps_altitude: exif_data.gps_altitude.map(|v| v as f32),
|
||||
focal_length: exif_data.focal_length.map(|v| v as f32),
|
||||
aperture: exif_data.aperture.map(|v| v as f32),
|
||||
shutter_speed: exif_data.shutter_speed,
|
||||
iso: exif_data.iso,
|
||||
date_taken: resolved_date.map(|r| r.timestamp),
|
||||
created_time: timestamp,
|
||||
last_modified: timestamp,
|
||||
content_hash,
|
||||
size_bytes,
|
||||
phash_64: perceptual.map(|h| h.phash_64),
|
||||
dhash_64: perceptual.map(|h| h.dhash_64),
|
||||
date_taken_source: resolved_date.map(|r| r.source.as_str().to_string()),
|
||||
};
|
||||
|
||||
if let Ok(mut dao) = exif_dao.lock() {
|
||||
if let Err(e) = dao.store_exif(&span_context, insert_exif) {
|
||||
error!("Failed to store EXIF data for {}: {:?}", relative_path, e);
|
||||
} else {
|
||||
debug!("EXIF data stored for {}", relative_path);
|
||||
}
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
debug!(
|
||||
"No EXIF data or error extracting from {}: {:?}",
|
||||
uploaded_path.display(),
|
||||
e
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
} else {
|
||||
error!("Invalid path for upload: {:?}", full_path);
|
||||
span.set_status(Status::error("Invalid path for upload"));
|
||||
return HttpResponse::BadRequest().body("Path was not valid");
|
||||
}
|
||||
} else {
|
||||
span.set_status(Status::error("No file body read"));
|
||||
return HttpResponse::BadRequest().body("No file body read");
|
||||
}
|
||||
|
||||
app_state.stream_manager.do_send(RefreshThumbnailsMessage);
|
||||
span.set_status(Status::Ok);
|
||||
|
||||
HttpResponse::Ok().finish()
|
||||
}
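Review note: the rename-on-collision branch of `upload_image` builds the new name through a chain of `unwrap()`/`expect()` calls, so a filename with no extension or with non-UTF-8 bytes would panic the handler. Below is a minimal sketch of the same `<stem>_<timestamp>.<ext>` scheme that returns `None` instead. `timestamped_sibling` is a hypothetical helper name, not something in this codebase, and the timestamp is passed in (rather than read from `Utc::now()`) so the function stays pure and testable.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: build `<parent>/<stem>_<timestamp>.<ext>` for a path
/// that collides with an existing file. Returns `None` (instead of panicking)
/// when the path has no parent, stem, or extension, or is not valid UTF-8.
fn timestamped_sibling(path: &Path, timestamp: i64) -> Option<PathBuf> {
    let parent = path.parent()?;
    let stem = path.file_stem()?.to_str()?;
    let ext = path.extension()?.to_str()?;
    Some(parent.join(format!("{stem}_{timestamp}.{ext}")))
}
```

A caller could map the `None` case to `HttpResponse::BadRequest()` rather than letting the worker thread unwind.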
|
||||
@@ -1,9 +0,0 @@
//! HTTP route handlers, grouped by domain.
//!
//! These were previously inlined in `main.rs`; moving them out keeps
//! `main()` focused on startup wiring and makes each domain
//! independently testable with `actix_web::test::init_service`.

pub mod favorites;
pub mod image;
pub mod video;
@@ -1,665 +0,0 @@
|
||||
//! Video-related endpoints: HLS playlist generation, segment streaming,
|
||||
//! and the short-clip preview pipeline.
|
||||
|
||||
use std::collections::HashMap;
|
||||
use std::path::PathBuf;
|
||||
use std::sync::Mutex;
|
||||
|
||||
use actix_files::NamedFile;
|
||||
use actix_web::{
|
||||
HttpRequest, HttpResponse, Responder, get, post,
|
||||
web::{self, Data},
|
||||
};
|
||||
use log::{debug, error, info, warn};
|
||||
use opentelemetry::trace::{Span, Status, Tracer};
|
||||
use opentelemetry::{KeyValue, global};
|
||||
|
||||
use crate::data::{
|
||||
Claims, PreviewClipRequest, PreviewStatusItem, PreviewStatusRequest, PreviewStatusResponse,
|
||||
ThumbnailRequest,
|
||||
};
|
||||
use crate::database::PreviewDao;
|
||||
use crate::files::is_valid_full_path;
|
||||
use crate::libraries;
|
||||
use crate::otel::{extract_context_from_request, global_tracer};
|
||||
use crate::state::AppState;
|
||||
use crate::video::actors::{GeneratePreviewClipMessage, ProcessMessage, create_playlist};
|
||||
|
||||
#[post("/video/generate")]
|
||||
pub async fn generate_video(
|
||||
_claims: Claims,
|
||||
request: HttpRequest,
|
||||
app_state: Data<AppState>,
|
||||
body: web::Json<ThumbnailRequest>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
|
||||
let context = extract_context_from_request(&request);
|
||||
let mut span = tracer.start_with_context("generate_video", &context);
|
||||
|
||||
let filename = PathBuf::from(&body.path);
|
||||
|
||||
if let Some(name) = filename.file_name() {
|
||||
let filename = name.to_str().expect("Filename should convert to string");
|
||||
// KNOWN ISSUE (multi-library): playlist filename is the basename
|
||||
// alone, so two source files with the same basename — whether in
|
||||
// different libraries or different subdirs of one library —
|
||||
// overwrite each other's playlists while ffmpeg runs. The
|
||||
// hash-keyed `content_hash::hls_dir` is the long-term answer
|
||||
// (see CLAUDE.md "Multi-library data model"); rewiring the
|
||||
// actor pipeline to use it is out of scope for this branch.
|
||||
// The orphan-cleanup job above already walks every library so
|
||||
// it doesn't false-delete archive playlists.
|
||||
let playlist = format!("{}/{}.m3u8", app_state.video_path, filename);
|
||||
|
||||
let library = libraries::resolve_library_param(&app_state, body.library.as_deref())
|
||||
.ok()
|
||||
.flatten()
|
||||
.unwrap_or_else(|| app_state.primary_library());
|
||||
|
||||
// Try the resolved library first, then fall back to any other library
|
||||
// that actually contains the file — handles union-mode requests where
|
||||
// the mobile client passes no library but the file lives in a
|
||||
// non-primary library.
|
||||
let resolved = is_valid_full_path(&library.root_path, &body.path, false)
|
||||
.filter(|p| p.exists())
|
||||
.or_else(|| {
|
||||
app_state.libraries.iter().find_map(|lib| {
|
||||
if lib.id == library.id {
|
||||
return None;
|
||||
}
|
||||
is_valid_full_path(&lib.root_path, &body.path, false).filter(|p| p.exists())
|
||||
})
|
||||
});
|
||||
|
||||
if let Some(path) = resolved {
|
||||
if let Ok(child) = create_playlist(path.to_str().unwrap(), &playlist).await {
|
||||
span.add_event(
|
||||
"playlist_created".to_string(),
|
||||
vec![KeyValue::new("playlist-name", filename.to_string())],
|
||||
);
|
||||
|
||||
span.set_status(Status::Ok);
|
||||
app_state.stream_manager.do_send(ProcessMessage(
|
||||
playlist.clone(),
|
||||
child,
|
||||
// opentelemetry::Context::new().with_span(span),
|
||||
));
|
||||
}
|
||||
} else {
|
||||
span.set_status(Status::error(format!("invalid path {:?}", &body.path)));
|
||||
return HttpResponse::BadRequest().finish();
|
||||
}
|
||||
|
||||
HttpResponse::Ok().json(playlist)
|
||||
} else {
|
||||
let message = format!("Unable to get file name: {:?}", filename);
|
||||
error!("{}", message);
|
||||
span.set_status(Status::error(message));
|
||||
|
||||
HttpResponse::BadRequest().finish()
|
||||
}
|
||||
}
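Review note: the KNOWN ISSUE comment in `generate_video` can be made concrete with a small sketch. `basename_playlist` mirrors the handler's `format!("{}/{}.m3u8", ...)` naming. `hash_keyed_playlist` is a hypothetical alternative whose directory layout is an assumption for illustration only; it is not the actual layout of `content_hash::hls_dir`, which this diff does not show.

```rust
use std::path::{Path, PathBuf};

/// Mirrors the handler's current naming: `<video_path>/<basename>.m3u8`.
/// Only the final path component of `source` contributes to the result.
fn basename_playlist(video_path: &str, source: &Path) -> Option<String> {
    let name = source.file_name()?.to_str()?;
    Some(format!("{video_path}/{name}.m3u8"))
}

/// Hypothetical alternative: key the playlist on the source's content hash,
/// so files with the same basename but different bytes no longer collide.
/// The `<hash>/playlist.m3u8` layout is an assumption, not the repo's scheme.
fn hash_keyed_playlist(video_path: &str, content_hash: &str) -> PathBuf {
    Path::new(video_path).join(content_hash).join("playlist.m3u8")
}
```

With the basename scheme, `trips/2023/clip.mp4` and `archive/clip.mp4` both map to `<video_path>/clip.mp4.m3u8`, which is the overwrite the comment describes.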
|
||||
|
||||
#[get("/video/stream")]
pub async fn stream_video(
    request: HttpRequest,
    _: Claims,
    path: web::Query<ThumbnailRequest>,
    app_state: Data<AppState>,
) -> impl Responder {
    let tracer = global::tracer("image-server");
    let context = extract_context_from_request(&request);
    let mut span = tracer.start_with_context("stream_video", &context);

    let playlist = &path.path;
    debug!("Playlist: {}", playlist);

    // Only serve files under video_path (HLS playlists) or base_path (source videos)
    if playlist.starts_with(&app_state.video_path)
        || is_valid_full_path(&app_state.base_path, playlist, false).is_some()
    {
        match NamedFile::open(playlist) {
            Ok(file) => {
                span.set_status(Status::Ok);
                file.into_response(&request)
            }
            _ => {
                span.set_status(Status::error(format!("playlist not found {}", playlist)));
                HttpResponse::NotFound().finish()
            }
        }
    } else {
        span.set_status(Status::error(format!("playlist not valid {}", playlist)));
        HttpResponse::BadRequest().finish()
    }
}
|
||||
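Review note: `stream_video` guards the playlist with `playlist.starts_with(&app_state.video_path)`, which, if `playlist` is a `String` as the `debug!` formatting suggests, is a plain string-prefix test. The sketch below contrasts that with `Path::starts_with`, which only matches whole path components. Both function names are hypothetical and exist only for this comparison.

```rust
use std::path::Path;

/// String-prefix check, in the style of the `video_path` guard above.
fn str_prefix_allows(base: &str, requested: &str) -> bool {
    requested.starts_with(base)
}

/// Component-wise check: `Path::starts_with` compares whole path
/// components, so a sibling directory sharing a name prefix is rejected.
fn path_prefix_allows(base: &str, requested: &str) -> bool {
    Path::new(requested).starts_with(Path::new(base))
}
```

For a base of `/srv/hls`, the string check accepts `/srv/hls_private/x.m3u8` while the component check does not. Neither variant resolves `..` segments, so a canonicalize-then-compare step like the one in `get_video_part` would still be needed for a complete guard.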
|
||||
#[get("/video/{path}")]
|
||||
pub async fn get_video_part(
|
||||
request: HttpRequest,
|
||||
_: Claims,
|
||||
path: web::Path<ThumbnailRequest>,
|
||||
app_state: Data<AppState>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
let context = extract_context_from_request(&request);
|
||||
let mut span = tracer.start_with_context("get_video_part", &context);
|
||||
|
||||
let part = &path.path;
|
||||
debug!("Video part: {}", part);
|
||||
|
||||
let mut file_part = PathBuf::new();
|
||||
file_part.push(app_state.video_path.clone());
|
||||
file_part.push(part);
|
||||
|
||||
// Guard against directory traversal attacks
|
||||
let canonical_base = match std::fs::canonicalize(&app_state.video_path) {
|
||||
Ok(path) => path,
|
||||
Err(e) => {
|
||||
error!("Failed to canonicalize video path: {:?}", e);
|
||||
span.set_status(Status::error("Invalid video path configuration"));
|
||||
return HttpResponse::InternalServerError().finish();
|
||||
}
|
||||
};
|
||||
|
||||
let canonical_file = match std::fs::canonicalize(&file_part) {
|
||||
Ok(path) => path,
|
||||
Err(_) => {
|
||||
warn!("Video part not found or invalid: {:?}", file_part);
|
||||
span.set_status(Status::error(format!("Video part not found '{}'", part)));
|
||||
return HttpResponse::NotFound().finish();
|
||||
}
|
||||
};
|
||||
|
||||
// Ensure the resolved path is still within the video directory
|
||||
if !canonical_file.starts_with(&canonical_base) {
|
||||
warn!("Directory traversal attempt detected: {:?}", part);
|
||||
span.set_status(Status::error("Invalid video path"));
|
||||
return HttpResponse::Forbidden().finish();
|
||||
}
|
||||
|
||||
match NamedFile::open(&canonical_file) {
|
||||
Ok(file) => {
|
||||
span.set_status(Status::Ok);
|
||||
file.into_response(&request)
|
||||
}
|
||||
_ => {
|
||||
error!("Video part not found: {:?}", file_part);
|
||||
span.set_status(Status::error(format!(
|
||||
"Video part not found '{}'",
|
||||
file_part.to_str().unwrap()
|
||||
)));
|
||||
HttpResponse::NotFound().finish()
|
||||
}
|
||||
}
|
||||
}
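Review note: `get_video_part` relies on `std::fs::canonicalize`, which needs the file to exist and touches the filesystem twice per request. A purely lexical pre-check can reject obviously hostile input before any I/O. This is a sketch under that assumption, not a replacement for the canonical comparison (it does not resolve symlinks); `join_within` is a hypothetical name.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: join `part` under `base`, refusing any component
/// that could step outside it. Purely lexical: no filesystem access and
/// no symlink resolution.
fn join_within(base: &Path, part: &str) -> Option<PathBuf> {
    let mut out = base.to_path_buf();
    for component in Path::new(part).components() {
        match component {
            Component::Normal(segment) => out.push(segment),
            // A `.` segment is harmless; skip it.
            Component::CurDir => {}
            // `..`, a root, or a Windows drive prefix would leave `base`.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(out)
}
```

Rejecting `RootDir` matters because `PathBuf::push` with an absolute path replaces the whole buffer, which is also how the `file_part.push(part)` call in the handler behaves.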
|
||||
|
||||
#[get("/video/preview")]
|
||||
pub async fn get_video_preview(
|
||||
_claims: Claims,
|
||||
request: HttpRequest,
|
||||
req: web::Query<PreviewClipRequest>,
|
||||
app_state: Data<AppState>,
|
||||
preview_dao: Data<Mutex<Box<dyn PreviewDao>>>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
let context = extract_context_from_request(&request);
|
||||
let mut span = tracer.start_with_context("get_video_preview", &context);
|
||||
|
||||
// Validate path
|
||||
let full_path = match is_valid_full_path(&app_state.base_path, &req.path, true) {
|
||||
Some(path) => path,
|
||||
None => {
|
||||
span.set_status(Status::error("Invalid path"));
|
||||
return HttpResponse::BadRequest().json(serde_json::json!({"error": "Invalid path"}));
|
||||
}
|
||||
};
|
||||
|
||||
let full_path_str = full_path.to_string_lossy().to_string();
|
||||
|
||||
// Use relative path (from BASE_PATH) for DB storage, consistent with EXIF convention
|
||||
let relative_path = full_path_str
|
||||
.strip_prefix(&app_state.base_path)
|
||||
.unwrap_or(&full_path_str)
|
||||
.trim_start_matches(['/', '\\'])
|
||||
.to_string();
|
||||
|
||||
// Check preview status in DB
|
||||
let preview = {
|
||||
let mut dao = preview_dao.lock().expect("Unable to lock PreviewDao");
|
||||
dao.get_preview(&context, &relative_path)
|
||||
};
|
||||
|
||||
match preview {
|
||||
Ok(Some(clip)) => match clip.status.as_str() {
|
||||
"complete" => {
|
||||
let preview_path = PathBuf::from(&app_state.preview_clips_path)
|
||||
.join(&relative_path)
|
||||
.with_extension("mp4");
|
||||
|
||||
match NamedFile::open(&preview_path) {
|
||||
Ok(file) => {
|
||||
span.set_status(Status::Ok);
|
||||
file.into_response(&request)
|
||||
}
|
||||
Err(_) => {
|
||||
// File missing on disk but DB says complete - reset and regenerate
|
||||
let mut dao = preview_dao.lock().expect("Unable to lock PreviewDao");
|
||||
let _ = dao.update_status(
|
||||
&context,
|
||||
&relative_path,
|
||||
"pending",
|
||||
None,
|
||||
None,
|
||||
None,
|
||||
);
|
||||
app_state
|
||||
.preview_clip_generator
|
||||
.do_send(GeneratePreviewClipMessage {
|
||||
video_path: full_path_str,
|
||||
});
|
||||
span.set_status(Status::Ok);
|
||||
HttpResponse::Accepted().json(serde_json::json!({
|
||||
"status": "processing",
|
||||
"path": req.path
|
||||
}))
|
||||
}
|
||||
}
|
||||
}
|
||||
"processing" => {
|
||||
span.set_status(Status::Ok);
|
||||
HttpResponse::Accepted().json(serde_json::json!({
|
||||
"status": "processing",
|
||||
"path": req.path
|
||||
}))
|
||||
}
|
||||
"failed" => {
|
||||
let error_msg = clip
|
||||
.error_message
|
||||
.unwrap_or_else(|| "Unknown error".to_string());
|
||||
span.set_status(Status::error(format!("Generation failed: {}", error_msg)));
|
||||
HttpResponse::InternalServerError().json(serde_json::json!({
|
||||
"error": format!("Generation failed: {}", error_msg)
|
||||
}))
|
||||
}
|
||||
_ => {
|
||||
// pending or unknown status - trigger generation
|
||||
app_state
|
||||
.preview_clip_generator
|
||||
.do_send(GeneratePreviewClipMessage {
|
||||
video_path: full_path_str,
|
||||
});
|
||||
span.set_status(Status::Ok);
|
||||
HttpResponse::Accepted().json(serde_json::json!({
|
||||
"status": "processing",
|
||||
"path": req.path
|
||||
}))
|
||||
}
|
||||
},
|
||||
Ok(None) => {
|
||||
// No record exists - insert as pending and trigger generation
|
||||
{
|
||||
let mut dao = preview_dao.lock().expect("Unable to lock PreviewDao");
|
||||
let _ = dao.insert_preview(&context, &relative_path, "pending");
|
||||
}
|
||||
app_state
|
||||
.preview_clip_generator
|
||||
.do_send(GeneratePreviewClipMessage {
|
||||
video_path: full_path_str,
|
||||
});
|
||||
span.set_status(Status::Ok);
|
||||
HttpResponse::Accepted().json(serde_json::json!({
|
||||
"status": "processing",
|
||||
"path": req.path
|
||||
}))
|
||||
}
|
||||
Err(_) => {
|
||||
span.set_status(Status::error("Database error"));
|
||||
HttpResponse::InternalServerError().json(serde_json::json!({"error": "Database error"}))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[post("/video/preview/status")]
|
||||
pub async fn get_preview_status(
|
||||
_claims: Claims,
|
||||
request: HttpRequest,
|
||||
body: web::Json<PreviewStatusRequest>,
|
||||
app_state: Data<AppState>,
|
||||
preview_dao: Data<Mutex<Box<dyn PreviewDao>>>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
let context = extract_context_from_request(&request);
|
||||
let mut span = tracer.start_with_context("get_preview_status", &context);
|
||||
|
||||
// Limit to 200 paths per request
|
||||
if body.paths.len() > 200 {
|
||||
span.set_status(Status::error("Too many paths"));
|
||||
return HttpResponse::BadRequest()
|
||||
.json(serde_json::json!({"error": "Maximum 200 paths per request"}));
|
||||
}
|
||||
|
||||
let previews = {
|
||||
let mut dao = preview_dao.lock().expect("Unable to lock PreviewDao");
|
||||
dao.get_previews_batch(&context, &body.paths)
|
||||
};
|
||||
|
||||
match previews {
|
||||
Ok(clips) => {
|
||||
// Build a map of file_path -> VideoPreviewClip for quick lookup
|
||||
let clip_map: HashMap<String, _> = clips
|
||||
.into_iter()
|
||||
.map(|clip| (clip.file_path.clone(), clip))
|
||||
.collect();
|
||||
|
||||
let mut items: Vec<PreviewStatusItem> = Vec::with_capacity(body.paths.len());
|
||||
|
||||
for path in &body.paths {
|
||||
if let Some(clip) = clip_map.get(path) {
|
||||
// Re-queue generation for stale pending/failed records
|
||||
if clip.status == "pending" || clip.status == "failed" {
|
||||
let full_path = format!(
|
||||
"{}/{}",
|
||||
app_state.base_path.trim_end_matches(['/', '\\']),
|
||||
path.trim_start_matches(['/', '\\'])
|
||||
);
|
||||
app_state
|
||||
.preview_clip_generator
|
||||
.do_send(GeneratePreviewClipMessage {
|
||||
video_path: full_path,
|
||||
});
|
||||
}
|
||||
|
||||
items.push(PreviewStatusItem {
|
||||
path: path.clone(),
|
||||
status: clip.status.clone(),
|
||||
preview_url: if clip.status == "complete" {
|
||||
Some(format!("/video/preview?path={}", urlencoding::encode(path)))
|
||||
} else {
|
||||
None
|
||||
},
|
||||
});
|
||||
} else {
|
||||
// No record exists — insert as pending and trigger generation
|
||||
{
|
||||
let mut dao = preview_dao.lock().expect("Unable to lock PreviewDao");
|
||||
let _ = dao.insert_preview(&context, path, "pending");
|
||||
}
|
||||
|
||||
// Build full path for ffmpeg (actor needs the absolute path for input)
|
||||
let full_path = format!(
|
||||
"{}/{}",
|
||||
app_state.base_path.trim_end_matches(['/', '\\']),
|
||||
path.trim_start_matches(['/', '\\'])
|
||||
);
|
||||
|
||||
info!("Triggering preview generation for '{}'", path);
|
||||
app_state
|
||||
.preview_clip_generator
|
||||
.do_send(GeneratePreviewClipMessage {
|
||||
video_path: full_path,
|
||||
});
|
||||
|
||||
items.push(PreviewStatusItem {
|
||||
path: path.clone(),
|
||||
status: "pending".to_string(),
|
||||
preview_url: None,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
span.set_status(Status::Ok);
|
||||
HttpResponse::Ok().json(PreviewStatusResponse { previews: items })
|
||||
}
|
||||
Err(_) => {
|
||||
span.set_status(Status::error("Database error"));
|
||||
HttpResponse::InternalServerError().json(serde_json::json!({"error": "Database error"}))
|
||||
}
|
||||
}
|
||||
}
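Review note: both preview handlers compare `clip.status` against string literals (`"pending"`, `"processing"`, `"complete"`, `"failed"`). A typed view of those values would let the compiler check the match arms. The sketch below is hypothetical throughout: `PreviewStatus`, `parse`, `should_requeue`, and `preview_url` are illustrative names, and the path is assumed to be URL-encoded by the caller (the handler uses `urlencoding::encode` for that).

```rust
/// Hypothetical typed view of the `status` strings the handlers compare against.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PreviewStatus {
    Pending,
    Processing,
    Complete,
    Failed,
}

impl PreviewStatus {
    /// Parse a stored status string; unknown values yield `None`.
    fn parse(raw: &str) -> Option<Self> {
        match raw {
            "pending" => Some(Self::Pending),
            "processing" => Some(Self::Processing),
            "complete" => Some(Self::Complete),
            "failed" => Some(Self::Failed),
            _ => None,
        }
    }

    /// `get_preview_status` re-queues generation for pending and failed rows.
    fn should_requeue(self) -> bool {
        matches!(self, Self::Pending | Self::Failed)
    }
}

/// Mirrors the status endpoint: only complete clips get a preview URL.
/// `encoded_path` is assumed to be URL-encoded already.
fn preview_url(status: PreviewStatus, encoded_path: &str) -> Option<String> {
    (status == PreviewStatus::Complete).then(|| format!("/video/preview?path={encoded_path}"))
}
```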
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::data::Claims;
|
||||
use crate::database::PreviewDao;
|
||||
use crate::testhelpers::TestPreviewDao;
|
||||
use actix_web::App;
|
||||
|
||||
fn make_token() -> String {
|
||||
let claims = Claims::valid_user("1".to_string());
|
||||
jsonwebtoken::encode(
|
||||
&jsonwebtoken::Header::default(),
|
||||
&claims,
|
||||
&jsonwebtoken::EncodingKey::from_secret(b"test_key"),
|
||||
)
|
||||
.unwrap()
|
||||
}
|
||||
|
||||
fn make_preview_dao(dao: TestPreviewDao) -> Data<Mutex<Box<dyn PreviewDao>>> {
|
||||
Data::new(Mutex::new(Box::new(dao) as Box<dyn PreviewDao>))
|
||||
}
|
||||
|
||||
#[actix_rt::test]
|
||||
async fn test_get_preview_status_returns_pending_for_unknown() {
|
||||
let dao = TestPreviewDao::new();
|
||||
let preview_dao = make_preview_dao(dao);
|
||||
let app_state = Data::new(AppState::test_state());
|
||||
let token = make_token();
|
||||
|
||||
let app = actix_web::test::init_service(
|
||||
App::new()
|
||||
.service(get_preview_status)
|
||||
.app_data(app_state)
|
||||
.app_data(preview_dao.clone()),
|
||||
)
|
||||
.await;
|
||||
|
||||
let req = actix_web::test::TestRequest::post()
|
||||
.uri("/video/preview/status")
|
||||
.insert_header(("Authorization", format!("Bearer {}", token)))
|
||||
.set_json(serde_json::json!({"paths": ["photos/new_video.mp4"]}))
|
||||
.to_request();
|
||||
|
||||
let resp = actix_web::test::call_service(&app, req).await;
|
||||
assert_eq!(resp.status(), 200);
|
||||
|
||||
let body: serde_json::Value = actix_web::test::read_body_json(resp).await;
|
||||
let previews = body["previews"].as_array().unwrap();
|
||||
assert_eq!(previews.len(), 1);
|
||||
assert_eq!(previews[0]["status"], "pending");
|
||||
|
||||
// Verify the DAO now has a pending record
|
||||
let mut dao_lock = preview_dao.lock().unwrap();
|
||||
let ctx = opentelemetry::Context::new();
|
||||
let clip = dao_lock.get_preview(&ctx, "photos/new_video.mp4").unwrap();
|
||||
assert!(clip.is_some());
|
||||
assert_eq!(clip.unwrap().status, "pending");
|
||||
}
|
||||
|
||||
#[actix_rt::test]
|
||||
async fn test_get_preview_status_returns_complete_with_url() {
|
||||
let mut dao = TestPreviewDao::new();
|
||||
let ctx = opentelemetry::Context::new();
|
||||
dao.insert_preview(&ctx, "photos/done.mp4", "pending")
|
||||
.unwrap();
|
||||
dao.update_status(
|
||||
&ctx,
|
||||
"photos/done.mp4",
|
||||
"complete",
|
||||
Some(9.5),
|
||||
Some(500000),
|
||||
None,
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let preview_dao = make_preview_dao(dao);
|
||||
let app_state = Data::new(AppState::test_state());
|
||||
let token = make_token();
|
||||
|
||||
let app = actix_web::test::init_service(
|
||||
App::new()
|
||||
.service(get_preview_status)
|
||||
.app_data(app_state)
|
||||
.app_data(preview_dao),
|
||||
)
|
||||
.await;
|
||||
|
||||
let req = actix_web::test::TestRequest::post()
|
||||
.uri("/video/preview/status")
|
||||
.insert_header(("Authorization", format!("Bearer {}", token)))
|
||||
.set_json(serde_json::json!({"paths": ["photos/done.mp4"]}))
|
||||
.to_request();
|
||||
|
||||
let resp = actix_web::test::call_service(&app, req).await;
|
||||
assert_eq!(resp.status(), 200);
|
||||
|
||||
let body: serde_json::Value = actix_web::test::read_body_json(resp).await;
|
||||
let previews = body["previews"].as_array().unwrap();
|
||||
assert_eq!(previews.len(), 1);
|
||||
assert_eq!(previews[0]["status"], "complete");
|
||||
assert!(
|
||||
previews[0]["preview_url"]
|
||||
.as_str()
|
||||
.unwrap()
|
||||
.contains("photos%2Fdone.mp4")
|
||||
);
|
||||
}
|
||||
|
||||
#[actix_rt::test]
|
||||
async fn test_get_preview_status_rejects_over_200_paths() {
|
||||
let dao = TestPreviewDao::new();
|
||||
let preview_dao = make_preview_dao(dao);
|
||||
let app_state = Data::new(AppState::test_state());
|
||||
let token = make_token();
|
||||
|
||||
let app = actix_web::test::init_service(
|
||||
App::new()
|
||||
.service(get_preview_status)
|
||||
.app_data(app_state)
|
||||
.app_data(preview_dao),
|
||||
)
|
||||
.await;
|
||||
|
||||
let paths: Vec<String> = (0..201).map(|i| format!("video_{}.mp4", i)).collect();
|
||||
let req = actix_web::test::TestRequest::post()
|
||||
.uri("/video/preview/status")
|
||||
.insert_header(("Authorization", format!("Bearer {}", token)))
|
||||
.set_json(serde_json::json!({"paths": paths}))
|
||||
.to_request();
|
||||
|
||||
let resp = actix_web::test::call_service(&app, req).await;
|
||||
assert_eq!(resp.status(), 400);
|
||||
}
|
||||
|
||||
#[actix_rt::test]
|
||||
async fn test_get_preview_status_mixed_statuses() {
|
||||
let mut dao = TestPreviewDao::new();
|
||||
let ctx = opentelemetry::Context::new();
|
||||
dao.insert_preview(&ctx, "a.mp4", "pending").unwrap();
|
||||
dao.insert_preview(&ctx, "b.mp4", "pending").unwrap();
|
||||
dao.update_status(&ctx, "b.mp4", "complete", Some(10.0), Some(100000), None)
|
||||
.unwrap();
|
||||
|
||||
let preview_dao = make_preview_dao(dao);
|
||||
let app_state = Data::new(AppState::test_state());
|
||||
let token = make_token();
|
||||
|
||||
let app = actix_web::test::init_service(
|
||||
App::new()
|
||||
.service(get_preview_status)
|
||||
.app_data(app_state)
|
||||
.app_data(preview_dao),
|
||||
)
|
||||
.await;
|
||||
|
||||
let req = actix_web::test::TestRequest::post()
|
||||
.uri("/video/preview/status")
|
||||
.insert_header(("Authorization", format!("Bearer {}", token)))
|
||||
.set_json(serde_json::json!({"paths": ["a.mp4", "b.mp4", "c.mp4"]}))
|
||||
.to_request();
|
||||
|
||||
let resp = actix_web::test::call_service(&app, req).await;
|
||||
assert_eq!(resp.status(), 200);
|
||||
|
||||
let body: serde_json::Value = actix_web::test::read_body_json(resp).await;
|
||||
let previews = body["previews"].as_array().unwrap();
|
||||
assert_eq!(previews.len(), 3);
|
||||
|
||||
// a.mp4 is pending
|
||||
assert_eq!(previews[0]["path"], "a.mp4");
|
||||
assert_eq!(previews[0]["status"], "pending");
|
||||
|
||||
// b.mp4 is complete with URL
|
||||
assert_eq!(previews[1]["path"], "b.mp4");
|
||||
assert_eq!(previews[1]["status"], "complete");
|
||||
assert!(previews[1]["preview_url"].is_string());
|
||||
|
||||
// c.mp4 was not found — handler inserts pending
|
||||
assert_eq!(previews[2]["path"], "c.mp4");
|
||||
assert_eq!(previews[2]["status"], "pending");
|
||||
}
|
||||
|
||||
/// Verifies that the status endpoint re-queues generation for stale
|
||||
/// "pending" and "failed" records (e.g., after a server restart or
|
||||
/// when clip files were deleted). The do_send to the actor exercises
|
||||
/// the re-queue code path; the actor runs against temp dirs so it
|
||||
/// won't panic.
|
||||
#[actix_rt::test]
|
||||
async fn test_get_preview_status_requeues_pending_and_failed() {
|
||||
let mut dao = TestPreviewDao::new();
|
||||
let ctx = opentelemetry::Context::new();
|
||||
|
||||
// Simulate stale records left from a previous server run
|
||||
dao.insert_preview(&ctx, "stale/pending.mp4", "pending")
|
||||
.unwrap();
|
||||
dao.insert_preview(&ctx, "stale/failed.mp4", "pending")
|
||||
.unwrap();
|
||||
dao.update_status(
|
||||
&ctx,
|
||||
"stale/failed.mp4",
|
||||
"failed",
|
||||
None,
|
||||
None,
|
||||
Some("ffmpeg error"),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let preview_dao = make_preview_dao(dao);
|
||||
let app_state = Data::new(AppState::test_state());
|
||||
let token = make_token();
|
||||
|
||||
let app = actix_web::test::init_service(
|
||||
App::new()
|
||||
.service(get_preview_status)
|
||||
.app_data(app_state)
|
||||
.app_data(preview_dao),
|
||||
)
|
||||
.await;
|
||||
|
||||
let req = actix_web::test::TestRequest::post()
|
||||
.uri("/video/preview/status")
|
||||
.insert_header(("Authorization", format!("Bearer {}", token)))
|
||||
.set_json(serde_json::json!({
|
||||
"paths": ["stale/pending.mp4", "stale/failed.mp4"]
|
||||
}))
|
||||
.to_request();
|
||||
|
||||
let resp = actix_web::test::call_service(&app, req).await;
|
||||
assert_eq!(resp.status(), 200);
|
||||
|
||||
let body: serde_json::Value = actix_web::test::read_body_json(resp).await;
|
||||
let previews = body["previews"].as_array().unwrap();
|
||||
assert_eq!(previews.len(), 2);
|
||||
|
||||
// Both records are returned with their current status
|
||||
assert_eq!(previews[0]["path"], "stale/pending.mp4");
|
||||
assert_eq!(previews[0]["status"], "pending");
|
||||
assert!(previews[0].get("preview_url").is_none());
|
||||
|
||||
assert_eq!(previews[1]["path"], "stale/failed.mp4");
|
||||
assert_eq!(previews[1]["status"], "failed");
|
||||
assert!(previews[1].get("preview_url").is_none());
|
||||
}
|
||||
}
|
||||
+17
-826
File diff suppressed because it is too large
+17
-6
@@ -10,8 +10,6 @@ pub mod cleanup;
|
||||
pub mod content_hash;
|
||||
pub mod data;
|
||||
pub mod database;
|
||||
pub mod date_resolver;
|
||||
pub mod duplicates;
|
||||
pub mod error;
|
||||
pub mod exif;
|
||||
pub mod face_watch;
|
||||
@@ -21,18 +19,14 @@ pub mod file_types;
|
||||
pub mod files;
|
||||
pub mod geo;
|
||||
pub mod libraries;
|
||||
pub mod library_maintenance;
|
||||
pub mod memories;
|
||||
pub mod otel;
|
||||
pub mod parsers;
|
||||
pub mod perceptual_hash;
|
||||
pub mod personas;
|
||||
pub mod service;
|
||||
pub mod state;
|
||||
pub mod tags;
|
||||
#[cfg(test)]
|
||||
pub mod testhelpers;
|
||||
pub mod thumbnails;
|
||||
pub mod utils;
|
||||
pub mod video;
|
||||
|
||||
@@ -40,3 +34,20 @@ pub mod video;
|
||||
pub use data::{Claims, ThumbnailRequest};
|
||||
pub use database::{connect, schema};
|
||||
pub use state::AppState;
|
||||
|
||||
// Stubs for functions that library modules reference but that are
// implemented in the server binary (`main.rs`). The `cleanup_files`
// binary does not use them.
use std::path::Path;
use walkdir::DirEntry;

pub fn create_thumbnails(_libs: &[libraries::Library], _excluded_dirs: &[String]) {
    // Stub: the real implementation lives in main.rs.
}

pub fn update_media_counts(_media_dir: &Path, _excluded_dirs: &[String]) {
    // Stub: the real implementation lives in main.rs.
}

pub fn is_video(entry: &DirEntry) -> bool {
    file_types::direntry_is_video(entry)
}
|
||||
|
||||
+3
-797
@@ -1,12 +1,9 @@
use actix_web::{HttpResponse, Responder, get, patch, web, web::Data};
use actix_web::{HttpResponse, Responder, get, web::Data};
use chrono::Utc;
use diesel::prelude::*;
use diesel::sqlite::SqliteConnection;
use log::{info, warn};
use serde::Deserialize;
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock};

use crate::data::Claims;
use crate::database::models::{InsertLibrary, LibraryRow};
@@ -29,19 +26,6 @@ pub struct Library {
    pub id: i32,
    pub name: String,
    pub root_path: String,
    /// Operator kill switch (mirrors `libraries.enabled`). When `false`
    /// the watcher skips this library entirely — before the probe,
    /// before ingest, before maintenance. Reads / serving still work
    /// (a request whose path resolves to a disabled library's root
    /// will succeed if the file is on disk; nothing prevents that
    /// today and there's no obvious reason to). Toggle via SQL.
    pub enabled: bool,
    /// Per-library excluded paths/patterns, parsed from the
    /// comma-separated DB column. The walker applies these
    /// **in union** with the global `EXCLUDED_DIRS` env var; either
    /// list matching a path is enough to exclude. Empty = no
    /// library-specific excludes (only the global env var applies).
    pub excluded_dirs: Vec<String>,
}

impl Library {
@@ -63,158 +47,6 @@ impl Library {
            .ok()
            .map(|p| p.to_string_lossy().replace('\\', "/"))
    }

    /// Effective excluded directories for a walk of this library:
    /// the union of the global env-var excludes (passed in by the
    /// caller as `globals`) and this library's per-row excludes.
    /// Order doesn't matter; `PathExcluder` accepts repeats.
    pub fn effective_excluded_dirs(&self, globals: &[String]) -> Vec<String> {
        if self.excluded_dirs.is_empty() {
            return globals.to_vec();
        }
        let mut combined: Vec<String> =
            Vec::with_capacity(globals.len() + self.excluded_dirs.len());
        combined.extend_from_slice(globals);
        combined.extend(self.excluded_dirs.iter().cloned());
        combined
    }
}

/// Parse an excluded_dirs string into a Vec, dropping empty entries.
/// NULL → empty Vec. Duplicates are preserved — `PathExcluder` accepts
/// repeats, and the storage-side normaliser is where dedup happens.
///
/// Accepts both `,` and newline (`\n` / `\r\n`) as separators so the
/// UI's textarea can submit one-entry-per-line input without forcing
/// the operator to remember commas. The DB stores the canonical
/// comma-joined form (see `normalize_excluded_dirs_input`); the
/// newline path matters mostly for the frontend submit, but mirroring
/// it here keeps the parse direction round-trip safe.
pub fn parse_excluded_dirs_column(raw: Option<&str>) -> Vec<String> {
    match raw {
        None => Vec::new(),
        Some(s) => s
            .split(|c: char| matches!(c, ',' | '\n' | '\r'))
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect(),
    }
}

/// Validate a single excluded_dirs entry, normalising trivial cosmetic
/// differences and rejecting forms that `PathExcluder` would silently
/// drop. Returns the entry to store, or an error message describing
/// what's wrong with it.
///
/// Rules:
/// - Backslashes are rejected — PathExcluder strips only a leading `/`;
///   a Windows-typed `\photos` or `photos\2024` lands in the
///   component-pattern bucket and never matches anything. Suggest the
///   forward-slash form.
/// - A Windows drive letter prefix (`Z:` etc.) is rejected — excluded
///   entries are *relative to the library root*, not absolute system
///   paths.
/// - A no-leading-slash entry containing `/` is rejected — the
///   component-pattern path matches a single segment only; the user
///   almost certainly meant the leading-slash form.
/// - A `..` segment in a path entry is rejected — `base.join("../x")`
///   doesn't canonicalise, so the resulting prefix never matches and
///   the exclude silently fails.
/// - Trailing slashes on path entries are stripped silently
///   (`/photos/` → `/photos`) — purely cosmetic.
pub fn validate_excluded_dirs_entry(entry: &str) -> Result<String, String> {
    let trimmed = entry.trim();
    if trimmed.is_empty() {
        return Err("empty entry".to_string());
    }
    if trimmed.contains('\\') {
        return Err(format!(
            "'{}': use forward slashes — backslash paths never match on the watcher's component-by-component compare",
            trimmed
        ));
    }
    // Windows drive letter prefix like `Z:` or `Z:/something`. A
    // length-2 ASCII-alpha + colon is the canonical form; we don't
    // bother with longer multi-letter Windows drive-equivalents
    // (`\\?\Volume{…}`) since the backslash check already catches them.
    let bytes = trimmed.as_bytes();
    if bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':' {
        return Err(format!(
            "'{}': excluded entries are relative to the library root, not absolute system paths — drop the drive letter",
            trimmed
        ));
    }
    if let Some(rel) = trimmed.strip_prefix('/') {
        // Path form. Reject `..` traversal — `base.join("../x")` doesn't
        // canonicalise, so `path.starts_with(...)` never matches.
        if rel
            .split('/')
            .any(|seg| seg == "..")
        {
            return Err(format!(
                "'{}': '..' segments don't normalise — the prefix-match never fires",
                trimmed
            ));
        }
        // Strip a trailing slash if any (`/photos/` → `/photos`). Purely
        // cosmetic; PathBuf::starts_with treats both forms identically.
        let stripped = if rel.ends_with('/') {
            format!("/{}", rel.trim_end_matches('/'))
        } else {
            trimmed.to_string()
        };
        // After stripping, an empty rel ("/" alone) excludes the root —
        // certainly a typo.
        if stripped == "/" {
            return Err("'/': excluding the library root is almost certainly a typo".to_string());
        }
        Ok(stripped)
    } else {
        // Component-pattern form: must be a single segment. A `/`
        // anywhere here is the common "I forgot the leading slash" typo
        // — reject so the user fixes it instead of staring at an
        // exclude that does nothing.
        if trimmed.contains('/') {
            return Err(format!(
                "'{}': multi-segment names only match with a leading slash — try '/{}'",
                trimmed, trimmed
            ));
        }
        Ok(trimmed.to_string())
    }
}

/// Canonicalise an excluded_dirs string for storage: validate each
/// entry, then parse → trim → dedupe (preserving insertion order) →
/// comma-join with no inner whitespace. Empty / whitespace-only input
/// → `Ok(None)` (writes NULL). Any entry that fails validation aborts
/// the whole patch with a descriptive error so the operator can fix
/// the typo before retrying.
///
/// Used by `PATCH /libraries/{id}` so the same entries typed with
/// different whitespace or separators land on the same stored form
/// (order and case are preserved as typed), and a typo'd duplicate
/// (`@eaDir, @eaDir`) collapses on save.
/// Round-trip stable: writing the output back through this function
/// yields the same string.
pub fn normalize_excluded_dirs_input(raw: &str) -> Result<Option<String>, String> {
    let parsed = parse_excluded_dirs_column(Some(raw));
    if parsed.is_empty() {
        return Ok(None);
    }
    let mut seen = std::collections::HashSet::new();
    let mut deduped: Vec<String> = Vec::with_capacity(parsed.len());
    for entry in parsed {
        let validated = validate_excluded_dirs_entry(&entry)?;
        if seen.insert(validated.clone()) {
            deduped.push(validated);
        }
    }
    if deduped.is_empty() {
        Ok(None)
    } else {
        Ok(Some(deduped.join(",")))
    }
}

impl From<LibraryRow> for Library {
@@ -223,8 +55,6 @@ impl From<LibraryRow> for Library {
            id: row.id,
            name: row.name,
            root_path: row.root_path,
            enabled: row.enabled,
            excluded_dirs: parse_excluded_dirs_column(row.excluded_dirs.as_deref()),
        }
    }
}
@@ -279,8 +109,6 @@ pub fn seed_or_patch_from_env(conn: &mut SqliteConnection, base_path: &str) {
                name: "main",
                root_path: base_path,
                created_at: now,
                enabled: true,
                excluded_dirs: None,
            })
            .execute(conn);
        match result {
@@ -318,292 +146,18 @@ pub fn resolve_library_param<'a>(
        .ok_or_else(|| format!("unknown library name: {}", raw))
}

/// Health of a library at a point in time. Probed at the top of each
/// file-watcher tick. The `Stale` state is the "be conservative" signal:
/// destructive paths (ingest writes, future move-handoff and orphan GC in
/// branches B/C) skip a stale library, but reads/serving stay unaffected.
///
/// See `CLAUDE.md` → "Library availability and safety" for the policy.
#[derive(Clone, Debug, serde::Serialize, PartialEq, Eq)]
#[serde(tag = "state", rename_all = "snake_case")]
pub enum LibraryHealth {
    Online,
    Stale {
        reason: String,
        /// Unix timestamp (seconds) of the most recent transition into
        /// Stale. Held for telemetry / `/libraries` surfacing only —
        /// gating logic doesn't read it.
        since: i64,
    },
}

impl LibraryHealth {
    pub fn is_online(&self) -> bool {
        matches!(self, LibraryHealth::Online)
    }
}

/// Shared snapshot of every configured library's health, keyed by
/// `library_id`. The watcher writes; HTTP handlers read. RwLock because
/// reads vastly outnumber writes (one tick vs. every status request).
pub type LibraryHealthMap = Arc<RwLock<HashMap<i32, LibraryHealth>>>;

/// Construct an initial health map. Libraries start `Online`; the first
/// probe will downgrade any that fail. Starting `Stale` would block ingest
/// for the watcher's first tick on a healthy mount, which is the wrong
/// default for a server that's just been restarted.
pub fn new_health_map(libs: &[Library]) -> LibraryHealthMap {
    let mut m = HashMap::with_capacity(libs.len());
    for lib in libs {
        m.insert(lib.id, LibraryHealth::Online);
    }
    Arc::new(RwLock::new(m))
}

/// Probe a library's mount point. Cheap: stat + open dir + peek one entry.
///
/// `had_data` is the caller's prior knowledge that this library has been
/// non-empty before — typically `image_exif` row count > 0. When true, an
/// empty directory is suspicious (it's how an unmounted NFS share looks);
/// when false, it's accepted as a fresh mount that simply hasn't been
/// indexed yet.
///
/// Note: stat / read_dir on a hard-mounted, unreachable NFS share can
/// block. The watcher accepts that risk for now — the worst case is that
/// the tick stalls until the mount returns, which is no more destructive
/// than the pre-probe behavior. A future enhancement can wrap this in a
/// thread + timeout if it becomes an operational issue.
pub fn probe_online(lib: &Library, had_data: bool) -> LibraryHealth {
    let now = Utc::now().timestamp();
    let path = Path::new(&lib.root_path);

    let metadata = match std::fs::metadata(path) {
        Ok(m) => m,
        Err(e) => {
            return LibraryHealth::Stale {
                reason: format!("root_path stat failed: {}", e),
                since: now,
            };
        }
    };
    if !metadata.is_dir() {
        return LibraryHealth::Stale {
            reason: format!("root_path is not a directory: {}", lib.root_path),
            since: now,
        };
    }

    let mut entries = match std::fs::read_dir(path) {
        Ok(it) => it,
        Err(e) => {
            return LibraryHealth::Stale {
                reason: format!("read_dir failed: {}", e),
                since: now,
            };
        }
    };

    // Empty directory only counts as Stale when we have prior evidence
    // this library used to have content. A genuinely fresh mount is
    // legitimately empty, and degrading it would block first-time ingest.
    if had_data && entries.next().is_none() {
        return LibraryHealth::Stale {
            reason: "library is empty but image_exif has rows for it".to_string(),
            since: now,
        };
    }

    LibraryHealth::Online
}

/// Probe `lib`, update `map`, and return the new state. Logs only on a
/// state transition (Online↔Stale) so a long outage doesn't spam at every
/// tick — operators get one warn on the way down and one info on the way
/// up.
pub fn refresh_health(map: &LibraryHealthMap, lib: &Library, had_data: bool) -> LibraryHealth {
    let new_state = probe_online(lib, had_data);
    let mut guard = map.write().unwrap_or_else(|e| e.into_inner());
    let prev = guard.get(&lib.id).cloned();
    let transitioned = matches!(
        (&prev, &new_state),
        (None, LibraryHealth::Stale { .. })
            | (Some(LibraryHealth::Online), LibraryHealth::Stale { .. })
            | (Some(LibraryHealth::Stale { .. }), LibraryHealth::Online)
    );
    if transitioned {
        match &new_state {
            LibraryHealth::Online => info!(
                "Library '{}' (id={}) recovered: {} is online",
                lib.name, lib.id, lib.root_path
            ),
            LibraryHealth::Stale { reason, .. } => warn!(
                "Library '{}' (id={}) is STALE — pausing writes. Reason: {}. Path: {}",
                lib.name, lib.id, reason, lib.root_path
            ),
        }
    }
    guard.insert(lib.id, new_state.clone());
    new_state
}

/// Snapshot of one library + its current health, for `/libraries`.
#[derive(serde::Serialize)]
pub struct LibraryStatus {
    #[serde(flatten)]
    pub library: Library,
    pub health: LibraryHealth,
}

#[derive(serde::Serialize)]
pub struct LibrariesResponse {
    pub libraries: Vec<LibraryStatus>,
    /// Globally-excluded paths/patterns from the `EXCLUDED_DIRS` env var.
    /// Applied **in union** with each library's own `excluded_dirs`. Surfaced
    /// here so an admin UI can show the operator "you already skip these
    /// everywhere" before they add per-library entries that would duplicate
    /// the global list. Read-only — globals live in `.env` and aren't
    /// mutable via the API today.
    pub global_excluded_dirs: Vec<String>,
    pub libraries: Vec<Library>,
}

#[get("/libraries")]
pub async fn list_libraries(_claims: Claims, app_state: Data<AppState>) -> impl Responder {
    // Read from the live view so a recent PATCH /libraries/{id} that
    // flipped `enabled` or rewrote `excluded_dirs` surfaces immediately
    // — the immutable `app_state.libraries` snapshot is stale once the
    // first mutation lands.
    let live_guard = app_state
        .live_libraries
        .read()
        .unwrap_or_else(|e| e.into_inner());
    let health_guard = app_state
        .library_health
        .read()
        .unwrap_or_else(|e| e.into_inner());
    let libraries = live_guard
        .iter()
        .map(|lib| LibraryStatus {
            library: lib.clone(),
            health: health_guard
                .get(&lib.id)
                .cloned()
                .unwrap_or(LibraryHealth::Online),
        })
        .collect();
    HttpResponse::Ok().json(LibrariesResponse {
        libraries,
        global_excluded_dirs: app_state.excluded_dirs.clone(),
        libraries: app_state.libraries.clone(),
    })
}

/// Body for PATCH /libraries/{id}. Both fields are optional — omitting
/// one leaves it untouched. `excluded_dirs` is the same comma-separated
/// shape as the DB column; an empty string clears (writes NULL).
#[derive(Deserialize, Debug)]
pub struct PatchLibraryBody {
    pub enabled: Option<bool>,
    pub excluded_dirs: Option<String>,
}

/// Mutate one library row. The watcher reads `app_state.live_libraries`
/// at the top of each tick, so a successful PATCH is picked up within
/// one WATCH_QUICK_INTERVAL_SECONDS without restart — no separate
/// `apply_now` signal. Returns the updated `Library` so the caller can
/// render the new state without a follow-up GET.
///
/// Despite CLAUDE.md noting "Toggle via SQL; there is intentionally no
/// HTTP endpoint for library mutation", we now expose this for Apollo's
/// Settings panel. The single-user trust model hasn't changed; the
/// endpoint just removes the SSH-and-sqlite3 step.
#[patch("/libraries/{id}")]
pub async fn patch_library(
    _claims: Claims,
    path: web::Path<i32>,
    body: web::Json<PatchLibraryBody>,
    app_state: Data<AppState>,
) -> impl Responder {
    let lib_id = path.into_inner();
    let body = body.into_inner();

    if body.enabled.is_none() && body.excluded_dirs.is_none() {
        return HttpResponse::UnprocessableEntity().body("empty patch body");
    }

    let mut conn = crate::database::connect();

    // Build the SET clause. Diesel's set() takes a tuple of assignments;
    // we apply each field independently so an absent field doesn't get
    // forced to NULL / its default.
    let mut affected = 0usize;
    if let Some(enabled) = body.enabled {
        match diesel::update(libraries::table.filter(libraries::id.eq(lib_id)))
            .set(libraries::enabled.eq(enabled))
            .execute(&mut conn)
        {
            Ok(n) => affected = affected.max(n),
            Err(e) => {
                warn!("PATCH /libraries/{}: enabled update failed: {:?}", lib_id, e);
                return HttpResponse::InternalServerError().body(format!("{}", e));
            }
        }
    }
    if let Some(raw) = body.excluded_dirs.as_deref() {
        // Canonicalise on write — trim, dedupe, validate, drop empties —
        // so the DB stores a round-trip-stable form regardless of how
        // messy the user typed it. Empty / whitespace-only → NULL
        // (matches a never-set library). Validation failures (Windows
        // backslash paths, drive letters, `..` traversal, etc.) bounce
        // back as 422 so the operator can fix the typo.
        let normalised = match normalize_excluded_dirs_input(raw) {
            Ok(v) => v,
            Err(msg) => return HttpResponse::UnprocessableEntity().body(msg),
        };
        let stored: Option<&str> = normalised.as_deref();
        match diesel::update(libraries::table.filter(libraries::id.eq(lib_id)))
            .set(libraries::excluded_dirs.eq(stored))
            .execute(&mut conn)
        {
            Ok(n) => affected = affected.max(n),
            Err(e) => {
                warn!(
                    "PATCH /libraries/{}: excluded_dirs update failed: {:?}",
                    lib_id, e
                );
                return HttpResponse::InternalServerError().body(format!("{}", e));
            }
        }
    }

    if affected == 0 {
        return HttpResponse::NotFound().body(format!("library id {} not found", lib_id));
    }

    // Refresh the live view from the canonical DB state. Reloading the
    // whole table (rather than mutating one entry in place) is cheap
    // (handful of rows) and keeps the in-memory and DB views trivially
    // consistent.
    let fresh = load_all(&mut conn);
    let updated = fresh.iter().find(|l| l.id == lib_id).cloned();
    {
        let mut live = app_state
            .live_libraries
            .write()
            .unwrap_or_else(|e| e.into_inner());
        *live = fresh;
    }

    match updated {
        Some(lib) => {
            info!(
                "PATCH /libraries/{}: enabled={:?} excluded_dirs={:?} → applied",
                lib_id, body.enabled, body.excluded_dirs
            );
            HttpResponse::Ok().json(lib)
        }
        None => HttpResponse::NotFound().body(format!("library id {} not found after update", lib_id)),
    }
}

#[cfg(test)]
mod tests {
    use super::*;
@@ -638,8 +192,6 @@ mod tests {
            id: 1,
            name: "main".into(),
            root_path: "/tmp/media".into(),
            enabled: true,
            excluded_dirs: Vec::new(),
        };
        let rel = lib.strip_root(Path::new("/tmp/media/2024/photo.jpg"));
        assert_eq!(rel.as_deref(), Some("2024/photo.jpg"));
@@ -653,8 +205,6 @@ mod tests {
            id: 1,
            name: "main".into(),
            root_path: "/tmp/media".into(),
            enabled: true,
            excluded_dirs: Vec::new(),
        };
        let abs = lib.resolve("2024/photo.jpg");
        assert_eq!(abs, PathBuf::from("/tmp/media/2024/photo.jpg"));
@@ -672,15 +222,11 @@ mod tests {
                id: 1,
                name: "main".into(),
                root_path: "/tmp/main".into(),
                enabled: true,
                excluded_dirs: Vec::new(),
            },
            Library {
                id: 7,
                name: "archive".into(),
                root_path: "/tmp/archive".into(),
                enabled: true,
                excluded_dirs: Vec::new(),
            },
        ]
    }
@@ -733,344 +279,4 @@ mod tests {
        let err = resolve_library_param(&state, Some("missing")).unwrap_err();
        assert!(err.contains("unknown library name"));
    }

    #[test]
    fn parse_excluded_dirs_column_handles_null_and_whitespace() {
        assert_eq!(parse_excluded_dirs_column(None), Vec::<String>::new());
        assert_eq!(parse_excluded_dirs_column(Some("")), Vec::<String>::new());
        assert_eq!(
            parse_excluded_dirs_column(Some(" /a , /b/sub , @eaDir ,, ")),
            vec!["/a".to_string(), "/b/sub".to_string(), "@eaDir".to_string()]
        );
    }

    #[test]
    fn parse_excluded_dirs_column_splits_on_newlines_too() {
        // Newline-separated input from a textarea submit. One-per-line
        // is the recommended UX because "I forgot the comma" was a
        // recurring footgun (.thumbnails .thumbnails2 silently
        // becomes a single never-matching pattern).
        assert_eq!(
            parse_excluded_dirs_column(Some("@eaDir\n.thumbnails\n/private")),
            vec![
                "@eaDir".to_string(),
                ".thumbnails".to_string(),
                "/private".to_string()
            ]
        );
        // Windows line endings (CRLF) — the carriage return is its own
        // separator so the trailing empty token between \r and \n gets
        // trimmed + dropped.
        assert_eq!(
            parse_excluded_dirs_column(Some("a\r\nb\r\nc")),
            vec!["a".to_string(), "b".to_string(), "c".to_string()]
        );
        // Mixed comma + newline — the user pastes from one source,
        // adds a few entries inline. Both work, in any combination.
        assert_eq!(
            parse_excluded_dirs_column(Some("a, b\nc,d")),
            vec![
                "a".to_string(),
                "b".to_string(),
                "c".to_string(),
                "d".to_string()
            ]
        );
    }

    #[test]
    fn effective_excluded_dirs_unions_global_and_per_library() {
        let lib_no_extras = Library {
            id: 1,
            name: "main".into(),
            root_path: "/x".into(),
            enabled: true,
            excluded_dirs: Vec::new(),
        };
        let globals = vec!["@eaDir".to_string(), ".thumbnails".to_string()];
        // Empty per-library excludes → exactly the globals.
        assert_eq!(lib_no_extras.effective_excluded_dirs(&globals), globals);

        let lib_with_extras = Library {
            id: 2,
            name: "archive".into(),
            root_path: "/y".into(),
            enabled: true,
            excluded_dirs: vec!["/photos".to_string()],
        };
        let combined = lib_with_extras.effective_excluded_dirs(&globals);
        assert!(combined.contains(&"@eaDir".to_string()));
        assert!(combined.contains(&".thumbnails".to_string()));
        assert!(combined.contains(&"/photos".to_string()));
        assert_eq!(combined.len(), 3);
    }

    #[test]
    fn effective_excluded_dirs_keeps_overlap_between_global_and_per_library() {
        // Two sources both excluding `@eaDir` is legal — `PathExcluder`
        // accepts repeats, and there's no behavioral reason to dedupe
        // here. Documents the design choice so a future refactor that
        // tightens this is forced to update both code and tests.
        let globals = vec!["@eaDir".to_string()];
        let lib = Library {
            id: 1,
            name: "main".into(),
            root_path: "/x".into(),
            enabled: true,
            excluded_dirs: vec!["@eaDir".to_string(), "/private".to_string()],
        };
        let combined = lib.effective_excluded_dirs(&globals);
        // 2 occurrences of @eaDir + /private = 3 entries total.
        assert_eq!(combined, vec!["@eaDir", "@eaDir", "/private"]);
    }

    #[test]
    fn normalize_excluded_dirs_input_handles_empty_and_whitespace() {
        assert_eq!(normalize_excluded_dirs_input(""), Ok(None));
        assert_eq!(normalize_excluded_dirs_input(" "), Ok(None));
        assert_eq!(normalize_excluded_dirs_input(",,,"), Ok(None));
        assert_eq!(normalize_excluded_dirs_input(" , , "), Ok(None));
    }

    #[test]
    fn normalize_excluded_dirs_input_trims_per_entry() {
        // Inner whitespace stripped on each item, comma-joined without
        // spaces. Mirrors how parse_excluded_dirs_column reads it back.
        assert_eq!(
            normalize_excluded_dirs_input(" @eaDir , /private , .thumbnails "),
            Ok(Some("@eaDir,/private,.thumbnails".to_string()))
        );
    }

    #[test]
    fn normalize_excluded_dirs_input_dedupes_preserving_first_occurrence() {
        // Exact-string duplicates collapse; the first occurrence wins
        // (preserves the operator's typed order so they recognise their
        // intent on round-trip).
        assert_eq!(
            normalize_excluded_dirs_input("@eaDir, /private, @eaDir, /private"),
            Ok(Some("@eaDir,/private".to_string()))
        );
        // Whitespace-distinct entries collapse to the same canonical
        // form. Case is preserved — `Foo` and `foo` are different keys
        // (filesystem case-sensitivity is platform-dependent; we don't
        // make that call here).
        assert_eq!(
            normalize_excluded_dirs_input(" Foo,foo, Foo "),
            Ok(Some("Foo,foo".to_string()))
        );
    }

    #[test]
    fn normalize_excluded_dirs_input_is_round_trip_stable() {
        // Writing the normaliser's output back through it yields the
        // same string. PATCH-clearing edits round-trip cleanly through
        // parse_excluded_dirs_column too.
        let raw = " /a/b ,, /a/b , c ";
        let once = normalize_excluded_dirs_input(raw)
            .expect("validation passes")
            .expect("not empty");
        let twice = normalize_excluded_dirs_input(&once)
            .expect("validation passes")
            .expect("not empty");
        assert_eq!(once, twice);
        // Parsing the stored form back gives the deduped Vec.
        assert_eq!(
            parse_excluded_dirs_column(Some(&once)),
            vec!["/a/b".to_string(), "c".to_string()]
        );
    }

    #[test]
    fn validate_rejects_backslash_paths() {
        // Windows-typed entries land in the component-pattern bucket
        // and never match — reject so the user gets feedback instead
        // of a silent no-op.
        assert!(validate_excluded_dirs_entry(r"\photos").is_err());
        assert!(validate_excluded_dirs_entry(r"photos\2024").is_err());
        assert!(validate_excluded_dirs_entry(r"\\server\share").is_err());
        // The error message names the entry and points at the fix.
        let err = validate_excluded_dirs_entry(r"\photos").unwrap_err();
        assert!(err.contains("forward slashes"), "{}", err);
    }

    #[test]
    fn validate_rejects_windows_drive_letters() {
        assert!(validate_excluded_dirs_entry("Z:/photos").is_err());
        assert!(validate_excluded_dirs_entry("z:photos").is_err());
        // Single-letter alpha + colon is the canonical drive prefix;
        // the message should steer toward the relative form.
        let err = validate_excluded_dirs_entry("Z:/foo").unwrap_err();
        assert!(err.contains("relative to the library root"), "{}", err);
    }

    #[test]
    fn validate_rejects_multi_segment_name_without_leading_slash() {
        // The common "I forgot the slash" typo. Today this would store
        // a never-matching component pattern; we catch it.
        let err = validate_excluded_dirs_entry("photos/2024").unwrap_err();
        assert!(err.contains("multi-segment"), "{}", err);
        // And the suggestion shows the corrected form.
        assert!(err.contains("/photos/2024"), "{}", err);
    }

    #[test]
    fn validate_rejects_parent_dir_traversal_in_path_entries() {
        // base.join("../sensitive") doesn't canonicalise, so the
        // resulting prefix never starts_with anything the walker sees.
        assert!(validate_excluded_dirs_entry("/../secret").is_err());
        assert!(validate_excluded_dirs_entry("/photos/../keys").is_err());
        // Same string as a non-leading-slash component is fine — it
        // just never matches (you'd literally need a directory named
        // `..` which is impossible on every filesystem we care about),
        // but the validator accepts it because the failure mode isn't
        // a silent footgun in that direction.
        assert!(validate_excluded_dirs_entry("..").is_ok());
    }

    #[test]
    fn validate_strips_trailing_slash_on_path_entries() {
        assert_eq!(
            validate_excluded_dirs_entry("/photos/").unwrap(),
            "/photos"
        );
        assert_eq!(
            validate_excluded_dirs_entry("/photos//").unwrap(),
            "/photos"
        );
        // Bare "/" is rejected — almost certainly a typo for the
        // library root.
        assert!(validate_excluded_dirs_entry("/").is_err());
        assert!(validate_excluded_dirs_entry("///").is_err());
    }

    #[test]
    fn validate_passes_valid_entries() {
        for entry in &[
            "/photos",
            "/photos/2024",
            "/media/raw/private",
            "@eaDir",
            ".thumbnails",
            ".DS_Store",
            "node_modules",
        ] {
            assert!(
                validate_excluded_dirs_entry(entry).is_ok(),
                "expected {} to pass",
                entry
            );
        }
    }

    #[test]
    fn normalize_aborts_on_invalid_entry() {
        // One bad entry kills the whole patch — better to surface the
        // problem than to silently apply N-1 of N changes.
        let err = normalize_excluded_dirs_input("/photos, photos/2024").unwrap_err();
        assert!(err.contains("photos/2024"), "{}", err);
        // A valid mix succeeds — the bad-entry test isn't accidentally
        // matching the good prefix.
        assert_eq!(
            normalize_excluded_dirs_input("/photos, @eaDir, /private/"),
            Ok(Some("/photos,@eaDir,/private".to_string()))
        );
    }

    fn probe_lib(id: i32, root: String) -> Library {
        Library {
            id,
            name: "main".into(),
            root_path: root,
            enabled: true,
            excluded_dirs: Vec::new(),
        }
    }

    #[test]
    fn probe_online_for_existing_non_empty_dir() {
        let tmp = tempfile::tempdir().unwrap();
        std::fs::write(tmp.path().join("photo.jpg"), b"hello").unwrap();
        let lib = probe_lib(1, tmp.path().to_string_lossy().into());
        // had_data doesn't matter when the dir has entries.
        assert!(probe_online(&lib, true).is_online());
        assert!(probe_online(&lib, false).is_online());
    }

    #[test]
    fn probe_stale_when_root_missing() {
        let lib = probe_lib(1, "/nonexistent/definitely/not/here".into());
        assert!(matches!(
            probe_online(&lib, false),
            LibraryHealth::Stale { .. }
        ));
    }

    #[test]
    fn probe_stale_when_root_is_a_file() {
        let tmp = tempfile::tempdir().unwrap();
        let file = tmp.path().join("not-a-dir");
        std::fs::write(&file, b"x").unwrap();
        let lib = probe_lib(1, file.to_string_lossy().into());
        assert!(matches!(
            probe_online(&lib, false),
            LibraryHealth::Stale { .. }
        ));
    }

#[test]
|
||||
fn probe_empty_dir_is_online_when_no_prior_data() {
|
||||
// Fresh mount: empty directory, no rows in image_exif. Accept it.
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
let lib = probe_lib(1, tmp.path().to_string_lossy().into());
|
||||
assert!(probe_online(&lib, false).is_online());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn probe_empty_dir_is_stale_when_prior_data_existed() {
|
||||
// The "share went offline" signal: directory exists but is empty,
|
||||
// and we know the library used to have content. Treat as Stale.
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
let lib = probe_lib(1, tmp.path().to_string_lossy().into());
|
||||
match probe_online(&lib, true) {
|
||||
LibraryHealth::Stale { reason, .. } => {
|
||||
assert!(reason.contains("empty"), "unexpected reason: {}", reason)
|
||||
}
|
||||
other => panic!("expected Stale, got {:?}", other),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn refresh_health_logs_only_on_transition() {
|
||||
// Smoke test: refresh_health updates the map and reports correctly.
|
||||
// (We can't easily assert on logs without a custom logger; the
|
||||
// important thing is that the state churns properly.)
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
let lib = Library {
|
||||
id: 42,
|
||||
name: "test".into(),
|
||||
root_path: tmp.path().to_string_lossy().into(),
|
||||
enabled: true,
|
||||
excluded_dirs: Vec::new(),
|
||||
};
|
||||
let map = new_health_map(&[lib.clone()]);
|
||||
|
||||
// First probe: empty dir, no prior data — Online.
|
||||
let s1 = refresh_health(&map, &lib, false);
|
||||
assert!(s1.is_online());
|
||||
|
||||
// Probe again with had_data=true on the same empty dir — Stale.
|
||||
let s2 = refresh_health(&map, &lib, true);
|
||||
assert!(matches!(s2, LibraryHealth::Stale { .. }));
|
||||
assert_eq!(
|
||||
map.read().unwrap().get(&lib.id).cloned(),
|
||||
Some(s2.clone()),
|
||||
"map should reflect the latest probe"
|
||||
);
|
||||
|
||||
// Recovery: drop a file and probe again.
|
||||
std::fs::write(tmp.path().join("photo.jpg"), b"x").unwrap();
|
||||
let s3 = refresh_health(&map, &lib, true);
|
||||
assert!(s3.is_online());
|
||||
}
|
||||
}
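The probe tests above pin down four behaviours: a missing root is Stale, a root that is a regular file is Stale, an empty directory is Online when the library never had data, and an empty directory is Stale when it did. A minimal sketch of a probe with those properties follows. The names `Probe` and `probe` are hypothetical, and the real `probe_online` takes a `Library` and returns `LibraryHealth`, so this is an illustration of the rule rather than the project's implementation.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical stand-in for the health enum exercised by the tests.
#[derive(Debug, PartialEq, Eq)]
enum Probe {
    Online,
    Stale(String),
}

/// Classify a library root. `had_data` says whether the index already
/// holds rows for this library.
fn probe(root: &Path, had_data: bool) -> Probe {
    // read_dir fails both for a missing root and for a root that is a
    // regular file, so one error arm covers both Stale cases.
    let mut entries = match fs::read_dir(root) {
        Ok(it) => it,
        Err(e) => return Probe::Stale(format!("cannot read root: {}", e)),
    };
    // An empty directory is only suspicious when content used to exist:
    // that is the "share went offline" signal.
    if entries.next().is_none() && had_data {
        return Probe::Stale("root is empty but library had data".to_string());
    }
    Probe::Online
}
```

Using `read_dir` for the existence check means the sketch needs a single filesystem call per probe; a separate `is_dir` check would work equally well.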
|
||||
|
||||
@@ -1,828 +0,0 @@
|
||||
//! Filesystem-backed maintenance of `image_exif`, the back-ref columns
|
||||
//! on hash-keyed tables, and orphan derived data.
|
||||
//!
|
||||
//! These passes are the operational implementation of the library
|
||||
//! handoff and orphan rules from CLAUDE.md → "Multi-library data
|
||||
//! model" / "Library availability and safety":
|
||||
//!
|
||||
//! 1. **Missing-file detection** — when a file disappears from disk
|
||||
//! but its `image_exif` row remains, the row is removed. Naturally
|
||||
//! implements the move case: when a user moves a file from lib-A
|
||||
//! to lib-B, the watcher's normal ingest creates the lib-B row;
|
||||
//! this pass eventually retires the lib-A row.
|
||||
//!
|
||||
//! 2. **Back-ref refresh** — hash-keyed rows (`face_detections` and,
|
||||
//! after Branch B, `tagged_photo` / `photo_insights`) carry a
|
||||
//! denormalized `(library_id, rel_path)` back-ref. After a move,
|
||||
//! that back-ref may point at a deleted row. The refresh pass
|
||||
//! finds rows whose `(library_id, rel_path)` no longer matches
|
||||
//! any `image_exif` row but whose `content_hash` does, and updates
|
||||
//! the back-ref to one of the surviving paths. Idempotent.
|
||||
//!
|
||||
//! 3. **Orphan GC** — when a `content_hash` no longer has any
|
||||
//! `image_exif` row referencing it, hash-keyed derived rows for
|
||||
//! that hash become eligible for deletion. To survive transient
|
||||
//! unmounts, the pass uses a **two-tick consensus rule**: a hash
|
||||
//! must be observed orphaned for two consecutive ticks AND every
|
||||
//! library must be online for both observations. The "marked but
|
||||
//! not yet deleted" state is held in memory; restarting the
|
||||
//! watcher resets it, which is fine: a pending hash is simply marked
//! again on the first tick after the restart and deleted on the tick
//! after that.
|
||||
//!
|
||||
//! Pass 1 is filesystem-dependent and gated on the per-library
|
||||
//! availability probe. Passes 2 and 3 are database-only but pass 3
|
||||
//! additionally requires every library to be online for the
|
||||
//! consensus window.
|
||||
|
||||
use std::collections::HashSet;
|
||||
use std::path::Path;
|
||||
use std::sync::{Arc, Mutex};
|
||||
|
||||
use diesel::prelude::*;
|
||||
use diesel::sql_query;
|
||||
use diesel::sqlite::SqliteConnection;
|
||||
use log::{debug, info, warn};
|
||||
|
||||
use crate::database::ExifDao;
|
||||
use crate::libraries::{Library, LibraryHealthMap};
|
||||
|
||||
/// Cap on missing-file deletions per library per tick. Prevents a
|
||||
/// pathological mount that returns "not found" for everything (e.g.
|
||||
/// case-sensitivity flip on a network share that the probe didn't
|
||||
/// catch) from wiping the entire image_exif table in one tick. Tunable
|
||||
/// via `IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK`.
|
||||
pub const DEFAULT_MISSING_DELETE_CAP: usize = 200;
|
||||
|
||||
/// Page size for the missing-file scan. We stat() the rows in this
/// batch (stopping early once the delete cap above is reached) and
/// delete only those that are confirmed-not-found. Tunable via
|
||||
/// `IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE`.
|
||||
pub const DEFAULT_SCAN_PAGE_SIZE: i64 = 500;
|
||||
|
||||
/// Scan a page of `image_exif` rows for `library`, stat() each one,
|
||||
/// and delete rows whose source file is gone. Returns
|
||||
/// `(deleted, next_offset)`. `next_offset` wraps to 0 when the page
|
||||
/// returned fewer rows than the page size, so the watcher cycles
|
||||
/// through the whole library across ticks.
|
||||
///
|
||||
/// Caller must already have confirmed the library is online — running
|
||||
/// against a Stale library would interpret every row as missing.
|
||||
pub fn detect_missing_files_for_library(
|
||||
context: &opentelemetry::Context,
|
||||
library: &Library,
|
||||
exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
|
||||
offset: i64,
|
||||
page_size: i64,
|
||||
delete_cap: usize,
|
||||
) -> (usize, i64) {
|
||||
let rows = {
|
||||
let mut dao = exif_dao.lock().expect("exif_dao poisoned");
|
||||
match dao.list_rel_paths_for_library_page(context, library.id, page_size, offset) {
|
||||
Ok(r) => r,
|
||||
Err(e) => {
|
||||
warn!(
|
||||
"missing-file scan: list page failed for library '{}' (offset={}): {:?}",
|
||||
library.name, offset, e
|
||||
);
|
||||
return (0, offset);
|
||||
}
|
||||
}
|
||||
};
|
||||
let n_returned = rows.len();
|
||||
// Wrap offset when we hit the end of the table — next tick starts
|
||||
// a fresh sweep. Doing it here rather than on the next call keeps
|
||||
// the offset accounting visible in one place.
|
||||
let next_offset = if (n_returned as i64) < page_size {
|
||||
0
|
||||
} else {
|
||||
offset + page_size
|
||||
};
|
||||
|
||||
if rows.is_empty() {
|
||||
return (0, next_offset);
|
||||
}
|
||||
|
||||
let root = Path::new(&library.root_path);
|
||||
let mut to_delete: Vec<String> = Vec::new();
|
||||
for (_id, rel_path) in &rows {
|
||||
if to_delete.len() >= delete_cap {
|
||||
break;
|
||||
}
|
||||
let abs = root.join(rel_path);
|
||||
match std::fs::metadata(&abs) {
|
||||
Ok(_) => {
|
||||
// File still exists — nothing to do.
|
||||
}
|
||||
Err(e) if e.kind() == std::io::ErrorKind::NotFound => {
|
||||
to_delete.push(rel_path.clone());
|
||||
}
|
||||
Err(e) => {
|
||||
// Permission denied / IO error / etc. — skip this row,
|
||||
// leave it for the next sweep. We never want a transient
|
||||
// FS hiccup to mass-delete metadata.
|
||||
debug!(
|
||||
"missing-file scan: stat() error for {:?}, skipping: {:?}",
|
||||
abs, e
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if to_delete.is_empty() {
|
||||
return (0, next_offset);
|
||||
}
|
||||
|
||||
let mut deleted = 0;
|
||||
{
|
||||
let mut dao = exif_dao.lock().expect("exif_dao poisoned");
|
||||
for rel_path in &to_delete {
|
||||
match dao.delete_exif_by_library(context, library.id, rel_path) {
|
||||
Ok(()) => deleted += 1,
|
||||
Err(e) => warn!(
|
||||
"missing-file scan: delete failed for ({}, {}): {:?}",
|
||||
library.id, rel_path, e
|
||||
),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if deleted > 0 {
|
||||
info!(
|
||||
"missing-file scan: removed {} stale image_exif row(s) from library '{}'",
|
||||
deleted, library.name
|
||||
);
|
||||
}
|
||||
|
||||
(deleted, next_offset)
|
||||
}
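Two small rules carry most of the safety in the scan above: only a definite `NotFound` counts as a missing file, and a short page wraps the offset back to 0. The sketch below isolates them as free functions. `FileState`, `classify`, and `next_offset` are illustrative names that do not appear in the diff.

```rust
use std::io::ErrorKind;
use std::path::Path;

/// Outcome of stat()-ing one indexed path.
#[derive(Debug, PartialEq, Eq)]
enum FileState {
    Present,
    Missing,
    Unknown,
}

/// Only a definite NotFound is reported as Missing. Any other error
/// (permission denied, I/O error) is Unknown, so a transient failure
/// never leads to a delete.
fn classify(path: &Path) -> FileState {
    match std::fs::metadata(path) {
        Ok(_) => FileState::Present,
        Err(e) if e.kind() == ErrorKind::NotFound => FileState::Missing,
        Err(_) => FileState::Unknown,
    }
}

/// A page shorter than `page_size` means the sweep reached the end of
/// the table, so the next tick starts over at offset 0.
fn next_offset(offset: i64, page_size: i64, n_returned: usize) -> i64 {
    if (n_returned as i64) < page_size {
        0
    } else {
        offset + page_size
    }
}
```

For example, with a page size of 500, a full page at offset 0 advances to 500, while a page of 12 rows at offset 500 wraps back to 0.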
|
||||
|
||||
/// Refresh the `(library_id, rel_path)` back-refs on hash-keyed
|
||||
/// tables. A back-ref is stale when:
|
||||
/// - its `content_hash` is non-null,
|
||||
/// - that hash is referenced by at least one `image_exif` row, but
|
||||
/// - the row's own `(library_id, rel_path)` does not appear in
|
||||
/// `image_exif`.
|
||||
///
|
||||
/// In that case, point the back-ref at any surviving image_exif row
|
||||
/// for the same hash. `face_detections` is the canonical case (it
|
||||
/// carries `library_id` + `rel_path` columns), and `photo_insights`
/// is handled the same way; `tagged_photo` carries only `rel_path`,
/// which is kept in sync here too by picking any surviving rel_path.
|
||||
///
|
||||
/// All-SQL, idempotent. Returns the number of rows updated.
|
||||
pub fn refresh_back_refs(conn: &mut SqliteConnection) -> usize {
|
||||
let mut total = 0usize;
|
||||
|
||||
// face_detections — back-ref is (library_id, rel_path). Repoint to
|
||||
// any surviving image_exif row carrying the same content_hash.
|
||||
let updated = sql_query(
|
||||
"UPDATE face_detections \
|
||||
SET library_id = ( \
|
||||
SELECT ie.library_id FROM image_exif ie \
|
||||
WHERE ie.content_hash = face_detections.content_hash \
|
||||
ORDER BY ie.id LIMIT 1 \
|
||||
), \
|
||||
rel_path = ( \
|
||||
SELECT ie.rel_path FROM image_exif ie \
|
||||
WHERE ie.content_hash = face_detections.content_hash \
|
||||
ORDER BY ie.id LIMIT 1 \
|
||||
) \
|
||||
WHERE EXISTS ( \
|
||||
SELECT 1 FROM image_exif ie \
|
||||
WHERE ie.content_hash = face_detections.content_hash \
|
||||
) \
|
||||
AND NOT EXISTS ( \
|
||||
SELECT 1 FROM image_exif ie \
|
||||
WHERE ie.library_id = face_detections.library_id \
|
||||
AND ie.rel_path = face_detections.rel_path \
|
||||
)",
|
||||
)
|
||||
.execute(conn)
|
||||
.unwrap_or_else(|e| {
|
||||
warn!("back-ref refresh: face_detections update failed: {:?}", e);
|
||||
0
|
||||
});
|
||||
total += updated;
|
||||
|
||||
// tagged_photo — only rel_path. Update to any surviving rel_path
|
||||
// for the same content_hash so the path-only DAO read still finds
|
||||
// tags after a move.
|
||||
let updated = sql_query(
|
||||
"UPDATE tagged_photo \
|
||||
SET rel_path = ( \
|
||||
SELECT ie.rel_path FROM image_exif ie \
|
||||
WHERE ie.content_hash = tagged_photo.content_hash \
|
||||
ORDER BY ie.id LIMIT 1 \
|
||||
) \
|
||||
WHERE content_hash IS NOT NULL \
|
||||
AND EXISTS ( \
|
||||
SELECT 1 FROM image_exif ie \
|
||||
WHERE ie.content_hash = tagged_photo.content_hash \
|
||||
) \
|
||||
AND NOT EXISTS ( \
|
||||
SELECT 1 FROM image_exif ie \
|
||||
WHERE ie.rel_path = tagged_photo.rel_path \
|
||||
)",
|
||||
)
|
||||
.execute(conn)
|
||||
.unwrap_or_else(|e| {
|
||||
warn!("back-ref refresh: tagged_photo update failed: {:?}", e);
|
||||
0
|
||||
});
|
||||
total += updated;
|
||||
|
||||
// photo_insights — has both library_id and rel_path. Update both
|
||||
// when the (library_id, rel_path) tuple no longer matches any
|
||||
// image_exif row but the hash does.
|
||||
let updated = sql_query(
|
||||
"UPDATE photo_insights \
|
||||
SET library_id = ( \
|
||||
SELECT ie.library_id FROM image_exif ie \
|
||||
WHERE ie.content_hash = photo_insights.content_hash \
|
||||
ORDER BY ie.id LIMIT 1 \
|
||||
), \
|
||||
rel_path = ( \
|
||||
SELECT ie.rel_path FROM image_exif ie \
|
||||
WHERE ie.content_hash = photo_insights.content_hash \
|
||||
ORDER BY ie.id LIMIT 1 \
|
||||
) \
|
||||
WHERE content_hash IS NOT NULL \
|
||||
AND EXISTS ( \
|
||||
SELECT 1 FROM image_exif ie \
|
||||
WHERE ie.content_hash = photo_insights.content_hash \
|
||||
) \
|
||||
AND NOT EXISTS ( \
|
||||
SELECT 1 FROM image_exif ie \
|
||||
WHERE ie.library_id = photo_insights.library_id \
|
||||
AND ie.rel_path = photo_insights.rel_path \
|
||||
)",
|
||||
)
|
||||
.execute(conn)
|
||||
.unwrap_or_else(|e| {
|
||||
warn!("back-ref refresh: photo_insights update failed: {:?}", e);
|
||||
0
|
||||
});
|
||||
total += updated;
|
||||
|
||||
if total > 0 {
|
||||
info!("back-ref refresh: updated {} hash-keyed row(s)", total);
|
||||
}
|
||||
total
|
||||
}
|
||||
|
||||
/// One tick's outcome of the orphan-GC pass.
|
||||
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
|
||||
pub struct GcStats {
|
||||
/// Hashes newly observed orphaned this tick (added to the
|
||||
/// pending set).
|
||||
pub newly_marked: usize,
|
||||
/// Rows deleted per table. A hash's rows are deleted once it was
/// marked last tick, is still orphaned this tick, and every library
/// was online on both ticks.
|
||||
pub deleted_face_detections: usize,
|
||||
pub deleted_tagged_photo: usize,
|
||||
pub deleted_photo_insights: usize,
|
||||
/// Hashes dropped from the pending set because they re-appeared
|
||||
/// in image_exif (e.g. user remounted a backup that was briefly
|
||||
/// missing).
|
||||
pub revived: usize,
|
||||
}
|
||||
|
||||
impl GcStats {
|
||||
pub fn changed(&self) -> bool {
|
||||
self.newly_marked > 0
|
||||
|| self.deleted_face_detections > 0
|
||||
|| self.deleted_tagged_photo > 0
|
||||
|| self.deleted_photo_insights > 0
|
||||
|| self.revived > 0
|
||||
}
|
||||
|
||||
pub fn total_deleted(&self) -> usize {
|
||||
self.deleted_face_detections + self.deleted_tagged_photo + self.deleted_photo_insights
|
||||
}
|
||||
}
|
||||
|
||||
/// Two-tick orphan-GC state. The watcher constructs one of these once
|
||||
/// at startup and passes it back into `run_orphan_gc` every tick.
|
||||
#[derive(Debug, Default)]
|
||||
pub struct OrphanGcState {
|
||||
/// Hashes observed orphaned on the previous tick. A hash gets
|
||||
/// promoted to "delete" when it survives a second consecutive
|
||||
/// observation with all libraries online.
|
||||
pending: HashSet<String>,
|
||||
/// Whether every library was online on the previous tick. Combined
|
||||
/// with the all-online check on the current tick, this gives the
|
||||
/// "two consecutive ticks of full availability" guard described in
|
||||
/// CLAUDE.md → "Library availability and safety".
|
||||
prev_tick_all_online: bool,
|
||||
}
|
||||
|
||||
/// Run one tick of the orphan GC. The function is responsible for the
|
||||
/// full lifecycle: probing for orphans, updating `state.pending`,
|
||||
/// performing deletes when consensus is reached, and returning stats
|
||||
/// for the watcher to log.
|
||||
///
|
||||
/// Safety guard: `all_online` MUST reflect every configured library
|
||||
/// being Online right now. Even if true, deletes only happen when the
|
||||
/// previous tick was also all-online. A single Stale tick within the
|
||||
/// window cancels any pending deletes: the hashes stay marked, but
/// deleting them then takes two further consecutive all-online ticks.
|
||||
pub fn run_orphan_gc(
|
||||
conn: &mut SqliteConnection,
|
||||
state: &mut OrphanGcState,
|
||||
all_online: bool,
|
||||
) -> GcStats {
|
||||
let mut stats = GcStats::default();
|
||||
|
||||
// Find every distinct content_hash referenced by hash-keyed
|
||||
// derived data that is NOT currently referenced by image_exif.
|
||||
// These are this tick's orphan candidates. The query is cheap: three
// index lookups plus a HashSet sized by the derived tables' row
// count, which is small.
|
||||
let orphans: HashSet<String> = match collect_orphan_hashes(conn) {
|
||||
Ok(set) => set,
|
||||
Err(e) => {
|
||||
warn!("orphan-gc: candidate query failed: {:?}", e);
|
||||
return stats;
|
||||
}
|
||||
};
|
||||
|
||||
// Drop entries from pending that are no longer orphaned
|
||||
// ("revived"). Common case: a network share that briefly went
|
||||
// stale comes back, image_exif gets re-populated by ingest, and
|
||||
// the hash is no longer orphaned.
|
||||
let revived = state
|
||||
.pending
|
||||
.difference(&orphans)
|
||||
.cloned()
|
||||
.collect::<Vec<_>>();
|
||||
if !revived.is_empty() {
|
||||
for h in &revived {
|
||||
state.pending.remove(h);
|
||||
}
|
||||
stats.revived = revived.len();
|
||||
}
|
||||
|
||||
if !all_online {
|
||||
// Any Stale library cancels both the consensus window AND
|
||||
// any pending deletes. We *do* still note newly observed
|
||||
// orphans below — that's harmless bookkeeping. But we never
|
||||
// delete this tick.
|
||||
for h in &orphans {
|
||||
if state.pending.insert(h.clone()) {
|
||||
stats.newly_marked += 1;
|
||||
}
|
||||
}
|
||||
state.prev_tick_all_online = false;
|
||||
if stats.changed() {
|
||||
info!(
|
||||
"orphan-gc: {} new orphan hash(es) marked, {} revived (deferred — at least one library Stale; pending: {})",
|
||||
stats.newly_marked,
|
||||
stats.revived,
|
||||
state.pending.len()
|
||||
);
|
||||
} else {
|
||||
debug!(
|
||||
"orphan-gc: stale library, no changes (pending: {})",
|
||||
state.pending.len()
|
||||
);
|
||||
}
|
||||
return stats;
|
||||
}
|
||||
|
||||
// All-online + previous-tick-also-all-online: hashes that are
|
||||
// both pending AND still orphaned this tick are confirmed and
|
||||
// get deleted. Hashes orphaned this tick but not pending get
|
||||
// freshly marked.
|
||||
let consensus_window_open = state.prev_tick_all_online;
|
||||
|
||||
let to_delete: Vec<String> = if consensus_window_open {
|
||||
orphans
|
||||
.iter()
|
||||
.filter(|h| state.pending.contains(*h))
|
||||
.cloned()
|
||||
.collect()
|
||||
} else {
|
||||
Vec::new()
|
||||
};
|
||||
|
||||
for h in &orphans {
|
||||
if !state.pending.contains(h) {
|
||||
state.pending.insert(h.clone());
|
||||
stats.newly_marked += 1;
|
||||
}
|
||||
}
|
||||
|
||||
if !to_delete.is_empty() {
|
||||
match delete_hash_keyed_rows(conn, &to_delete) {
|
||||
Ok((faces, tags, insights)) => {
|
||||
stats.deleted_face_detections = faces;
|
||||
stats.deleted_tagged_photo = tags;
|
||||
stats.deleted_photo_insights = insights;
|
||||
// Drop deleted hashes from pending so we don't try to
|
||||
// re-delete them next tick (they'll have already been
|
||||
// removed from the orphan set).
|
||||
for h in &to_delete {
|
||||
state.pending.remove(h);
|
||||
}
|
||||
}
|
||||
Err(e) => warn!("orphan-gc: delete batch failed: {:?}", e),
|
||||
}
|
||||
}
|
||||
|
||||
state.prev_tick_all_online = true;
|
||||
|
||||
if stats.changed() {
|
||||
info!(
|
||||
"orphan-gc: {} new orphan hash(es) marked, {} revived; deleted {} face_detections / {} tagged_photo / {} photo_insights row(s) (pending: {})",
|
||||
stats.newly_marked,
|
||||
stats.revived,
|
||||
stats.deleted_face_detections,
|
||||
stats.deleted_tagged_photo,
|
||||
stats.deleted_photo_insights,
|
||||
state.pending.len(),
|
||||
);
|
||||
} else {
|
||||
debug!(
|
||||
"orphan-gc: no changes this tick (pending: {})",
|
||||
state.pending.len()
|
||||
);
|
||||
}
|
||||
|
||||
stats
|
||||
}
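The two-tick consensus rule is easier to follow without the database and logging around it. The sketch below models it purely in memory: the caller supplies this tick's orphan set, and the function returns the hashes confirmed for deletion. `TwoTickState` and `tick` are hypothetical names; the sketch assumes the caller actually deletes the returned hashes, so they drop out of the orphan set on the following tick.

```rust
use std::collections::HashSet;

/// In-memory model of the GC state: hashes seen orphaned last tick,
/// plus whether every library was online last tick.
#[derive(Debug, Default)]
struct TwoTickState {
    pending: HashSet<String>,
    prev_tick_all_online: bool,
}

/// Run one tick and return the hashes confirmed for deletion.
fn tick(state: &mut TwoTickState, orphans: &HashSet<String>, all_online: bool) -> Vec<String> {
    // Revived hashes: pending last tick, no longer orphaned now.
    state.pending.retain(|h| orphans.contains(h));

    // Confirm only when this tick AND the previous tick were all-online.
    let mut confirmed: Vec<String> = if all_online && state.prev_tick_all_online {
        orphans
            .iter()
            .filter(|h| state.pending.contains(*h))
            .cloned()
            .collect()
    } else {
        Vec::new()
    };
    confirmed.sort();

    // Mark everything orphaned this tick, then drop what was confirmed.
    state.pending.extend(orphans.iter().cloned());
    for h in &confirmed {
        state.pending.remove(h);
    }
    state.prev_tick_all_online = all_online;
    confirmed
}
```

Feeding the same orphan set through the sequence online, stale, online, online yields a deletion only on the fourth tick, which mirrors the `orphan_gc_resets_consensus_on_stale_library` test.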
|
||||
|
||||
/// Helper for the watcher: are *all enabled* libraries currently Online?
|
||||
///
|
||||
/// Disabled libraries are out-of-scope for the orphan-GC consensus
|
||||
/// rule — they don't get probed, don't have a health entry, and a
|
||||
/// system with one disabled library should still be able to GC
|
||||
/// orphans for the remaining online libraries. Treating disabled as
|
||||
/// "blocking" would mean flipping a library to `enabled=false` would
|
||||
/// permanently halt GC, which is the opposite of the intended kill-
|
||||
/// switch semantics ("turn this library off and let the rest of the
|
||||
/// system run normally").
|
||||
pub fn all_libraries_online(libs: &[Library], health: &LibraryHealthMap) -> bool {
|
||||
let guard = health.read().unwrap_or_else(|e| e.into_inner());
|
||||
libs.iter()
|
||||
.filter(|lib| lib.enabled)
|
||||
.all(|lib| guard.get(&lib.id).map(|h| h.is_online()).unwrap_or(false))
|
||||
}
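The helper's semantics can be restated with plain standard-library types: disabled libraries are filtered out before the lookup, and an enabled library with no health entry counts as not online. The sketch below assumes simplified stand-ins (`Lib`, `Health`, and a bare `HashMap`) in place of the project's `Library`, `LibraryHealth`, and lock-guarded `LibraryHealthMap`.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the health state.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Health {
    Online,
    Stale,
}

/// Simplified stand-in for a configured library.
struct Lib {
    id: i32,
    enabled: bool,
}

/// True when every *enabled* library has an Online entry. Disabled
/// libraries are skipped before the lookup, so their entries (or the
/// lack of one) never block the result.
fn all_enabled_online(libs: &[Lib], health: &HashMap<i32, Health>) -> bool {
    libs.iter()
        .filter(|lib| lib.enabled)
        .all(|lib| health.get(&lib.id) == Some(&Health::Online))
}
```

Note that with no enabled libraries at all, `all` over an empty iterator returns `true`; whether that edge case matters depends on how the watcher calls the helper.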
|
||||
|
||||
#[derive(QueryableByName, Debug)]
|
||||
struct HashRow {
|
||||
#[diesel(sql_type = diesel::sql_types::Text)]
|
||||
content_hash: String,
|
||||
}
|
||||
|
||||
fn collect_orphan_hashes(conn: &mut SqliteConnection) -> QueryResult<HashSet<String>> {
|
||||
// Union of every distinct content_hash carried by hash-keyed
|
||||
// derived tables, minus those still referenced by image_exif.
|
||||
let rows = sql_query(
|
||||
"SELECT DISTINCT content_hash FROM ( \
|
||||
SELECT content_hash FROM face_detections WHERE content_hash IS NOT NULL \
|
||||
UNION ALL \
|
||||
SELECT content_hash FROM tagged_photo WHERE content_hash IS NOT NULL \
|
||||
UNION ALL \
|
||||
SELECT content_hash FROM photo_insights WHERE content_hash IS NOT NULL \
|
||||
) AS derived \
|
||||
WHERE content_hash NOT IN ( \
|
||||
SELECT content_hash FROM image_exif WHERE content_hash IS NOT NULL \
|
||||
)",
|
||||
)
|
||||
.get_results::<HashRow>(conn)?;
|
||||
|
||||
Ok(rows.into_iter().map(|r| r.content_hash).collect())
|
||||
}
|
||||
|
||||
/// Delete every hash-keyed row whose `content_hash` is in `hashes`.
|
||||
/// Returns `(faces, tagged_photo, photo_insights)`.
|
||||
fn delete_hash_keyed_rows(
|
||||
conn: &mut SqliteConnection,
|
||||
hashes: &[String],
|
||||
) -> QueryResult<(usize, usize, usize)> {
|
||||
if hashes.is_empty() {
|
||||
return Ok((0, 0, 0));
|
||||
}
|
||||
|
||||
use crate::database::schema::{face_detections, photo_insights, tagged_photo};
|
||||
|
||||
let faces =
|
||||
diesel::delete(face_detections::table.filter(face_detections::content_hash.eq_any(hashes)))
|
||||
.execute(conn)?;
|
||||
let tags =
|
||||
diesel::delete(tagged_photo::table.filter(tagged_photo::content_hash.eq_any(hashes)))
|
||||
.execute(conn)?;
|
||||
let insights =
|
||||
diesel::delete(photo_insights::table.filter(photo_insights::content_hash.eq_any(hashes)))
|
||||
.execute(conn)?;
|
||||
|
||||
Ok((faces, tags, insights))
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::database::test::in_memory_db_connection;
|
||||
|
||||
fn ensure_library(conn: &mut SqliteConnection, library_id: i32) {
|
||||
diesel::sql_query(
|
||||
"INSERT OR IGNORE INTO libraries (id, name, root_path, created_at) \
|
||||
VALUES (?, 'test-' || ?, '/tmp/test-' || ?, 0)",
|
||||
)
|
||||
.bind::<diesel::sql_types::Integer, _>(library_id)
|
||||
.bind::<diesel::sql_types::Integer, _>(library_id)
|
||||
.bind::<diesel::sql_types::Integer, _>(library_id)
|
||||
.execute(conn)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn insert_image_exif(
|
||||
conn: &mut SqliteConnection,
|
||||
library_id: i32,
|
||||
rel_path: &str,
|
||||
content_hash: Option<&str>,
|
||||
) {
|
||||
ensure_library(conn, library_id);
|
||||
diesel::sql_query(
|
||||
"INSERT INTO image_exif (library_id, rel_path, created_time, last_modified, content_hash) \
|
||||
VALUES (?, ?, 0, 0, ?)",
|
||||
)
|
||||
.bind::<diesel::sql_types::Integer, _>(library_id)
|
||||
.bind::<diesel::sql_types::Text, _>(rel_path)
|
||||
.bind::<diesel::sql_types::Nullable<diesel::sql_types::Text>, _>(content_hash)
|
||||
.execute(conn)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn insert_face(conn: &mut SqliteConnection, library_id: i32, rel_path: &str, hash: &str) {
|
||||
ensure_library(conn, library_id);
|
||||
diesel::sql_query(
|
||||
"INSERT INTO face_detections (library_id, content_hash, rel_path, source, status, model_version, created_at) \
|
||||
VALUES (?, ?, ?, 'auto', 'no_faces', 'v', 0)",
|
||||
)
|
||||
.bind::<diesel::sql_types::Integer, _>(library_id)
|
||||
.bind::<diesel::sql_types::Text, _>(hash)
|
||||
.bind::<diesel::sql_types::Text, _>(rel_path)
|
||||
.execute(conn)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn insert_tag_with_hash(conn: &mut SqliteConnection, rel_path: &str, hash: &str) {
|
||||
diesel::sql_query("INSERT OR IGNORE INTO tags (id, name, created_time) VALUES (1, 't', 0)")
|
||||
.execute(conn)
|
||||
.unwrap();
|
||||
diesel::sql_query(
|
||||
"INSERT INTO tagged_photo (rel_path, tag_id, created_time, content_hash) VALUES (?, 1, 0, ?)",
|
||||
)
|
||||
.bind::<diesel::sql_types::Text, _>(rel_path)
|
||||
.bind::<diesel::sql_types::Text, _>(hash)
|
||||
.execute(conn)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn insert_insight_with_hash(
|
||||
conn: &mut SqliteConnection,
|
||||
library_id: i32,
|
||||
rel_path: &str,
|
||||
hash: &str,
|
||||
) {
|
||||
ensure_library(conn, library_id);
|
||||
diesel::sql_query(
|
||||
"INSERT INTO photo_insights (library_id, rel_path, title, summary, generated_at, model_version, is_current, backend, content_hash) \
|
||||
VALUES (?, ?, 't', 's', 0, 'v', 1, 'local', ?)",
|
||||
)
|
||||
.bind::<diesel::sql_types::Integer, _>(library_id)
|
||||
.bind::<diesel::sql_types::Text, _>(rel_path)
|
||||
.bind::<diesel::sql_types::Text, _>(hash)
|
||||
.execute(conn)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
#[derive(QueryableByName, Debug)]
|
||||
struct CountRow {
|
||||
#[diesel(sql_type = diesel::sql_types::BigInt)]
|
||||
n: i64,
|
||||
}
|
||||
fn count(conn: &mut SqliteConnection, sql: &str) -> i64 {
|
||||
diesel::sql_query(sql)
|
||||
.get_result::<CountRow>(conn)
|
||||
.unwrap()
|
||||
.n
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn refresh_back_refs_repoints_face_detection_after_move() {
|
||||
let mut conn = in_memory_db_connection();
|
||||
// Original location lib 1, rel "old.jpg". image_exif row gone
|
||||
// (file moved); only the new lib 2 row remains.
|
||||
insert_image_exif(&mut conn, 2, "new.jpg", Some("h1"));
|
||||
insert_face(&mut conn, 1, "old.jpg", "h1");
|
||||
|
||||
let updated = refresh_back_refs(&mut conn);
|
||||
assert_eq!(updated, 1);
|
||||
|
||||
let row = diesel::sql_query("SELECT library_id AS n FROM face_detections")
|
||||
.get_result::<CountRow>(&mut conn)
|
||||
.unwrap();
|
||||
assert_eq!(row.n, 2, "library_id should now point at lib 2");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn refresh_back_refs_no_change_when_back_ref_still_valid() {
|
||||
let mut conn = in_memory_db_connection();
|
||||
insert_image_exif(&mut conn, 1, "a.jpg", Some("h1"));
|
||||
insert_face(&mut conn, 1, "a.jpg", "h1");
|
||||
|
||||
let updated = refresh_back_refs(&mut conn);
|
||||
assert_eq!(updated, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn refresh_back_refs_no_change_when_hash_fully_orphaned() {
|
||||
// Hash exists on face_detections but no surviving image_exif
|
||||
// row for it → the refresh is a no-op (orphan GC handles
|
||||
// these). Important: the SET subquery would return NULL and
|
||||
// we'd null out the back-ref otherwise; the EXISTS guard
|
||||
// protects against that.
|
||||
let mut conn = in_memory_db_connection();
|
||||
insert_face(&mut conn, 1, "gone.jpg", "h1");
|
||||
|
||||
let updated = refresh_back_refs(&mut conn);
|
||||
assert_eq!(updated, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn orphan_gc_requires_two_consecutive_all_online_ticks() {
|
||||
let mut conn = in_memory_db_connection();
|
||||
// Hash present in face_detections but NOT image_exif → orphan.
|
||||
insert_face(&mut conn, 1, "x.jpg", "h-orphan");
|
||||
let mut state = OrphanGcState::default();
|
||||
|
||||
// Tick 1: prev_tick_all_online is false (default), so even
|
||||
// with current tick all-online we mark only.
|
||||
let stats = run_orphan_gc(&mut conn, &mut state, true);
|
||||
assert_eq!(stats.newly_marked, 1);
|
||||
assert_eq!(stats.total_deleted(), 0);
|
||||
assert_eq!(state.pending.len(), 1);
|
||||
|
||||
// Tick 2: prev_tick_all_online is now true, current tick still
|
||||
// all-online → consensus reached, hash gets deleted.
|
||||
let stats = run_orphan_gc(&mut conn, &mut state, true);
|
||||
assert_eq!(stats.deleted_face_detections, 1);
|
||||
assert!(state.pending.is_empty());
|
||||
|
||||
// Tick 3: nothing left.
|
||||
let stats = run_orphan_gc(&mut conn, &mut state, true);
|
||||
assert_eq!(stats.total_deleted(), 0);
|
||||
assert_eq!(stats.newly_marked, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn orphan_gc_resets_consensus_on_stale_library() {
|
||||
let mut conn = in_memory_db_connection();
|
||||
insert_face(&mut conn, 1, "x.jpg", "h-orphan");
|
||||
let mut state = OrphanGcState::default();
|
||||
|
||||
// Tick 1: all-online, mark.
|
||||
run_orphan_gc(&mut conn, &mut state, true);
|
||||
// Tick 2: stale library — consensus window resets, no delete.
|
||||
let stats = run_orphan_gc(&mut conn, &mut state, false);
|
||||
assert_eq!(stats.total_deleted(), 0);
|
||||
assert!(!state.prev_tick_all_online);
|
||||
// Tick 3: all-online again — but we need ANOTHER tick to set
|
||||
// prev_tick_all_online before deletes can fire. So tick 3
|
||||
// marks (no-op on existing pending), tick 4 deletes.
|
||||
let stats = run_orphan_gc(&mut conn, &mut state, true);
|
||||
assert_eq!(stats.total_deleted(), 0);
|
||||
let stats = run_orphan_gc(&mut conn, &mut state, true);
|
||||
assert_eq!(stats.deleted_face_detections, 1);
|
||||
}
|
||||
|
||||
    #[test]
    fn orphan_gc_revives_when_image_exif_reappears() {
        let mut conn = in_memory_db_connection();
        insert_face(&mut conn, 1, "x.jpg", "h-orphan");
        let mut state = OrphanGcState::default();

        // Tick 1: mark.
        run_orphan_gc(&mut conn, &mut state, true);
        assert!(state.pending.contains("h-orphan"));

        // Between ticks, the image_exif row reappears (e.g. backup
        // share was briefly stale). Hash is no longer orphaned.
        insert_image_exif(&mut conn, 2, "x.jpg", Some("h-orphan"));

        let stats = run_orphan_gc(&mut conn, &mut state, true);
        assert_eq!(stats.revived, 1);
        assert_eq!(stats.total_deleted(), 0);
        assert!(state.pending.is_empty());
    }

    #[test]
    fn orphan_gc_deletes_across_all_three_tables() {
        let mut conn = in_memory_db_connection();
        // Same orphan hash appears in all three derived tables.
        insert_face(&mut conn, 1, "a.jpg", "h-orphan");
        insert_tag_with_hash(&mut conn, "a.jpg", "h-orphan");
        insert_insight_with_hash(&mut conn, 1, "a.jpg", "h-orphan");

        let mut state = OrphanGcState::default();
        run_orphan_gc(&mut conn, &mut state, true);
        let stats = run_orphan_gc(&mut conn, &mut state, true);
        assert_eq!(stats.deleted_face_detections, 1);
        assert_eq!(stats.deleted_tagged_photo, 1);
        assert_eq!(stats.deleted_photo_insights, 1);

        assert_eq!(
            count(&mut conn, "SELECT COUNT(*) AS n FROM face_detections"),
            0
        );
        assert_eq!(
            count(&mut conn, "SELECT COUNT(*) AS n FROM tagged_photo"),
            0
        );
        assert_eq!(
            count(&mut conn, "SELECT COUNT(*) AS n FROM photo_insights"),
            0
        );
    }
|
||||
#[test]
|
||||
fn all_libraries_online_helper() {
|
||||
use crate::libraries::{LibraryHealth, new_health_map};
|
||||
let libs = vec![
|
||||
Library {
|
||||
id: 1,
|
||||
name: "a".into(),
|
||||
root_path: "/x".into(),
|
||||
enabled: true,
|
||||
excluded_dirs: Vec::new(),
|
||||
},
|
||||
Library {
|
||||
id: 2,
|
||||
name: "b".into(),
|
||||
root_path: "/y".into(),
|
||||
enabled: true,
|
||||
excluded_dirs: Vec::new(),
|
||||
},
|
||||
];
|
||||
let health = new_health_map(&libs);
|
||||
assert!(all_libraries_online(&libs, &health));
|
||||
|
||||
// Flip lib 2 to stale.
|
||||
{
|
||||
let mut g = health.write().unwrap();
|
||||
g.insert(
|
||||
2,
|
||||
LibraryHealth::Stale {
|
||||
reason: "test".into(),
|
||||
since: 0,
|
||||
},
|
||||
);
|
||||
}
|
||||
assert!(!all_libraries_online(&libs, &health));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn all_libraries_online_treats_disabled_as_out_of_scope() {
|
||||
use crate::libraries::{LibraryHealth, new_health_map};
|
||||
// lib 1 enabled+online, lib 2 disabled (would be treated as
|
||||
// Online in the health map's optimistic seed but the map
|
||||
// entry is irrelevant — disabled libs are filtered out
|
||||
// before the health lookup).
|
||||
let libs = vec![
|
||||
Library {
|
||||
id: 1,
|
||||
name: "a".into(),
|
||||
root_path: "/x".into(),
|
||||
enabled: true,
|
||||
excluded_dirs: Vec::new(),
|
||||
},
|
||||
Library {
|
||||
id: 2,
|
||||
name: "b".into(),
|
||||
root_path: "/y".into(),
|
||||
enabled: false,
|
||||
excluded_dirs: Vec::new(),
|
||||
},
|
||||
];
|
||||
let health = new_health_map(&libs);
|
||||
// Sanity: forcibly mark lib 2 stale to prove disabled wins
|
||||
// over even an explicit Stale entry — the filter skips it
|
||||
// before the health check happens.
|
||||
{
|
||||
let mut g = health.write().unwrap();
|
||||
g.insert(
|
||||
2,
|
||||
LibraryHealth::Stale {
|
||||
reason: "intentionally stale".into(),
|
||||
since: 0,
|
||||
},
|
||||
);
|
||||
}
|
||||
assert!(
|
||||
all_libraries_online(&libs, &health),
|
||||
"disabled library should not block consensus"
|
||||
);
|
||||
}
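The rule the two `all_libraries_online` tests pin down (disabled libraries are filtered out before any health lookup) can be shown with a small stand-alone sketch. The types below are hypothetical stand-ins, not the project's `Library` or `LibraryHealth`, and treating a missing health entry as "not online" is an assumption made only for this example.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the health enum used in the tests above.
#[derive(Debug, Clone, PartialEq)]
pub enum SketchHealth {
    Online,
    Stale { reason: String, since: i64 },
}

/// Hypothetical stand-in carrying only the fields the check needs.
pub struct SketchLibrary {
    pub id: i32,
    pub enabled: bool,
}

/// True when every *enabled* library is recorded as online.
pub fn sketch_all_online(
    libs: &[SketchLibrary],
    health: &RwLock<HashMap<i32, SketchHealth>>,
) -> bool {
    let guard = health.read().unwrap();
    libs.iter()
        // Disabled libraries are out of scope: drop them first, so even an
        // explicit Stale entry for them cannot block consensus.
        .filter(|lib| lib.enabled)
        // Assumption: a library with no entry counts as not online.
        .all(|lib| matches!(guard.get(&lib.id), Some(SketchHealth::Online)))
}
```

Filtering before the lookup keeps the health map free to hold whatever it likes for disabled libraries.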
|
||||
}
|
||||
+2534
-57
File diff suppressed because it is too large
+470
-204
@@ -1,18 +1,25 @@
 use actix_web::web::Data;
 use actix_web::{HttpRequest, HttpResponse, Responder, get, web};
-use chrono::{DateTime, FixedOffset, Local, NaiveDate, TimeZone, Utc};
+use chrono::LocalResult::{Ambiguous, Single};
+use chrono::{DateTime, Datelike, FixedOffset, Local, LocalResult, NaiveDate, TimeZone, Utc};
 use log::{debug, trace, warn};
 use opentelemetry::KeyValue;
 use opentelemetry::trace::{Span, Status, TraceContextExt, Tracer};
+use rayon::prelude::*;
 use serde::{Deserialize, Serialize};
+use std::collections::HashSet;
 use std::path::Path;
+use std::path::PathBuf;
 use std::sync::Mutex;
+use walkdir::WalkDir;

 use crate::data::Claims;
 use crate::database::ExifDao;
+use crate::files::is_image_or_video;
 use crate::libraries::Library;
 use crate::otel::{extract_context_from_request, global_tracer};
 use crate::state::AppState;
+use crate::utils::earliest_fs_time;

 // Helper that encapsulates path-exclusion semantics
 #[derive(Debug)]
@@ -132,16 +139,23 @@ pub struct MemoriesResponse {
     pub items: Vec<MemoryItem>,
 }

+/// Convert Unix timestamp to NaiveDate in client timezone
+fn timestamp_to_naive_date(
+    timestamp: i64,
+    client_timezone: &Option<FixedOffset>,
+) -> Option<NaiveDate> {
+    let dt_utc = DateTime::<Utc>::from_timestamp(timestamp, 0)?;
+
+    let date = if let Some(tz) = client_timezone {
+        dt_utc.with_timezone(tz).date_naive()
+    } else {
+        dt_utc.with_timezone(&Local).date_naive()
+    };
+
+    Some(date)
+}
+
 pub fn extract_date_from_filename(filename: &str) -> Option<DateTime<FixedOffset>> {
-    // Filenames carry only digits — no timezone. We deliberately interpret
-    // them as UTC so `.timestamp()` returns the wall-clock-as-UTC unix
-    // seconds, matching the "naive local reinterpreted as UTC" convention
-    // image_exif.date_taken uses for kamadak-exif DateTimeOriginal (which
-    // is also naive). Anything else (Local::from_local_datetime, the
-    // previous behavior) shifted filename-sourced dates by the SERVER's
-    // TZ offset relative to UTC, making them disagree with EXIF-sourced
-    // dates by hours and double-shifting through Apollo's photo matcher
-    // (which re-anchors naive-as-UTC via the browser TZ).
     let build_date_from_ymd_capture =
         |captures: &regex::Captures| -> Option<DateTime<FixedOffset>> {
             let year = captures.get(1)?.as_str().parse::<i32>().ok()?;
@@ -151,8 +165,16 @@ pub fn extract_date_from_filename(filename: &str) -> Option<DateTime<FixedOffset
             let min = captures.get(5)?.as_str().parse::<u32>().ok()?;
             let sec = captures.get(6)?.as_str().parse::<u32>().ok()?;

-            let naive = NaiveDate::from_ymd_opt(year, month, day)?.and_hms_opt(hour, min, sec)?;
-            Some(Utc.from_utc_datetime(&naive).fixed_offset())
+            match Local.from_local_datetime(
+                &NaiveDate::from_ymd_opt(year, month, day)?.and_hms_opt(hour, min, sec)?,
+            ) {
+                Single(dt) => Some(dt.fixed_offset()),
+                Ambiguous(early_dt, _) => Some(early_dt.fixed_offset()),
+                LocalResult::None => {
+                    warn!("Weird local date: {:?}", filename);
+                    None
+                }
+            }
         };

     // 1. Screenshot format: Screenshot_2014-06-01-20-44-50.png
@@ -206,37 +228,12 @@ pub fn extract_date_from_filename(filename: &str) -> Option<DateTime<FixedOffset
         let timestamp_str = captures.get(1)?.as_str();
         let len = timestamp_str.len();

-        // Snapchat used real unix-second filenames in its early era
-        // (e.g. `Snapchat-1383929602.jpg` = 2013-11-08), then switched to
-        // monotonic sequential IDs whose digits overlap plausible epoch
-        // ranges (`Snapchat-1021849065.mp4` truncates to 2002, actually
-        // saved 2021; `Snapchat-1751031586660373917.jpg` is 19 digits,
-        // truncates to 2002, actually 2016). Discriminate by:
-        // - exactly 10 captured digits AND post-2011-09-23 (launch) → real epoch
-        // - anything else under this prefix → sequential ID, fall through
-        // The Snapchat-launch floor catches the 10-digit-2002 case; the
-        // length=10 gate catches the multi-digit sequential IDs (which
-        // get truncated to 16 by the regex above).
-        let lower = filename.to_ascii_lowercase();
-        let is_snapchat = lower.starts_with("snapchat-");
-        if is_snapchat && len != 10 {
-            return None;
-        }
-
         // Skip autogenerated filenames that start with "10000" (e.g., 1000004178.jpg)
         // These are not timestamps but auto-generated file IDs
         if timestamp_str.starts_with("10000") {
             return None;
         }

-        // A leading zero rules out a real unix timestamp at any sane
-        // resolution (seconds since 2001-09-09, ms since 1970-01-01 are
-        // both 10+ digits with no leading zero). Filenames like
-        // `000227580005.jpg` are sequential scan IDs, not timestamps.
-        if timestamp_str.starts_with('0') {
-            return None;
-        }
-
         // Try milliseconds first (13 digits exactly)
         if len == 13
             && let Some(date_time) = timestamp_str
@@ -244,7 +241,6 @@ pub fn extract_date_from_filename(filename: &str) -> Option<DateTime<FixedOffset
                 .ok()
                 .and_then(DateTime::from_timestamp_millis)
                 .map(|naive_dt| naive_dt.fixed_offset())
-                .and_then(plausible_filename_date)
         {
             return Some(date_time);
         }
@@ -257,7 +253,6 @@ pub fn extract_date_from_filename(filename: &str) -> Option<DateTime<FixedOffset
                 .ok()
                 .and_then(|timestamp_secs| DateTime::from_timestamp(timestamp_secs, 0))
                 .map(|naive_dt| naive_dt.fixed_offset())
-                .and_then(plausible_filename_date)
         {
             return Some(date_time);
         }
@@ -269,15 +264,7 @@ pub fn extract_date_from_filename(filename: &str) -> Option<DateTime<FixedOffset
                 .ok()
                 .and_then(|timestamp_secs| DateTime::from_timestamp(timestamp_secs, 0))
                 .map(|naive_dt| naive_dt.fixed_offset())
-                .and_then(plausible_filename_date)
         {
-            // Snapchat launched 2011-09-23. A 10-digit Snapchat filename
-            // dated before that is a sequential ID (e.g.
-            // `Snapchat-1021849065.mp4` parses to 2002), not a real epoch.
-            const SNAPCHAT_LAUNCH_TS: i64 = 1_316_736_000;
-            if is_snapchat && date_time.timestamp() < SNAPCHAT_LAUNCH_TS {
-                return None;
-            }
             return Some(date_time);
         }

@@ -288,7 +275,6 @@ pub fn extract_date_from_filename(filename: &str) -> Option<DateTime<FixedOffset
                 .ok()
                 .and_then(DateTime::from_timestamp_millis)
                 .map(|naive_dt| naive_dt.fixed_offset())
-                .and_then(plausible_filename_date)
         {
             return Some(date_time);
         }
@@ -297,42 +283,232 @@ pub fn extract_date_from_filename(filename: &str) -> Option<DateTime<FixedOffset
|
||||
None
|
||||
}
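The numeric-filename guards discussed in `extract_date_from_filename` (the Snapchat length and launch-date floor, the `10000` prefix, the leading zero) can be tried out on their own. The sketch below is a simplified illustration under stated assumptions: `sketch_epoch_seconds` is a hypothetical helper, it receives the digit run already captured rather than running a regex, and it only covers the 10-digit seconds case. The launch constant is the one quoted in the diff.

```rust
/// 2011-09-23 00:00:00 UTC, the Snapchat launch floor quoted in the diff.
const SNAPCHAT_LAUNCH_TS: i64 = 1_316_736_000;

/// Hypothetical helper: decide whether `digits` (the numeric run captured
/// from `filename`) should be trusted as Unix seconds.
pub fn sketch_epoch_seconds(filename: &str, digits: &str) -> Option<i64> {
    let is_snapchat = filename.to_ascii_lowercase().starts_with("snapchat-");

    // Snapchat sequential IDs: anything that is not exactly 10 digits.
    if is_snapchat && digits.len() != 10 {
        return None;
    }
    // Auto-generated file IDs such as 1000004178.jpg.
    if digits.starts_with("10000") {
        return None;
    }
    // Sequential scan IDs such as 000227580005.jpg.
    if digits.starts_with('0') {
        return None;
    }
    // This sketch only handles the 10-digit (seconds) form.
    if digits.len() != 10 {
        return None;
    }
    let secs = digits.parse::<i64>().ok()?;
    // A 10-digit Snapchat value before launch is a sequential ID.
    if is_snapchat && secs < SNAPCHAT_LAUNCH_TS {
        return None;
    }
    Some(secs)
}
```

For instance, `sketch_epoch_seconds("Snapchat-1383929602.jpg", "1383929602")` is accepted, while the pre-launch `Snapchat-1021849065.mp4` is rejected.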
|
||||
|
||||
/// Sanity gate for filename-derived timestamps. Real photo capture dates
|
||||
/// live in a narrow window; values outside it are almost always sequential
|
||||
/// scan IDs (`000227580005.jpg` → 1970) or arbitrary numeric suffixes
|
||||
/// (`IMG_21323906751390.jpeg` → 2037) that the regex caught by accident.
|
||||
/// Rejecting them lets the date_resolver waterfall fall through to
|
||||
/// `fs_time`, which is a much better proxy for content age than a fake
|
||||
/// epoch date.
|
||||
fn plausible_filename_date(dt: DateTime<FixedOffset>) -> Option<DateTime<FixedOffset>> {
|
||||
use chrono::Datelike;
|
||||
let year = dt.year();
|
||||
// 1995 predates digital photography for most users; allowing one year
|
||||
// past `now` covers clock-skew on freshly-taken shots without letting
|
||||
// 2037 timestamps through.
|
||||
let max_year = Utc::now().year() + 1;
|
||||
if (1995..=max_year).contains(&year) {
|
||||
Some(dt)
|
||||
} else {
|
||||
None
|
||||
/// Get the canonical date for a memory with priority: filename → EXIF → metadata
|
||||
/// Returns (NaiveDate for matching, timestamp for display, modified timestamp)
|
||||
fn get_memory_date_with_priority(
|
||||
path: &Path,
|
||||
exif_date_taken: Option<i64>,
|
||||
client_timezone: &Option<FixedOffset>,
|
||||
) -> Option<(NaiveDate, Option<i64>, Option<i64>)> {
|
||||
// Read file metadata once
|
||||
let meta = std::fs::metadata(path).ok()?;
|
||||
|
||||
// Priority 1: Try to extract date from filename
|
||||
if let Some(filename_date) = path
|
||||
.file_name()
|
||||
.and_then(|f| f.to_str())
|
||||
.and_then(extract_date_from_filename)
|
||||
{
|
||||
// Convert to client timezone if specified
|
||||
let date_in_timezone = if let Some(tz) = client_timezone {
|
||||
filename_date.with_timezone(tz)
|
||||
} else {
|
||||
filename_date.with_timezone(&Local).fixed_offset()
|
||||
};
|
||||
|
||||
let timestamp = if let Some(tz) = client_timezone {
|
||||
filename_date.with_timezone(tz).timestamp()
|
||||
} else {
|
||||
filename_date.timestamp()
|
||||
};
|
||||
|
||||
let modified = meta.modified().ok().map(|t| {
|
||||
let utc: DateTime<Utc> = t.into();
|
||||
if let Some(tz) = client_timezone {
|
||||
utc.with_timezone(tz).timestamp()
|
||||
} else {
|
||||
utc.timestamp()
|
||||
}
|
||||
});
|
||||
|
||||
debug!(
|
||||
"Memory date from filename {:?} > {:?} = {:?}",
|
||||
path.file_name(),
|
||||
filename_date,
|
||||
date_in_timezone
|
||||
);
|
||||
return Some((date_in_timezone.date_naive(), Some(timestamp), modified));
|
||||
}
|
||||
|
||||
// Priority 2: Use EXIF date_taken if available
|
||||
if let Some(exif_timestamp) = exif_date_taken {
|
||||
let date = timestamp_to_naive_date(exif_timestamp, client_timezone)?;
|
||||
|
||||
let modified = meta.modified().ok().map(|t| {
|
||||
let utc: DateTime<Utc> = t.into();
|
||||
if let Some(tz) = client_timezone {
|
||||
utc.with_timezone(tz).timestamp()
|
||||
} else {
|
||||
utc.timestamp()
|
||||
}
|
||||
});
|
||||
|
||||
debug!("Memory date from EXIF {:?} = {:?}", path.file_name(), date);
|
||||
return Some((date, Some(exif_timestamp), modified));
|
||||
}
|
||||
|
||||
// Priority 3: Fall back to metadata (earlier of created/modified — see utils::earliest_fs_time)
|
||||
let system_time = earliest_fs_time(&meta)?;
|
||||
let dt_utc: DateTime<Utc> = system_time.into();
|
||||
|
||||
let date_in_timezone = if let Some(tz) = client_timezone {
|
||||
dt_utc.with_timezone(tz).date_naive()
|
||||
} else {
|
||||
dt_utc.with_timezone(&Local).date_naive()
|
||||
};
|
||||
|
||||
let created_timestamp = if let Some(tz) = client_timezone {
|
||||
dt_utc.with_timezone(tz).timestamp()
|
||||
} else {
|
||||
dt_utc.timestamp()
|
||||
};
|
||||
|
||||
let modified = meta.modified().ok().map(|t| {
|
||||
let utc: DateTime<Utc> = t.into();
|
||||
if let Some(tz) = client_timezone {
|
||||
utc.with_timezone(tz).timestamp()
|
||||
} else {
|
||||
utc.timestamp()
|
||||
}
|
||||
});
|
||||
|
||||
trace!("Fallback metadata create date = {:?}", date_in_timezone);
|
||||
Some((date_in_timezone, Some(created_timestamp), modified))
|
||||
}
|
||||
|
||||
/// Convert a `date_taken` Unix-seconds value to a `NaiveDate` in the
|
||||
/// client's local time. Falls back to server-local when the client didn't
|
||||
/// send a tz hint.
|
||||
fn date_in_client_tz(timestamp: i64, client_timezone: Option<FixedOffset>) -> Option<NaiveDate> {
|
||||
let dt = DateTime::from_timestamp(timestamp, 0)?;
|
||||
Some(match client_timezone {
|
||||
Some(tz) => dt.with_timezone(&tz).date_naive(),
|
||||
None => dt.with_timezone(&Local).date_naive(),
|
||||
})
|
||||
/// Collect memories from EXIF database
|
||||
fn collect_exif_memories(
|
||||
exif_dao: &Data<Mutex<Box<dyn ExifDao>>>,
|
||||
context: &opentelemetry::Context,
|
||||
base_path: &str,
|
||||
library_id: i32,
|
||||
now: NaiveDate,
|
||||
span_mode: MemoriesSpan,
|
||||
years_back: u32,
|
||||
client_timezone: &Option<FixedOffset>,
|
||||
path_excluder: &PathExcluder,
|
||||
) -> Vec<(MemoryItem, NaiveDate)> {
|
||||
// Query database for all files with date_taken
|
||||
let exif_records = match exif_dao.lock() {
|
||||
Ok(mut dao) => match dao.get_all_with_date_taken(context, Some(library_id)) {
|
||||
Ok(records) => records,
|
||||
Err(e) => {
|
||||
warn!("Failed to query EXIF database: {:?}", e);
|
||||
return Vec::new(); // Graceful fallback
|
||||
}
|
||||
},
|
||||
Err(e) => {
|
||||
warn!("Failed to lock EXIF DAO: {:?}", e);
|
||||
return Vec::new();
|
||||
}
|
||||
};
|
||||
|
||||
// Parallel processing with Rayon
|
||||
exif_records
|
||||
.par_iter()
|
||||
.filter_map(|(file_path, date_taken_ts)| {
|
||||
// Build full path
|
||||
let full_path = Path::new(base_path).join(file_path);
|
||||
|
||||
// Check exclusions
|
||||
if path_excluder.is_excluded(&full_path) {
|
||||
return None;
|
||||
}
|
||||
|
||||
// Verify file exists
|
||||
if !full_path.exists() || !full_path.is_file() {
|
||||
warn!("EXIF record exists but file not found: {:?}", full_path);
|
||||
return None;
|
||||
}
|
||||
|
||||
// Get date with priority: filename → EXIF → metadata
|
||||
// This ensures sorting and display use the same date source
|
||||
let (file_date, created, modified) =
|
||||
get_memory_date_with_priority(&full_path, Some(*date_taken_ts), client_timezone)?;
|
||||
|
||||
// Check if matches memory criteria
|
||||
if !is_memories_match(file_path, file_date, now, span_mode, years_back) {
|
||||
return None;
|
||||
}
|
||||
|
||||
Some((
|
||||
MemoryItem {
|
||||
path: file_path.clone(),
|
||||
created,
|
||||
modified,
|
||||
library_id,
|
||||
},
|
||||
file_date,
|
||||
))
|
||||
})
|
||||
.collect()
|
||||
}
|
||||
|
||||
/// Default lookback for `/memories`. The original 15-year cap pre-dated
|
||||
/// most of the imported libraries; bumped to 20 so users with deeper
|
||||
/// archives see those photos surface on the matching anniversary too.
|
||||
pub const DEFAULT_YEARS_BACK: i32 = 20;
|
||||
/// Collect memories from file system scan (for files not in EXIF DB)
|
||||
fn collect_filesystem_memories(
|
||||
base_path: &str,
|
||||
library_id: i32,
|
||||
path_excluder: &PathExcluder,
|
||||
skip_paths: &HashSet<PathBuf>,
|
||||
now: NaiveDate,
|
||||
span_mode: MemoriesSpan,
|
||||
years_back: u32,
|
||||
client_timezone: &Option<FixedOffset>,
|
||||
) -> Vec<(MemoryItem, NaiveDate)> {
|
||||
let base = Path::new(base_path);
|
||||
|
||||
let entries: Vec<_> = WalkDir::new(base)
|
||||
.into_iter()
|
||||
.filter_map(|e| e.ok())
|
||||
.filter(|e| {
|
||||
let path = e.path();
|
||||
|
||||
// Skip if already processed by EXIF query
|
||||
if skip_paths.contains(path) {
|
||||
return false;
|
||||
}
|
||||
|
||||
// Check exclusions
|
||||
if path_excluder.is_excluded(path) {
|
||||
return false;
|
||||
}
|
||||
|
||||
// Only process image/video files
|
||||
e.file_type().is_file() && is_image_or_video(path)
|
||||
})
|
||||
.collect();
|
||||
|
||||
entries
|
||||
.par_iter()
|
||||
.filter_map(|entry| {
|
||||
// Use unified date priority function (no EXIF for filesystem scan)
|
||||
let (file_date, created, modified) =
|
||||
get_memory_date_with_priority(entry.path(), None, client_timezone)?;
|
||||
|
||||
if is_memories_match(
|
||||
entry.path().to_str().unwrap_or("Unknown"),
|
||||
file_date,
|
||||
now,
|
||||
span_mode,
|
||||
years_back,
|
||||
) {
|
||||
let path_relative = entry.path().strip_prefix(base).ok()?.to_str()?.to_string();
|
||||
|
||||
Some((
|
||||
MemoryItem {
|
||||
path: path_relative,
|
||||
created,
|
||||
modified,
|
||||
library_id,
|
||||
},
|
||||
file_date,
|
||||
))
|
||||
} else {
|
||||
None
|
||||
}
|
||||
})
|
||||
.collect()
|
||||
}
|
||||
|
||||
#[get("/memories")]
|
||||
pub async fn list_memories(
|
||||
@@ -349,28 +525,32 @@ pub async fn list_memories(
|
||||
opentelemetry::Context::new().with_remote_span_context(span.span_context().clone());
|
||||
|
||||
let span_mode = q.span.unwrap_or(MemoriesSpan::Day);
|
||||
let span_token = match span_mode {
|
||||
MemoriesSpan::Day => "day",
|
||||
MemoriesSpan::Week => "week",
|
||||
MemoriesSpan::Month => "month",
|
||||
let years_back: u32 = 15;
|
||||
|
||||
// Create timezone from client offset, default to local timezone if not provided
|
||||
let client_timezone = match q.timezone_offset_minutes {
|
||||
Some(offset_mins) => {
|
||||
let offset_secs = offset_mins * 60;
|
||||
Some(
|
||||
FixedOffset::east_opt(offset_secs)
|
||||
.unwrap_or_else(|| FixedOffset::east_opt(0).unwrap()),
|
||||
)
|
||||
}
|
||||
None => None,
|
||||
};
|
||||
let years_back: i32 = DEFAULT_YEARS_BACK;
|
||||
|
||||
// The SQL filter expects a signed offset in minutes from UTC; default
|
||||
// 0 (UTC) when the client didn't send a hint. We also keep a chrono
|
||||
// `FixedOffset` for sorting/secondary-key date math in Rust below —
|
||||
// anchoring both sides on the same value keeps "what SQL matched" and
|
||||
// "what we sort by" consistent.
|
||||
let tz_offset_minutes = q.timezone_offset_minutes.unwrap_or(0);
|
||||
let client_timezone = q
|
||||
.timezone_offset_minutes
|
||||
.and_then(|offset_mins| FixedOffset::east_opt(offset_mins * 60));
|
||||
let now = if let Some(tz) = client_timezone {
|
||||
debug!("Client timezone: {:?}", tz);
|
||||
Utc::now().with_timezone(&tz).date_naive()
|
||||
} else {
|
||||
Local::now().date_naive()
|
||||
};
|
||||
|
||||
debug!(
|
||||
"list_memories: span={:?} tz_offset_min={} years_back={}",
|
||||
span_mode, tz_offset_minutes, years_back
|
||||
);
|
||||
debug!("Now: {:?}", now);
|
||||
|
||||
// Resolve the optional library filter. Unknown values are a 400; None
|
||||
// means "all libraries" — currently equivalent to the primary library
|
||||
// while only one is configured.
|
||||
let library = match crate::libraries::resolve_library_param(&app_state, q.library.as_deref()) {
|
||||
Ok(lib) => lib,
|
||||
Err(msg) => {
|
||||
@@ -378,96 +558,91 @@ pub async fn list_memories(
|
||||
return HttpResponse::BadRequest().body(msg);
|
||||
}
|
||||
};
|
||||
let libraries_to_scan: Vec<&crate::libraries::Library> = match library {
|
||||
// When `library` is `Some`, scope to that one library; otherwise union
|
||||
// across every configured library and let the results interleave.
|
||||
let libraries_to_scan: Vec<&Library> = match library {
|
||||
Some(lib) => vec![lib],
|
||||
None => app_state.libraries.iter().collect(),
|
||||
};
|
||||
|
||||
// (item, date) tuples — `date` is the canonical NaiveDate of the
|
||||
// memory in the client's tz, used as the primary sort key.
|
||||
let mut memories_with_dates: Vec<(MemoryItem, NaiveDate)> = Vec::new();
|
||||
|
||||
for lib in &libraries_to_scan {
|
||||
let base = Path::new(&lib.root_path);
|
||||
let effective = lib.effective_excluded_dirs(&app_state.excluded_dirs);
|
||||
let path_excluder = PathExcluder::new(base, &effective);
|
||||
let path_excluder = PathExcluder::new(base, &app_state.excluded_dirs);
|
||||
|
||||
let rows = match exif_dao.lock() {
|
||||
Ok(mut dao) => match dao.get_memories_in_window(
|
||||
&span_context,
|
||||
lib.id,
|
||||
span_token,
|
||||
years_back,
|
||||
tz_offset_minutes,
|
||||
) {
|
||||
Ok(rows) => rows,
|
||||
Err(e) => {
|
||||
warn!(
|
||||
"Failed to query memories for library '{}': {:?}",
|
||||
lib.name, e
|
||||
);
|
||||
continue;
|
||||
}
|
||||
},
|
||||
Err(e) => {
|
||||
warn!("Failed to lock EXIF DAO: {:?}", e);
|
||||
continue;
|
||||
}
|
||||
};
|
||||
let exif_memories = collect_exif_memories(
|
||||
&exif_dao,
|
||||
&span_context,
|
||||
&lib.root_path,
|
||||
lib.id,
|
||||
now,
|
||||
span_mode,
|
||||
years_back,
|
||||
&client_timezone,
|
||||
&path_excluder,
|
||||
);
|
||||
|
||||
for (rel_path, date_taken_ts, last_modified_ts) in rows {
|
||||
// Apply per-library exclusions in Rust — they're a small
|
||||
// set and pushing them into the SQL WHERE adds bind-param
|
||||
// gymnastics with no measurable win at this scale.
|
||||
let full_path = base.join(&rel_path);
|
||||
if path_excluder.is_excluded(&full_path) {
|
||||
trace!("Memory excluded by PathExcluder: {:?}", full_path);
|
||||
continue;
|
||||
}
|
||||
let exif_paths: HashSet<PathBuf> = exif_memories
|
||||
.iter()
|
||||
.map(|(item, _)| PathBuf::from(&lib.root_path).join(&item.path))
|
||||
.collect();
|
||||
|
||||
let Some(file_date) = date_in_client_tz(date_taken_ts, client_timezone) else {
|
||||
continue;
|
||||
};
|
||||
let fs_memories = collect_filesystem_memories(
|
||||
&lib.root_path,
|
||||
lib.id,
|
||||
&path_excluder,
|
||||
&exif_paths,
|
||||
now,
|
||||
span_mode,
|
||||
years_back,
|
||||
&client_timezone,
|
||||
);
|
||||
|
||||
memories_with_dates.push((
|
||||
MemoryItem {
|
||||
path: rel_path,
|
||||
created: Some(date_taken_ts),
|
||||
modified: Some(last_modified_ts),
|
||||
library_id: lib.id,
|
||||
},
|
||||
file_date,
|
||||
));
|
||||
}
|
||||
memories_with_dates.extend(exif_memories);
|
||||
memories_with_dates.extend(fs_memories);
|
||||
}
|
||||
|
||||
// Sort once over the merged result set. The SQL filter handles the
|
||||
// matching; sort order is purely UI concern.
|
||||
match span_mode {
|
||||
// Month: chronological — gives an "overview" feel.
|
||||
// Sort by absolute time for a more 'overview'
|
||||
MemoriesSpan::Month => memories_with_dates.sort_by(|a, b| a.1.cmp(&b.1)),
|
||||
// Week: full date then timestamp (oldest → newest).
|
||||
// For week span, sort by full date + timestamp (chronological)
|
||||
MemoriesSpan::Week => {
|
||||
memories_with_dates.sort_by(|a, b| {
|
||||
a.1.cmp(&b.1)
|
||||
.then_with(|| match (a.0.created, b.0.created) {
|
||||
(Some(at), Some(bt)) => at.cmp(&bt),
|
||||
// First, sort by full date (year, month, day)
|
||||
let date_cmp = a.1.cmp(&b.1);
|
||||
if date_cmp != std::cmp::Ordering::Equal {
|
||||
return date_cmp;
|
||||
}
|
||||
|
||||
// Then sort by full created timestamp (oldest to newest)
|
||||
match (a.0.created, b.0.created) {
|
||||
(Some(a_time), Some(b_time)) => a_time.cmp(&b_time),
|
||||
(Some(_), None) => std::cmp::Ordering::Less,
|
||||
(None, Some(_)) => std::cmp::Ordering::Greater,
|
||||
(None, None) => std::cmp::Ordering::Equal,
|
||||
}
|
||||
});
|
||||
}
|
||||
// For day span, sort by day of month then by time
|
||||
MemoriesSpan::Day => {
|
||||
memories_with_dates.sort_by(|a, b| {
|
||||
let day_comparison = a.1.day().cmp(&b.1.day());
|
||||
|
||||
if day_comparison == std::cmp::Ordering::Equal {
|
||||
match (a.0.created, b.0.created) {
|
||||
(Some(a_time), Some(b_time)) => a_time.cmp(&b_time),
|
||||
(Some(_), None) => std::cmp::Ordering::Less,
|
||||
(None, Some(_)) => std::cmp::Ordering::Greater,
|
||||
(None, None) => std::cmp::Ordering::Equal,
|
||||
})
|
||||
});
|
||||
}
|
||||
// Day: same calendar day across years, sub-sorted by timestamp.
|
||||
MemoriesSpan::Day => {
|
||||
memories_with_dates.sort_by(|a, b| match (a.0.created, b.0.created) {
|
||||
(Some(at), Some(bt)) => at.cmp(&bt),
|
||||
(Some(_), None) => std::cmp::Ordering::Less,
|
||||
(None, Some(_)) => std::cmp::Ordering::Greater,
|
||||
(None, None) => std::cmp::Ordering::Equal,
|
||||
}
|
||||
} else {
|
||||
day_comparison
|
||||
}
|
||||
});
|
||||
}
|
||||
}
|
||||
// Sort by day of the month and time (using the created timestamp)
|
||||
|
||||
let items: Vec<MemoryItem> = memories_with_dates.into_iter().map(|(m, _)| m).collect();
|
||||
|
||||
@@ -477,7 +652,13 @@ pub async fn list_memories(
|
||||
KeyValue::new("span", format!("{:?}", span_mode)),
|
||||
KeyValue::new("years_back", years_back.to_string()),
|
||||
KeyValue::new("result_count", items.len().to_string()),
|
||||
KeyValue::new("tz_offset_minutes", tz_offset_minutes.to_string()),
|
||||
KeyValue::new(
|
||||
"client_timezone",
|
||||
format!(
|
||||
"{:?}",
|
||||
client_timezone.unwrap_or_else(|| FixedOffset::east_opt(0).unwrap())
|
||||
),
|
||||
),
|
||||
KeyValue::new("excluded_dirs", format!("{:?}", app_state.excluded_dirs)),
|
||||
],
|
||||
);
|
||||
@@ -486,10 +667,50 @@ pub async fn list_memories(
|
||||
HttpResponse::Ok().json(MemoriesResponse { items })
|
||||
}
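The handler above turns a client-supplied offset in minutes into a calendar date for matching and sorting. To make that arithmetic concrete without pulling in chrono, here is a std-only sketch. Everything in it is illustrative: `sketch_civil_from_days` and `sketch_date_in_client_tz` are hypothetical names, a missing hint is treated as UTC (an assumption of this example), and offsets of 24 hours or more are rejected on the assumption that this matches the bound `FixedOffset::east_opt` enforces.

```rust
/// Days since 1970-01-01 to (year, month, day) in the proleptic Gregorian
/// calendar, using the widely used "civil from days" algorithm.
fn sketch_civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // March-based month index
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Hypothetical helper: calendar date of `timestamp` (Unix seconds) as seen
/// by a client `offset_minutes` east of UTC.
pub fn sketch_date_in_client_tz(
    timestamp: i64,
    offset_minutes: Option<i32>,
) -> Option<(i64, u32, u32)> {
    // Assumption: no hint means UTC in this sketch.
    let offset = offset_minutes.unwrap_or(0);
    if offset <= -1_440 || offset >= 1_440 {
        return None; // out-of-range offset
    }
    let local_secs = timestamp + i64::from(offset) * 60;
    // div_euclid rounds toward negative infinity, so pre-1970 instants
    // still land on the correct day.
    Some(sketch_civil_from_days(local_secs.div_euclid(86_400)))
}
```

The same instant can fall on different calendar days for different clients, which is why the date used for matching and the date used for sorting should come from the same offset.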
|
||||
|
||||
fn is_memories_match(
|
||||
file_path: &str,
|
||||
file_date: NaiveDate,
|
||||
today: NaiveDate,
|
||||
span: MemoriesSpan,
|
||||
years_back: u32,
|
||||
) -> bool {
|
||||
if file_date > today {
|
||||
return false;
|
||||
}
|
||||
let years_diff = (today.year() - file_date.year()).unsigned_abs();
|
||||
if years_diff > years_back {
|
||||
warn!(
|
||||
"File ({}) date is too far in the past: {:?} vs {:?}",
|
||||
file_path, file_date, today
|
||||
);
|
||||
return false;
|
||||
}
|
||||
|
||||
match span {
|
||||
MemoriesSpan::Day => same_month_day_any_year(file_date, today),
|
||||
MemoriesSpan::Week => same_week_any_year(file_date, today),
|
||||
MemoriesSpan::Month => same_month_any_year(file_date, today),
|
||||
}
|
||||
}
|
||||
|
||||
fn same_month_day_any_year(a: NaiveDate, b: NaiveDate) -> bool {
|
||||
a.month() == b.month() && a.day() == b.day()
|
||||
}
|
||||
|
||||
// Match same ISO week number and same weekday (ignoring year)
|
||||
fn same_week_any_year(a: NaiveDate, b: NaiveDate) -> bool {
|
||||
a.iso_week().week().eq(&b.iso_week().week())
|
||||
}
|
||||
|
||||
// Match same month (ignoring day and year)
|
||||
fn same_month_any_year(a: NaiveDate, b: NaiveDate) -> bool {
|
||||
a.month() == b.month()
|
||||
}
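The matching rules in `is_memories_match` for the day and month spans reduce to a few field comparisons. The sketch below restates them with a plain struct so it runs without chrono; `SketchDate`, `SketchSpan`, and `sketch_is_match` are hypothetical names, and the week span is left out because it needs ISO week arithmetic.

```rust
/// Hypothetical calendar date. Field order matters: the derived ordering
/// compares year, then month, then day, which is chronological order.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct SketchDate {
    pub year: i32,
    pub month: u32,
    pub day: u32,
}

#[derive(Clone, Copy, Debug)]
pub enum SketchSpan {
    Day,
    Month,
}

/// True when `file` is an anniversary of `today` within `years_back` years.
pub fn sketch_is_match(
    file: SketchDate,
    today: SketchDate,
    span: SketchSpan,
    years_back: u32,
) -> bool {
    // Dates in the future never match.
    if file > today {
        return false;
    }
    // Enforce the lookback window on the calendar-year difference.
    let years_diff = (today.year - file.year).unsigned_abs();
    if years_diff > years_back {
        return false;
    }
    match span {
        SketchSpan::Day => file.month == today.month && file.day == today.day,
        SketchSpan::Month => file.month == today.month,
    }
}
```

Note that, as in the code above, a file dated today itself also matches, since the year difference is zero.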
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use chrono::{Datelike, Timelike};
|
||||
use chrono::Timelike;
|
||||
use std::fs::{self, File};
|
||||
use tempfile::tempdir;
|
||||
|
||||
@@ -648,53 +869,98 @@ mod tests {
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_date_from_filename_leading_zero_scan_id_should_not_match() {
|
||||
// Sequential film-scan IDs like 000227580005.jpg parsed as a 12-digit
|
||||
// ms timestamp resolve to 1970-01-03; the leading zero rules out a
|
||||
// real epoch value at any sane resolution. Resolver should fall
|
||||
// through to fs_time instead of pinning the photo to 1970.
|
||||
assert!(extract_date_from_filename("000227580005.jpg").is_none());
|
||||
fn test_memory_date_priority_filename() {
|
||||
let temp_dir = tempdir().unwrap();
|
||||
let temp_file = temp_dir.path().join("Screenshot_2014-06-01-20-44-50.png");
|
||||
File::create(&temp_file).unwrap();
|
||||
|
||||
// Test that filename takes priority (even with EXIF data available)
|
||||
let exif_date = DateTime::<Utc>::from_timestamp(1609459200, 0) // 2021-01-01
|
||||
.unwrap()
|
||||
.timestamp();
|
||||
|
||||
let (date, created, _) = get_memory_date_with_priority(
|
||||
&temp_file,
|
||||
Some(exif_date),
|
||||
&Some(*Local::now().fixed_offset().offset()),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// Check that date is from filename (2014), NOT EXIF (2021)
|
||||
assert_eq!(date.year(), 2014);
|
||||
assert_eq!(date.month(), 6);
|
||||
assert_eq!(date.day(), 1);
|
||||
|
||||
// Check that created timestamp matches the date from filename
|
||||
assert!(created.is_some());
|
||||
let ts = created.unwrap();
|
||||
// The timestamp should be for 2014-06-01 20:44:50 in the LOCAL timezone
|
||||
let dt_from_ts = Local.timestamp_opt(ts, 0).unwrap();
|
||||
assert_eq!(dt_from_ts.year(), 2014);
|
||||
assert_eq!(dt_from_ts.month(), 6);
|
||||
assert_eq!(dt_from_ts.day(), 1);
|
||||
assert_eq!(dt_from_ts.hour(), 20);
|
||||
assert_eq!(dt_from_ts.minute(), 44);
|
||||
assert_eq!(dt_from_ts.second(), 50);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_date_from_filename_far_future_should_not_match() {
|
||||
// IMG_21323906751390.jpeg → first 10 digits = 2132390675 → 2037.
|
||||
// Plausibility gate rejects it so the resolver falls through to
|
||||
// fs_time (which carries the real ingest date).
|
||||
assert!(extract_date_from_filename("IMG_21323906751390.jpeg").is_none());
|
||||
fn test_memory_date_priority_metadata_fallback() {
|
||||
let temp_dir = tempdir().unwrap();
|
||||
let temp_file = temp_dir.path().join("regular_image.jpg");
|
||||
File::create(&temp_file).unwrap();
|
||||
|
||||
// Test metadata fallback when no filename date or EXIF
|
||||
let (date, created, modified) =
|
||||
get_memory_date_with_priority(&temp_file, None, &None).unwrap();
|
||||
|
||||
// Both date and timestamps should be from metadata (recent)
|
||||
let today = Local::now().date_naive();
|
||||
assert_eq!(date.year(), today.year());
|
||||
assert_eq!(date.month(), today.month());
|
||||
|
||||
// Both timestamps should be valid
|
||||
assert!(created.is_some());
|
||||
assert!(modified.is_some());
|
||||
|
||||
// Check that timestamps are recent
|
||||
let dt_created = DateTime::<Utc>::from_timestamp(created.unwrap(), 0).unwrap();
|
||||
assert_eq!(dt_created.year(), today.year());
|
||||
|
||||
let dt_modified = DateTime::<Utc>::from_timestamp(modified.unwrap(), 0).unwrap();
|
||||
assert_eq!(dt_modified.year(), today.year());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_date_from_filename_snapchat_sequential_ids_rejected() {
|
||||
// Modern Snapchat-prefixed filenames carry sequential app-assigned
// IDs whose digits can land inside a plausible epoch range when
// truncated. Reported cases (real save dates per FileModifyDate):
//   Snapchat-1021849065.mp4 → 10 digits → 2002-05-19 (saved 2021)
//   Snapchat-1751031586660373917.jpg → 19 digits → 2002-09-09 (saved 2016)
// We discriminate by length plus a Snapchat-launch floor: a name is
// treated as a real unix epoch only when it has exactly 10 digits AND
// decodes to a time after 2011-09-23 (Snapchat launch). Anything else
// falls through to fs_time.
|
||||
assert!(extract_date_from_filename("Snapchat-1021849065.mp4").is_none());
|
||||
assert!(extract_date_from_filename("Snapchat-1751031586660373917.jpg").is_none());
|
||||
// Case-insensitive match — lowercase variant should also reject.
|
||||
assert!(extract_date_from_filename("snapchat-1021849065.mp4").is_none());
|
||||
}
|
||||
fn test_memory_date_priority_exif_over_metadata() {
|
||||
let temp_dir = tempdir().unwrap();
|
||||
let temp_file = temp_dir.path().join("regular_image.jpg");
|
||||
File::create(&temp_file).unwrap();
|
||||
|
||||
#[test]
|
||||
fn test_extract_date_from_filename_snapchat_early_era_unix_epoch() {
|
||||
// Early Snapchat (2013-2014ish) wrote real unix-second filenames.
|
||||
// Snapchat-1383929602.jpg → 1383929602 = 2013-11-08 16:53:22 UTC.
|
||||
// The blanket-prefix denial introduced for sequential IDs broke
|
||||
// these — restore via a length=10 + post-launch sanity gate.
|
||||
let date_time = extract_date_from_filename("Snapchat-1383929602.jpg").unwrap();
|
||||
assert_eq!(date_time.timestamp(), 1383929602);
|
||||
}
|
||||
// Test that EXIF takes priority over metadata (but not filename)
|
||||
// EXIF date: June 15, 2020 12:00:00 UTC (safe from timezone edge cases)
|
||||
let exif_date = DateTime::<Utc>::from_timestamp(1592222400, 0) // 2020-06-15 12:00:00 UTC
|
||||
.unwrap()
|
||||
.timestamp();
|
||||
|
||||
// The obsolete `test_memory_date_priority_*` tests covered the old
|
||||
// request-time waterfall in `get_memory_date_with_priority`. Their
|
||||
// replacement lives in `crate::date_resolver::tests` (resolver
|
||||
// waterfall) and the SQL surface is exercised by integration tests
|
||||
// that hit `get_memories_in_window` directly.
|
||||
let (date, created, modified) =
|
||||
get_memory_date_with_priority(&temp_file, Some(exif_date), &None).unwrap();
|
||||
|
||||
// Date should be from EXIF (2020), not metadata (today)
|
||||
assert_eq!(date.year(), 2020);
|
||||
assert_eq!(date.month(), 6);
|
||||
assert_eq!(date.day(), 15);
|
||||
|
||||
// Created timestamp should also be from EXIF
|
||||
assert!(created.is_some());
|
||||
assert_eq!(created.unwrap(), exif_date);
|
||||
|
||||
// Modified should still be from metadata
|
||||
assert!(modified.is_some());
|
||||
let today = Local::now().date_naive();
|
||||
let dt_modified = DateTime::<Utc>::from_timestamp(modified.unwrap(), 0).unwrap();
|
||||
assert_eq!(dt_modified.year(), today.year());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_path_excluder_absolute_under_base() {
|
||||
|
||||
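The Snapchat tests above describe a two-part gate: exactly 10 digits, and a decoded time no earlier than the Snapchat launch date. A minimal std-only sketch of that rule follows. The function name `snapchat_epoch_secs`, the constant name, and the extension handling are assumptions for illustration and are not taken from the project's `extract_date_from_filename`; the separate far-future plausibility gate mentioned in the tests is not modeled here.

```rust
/// 2011-09-23 00:00:00 UTC, the Snapchat launch date used as a floor.
const SNAPCHAT_LAUNCH_EPOCH: i64 = 1_316_736_000;

/// Hypothetical helper: return the unix-second timestamp embedded in a
/// `Snapchat-<digits>.<ext>` filename, or `None` when the digits look like
/// a sequential app-assigned ID rather than a real epoch.
fn snapchat_epoch_secs(filename: &str) -> Option<i64> {
    // Case-insensitive prefix match.
    let lower = filename.to_ascii_lowercase();
    let rest = lower.strip_prefix("snapchat-")?;
    // Everything before the first '.' is the numeric stem.
    let stem = rest.split('.').next()?;
    // Length gate: only exactly 10 ASCII digits qualify.
    if stem.len() != 10 || !stem.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let secs: i64 = stem.parse().ok()?;
    // Floor gate: reject anything that decodes to before the launch date.
    (secs >= SNAPCHAT_LAUNCH_EPOCH).then_some(secs)
}
```

Under these assumptions, `Snapchat-1383929602.jpg` passes both gates, while `Snapchat-1021849065.mp4` fails the floor and the 19-digit name fails the length check.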
@@ -1,159 +0,0 @@
|
||||
//! Perceptual image hashing for near-duplicate detection.
|
||||
//!
|
||||
//! Two 64-bit signals per image, packed into i64 for storage and fast
|
||||
//! Hamming distance via XOR + popcount:
|
||||
//!
|
||||
//! - **pHash (DCT)** — robust to lossy recompression, format conversion,
|
||||
//! moderate brightness/contrast shifts. The primary signal.
|
||||
//! - **dHash (gradient)** — much cheaper to compute, robust to scaling
|
||||
//! and small crops. Acts as a fallback / corroboration when pHash is
|
||||
//! ambiguous (very flat images can collide).
|
||||
//!
|
||||
//! Image-only by design. Videos, decode failures, and any image we
|
||||
//! can't open all return `None` — perceptual hash failure is non-fatal
|
||||
//! and must not block the indexer; the file is still hashed by blake3
|
||||
//! and exact-match dedup keeps working.
|
||||
|
||||
use std::path::Path;
|
||||
|
||||
use image_hasher::{HashAlg, HasherConfig};
|
||||
|
||||
/// 64-bit perceptual fingerprint pair.
|
||||
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
|
||||
pub struct PerceptualIdentity {
|
||||
pub phash_64: i64,
|
||||
pub dhash_64: i64,
|
||||
}
|
||||
|
||||
/// Compute pHash + dHash for an image at `path`. Returns `None` on
|
||||
/// decode failure (unsupported format, corrupt bytes, video, etc.) —
|
||||
/// callers should treat that as "no perceptual signal available" and
|
||||
/// proceed with exact-match dedup only.
|
||||
pub fn compute(path: &Path) -> Option<PerceptualIdentity> {
|
||||
let img = image::open(path).ok()?;
|
||||
|
||||
// 8x8 = 64 bits, the standard size for pHash/dHash. Larger sizes
|
||||
// give more discriminative power but no longer fit in i64 and the
|
||||
// marginal robustness isn't worth the storage / index cost for a
|
||||
// personal-scale library.
|
||||
let phash = HasherConfig::new()
|
||||
.hash_alg(HashAlg::Mean)
|
||||
.hash_size(8, 8)
|
||||
.preproc_dct()
|
||||
.to_hasher()
|
||||
.hash_image(&img);
|
||||
|
||||
let dhash = HasherConfig::new()
|
||||
.hash_alg(HashAlg::Gradient)
|
||||
.hash_size(8, 8)
|
||||
.to_hasher()
|
||||
.hash_image(&img);
|
||||
|
||||
Some(PerceptualIdentity {
|
||||
phash_64: bytes_to_i64(phash.as_bytes())?,
|
||||
dhash_64: bytes_to_i64(dhash.as_bytes())?,
|
||||
})
|
||||
}
|
||||
|
||||
/// Hamming distance between two 64-bit perceptual hashes. The primary
/// query primitive: two images are "near-duplicates" when this is below
/// a threshold (default 8 for pHash, i.e. 8 of 64 bits, or 12.5%). The
/// duplicates module clusters via a BK-tree, which uses its own copy of
/// this calculation; this helper is kept for ad-hoc tools and tests.
|
||||
#[allow(dead_code)]
|
||||
#[inline]
|
||||
pub fn hamming_distance(a: i64, b: i64) -> u32 {
|
||||
(a ^ b).count_ones()
|
||||
}
|
||||
|
||||
fn bytes_to_i64(bytes: &[u8]) -> Option<i64> {
|
||||
if bytes.len() < 8 {
|
||||
return None;
|
||||
}
|
||||
let mut buf = [0u8; 8];
|
||||
buf.copy_from_slice(&bytes[..8]);
|
||||
Some(i64::from_be_bytes(buf))
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use image::{ImageBuffer, Rgb};
|
||||
|
||||
fn write_test_image(path: &Path, seed: u32) {
|
||||
// Deterministic-but-distinct image content: simple gradient with
|
||||
// a per-seed offset. Gives pHash/dHash a real signal to work
|
||||
// with (a uniform image collapses to all-zero hashes).
|
||||
let img: ImageBuffer<Rgb<u8>, Vec<u8>> = ImageBuffer::from_fn(64, 64, |x, y| {
|
||||
let r = ((x + seed) & 0xFF) as u8;
|
||||
let g = ((y + seed * 2) & 0xFF) as u8;
|
||||
let b = ((x ^ y ^ seed) & 0xFF) as u8;
|
||||
Rgb([r, g, b])
|
||||
});
|
||||
img.save(path).unwrap();
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn identical_bytes_yield_identical_hashes() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let a = dir.path().join("a.png");
|
||||
let b = dir.path().join("b.png");
|
||||
write_test_image(&a, 42);
|
||||
write_test_image(&b, 42);
|
||||
let ha = compute(&a).expect("hash a");
|
||||
let hb = compute(&b).expect("hash b");
|
||||
assert_eq!(ha, hb);
|
||||
assert_eq!(hamming_distance(ha.phash_64, hb.phash_64), 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn distinct_images_have_distinct_hashes() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let a = dir.path().join("a.png");
|
||||
let b = dir.path().join("b.png");
|
||||
write_test_image(&a, 42);
|
||||
write_test_image(&b, 123);
|
||||
let ha = compute(&a).expect("hash a");
|
||||
let hb = compute(&b).expect("hash b");
|
||||
assert_ne!(ha.phash_64, hb.phash_64);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn resized_copy_is_near_duplicate_under_threshold() {
|
||||
// The whole point of perceptual hashing: a resized copy of the
// same source image should land within a small Hamming distance
// of the original. We assert on dHash because the gradient-based
// hash is the more resize-robust of the two; pHash is also tight
// but is not checked here.
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let a = dir.path().join("a.png");
|
||||
write_test_image(&a, 7);
|
||||
let img = image::open(&a).unwrap();
|
||||
let small = img.resize_exact(32, 32, image::imageops::FilterType::Lanczos3);
|
||||
let b = dir.path().join("b.png");
|
||||
small.save(&b).unwrap();
|
||||
|
||||
let ha = compute(&a).expect("hash a");
|
||||
let hb = compute(&b).expect("hash b");
|
||||
let d_dhash = hamming_distance(ha.dhash_64, hb.dhash_64);
|
||||
assert!(
|
||||
d_dhash <= 8,
|
||||
"expected dhash Hamming distance <= 8 for resized copy, got {}",
|
||||
d_dhash
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn unsupported_path_returns_none() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let p = dir.path().join("notanimage.txt");
|
||||
std::fs::write(&p, b"hello").unwrap();
|
||||
assert!(compute(&p).is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn missing_file_returns_none() {
|
||||
let p = Path::new("/nonexistent/path/that/does/not/exist.png");
|
||||
assert!(compute(p).is_none());
|
||||
}
|
||||
}
|
||||
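To make the distance primitive in the module above concrete, here is a minimal std-only sketch of comparing two packed 64-bit hashes. `hamming_distance` mirrors the function in the diff; `is_near_duplicate` and its inclusive `<=` comparison are assumptions for illustration, since the source only states a default threshold of 8 for pHash and notes that the duplicates module keeps its own copy of the calculation.

```rust
/// Hamming distance between two packed 64-bit hashes: XOR leaves a 1 in
/// every bit position where the inputs differ, and `count_ones` tallies them.
fn hamming_distance(a: i64, b: i64) -> u32 {
    (a ^ b).count_ones()
}

/// Hypothetical helper (not in the source): treat two hashes as
/// near-duplicates when at most `max_bits` of the 64 bits differ.
fn is_near_duplicate(a: i64, b: i64, max_bits: u32) -> bool {
    hamming_distance(a, b) <= max_bits
}
```

Storing the hash as `i64` rather than `u64` does not affect the result: XOR and popcount operate on the bit pattern, so the sign bit is counted like any other.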
-355
@@ -1,355 +0,0 @@
|
||||
//! HTTP handlers for the server-side persona store.
|
||||
//!
|
||||
//! Personas previously lived only in mobile AsyncStorage; this module
|
||||
//! elevates them so they can sync across devices and so the
|
||||
//! `entity_facts.persona_id` column has something to reference.
|
||||
//!
|
||||
//! Built-in personas (default / journal / factual) are seeded by the
|
||||
//! migration. Customs are created here and may be migrated up from a
|
||||
//! device's local store via `POST /personas/migrate`.
|
||||
|
||||
use actix_web::dev::{ServiceFactory, ServiceRequest};
|
||||
use actix_web::{App, HttpResponse, Responder, web};
|
||||
use serde::{Deserialize, Serialize};
|
||||
use std::sync::Mutex;
|
||||
|
||||
use crate::data::Claims;
|
||||
use crate::database::models::Persona;
|
||||
use crate::database::{ImportPersona, PersonaDao, PersonaPatch};
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Wire shapes — camelCase out the door, snake_case from the DB.
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[derive(Serialize)]
|
||||
pub struct PersonaView {
|
||||
pub id: String,
|
||||
pub name: String,
|
||||
#[serde(rename = "systemPrompt")]
|
||||
pub system_prompt: String,
|
||||
#[serde(rename = "isBuiltIn")]
|
||||
pub is_built_in: bool,
|
||||
#[serde(rename = "includeAllMemories")]
|
||||
pub include_all_memories: bool,
|
||||
#[serde(rename = "createdAt")]
|
||||
pub created_at: i64,
|
||||
#[serde(rename = "updatedAt")]
|
||||
pub updated_at: i64,
|
||||
/// "Strict mode" — when true, the agent's recall_* tools return
|
||||
/// only facts whose status is 'reviewed'. See migration
|
||||
/// 2026-05-10-000400.
|
||||
#[serde(rename = "reviewedOnlyFacts")]
|
||||
pub reviewed_only_facts: bool,
|
||||
/// Gate for the agent's update_fact / supersede_fact tools.
|
||||
/// Default false — fresh personas let the agent create but not
|
||||
/// alter. See migration 2026-05-10-000500.
|
||||
#[serde(rename = "allowAgentCorrections")]
|
||||
pub allow_agent_corrections: bool,
|
||||
}
|
||||
|
||||
impl From<Persona> for PersonaView {
|
||||
fn from(p: Persona) -> Self {
|
||||
Self {
|
||||
id: p.persona_id,
|
||||
name: p.name,
|
||||
system_prompt: p.system_prompt,
|
||||
is_built_in: p.is_built_in,
|
||||
include_all_memories: p.include_all_memories,
|
||||
created_at: p.created_at,
|
||||
updated_at: p.updated_at,
|
||||
reviewed_only_facts: p.reviewed_only_facts,
|
||||
allow_agent_corrections: p.allow_agent_corrections,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Deserialize)]
|
||||
pub struct CreatePersonaRequest {
|
||||
pub name: String,
|
||||
#[serde(rename = "systemPrompt")]
|
||||
pub system_prompt: String,
|
||||
/// Optional caller-provided id. When present (e.g. a client that
|
||||
/// already minted `"custom-1735124234"` locally and is upgrading from
|
||||
/// the AsyncStorage-only era), the server uses it; collisions return
|
||||
/// 409. When absent the server mints `"custom-<ms>"`.
|
||||
#[serde(default, rename = "personaId")]
|
||||
pub persona_id: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Deserialize)]
|
||||
pub struct UpdatePersonaRequest {
|
||||
#[serde(default)]
|
||||
pub name: Option<String>,
|
||||
#[serde(default, rename = "systemPrompt")]
|
||||
pub system_prompt: Option<String>,
|
||||
#[serde(default, rename = "includeAllMemories")]
|
||||
pub include_all_memories: Option<bool>,
|
||||
#[serde(default, rename = "reviewedOnlyFacts")]
|
||||
pub reviewed_only_facts: Option<bool>,
|
||||
#[serde(default, rename = "allowAgentCorrections")]
|
||||
pub allow_agent_corrections: Option<bool>,
|
||||
}
|
||||
|
||||
#[derive(Deserialize)]
|
||||
pub struct MigrateRequest {
|
||||
pub personas: Vec<MigratePersona>,
|
||||
}
|
||||
|
||||
#[derive(Deserialize)]
|
||||
pub struct MigratePersona {
|
||||
pub id: String,
|
||||
pub name: String,
|
||||
#[serde(rename = "systemPrompt")]
|
||||
pub system_prompt: String,
|
||||
#[serde(default, rename = "isBuiltIn")]
|
||||
pub is_built_in: bool,
|
||||
#[serde(default, rename = "createdAt")]
|
||||
pub created_at: Option<i64>,
|
||||
}
|
||||
|
||||
#[derive(Serialize)]
|
||||
pub struct MigrateResponse {
|
||||
pub inserted: usize,
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Service registration
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
pub type PersonaDaoData = web::Data<Mutex<Box<dyn PersonaDao>>>;
|
||||
|
||||
pub fn add_persona_services<T>(app: App<T>) -> App<T>
|
||||
where
|
||||
T: ServiceFactory<ServiceRequest, Config = (), Error = actix_web::Error, InitError = ()>,
|
||||
{
|
||||
app.service(
|
||||
web::scope("/personas")
|
||||
.service(web::resource("/migrate").route(web::post().to(migrate_personas)))
|
||||
.service(
|
||||
web::resource("")
|
||||
.route(web::get().to(list_personas))
|
||||
.route(web::post().to(create_persona)),
|
||||
)
|
||||
.service(
|
||||
web::resource("/{persona_id}")
|
||||
.route(web::put().to(update_persona))
|
||||
.route(web::delete().to(delete_persona)),
|
||||
),
|
||||
)
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Handlers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
fn user_id_from_claims(claims: &Claims) -> Option<i32> {
|
||||
claims.sub.parse::<i32>().ok()
|
||||
}
|
||||
|
||||
async fn list_personas(claims: Claims, dao: PersonaDaoData) -> impl Responder {
|
||||
let Some(uid) = user_id_from_claims(&claims) else {
|
||||
return HttpResponse::Unauthorized().json(serde_json::json!({"error": "Invalid claims"}));
|
||||
};
|
||||
let cx = opentelemetry::Context::current();
|
||||
let mut dao = dao.lock().expect("Unable to lock PersonaDao");
|
||||
match dao.list_personas(&cx, uid) {
|
||||
Ok(rows) => {
|
||||
let views: Vec<PersonaView> = rows.into_iter().map(PersonaView::from).collect();
|
||||
HttpResponse::Ok().json(views)
|
||||
}
|
||||
Err(e) => {
|
||||
log::error!("list_personas error: {:?}", e);
|
||||
HttpResponse::InternalServerError().json(serde_json::json!({"error": "Database error"}))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
async fn create_persona(
|
||||
claims: Claims,
|
||||
body: web::Json<CreatePersonaRequest>,
|
||||
dao: PersonaDaoData,
|
||||
) -> impl Responder {
|
||||
let Some(uid) = user_id_from_claims(&claims) else {
|
||||
return HttpResponse::Unauthorized().json(serde_json::json!({"error": "Invalid claims"}));
|
||||
};
|
||||
|
||||
if body.name.trim().is_empty() {
|
||||
return HttpResponse::BadRequest().json(serde_json::json!({"error": "name is required"}));
|
||||
}
|
||||
if body.system_prompt.trim().is_empty() {
|
||||
return HttpResponse::BadRequest()
|
||||
.json(serde_json::json!({"error": "systemPrompt is required"}));
|
||||
}
|
||||
|
||||
let cx = opentelemetry::Context::current();
|
||||
let mut dao = dao.lock().expect("Unable to lock PersonaDao");
|
||||
|
||||
let pid = match body.persona_id.as_deref() {
|
||||
Some(s) if !s.trim().is_empty() => s.to_string(),
|
||||
_ => format!("custom-{}", chrono::Utc::now().timestamp_millis()),
|
||||
};
|
||||
|
||||
if matches!(pid.as_str(), "default" | "journal" | "factual") {
|
||||
return HttpResponse::Conflict()
|
||||
.json(serde_json::json!({"error": "persona id collides with a built-in"}));
|
||||
}
|
||||
|
||||
// Pre-check existence so we can return 409 cleanly. The DB UNIQUE
// constraint would also catch it, but parsing Diesel's "constraint
// violation" out of a generic DbError is uglier than a quick lookup.
|
||||
if let Ok(Some(_)) = dao.get_persona(&cx, uid, &pid) {
|
||||
return HttpResponse::Conflict()
|
||||
.json(serde_json::json!({"error": "persona already exists"}));
|
||||
}
|
||||
|
||||
match dao.create_persona(
|
||||
&cx,
|
||||
uid,
|
||||
&pid,
|
||||
&body.name,
|
||||
&body.system_prompt,
|
||||
false,
|
||||
false,
|
||||
) {
|
||||
Ok(p) => HttpResponse::Created().json(PersonaView::from(p)),
|
||||
Err(e) => {
|
||||
log::error!("create_persona error: {:?}", e);
|
||||
HttpResponse::InternalServerError().json(serde_json::json!({"error": "Database error"}))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
async fn update_persona(
|
||||
claims: Claims,
|
||||
path: web::Path<String>,
|
||||
body: web::Json<UpdatePersonaRequest>,
|
||||
dao: PersonaDaoData,
|
||||
) -> impl Responder {
|
||||
let Some(uid) = user_id_from_claims(&claims) else {
|
||||
return HttpResponse::Unauthorized().json(serde_json::json!({"error": "Invalid claims"}));
|
||||
};
|
||||
let pid = path.into_inner();
|
||||
let cx = opentelemetry::Context::current();
|
||||
let mut dao = dao.lock().expect("Unable to lock PersonaDao");
|
||||
|
||||
// Built-in personas are owned by the migration; the canonical voice
|
||||
// text lives in source. A client renaming or rewriting the prompt
|
||||
// here would diverge from what new users get seeded with and hide
|
||||
// the operator's actual customization (their own custom persona)
|
||||
// from the picker. `include_all_memories` stays editable on
|
||||
// built-ins — that's a per-user preference, not the persona's
|
||||
// identity. Mirrors the same guard delete_persona enforces below.
|
||||
match dao.get_persona(&cx, uid, &pid) {
|
||||
Ok(Some(p)) if p.is_built_in => {
|
||||
let editing_identity = body.name.is_some() || body.system_prompt.is_some();
|
||||
if editing_identity {
|
||||
return HttpResponse::Conflict().json(serde_json::json!({
|
||||
"error": "Cannot edit name or systemPrompt of a built-in persona"
|
||||
}));
|
||||
}
|
||||
}
|
||||
Ok(None) => {
|
||||
return HttpResponse::NotFound()
|
||||
.json(serde_json::json!({"error": "Persona not found"}));
|
||||
}
|
||||
Err(e) => {
|
||||
log::error!("update_persona lookup error: {:?}", e);
|
||||
return HttpResponse::InternalServerError()
|
||||
.json(serde_json::json!({"error": "Database error"}));
|
||||
}
|
||||
Ok(Some(_)) => {}
|
||||
}
|
||||
|
||||
let patch = PersonaPatch {
|
||||
name: body.name.clone(),
|
||||
system_prompt: body.system_prompt.clone(),
|
||||
include_all_memories: body.include_all_memories,
|
||||
reviewed_only_facts: body.reviewed_only_facts,
|
||||
allow_agent_corrections: body.allow_agent_corrections,
|
||||
};
|
||||
match dao.update_persona(&cx, uid, &pid, patch) {
|
||||
Ok(Some(p)) => HttpResponse::Ok().json(PersonaView::from(p)),
|
||||
Ok(None) => {
|
||||
HttpResponse::NotFound().json(serde_json::json!({"error": "Persona not found"}))
|
||||
}
|
||||
Err(e) => {
|
||||
log::error!("update_persona error: {:?}", e);
|
||||
HttpResponse::InternalServerError().json(serde_json::json!({"error": "Database error"}))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
async fn delete_persona(
|
||||
claims: Claims,
|
||||
path: web::Path<String>,
|
||||
dao: PersonaDaoData,
|
||||
) -> impl Responder {
|
||||
let Some(uid) = user_id_from_claims(&claims) else {
|
||||
return HttpResponse::Unauthorized().json(serde_json::json!({"error": "Invalid claims"}));
|
||||
};
|
||||
let pid = path.into_inner();
|
||||
let cx = opentelemetry::Context::current();
|
||||
let mut dao = dao.lock().expect("Unable to lock PersonaDao");
|
||||
|
||||
match dao.get_persona(&cx, uid, &pid) {
|
||||
Ok(Some(p)) if p.is_built_in => {
|
||||
return HttpResponse::Conflict()
|
||||
.json(serde_json::json!({"error": "Cannot delete built-in persona"}));
|
||||
}
|
||||
Ok(None) => {
|
||||
return HttpResponse::NotFound()
|
||||
.json(serde_json::json!({"error": "Persona not found"}));
|
||||
}
|
||||
Err(e) => {
|
||||
log::error!("delete_persona lookup error: {:?}", e);
|
||||
return HttpResponse::InternalServerError()
|
||||
.json(serde_json::json!({"error": "Database error"}));
|
||||
}
|
||||
Ok(Some(_)) => {}
|
||||
}
|
||||
|
||||
match dao.delete_persona(&cx, uid, &pid) {
|
||||
Ok(_) => HttpResponse::NoContent().finish(),
|
||||
Err(e) => {
|
||||
log::error!("delete_persona error: {:?}", e);
|
||||
HttpResponse::InternalServerError().json(serde_json::json!({"error": "Database error"}))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
async fn migrate_personas(
|
||||
claims: Claims,
|
||||
body: web::Json<MigrateRequest>,
|
||||
dao: PersonaDaoData,
|
||||
) -> impl Responder {
|
||||
let Some(uid) = user_id_from_claims(&claims) else {
|
||||
return HttpResponse::Unauthorized().json(serde_json::json!({"error": "Invalid claims"}));
|
||||
};
|
||||
let cx = opentelemetry::Context::current();
|
||||
let mut dao = dao.lock().expect("Unable to lock PersonaDao");
|
||||
|
||||
// Filter out built-in ids — those are already seeded by the
|
||||
// migration and re-importing them would be a no-op anyway thanks to
|
||||
// INSERT OR IGNORE, but skipping early avoids the UNIQUE round-trip.
|
||||
let now = chrono::Utc::now().timestamp_millis();
|
||||
let rows: Vec<ImportPersona> = body
|
||||
.personas
|
||||
.iter()
|
||||
.filter(|p| !matches!(p.id.as_str(), "default" | "journal" | "factual"))
|
||||
.map(|p| ImportPersona {
|
||||
persona_id: p.id.clone(),
|
||||
name: p.name.clone(),
|
||||
system_prompt: p.system_prompt.clone(),
|
||||
is_built_in: p.is_built_in,
|
||||
created_at: p.created_at.unwrap_or(now),
|
||||
})
|
||||
.collect();
|
||||
|
||||
match dao.bulk_import(&cx, uid, &rows) {
|
||||
Ok(inserted) => HttpResponse::Ok().json(MigrateResponse { inserted }),
|
||||
Err(e) => {
|
||||
log::error!("migrate_personas error: {:?}", e);
|
||||
HttpResponse::InternalServerError().json(serde_json::json!({"error": "Database error"}))
|
||||
}
|
||||
}
|
||||
}
|
||||
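The id selection in `create_persona` above (use the caller's id when non-blank, otherwise mint `custom-<ms>`, and reject built-in ids) can be isolated as a pure function. The sketch below is an illustration under that reading; `resolve_persona_id` and its error type are hypothetical names, and the clock value is passed in as a parameter so the example stays deterministic.

```rust
/// Hypothetical pure helper mirroring the id selection in `create_persona`.
/// `now_ms` stands in for the current time in milliseconds.
fn resolve_persona_id(requested: Option<&str>, now_ms: i64) -> Result<String, &'static str> {
    // A blank or missing id means the server mints one.
    let pid = match requested {
        Some(s) if !s.trim().is_empty() => s.to_string(),
        _ => format!("custom-{}", now_ms),
    };
    // Ids seeded by the migration are reserved for built-in personas.
    if matches!(pid.as_str(), "default" | "journal" | "factual") {
        return Err("persona id collides with a built-in");
    }
    Ok(pid)
}
```

In the handler the `Err` case maps to a 409 response; the existence pre-check against the DAO is a separate step that this sketch leaves out.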
+3
-41
@@ -10,38 +10,22 @@ use crate::database::{
|
||||
connect,
|
||||
};
|
||||
use crate::database::{PreviewDao, SqlitePreviewDao};
|
||||
use crate::faces;
|
||||
use crate::libraries::{self, Library, LibraryHealthMap};
|
||||
use crate::libraries::{self, Library};
|
||||
use crate::tags::{SqliteTagDao, TagDao};
|
||||
use crate::video::actors::{
|
||||
PlaylistGenerator, PreviewClipGenerator, StreamActor, VideoPlaylistManager,
|
||||
};
|
||||
use actix::{Actor, Addr};
|
||||
use std::env;
|
||||
use std::sync::{Arc, Mutex, RwLock};
|
||||
use std::sync::{Arc, Mutex};
|
||||
|
||||
pub struct AppState {
|
||||
pub stream_manager: Arc<Addr<StreamActor>>,
|
||||
pub playlist_manager: Arc<Addr<VideoPlaylistManager>>,
|
||||
pub preview_clip_generator: Arc<Addr<PreviewClipGenerator>>,
|
||||
/// All configured media libraries. Ordered by `id` ascending; the first
|
||||
/// entry is the primary library. Frozen at startup — handlers that
|
||||
/// only need stable lookup (id → name / root_path) read this. Mutable
|
||||
/// flags (`enabled`, `excluded_dirs`) reflect their startup values;
|
||||
/// for live state see [`AppState::live_libraries`].
|
||||
/// entry is the primary library.
|
||||
pub libraries: Vec<Library>,
|
||||
/// Live view of the libraries table, shared mutably between the
|
||||
/// watcher (which reads it at the top of each tick to honour the
|
||||
/// latest `enabled` / `excluded_dirs`) and the PATCH /libraries/{id}
|
||||
/// handler (which writes it on a successful mutation). The split
|
||||
/// from [`AppState::libraries`] is deliberate: handlers that only
|
||||
/// look up by id don't need to take a lock per request.
|
||||
pub live_libraries: Arc<RwLock<Vec<Library>>>,
|
||||
/// Per-library availability snapshot. Updated by the file watcher at
|
||||
/// the top of each tick via `libraries::refresh_health`. HTTP handlers
|
||||
/// read it (e.g. `/libraries` surfacing). See "Library availability
|
||||
/// and safety" in CLAUDE.md.
|
||||
pub library_health: LibraryHealthMap,
|
||||
/// Legacy shim equal to `libraries[0].root_path`. Phase 2 transitional —
|
||||
/// new code should go through `primary_library()`.
|
||||
pub base_path: String,
|
||||
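The doc comments in this hunk describe a split between a frozen `libraries` snapshot, which is read without locking, and a `live_libraries` view shared behind `Arc<RwLock<...>>`. The following std-only sketch illustrates that pattern. `LibraryState`, `set_enabled`, and `enabled_ids` are hypothetical names, and the `Library` struct is a two-field stand-in rather than the project's type.

```rust
use std::sync::{Arc, RwLock};

/// Minimal stand-in for the project's `Library`; the real type has more fields.
#[derive(Clone, Debug)]
struct Library {
    id: i32,
    enabled: bool,
}

/// Hypothetical slice of `AppState`: a frozen snapshot plus a live view.
struct LibraryState {
    /// Startup values; safe to read without a lock.
    libraries: Vec<Library>,
    /// Mutable view shared between writers and readers.
    live_libraries: Arc<RwLock<Vec<Library>>>,
}

impl LibraryState {
    fn new(libs: Vec<Library>) -> Self {
        Self {
            live_libraries: Arc::new(RwLock::new(libs.clone())),
            libraries: libs,
        }
    }

    /// Writer side (for example a PATCH handler): flip `enabled` in the live view.
    /// Returns false when no library has the given id.
    fn set_enabled(&self, id: i32, enabled: bool) -> bool {
        let mut live = self.live_libraries.write().expect("live_libraries lock poisoned");
        match live.iter_mut().find(|l| l.id == id) {
            Some(lib) => {
                lib.enabled = enabled;
                true
            }
            None => false,
        }
    }

    /// Reader side (for example a watcher tick): ids that are currently enabled.
    fn enabled_ids(&self) -> Vec<i32> {
        let live = self.live_libraries.read().expect("live_libraries lock poisoned");
        live.iter().filter(|l| l.enabled).map(|l| l.id).collect()
    }
}
```

The trade-off is that the frozen snapshot keeps reporting startup values for mutable flags, so only code that needs stable lookups (id to name or root path) should read it.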
@@ -121,15 +105,11 @@ impl AppState {
|
||||
preview_dao,
|
||||
);
|
||||
|
||||
let library_health = libraries::new_health_map(&libraries_vec);
|
||||
let live_libraries = Arc::new(RwLock::new(libraries_vec.clone()));
|
||||
Self {
|
||||
stream_manager,
|
||||
playlist_manager: Arc::new(video_playlist_manager.start()),
|
||||
preview_clip_generator: Arc::new(preview_clip_generator.start()),
|
||||
libraries: libraries_vec,
|
||||
live_libraries,
|
||||
library_health,
|
||||
base_path,
|
||||
thumbnail_path,
|
||||
video_path,
|
||||
@@ -219,11 +199,6 @@ impl Default for AppState {
|
||||
Arc::new(Mutex::new(Box::new(SqliteTagDao::default())));
|
||||
let knowledge_dao: Arc<Mutex<Box<dyn KnowledgeDao>>> =
|
||||
Arc::new(Mutex::new(Box::new(SqliteKnowledgeDao::new())));
|
||||
let persona_dao: Arc<Mutex<Box<dyn crate::database::PersonaDao>>> = Arc::new(Mutex::new(
|
||||
Box::new(crate::database::SqlitePersonaDao::new()),
|
||||
));
|
||||
let face_dao: Arc<Mutex<Box<dyn faces::FaceDao>>> =
|
||||
Arc::new(Mutex::new(Box::new(faces::SqliteFaceDao::new())));
|
||||
|
||||
// Load base path and ensure the primary library row reflects it.
|
||||
let base_path = env::var("BASE_PATH").expect("BASE_PATH was not set in the env");
|
||||
@@ -249,9 +224,7 @@ impl Default for AppState {
|
||||
location_dao.clone(),
|
||||
search_dao.clone(),
|
||||
tag_dao.clone(),
|
||||
face_dao.clone(),
|
||||
knowledge_dao,
|
||||
persona_dao,
|
||||
libraries_vec.clone(),
|
||||
);
|
||||
|
||||
@@ -368,11 +341,6 @@ impl AppState {
|
||||
Arc::new(Mutex::new(Box::new(SqliteTagDao::default())));
|
||||
let knowledge_dao: Arc<Mutex<Box<dyn KnowledgeDao>>> =
|
||||
Arc::new(Mutex::new(Box::new(SqliteKnowledgeDao::new())));
|
||||
let persona_dao: Arc<Mutex<Box<dyn crate::database::PersonaDao>>> = Arc::new(Mutex::new(
|
||||
Box::new(crate::database::SqlitePersonaDao::new()),
|
||||
));
|
||||
let face_dao: Arc<Mutex<Box<dyn faces::FaceDao>>> =
|
||||
Arc::new(Mutex::new(Box::new(faces::SqliteFaceDao::new())));
|
||||
|
||||
// Initialize test InsightGenerator with all data sources
|
||||
let base_path_str = base_path.to_string_lossy().to_string();
|
||||
@@ -380,8 +348,6 @@ impl AppState {
|
||||
id: crate::libraries::PRIMARY_LIBRARY_ID,
|
||||
name: "main".to_string(),
|
||||
root_path: base_path_str.clone(),
|
||||
enabled: true,
|
||||
excluded_dirs: Vec::new(),
|
||||
};
|
||||
let insight_generator = InsightGenerator::new(
|
||||
ollama.clone(),
|
||||
@@ -395,9 +361,7 @@ impl AppState {
|
||||
location_dao.clone(),
|
||||
search_dao.clone(),
|
||||
tag_dao.clone(),
|
||||
face_dao.clone(),
|
||||
knowledge_dao,
|
||||
persona_dao,
|
||||
vec![test_lib],
|
||||
);
|
||||
|
||||
@@ -420,8 +384,6 @@ impl AppState {
|
||||
id: crate::libraries::PRIMARY_LIBRARY_ID,
|
||||
name: "main".to_string(),
|
||||
root_path: base_path_str.clone(),
|
||||
enabled: true,
|
||||
excluded_dirs: Vec::new(),
|
||||
}];
|
||||
AppState::new(
|
||||
Arc::new(StreamActor {}.start()),
|
||||
|
||||
+9
-464
@@ -33,11 +33,6 @@ where
|
||||
.service(web::resource("image/tags/all").route(web::get().to(get_all_tags::<TagD>)))
|
||||
.service(web::resource("image/tags/batch").route(web::post().to(update_tags::<TagD>)))
|
||||
.service(web::resource("image/tags/lookup").route(web::post().to(lookup_tags_batch::<TagD>)))
|
||||
.service(
|
||||
web::resource("image/tags/{id}")
|
||||
.route(web::put().to(update_tag::<TagD>))
|
||||
.route(web::delete().to(delete_tag::<TagD>)),
|
||||
)
|
||||
}
|
||||
|
||||
async fn add_tag<D: TagDao>(
|
||||
@@ -58,14 +53,7 @@ async fn add_tag<D: TagDao>(
|
||||
tag_dao
|
||||
.get_all_tags(&span_context, None)
|
||||
.and_then(|tags| {
|
||||
// Case-insensitive match. With the unique-NOCASE index on
// tags.name now in place, a case-sensitive find here would
// miss a casing-only collision and let the subsequent
// create_tag INSERT fail on the constraint.
|
||||
if let Some((_, tag)) = tags
|
||||
.iter()
|
||||
.find(|t| t.1.name.eq_ignore_ascii_case(&tag_name))
|
||||
{
|
||||
if let Some((_, tag)) = tags.iter().find(|t| t.1.name == tag_name) {
|
||||
Ok(tag.clone())
|
||||
} else {
|
||||
info!(
|
||||
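The two variants of the lookup in this hunk differ only in how names are compared. A minimal sketch of the case-insensitive form is shown below, assuming a simplified `Tag` struct and a hypothetical `find_tag_ignore_case` helper; the project's `get_all_tags` returns tuples, which are not modeled here. `eq_ignore_ascii_case` folds ASCII letters only, which lines up with SQLite's built-in NOCASE collation.

```rust
/// Minimal stand-in for the project's tag row; the real type has more fields.
#[derive(Clone, Debug, PartialEq)]
struct Tag {
    id: i32,
    name: String,
}

/// Hypothetical helper: find an existing tag whose name matches `wanted`
/// ignoring ASCII case, so "Beach" and "beach" resolve to the same row.
fn find_tag_ignore_case<'a>(tags: &'a [Tag], wanted: &str) -> Option<&'a Tag> {
    tags.iter().find(|t| t.name.eq_ignore_ascii_case(wanted))
}
```

With a plain `==` comparison, a request for "beach" would miss an existing "Beach" row and proceed to an insert that a unique NOCASE index would reject.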
@@ -83,74 +71,6 @@ async fn add_tag<D: TagDao>(
|
||||
.into_http_internal_err()
|
||||
}
|
||||
|
||||
async fn update_tag<D: TagDao>(
|
||||
_: Claims,
|
||||
http_request: HttpRequest,
|
||||
path: web::Path<i32>,
|
||||
body: web::Json<UpdateTagRequest>,
|
||||
tag_dao: web::Data<Mutex<D>>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
let context = extract_context_from_request(&http_request);
|
||||
let span = tracer.start_with_context("update_tag", &context);
|
||||
let span_context = opentelemetry::Context::current_with_span(span);
|
||||
|
||||
let id = path.into_inner();
|
||||
let trimmed = body.name.trim();
|
||||
if trimmed.is_empty() {
|
||||
return HttpResponse::BadRequest()
|
||||
.json(serde_json::json!({ "error": "Tag name must not be empty" }));
|
||||
}
|
||||
|
||||
let mut tag_dao = tag_dao.lock().expect("Unable to get TagDao");
|
||||
match tag_dao.update_tag_name(&span_context, id, trimmed) {
|
||||
Ok(UpdateTagOutcome::Renamed(tag)) => {
|
||||
span_context.span().set_status(Status::Ok);
|
||||
info!("Renamed tag {} -> '{}'", id, trimmed);
|
||||
HttpResponse::Ok().json(tag)
|
||||
}
|
||||
Ok(UpdateTagOutcome::NotFound) => {
|
||||
HttpResponse::NotFound().json(serde_json::json!({ "error": "Tag not found" }))
|
||||
}
|
||||
Ok(UpdateTagOutcome::Conflict { existing }) => HttpResponse::Conflict().json(
|
||||
serde_json::json!({ "error": "Tag name already exists", "existing_tag": existing }),
|
||||
),
|
||||
Err(e) => {
|
||||
log::error!("update_tag failed: {:?}", e);
|
||||
HttpResponse::InternalServerError()
|
||||
.json(serde_json::json!({ "error": "Update failed" }))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
async fn delete_tag<D: TagDao>(
|
||||
_: Claims,
|
||||
http_request: HttpRequest,
|
||||
path: web::Path<i32>,
|
||||
tag_dao: web::Data<Mutex<D>>,
|
||||
) -> impl Responder {
|
||||
let tracer = global_tracer();
|
||||
let context = extract_context_from_request(&http_request);
|
||||
let span = tracer.start_with_context("delete_tag", &context);
|
||||
let span_context = opentelemetry::Context::current_with_span(span);
|
||||
|
||||
let id = path.into_inner();
|
||||
let mut tag_dao = tag_dao.lock().expect("Unable to get TagDao");
|
||||
match tag_dao.delete_tag(&span_context, id) {
|
||||
Ok(true) => {
|
||||
span_context.span().set_status(Status::Ok);
|
||||
info!("Deleted tag {}", id);
|
||||
HttpResponse::NoContent().finish()
|
||||
}
|
||||
Ok(false) => HttpResponse::NotFound().json(serde_json::json!({ "error": "Tag not found" })),
|
||||
Err(e) => {
|
||||
log::error!("delete_tag failed: {:?}", e);
|
||||
HttpResponse::InternalServerError()
|
||||
.json(serde_json::json!({ "error": "Delete failed" }))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
async fn get_tags<D: TagDao>(
|
||||
_: Claims,
|
||||
http_request: HttpRequest,
|
||||
@@ -364,15 +284,9 @@ async fn lookup_tags_batch<D: TagDao>(
|
||||
     // Stage 1: query → content_hash mapping. Files without a hash yet
     // (just-indexed, hash compute failed, etc.) skip the sibling
     // expansion and only get tags from their own rel_path.
-    // Library-agnostic by design: this endpoint takes raw rel_paths from
-    // the client (typically Apollo) with no library context. Span all
-    // libraries and let the hash-keyed sibling expansion below do the
-    // disambiguation. Same-rel_path/different-content collisions across
-    // libraries surface as multiple hashes for one path — fine, we union
-    // every sibling tag set.
     let exif_records = {
         let mut dao = exif_dao.lock().expect("Unable to get ExifDao");
-        match dao.get_exif_batch(&span_context, None, &query_paths) {
+        match dao.get_exif_batch(&span_context, &query_paths) {
             Ok(rows) => rows,
             Err(e) => {
                 return HttpResponse::InternalServerError()
@@ -507,11 +421,6 @@ pub struct InsertTaggedPhoto {
|
||||
#[diesel(column_name = rel_path)]
|
||||
pub photo_name: String,
|
||||
pub created_time: i64,
|
||||
/// Hash-keyed identity. The DAO populates this from
|
||||
/// `image_exif.content_hash` at insert time when known; the
|
||||
/// reconciliation pass backfills rows inserted before the hash
|
||||
/// landed. See CLAUDE.md "Multi-library data model".
|
||||
pub content_hash: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Queryable, Clone, Debug)]
|
||||
@@ -525,8 +434,6 @@ pub struct TaggedPhoto {
|
||||
pub tag_id: i32,
|
||||
#[allow(dead_code)] // Part of API contract
|
||||
pub created_time: i64,
|
||||
#[allow(dead_code)]
|
||||
pub content_hash: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Deserialize)]
|
||||
@@ -535,22 +442,6 @@ pub struct AddTagsRequest {
|
||||
pub tag_ids: Vec<i32>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Deserialize)]
|
||||
pub struct UpdateTagRequest {
|
||||
pub name: String,
|
||||
}
|
||||
|
||||
/// Result of an attempted tag rename. Returning a typed outcome (rather
|
||||
/// than `anyhow::Result<Tag>`) lets the handler map each case to a
|
||||
/// distinct HTTP status without sniffing error strings, and keeps the
|
||||
/// 409 path a normal control-flow result instead of a DB constraint
|
||||
/// violation surfacing as a generic 500.
|
||||
pub enum UpdateTagOutcome {
|
||||
Renamed(Tag),
|
||||
NotFound,
|
||||
Conflict { existing: Tag },
|
||||
}
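The doc comment above argues for a typed outcome instead of sniffing error strings. A minimal, dependency-free sketch of that mapping follows. The trimmed-down `Tag` and the `status_for` helper are assumptions made for illustration; the handler in this diff builds `HttpResponse` values rather than returning bare status codes.

```rust
/// Simplified stand-in for the project's `Tag` (assumption: only the
/// fields needed for this illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct Tag {
    pub id: i32,
    pub name: String,
}

/// Mirrors the three variants described in the doc comment.
pub enum UpdateTagOutcome {
    Renamed(Tag),
    NotFound,
    Conflict { existing: Tag },
}

/// Hypothetical helper: one match arm per outcome. Adding a variant
/// later becomes a compile error here rather than a silent 500.
pub fn status_for(outcome: &UpdateTagOutcome) -> u16 {
    match outcome {
        UpdateTagOutcome::Renamed(_) => 200,
        UpdateTagOutcome::NotFound => 404,
        UpdateTagOutcome::Conflict { .. } => 409,
    }
}
```

Because the match is exhaustive, the 409 path stays ordinary control flow and never depends on the text of a database error.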
|
||||
|
||||
pub trait TagDao: Send + Sync {
|
||||
fn get_all_tags(
|
||||
&mut self,
|
||||
@@ -620,26 +511,6 @@ pub trait TagDao: Send + Sync {
|
||||
context: &opentelemetry::Context,
|
||||
file_paths: &[String],
|
||||
) -> anyhow::Result<std::collections::HashMap<String, i64>>;
|
||||
/// Rename a tag in place. The tag id stays stable so existing
|
||||
/// `tagged_photo` rows automatically reflect the new name without
|
||||
/// a join-table rewrite. Conflict is resolved against the rest of
|
||||
/// the table case-insensitively (mirroring the
|
||||
/// `idx_tags_name_nocase` UNIQUE index) — a rename that changes
|
||||
/// only the case of the tag's own current name is allowed.
|
||||
fn update_tag_name(
|
||||
&mut self,
|
||||
context: &opentelemetry::Context,
|
||||
id: i32,
|
||||
new_name: &str,
|
||||
) -> anyhow::Result<UpdateTagOutcome>;
|
||||
    /// Globally remove a tag and every `tagged_photo` row that
    /// references it. Returns `true` if a tag was deleted, `false` if
    /// no row matched the id. The schema's FK is `ON DELETE CASCADE`,
    /// which SQLite only honors with `PRAGMA foreign_keys = ON`. The
    /// SQLite impl relies on the connection setting that pragma, so a
    /// single DELETE on `tags` also removes the dependent
    /// `tagged_photo` rows atomically.
    fn delete_tag(&mut self, context: &opentelemetry::Context, id: i32) -> anyhow::Result<bool>;
}
|
||||
|
||||
pub struct SqliteTagDao {
|
||||
@@ -833,83 +704,6 @@ impl TagDao for SqliteTagDao {
|
||||
})
|
||||
}
|
||||
|
||||
fn update_tag_name(
|
||||
&mut self,
|
||||
context: &opentelemetry::Context,
|
||||
id: i32,
|
||||
new_name: &str,
|
||||
) -> anyhow::Result<UpdateTagOutcome> {
|
||||
let mut conn = self
|
||||
.connection
|
||||
.lock()
|
||||
.expect("Unable to lock SqliteTagDao connection");
|
||||
trace_db_call(context, "update", "update_tag_name", |span| {
|
||||
span.set_attributes(vec![
|
||||
KeyValue::new("tag_id", id as i64),
|
||||
KeyValue::new("new_name", new_name.to_string()),
|
||||
]);
|
||||
|
||||
let target = tags::table
|
||||
.filter(tags::id.eq(id))
|
||||
.select((tags::id, tags::name, tags::created_time))
|
||||
.get_result::<Tag>(conn.deref_mut())
|
||||
.optional()
|
||||
.with_context(|| format!("Unable to look up tag id {}", id))?;
|
||||
let target = match target {
|
||||
Some(t) => t,
|
||||
None => return Ok(UpdateTagOutcome::NotFound),
|
||||
};
|
||||
|
||||
// Case-insensitive collision check on every other row.
|
||||
// Belt-and-suspenders: idx_tags_name_nocase enforces this at
|
||||
// the index level, but checking up front gives the handler
|
||||
// a clean 409 with the existing tag's id instead of a
|
||||
// generic constraint-violation 500. Tags table is small;
|
||||
// loading peers and comparing in Rust avoids a fragile
|
||||
// dsl::sql composition for case-insensitive equality.
|
||||
let conflict = tags::table
|
||||
.filter(tags::id.ne(id))
|
||||
.select((tags::id, tags::name, tags::created_time))
|
||||
.get_results::<Tag>(conn.deref_mut())
|
||||
.with_context(|| "Unable to query for tag-name conflict")?
|
||||
.into_iter()
|
||||
.find(|t| t.name.eq_ignore_ascii_case(new_name));
|
||||
if let Some(existing) = conflict {
|
||||
return Ok(UpdateTagOutcome::Conflict { existing });
|
||||
}
|
||||
|
||||
diesel::update(tags::table.filter(tags::id.eq(id)))
|
||||
.set(tags::name.eq(new_name))
|
||||
.execute(conn.deref_mut())
|
||||
.with_context(|| format!("Unable to rename tag {}", id))?;
|
||||
|
||||
Ok(UpdateTagOutcome::Renamed(Tag {
|
||||
id: target.id,
|
||||
name: new_name.to_string(),
|
||||
created_time: target.created_time,
|
||||
}))
|
||||
})
|
||||
}
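The up-front collision check in `update_tag_name` can be isolated as a pure function. This is a sketch under stated assumptions: `Tag` is reduced to two fields and `find_rename_conflict` is a hypothetical name, not a function from the repository.

```rust
/// Simplified stand-in for the project's `Tag` (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct Tag {
    pub id: i32,
    pub name: String,
}

/// Returns the first *other* tag whose name equals `new_name` when
/// ASCII case is ignored. The tag being renamed is excluded by id, so
/// changing only the case of its own name never reports a conflict.
pub fn find_rename_conflict<'a>(tags: &'a [Tag], id: i32, new_name: &str) -> Option<&'a Tag> {
    tags.iter()
        .filter(|t| t.id != id)
        .find(|t| t.name.eq_ignore_ascii_case(new_name))
}
```

`eq_ignore_ascii_case` folds only ASCII letters. That lines up with SQLite's built-in `NOCASE` collation, assuming `idx_tags_name_nocase` is built on it, which this diff does not show.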
|
||||
|
||||
fn delete_tag(&mut self, context: &opentelemetry::Context, id: i32) -> anyhow::Result<bool> {
|
||||
let mut conn = self
|
||||
.connection
|
||||
.lock()
|
||||
.expect("Unable to lock SqliteTagDao connection");
|
||||
trace_db_call(context, "delete", "delete_tag", |span| {
|
||||
span.set_attribute(KeyValue::new("tag_id", id as i64));
|
||||
|
||||
// tagged_photo.tag_id is `ON DELETE CASCADE` and the
|
||||
// connection now sets `PRAGMA foreign_keys = ON`, so a
|
||||
// single DELETE on tags removes its tagged_photo rows
|
||||
// atomically.
|
||||
let removed = diesel::delete(tags::table.filter(tags::id.eq(id)))
|
||||
.execute(conn.deref_mut())
|
||||
.with_context(|| format!("Unable to delete tag {}", id))?;
|
||||
Ok(removed > 0)
|
||||
})
|
||||
}
|
||||
|
||||
fn remove_tag(
|
||||
&mut self,
|
||||
context: &opentelemetry::Context,
|
||||
@@ -965,31 +759,11 @@ impl TagDao for SqliteTagDao {
|
||||
KeyValue::new("tag_id", tag_id.to_string()),
|
||||
]);
|
||||
|
||||
// Eagerly populate content_hash so this tag follows the bytes,
|
||||
// not the path (see CLAUDE.md "Multi-library data model").
|
||||
// None is fine — the reconciliation pass will backfill once
|
||||
// image_exif has a hash for this file. We deliberately don't
|
||||
// require library_id here: the tag handler is library-
|
||||
// agnostic by design, and any matching image_exif row's hash
|
||||
// is acceptable. If the path resolves to different bytes in
|
||||
// different libraries, reconciliation per-library refines.
|
||||
let content_hash: Option<String> = {
|
||||
use crate::database::schema::image_exif as ie;
|
||||
ie::table
|
||||
.filter(ie::rel_path.eq(path))
|
||||
.filter(ie::content_hash.is_not_null())
|
||||
.select(ie::content_hash)
|
||||
.first::<Option<String>>(conn.deref_mut())
|
||||
.ok()
|
||||
.flatten()
|
||||
};
|
||||
|
||||
diesel::insert_into(tagged_photo::table)
|
||||
.values(InsertTaggedPhoto {
|
||||
tag_id,
|
||||
photo_name: path.to_string(),
|
||||
created_time: Utc::now().timestamp(),
|
||||
content_hash,
|
||||
})
|
||||
.execute(conn.deref_mut())
|
||||
.with_context(|| format!("Unable to tag file {:?} in sqlite", path))
|
||||
@@ -1394,7 +1168,6 @@ mod tests {
|
||||
tag_id: tag.id,
|
||||
created_time: Utc::now().timestamp(),
|
||||
photo_name: path.to_string(),
|
||||
content_hash: None,
|
||||
};
|
||||
|
||||
if self.tagged_photos.borrow().contains_key(path) {
|
||||
@@ -1465,54 +1238,6 @@ mod tests {
|
||||
}
|
||||
Ok(counts)
|
||||
}
|
||||
|
||||
fn update_tag_name(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
id: i32,
|
||||
new_name: &str,
|
||||
) -> anyhow::Result<UpdateTagOutcome> {
|
||||
// Conflict pass first so the target tag's own old name
|
||||
// doesn't collide with itself.
|
||||
let conflict = self
|
||||
.tags
|
||||
.borrow()
|
||||
.iter()
|
||||
.find(|t| t.id != id && t.name.eq_ignore_ascii_case(new_name))
|
||||
.cloned();
|
||||
if let Some(existing) = conflict {
|
||||
return Ok(UpdateTagOutcome::Conflict { existing });
|
||||
}
|
||||
let mut tags = self.tags.borrow_mut();
|
||||
match tags.iter_mut().find(|t| t.id == id) {
|
||||
Some(t) => {
|
||||
t.name = new_name.to_string();
|
||||
Ok(UpdateTagOutcome::Renamed(t.clone()))
|
||||
}
|
||||
None => Ok(UpdateTagOutcome::NotFound),
|
||||
}
|
||||
}
|
||||
|
||||
fn delete_tag(
|
||||
&mut self,
|
||||
_context: &opentelemetry::Context,
|
||||
id: i32,
|
||||
) -> anyhow::Result<bool> {
|
||||
let target_name = {
|
||||
let tags = self.tags.borrow();
|
||||
tags.iter().find(|t| t.id == id).map(|t| t.name.clone())
|
||||
};
|
||||
let Some(name) = target_name else {
|
||||
return Ok(false);
|
||||
};
|
||||
// Mirror the cascade: drop any tagged_photo references, then
|
||||
// remove the tag itself.
|
||||
for (_path, tags) in self.tagged_photos.borrow_mut().iter_mut() {
|
||||
tags.retain(|t| t.id != id && t.name != name);
|
||||
}
|
||||
self.tags.borrow_mut().retain(|t| t.id != id);
|
||||
Ok(true)
|
||||
}
|
||||
}
|
||||
|
||||
#[actix_rt::test]
|
||||
@@ -1528,29 +1253,20 @@ mod tests {
|
||||
         // Seed: two paths tagged, one path untagged.
         dao.tagged_photos.borrow_mut().insert(
             "a.jpg".into(),
-            vec![Tag {
-                id: 1,
-                name: "alpha".into(),
-                created_time: 0,
-            }],
+            vec![Tag { id: 1, name: "alpha".into(), created_time: 0 }],
         );
         dao.tagged_photos.borrow_mut().insert(
             "b.jpg".into(),
             vec![
-                Tag {
-                    id: 2,
-                    name: "beta".into(),
-                    created_time: 0,
-                },
-                Tag {
-                    id: 3,
-                    name: "gamma".into(),
-                    created_time: 0,
-                },
+                Tag { id: 2, name: "beta".into(), created_time: 0 },
+                Tag { id: 3, name: "gamma".into(), created_time: 0 },
             ],
         );
         let grouped = dao
-            .get_tags_grouped_by_paths(&ctx, &["a.jpg".into(), "b.jpg".into(), "c.jpg".into()])
+            .get_tags_grouped_by_paths(
+                &ctx,
+                &["a.jpg".into(), "b.jpg".into(), "c.jpg".into()],
+            )
             .unwrap();
         assert_eq!(grouped.get("a.jpg").map(|v| v.len()), Some(1));
         assert_eq!(grouped.get("b.jpg").map(|v| v.len()), Some(2));
@@ -1665,177 +1381,6 @@ mod tests {
|
||||
None
|
||||
);
|
||||
}
|
||||
|
||||
async fn rename_tag(
|
||||
dao: &Data<Mutex<TestTagDao>>,
|
||||
id: i32,
|
||||
new_name: &str,
|
||||
) -> actix_web::http::StatusCode {
|
||||
use actix_web::Responder;
|
||||
let req = TestRequest::default().to_http_request();
|
||||
let body = web::Json(UpdateTagRequest {
|
||||
name: new_name.to_string(),
|
||||
});
|
||||
let claims = Claims::valid_user(String::from("1"));
|
||||
let resp = update_tag(claims, req.clone(), web::Path::from(id), body, dao.clone()).await;
|
||||
resp.respond_to(&req).status()
|
||||
}
|
||||
|
||||
#[actix_rt::test]
|
||||
async fn update_tag_renames_successfully() {
|
||||
let mut dao = TestTagDao::new();
|
||||
let tag = dao
|
||||
.create_tag(&opentelemetry::Context::current(), "old")
|
||||
.unwrap();
|
||||
let dao = Data::new(Mutex::new(dao));
|
||||
|
||||
assert_eq!(
|
||||
rename_tag(&dao, tag.id, "new").await,
|
||||
actix_web::http::StatusCode::OK
|
||||
);
|
||||
|
||||
let mut locked = dao.lock().unwrap();
|
||||
let all = locked
|
||||
.get_all_tags(&opentelemetry::Context::current(), None)
|
||||
.unwrap();
|
||||
assert_eq!(all.len(), 1);
|
||||
assert_eq!(all[0].1.name, "new");
|
||||
}
|
||||
|
||||
#[actix_rt::test]
|
||||
async fn update_tag_not_found_returns_404() {
|
||||
let dao = Data::new(Mutex::new(TestTagDao::new()));
|
||||
assert_eq!(
|
||||
rename_tag(&dao, 99999, "nope").await,
|
||||
actix_web::http::StatusCode::NOT_FOUND
|
||||
);
|
||||
}
|
||||
|
||||
#[actix_rt::test]
|
||||
async fn update_tag_empty_name_returns_400() {
|
||||
let mut dao = TestTagDao::new();
|
||||
let tag = dao
|
||||
.create_tag(&opentelemetry::Context::current(), "keep")
|
||||
.unwrap();
|
||||
let dao = Data::new(Mutex::new(dao));
|
||||
|
||||
assert_eq!(
|
||||
rename_tag(&dao, tag.id, " ").await,
|
||||
actix_web::http::StatusCode::BAD_REQUEST
|
||||
);
|
||||
|
||||
let mut locked = dao.lock().unwrap();
|
||||
let all = locked
|
||||
.get_all_tags(&opentelemetry::Context::current(), None)
|
||||
.unwrap();
|
||||
assert_eq!(all[0].1.name, "keep", "name must not change on 400");
|
||||
}
|
||||
|
||||
#[actix_rt::test]
|
||||
async fn update_tag_conflict_returns_409() {
|
||||
let mut dao = TestTagDao::new();
|
||||
let _a = dao
|
||||
.create_tag(&opentelemetry::Context::current(), "a")
|
||||
.unwrap();
|
||||
let b = dao
|
||||
.create_tag(&opentelemetry::Context::current(), "b")
|
||||
.unwrap();
|
||||
let dao = Data::new(Mutex::new(dao));
|
||||
|
||||
// Case-insensitive collision: renaming b -> "A" must conflict with a.
|
||||
assert_eq!(
|
||||
rename_tag(&dao, b.id, "A").await,
|
||||
actix_web::http::StatusCode::CONFLICT
|
||||
);
|
||||
|
||||
let mut locked = dao.lock().unwrap();
|
||||
let all = locked
|
||||
.get_all_tags(&opentelemetry::Context::current(), None)
|
||||
.unwrap();
|
||||
let b_after = all.iter().find(|(_, t)| t.id == b.id).unwrap();
|
||||
assert_eq!(b_after.1.name, "b", "no DB change on 409");
|
||||
}
|
||||
|
||||
async fn delete_via_handler(
|
||||
dao: &Data<Mutex<TestTagDao>>,
|
||||
id: i32,
|
||||
) -> actix_web::http::StatusCode {
|
||||
use actix_web::Responder;
|
||||
let req = TestRequest::default().to_http_request();
|
||||
let claims = Claims::valid_user(String::from("1"));
|
||||
let resp = delete_tag(claims, req.clone(), web::Path::from(id), dao.clone()).await;
|
||||
resp.respond_to(&req).status()
|
||||
}
|
||||
|
||||
#[actix_rt::test]
|
||||
async fn delete_tag_removes_tag_and_cascades_tagged_photos() {
|
||||
let mut dao = TestTagDao::new();
|
||||
let tag = dao
|
||||
.create_tag(&opentelemetry::Context::current(), "doomed")
|
||||
.unwrap();
|
||||
dao.tag_file(&opentelemetry::Context::current(), "a.jpg", tag.id)
|
||||
.unwrap();
|
||||
dao.tag_file(&opentelemetry::Context::current(), "b.jpg", tag.id)
|
||||
.unwrap();
|
||||
let dao = Data::new(Mutex::new(dao));
|
||||
|
||||
assert_eq!(
|
||||
delete_via_handler(&dao, tag.id).await,
|
||||
actix_web::http::StatusCode::NO_CONTENT
|
||||
);
|
||||
|
||||
let mut locked = dao.lock().unwrap();
|
||||
assert!(
|
||||
locked
|
||||
.get_all_tags(&opentelemetry::Context::current(), None)
|
||||
.unwrap()
|
||||
.is_empty()
|
||||
);
|
||||
assert!(
|
||||
locked
|
||||
.get_tags_for_path(&opentelemetry::Context::current(), "a.jpg")
|
||||
.unwrap()
|
||||
.is_empty(),
|
||||
"tagged_photo references must be cleaned up by the cascade"
|
||||
);
|
||||
assert!(
|
||||
locked
|
||||
.get_tags_for_path(&opentelemetry::Context::current(), "b.jpg")
|
||||
.unwrap()
|
||||
.is_empty()
|
||||
);
|
||||
}
|
||||
|
||||
#[actix_rt::test]
|
||||
async fn delete_tag_unknown_id_returns_404() {
|
||||
let dao = Data::new(Mutex::new(TestTagDao::new()));
|
||||
assert_eq!(
|
||||
delete_via_handler(&dao, 99999).await,
|
||||
actix_web::http::StatusCode::NOT_FOUND
|
||||
);
|
||||
}
|
||||
|
||||
#[actix_rt::test]
|
||||
async fn update_tag_case_only_change_succeeds() {
|
||||
let mut dao = TestTagDao::new();
|
||||
let tag = dao
|
||||
.create_tag(&opentelemetry::Context::current(), "vacation")
|
||||
.unwrap();
|
||||
let dao = Data::new(Mutex::new(dao));
|
||||
|
||||
// The conflict check excludes the target's own row, so changing
|
||||
// only the case of the tag's current name must succeed.
|
||||
assert_eq!(
|
||||
rename_tag(&dao, tag.id, "Vacation").await,
|
||||
actix_web::http::StatusCode::OK
|
||||
);
|
||||
|
||||
let mut locked = dao.lock().unwrap();
|
||||
let all = locked
|
||||
.get_all_tags(&opentelemetry::Context::current(), None)
|
||||
.unwrap();
|
||||
assert_eq!(all[0].1.name, "Vacation");
|
||||
}
|
||||
}
|
||||
#[derive(QueryableByName, Debug, Clone)]
|
||||
pub struct FileWithTagCount {
|
||||
|
||||
@@ -1,275 +0,0 @@
|
||||
//! Thumbnail generation + the media-count Prometheus gauges.
|
||||
//!
|
||||
//! Startup and per-tick scans walk each library and produce a 200px-wide
|
||||
//! thumbnail under `THUMBNAILS/<library_id>/<rel_path>`, falling through
|
||||
//! a fast path (`image` crate), a RAW-preview path (`exif::extract_embedded_jpeg_preview`),
|
||||
//! and ffmpeg for video / HEIF / NEF / ARW. Files that fail every
|
||||
//! decoder get a sibling `.unsupported` sentinel so subsequent scans
|
||||
//! skip them silently.
|
||||
|
||||
use std::path::{Path, PathBuf};
|
||||
|
||||
use lazy_static::lazy_static;
|
||||
use log::{debug, error, info, warn};
|
||||
use opentelemetry::{
|
||||
KeyValue,
|
||||
trace::{Span, TraceContextExt, Tracer},
|
||||
};
|
||||
use prometheus::IntGauge;
|
||||
use rayon::prelude::*;
|
||||
use walkdir::DirEntry;
|
||||
|
||||
use crate::content_hash;
|
||||
use crate::exif;
|
||||
use crate::file_types;
|
||||
use crate::libraries;
|
||||
use crate::otel::global_tracer;
|
||||
use crate::video::actors::{generate_image_thumbnail_ffmpeg, generate_video_thumbnail};
|
||||
|
||||
lazy_static! {
|
||||
pub static ref IMAGE_GAUGE: IntGauge = IntGauge::new(
|
||||
"imageserver_image_total",
|
||||
"Count of the images on the server"
|
||||
)
|
||||
.unwrap();
|
||||
pub static ref VIDEO_GAUGE: IntGauge = IntGauge::new(
|
||||
"imageserver_video_total",
|
||||
"Count of the videos on the server"
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
/// Sentinel path written next to a would-be thumbnail when a file cannot be
|
||||
/// decoded by either the `image` crate or ffmpeg. Its presence causes future
|
||||
/// scans to skip the file instead of re-logging the failure.
|
||||
pub fn unsupported_thumbnail_sentinel(thumb_path: &Path) -> PathBuf {
|
||||
let mut s = thumb_path.as_os_str().to_owned();
|
||||
s.push(".unsupported");
|
||||
PathBuf::from(s)
|
||||
}
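The sentinel only pays off if a later scan checks for it next to the thumbnail. A small sketch of that check, using only the standard library; `sentinel_for` repeats the suffix rule from the function above and `already_handled` is a hypothetical helper name.

```rust
use std::path::{Path, PathBuf};

/// Same rule as `unsupported_thumbnail_sentinel`: append the suffix to
/// the full file name instead of replacing the extension.
fn sentinel_for(thumb_path: &Path) -> PathBuf {
    let mut s = thumb_path.as_os_str().to_owned();
    s.push(".unsupported");
    PathBuf::from(s)
}

/// Hypothetical helper: a scan can skip a source file when either its
/// thumbnail or its sentinel is already on disk.
fn already_handled(thumb_path: &Path) -> bool {
    thumb_path.exists() || sentinel_for(thumb_path).exists()
}
```

Appending rather than calling `Path::with_extension` keeps `photo.jpg` and `photo.jpg.unsupported` distinct, so a real thumbnail is never mistaken for its own failure marker.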
|
||||
|
||||
pub fn generate_image_thumbnail(src: &Path, thumb_path: &Path) -> std::io::Result<()> {
|
||||
// The `image` crate doesn't auto-apply EXIF Orientation on load, and
|
||||
// saving back out as JPEG drops EXIF entirely — so without baking the
|
||||
// rotation into the pixels here, browsers see the raw landscape buffer
|
||||
// of a portrait phone shot and render it sideways. Read once up front
|
||||
// and apply to whichever decode branch we end up taking.
|
||||
let orientation = exif::read_orientation(src).unwrap_or(1);
|
||||
|
||||
// RAW formats (ARW/NEF/CR2/etc): try the file's embedded JPEG preview
|
||||
// first. Avoids ffmpeg choking on proprietary RAW compression (Sony ARW
|
||||
// in particular), and is faster than decoding RAW pixels anyway.
|
||||
if let Some(preview) = exif::extract_embedded_jpeg_preview(src) {
|
||||
let img = image::load_from_memory(&preview).map_err(|e| {
|
||||
std::io::Error::new(
|
||||
std::io::ErrorKind::InvalidData,
|
||||
format!("decode embedded preview {:?}: {}", src, e),
|
||||
)
|
||||
})?;
|
||||
let img = exif::apply_orientation(img, orientation);
|
||||
let scaled = img.thumbnail(200, u32::MAX);
|
||||
scaled
|
||||
.save_with_format(thumb_path, image::ImageFormat::Jpeg)
|
||||
.map_err(|e| std::io::Error::other(format!("save {:?}: {}", thumb_path, e)))?;
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
if file_types::needs_ffmpeg_thumbnail(src) {
|
||||
return generate_image_thumbnail_ffmpeg(src, thumb_path);
|
||||
}
|
||||
|
||||
let img = image::open(src).map_err(|e| {
|
||||
std::io::Error::new(std::io::ErrorKind::InvalidData, format!("{:?}: {}", src, e))
|
||||
})?;
|
||||
let img = exif::apply_orientation(img, orientation);
|
||||
let scaled = img.thumbnail(200, u32::MAX);
|
||||
scaled
|
||||
.save(thumb_path)
|
||||
.map_err(|e| std::io::Error::other(format!("save {:?}: {}", thumb_path, e)))?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
pub fn create_thumbnails(libs: &[libraries::Library], excluded_dirs: &[String]) {
|
||||
let tracer = global_tracer();
|
||||
let span = tracer.start("creating thumbnails");
|
||||
|
||||
let thumbs = &dotenv::var("THUMBNAILS").expect("THUMBNAILS not defined");
|
||||
let thumbnail_directory: &Path = Path::new(thumbs);
|
||||
|
||||
for lib in libs {
|
||||
info!(
|
||||
"Scanning thumbnails for library '{}' at {}",
|
||||
lib.name, lib.root_path
|
||||
);
|
||||
let images = PathBuf::from(&lib.root_path);
|
||||
// Effective excludes = global env-var excludes ∪ library row's
|
||||
// excluded_dirs. Lets a parent-library mount skip the subtree
|
||||
// already covered by a child library.
|
||||
let effective_excludes = lib.effective_excluded_dirs(excluded_dirs);
|
||||
|
||||
// Prune EXCLUDED_DIRS so we don't generate thumbnails-of-thumbnails
|
||||
// for Synology @eaDir trees. file_scan handles filter_entry pruning.
|
||||
crate::file_scan::walk_library_files(&images, &effective_excludes)
|
||||
.into_par_iter()
|
||||
.for_each(|entry| {
|
||||
let src = entry.path();
|
||||
let Ok(relative_path) = src.strip_prefix(&images) else {
|
||||
return;
|
||||
};
|
||||
// Library-scoped legacy path: prevents two libraries with
|
||||
// the same rel_path from clobbering each other's thumbs.
|
||||
// Hash-keyed promotion happens lazily on first hash-aware
|
||||
// request — keeping this loop ExifDao-free preserves the
|
||||
// current "cargo build && go" startup story.
|
||||
let thumb_path = content_hash::library_scoped_legacy_path(
|
||||
thumbnail_directory,
|
||||
lib.id,
|
||||
relative_path,
|
||||
);
|
||||
let bare_legacy = thumbnail_directory.join(relative_path);
|
||||
|
||||
// Backwards-compat check: if a single-library install has a
|
||||
// bare-legacy thumb here already, accept it as present.
|
||||
// Same for the sentinel. Means we don't redo work after
|
||||
// upgrade and we don't leave stale duplicates around.
|
||||
if thumb_path.exists()
|
||||
|| bare_legacy.exists()
|
||||
|| unsupported_thumbnail_sentinel(&thumb_path).exists()
|
||||
|| unsupported_thumbnail_sentinel(&bare_legacy).exists()
|
||||
{
|
||||
return;
|
||||
}
|
||||
|
||||
let Some(parent) = thumb_path.parent() else {
|
||||
return;
|
||||
};
|
||||
if let Err(e) = std::fs::create_dir_all(parent) {
|
||||
error!("Failed to create thumbnail dir {:?}: {}", parent, e);
|
||||
return;
|
||||
}
|
||||
|
||||
if is_video(&entry) {
|
||||
let mut video_span = tracer.start_with_context(
|
||||
"generate_video_thumbnail",
|
||||
&opentelemetry::Context::new()
|
||||
.with_remote_span_context(span.span_context().clone()),
|
||||
);
|
||||
video_span.set_attributes(vec![
|
||||
KeyValue::new("type", "video"),
|
||||
KeyValue::new("file-name", thumb_path.display().to_string()),
|
||||
KeyValue::new("library", lib.name.clone()),
|
||||
]);
|
||||
|
||||
debug!("Generating video thumbnail: {:?}", thumb_path);
|
||||
if let Err(e) = generate_video_thumbnail(src, &thumb_path) {
|
||||
let sentinel = unsupported_thumbnail_sentinel(&thumb_path);
|
||||
error!(
|
||||
"Unable to thumbnail video {:?}: {}. Writing sentinel {:?}",
|
||||
src, e, sentinel
|
||||
);
|
||||
if let Err(se) = std::fs::write(&sentinel, b"") {
|
||||
warn!("Failed to write sentinel {:?}: {}", sentinel, se);
|
||||
}
|
||||
}
|
||||
video_span.end();
|
||||
} else if is_image(&entry) {
|
||||
match generate_image_thumbnail(src, &thumb_path) {
|
||||
Ok(_) => info!("Saved thumbnail: {:?}", thumb_path),
|
||||
Err(e) => {
|
||||
let sentinel = unsupported_thumbnail_sentinel(&thumb_path);
|
||||
error!(
|
||||
"Unable to thumbnail {:?}: {}. Writing sentinel {:?}",
|
||||
src, e, sentinel
|
||||
);
|
||||
if let Err(se) = std::fs::write(&sentinel, b"") {
|
||||
warn!("Failed to write sentinel {:?}: {}", sentinel, se);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
debug!("Finished making thumbnails");
|
||||
|
||||
for lib in libs {
|
||||
let effective_excludes = lib.effective_excluded_dirs(excluded_dirs);
|
||||
update_media_counts(Path::new(&lib.root_path), &effective_excludes);
|
||||
}
|
||||
}
|
||||
|
||||
pub fn update_media_counts(media_dir: &Path, excluded_dirs: &[String]) {
|
||||
let mut image_count = 0;
|
||||
let mut video_count = 0;
|
||||
for entry in crate::file_scan::walk_library_files(media_dir, excluded_dirs) {
|
||||
if is_image(&entry) {
|
||||
image_count += 1;
|
||||
} else if is_video(&entry) {
|
||||
video_count += 1;
|
||||
}
|
||||
}
|
||||
|
||||
IMAGE_GAUGE.set(image_count);
|
||||
VIDEO_GAUGE.set(video_count);
|
||||
}
|
||||
|
||||
pub fn is_image(entry: &DirEntry) -> bool {
|
||||
file_types::direntry_is_image(entry)
|
||||
}
|
||||
|
||||
pub fn is_video(entry: &DirEntry) -> bool {
|
||||
file_types::direntry_is_video(entry)
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use std::fs;
|
||||
use tempfile::TempDir;
|
||||
|
||||
#[test]
|
||||
fn unsupported_thumbnail_sentinel_appends_suffix() {
|
||||
let p = Path::new("/thumbs/lib1/photo.jpg");
|
||||
let s = unsupported_thumbnail_sentinel(p);
|
||||
assert_eq!(s, PathBuf::from("/thumbs/lib1/photo.jpg.unsupported"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn unsupported_thumbnail_sentinel_preserves_extension_so_existing_thumb_is_distinct() {
|
||||
// A future scan checks both `thumb.exists()` and
|
||||
// `sentinel.exists()` — they must be distinct paths.
|
||||
let p = Path::new("foo.jpeg");
|
||||
let s = unsupported_thumbnail_sentinel(p);
|
||||
assert_ne!(s, PathBuf::from("foo.jpeg"));
|
||||
assert!(s.to_string_lossy().ends_with(".unsupported"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn unsupported_thumbnail_sentinel_handles_paths_without_extension() {
|
||||
let p = Path::new("/thumbs/notes");
|
||||
let s = unsupported_thumbnail_sentinel(p);
|
||||
assert_eq!(s, PathBuf::from("/thumbs/notes.unsupported"));
|
||||
}
|
||||
|
||||
/// Smoke-test update_media_counts: build a tempdir with two images
|
||||
/// and one video, run the walker, and assert the gauges line up.
|
||||
/// Exercises the is_image / is_video classifier on real DirEntry
|
||||
/// values without needing a Prometheus registry.
|
||||
#[test]
|
||||
fn update_media_counts_counts_images_and_videos_in_tempdir() {
|
||||
let tmp = TempDir::new().expect("tempdir");
|
||||
fs::write(tmp.path().join("a.jpg"), b"").unwrap();
|
||||
fs::write(tmp.path().join("b.png"), b"").unwrap();
|
||||
fs::write(tmp.path().join("c.mp4"), b"").unwrap();
|
||||
fs::write(tmp.path().join("notes.txt"), b"").unwrap();
|
||||
// Reset gauges first in case another test mutated them — the
|
||||
// gauges are process-global statics.
|
||||
IMAGE_GAUGE.set(0);
|
||||
VIDEO_GAUGE.set(0);
|
||||
|
||||
update_media_counts(tmp.path(), &[]);
|
||||
|
||||
assert_eq!(IMAGE_GAUGE.get(), 2, "jpg + png");
|
||||
assert_eq!(VIDEO_GAUGE.get(), 1, "mp4");
|
||||
}
|
||||
}
|
||||
+9
-52
@@ -1,8 +1,8 @@
|
||||
 use crate::database::PreviewDao;
+use crate::is_video;
 use crate::libraries::Library;
 use crate::otel::global_tracer;
-use crate::thumbnails::is_video;
-use crate::video::ffmpeg::{generate_preview_clip, get_duration_seconds_blocking};
+use crate::video::ffmpeg::generate_preview_clip;
 use actix::prelude::*;
 use log::{debug, error, info, trace, warn};
 use opentelemetry::KeyValue;
@@ -107,62 +107,19 @@ pub async fn create_playlist(video_path: &str, playlist_file: &str) -> Result<Ch
|
||||
result
|
||||
}
|
||||
|
||||
-pub fn generate_video_thumbnail(path: &Path, destination: &Path) -> std::io::Result<()> {
-    // Probe duration up front and seek to ~50% — gives a more
-    // representative frame than a fixed offset (skipping title cards on
-    // long videos, landing inside the clip on 1–2s Snapchat MP4s) and
-    // sidesteps the seek-past-EOF class of bug entirely. When duration
-    // probing fails (LRV files, fragmented MP4s, ffprobe missing) fall
-    // back to the first frame: ugly but reliable.
-    //
-    // -vf scale + -c:v mjpeg mirrors `generate_image_thumbnail_ffmpeg`. The
-    // filter chain matters as much as the scale does: without it, ffmpeg
-    // hands the decoded frame straight to the mjpeg encoder, which rejects
-    // any non-yuvj420p source ("Non full-range YUV is non-standard"). The
-    // filter chain lets ffmpeg auto-insert the pix_fmt converter the
-    // encoder needs, which is how the image-thumbnail path already handles
-    // the same class of source.
-    let seek = get_duration_seconds_blocking(path).map(|d| format!("{:.3}", d / 2.0));
-
-    let mut cmd = Command::new("ffmpeg");
-    cmd.arg("-y");
-    if let Some(s) = &seek {
-        cmd.arg("-ss").arg(s);
-    }
-    let output = cmd
+pub fn generate_video_thumbnail(path: &Path, destination: &Path) {
+    Command::new("ffmpeg")
+        .arg("-ss")
+        .arg("3")
         .arg("-i")
-        .arg(path)
+        .arg(path.to_str().unwrap())
         .arg("-vframes")
         .arg("1")
-        .arg("-vf")
-        .arg("scale=200:-1")
         .arg("-f")
         .arg("image2")
-        .arg("-c:v")
-        .arg("mjpeg")
         .arg(destination)
-        .output()?;
-
-    if !output.status.success() {
-        return Err(std::io::Error::other(format!(
-            "ffmpeg failed ({}): {}",
-            output.status,
-            String::from_utf8_lossy(&output.stderr).trim()
-        )));
-    }
-    // ffmpeg can exit 0 without writing a frame for malformed files where
-    // the probe duration lies. Confirm a non-empty file actually landed —
-    // returning Err makes the caller write the `.unsupported` sentinel so
-    // we stop re-detecting on every scan.
-    let wrote = std::fs::metadata(destination)
-        .map(|m| m.len() > 0)
-        .unwrap_or(false);
-    if !wrote {
-        return Err(std::io::Error::other(
-            "ffmpeg exited successfully but produced no thumbnail output",
-        ));
-    }
-    Ok(())
+        .output()
+        .expect("Failure to create video frame");
}
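The duration-probing variant of `generate_video_thumbnail` in this hunk seeks to the midpoint when a duration is known and otherwise omits `-ss`. The argument assembly can be sketched without spawning a process. `midpoint_seek` and `thumbnail_args` are hypothetical helper names; the ffmpeg flags are the ones that appear in the hunk.

```rust
/// Value for ffmpeg's `-ss` flag, or `None` to start at the first frame
/// when the duration could not be probed.
fn midpoint_seek(duration_secs: Option<f64>) -> Option<String> {
    duration_secs.map(|d| format!("{:.3}", d / 2.0))
}

/// Hypothetical helper that collects the arguments into a Vec; the code
/// in the diff passes them to `std::process::Command` directly.
fn thumbnail_args(input: &str, output: &str, duration_secs: Option<f64>) -> Vec<String> {
    let mut args = vec!["-y".to_string()];
    if let Some(seek) = midpoint_seek(duration_secs) {
        // `-ss` goes before `-i` so the seek applies to the input.
        args.push("-ss".to_string());
        args.push(seek);
    }
    let rest = [
        "-i", input, "-vframes", "1", "-vf", "scale=200:-1",
        "-f", "image2", "-c:v", "mjpeg", output,
    ];
    args.extend(rest.iter().map(|a| a.to_string()));
    args
}
```

Keeping the seek optional means a failed probe degrades to a first-frame thumbnail instead of a seek past the end of a very short clip.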
|
||||
|
||||
/// Use ffmpeg to extract a 200px-wide thumbnail from formats the `image` crate
|
||||
|
||||
+42
-174
@@ -223,83 +223,20 @@ impl Ffmpeg {
}

/// Get video duration in seconds as f64 for precise interval calculation.
///
/// Returns `Ok(None)` when ffprobe runs successfully but the container has no
/// readable duration (notably GoPro `LRV` low-res preview files, some
/// fragmented MP4s, and short Snapchat clips with stripped headers). Callers
/// can fall back to a duration-agnostic encode rather than treating this as
/// a hard failure. Previously the `parse::<f64>` on empty stdout produced
/// "cannot parse float from empty string" and poisoned the preview-clip row
/// with status=failed, which the watcher would re-queue every full scan.
async fn get_duration_seconds(input_file: &str) -> Result<Option<f64>> {
    if let Some(d) = probe_duration(input_file, "format=duration").await? {
        return Ok(Some(d));
    }
    // Fall back to the per-stream duration, which is populated for some MP4s
    // where the format-level duration tag is missing.
    probe_duration(input_file, "stream=duration").await
}

/// Synchronous cousin of `get_duration_seconds`, for callers running on
/// blocking thread pools (Rayon). Same fallback strategy: tries
/// `format=duration`, then `stream=duration`. Returns `None` for any
/// failure (ffprobe missing, container without a duration tag, parse
/// error) so callers can pick a duration-agnostic default.
pub fn get_duration_seconds_blocking(input_file: &std::path::Path) -> Option<f64> {
    if let Some(d) = probe_duration_blocking(input_file, "format=duration") {
        return Some(d);
    }
    probe_duration_blocking(input_file, "stream=duration")
}

fn probe_duration_blocking(input_file: &std::path::Path, show_entries: &str) -> Option<f64> {
    let out = std::process::Command::new("ffprobe")
        .args(["-v", "quiet"])
        .args(["-show_entries", show_entries])
        .args(["-of", "csv=p=0"])
        .arg("-i")
        .arg(input_file)
        .output()
        .ok()?;
    let raw = String::from_utf8_lossy(&out.stdout);
    parse_ffprobe_duration(&raw)
}

async fn probe_duration(input_file: &str, show_entries: &str) -> Result<Option<f64>> {
    let out = Command::new("ffprobe")
        .args(["-v", "quiet"])
        .args(["-show_entries", show_entries])
        .args(["-of", "csv=p=0"])
async fn get_duration_seconds(input_file: &str) -> Result<f64> {
    Command::new("ffprobe")
        .args(["-i", input_file])
        .args(["-show_entries", "format=duration"])
        .args(["-v", "quiet"])
        .args(["-of", "csv=p=0"])
        .output()
        .await?;
    let raw = String::from_utf8_lossy(&out.stdout);
    Ok(parse_ffprobe_duration(&raw))
}

/// Parse ffprobe's `csv=p=0` duration output. Returns the first valid
/// positive finite duration, or `None` when there isn't one.
///
/// Stream-level queries (`-show_entries stream=duration`) emit one value per
/// stream, one per line; format-level queries emit a single line. The shape
/// also varies: `N/A` for streams without a known duration, empty string
/// for containers without the tag at all, and (rarely) `0`/`-1` for
/// fragmented MP4s. All of those have to map to `None` so the caller can
/// fall back to a duration-agnostic encode.
fn parse_ffprobe_duration(stdout: &str) -> Option<f64> {
    for line in stdout.lines() {
        let trimmed = line.trim();
        if trimmed.is_empty() || trimmed == "N/A" {
            continue;
        }
        if let Ok(d) = trimmed.parse::<f64>()
            && d.is_finite()
            && d > 0.0
        {
            return Some(d);
        }
    }
    None
        .await
        .map(|out| String::from_utf8_lossy(&out.stdout).trim().to_string())
        .and_then(|duration_str| {
            duration_str
                .parse::<f64>()
                .map_err(|e| std::io::Error::other(e.to_string()))
        })
}

/// Generate a preview clip from a video file.
@@ -331,39 +268,28 @@ pub async fn generate_preview_clip(input_file: &str, output_file: &str) -> Resul
    cmd.arg("-i").arg(input_file);

    // Branch on duration. `None` means ffprobe couldn't tell us, so we treat
    // it like the <1s case and transcode from the start of the file without
    // segment selection (capped at 10s below). The selected clip-duration we
    // report back is computed alongside, so callers don't need to re-probe.
    let clip_duration = match duration {
        None => {
            warn!(
                "Unknown duration for '{}', transcoding whole file as preview",
                input_file
            );
            cmd.args(["-vf", "scale=-2:480,format=yuv420p"]);
            // Cap the encode at 10s so a long video with stripped duration
            // metadata doesn't spend forever generating a "preview".
            cmd.args(["-t", "10"]);
            10.0
        }
        Some(d) if d < 1.0 => {
            cmd.args(["-vf", "scale=-2:480,format=yuv420p"]);
            d
        }
        Some(d) => {
            let segment_count = if d < 10.0 { d.floor() as u32 } else { 10 };
            let interval = d / segment_count as f64;
            let vf = format!(
                "select='lt(mod(t,{:.4}),1)',setpts=N/FRAME_RATE/TB,fps=30,scale=-2:480,format=yuv420p",
                interval
            );
            let af = format!("aselect='lt(mod(t,{:.4}),1)',asetpts=N/SR/TB", interval);
            cmd.args(["-vf", &vf]);
            cmd.args(["-af", &af]);
            if d < 10.0 { d.floor() } else { 10.0 }
        }
    };
    if duration < 1.0 {
        // Very short video (<1s): transcode the whole thing to 480p MP4
        // format=yuv420p ensures 10-bit sources are converted to 8-bit for h264_nvenc
        cmd.args(["-vf", "scale=-2:480,format=yuv420p"]);
    } else {
        let segment_count = if duration < 10.0 {
            duration.floor() as u32
        } else {
            10
        };
        let interval = duration / segment_count as f64;

        // format=yuv420p ensures 10-bit sources are converted to 8-bit for h264_nvenc
        let vf = format!(
            "select='lt(mod(t,{:.4}),1)',setpts=N/FRAME_RATE/TB,fps=30,scale=-2:480,format=yuv420p",
            interval
        );
        let af = format!("aselect='lt(mod(t,{:.4}),1)',asetpts=N/SR/TB", interval);

        cmd.args(["-vf", &vf]);
        cmd.args(["-af", &af]);
    }

    // Force 30fps output so high-framerate sources (60fps) don't play back
    // at double speed due to select/setpts timestamp mismatches.
@@ -394,6 +320,14 @@ pub async fn generate_preview_clip(input_file: &str, output_file: &str) -> Resul
    let metadata = std::fs::metadata(output_file)?;
    let file_size = metadata.len();

    let clip_duration = if duration < 1.0 {
        duration
    } else if duration < 10.0 {
        duration.floor()
    } else {
        10.0
    };

    info!(
        "Generated preview clip '{}' ({:.1}s, {} bytes) in {:?}",
        output_file,
@@ -404,69 +338,3 @@ pub async fn generate_preview_clip(input_file: &str, output_file: &str) -> Resul
    Ok((clip_duration, file_size))
}

#[cfg(test)]
mod tests {
    use super::parse_ffprobe_duration;

    #[test]
    fn empty_output_returns_none() {
        // The original bug: ffprobe -show_entries format=duration returned
        // "" for some GoPro LRV files, and `parse::<f64>` failed with
        // "cannot parse float from empty string".
        assert_eq!(parse_ffprobe_duration(""), None);
        assert_eq!(parse_ffprobe_duration("\n"), None);
        assert_eq!(parse_ffprobe_duration(" \n \n"), None);
    }

    #[test]
    fn na_returns_none() {
        // ffprobe emits "N/A" for streams without a known duration.
        assert_eq!(parse_ffprobe_duration("N/A"), None);
        assert_eq!(parse_ffprobe_duration("N/A\nN/A\n"), None);
    }

    #[test]
    fn parses_simple_duration() {
        assert_eq!(parse_ffprobe_duration("12.345"), Some(12.345));
        assert_eq!(parse_ffprobe_duration("12.345\n"), Some(12.345));
        assert_eq!(parse_ffprobe_duration("0.5"), Some(0.5));
    }

    #[test]
    fn rejects_non_positive_durations() {
        // Fragmented MP4s and broken containers occasionally report 0 or a
        // negative duration. Treat as "unknown" so the caller falls back to
        // whole-file transcoding rather than dividing by zero downstream.
        assert_eq!(parse_ffprobe_duration("0"), None);
        assert_eq!(parse_ffprobe_duration("0.0"), None);
        assert_eq!(parse_ffprobe_duration("-1.5"), None);
    }

    #[test]
    fn rejects_non_finite_durations() {
        assert_eq!(parse_ffprobe_duration("inf"), None);
        assert_eq!(parse_ffprobe_duration("nan"), None);
    }

    #[test]
    fn first_valid_line_wins_for_stream_query() {
        // `-show_entries stream=duration` emits one value per stream. For a
        // video file the video stream is first; we accept it and ignore
        // any audio-stream values that follow.
        assert_eq!(parse_ffprobe_duration("12.5\n8.3\n"), Some(12.5));
    }

    #[test]
    fn skips_leading_na_and_blank_lines() {
        // Stream queries can put N/A first (e.g. data stream before the
        // video stream); the parser should keep scanning.
        assert_eq!(parse_ffprobe_duration("N/A\n\n7.25\n"), Some(7.25));
    }

    #[test]
    fn rejects_garbage() {
        assert_eq!(parse_ffprobe_duration("not a number"), None);
        assert_eq!(parse_ffprobe_duration("12.5abc"), None);
    }
}
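The segment arithmetic in the duration branches of `generate_preview_clip` can be sketched in isolation. This is a minimal sketch, not the project's code: `preview_plan` and `window_starts` are hypothetical helper names, and the `(1, 0.0, ...)` tuples returned for the two non-segmented branches are only a convention chosen here to mean "no segment selection".

```rust
/// Hypothetical helper mirroring the three duration branches.
/// Returns (segment_count, interval_seconds, reported_clip_seconds).
fn preview_plan(duration: Option<f64>) -> (u32, f64, f64) {
    match duration {
        // Unknown duration: plain transcode, capped at 10 s.
        None => (1, 0.0, 10.0),
        // Sub-second source: transcode the whole file.
        Some(d) if d < 1.0 => (1, 0.0, d),
        // Otherwise take one 1-second window per segment, evenly spaced.
        Some(d) => {
            let segments = if d < 10.0 { d.floor() as u32 } else { 10 };
            let interval = d / segments as f64;
            (segments, interval, segments as f64)
        }
    }
}

/// Start times of the 1-second windows that `lt(mod(t, interval), 1)` keeps.
fn window_starts(segments: u32, interval: f64) -> Vec<f64> {
    (0..segments).map(|i| i as f64 * interval).collect()
}
```

For a 95 s source this gives ten one-second windows spaced 9.5 s apart, which is the set of frames that `select='lt(mod(t,9.5000),1)'` lets through.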
+1
-1
@@ -1,6 +1,6 @@
use crate::otel::global_tracer;
use crate::thumbnails::{is_video, update_media_counts};
use crate::video::ffmpeg::{Ffmpeg, GifType};
use crate::{is_video, update_media_counts};
use log::info;
use opentelemetry::trace::Tracer;
use std::fs;
-975
@@ -1,975 +0,0 @@
//! Background file-watcher loop + the orphaned-playlist cleanup job.
//!
//! `watch_files` spins a thread that, on every tick (default 60 s
//! quick-scan / 3600 s full-scan), probes each library's availability,
//! drains the unhashed / date / face-detection backlogs via
//! [`crate::backfill`], walks newly-modified files through
//! [`process_new_files`], updates the media-count gauges, and runs the
//! three-stage maintenance pipeline (missing-file scan → back-ref
//! refresh → orphan GC).
//!
//! `cleanup_orphaned_playlists` runs on a slower interval (default 24
//! hours) and reaps HLS playlists whose source videos no longer exist
//! in any library. Both jobs respect [`crate::libraries::LibraryHealthMap`]:
//! a stale library skips destructive paths so transient unmounts
//! don't trigger data loss.

use std::collections::{HashMap, HashSet};
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex, RwLock};
use std::time::{Duration, SystemTime};

use actix::Addr;
use chrono::Utc;
use log::{debug, error, info, warn};
use walkdir::WalkDir;

use crate::backfill;
use crate::content_hash;
use crate::database::models::InsertImageExif;
use crate::database::{ExifDao, PreviewDao, SqliteExifDao, SqlitePreviewDao};
use crate::date_resolver;
use crate::exif;
use crate::face_watch;
use crate::faces;
use crate::file_types;
use crate::libraries;
use crate::library_maintenance;
use crate::perceptual_hash;
use crate::tags;
use crate::tags::SqliteTagDao;
use crate::thumbnails;
use crate::video;
use crate::video::actors::{GeneratePreviewClipMessage, QueueVideosMessage, VideoPlaylistManager};

/// Clean up orphaned HLS playlists and segments whose source videos no longer exist.
///
/// `libs_lock` is the shared live view of the libraries table, read at the
/// top of each cleanup pass so a PATCH /libraries/{id} that disables or
/// re-mounts a library is picked up without a restart.
pub fn cleanup_orphaned_playlists(
    libs_lock: Arc<RwLock<Vec<libraries::Library>>>,
    excluded_dirs: Vec<String>,
    library_health: libraries::LibraryHealthMap,
) {
    std::thread::spawn(move || {
        let video_path = dotenv::var("VIDEO_PATH").expect("VIDEO_PATH must be set");

        // Get cleanup interval from environment (default: 24 hours)
        let cleanup_interval_secs = dotenv::var("PLAYLIST_CLEANUP_INTERVAL_SECONDS")
            .ok()
            .and_then(|s| s.parse::<u64>().ok())
            .unwrap_or(86400); // 24 hours

        info!("Starting orphaned playlist cleanup job");
        info!(" Cleanup interval: {} seconds", cleanup_interval_secs);
        info!(" Playlist directory: {}", video_path);
        {
            let libs = libs_lock.read().unwrap_or_else(|e| e.into_inner());
            for lib in libs.iter() {
                info!(
                    " Checking sources under '{}' at {}",
                    lib.name, lib.root_path
                );
            }
        }

        loop {
            std::thread::sleep(Duration::from_secs(cleanup_interval_secs));

            // Fresh snapshot per tick so a PATCH /libraries/{id} that
            // disabled a library (or rewrote its excluded_dirs) is
            // honoured immediately.
            let libs: Vec<libraries::Library> =
                libs_lock.read().unwrap_or_else(|e| e.into_inner()).clone();

            // Safety gate: skip the cleanup cycle if any library is
            // stale. A missing source video on a stale library is
            // indistinguishable from a transient unmount, and the
            // cleanup is destructive; we'd rather leak a few playlist
            // files for a tick than delete one whose source is briefly
            // unreachable. The cycle re-runs on the next interval.
            {
                let guard = library_health.read().unwrap_or_else(|e| e.into_inner());
                let stale: Vec<String> = libs
                    .iter()
                    .filter(|lib| guard.get(&lib.id).map(|h| !h.is_online()).unwrap_or(false))
                    .map(|lib| lib.name.clone())
                    .collect();
                if !stale.is_empty() {
                    warn!(
                        "Skipping orphaned-playlist cleanup: {} library(ies) stale: [{}]",
                        stale.len(),
                        stale.join(", ")
                    );
                    continue;
                }
            }

            info!("Running orphaned playlist cleanup");
            let start = std::time::Instant::now();
            let mut deleted_count = 0;
            let mut error_count = 0;

            // Find all .m3u8 files in VIDEO_PATH
            let playlists: Vec<PathBuf> = WalkDir::new(&video_path)
                .into_iter()
                .filter_map(|e| e.ok())
                .filter(|e| e.file_type().is_file())
                .filter(|e| {
                    e.path()
                        .extension()
                        .and_then(|s| s.to_str())
                        .map(|ext| ext.eq_ignore_ascii_case("m3u8"))
                        .unwrap_or(false)
                })
                .map(|e| e.path().to_path_buf())
                .collect();

            info!("Found {} playlist files to check", playlists.len());

            for playlist_path in playlists {
                // Extract the original video filename from playlist name
                // Playlist format: {VIDEO_PATH}/{original_filename}.m3u8
                if let Some(filename) = playlist_path.file_stem() {
                    let video_filename = filename.to_string_lossy();

                    // Search for this video file across every configured
                    // library, respecting EXCLUDED_DIRS so we don't
                    // false-resurrect playlists for videos that only
                    // exist inside an excluded subtree. As soon as one
                    // library has a matching source, we're done: the
                    // playlist isn't orphaned.
                    let mut video_exists = false;
                    'libs: for lib in &libs {
                        let effective = lib.effective_excluded_dirs(&excluded_dirs);
                        for entry in image_api::file_scan::walk_library_files(
                            Path::new(&lib.root_path),
                            &effective,
                        ) {
                            if let Some(entry_stem) = entry.path().file_stem()
                                && entry_stem == filename
                                && file_types::is_video_file(entry.path())
                            {
                                video_exists = true;
                                break 'libs;
                            }
                        }
                    }

                    if !video_exists {
                        debug!(
                            "Source video for playlist {} no longer exists, deleting",
                            playlist_path.display()
                        );

                        // Delete the playlist file
                        if let Err(e) = std::fs::remove_file(&playlist_path) {
                            warn!(
                                "Failed to delete playlist {}: {}",
                                playlist_path.display(),
                                e
                            );
                            error_count += 1;
                        } else {
                            deleted_count += 1;

                            // Also try to delete associated .ts segment files
                            // They are typically named {filename}N.ts in the same directory
                            if let Some(parent_dir) = playlist_path.parent() {
                                for entry in WalkDir::new(parent_dir)
                                    .max_depth(1)
                                    .into_iter()
                                    .filter_map(|e| e.ok())
                                    .filter(|e| e.file_type().is_file())
                                {
                                    let entry_path = entry.path();
                                    if let Some(ext) = entry_path.extension()
                                        && ext.eq_ignore_ascii_case("ts")
                                    {
                                        // Check if this .ts file belongs to our playlist
                                        if let Some(ts_stem) = entry_path.file_stem() {
                                            let ts_name = ts_stem.to_string_lossy();
                                            if ts_name.starts_with(&*video_filename) {
                                                if let Err(e) = std::fs::remove_file(entry_path) {
                                                    debug!(
                                                        "Failed to delete segment {}: {}",
                                                        entry_path.display(),
                                                        e
                                                    );
                                                } else {
                                                    debug!(
                                                        "Deleted segment: {}",
                                                        entry_path.display()
                                                    );
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }

            info!(
                "Orphaned playlist cleanup completed in {:?}: deleted {} playlists, {} errors",
                start.elapsed(),
                deleted_count,
                error_count
            );
        }
    });
}

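The safety gate at the top of each cleanup pass can be illustrated with plain standard-library types. This is a sketch under assumptions: `Health`, `Library`, and `stale_libraries` are hypothetical stand-ins for `libraries::LibraryHealthMap` and its entries, whose real definitions are not shown in this diff. The one behaviour taken directly from the code above is that a library with no health entry counts as online (`unwrap_or(false)`).

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the per-library health state.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Health {
    Online,
    Stale,
}

// Hypothetical, trimmed-down library row.
struct Library {
    id: i32,
    name: String,
}

/// Names of libraries whose recorded health is not online.
/// A library without an entry in the map is treated as online.
fn stale_libraries(libs: &[Library], health: &HashMap<i32, Health>) -> Vec<String> {
    libs.iter()
        .filter(|lib| {
            health
                .get(&lib.id)
                .map(|h| *h != Health::Online)
                .unwrap_or(false)
        })
        .map(|lib| lib.name.clone())
        .collect()
}
```

A non-empty result means the whole destructive pass is skipped for that interval, not just the stale library's share of it.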
pub fn watch_files(
    libs_lock: Arc<RwLock<Vec<libraries::Library>>>,
    playlist_manager: Addr<VideoPlaylistManager>,
    preview_generator: Addr<video::actors::PreviewClipGenerator>,
    face_client: crate::ai::face_client::FaceClient,
    excluded_dirs: Vec<String>,
    library_health: libraries::LibraryHealthMap,
) {
    std::thread::spawn(move || {
        // Get polling intervals from environment variables
        // Quick scan: Check recently modified files (default: 60 seconds)
        let quick_interval_secs = dotenv::var("WATCH_QUICK_INTERVAL_SECONDS")
            .ok()
            .and_then(|s| s.parse::<u64>().ok())
            .unwrap_or(60);

        // Full scan: Check all files regardless of modification time (default: 3600 seconds = 1 hour)
        let full_interval_secs = dotenv::var("WATCH_FULL_INTERVAL_SECONDS")
            .ok()
            .and_then(|s| s.parse::<u64>().ok())
            .unwrap_or(3600);

        info!("Starting optimized file watcher");
        info!(" Quick scan interval: {} seconds", quick_interval_secs);
        info!(" Full scan interval: {} seconds", full_interval_secs);
        // Surface face-detection state at boot so it's obvious whether
        // the watcher will hit Apollo. The branch silently no-ops when
        // disabled (intentional for legacy deploys), which makes "why
        // aren't faces being detected?" hard to diagnose otherwise.
        if face_client.is_enabled() {
            info!(" Face detection: ENABLED");
        } else {
            info!(
                " Face detection: DISABLED (set APOLLO_FACE_API_BASE_URL \
                 or APOLLO_API_BASE_URL to enable)"
            );
        }
        {
            let libs = libs_lock.read().unwrap_or_else(|e| e.into_inner());
            for lib in libs.iter() {
                info!(
                    " Watching library '{}' (id={}) at {}",
                    lib.name, lib.id, lib.root_path
                );
            }
        }

        // Create DAOs for tracking processed files
        let exif_dao = Arc::new(Mutex::new(
            Box::new(SqliteExifDao::new()) as Box<dyn ExifDao>
        ));
        let preview_dao = Arc::new(Mutex::new(
            Box::new(SqlitePreviewDao::new()) as Box<dyn PreviewDao>
        ));
        let face_dao = Arc::new(Mutex::new(
            Box::new(faces::SqliteFaceDao::new()) as Box<dyn faces::FaceDao>
        ));
        // tag_dao for the watcher's auto-bind path. Independent of the
        // request-handler tag_dao instance; both end up pointing at the
        // same SQLite file via SqliteTagDao::default().
        let watcher_tag_dao = Arc::new(Mutex::new(
            Box::new(SqliteTagDao::default()) as Box<dyn tags::TagDao>
        ));

        let mut last_quick_scan = SystemTime::now();
        let mut last_full_scan = SystemTime::now();
        let mut scan_count = 0u64;

        // Per-library cursor for the missing-file scan. Each tick reads
        // a page from `offset`, stat()s the rows, deletes confirmed-
        // missing ones, and advances or wraps the cursor. State held
        // in-memory so a watcher restart resumes from 0, which is fine
        // because the sweep is idempotent.
        let mut missing_file_offsets: HashMap<i32, i64> = HashMap::new();

        let missing_scan_page_size: i64 = dotenv::var("IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE")
            .ok()
            .and_then(|s| s.parse().ok())
            .filter(|n: &i64| *n > 0)
            .unwrap_or(library_maintenance::DEFAULT_SCAN_PAGE_SIZE);
        let missing_delete_cap: usize = dotenv::var("IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK")
            .ok()
            .and_then(|s| s.parse().ok())
            .filter(|n: &usize| *n > 0)
            .unwrap_or(library_maintenance::DEFAULT_MISSING_DELETE_CAP);

        // Two-tick orphan-GC consensus state. Carried across ticks via
        // `OrphanGcState`; see library_maintenance::run_orphan_gc.
        let mut orphan_gc_state = library_maintenance::OrphanGcState::default();

        // Initial availability sweep before the loop's first sleep so
        // /libraries reports the truth from the very first request,
        // rather than the optimistic Online default that
        // new_health_map seeds. Without this, an unmounted share would
        // appear online for up to WATCH_QUICK_INTERVAL_SECONDS (default
        // 60s) after boot. Same probe logic as the per-tick gate
        // below; no ingest runs here, just the health update + log.
        // Disabled libraries skip the probe entirely; they should
        // never enter the health map (treated as out-of-scope).
        {
            let libs = libs_lock.read().unwrap_or_else(|e| e.into_inner());
            for lib in libs.iter() {
                if !lib.enabled {
                    continue;
                }
                let context = opentelemetry::Context::new();
                let had_data = exif_dao
                    .lock()
                    .expect("exif_dao poisoned")
                    .count_for_library(&context, lib.id)
                    .map(|n| n > 0)
                    .unwrap_or(false);
                libraries::refresh_health(&library_health, lib, had_data);
            }
        }

        loop {
            std::thread::sleep(Duration::from_secs(quick_interval_secs));

            let now = SystemTime::now();
            let since_last_full = now
                .duration_since(last_full_scan)
                .unwrap_or(Duration::from_secs(0));

            let is_full_scan = since_last_full.as_secs() >= full_interval_secs;

            // Fresh snapshot per tick: picks up PATCH /libraries/{id}
            // mutations to `enabled` / `excluded_dirs` without restart.
            let libs: Vec<libraries::Library> =
                libs_lock.read().unwrap_or_else(|e| e.into_inner()).clone();

            for lib in &libs {
                // Operator kill switch: a disabled library is invisible
                // to the watcher entirely. No probe, no ingest, no
                // maintenance, no health entry. Distinct from Stale:
                // Stale is "we wanted to but couldn't"; Disabled is
                // "we don't want to". Toggle via SQL.
                if !lib.enabled {
                    debug!(
                        "watcher: skipping library '{}' (id={}) — enabled=false",
                        lib.name, lib.id
                    );
                    continue;
                }

                // Availability probe: every tick checks that the
                // library's mount is reachable, is a directory, is
                // readable, and (if image_exif has rows for it) is
                // non-empty. A Stale library skips ingest, backlog
                // drains, and metric refresh; reads/serving in HTTP
                // handlers continue to work. Branches B/C extend the
                // probe gate to cover handoff and orphan GC. See
                // CLAUDE.md "Library availability and safety".
                let had_data = {
                    let context = opentelemetry::Context::new();
                    let mut guard = exif_dao.lock().expect("exif_dao poisoned");
                    guard
                        .count_for_library(&context, lib.id)
                        .map(|n| n > 0)
                        .unwrap_or(false)
                };
                let health = libraries::refresh_health(&library_health, lib, had_data);
                if !health.is_online() {
                    // Skip every write path for this library this tick.
                    // Don't refresh the media-count gauge either: a
                    // probe-failed library would otherwise flap to 0
                    // image / 0 video and pollute Prometheus.
                    continue;
                }

                // Drain the unhashed backlog AND the face-detection
                // backlog every tick, regardless of quick/full. Quick
                // scans only walk recently-modified files, so the
                // pre-Phase-3 backlog never enters their candidate set.
                // Without these standalone passes, backfill +
                // detection only progressed during full scans
                // (default once an hour).
                // Effective excludes for this library: global env-var
                // ∪ row's excluded_dirs. Compute once per tick; used
                // by every walker below for this library.
                let effective_excludes = lib.effective_excluded_dirs(&excluded_dirs);

                if face_client.is_enabled() {
                    let context = opentelemetry::Context::new();
                    backfill::backfill_unhashed_backlog(&context, lib, &exif_dao);
                    backfill::process_face_backlog(
                        &context,
                        lib,
                        &face_client,
                        &face_dao,
                        &watcher_tag_dao,
                        &effective_excludes,
                    );
                }

                // Date-taken backfill: drain rows whose canonical date is
                // either unresolved or only fs_time-sourced. Independent
                // of face detection, so it runs even on deploys that don't
                // configure Apollo, since `/memories` depends on it.
                {
                    let context = opentelemetry::Context::new();
                    backfill::backfill_missing_date_taken(&context, lib, &exif_dao);
                }

                if is_full_scan {
                    info!(
                        "Running full scan for library '{}' (scan #{})",
                        lib.name, scan_count
                    );
                    process_new_files(
                        lib,
                        Arc::clone(&exif_dao),
                        Arc::clone(&preview_dao),
                        Arc::clone(&face_dao),
                        Arc::clone(&watcher_tag_dao),
                        face_client.clone(),
                        &effective_excludes,
                        None,
                        playlist_manager.clone(),
                        preview_generator.clone(),
                    );
                } else {
                    debug!(
                        "Running quick scan for library '{}' (checking files modified in last {} seconds)",
                        lib.name,
                        quick_interval_secs + 10
                    );
                    let check_since = last_quick_scan
                        .checked_sub(Duration::from_secs(10))
                        .unwrap_or(last_quick_scan);
                    process_new_files(
                        lib,
                        Arc::clone(&exif_dao),
                        Arc::clone(&preview_dao),
                        Arc::clone(&face_dao),
                        Arc::clone(&watcher_tag_dao),
                        face_client.clone(),
                        &effective_excludes,
                        Some(check_since),
                        playlist_manager.clone(),
                        preview_generator.clone(),
                    );
                }

                // Update media counts per library (metric aggregates across all)
                thumbnails::update_media_counts(Path::new(&lib.root_path), &effective_excludes);

                // Missing-file detection: prune image_exif rows whose
                // source file is no longer on disk. Per-library, so we
                // pass library-online-this-tick implicitly (we only
                // reach here if the probe gate at the top of the
                // iteration passed). Capped + paginated so a huge
                // library doesn't stall the watcher; rows we don't
                // visit this tick get visited next tick. See
                // library_maintenance::detect_missing_files_for_library.
                {
                    let context = opentelemetry::Context::new();
                    let offset = missing_file_offsets.get(&lib.id).copied().unwrap_or(0);
                    let (deleted, next_offset) =
                        library_maintenance::detect_missing_files_for_library(
                            &context,
                            lib,
                            &exif_dao,
                            offset,
                            missing_scan_page_size,
                            missing_delete_cap,
                        );
                    missing_file_offsets.insert(lib.id, next_offset);
                    if deleted > 0 {
                        debug!(
                            "missing-file scan: library '{}' next_offset={}",
                            lib.name, next_offset
                        );
                    }
                }
            }

            // Reconciliation: cross-library, so it runs once per tick
            // outside the per-library loop. Idempotent: fast no-op when
            // there's nothing to do. Operates on the database alone, no
            // filesystem dependency, so it doesn't need a health gate.
            // See database::reconcile and CLAUDE.md "Multi-library data
            // model" for the rules.
            {
                let mut conn = image_api::database::connect();
                let _ = image_api::database::reconcile::run(&mut conn);

                // Back-ref refresh: hash-keyed rows whose
                // (library_id, rel_path) tuple no longer matches any
                // image_exif row but whose hash still does. After a
                // recent→archive move, the missing-file scan removes
                // the old image_exif row; this pass repoints face /
                // tag / insight back-refs at the surviving location.
                // DB-only, no health gate needed: uses what's in
                // image_exif as truth.
                let _ = library_maintenance::refresh_back_refs(&mut conn);

                // Orphan GC: the destructive end of the maintenance
                // pipeline. Two-tick consensus + every-library-online
                // requirement is enforced inside run_orphan_gc; we
                // pass the current all-online flag and the function
                // tracks the previous tick's flag in OrphanGcState.
                let all_online = library_maintenance::all_libraries_online(&libs, &library_health);
                let _ =
                    library_maintenance::run_orphan_gc(&mut conn, &mut orphan_gc_state, all_online);
            }

            if is_full_scan {
                last_full_scan = now;
            }
            last_quick_scan = now;
            scan_count += 1;
        }
    });
}

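The quick/full scheduling decision made at the top of each `watch_files` tick can be pulled out into a pure function for illustration. This is a sketch, not code from the repository: `ScanKind` and `plan_scan` are hypothetical names. The 10-second overlap and the fall-back to a zero duration when the clock goes backwards are taken from the loop above.

```rust
use std::time::{Duration, SystemTime};

// Hypothetical result type: a full walk, or a quick walk limited to
// files modified after `modified_since`.
#[derive(Debug, PartialEq)]
enum ScanKind {
    Full,
    Quick { modified_since: SystemTime },
}

/// Decide which scan a tick should run.
fn plan_scan(
    now: SystemTime,
    last_quick: SystemTime,
    last_full: SystemTime,
    full_interval: Duration,
) -> ScanKind {
    // A clock that moved backwards is treated as "no time elapsed".
    let since_full = now.duration_since(last_full).unwrap_or(Duration::ZERO);
    if since_full >= full_interval {
        ScanKind::Full
    } else {
        // Look back 10 s before the previous quick scan so files written
        // while that scan was running are not missed.
        let modified_since = last_quick
            .checked_sub(Duration::from_secs(10))
            .unwrap_or(last_quick);
        ScanKind::Quick { modified_since }
    }
}
```

Keeping the decision free of I/O makes the boundary cases (exactly one interval elapsed, clock skew) easy to check without sleeping.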
/// Check if a playlist needs to be (re)generated.
///
/// Returns true if:
/// - Playlist doesn't exist, OR
/// - Source video is newer than the playlist
///
/// When metadata for either path is unreadable, returns true so the
/// caller errs on the side of regeneration (a redundant transcode
/// beats a stale playlist).
pub fn playlist_needs_generation(video_path: &Path, playlist_path: &Path) -> bool {
    if !playlist_path.exists() {
        return true;
    }

    // Check if source video is newer than playlist
    if let (Ok(video_meta), Ok(playlist_meta)) = (
        std::fs::metadata(video_path),
        std::fs::metadata(playlist_path),
    ) && let (Ok(video_modified), Ok(playlist_modified)) =
        (video_meta.modified(), playlist_meta.modified())
    {
        return video_modified > playlist_modified;
    }

    // If we can't determine, assume it needs generation
    true
}

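The decision table behind `playlist_needs_generation` can be written without touching the filesystem, which may be convenient for unit tests. This is only a sketch: `needs_generation` is a hypothetical name, and `None` stands in for "metadata or modification time could not be read".

```rust
use std::time::SystemTime;

/// Pure form of the regeneration check.
/// `None` for either timestamp means it could not be determined.
fn needs_generation(
    playlist_exists: bool,
    video_mtime: Option<SystemTime>,
    playlist_mtime: Option<SystemTime>,
) -> bool {
    if !playlist_exists {
        return true;
    }
    match (video_mtime, playlist_mtime) {
        // Regenerate only when the source is strictly newer.
        (Some(video), Some(playlist)) => video > playlist,
        // Unknown timestamps: err on the side of regenerating.
        _ => true,
    }
}
```

Equal timestamps count as up to date, matching the strict `>` comparison in the function above.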
pub fn process_new_files(
|
||||
library: &libraries::Library,
|
||||
exif_dao: Arc<Mutex<Box<dyn ExifDao>>>,
|
||||
preview_dao: Arc<Mutex<Box<dyn PreviewDao>>>,
|
||||
face_dao: Arc<Mutex<Box<dyn faces::FaceDao>>>,
|
||||
tag_dao: Arc<Mutex<Box<dyn tags::TagDao>>>,
|
||||
face_client: crate::ai::face_client::FaceClient,
|
||||
excluded_dirs: &[String],
|
||||
modified_since: Option<SystemTime>,
|
||||
playlist_manager: Addr<VideoPlaylistManager>,
|
||||
preview_generator: Addr<video::actors::PreviewClipGenerator>,
|
||||
) {
|
||||
let context = opentelemetry::Context::new();
|
||||
let thumbs = dotenv::var("THUMBNAILS").expect("THUMBNAILS not defined");
|
||||
let thumbnail_directory = Path::new(&thumbs);
|
||||
let base_path = Path::new(&library.root_path);
|
||||
|
||||
// Walk, prune EXCLUDED_DIRS subtrees, and apply image/video + modified_since
|
||||
// filters. See `file_scan` for why exclusion has to happen at WalkDir
|
||||
// time (filter_entry) rather than at face-detect time.
|
||||
let files: Vec<(PathBuf, String)> =
|
||||
image_api::file_scan::enumerate_indexable_files(base_path, excluded_dirs, modified_since);
|
||||
|
||||
if files.is_empty() {
|
||||
debug!("No files to process");
|
||||
return;
|
||||
}

    debug!("Found {} files to check", files.len());

    // Batch query: Get all EXIF data for these files in one query
    let file_paths: Vec<String> = files.iter().map(|(_, rel_path)| rel_path.clone()).collect();

    let existing_exif_paths: HashMap<String, bool> = {
        let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
        // Walk is per-library, so scope the lookup so a same-named file
        // in another library doesn't make this one look already-indexed.
        match dao.get_exif_batch(&context, Some(library.id), &file_paths) {
            Ok(exif_records) => exif_records
                .into_iter()
                .map(|record| (record.file_path, true))
                .collect(),
            Err(e) => {
                error!("Error batch querying EXIF data: {:?}", e);
                HashMap::new()
            }
        }
    };

    let mut new_files_found = false;
    let mut files_needing_row = Vec::new();

    // Register every image/video file in image_exif. Rows without EXIF
    // still carry library_id, rel_path, content_hash, and size_bytes so
    // derivative dedup and DB-indexed sort/filter work for every file,
    // not just photos with parseable EXIF.
    for (file_path, relative_path) in &files {
        // Check both the library-scoped legacy path (current shape) and
        // the bare-legacy path (pre-multi-library shape). Either one
        // existing means a thumbnail is already on disk for this file.
        let scoped_thumb_path = content_hash::library_scoped_legacy_path(
            thumbnail_directory,
            library.id,
            relative_path,
        );
        let bare_legacy_thumb_path = thumbnail_directory.join(relative_path);
        let needs_thumbnail = !scoped_thumb_path.exists()
            && !bare_legacy_thumb_path.exists()
            && !thumbnails::unsupported_thumbnail_sentinel(&scoped_thumb_path).exists()
            && !thumbnails::unsupported_thumbnail_sentinel(&bare_legacy_thumb_path).exists();
        let needs_row = !existing_exif_paths.contains_key(relative_path);

        if needs_thumbnail || needs_row {
            new_files_found = true;

            if needs_thumbnail {
                info!("New file detected (missing thumbnail): {}", relative_path);
            }

            if needs_row {
                files_needing_row.push((file_path.clone(), relative_path.clone()));
            }
        }
    }

    if !files_needing_row.is_empty() {
        info!(
            "Registering {} new files in image_exif",
            files_needing_row.len()
        );

        for (file_path, relative_path) in files_needing_row {
            let timestamp = Utc::now().timestamp();

            // Hash + size from filesystem metadata — always attempted so
            // every file gets a content_hash, even when EXIF is absent.
            let (content_hash, size_bytes) = match content_hash::compute(&file_path) {
                Ok(id) => (Some(id.content_hash), Some(id.size_bytes)),
                Err(e) => {
                    warn!("Failed to hash {}: {:?}", file_path.display(), e);
                    (None, None)
                }
            };

            // Perceptual hashes (pHash + dHash). Best-effort — None for
            // videos and decode failures. Drives near-duplicate detection
            // in the Apollo duplicates surface; failure here is non-fatal
            // and never blocks indexing.
            let perceptual = perceptual_hash::compute(&file_path);

            // EXIF is best-effort enrichment. When extraction fails (or the
            // file type doesn't support EXIF) we still store a row with all
            // EXIF fields NULL; the file remains visible to sort-by-date
            // and tag queries via its rel_path and filesystem timestamps.
            let exif_fields = if exif::supports_exif(&file_path) {
                match exif::extract_exif_from_path(&file_path) {
                    Ok(data) => Some(data),
                    Err(e) => {
                        debug!(
                            "No EXIF or parse error for {}: {:?}",
                            file_path.display(),
                            e
                        );
                        None
                    }
                }
            } else {
                None
            };

            // Canonical date_taken via the waterfall — kamadak-exif (already
            // computed above) → exiftool fallback for videos / MakerNote /
            // QuickTime → filename regex → earliest_fs_time. Source is
            // recorded so the per-tick backfill drain can re-run weak
            // resolutions later.
            let resolved_date = date_resolver::resolve_date_taken(
                &file_path,
                exif_fields.as_ref().and_then(|e| e.date_taken),
            );

            let insert_exif = InsertImageExif {
                library_id: library.id,
                file_path: relative_path.clone(),
                camera_make: exif_fields.as_ref().and_then(|e| e.camera_make.clone()),
                camera_model: exif_fields.as_ref().and_then(|e| e.camera_model.clone()),
                lens_model: exif_fields.as_ref().and_then(|e| e.lens_model.clone()),
                width: exif_fields.as_ref().and_then(|e| e.width),
                height: exif_fields.as_ref().and_then(|e| e.height),
                orientation: exif_fields.as_ref().and_then(|e| e.orientation),
                gps_latitude: exif_fields
                    .as_ref()
                    .and_then(|e| e.gps_latitude.map(|v| v as f32)),
                gps_longitude: exif_fields
                    .as_ref()
                    .and_then(|e| e.gps_longitude.map(|v| v as f32)),
                gps_altitude: exif_fields
                    .as_ref()
                    .and_then(|e| e.gps_altitude.map(|v| v as f32)),
                focal_length: exif_fields
                    .as_ref()
                    .and_then(|e| e.focal_length.map(|v| v as f32)),
                aperture: exif_fields
                    .as_ref()
                    .and_then(|e| e.aperture.map(|v| v as f32)),
                shutter_speed: exif_fields.as_ref().and_then(|e| e.shutter_speed.clone()),
                iso: exif_fields.as_ref().and_then(|e| e.iso),
                date_taken: resolved_date.map(|r| r.timestamp),
                created_time: timestamp,
                last_modified: timestamp,
                content_hash,
                size_bytes,
                phash_64: perceptual.map(|h| h.phash_64),
                dhash_64: perceptual.map(|h| h.dhash_64),
                date_taken_source: resolved_date.map(|r| r.source.as_str().to_string()),
            };

            let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
            if let Err(e) = dao.store_exif(&context, insert_exif) {
                error!(
                    "Failed to register {} in image_exif: {:?}",
                    relative_path, e
                );
            } else {
                debug!("Registered {} in image_exif", relative_path);
            }
        }
    }

    // ── Face detection pass ────────────────────────────────────────────
    // Run after EXIF writes so newly-registered files have their
    // content_hash populated. Skipped wholesale when face_client is
    // disabled (no Apollo integration configured) — Phase 3 wires this
    // up; the watcher remains usable on legacy deploys.
    if face_client.is_enabled() {
        // Opportunistic content_hash backfill: photos indexed before
        // content-hashing landed (or where the hash compute failed
        // silently on insert) end up in image_exif with NULL
        // content_hash. build_face_candidates keys on content_hash, so
        // those files would never become candidates without backfill.
        // Idempotent — subsequent scans see the populated hashes and
        // no-op. The dedicated `backfill_hashes` binary is still the
        // right tool for very large legacy libraries; this branch
        // ensures small/medium deploys self-heal without operator
        // action.
        backfill::backfill_missing_content_hashes(&context, &files, library, &exif_dao);
        let candidates =
            backfill::build_face_candidates(&context, library, &files, &exif_dao, &face_dao);
        debug!(
            "face_watch: scan tick — {} image file(s) walked, {} candidate(s) (library '{}', modified_since={})",
            files
                .iter()
                .filter(|(p, _)| !file_types::is_video_file(p))
                .count(),
            candidates.len(),
            library.name,
            modified_since.is_some(),
        );
        if !candidates.is_empty() {
            face_watch::run_face_detection_pass(
                library,
                excluded_dirs,
                &face_client,
                Arc::clone(&face_dao),
                Arc::clone(&tag_dao),
                candidates,
            );
        }
    }

    // Check for videos that need HLS playlists
    let video_path_base = dotenv::var("VIDEO_PATH").expect("VIDEO_PATH must be set");
    let mut videos_needing_playlists = Vec::new();

    for (file_path, _relative_path) in &files {
        if file_types::is_video_file(file_path) {
            // Construct expected playlist path
            let playlist_filename =
                format!("{}.m3u8", file_path.file_name().unwrap().to_string_lossy());
            let playlist_path = Path::new(&video_path_base).join(&playlist_filename);

            // Check if playlist needs (re)generation
            if playlist_needs_generation(file_path, &playlist_path) {
                videos_needing_playlists.push(file_path.clone());
            }
        }
    }

    // Send queue request to playlist manager
    if !videos_needing_playlists.is_empty() {
        playlist_manager.do_send(QueueVideosMessage {
            video_paths: videos_needing_playlists,
        });
    }

    // Check for videos that need preview clips
    // Collect (full_path, relative_path) for video files
    let video_files: Vec<(String, String)> = files
        .iter()
        .filter(|(file_path, _)| file_types::is_video_file(file_path))
        .map(|(file_path, rel_path)| (file_path.to_string_lossy().to_string(), rel_path.clone()))
        .collect();

    if !video_files.is_empty() {
        // Query DB using relative paths (consistent with how GET/POST handlers store them)
        let video_rel_paths: Vec<String> = video_files.iter().map(|(_, rel)| rel.clone()).collect();

        let existing_previews: HashMap<String, String> = {
            let mut dao = preview_dao.lock().expect("Unable to lock PreviewDao");
            match dao.get_previews_batch(&context, &video_rel_paths) {
                Ok(clips) => clips
                    .into_iter()
                    .map(|clip| (clip.file_path, clip.status))
                    .collect(),
                Err(e) => {
                    error!("Error batch querying preview clips: {:?}", e);
                    HashMap::new()
                }
            }
        };

        for (full_path, relative_path) in &video_files {
            let status = existing_previews.get(relative_path).map(|s| s.as_str());
            let needs_preview = match status {
                None => true,            // No record at all
                Some("failed") => true,  // Retry failed
                Some("pending") => true, // Stale pending from previous run
                _ => false,              // processing or complete
            };

            if needs_preview {
                // Insert pending record using relative path
                if status.is_none() {
                    let mut dao = preview_dao.lock().expect("Unable to lock PreviewDao");
                    let _ = dao.insert_preview(&context, relative_path, "pending");
                }

                // Send full path in the message — the actor will derive relative path from it
                preview_generator.do_send(GeneratePreviewClipMessage {
                    video_path: full_path.clone(),
                });
            }
        }
    }

    // Generate thumbnails for all files that need them
    if new_files_found {
        info!("Processing thumbnails for new files...");
        thumbnails::create_thumbnails(std::slice::from_ref(library), excluded_dirs);
    }

    // Reconciliation: on a full scan, prune image_exif rows whose rel_path no
    // longer exists on disk for this library. Keeps the DB in parity so
    // downstream DB-backed listings (e.g. recursive /photos) don't return
    // phantom files. Skipped on quick scans — those only look at recently
    // modified files and can't distinguish "missing" from "unchanged".
    if modified_since.is_none() {
        let disk_paths: HashSet<String> = files.iter().map(|(_, rel)| rel.clone()).collect();
        let db_paths: Vec<String> = {
            let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
            dao.get_rel_paths_for_library(&context, library.id)
                .unwrap_or_else(|e| {
                    error!(
                        "Reconciliation: failed to load image_exif rel_paths for lib {}: {:?}",
                        library.id, e
                    );
                    Vec::new()
                })
        };

        let stale: Vec<String> = db_paths
            .into_iter()
            .filter(|p| !disk_paths.contains(p))
            .collect();

        if !stale.is_empty() {
            info!(
                "Reconciliation: pruning {} stale image_exif rows for library '{}'",
                stale.len(),
                library.name
            );
            let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
            for rel in &stale {
                if let Err(e) = dao.delete_exif_by_library(&context, library.id, rel) {
                    warn!(
                        "Reconciliation: failed to delete {} (lib {}): {:?}",
                        rel, library.id, e
                    );
                }
            }
        }
    }
}
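
Two decisions inside `process_new_files` are pure logic: whether a video's preview status calls for (re)generation, and which DB paths a full scan should prune. A rough sketch of how they could be pulled out for unit testing without DAOs or actors follows. The function names `needs_preview` and `stale_rel_paths` are hypothetical; only the status strings and the set-difference rule are taken from the code above.

```rust
use std::collections::HashSet;

/// Hypothetical helper mirroring the preview-clip status match: a video needs
/// a preview when it has no record, or its record is "failed" or "pending".
/// Any other status (for example "processing" or "complete") is left alone.
pub fn needs_preview(status: Option<&str>) -> bool {
    matches!(status, None | Some("failed") | Some("pending"))
}

/// Hypothetical helper mirroring reconciliation: the DB rel_paths that the
/// full disk walk did not see, i.e. the rows that would be pruned.
pub fn stale_rel_paths(db_paths: Vec<String>, disk_paths: &HashSet<String>) -> Vec<String> {
    db_paths
        .into_iter()
        .filter(|p| !disk_paths.contains(p))
        .collect()
}
```

Keeping `disk_paths` as a `HashSet` makes each membership check constant-time on average, which matters when a library holds many thousands of rows.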

#[cfg(test)]
mod tests {
    use super::*;
    use std::fs;
    use std::thread::sleep;
    use std::time::Duration as StdDuration;
    use tempfile::TempDir;

    #[test]
    fn playlist_needs_generation_true_when_playlist_missing() {
        let tmp = TempDir::new().unwrap();
        let video = tmp.path().join("clip.mp4");
        fs::write(&video, b"v").unwrap();
        let playlist = tmp.path().join("clip.mp4.m3u8");
        // playlist does not exist
        assert!(playlist_needs_generation(&video, &playlist));
    }

    #[test]
    fn playlist_needs_generation_false_when_playlist_is_newer() {
        let tmp = TempDir::new().unwrap();
        let video = tmp.path().join("clip.mp4");
        fs::write(&video, b"v").unwrap();
        // Sleep to guarantee a distinct mtime for the playlist created next.
        // Many filesystems have ~10 ms mtime resolution; 50 ms is plenty.
        sleep(StdDuration::from_millis(50));
        let playlist = tmp.path().join("clip.mp4.m3u8");
        fs::write(&playlist, b"#EXTM3U").unwrap();
        assert!(!playlist_needs_generation(&video, &playlist));
    }

    #[test]
    fn playlist_needs_generation_true_when_video_is_newer() {
        let tmp = TempDir::new().unwrap();
        let playlist = tmp.path().join("clip.mp4.m3u8");
        fs::write(&playlist, b"#EXTM3U").unwrap();
        sleep(StdDuration::from_millis(50));
        let video = tmp.path().join("clip.mp4");
        fs::write(&video, b"v").unwrap();
        assert!(playlist_needs_generation(&video, &playlist));
    }

    #[test]
    fn playlist_needs_generation_true_when_video_missing_metadata() {
        // Video doesn't exist; metadata fails for it. Falls through to the
        // "assume needs regeneration" branch.
        let tmp = TempDir::new().unwrap();
        let video = tmp.path().join("missing.mp4");
        let playlist = tmp.path().join("missing.mp4.m3u8");
        fs::write(&playlist, b"#EXTM3U").unwrap();
        assert!(playlist_needs_generation(&video, &playlist));
    }
}
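
The mtime tests above separate the two writes with a 50 ms sleep, which may not be enough on filesystems with coarser timestamp resolution. One possible alternative, sketched here under the assumption of a toolchain with `File::set_modified` (Rust 1.75 or later), is to pin each file's modification time explicitly. The helper name `set_mtime` is hypothetical and not part of the repository.

```rust
use std::fs::File;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical test helper: pin `path`'s modification time to `when` so a
/// test can order two files without sleeping between writes.
pub fn set_mtime(path: &Path, when: SystemTime) -> io::Result<()> {
    // The file is opened for writing because some platforms refuse to
    // change timestamps through a read-only handle.
    File::options().write(true).open(path)?.set_modified(when)
}
```

In a test, the video and playlist would each be written once and then stamped with times a few seconds apart, which makes the ordering independent of how quickly the writes happen.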