Commit Graph

117 Commits

Author SHA1 Message Date
Cameron Cordes
832b50d587 image_exif: manual date_taken override (set/clear endpoints)
Add `POST /image/exif/date` and `POST /image/exif/date/clear` so an
operator can correct a row whose canonical-date waterfall landed on the
wrong value (camera clock reset, fs_time fallback for a copied-from-
backup file, etc). New `original_date_taken` / `original_date_taken_source`
columns snapshot the prior value on first override so revert is lossless.

The waterfall source set is now `'exif' | 'exiftool' | 'filename' | 'fs_time' | 'manual'`.
The existing `idx_image_exif_date_backfill` partial index already filters
to `date_taken IS NULL OR date_taken_source = 'fs_time'`, so manual rows
are naturally excluded from the per-tick drain — no index change needed.

`ExifMetadata` now exposes `date_taken_source` + originals so a UI can
render "manually set; was X via filename".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 19:26:43 -04:00
Cameron Cordes
7f12890f4b memories: single-SQL rewrite + 20-year lookback
Replaces the EXIF-loop + WalkDir-fallback pipeline that powered
`/memories` with a single per-library SQL query
(`get_memories_in_window`) that uses `strftime('%m-%d' | '%W' | '%m',
date_taken, 'unixepoch', tz_offset)` for calendar matching in the
client's timezone, plus a `years_back` lower bound and a
no-future-dates upper bound. Returns only the matching rows; the
handler applies per-library `PathExcluder` post-query and sorts.

Drops:
- `collect_exif_memories` — replaced by the single SQL query.
- `collect_filesystem_memories` — the canonical-date pipeline now
  populates `date_taken` for every row at ingest, so the WalkDir
  fallback that scanned 14k+ files each request is no longer needed.
- `get_memory_date_with_priority` and friends — request-time waterfall
  superseded by `date_resolver` running at ingest. The associated
  three priority-tests are dropped; their replacement lives in
  `date_resolver::tests`.

On a ~14k-file library this drops `/memories` from 10–15 s
(dominated by `fs::metadata` per row) to single-digit ms.

Bumps `DEFAULT_YEARS_BACK` from 15 → 20 to surface deeper archives
on matching anniversaries.

Note vs. ISO weeks: the original Rust used `chrono::iso_week().week()`
for week-span matching. SQLite's `%W` is Monday-anchored but uses week
0 for days before the first Monday, so it can disagree with ISO at
year boundaries by ±1. Acceptable for nostalgia browsing.

Adds 3 new DAO tests covering month-span filter, library scoping, and
the unknown-span-token guard. Also adds a CLAUDE.md section describing
the canonical-date pipeline end-to-end and the new
`DATE_BACKFILL_MAX_PER_TICK` env var.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 16:04:09 -04:00
Cameron Cordes
54e0635a98 date_backfill: per-tick drain for unresolved date_taken rows
Adds two ExifDao methods (`get_rows_needing_date_backfill` /
`backfill_date_taken`) and a `backfill_missing_date_taken` watcher pass
that runs on every tick alongside `backfill_unhashed_backlog`.

The drain queries the partial index for rows where `date_taken IS NULL`
or `date_taken_source = 'fs_time'`, batches up to
`DATE_BACKFILL_MAX_PER_TICK` paths (default 500), and feeds them through
`date_resolver::resolve_dates_batch` — a single exiftool subprocess
covers the whole tick. Rows that newly resolve to `exiftool` /
`filename` / `fs_time` get persisted via `backfill_date_taken` (touches
only `date_taken` + `date_taken_source` so EXIF / hash / perceptual
columns survive).

`filename`-sourced rows are intentionally not re-resolved — the regex
is authoritative when it matches and re-running exiftool wouldn't
change the answer. Files that have disappeared from disk are skipped
so a ghost row doesn't loop through the drain forever; the
missing-file scan in `library_maintenance` retires those separately.

Comes with two DAO unit tests (eligibility filter + column-isolation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 16:03:03 -04:00
Cameron Cordes
84326501a9 image_exif: add date_taken_source column
New nullable TEXT column tracks which step of the canonical-date
waterfall (kamadak-exif → exiftool → filename → fs_time) populated
`date_taken`. Lets a later per-tick drain re-resolve weak sources
(`fs_time`) once stronger ones become available, and gives the UI/debug
surface a way to answer "why does this photo show up under this date?".

Adds the column at all `InsertImageExif` construction sites with `None`
placeholders (the resolver wiring lands in a follow-up commit), and
extends the `update_exif` SET tuple so the column survives the GPS-write
re-read path. Partial index `idx_image_exif_date_backfill` is created
for the upcoming drain query.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 15:57:49 -04:00
Cameron Cordes
67cf0c7f73 duplicates: folder-pair view of exact dups
Bucket exact-dup rows by (library_id, dirname) pair on each side, then
filter by coverage = shared / min(folder_a_total, folder_b_total) and
an absolute floor on shared count. Surfaces "this folder is mostly
contained in that folder" matches that the per-file EXACT view buries
under one row each — e.g. an old phone-backup tree shadowing the
organized library, or a topic-grouped folder duplicating a date-grouped
one within the same library.

New endpoint: GET /duplicates/folder-pairs?library=&include_resolved=
&min_coverage=&min_shared=. Cached 5 min keyed on (library, include_resolved);
the user-tunable thresholds filter the cached unfiltered pair list so
slider drags don't re-bucket. Shares the resolve / unresolve flow with
the existing tabs — the frontend fans out N parallel /resolve calls,
one per shared content_hash.

Folder names carry no signal (BMW lives under Night Photos, not BMW_backup),
so bucketing is purely on (library_id, dirname) co-occurrence in
exact-dup groups. Within-folder dups (same hash twice in the same
folder) are skipped — those belong to the EXACT tab.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:43:29 -04:00
Cameron Cordes
7584cd8792 duplicates: perceptual hash + soft-mark resolution + upload 409
Adds pHash + dHash columns alongside the existing blake3 content_hash so
near-duplicates (re-encoded, resized, format-converted copies) become
queryable. /duplicates/{exact,perceptual} return groups; /duplicates/
{resolve,unresolve} flip a duplicate_of_hash soft-mark on losing rows
and union perceptual-only tag sets onto the survivor. The default
/photos listing filters duplicate_of_hash IS NULL so demoted siblings
stop cluttering the grid; include_duplicates=true opts back in for
Apollo's review modal. Upload now hashes bytes pre-write and returns
409 with the canonical sibling when a file's bytes already exist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:36:01 -04:00
Cameron Cordes
263e27e108 multi-library: handoff + orphan GC with two-tick consensus
Branch C of the multi-library data-model rollout. Implements the
operational maintenance pipeline pinned in CLAUDE.md → "Multi-library
data model" / "Library availability and safety". Branches A and B
land first; this branch builds on top.

New module: src/library_maintenance.rs

Three idempotent passes the watcher runs every tick after the
per-library ingest loop:

1. Missing-file scan (per online library)

   For each Online library, load a paginated page of image_exif rows
   (IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE, default 500), stat() each one,
   and delete rows whose source file is NotFound. Permission/IO
   errors are skipped, never deleted. Capped at
   IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK (default 200) per library
   per tick — so a pathological mount that returns NotFound for
   everything can't wipe the table in one cycle. Cursor advances
   across ticks, wraps on partial-page returns, and naturally cycles
   through the entire library over many minutes. Skipped wholesale
   for Stale libraries via the existing probe gate.

2. Back-ref refresh (DB-only)

   For face_detections / tagged_photo / photo_insights: any
   hash-keyed row whose (library_id, rel_path) no longer matches an
   image_exif row, but whose content_hash does, is repointed at a
   surviving image_exif location. Pure SQL with EXISTS guards so
   rows whose hash is fully orphaned are left alone (the orphan GC
   handles those). Idempotent; no availability gate needed.

   This is what makes a recent → archive move invisible to readers:
   when pass 1 retires the lib-A row, pass 2 pivots tags / faces /
   insights to lib-B's surviving path before any client notices.

3. Orphan GC (destructive)

   Hash-keyed derived rows whose content_hash has no image_exif
   referent are GC-eligible. Two-tick consensus: a hash must be
   observed orphaned on two consecutive ticks AND every library must
   be Online for both. A single Stale tick within the window cancels
   all pending deletes (they remain marked but won't be promoted) —
   they're re-evaluated next tick. The pending set lives in
   OrphanGcState (in-memory); a watcher restart resets it, which can
   only delay a delete, never cause one. Hashes that re-appear in
   image_exif between ticks are "revived" from the pending set
   (handles transient share unmount / remount).

Two new ExifDao methods:
  - list_rel_paths_for_library_page(library_id, limit, offset) for
    the paginated missing-file scan.
  - (count_for_library landed in Branch A.)

Watcher wiring (main.rs)

Per-library: missing-file scan inside the existing per-library
loop, after process_new_files, gated by the same probe check that
already protects ingest. After the loop: reconcile (Branch B),
back-ref refresh, then run_orphan_gc. The maintenance connection is
opened once per tick (image_api::database::connect), used by all
three DB-only passes, and dropped at end of tick.

CLAUDE.md gains a "Maintenance pipeline" subsection that describes
the three passes and their interaction with the existing
availability-and-safety policy.

Tests: 225 pass (217 from Branch B + 8 new in library_maintenance
covering back-ref refresh including the fully-orphaned no-op case,
two-tick GC consensus, Stale-tick consensus reset, image_exif
re-appearance revival, multi-table delete, and the
all_libraries_online helper).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:27:53 +00:00
Cameron Cordes
eea1bf3181 multi-library: availability probe + scoped EXIF queries + collision fixes
Branch A of the multi-library data-model rollout. Three threads of
correctness/safety work that ship together because the new mount
needs all three before it can land:

1. Library availability probe (libraries.rs, state.rs, main.rs)

   New LibraryHealth (Online | Stale { reason, since }) and a shared
   LibraryHealthMap on AppState. Probe checks root_path exists +
   is_dir + readable + non-empty (relative to a "had_data" signal so
   fresh mounts aren't downgraded). The watcher tick begins with a
   refresh_health() per library; stale libraries skip ingest, the
   hash backfill, and face-detection backlog drains for that tick.
   The orphaned-playlist cleanup also gates on every library being
   online — a missing source on a stale library is indistinguishable
   from a transient unmount, and the cleanup is destructive.

   /libraries now returns each library with its current health
   state. Logs only on Online↔Stale transitions so a long outage
   doesn't spam.

   New ExifDao::count_for_library is the "had_data" signal.

2. EXIF queries scoped by library_id (database/mod.rs, files.rs,
   main.rs, tags.rs)

   query_by_exif gains an Option<i32> library filter; /photos and
   /photos/exif now pass it. Without this, an EXIF-filtered request
   scoped to ?library=N returned cross-library results because the
   handler resolved the library but didn't push it through to SQL.

   get_exif_batch gains the same option. The watcher's per-library
   ingest, face-candidate build, and content-hash backfill all
   scope to their library; the union-mode /photos date-sort path
   and the library-agnostic tag fan-out (lookup_tags_batch, by
   design) keep using None.

3. Derivative-path collision fixes (content_hash.rs, main.rs)

   New content_hash::library_scoped_legacy_path helper:
   <derivative_dir>/<library_id>/<rel_path>. Thumbnail generation
   (startup walk + watcher needs-thumb check) and serving now use
   it; serving falls back to the bare-legacy mirrored path so
   pre-multi-library deployments keep working without
   regeneration. Without this, lib2 with the same rel_path as lib1
   would have its thumbnail request short-circuit to lib1's image.

   Orphaned-playlist cleanup walks every library when checking for
   the source video (was: BASE_PATH only). Without this, mounting
   a 2nd library and waiting 24h would delete every playlist whose
   source lived only in the 2nd library.

   The HLS playlist write path collision (filename-only basename,
   not rel_path) is left as a known issue with a TODO at the call
   site — the actor-pipeline rewrite belongs in Branch B/C.

Tests: 212 pass (cargo test --lib). New tests cover the probe
states (online / missing root / non-dir / empty-with-prior-data),
refresh_health transitions, query_by_exif scoping, get_exif_batch
keying on (library_id, rel_path), library_scoped_legacy_path, and
count_for_library.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:12:49 +00:00
Cameron Cordes
f50655fb21 indexer: apply EXCLUDED_DIRS to remaining WalkDir callers
Audit follow-up to 5bf4956. The same `@eaDir` pruning that protects
the indexer also needs to protect the other walks under library roots:

- `create_thumbnails` walks every file in every library to generate
  thumbnails. Without EXCLUDED_DIRS, it would generate thumbnails of
  Synology's `SYNOFILE_THUMB_*.jpg` thumbnails (thumbnails of thumbnails).
- `update_media_counts` walks for the prometheus IMAGE / VIDEO gauges.
  Without EXCLUDED_DIRS, the gauges over-count by however many phantom
  `@eaDir` images live alongside the real photos.
- `cleanup_orphaned_playlists` walks BASE_PATH searching for source
  videos by filename. EXCLUDED_DIRS isn't a behavior change for typical
  Synology mounts (no .mp4 in @eaDir), but it's a correctness win for
  any operator-defined exclude that happens to contain video.

Refactor: add `walk_library_files(base, excluded_dirs) -> Vec<DirEntry>`
to file_scan.rs as the shared primitive. `enumerate_indexable_files`
now layers media-type + mtime filters on top of it. One new test
covers the lower-level helper (returns all extensions, prunes excluded
subtrees).

`generate_video_gifs` (currently `#[allow(dead_code)]`, not reachable
from main) gets the `update_media_counts` signature update and reads
EXCLUDED_DIRS from env so a future revival isn't broken — but its
WalkDir walk stays raw because the dual lib/bin compile makes the
file_scan module path non-trivial there. Tagged with a comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:21:17 +00:00
Cameron Cordes
6a6a4a6a46 tags: batch lookup expands content-hash siblings cross-library
The first cut matched by rel_path only — fine for single-library
deploys but wrong for multi-library setups where the same content
lives under different rel_paths (e.g. a backup mount holding copies
of the primary library). A tag applied under library A would silently
not appear in the library-B grid badge even though the carousel's
per-path /image/tags would resolve it correctly via siblings.

The batch handler now does the expansion server-side in three queries
regardless of input size:

  1. image_exif batch lookup → query path → content_hash
  2. image_exif JOIN by content_hash → all sibling rel_paths sharing
     each hash (paths are deduped across libraries)
  3. tagged_photo + tags JOIN over the union of (query + sibling)
     rel_paths

Tags are then aggregated back to query paths via a sibling→originals
reverse map, deduped by tag id. Files without a content_hash (just
indexed, hash compute pending, etc.) skip step 2 and only get tags
from their own rel_path — same fallback the per-path handler uses.

Adds ExifDao::get_rel_paths_for_hashes (batch counterpart of
get_rel_paths_by_hash) chunked at 500 to stay under SQLite's
SQLITE_LIMIT_VARIABLE_NUMBER. Five queries for a 4k-photo grid is
still ~800x cheaper than per-path HTTP fan-out.
2026-04-30 00:36:44 +00:00
Cameron Cordes
7621282419 Thumb orientation + library filter on /photos/exif
Two follow-ups on the same feature branch:

1. Bake EXIF orientation into generated thumbnails. The `image` crate
   doesn't apply Orientation on load, and `save_with_format(..Jpeg)`
   drops EXIF — so portrait phone shots ended up sideways in any client
   that displays the cached thumb directly (no EXIF tag for the browser
   to compensate from). New `exif::read_orientation` reads the tag
   cheaply (no full EXIF parse) and `exif::apply_orientation` does the
   rotate/flip via image's existing `rotate90/180/270` + `fliph/flipv`.
   Applied in both branches of `generate_image_thumbnail` (RAW embedded-
   JPEG path and the regular `image::open` path). Existing thumbnails
   in the cache are still wrong-orientation; wipe the thumb dir or run
   a one-off backfill once this lands.

2. Optional `library` query param on `/photos/exif`. Accepts numeric id
   or name (same shape as `/image?library=...`), resolved via the
   existing `resolve_library_param` helper so a bad value 400s before
   we touch the DAO. Filter is applied post-query in the handler
   rather than pushed into `query_by_exif` to keep the DAO trait
   (and its test mocks) unchanged. Cheap enough at typical library
   counts; can be moved into SQL later if it ever isn't.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:29:36 -04:00
Cameron Cordes
c6f82ebaba Batch EXIF endpoint: GET /photos/exif
Adds a single round-trip projection of `image_exif` for every photo whose
`date_taken` falls in `[date_from, date_to]`. Wraps the existing
`ExifDao::query_by_exif` DAO method which already handles the SQL filter
in one query against the covering index — the only missing piece was
HTTP plumbing.

Designed for window-scoped consumers like Apollo's photo-to-track
matcher, which currently does N+1 (one `/photos` listing + one
`/image/metadata` per photo). Because `/image/metadata` serializes on
`Data<Mutex<dyn ExifDao>>`, that pattern can take 10s+ for windows with
hundreds of photos. The new endpoint takes one mutex acquisition for
the whole batch.

Response shape:
  { photos: [
      { file_path, library_id, library_name,
        camera_model, width, height,
        gps_latitude, gps_longitude, date_taken } ],
    total: N }

Two notes on scope:
- Photos with NULL `date_taken` are excluded by `query_by_exif`'s
  semantics. Filename-extracted dates are not synthesized here; rare
  callers that need that fallback can still hit `/image/metadata`.
- GPS columns are stored as f32 in image_exif to keep row size small;
  the JSON shape widens to f64 so clients don't have to know about the
  on-disk precision.

Library names are pre-mapped from `app_state.libraries` once and
stamped on each row, avoiding an O(rows × libraries) linear scan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:38:53 -04:00
Cameron
dc2a96162e fix(dates): prefer earliest of fs created/modified as fallback
On copied or restored files (e.g. a backup library), the OS stamps
created at copy time while modified is preserved from the source, so
the earlier of the two is a better proxy for when the content
originated. Adds utils::earliest_fs_time and threads it through the
three spots that fall back to filesystem dates: photos-list sort,
memories grouping, and insight-generation timestamp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:20:12 -04:00
Cameron
3027a3ffda perf: DB-backed recursive /photos + watcher reconciliation
Recursive listings now query image_exif instead of walking disk, taking
union-mode /photos from ~17s to sub-second on a 10k-file library. The
watcher's full scan prunes stale image_exif rows so the DB stays in
parity with the filesystem when files are deleted externally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
b04dd8b601 fix: demote path-not-exists validation errors to debug
The /image cross-library fallback tries the resolved library first and falls
back to any library holding the rel_path. The first attempt emitted error-level
noise on every grid tile in union mode. Split the validation error so only
traversal attempts log at error; missing-file cases log at debug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
2c8de8dcc6 feat: union /photos and /memories across libraries
When `library` is omitted, both endpoints now walk every configured
library root, interleave the results, and tag each row with its source
library via the parallel `photo_libraries` / per-row `library_id`
arrays. Previously the handlers fell back to the primary library,
silently hiding the rest.

Threads a parallel `file_libraries: Vec<i32>` through the sort/paginate
helpers so library attribution survives sorting and pagination.
Directory names are de-duplicated across libraries.

`get_all_with_date_taken` grows an optional library filter so memories
can scope its EXIF query per-library during the union walk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
586b735af5 feat: include per-photo library id in /photos response
Adds a parallel `photo_libraries: Vec<i32>` array alongside `photos`
in `PhotosResponse` so clients can render per-thumbnail badges.
Populated with the scoped library id at the two main return sites;
left empty for `/favorites` since favorites are library-agnostic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
c2ee3996be chore: apply cargo fmt + clippy cleanup across crate
Silence forward-looking dead_code on unused DAO modules, annotate
individual placeholder items, rewrite tautological assert!(true/false)
in token tests as panic! arms, and pick up fmt drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
a0f3bfab5f fix: validate gps-summary path against every library
The /photos/gps-summary handler validated the incoming path against
the primary library's root with new_file=false, which requires the
path to exist on disk. For a viewer opened on a file from a
non-primary library, tapping the GPS link produced activePath =
<folder from lib 2>, the primary-only check failed, and the server
400'd — so the map came up empty.

Validation here is purely a traversal guard (the DAO does a prefix
LIKE against rel_path), so we now accept the path as long as any
configured library can resolve it without escaping its root.

Also applies cargo fmt drift on files touched this session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
7becbc0737 fix: normalize rel_path separators in non-recursive /photos listing
On Windows, strip_prefix preserves backslashes, so the non-recursive
branch was looking up tags for 'Melissa\img1.jpg' while tagged_photo
stores 'Melissa/img1.jpg' — every file was filtered out. Normalize to
'/' to match the watcher and populate_knowledge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
2d942a9926 feat: content-hash-aware tag/insight sharing + library scoping
Tags and insights now follow content across libraries via content_hash
lookups on the read path, so the same file indexed at different rel_paths
in multiple libraries shares its annotations. Recursive tag search scopes
hits to the selected library by checking each tagged rel_path against
the library's disk (with a content-hash sibling fallback so tags attached
under one library's rel_path still match a content-equivalent file in
another). The /image and /image/metadata handlers fall back across
libraries when the file isn't under the resolved one, so union-mode
search results (which carry no library attribution in the response)
still serve correctly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
c01a0479b7 fix: honor library param in /image, /photos, /memories
The Phase 3 plumbing accepted `library=` but didn't actually route
requests through the scoped library once it was resolved. Three
concrete bugs surfaced when testing against a second mounted library:

- `/image` always resolved paths against AppState.base_path (primary),
  so thumbnails for non-primary libraries 400'd when their rel_paths
  didn't exist under primary. Now resolves against the scoped library
  and defaults to primary when the param is omitted.

- `/memories` walked the scoped library correctly but its helper
  functions hardcoded `library_id: PRIMARY_LIBRARY_ID` on every
  MemoryItem, causing clients to route thumbnails back to primary
  regardless of which library the memory actually came from.

- `/photos` non-recursive listing delegated to a `RealFileSystem`
  constructed from AppState.base_path at startup, so walks always
  hit primary even when `library=2` was passed. The non-primary
  path now uses list_files against the scoped library's root;
  primary still goes through FileSystemAccess to preserve the
  existing test mock plumbing.

Also adds `library` to ThumbnailRequest so the /image query param
is actually parsed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
0aaea91cc2 feat: add content_hash backfill + register every media file
Adds blake3 content hashing as the basis for derivative dedup
(thumbnails, HLS) across libraries. Computed inline by the watcher on
ingest and by a new `backfill_hashes` binary for historical rows.

Key changes:
- `content_hash` and `size_bytes` are now populated on new image_exif
  rows; a new ExifDao surface (`get_rows_missing_hash`,
  `backfill_content_hash`, `find_by_content_hash`) supports backfill and
  future hash-keyed lookups.
- The watcher now registers every image/video in image_exif, not just
  files with parseable EXIF. EXIF becomes optional enrichment; videos
  and other non-EXIF files still get a hashed row. This also makes
  DB-indexed sort/filter cover the full library.
- `/image` thumbnail serve dual-looks up hash-keyed path first, then
  falls back to the legacy mirrored layout.
- Upload flow accepts `?library=` query param + hashes uploaded files.
- Store_exif logs the underlying Diesel error on insert failure so
  constraint violations surface instead of hiding behind a generic
  InsertError.
- New migration normalizes rel_path separators to forward slash across
  all tables, deduplicating any rows that collide after normalization.
  Fixes spurious UNIQUE violations from mixed backslash/forward-slash
  paths on Windows ingest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
ce5b337582 feat: make file watcher, thumbnails, and upload library-aware
`watch_files` and `create_thumbnails` now iterate every configured
library, tagging rows with the correct `library_id`. `process_new_files`
takes a `&Library` so InsertImageExif no longer hardcodes the primary
library. Upload accepts an optional `library` query param to pick a
target library; omitted still defaults to primary for backwards
compatibility.

Hash-keyed thumbnail/HLS storage with dual-lookup fallback is deferred
to Phase 5, where it's bundled with the content hash backfill that
actually makes the hash-keyed paths meaningful. Until hashes are
populated, the legacy mirrored layout is a no-op to change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
48e5de6eab feat: add GET /libraries and library query param plumbing
New `/libraries` endpoint returns configured libraries so clients can
discover them. `FilesRequest` and `MemoriesRequest` gain an optional
`library` param (accepts name or numeric id). Unknown values are
rejected with 400; absent values span all libraries. `/memories`
now scopes its filesystem walk + EXIF query to the resolved library.
`MemoryItem` carries `library_id` so union-mode clients can render a
per-item source badge.

Behavior is unchanged in single-library mode: omitting `library` still
returns results from the primary library, which is the only one
configured until a second row is added to the libraries table.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
ffcddbb843 feat: multi-library foundation (schema + libraries module)
Adds a `libraries` registry table and threads library_id through
per-instance metadata tables (image_exif, photo_insights,
entity_photo_links, video_preview_clips). File-path columns renamed to
rel_path to make the relative-to-root semantics explicit. Adds
content_hash + size_bytes on image_exif to support future hash-keyed
thumbnail/HLS dedup. Tags and favorites stay library-agnostic so they
share across libraries by rel_path.

Behavior is unchanged: a single primary library (id=1) is seeded from
BASE_PATH on first boot; all handlers and DAOs route through it as a
transitional shim until the API gains a library query param.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
da16fddce3 Address path traversal and other security fixes 2026-04-10 14:58:57 -04:00
Cameron
da039bbc49 fix: include files without EXIF when sorting by date
Date sorting previously used a DB-level query that acted as an inner join,
silently dropping files with no image_exif row. Replace it with the existing
in-memory sort which already falls back to filename-extracted and filesystem
dates, so all files appear in sorted results.

Also removes the now-unused get_files_sorted_by_date trait method and its
SqliteExifDao implementation and test mock.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 14:43:26 -04:00
Cameron
c1b6013412 chore: cargo fmt + clippy fix for collapsed if-let chain (T017)
- cargo fmt applied across all modified source files
- Collapse nested if let Some / if !is_empty into a single let-chain (clippy::collapsible_match)
- All other warnings are pre-existing dead-code lint on unused trait methods

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 23:09:58 -04:00
Cameron
8ecd3c6cf8 refactor: use Arc<Mutex<SqliteConnection>> in SqliteTagDao, remove unsafe impl Sync
Aligns SqliteTagDao with the pattern used by SqliteExifDao and SqliteInsightDao.
The unsafe impl Sync workaround is no longer needed since Arc<Mutex<>> provides
safe interior mutability and automatic Sync derivation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 17:10:11 -04:00
Cameron
7a0da1ab4a Build insight title from generated summary 2026-02-24 16:08:25 -05:00
Cameron
1efdd02eda Add GPS summary sorting
Run cargo fmt/clippy
2026-01-28 10:52:17 -05:00
Cameron
1d2f4e3441 Add circular thumbnail creation for Map view 2026-01-26 20:04:14 -05:00
Cameron
073b5ed418 Added gps-summary endpoint for Map integration 2026-01-26 11:58:24 -05:00
Cameron
be483c9c1a Add database optimizations for photo search and pagination
Implement database-level sorting with composite indexes for efficient date and tag queries. Add pagination metadata support and optimize tag count queries using batch processing.
2026-01-18 19:17:10 -05:00
Cameron
af35a996a3 Cleanup unused message embedding code
Fixup some warnings
2026-01-14 13:33:36 -05:00
Cameron
e2d6cd7258 Run clippy fix 2026-01-14 13:17:58 -05:00
Cameron
f65f4efde8 Make date parse from metadata a little more consistent 2026-01-14 12:54:36 -05:00
Cameron
fa600f1c2c Fallback to sorting by Metadata date 2026-01-11 14:39:50 -05:00
Cameron
d86b2c3746 Add Google Takeout data import infrastructure
Implements Phase 1 & 2 of Google Takeout RAG integration:
- Database migrations for calendar_events, location_history, search_history
- DAO implementations with hybrid time + semantic search
- Parsers for .ics, JSON, and HTML Google Takeout formats
- Import utilities with batch insert optimization

Features:
- CalendarEventDao: Hybrid time-range + semantic search for events
- LocationHistoryDao: GPS proximity with Haversine distance calculation
- SearchHistoryDao: Semantic-first search (queries are embedding-rich)
- Batch inserts for performance (1M+ records in minutes vs hours)
- OpenTelemetry tracing for all database operations

Import utilities:
- import_calendar: Parse .ics with optional embedding generation
- import_location_history: High-volume GPS data with batch inserts
- import_search_history: Always generates embeddings for semantic search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-05 14:50:49 -05:00
Cameron
1171f19845 Create Insight Generation Feature
Added integration with Messages API and Ollama
2026-01-03 10:30:37 -05:00
Cameron
54e23a29b3 Fix warnings 2025-12-29 14:29:29 -05:00
Cameron
ccd16ba987 Files endpoint refactoring 2025-12-26 22:20:01 -05:00
Cameron
c035678162 Add tracing to EXIF DAO methods 2025-12-23 22:57:24 -05:00
Cameron
636701a69e Refactor file type checking for better consistency
Fix tests
2025-12-23 22:30:53 -05:00
Cameron
3a64b30621 Fix Date sorting in tagged/recursive search 2025-12-23 22:07:40 -05:00
Cameron
df94010d21 Fix tests and improve memories date error log 2025-12-19 14:20:51 -05:00
Cameron
28d85dc4a5 Fix recursive search and media filtering 2025-12-18 11:25:50 -05:00
Cameron
721b66481e Add EXIF search implementation to list photos endpoint 2025-12-18 10:06:13 -05:00
Cameron
eb8e08b9ff Add EXIF search infrastructure (Phase 1 & 2)
Implements foundation for EXIF-based photo search capabilities:

- Add geo.rs module with GPS distance calculations (Haversine + bounding box)
- Extend FilesRequest with EXIF search parameters (camera, GPS, date, media type)
- Add MediaType enum and DateTakenAsc/DateTakenDesc sort options
- Create date_taken index migration for efficient date queries
- Implement ExifDao methods: get_exif_batch, query_by_exif, get_camera_makes
- Add FileWithMetadata struct for date-aware sorting
- Implement date sorting with filename extraction fallback
- Make extract_date_from_filename public for reuse

Next: Integrate EXIF filtering into list_photos() and enhance get_all_tags()
2025-12-18 09:34:07 -05:00