Feature/unified nl search #106

Open
cameron wants to merge 26 commits from feature/unified-nl-search into master

Author SHA1 Message Date
Cameron Cordes 48a1b753f0 AI: add enable_thinking reasoning toggle plumbed to llama.cpp
New optional SamplingOverride forwarded to llama-server as
chat_template_kwargs.enable_thinking (gates Qwen3-style reasoning
blocks). None leaves the template default; other backends ignore it.
Wired through the agentic-insight and chat-turn request bodies/handlers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 18:14:44 -04:00
Cameron Cordes f2ab8d3740 Unified search: use ANY-mode tag matching, not ALL
ALL-mode over-constrains NL queries — the model maps several query words to
tags and few photos carry every one, zeroing the candidate set. Switch to
ANY (a photo matches if it has any named tag); the semantic CLIP ranking
provides precision within that pool. Exclude tags still filter out.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 02:25:24 -04:00
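
A minimal sketch of the ANY versus ALL distinction described in f2ab8d3740, assuming tags are plain strings held in a set. The `TagMode` enum and `photo_matches` function are hypothetical names for illustration, not the project's actual API.

```rust
use std::collections::HashSet;

/// How the include-tags are combined (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TagMode {
    Any,
    All,
}

/// Returns true when a photo's tags satisfy the include/exclude filter.
pub fn photo_matches(
    photo_tags: &HashSet<&str>,
    include: &[&str],
    exclude: &[&str],
    mode: TagMode,
) -> bool {
    // Exclude tags always filter out, regardless of mode.
    if exclude.iter().any(|t| photo_tags.contains(t)) {
        return false;
    }
    // Assumption of this sketch: no include tags means no tag constraint.
    if include.is_empty() {
        return true;
    }
    match mode {
        TagMode::Any => include.iter().any(|t| photo_tags.contains(t)),
        TagMode::All => include.iter().all(|t| photo_tags.contains(t)),
    }
}
```

With a photo tagged `beach` and `sunset`, a query that maps to `beach` and `dog` matches under `Any` but not under `All`, which is the over-constraint the commit removes.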
Cameron Cordes 6e5898e766 Unified search: rank within filtered set instead of pre-thresholding CLIP
When structured filters are present they're the constraint and CLIP only ranks
within the candidate set, so drop the global similarity threshold for that
case. Previously the 0.2 whole-library threshold ran BEFORE intersecting with
the filters, discarding filter-matching photos that scored just under it (e.g.
a 2022 beach photo at 0.18) — producing after_struct_filter=0 even when matches
existed. Plain semantic (no filters) keeps the user's threshold.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 02:20:06 -04:00
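
The ordering change in 6e5898e766 can be sketched as follows, under the assumption that CLIP scores arrive as (content hash, similarity) pairs and the structured filters produce a set of allowed hashes. `rank_hits` is a hypothetical helper, not the project's `score_photos`.

```rust
use std::collections::HashSet;

/// Ranks CLIP hits. When `allowed` is Some, the structured filters are the
/// constraint and the similarity threshold is skipped; otherwise the user's
/// threshold applies to the whole library.
pub fn rank_hits(
    scored: &[(String, f32)],
    allowed: Option<&HashSet<String>>,
    threshold: f32,
) -> Vec<(String, f32)> {
    let mut hits: Vec<(String, f32)> = scored
        .iter()
        .filter(|(hash, score)| match allowed {
            Some(set) => set.contains(hash),
            None => *score >= threshold,
        })
        .cloned()
        .collect();
    // Highest similarity first; total_cmp gives a total order over f32.
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits
}
```

A filter-matching photo scoring 0.18 survives here because the 0.2 threshold is never consulted when a candidate set is present.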
Cameron Cordes 6c315edacc clip_client: log encode_text failures (URL + status/body or network error)
The CLIP encode failure reason was only ever returned in the HTTP response
body, never logged server-side, making 502s from /photos/search opaque. Log
the underlying cause — network error to the URL, or the Apollo HTTP status +
response body — so CLIP-service problems are diagnosable from the ImageApi log.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 02:02:57 -04:00
Cameron Cordes 0a40e78528 Unified search: UNIFIED_SEARCH_MODEL env override for the translation step
Pin the NL->structured translation to a small, fast model that can stay
co-resident with CLIP (and the chat model) so it never evicts them on a tight
VRAM budget. Precedence: UNIFIED_SEARCH_MODEL env > client-selected model >
configured default. Logs the effective model (backend.model()) so model A/B
tests are visible. Documented in .env.example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 01:58:48 -04:00
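
The precedence stated in 0a40e78528 (env override, then client-selected model, then configured default) might look like the sketch below. `effective_model` is a hypothetical name, and treating blank values as unset is an assumption rather than documented behavior.

```rust
/// Treats empty or whitespace-only values as unset (an assumption).
fn non_empty(v: Option<&str>) -> Option<&str> {
    v.map(str::trim).filter(|s| !s.is_empty())
}

/// Picks the translation model: env override, then client choice, then default.
pub fn effective_model(
    env_override: Option<&str>,
    client_model: Option<&str>,
    configured_default: &str,
) -> String {
    non_empty(env_override)
        .or_else(|| non_empty(client_model))
        .unwrap_or(configured_default)
        .to_string()
}
```

A caller would typically pass `std::env::var("UNIFIED_SEARCH_MODEL").ok().as_deref()` as the first argument.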
Cameron Cordes e56235acc5 Unified search: stage-by-stage logging to debug empty results
Log the translated query (semantic/tags/place/date/media + has_struct), the
tag-filter file count, candidate-row + allowed-hash counts, and the CLIP
considered/hits/after-filter counts. Pinpoints which stage drops results to
zero (over-extracted filter, tag path mismatch, Any/All over-constraint, or
CLIP threshold). info-level for now while debugging.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 01:29:21 -04:00
Cameron Cordes fcbd7e2733 Unified search: accept client model override (avoid model swapping)
Add an optional `model` query param to /photos/search/unified, passed into
resolve_backend's overrides. The client sends the user's currently-selected
local model so the translation step reuses an already-loaded model instead of
forcing a llama-swap eviction + cold start. Falls back to the configured
default when absent. Still local only (no hybrid).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 01:19:53 -04:00
Cameron Cordes e4c875f473 Unified NL search Phase 2: /photos/search/unified endpoint
Composes the two existing engines (Path A orchestration):
- Translate NL -> StructuredQuery via local LLM, respecting LLM_BACKEND
  (resolve_backend(Local) -> ollama or llama-swap; no hybrid).
- Forward-geocode the place name into a gps circle.
- Structured filters (tags/EXIF/geo/date/media) build a candidate set of EXIF
  rows; CLIP ranks within it, joined by content_hash. Degenerate cases match
  existing behavior: semantic-only -> plain CLIP; filters-only -> date-sorted.
- Echoes the interpreted query (incl. resolved place) for editable client chips.

Refactor: extracted reusable cores from clip_search (score_photos, resolve_hits,
parse_library_scope, score_error_response) shared by both endpoints. Removed the
Phase 1 allow-until-wired attributes now that nl_query + geo are consumed.

fmt + clippy clean; 23 backend tests pass (7 geo, 12 nl_query, 4 unified).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 01:03:43 -04:00
Cameron Cordes 50ed780844 Unified NL search Phase 1: NL→structured-query translator + forward geocoding
Foundation for the /photos/search/unified endpoint (Phase 2). Two new,
fully unit-tested pieces, not yet wired into a route (allow-until-wired,
mirroring llm_client.rs):

- ai/nl_query.rs: translate a free-text query into a StructuredQuery via one
  grounded LLM call. Two-stage — the model emits names/ISO dates, then a pure
  resolve step maps tag names against the real vocab and converts dates to
  unix seconds. Hallucinated (non-vocab) tags are surfaced in unmatched_tags
  rather than silently used as hard filters — the anti-noise guard. 12 tests.

- geo::forward_geocode + bbox_to_circle: resolve a place name to a circle via
  Nominatim /search, collapsing the bounding box to centroid + circumscribing
  radius so "Portland" and "Italy" both map onto the existing gps circle
  filter with no schema change. Radius is the max centroid-to-corner distance
  (corners aren't equidistant on a sphere). 7 tests.
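
A sketch of the box-to-circle collapse described above, using the haversine formula for great-circle distance. The signature, the kilometre unit, and the midpoint centroid are assumptions of this example; it also assumes the box does not cross the antimeridian.

```rust
const EARTH_RADIUS_KM: f64 = 6371.0088;

/// Great-circle distance in km between two (lat, lon) points given in degrees.
pub fn haversine_km(a: (f64, f64), b: (f64, f64)) -> f64 {
    let (lat1, lat2) = (a.0.to_radians(), b.0.to_radians());
    let dlat = lat2 - lat1;
    let dlon = (b.1 - a.1).to_radians();
    let h = (dlat / 2.0).sin().powi(2)
        + lat1.cos() * lat2.cos() * (dlon / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_KM * h.sqrt().min(1.0).asin()
}

/// Collapses a bounding box to (center_lat, center_lon, radius_km).
/// The radius is the largest centroid-to-corner distance, so the circle
/// circumscribes the box even though the corners are not equidistant.
pub fn bbox_to_circle(south: f64, north: f64, west: f64, east: f64) -> (f64, f64, f64) {
    let center = ((south + north) / 2.0, (west + east) / 2.0);
    let corners = [(south, west), (south, east), (north, west), (north, east)];
    let radius = corners
        .iter()
        .map(|&c| haversine_km(center, c))
        .fold(0.0_f64, f64::max);
    (center.0, center.1, radius)
}
```

For a northern-hemisphere box the two southern corners are farther from the centroid than the northern ones, because a degree of longitude spans more distance nearer the equator; taking the maximum covers that case.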

fmt + clippy clean; 19 new tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 00:44:16 -04:00
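
The resolve step from 50ed780844 (matching model-emitted tag names against the real vocabulary and surfacing the rest) could be sketched like this. The `ResolvedTags` struct, the vocabulary shape, and the case-insensitive match are all assumptions made for illustration; only the `unmatched_tags` name comes from the commit message.

```rust
use std::collections::HashMap;

/// Result of resolving model-emitted tag names against the known vocabulary.
#[derive(Debug, PartialEq)]
pub struct ResolvedTags {
    pub matched_ids: Vec<i32>,
    pub unmatched_tags: Vec<String>,
}

/// `vocab` maps a lowercased tag name to its id (a hypothetical shape).
pub fn resolve_tags(emitted: &[&str], vocab: &HashMap<String, i32>) -> ResolvedTags {
    let mut matched_ids = Vec::new();
    let mut unmatched_tags = Vec::new();
    for name in emitted {
        let key = name.trim().to_lowercase();
        match vocab.get(&key) {
            Some(id) => matched_ids.push(*id),
            // Non-vocab names are reported, never used as hard filters.
            None => unmatched_tags.push((*name).to_string()),
        }
    }
    ResolvedTags { matched_ids, unmatched_tags }
}
```

Keeping this step pure (no LLM or database call) is what makes it straightforward to unit-test separately from the model call.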
Cameron Cordes 7e21213181 Reels: bound disk/ledger growth (pre-gen prune + on-demand cache sweep)
Nothing reaped reels before, so the on-disk cache and ledger grew
unbounded — each night's daily reel is a new ~4MB file + ledger row that's
stale within ~26h.

- Pre-gen self-prune: after recording a reel, prune_superseded keeps the
  newest PREGEN_KEEP_PER_SCOPE (2) rows per (span, library) and unlinks the
  superseded reels' mp4+sidecar. Caps the ledger/disk at ~spans×libraries×2.
- On-disk sweeper (spawn_reel_cache_sweeper): every 24h, removes reel mp4s
  older than REEL_CACHE_MAX_AGE_DAYS (7) that have no ledger row and no live
  job. This bounds the on-demand cache, which has no ledger row and otherwise
  grows forever. Also removes crashed-render cruft (.mp4.tmp/.concat.txt/orphan
  sidecars). Runs regardless of REEL_PREGEN_ENABLED; disable with
  REEL_CACHE_SWEEP_ENABLED=0.
- New DAO methods prune_superseded + all_cache_keys (with tests); env knobs
  documented in .env.example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 23:27:32 -04:00
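
The keep-newest-N-per-scope rule behind prune_superseded in 7e21213181 can be illustrated in memory as below. The `LedgerRow` fields and the `superseded` function are hypothetical; the real DAO method presumably does this in SQL and also unlinks the files.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct LedgerRow {
    pub span: String,
    pub library: String,
    pub cache_key: String,
    pub created_at: i64, // unix seconds
}

/// Returns the rows to prune: everything beyond the newest `keep` rows
/// per (span, library) scope.
pub fn superseded(rows: &[LedgerRow], keep: usize) -> Vec<LedgerRow> {
    let mut by_scope: HashMap<(&str, &str), Vec<&LedgerRow>> = HashMap::new();
    for row in rows {
        by_scope
            .entry((row.span.as_str(), row.library.as_str()))
            .or_default()
            .push(row);
    }
    let mut out = Vec::new();
    for group in by_scope.values_mut() {
        // Newest first, then everything after the first `keep` is superseded.
        group.sort_by(|a, b| b.created_at.cmp(&a.created_at));
        out.extend(group.iter().skip(keep).map(|r| (*r).clone()));
    }
    // Oldest first, for a stable output order.
    out.sort_by(|a, b| a.created_at.cmp(&b.created_at));
    out
}
```

With `keep` set to 2, the number of surviving rows is at most 2 per scope, which matches the spans×libraries×2 bound quoted in the commit.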
Cameron Cordes 664b3694f8 Reels pre-gen: always render the agentic reel, don't adopt on-demand mp4
Past the key-aware dedup, any mp4 already at the cache key was not
pre-generated by us (no matching ledger row) — typically an on-demand
fast-scripted reel sharing the key after the max_segments alignment.
Adopting it recorded a ledger row pointing at the fast reel, silently
defeating agentic pre-gen. Drop the adopt-existing-mp4 shortcut and
always produce_reel (atomic overwrite). Worst case is one redundant
re-render if a prior run crashed between render and ledger write.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 23:16:14 -04:00
Cameron Cordes b52b1eb323 Reels pre-gen: make dedup cache-key-aware so key changes regenerate
exists_fresh only matched (span, library, render_version, age), so a
cache-key change that doesn't bump RENDER_VERSION (e.g. the max_segments
alignment, or any future selection-logic tweak) left last night's ledger
row looking 'fresh' — the nightly run would skip and the orphaned reel
would persist. Dedup now compares the stored cache_key to the freshly
computed key (and confirms the mp4 exists), so a changed key forces a
regen within the freshness window. exists_fresh stays as the HTTP
endpoint's fast gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 23:14:39 -04:00
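
The key-aware dedup from b52b1eb323 reduces to a three-part check, sketched here with hypothetical names (`StoredReel`, `can_skip_pregen`). The render_version comparison mentioned in the commit is omitted for brevity.

```rust
pub struct StoredReel {
    pub cache_key: String,
    pub created_at: i64, // unix seconds
}

/// True when the nightly run may skip regeneration for this scope.
pub fn can_skip_pregen(
    stored: Option<&StoredReel>,
    fresh_key: &str,
    mp4_exists: bool,
    now: i64,
    max_age_secs: i64,
) -> bool {
    match stored {
        Some(row) => {
            let within_window = now - row.created_at <= max_age_secs;
            // A changed key forces a regen even inside the freshness window.
            within_window && row.cache_key == fresh_key && mp4_exists
        }
        None => false,
    }
}
```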
Cameron Cordes 19fc1bbdf8 Reels pre-gen: use DEFAULT_MAX_SEGMENTS so cache keys match on-demand
pregen_one hardcoded max_segments: 24 while create_reel_handler defaults
to DEFAULT_MAX_SEGMENTS (40). Since the cache key encodes the raw
max_segments, the pre-generated reel's key never matched the client's
on-demand request, so POST /reels cache-hit an older max=40 reel and the
agentic pre-gen file was left orphaned. Align to DEFAULT_MAX_SEGMENTS (as
the plan specified) so the on-demand cache-hit path serves the pre-gen
reel. Content is unchanged — the actual beat count is duration-budgeted
either way; only the key descriptor differed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 23:12:54 -04:00
Cameron Cordes ca007a618d Reels pre-gen: record true media count + real upsert for user_ai_prefs
- pregen_one recorded media_count as planned.len() (beat count); record
  the actual media item total (media.len(), photos + clips) in both the
  cache-hit and freshly-rendered ledger paths. Drops the redundant
  photo_count binding.
- Replace upsert_prefs's insert-then-catch-error-then-update dance with a
  single atomic INSERT ... ON CONFLICT(id) DO UPDATE. Explicit id=1 makes
  the conflict target deterministic; explicit column .set((...)) keeps
  None -> NULL overwrite semantics so the row mirrors the latest request
  exactly, and genuine insert errors surface instead of being swallowed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 15:19:41 -04:00
Cameron Cordes e4d8d374fb Reels pre-gen: fix runtime breakers from review (1-5)
1. Drop the unregistered prefs_dao/reel_dao web::Data extractors from
   create_reel_handler / precomputed_reel_handler and read the DAOs off
   AppState instead (consistent with the scheduler). Missing app_data
   would have 500'd every POST /reels and /reels/precomputed at runtime.
2. Restore the dropped 'return' in the cache-hit branch — without it a
   cache hit fell through, overwrote the Done job with Queued, and
   re-ran the whole TTS+render pipeline on every request.
3. Make secs_until_next_run_hour minute/second-accurate so a batch that
   finishes inside the run hour sleeps ~24h instead of busy-looping
   (wake, re-run, sleep 0) for the rest of the hour. Tests updated.
4. Prune photo/user-bound tools (get_file_tags, get_faces_in_photo,
   recall_facts_for_photo, recall_facts_for_entity) from the agentic
   reel scripter's allow-list — they no-op/error with the empty
   file/user context and only burn iterations.
5. Align AGENTIC_SYSTEM_PROMPT's advertised tool list with the actual
   (pruned) allow-list.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 15:14:36 -04:00
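
Item 3 of e4d8d374fb (minute/second-accurate sleep math) can be sketched with plain integers. The function name comes from the commit, but this signature is an assumption: the real function presumably takes a chrono time, and this version ignores DST shifts.

```rust
const SECS_PER_DAY: u64 = 24 * 3600;

/// Seconds from the local time hour:minute:second until the next start of
/// `run_hour`. At or past the top of the run hour, the result points at
/// tomorrow's run, so a batch that finishes inside the run hour sleeps
/// roughly 24h instead of waking immediately.
pub fn secs_until_next_run_hour(hour: u32, minute: u32, second: u32, run_hour: u32) -> u64 {
    let now = u64::from(hour) * 3600 + u64::from(minute) * 60 + u64::from(second);
    // Clamp to a valid hour, mirroring the 0-23 validation in the handler.
    let target = u64::from(run_hour.min(23)) * 3600;
    if now < target {
        target - now
    } else {
        SECS_PER_DAY - now + target
    }
}
```

For a 03:00 run hour, finishing the batch at 03:10:00 yields 85,800 seconds (23h50m), not zero.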
Cameron Cordes 5c9ee56527 Fix agentic reel audit issues: midnight bug, DAO wiring, dead code, DST timezone, validation
Blocking fixes:
- secs_until_next_run_hour: same-hour now returns 0 instead of 24h
- capture_prefs: called at both handler return points, never fails request
- capture_prefs: resolves library param, upserts to user_ai_prefs via DAO
- Scheduler: uses AppState DAOs instead of separate connections
- Pregen dedup: uses resolved library param instead of hardcoded 'all'
- run_readonly_tool_loop: added #[allow(dead_code)] (used in main.rs only)
- run_readonly_tool_loop: removed dead messages.push() call
- InsightGenerator: added exif_dao() getter for scheduler reuse

Medium fixes:
- Input validation: run_hour clamped 0-23, week_dow clamped 0-6
- DST-sensitive timezone: fixed_tz_offset() with env var config

Low fixes:
- Documented REEL_PREGEN_MAX_TOOL_ITERS and REEL_PREGEN_TZ_FIXED_MINUTES
- Removed dead test_app_state function and unused imports

Also fixes: the UpsertUserAiPrefs import path; chrono::Local::with_ymd_and_hms
(needs the TimeZone trait in scope plus .single()); and a simplified
unwrap_or_else closure.
2026-06-13 14:59:00 -04:00
Cameron Cordes f707353807 feat: nightly agentic pre-generation of memory reels
Implement end-to-end nightly pre-generation of memory reels with agentic
scripting that grounds narration in calendar, location, messages, and RAG.

Sections A-E from the plan:

A. Extract produce_reel pipeline core from run_reel_job with
   ScripterMode::Fast/Agentic and progress callbacks.

B. Agentic scripter: factor run_readonly_tool_loop from the insight
   generator, build read-only tool gate, prompt builder with GPS, and
   generate_script_agentic with fallback to fast path.

C. Precomputed reels ledger (SQLite table + DAO), GET /reels/precomputed
   handler with validity gate, GET /reels/by-key/{key}/video streaming,
   and normalize_library_key helper.

D. Nightly scheduler: spawn_pregen_scheduler with configurable hour,
   run_pregen_batch (day/week/month spans), pregen_one with dedup and
   disk-check, secs_until_next_run_hour time math.

E. user_ai_prefs passive mirror table + DAO for param capture in
   create_reel_handler and replay in the scheduler.

Also fixes resolve_library_param signature to take &[Library] and adds
resolve_library_param_state wrapper for AppState callers.

New files: migrations/2026-06-13-000000_add_precomputed_reels/,
  migrations/2026-06-13-000010_add_user_ai_prefs/,
  src/database/precomputed_reel_dao.rs,
  src/database/user_ai_prefs_dao.rs
2026-06-13 14:29:34 -04:00
Cameron Cordes b30c8c16d0 Reels: clips play through the beat instead of freezing early
A clip beat capped playback at CLIP_SECONDS and filled the rest of the
narration with a tpad freeze-frame, so a clip stopped dead on its last
frame for a second or two before the transition — a glitchy pause that
stills don't have. Extract clip_beat_plan: the clip now plays for as
much of its beat as the source footage covers, and we freeze only when
the source is genuinely shorter than the narration. Bump RENDER_VERSION.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 11:00:01 -04:00
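
The rule in b30c8c16d0 (play as much of the beat as the source covers, freeze only for the remainder) is sketched below. `clip_beat_plan` is named in the commit, but the struct and signature here are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub struct ClipBeatPlan {
    /// Seconds of real footage to play.
    pub play_secs: f64,
    /// Seconds to hold the last frame once the footage runs out.
    pub freeze_secs: f64,
}

/// Plans a clip beat from the source length and the narration length.
pub fn clip_beat_plan(source_secs: f64, narration_secs: f64) -> ClipBeatPlan {
    let play_secs = source_secs.min(narration_secs).max(0.0);
    let freeze_secs = (narration_secs - play_secs).max(0.0);
    ClipBeatPlan { play_secs, freeze_secs }
}
```

A 10-second source under a 4-second narration plays for the full 4 seconds with no freeze; a 3-second source under a 5-second narration freezes for 2.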
Cameron Cordes f5581edf5e Reels: ease burst fade 0.08s → 0.12s
0.08s read as too abrupt; 0.12s keeps the burst clearly snappier than the
0.35s held-shot fade without jarring. Bumps RENDER_VERSION.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:07:41 -04:00
Cameron Cordes 65793a2dda Reels: mixed-media (video clip beats) + faster burst fade
Videos in a span now appear as clip beats: the first few seconds of the
video (capped at CLIP_SECONDS=5, and to the source length) filled to the
portrait canvas like photos, with its live audio ducked under the
narration (amix at 0.35). If the narration outlasts the clip, the last
frame is held (tpad); clips with no audio track just play under narration.

Selection splits the beat budget between photo beats and clip beats —
clips get up to half (≥1 when present), photos the rest — then merges
both back into chronological order. SegmentMedia gains a Clip variant;
beats carry `media` (photos or one clip), and the cache key tags each path P
or C so the same path used as a still and as a clip yields different keys.

Also drops the burst fade from 0.15s to 0.08s so a quick burst reads
clearly differently from a held shot. Bumps RENDER_VERSION.

The clip filtergraph (fill + duck-mix + last-frame hold) is unit-tested
but, like the rest of the ffmpeg path, wants a real render check on the
GPU host.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:02:51 -04:00
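
The budget split described in 65793a2dda (clips get up to half, at least one when present, photos take the rest) might be computed as in this sketch. `split_beat_budget` is a hypothetical name, and unused photo budget is not handed back to clips here, which may differ from the real selector.

```rust
/// Splits a beat budget into (photo_beats, clip_beats).
pub fn split_beat_budget(budget: usize, photos: usize, clips: usize) -> (usize, usize) {
    let clip_beats = if clips == 0 || budget == 0 {
        0
    } else {
        // Up to half the budget, but at least one clip beat when clips exist.
        clips.min((budget / 2).max(1))
    };
    // clip_beats never exceeds budget, so this subtraction cannot underflow.
    let photo_beats = photos.min(budget - clip_beats);
    (photo_beats, clip_beats)
}
```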
Cameron Cordes 299e32b014 Bump version to 1.4.0
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:45:24 -04:00
Cameron Cordes 6e90f24307 Reels: burst beats + duration budget for week/month, plus step logging
Restructures a reel around beats — one narration line over one or more
photos — instead of one line per photo. A single-photo beat is a held
shot; a multi-photo beat is a quick burst that flashes through several
moments of an event while the line is read. So a week/month reel can show
everything it spans without a narrated (and timed) segment per photo.

Selection (selector.rs):
- Duration budget: cap the number of narrated beats to ~REEL_TARGET_SECONDS
  (default 90, env-tunable) so week/month reels don't run minutes long.
- Event clustering by time gap; when there are more events than the beat
  budget, adjacent events merge so the whole span stays covered. Each beat
  bursts up to MAX_BURST_PHOTOS (an even spread), so a 40-shot dinner
  contributes a handful of quick frames, not forty narrated seconds.

Render (render.rs): a beat renders its photos as a concat of per-photo
fills (blurred-bg portrait, fps-before-fade) under one muxed narration;
burst photos get a snappier fade. beat_durations splits the narration
across the photos, stretching only if a long burst would flash too fast.

Adds high-level info logs across the steps (request → script → per-beat
narrate/render → join → done with elapsed) for visibility. Bumps
RENDER_VERSION to re-render cached reels.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:43:18 -04:00
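
Two selection ideas from 6e90f24307, time-gap event clustering and an even spread of burst photos, are sketched below. Both function names and the integer index formula are assumptions; the actual selector.rs logic, its gap threshold, and the MAX_BURST_PHOTOS value are not shown in the commit message.

```rust
/// Groups ascending unix timestamps into events, starting a new event
/// wherever the gap to the previous timestamp exceeds `max_gap_secs`.
pub fn cluster_by_gap(sorted_ts: &[i64], max_gap_secs: i64) -> Vec<Vec<i64>> {
    let mut events: Vec<Vec<i64>> = Vec::new();
    for &ts in sorted_ts {
        let starts_new = match events.last().and_then(|ev| ev.last()) {
            Some(&prev) => ts - prev > max_gap_secs,
            None => true,
        };
        if starts_new {
            events.push(vec![ts]);
        } else if let Some(ev) = events.last_mut() {
            ev.push(ts);
        }
    }
    events
}

/// Picks up to `max_pick` indices spread evenly across `0..len`,
/// always including the first and last item when picking two or more.
pub fn even_spread(len: usize, max_pick: usize) -> Vec<usize> {
    if len <= max_pick {
        return (0..len).collect();
    }
    if max_pick <= 1 {
        // Nothing to pick, or just the first item.
        return (0..max_pick).collect();
    }
    (0..max_pick).map(|i| i * (len - 1) / (max_pick - 1)).collect()
}
```

For the 40-shot dinner example, `even_spread(40, 5)` selects five frames across the whole event instead of the first five.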
Cameron Cordes 740fc4d841 Reels: fix steppy fade (fps before fade) and ease the expression bump
The fade looked steppy/low-frame-rate because the filtergraph normalized
fps AFTER the fade filters: the brightness ramp was sampled at the looped
still's coarse input cadence, then duplicated up to 30fps. Move fps ahead
of the fades, pin the still's input framerate (-framerate), and force CFR
output (-r) so the dip ramps across a full 30 frames and plays steadily.

Ease narration expressiveness from 0.7 to 0.6 (still tunable via
REEL_TTS_EXAGGERATION). Bump RENDER_VERSION so existing reels re-render.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:20:52 -04:00
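
The filter ordering fix in 740fc4d841 can be illustrated with a small string builder that places `fps` ahead of both `fade` filters. This is a simplified, hypothetical chain: the project's real filtergraph also handles scaling and the blurred fill, which are left out here.

```rust
/// Builds a per-still filter chain with fps normalisation ahead of the fades,
/// so the brightness ramp is sampled at the output frame rate.
pub fn still_filter_chain(duration_secs: f64, fade_secs: f64, fps: u32) -> String {
    // The fade-out starts `fade_secs` before the end of the segment.
    let fade_out_start = (duration_secs - fade_secs).max(0.0);
    format!(
        "fps={fps},fade=t=in:st=0:d={fade:.2},fade=t=out:st={out:.2}:d={fade:.2}",
        fps = fps,
        fade = fade_secs,
        out = fade_out_start,
    )
}
```

The resulting string would be passed to ffmpeg via `-vf`, alongside the `-framerate` input option and `-r` output option mentioned in the commit.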
Cameron Cordes 7715a7a905 Reels: portrait canvas with blurred fill, fade transitions, warmer TTS
Fixes the "image is tiny" problem: a 1920x1080 landscape reel letterboxes
to a ~25%-height band on a portrait phone. Switch to a portrait 1080x1920
canvas and fill it per photo with a blurred, zoomed copy of the image
behind the sharp fitted photo — so the frame is always full regardless of
the photo's orientation, with no black bars and no cropping of the subject.

Add a quick 0.35s fade in/out baked into each segment so concatenated
photos dip smoothly instead of hard-cutting (fade-out lands in the
narration's silent tail, so speech isn't clipped). Drop the unused
Ken Burns branch — motion can return deliberately later.

Warm up the narration a touch: thread Chatterbox's `exaggeration` through
synthesize_serialized and default reels to 0.7 (tunable via
REEL_TTS_EXAGGERATION). Bump RENDER_VERSION so existing landscape reels
re-render.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:10:26 -04:00
Cameron Cordes 42453d5786 Fix reel concat: force -f mp4 for the .tmp output path
The concat stage wrote to <key>.mp4.tmp (for an atomic publish-rename),
but ffmpeg infers the muxer from the output extension and can't map
.tmp to a format — "Unable to choose an output format". Force the mp4
muxer explicitly so the temp extension is irrelevant. Segment render,
NVENC, TTS, and scripting were already working end-to-end; this was the
only failure, at the final join.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 22:56:48 -04:00
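
A sketch of the argument list implied by 42453d5786, with the muxer forced so the `.tmp` suffix no longer matters. The `concat_args` helper and the use of `-c copy` with the concat demuxer are assumptions; only the explicit `-f mp4` and the `.mp4.tmp` output path come from the commit.

```rust
/// Builds ffmpeg arguments that join the segments listed in `list_path`
/// and write to `<final_path>.tmp` with an explicit mp4 muxer.
pub fn concat_args(list_path: &str, final_path: &str) -> Vec<String> {
    let tmp_path = format!("{}.tmp", final_path);
    [
        "-y",
        // Input: the concat demuxer reading a list file.
        "-f", "concat", "-safe", "0", "-i", list_path,
        "-c", "copy",
        // Output: force the muxer, since ".tmp" maps to no known format.
        "-f", "mp4",
        tmp_path.as_str(),
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}
```

After ffmpeg exits successfully, a `std::fs::rename` from the temp path to the final path completes the atomic publish.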
Cameron Cordes e3f731b3b2 Add memory-reel backend: on-demand narrated photo slideshow
New POST /reels + GET /reels/{id} (+ /video) build an MP4 slideshow of a
memory span (day/week/month), narrated by the LLM in a cloned voice.

Pipeline (src/reels/): a selector resolves which photos + reel metadata,
the scripter writes one narration line per photo via a single LLM call
(reusing each photo's cached insight as context — no fresh vision calls,
so reel generation stays off the GPU's vision slot), each line is
synthesized to speech, and the renderer assembles stills + narration via
ffmpeg. Jobs run in the background (mirroring the TTS speech-job
registry) since a reel takes minutes; the finished MP4 is cached on disk
keyed by the selection so a repeat request is instant.

The segment model is media-typed (Photo today) so a video-clip segment
(phase 2) and a nightly pre-render (phase 3) slot in without reworking
the pipeline. Ken Burns motion is implemented but defaulted off pending a
visual check on the GPU box.

Supporting changes:
- memories: extract gather_memory_items() so the reel selector reuses the
  exact window/exclusion/tz/sort logic behind /memories.
- ai::tts: add synthesize_serialized() so reel narration honors the same
  single-GPU permit + write lease as user TTS requests.
- video::ffmpeg: make get_duration_seconds() pub for narration timing.
- AppState: reels_path (REELS_DIRECTORY, defaults beside preview clips).

Pure logic (cache key, script parsing, ffmpeg arg/filter construction,
even sampling, segment timing) is unit-tested (26 tests). The runtime
path (ffmpeg render, TTS, LLM) needs a real run on the GPU host to verify
end-to-end — not exercisable in CI.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 22:31:08 -04:00