ImageApi

Author	SHA1	Message	Date
Cameron Cordes	48a1b753f0	AI: add enable_thinking reasoning toggle plumbed to llama.cpp New optional SamplingOverride forwarded to llama-server as chat_template_kwargs.enable_thinking (gates Qwen3-style reasoning blocks). None leaves the template default; other backends ignore it. Wired through the agentic-insight and chat-turn request bodies/handlers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 18:14:44 -04:00
Cameron Cordes	f2ab8d3740	Unified search: use ANY-mode tag matching, not ALL. ALL-mode over-constrains NL queries: the model maps several query words to tags and few photos carry every one, zeroing the candidate set. Switch to ANY (a photo matches if it has any named tag); the semantic CLIP ranking provides precision within that pool. Exclude tags still filter out. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 02:25:24 -04:00
Cameron Cordes	6e5898e766	Unified search: rank within filtered set instead of pre-thresholding CLIP When structured filters are present they're the constraint and CLIP only ranks within the candidate set, so drop the global similarity threshold for that case. Previously the 0.2 whole-library threshold ran BEFORE intersecting with the filters, discarding filter-matching photos that scored just under it (e.g. a 2022 beach photo at 0.18) — producing after_struct_filter=0 even when matches existed. Plain semantic (no filters) keeps the user's threshold. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 02:20:06 -04:00
Cameron Cordes	6c315edacc	clip_client: log encode_text failures (URL + status/body or network error) The CLIP encode failure reason was only ever returned in the HTTP response body, never logged server-side, making 502s from /photos/search opaque. Log the underlying cause — network error to the URL, or the Apollo HTTP status + response body — so CLIP-service problems are diagnosable from the ImageApi log. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 02:02:57 -04:00
Cameron Cordes	0a40e78528	Unified search: UNIFIED_SEARCH_MODEL env override for the translation step Pin the NL->structured translation to a small, fast model that can stay co-resident with CLIP (and the chat model) so it never evicts them on a tight VRAM budget. Precedence: UNIFIED_SEARCH_MODEL env > client-selected model > configured default. Logs the effective model (backend.model()) so model A/B tests are visible. Documented in .env.example. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 01:58:48 -04:00
Cameron Cordes	e56235acc5	Unified search: stage-by-stage logging to debug empty results Log the translated query (semantic/tags/place/date/media + has_struct), the tag-filter file count, candidate-row + allowed-hash counts, and the CLIP considered/hits/after-filter counts. Pinpoints which stage drops results to zero (over-extracted filter, tag path mismatch, Any/All over-constraint, or CLIP threshold). info-level for now while debugging. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 01:29:21 -04:00
Cameron Cordes	fcbd7e2733	Unified search: accept client model override (avoid model swapping) Add an optional `model` query param to /photos/search/unified, passed into resolve_backend's overrides. The client sends the user's currently-selected local model so the translation step reuses an already-loaded model instead of forcing a llama-swap eviction + cold start. Falls back to the configured default when absent. Still local only (no hybrid). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 01:19:53 -04:00
Cameron Cordes	e4c875f473	Unified NL search Phase 2: /photos/search/unified endpoint Composes the two existing engines (Path A orchestration): - Translate NL -> StructuredQuery via local LLM, respecting LLM_BACKEND (resolve_backend(Local) -> ollama or llama-swap; no hybrid). - Forward-geocode the place name into a gps circle. - Structured filters (tags/EXIF/geo/date/media) build a candidate set of EXIF rows; CLIP ranks within it, joined by content_hash. Degenerate cases match existing behavior: semantic-only -> plain CLIP; filters-only -> date-sorted. - Echoes the interpreted query (incl. resolved place) for editable client chips. Refactor: extracted reusable cores from clip_search (score_photos, resolve_hits, parse_library_scope, score_error_response) shared by both endpoints. Removed the Phase 1 allow-until-wired attributes now that nl_query + geo are consumed. fmt + clippy clean; 23 backend tests pass (7 geo, 12 nl_query, 4 unified). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 01:03:43 -04:00
Cameron Cordes	50ed780844	Unified NL search Phase 1: NL→structured-query translator + forward geocoding Foundation for the /photos/search/unified endpoint (Phase 2). Two new, fully unit-tested pieces, not yet wired into a route (allow-until-wired, mirroring llm_client.rs): - ai/nl_query.rs: translate a free-text query into a StructuredQuery via one grounded LLM call. Two-stage — the model emits names/ISO dates, then a pure resolve step maps tag names against the real vocab and converts dates to unix seconds. Hallucinated (non-vocab) tags are surfaced in unmatched_tags rather than silently used as hard filters — the anti-noise guard. 12 tests. - geo::forward_geocode + bbox_to_circle: resolve a place name to a circle via Nominatim /search, collapsing the bounding box to centroid + circumscribing radius so "Portland" and "Italy" both map onto the existing gps circle filter with no schema change. Radius is the max centroid-to-corner distance (corners aren't equidistant on a sphere). 4 tests. fmt + clippy clean; 19 new tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 00:44:16 -04:00
Cameron Cordes	7e21213181	Reels: bound disk/ledger growth (pre-gen prune + on-demand cache sweep) Nothing reaped reels before, so the on-disk cache and ledger grew unbounded — each night's daily reel is a new ~4MB file + ledger row that's stale within ~26h. - Pre-gen self-prune: after recording a reel, prune_superseded keeps the newest PREGEN_KEEP_PER_SCOPE (2) rows per (span, library) and unlinks the superseded reels' mp4+sidecar. Caps the ledger/disk at ~spans×libraries×2. - On-disk sweeper (spawn_reel_cache_sweeper): every 24h, removes reel mp4s with no ledger row and no live job older than REEL_CACHE_MAX_AGE_DAYS (7) — bounding the on-demand cache, which has no ledger row and otherwise grows forever — plus crashed-render cruft (.mp4.tmp/.concat.txt/orphan sidecars). Runs regardless of REEL_PREGEN_ENABLED; disable with REEL_CACHE_SWEEP_ENABLED=0. - New DAO methods prune_superseded + all_cache_keys (with tests); env knobs documented in .env.example. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 23:27:32 -04:00
Cameron Cordes	664b3694f8	Reels pre-gen: always render the agentic reel, don't adopt on-demand mp4 Past the key-aware dedup, any mp4 already at the cache key was not pre-generated by us (no matching ledger row) — typically an on-demand fast-scripted reel sharing the key after the max_segments alignment. Adopting it recorded a ledger row pointing at the fast reel, silently defeating agentic pre-gen. Drop the adopt-existing-mp4 shortcut and always produce_reel (atomic overwrite). Worst case is one redundant re-render if a prior run crashed between render and ledger write. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 23:16:14 -04:00
Cameron Cordes	b52b1eb323	Reels pre-gen: make dedup cache-key-aware so key changes regenerate. exists_fresh only matched (span, library, render_version, age), so a cache-key change that doesn't bump RENDER_VERSION (e.g. the max_segments alignment, or any future selection-logic tweak) left last night's ledger row looking 'fresh', so the nightly run would skip and the orphaned reel would persist. Dedup now compares the stored cache_key to the freshly computed key (and confirms the mp4 exists), so a changed key forces a regen within the freshness window. exists_fresh stays as the HTTP endpoint's fast gate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 23:14:39 -04:00
Cameron Cordes	19fc1bbdf8	Reels pre-gen: use DEFAULT_MAX_SEGMENTS so cache keys match on-demand. pregen_one hardcoded max_segments: 24 while create_reel_handler defaults to DEFAULT_MAX_SEGMENTS (40). Since the cache key encodes the raw max_segments, the pre-generated reel's key never matched the client's on-demand request, so POST /reels cache-hit an older max=40 reel and the agentic pre-gen file was left orphaned. Align to DEFAULT_MAX_SEGMENTS (as the plan specified) so the on-demand cache-hit path serves the pre-gen reel. Content is unchanged: the actual beat count is duration-budgeted either way; only the key descriptor differed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 23:12:54 -04:00
Cameron Cordes	ca007a618d	Reels pre-gen: record true media count + real upsert for user_ai_prefs - pregen_one recorded media_count as planned.len() (beat count); record the actual media item total (media.len(), photos + clips) in both the cache-hit and freshly-rendered ledger paths. Drops the redundant photo_count binding. - Replace upsert_prefs's insert-then-catch-error-then-update dance with a single atomic INSERT ... ON CONFLICT(id) DO UPDATE. Explicit id=1 makes the conflict target deterministic; explicit column .set((...)) keeps None -> NULL overwrite semantics so the row mirrors the latest request exactly, and genuine insert errors surface instead of being swallowed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 15:19:41 -04:00
Cameron Cordes	e4d8d374fb	Reels pre-gen: fix runtime breakers from review (1-5) 1. Drop the unregistered prefs_dao/reel_dao web::Data extractors from create_reel_handler / precomputed_reel_handler and read the DAOs off AppState instead (consistent with the scheduler). Missing app_data would have 500'd every POST /reels and /reels/precomputed at runtime. 2. Restore the dropped 'return' in the cache-hit branch — without it a cache hit fell through, overwrote the Done job with Queued, and re-ran the whole TTS+render pipeline on every request. 3. Make secs_until_next_run_hour minute/second-accurate so a batch that finishes inside the run hour sleeps ~24h instead of busy-looping (wake, re-run, sleep 0) for the rest of the hour. Tests updated. 4. Prune photo/user-bound tools (get_file_tags, get_faces_in_photo, recall_facts_for_photo, recall_facts_for_entity) from the agentic reel scripter's allow-list — they no-op/error with the empty file/user context and only burn iterations. 5. Align AGENTIC_SYSTEM_PROMPT's advertised tool list with the actual (pruned) allow-list. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 15:14:36 -04:00
Cameron Cordes	5c9ee56527	Fix agentic reel audit issues: midnight bug, DAO wiring, dead code, DST timezone, validation Blocking fixes: - secs_until_next_run_hour: same-hour now returns 0 instead of 24h - capture_prefs: called at both handler return points, never fails request - capture_prefs: resolves library param, upserts to user_ai_prefs via DAO - Scheduler: uses AppState DAOs instead of separate connections - Pregen dedup: uses resolved library param instead of hardcoded 'all' - run_readonly_tool_loop: added #[allow(dead_code)] (used in main.rs only) - run_readonly_tool_loop: removed dead messages.push() call - InsightGenerator: added exif_dao() getter for scheduler reuse Medium fixes: - Input validation: run_hour clamped 0-23, week_dow clamped 0-6 - DST-sensitive timezone: fixed_tz_offset() with env var config Low fixes: - Documented REEL_PREGEN_MAX_TOOL_ITERS and REEL_PREGEN_TZ_FIXED_MINUTES - Removed dead test_app_state function and unused imports Also fix: UpsertUserAiPrefs import path, chrono::Local::with_ymd_and_hms requires TimeZone trait + .single(), unwrap_or_else closure simplification	2026-06-13 14:59:00 -04:00
Cameron Cordes	f707353807	feat: nightly agentic pre-generation of memory reels Implement end-to-end nightly pre-generation of memory reels with agentic scripting that grounds narration in calendar, location, messages, and RAG. Sections A-E from the plan: A. Extract produce_reel pipeline core from run_reel_job with ScripterMode::Fast/Agentic and progress callbacks. B. Agentic scripter: factor run_readonly_tool_loop from the insight generator, build read-only tool gate, prompt builder with GPS, and generate_script_agentic with fallback to fast path. C. Precomputed reels ledger (SQLite table + DAO), GET /reels/precomputed handler with validity gate, GET /reels/by-key/{key}/video streaming, and normalize_library_key helper. D. Nightly scheduler: spawn_pregen_scheduler with configurable hour, run_pregen_batch (day/week/month spans), pregen_one with dedup and disk-check, secs_until_next_run_hour time math. E. user_ai_prefs passive mirror table + DAO for param capture in create_reel_handler and replay in the scheduler. Also fixes resolve_library_param signature to take &[Library] and adds resolve_library_param_state wrapper for AppState callers. New files: migrations/2026-06-13-000000_add_precomputed_reels/, migrations/2026-06-13-000010_add_user_ai_prefs/, src/database/precomputed_reel_dao.rs, src/database/user_ai_prefs_dao.rs	2026-06-13 14:29:34 -04:00
Cameron Cordes	b30c8c16d0	Reels: clips play through the beat instead of freezing early A clip beat capped playback at CLIP_SECONDS and filled the rest of the narration with a tpad freeze-frame, so a clip stopped dead on its last frame for a second or two before the transition — a glitchy pause that stills don't have. Extract clip_beat_plan: the clip now plays for as much of its beat as the source footage covers, and we freeze only when the source is genuinely shorter than the narration. Bump RENDER_VERSION. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 11:00:01 -04:00
Cameron Cordes	f5581edf5e	Reels: ease burst fade 0.08s → 0.12s. 0.08s read as too abrupt; 0.12s keeps the burst clearly snappier than the 0.35s held-shot fade without jarring. Bumps RENDER_VERSION. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:07:41 -04:00
Cameron Cordes	65793a2dda	Reels: mixed-media (video clip beats) + faster burst fade Videos in a span now appear as clip beats: the first few seconds of the video (capped at CLIP_SECONDS=5, and to the source length) filled to the portrait canvas like photos, with its live audio ducked under the narration (amix at 0.35). If the narration outlasts the clip, the last frame is held (tpad); clips with no audio track just play under narration. Selection splits the beat budget between photo beats and clip beats — clips get up to half (≥1 when present), photos the rest — then merges both back into chronological order. SegmentMedia gains a Clip variant; beats carry `media` (photos or one clip) and the cache key tags P/C so a path used as a still vs a clip differ. Also drops the burst fade from 0.15s to 0.08s so a quick burst reads clearly differently from a held shot. Bumps RENDER_VERSION. The clip filtergraph (fill + duck-mix + last-frame hold) is unit-tested but, like the rest of the ffmpeg path, wants a real render check on the GPU host. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:02:51 -04:00
Cameron Cordes	6e90f24307	Reels: burst beats + duration budget for week/month, plus step logging Restructures a reel around beats — one narration line over one or more photos — instead of one line per photo. A single-photo beat is a held shot; a multi-photo beat is a quick burst that flashes through several moments of an event while the line is read. So a week/month reel can show everything it spans without a narrated (and timed) segment per photo. Selection (selector.rs): - Duration budget: cap the number of narrated beats to ~REEL_TARGET_SECONDS (default 90, env-tunable) so week/month reels don't run minutes long. - Event clustering by time gap; when there are more events than the beat budget, adjacent events merge so the whole span stays covered. Each beat bursts up to MAX_BURST_PHOTOS (an even spread), so a 40-shot dinner contributes a handful of quick frames, not forty narrated seconds. Render (render.rs): a beat renders its photos as a concat of per-photo fills (blurred-bg portrait, fps-before-fade) under one muxed narration; burst photos get a snappier fade. beat_durations splits the narration across the photos, stretching only if a long burst would flash too fast. Adds high-level info logs across the steps (request → script → per-beat narrate/render → join → done with elapsed) for visibility. Bumps RENDER_VERSION to re-render cached reels. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:43:18 -04:00
Cameron Cordes	740fc4d841	Reels: fix steppy fade (fps before fade) and ease the expression bump The fade looked steppy/low-frame-rate because the filtergraph normalized fps AFTER the fade filters: the brightness ramp was sampled at the looped still's coarse input cadence, then duplicated up to 30fps. Move fps ahead of the fades, pin the still's input framerate (-framerate), and force CFR output (-r) so the dip ramps across a full 30 frames and plays steadily. Ease narration expressiveness from 0.7 to 0.6 (still tunable via REEL_TTS_EXAGGERATION). Bump RENDER_VERSION so existing reels re-render. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:20:52 -04:00
Cameron Cordes	7715a7a905	Reels: portrait canvas with blurred fill, fade transitions, warmer TTS Fixes the "image is tiny" problem: a 1920x1080 landscape reel letterboxes to a ~25%-height band on a portrait phone. Switch to a portrait 1080x1920 canvas and fill it per photo with a blurred, zoomed copy of the image behind the sharp fitted photo — so the frame is always full regardless of the photo's orientation, with no black bars and no cropping of the subject. Add a quick 0.35s fade in/out baked into each segment so concatenated photos dip smoothly instead of hard-cutting (fade-out lands in the narration's silent tail, so speech isn't clipped). Drop the unused Ken Burns branch — motion can return deliberately later. Warm up the narration a touch: thread Chatterbox's `exaggeration` through synthesize_serialized and default reels to 0.7 (tunable via REEL_TTS_EXAGGERATION). Bump RENDER_VERSION so existing landscape reels re-render. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:10:26 -04:00
Cameron Cordes	42453d5786	Fix reel concat: force -f mp4 for the .tmp output path The concat stage wrote to <key>.mp4.tmp (for an atomic publish-rename), but ffmpeg infers the muxer from the output extension and can't map .tmp to a format — "Unable to choose an output format". Force the mp4 muxer explicitly so the temp extension is irrelevant. Segment render, NVENC, TTS, and scripting were already working end-to-end; this was the only failure, at the final join. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 22:56:48 -04:00
Cameron Cordes	e3f731b3b2	Add memory-reel backend: on-demand narrated photo slideshow New POST /reels + GET /reels/{id} (+ /video) build an MP4 slideshow of a memory span (day/week/month), narrated by the LLM in a cloned voice. Pipeline (src/reels/): a selector resolves which photos + reel metadata, the scripter writes one narration line per photo via a single LLM call (reusing each photo's cached insight as context — no fresh vision calls, so reel generation stays off the GPU's vision slot), each line is synthesized to speech, and the renderer assembles stills + narration via ffmpeg. Jobs run in the background (mirroring the TTS speech-job registry) since a reel takes minutes; the finished MP4 is cached on disk keyed by the selection so a repeat request is instant. The segment model is media-typed (Photo today) so a video-clip segment (phase 2) and a nightly pre-render (phase 3) slot in without reworking the pipeline. Ken Burns motion is implemented but defaulted off pending a visual check on the GPU box. Supporting changes: - memories: extract gather_memory_items() so the reel selector reuses the exact window/exclusion/tz/sort logic behind /memories. - ai::tts: add synthesize_serialized() so reel narration honors the same single-GPU permit + write lease as user TTS requests. - video::ffmpeg: make get_duration_seconds() pub for narration timing. - AppState: reels_path (REELS_DIRECTORY, defaults beside preview clips). Pure logic (cache key, script parsing, ffmpeg arg/filter construction, even sampling, segment timing) is unit-tested (26 tests). The runtime path (ffmpeg render, TTS, LLM) needs a real run on the GPU host to verify end-to-end — not exercisable in CI. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 22:31:08 -04:00
Cameron Cordes	1017fe73af	Include start offset in voice-name window tag Clones that don't start at 0:00 are tagged with where the reference window begins (grandma-at1m32s-30s), so voices cloned from different sections of the same source are distinguishable in the voice list. Zero-start names keep the existing -30s form. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 16:21:41 -04:00
Cameron Cordes	1dec34540d	Add start/duration window selection for voice-clone reference clips Both voice creation endpoints (upload + from-library) now accept optional start_seconds/duration_seconds, threaded to ffmpeg as -ss/-t, so the reference window can target clean speech anywhere in a long recording instead of always the first N seconds. Duration is clamped to the LLAMA_SWAP_TTS_REF_SECONDS cap and the voice-name tag reflects the actual window length. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 16:09:03 -04:00
Cameron Cordes	2e0f78aa1b	Add user-configurable TTS pronunciation overrides A JSON map (TTS_PRONUNCIATIONS_PATH, default tts_pronunciations.json) rewrites mispronounced words — place names, initialisms, dotted abbreviations — to phonetic spellings before synthesis, applied after markdown cleanup in both /tts/speech paths. Whole-word smartcase matching (lowercase keys match any casing, uppercase keys exact), longest key wins, hot-reloaded on mtime change with last-good fallback on parse errors. See tts_pronunciations.example.json. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 23:06:18 -04:00
Cameron Cordes	3fa4fa8501	Strip markdown decoration from model-emitted insight titles Models wrap the title line despite the prompt — "Title: A Day in the Woods", "## Title: ...", bold around just the label — which made parse_title_body's bare "Title:" prefix match fall through to the fallbacks and leak asterisks into the stored title. strip_title_markdown trims bold/italic markers, heading hashes, backticks, and quotes from both ends; applied to the label line, the extracted title, both fallback paths, and generate_photo_title (which previously stripped only quotes). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 22:18:43 -04:00
Cameron Cordes	efd05db523	Make the embedding model swappable via env for A/B testing Trialing Qwen3-Embedding-0.6B (1024-dim, instruct-prefixed queries) against nomic required code changes at every hardcoded seam; now it's a config flip plus a reembed_embeddings run. - EMBEDDING_DIM env (default 768) replaces every hardcoded dim check: daily summary / calendar / search / location DAOs, Ollama batch validation, reembed_embeddings - entities gains the dim guard it never had — a wrong-dim vector silently kills dedup/recall (cosine over mismatched lengths is 0), so store None and warn instead - embed_query / embed_document split with EMBED_QUERY_PREFIX / EMBED_DOCUMENT_PREFIX (literal \n expanded): retrieval models treat the two sides differently — nomic wants search_query:/search_document:, Qwen3 wants Instruct:...\nQuery: on queries only. All query-side call sites and all corpus writers now declare their side. - document the contract in CLAUDE.md: change the model or any of these vars → re-run reembed_embeddings or search is garbage Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 21:40:40 -04:00
Cameron Cordes	b1493f5aca	Wait out TTS GPU hold before the insight job timeout starts The GPU lease keeps per-request reqwest budgets from burning behind a cross-model swap, but the job-level INSIGHT_GENERATION_TIMEOUT_SECS wall-clock started at spawn — an insight queued behind a running TTS synthesis parked its first chat call on the lease and timed out ("timeout after 180s") before chatterbox even finished loading. Acquire-and-drop an LLM read lease before starting the job clock in both insight handlers: the wait for the GPU happens before the timeout begins, mirroring the per-request lease semantics. Dropped immediately — holding it across the generation would deadlock the chat calls' own lease acquisitions. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 19:15:38 -04:00
Cameron Cordes	a022a3d15d	Fix RAG vector-space mismatch and search_rag retrieval quality Queries embedded via llama-swap were searching corpora embedded via Ollama (measured: spaces diverged). Introduce LocalLlm — the local Ollama + llama-swap pair with LLM_BACKEND dispatch baked in — and route all embedding writers through it; anything embedding via a concrete client reintroduces the bug. - search_rag: embed the model's query verbatim (no metadata boilerplate), make date optional — no time-decay when omitted, so "when did X happen?" queries rank purely by similarity across all time - reembed_embeddings bin: re-embed summaries / calendar / search / knowledge entities via the active backend, with old-new cosine report per table and truncate-and-retry for inputs over the embed server's physical batch size - import_calendar, import_search_history: embed through LocalLlm - search_messages / get_sms_messages: render sender → recipient so sent messages are attributable to a conversation - insight job failures: store the one-line anyhow context chain ({:#}) instead of the Debug dump the client was shown verbatim - serialize env_dispatch tests behind a lock (parallel-runner flake) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 19:06:52 -04:00
Cameron Cordes	0accc4ef2f	Add GPU lease coordinating LLM and TTS requests through llama-swap llama-swap runs chat/vision/Chatterbox as a mutually-exclusive set on one GPU and HOLDS a request for a non-resident model until the resident model drains, then swaps. That hold burned the holder's reqwest timeout (measured: a queued TTS lost 77s behind one LLM turn; an LLM request behind a synthesis waited the entire remaining synth), so concurrent insight + read-aloud timed out instead of queueing. ai::gpu adds a fair RwLock lease acquired before each request is sent, so cross-model waits happen before the HTTP timeout starts: chat/vision share the read lease, TTS synthesis and voice-library ops (which spin Chatterbox up) take the write lease, and embeddings take none (the embed slot is in llama-swap's always-resident group). Speech jobs now flip queued->running only after acquiring the GPU, letting the client anchor its poll deadline to that transition. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 18:20:06 -04:00
Cameron Cordes	03699f7413	Add TTS voice deletion, async speech jobs, voice-list cache, ref-seconds name tags - DELETE /tts/voices/{name}: remove a cloned voice via the llama-swap passthrough (upstream chatterbox-tts-api exposes DELETE /voices/{name}). - POST/GET/DELETE /tts/speech/jobs: durable job flow for long syntheses — dispatch returns 202 + job id, the synth queues on the GPU permit instead of fast-failing 429, and clients poll for the result (kept ~10 min). - GET /tts/voices now serves an in-memory cache so listing voices doesn't make llama-swap spin up the TTS model (evicting the resident LLM); invalidated on create/delete, ?refresh=1 forces an upstream re-query. - Created voice names are tagged with LLAMA_SWAP_TTS_REF_SECONDS (e.g. grandma-30s) so the library shows which ref length produced each clone. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 17:36:15 -04:00
Cameron Cordes	31904fef80	Raise chat truncation default num_ctx to 32k, env-overridable The history-truncation budget assumed an 8192-token context whenever a chat request omitted num_ctx, while the llama-swap chat slots serve 20k-131k. Replayed transcripts past ~6k tokens were silently gutted every turn — losing conversation history and destroying llama.cpp KV-cache prefix reuse (full SWA re-prefill per turn). Default is now 32768 (real conversations top out around 16k), with AGENTIC_CHAT_DEFAULT_NUM_CTX to override per deploy, floored at headroom + 1024. Explicit per-request num_ctx still wins. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 19:14:02 -04:00
Cameron Cordes	13f3635db2	Fix clippy lints in backfill and libraries tests Keep `cargo clippy --tests` clean alongside the agentic-loop changes: alias backfill's five-element setup() tuple as SetupFixture (type_complexity) and build the single-library health map via std::slice::from_ref instead of cloning (unnecessary clone-to-slice). No behavior change. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 18:29:44 -04:00
Cameron Cordes	b711252c23	Resolve persona prompts server-side; drop synthetic prompt in chat_turn A request carrying persona_id but no system_prompt used to fall back to the neutral default voice. Both agentic generation (generate_agentic_insight_handler) and chat bootstrap now resolve the persona's stored prompt from the persona store, with precedence: explicit non-blank client system_prompt > persona store lookup > existing default ("default" persona id behaves the same — used if the store has a row, neutral default otherwise). Resolution happens at the handler / bootstrap entry where the DAO is reachable; internals are unchanged. resolve_bootstrap_system_prompt takes the resolved persona prompt as a second argument, with precedence tests. Also in insight_chat: - Sync chat_turn no longer persists the synthetic "Please write your final answer now without calling any more tools." user message pushed on iteration exhaustion — extracted both streaming variants' synthetic_idx pattern into push/remove_synthetic_final_prompt (the remove is a defensive no-op on index drift) and applied it to all three loops; round-trip test included. - Strip leaked <think> blocks from the final content persisted as the reply in chat_turn and both streaming AgenticLoopOutcomes (mid-stream TextDeltas are untouched; the raw transcript keeps the block). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 18:29:35 -04:00
Cameron Cordes	091982bdfc	Add recall_facts_for_entity tool; fix generation gates and tool output Agentic-loop fixes in the generator: - New recall_facts_for_entity tool (always-on, like recall_entities): fetches facts for one entity by id so the model can follow up on entities surfaced by recall_entities that aren't photo-linked (recall_facts_for_photo only covers linked entities). Mirrors that tool's persona scoping (PersonaFilter::Single) and the persona's reviewed_only_facts filter exactly, and renders in the same "Entity: ... / - predicate object" style. Wired through execute_tool and the trajectory summarizer. - Generation now resolves gates persona-aware: current_gate_opts_for_persona(images_inline, Some((user_id, persona_id))) instead of the None-defaulting wrapper, so a persona's allow_agent_corrections opens propose_correction during generation the same way chat turns already did. The now-unused current_gate_opts wrapper is removed. - Strip leaked <think> blocks from the final assistant content before parse_title_body / store_insight (raw training transcript keeps them). - Honest truncation labels: get_sms_messages and get_location_history said "Found N ..." while listing only the first K; found_header now emits "Found N ... (showing first K):" when truncated, and the summarizer still parses the count. - Clamp days_radius in get_calendar_events and get_location_history to 1..=30, matching get_sms_messages. - persona_system_prompt helper (persona store lookup, blank-prompt -> None) for server-side persona resolution; callers land in the next commit. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 18:29:20 -04:00
Cameron Cordes	592dfcb42c	Accumulate streamed tool calls across chunks in Ollama streaming Ollama >=0.8 can stream tool_calls incrementally across NDJSON chunks; chat_with_tools_stream did `tool_calls = Some(tcs)` per chunk, so only the last chunk's calls survived assembly and earlier calls were silently dropped. Append into the accumulator instead. - ollama: append_streamed_tool_calls helper + tests covering two calls arriving in separate chunks and the single-chunk batch case. - llamacpp: the SSE delta assembly was already correct (per-index BTreeMap, same-index argument fragments concatenate, distinct indexes accumulate); extracted it into apply_tool_call_deltas / finalize_tool_calls and added tests pinning that behavior. - llm_client: new shared strip_think_blocks (moved from ollama's private extract_final_answer, which now delegates) so the tool-calling final content paths can reuse it; unit tests for tagged/plain/unclosed/empty cases. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 18:29:06 -04:00
Cameron Cordes	8e4f91561b	Add per-file insight history endpoint and rate-by-id Expose GET /insights/history?path=... returning every generated version of a photo's insight (current plus superseded), newest-first, backing the mobile per-file insight history view. - New get_insight_history_handler; reuses the existing get_insight_history DAO method (removed its dead_code allow). - impl From<PhotoInsight> for PhotoInsightResponse, collapsing the mapping that was duplicated across the single-get and all-insights handlers. - rate_insight_by_id DAO method + optional insight_id on RateInsightRequest so previously generated versions can be approved/rejected (the path-based rate only touches the current row). - DAO tests for history ordering/scoping and id-targeted rating. - cargo fmt normalized a multi-line assert in insight_chat.rs tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 18:28:22 -04:00
Cameron Cordes	412da2ce8e	Collapse blank lines to a single break in TTS text cleaning Chatterbox inserts a long pause — sometimes ~20s of silence — for each blank line it sees, and insight text is markdown full of paragraph breaks. clean_for_tts previously preserved paragraph structure (\n{3,} -> \n\n), so every paragraph boundary still reached the model as a double newline. Now any run of 2+ newlines, including whitespace-only blank lines, collapses to a single newline so the worst pause a break can cause is a normal line-break pause. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 09:12:43 -04:00
Cameron Cordes	cab867da60	Serialize /tts/speech with a single permit; 429 when busy The Chatterbox wrapper has no internal lock or cancellation, so concurrent synth requests contend on the single GPU and abandoned (timed-out) jobs cascade into stacked slowness. Gate synthesis behind a one-permit semaphore and fast-fail concurrent requests with 429 instead of queueing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 14:02:56 -04:00
Cameron Cordes	d8dd260c6b	Give TTS synthesis its own (longer) request timeout Long insights are chunked + synthesized server-side and can run past the shared 180s chat/embedding client timeout, causing spurious timeouts. /tts/speech now uses a per-request timeout from LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS (default 600), overriding the client default without affecting chat/embeddings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 10:25:06 -04:00
Cameron Cordes	ccacfe1113	Instrument TTS handlers with OTel spans (codebase standard) Each /tts handler now opens an http.tts.* span via extract_context_from_request + global_tracer().start_with_context, sets Status::Ok / Status::error on every outcome, and records useful attributes (model, format, voice_name, byte counts) — matching the insight handlers. Prometheus request metrics were already covered by the app-wide actix-web-prom middleware. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 23:10:43 -04:00
Cameron Cordes	62d517dcda	Normalize voice-clone reference audio to WAV via ffmpeg Chatterbox validates the reference clip by file extension and rejects formats like .aac/.opus. Always transcode the reference (upload bytes and library files alike) to mono 24 kHz WAV with ffmpeg before forwarding, so any source format is accepted and the from-library audio/video paths are unified. The reference length cap is now configurable via LLAMA_SWAP_TTS_REF_SECONDS (default 30) — Chatterbox is zero-shot, so a clean ~10-20s clip is the sweet spot. Drops the now-unused mime guesser. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 22:50:08 -04:00
Cameron Cordes	51be5df214	Clean insight text for TTS and pass through Chatterbox tuning knobs /tts/speech now normalizes input before synthesis: unwraps markdown links/images to visible text, drops heading/list/blockquote/emphasis markers and URLs, strips emoji (which non-turbo Chatterbox mispronounces or skips), and collapses whitespace. Centralized in clean_for_tts so the app, WebUI, and curl all get clean audio. Bracketed tags are deliberately preserved for a future Turbo (paralinguistic) switch. Adds optional exaggeration / cfg_weight / temperature to the request, clamped to Chatterbox's documented ranges and forwarded on the speech body. Unit tests cover markdown/emoji/URL stripping and tag preservation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 22:15:05 -04:00
Cameron Cordes	69268d03fe	Add TTS endpoints backed by Chatterbox via llama-swap LlamaCppClient gains text_to_speech (OpenAI /audio/speech), list_voices and create_voice (voice library at the swap-root /upstream/<model>/voices passthrough), plus a tts_model slot configured via LLAMA_SWAP_TTS_MODEL (default "chatterbox"). New Claims-gated routes: - POST /tts/speech -> { audio_base64, format } for data: URI playback - GET /tts/voices -> voice library passthrough - POST /tts/voices/upload -> clone a voice from an uploaded clip (multipart) - POST /tts/voices/from-library -> clone from a library file (ffmpeg-extracts audio from video; audio forwarded as-is) Security: voice_name sanitized to [A-Za-z0-9_-] (it becomes an upstream filename), 25 MB upload cap, library refs restricted to real audio/video, path confined via is_valid_full_path. Adds is_audio_file + unit tests for the sanitizer, mime guesser, and swap-root derivation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 22:04:42 -04:00
Cameron Cordes	b9b6e51af1	Stop ffprobe walking every frame in video stream probe probe_video_stream_meta requested a bare `side_data_list` section in -show_entries. On modern ffprobe that's the frame side-data section, so ffprobe enumerated every frame to collect it — reading the entire mdat. For non-faststart phone clips on the SMB mount this turned a metadata probe into a full-file read: /video/generate took 10-32s per open (0% CPU, time proportional to file size). Switch to `stream_side_data_list`, which reads the Display Matrix rotation from the stream header (moov) without touching frames. Codec, frame rate, and rotation are unchanged; the existing rotation parser already reads streams[0].side_data_list[].rotation. Fixes both the open-path probe and the transcode actor's probe. Cold opens now return near-instantly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 13:19:47 -04:00
Cameron Cordes	16ae82ba70	Normalize video rel_path lookup to forward slashes on Windows generate_video built the rel_path for its image_exif lookup by stripping the library root from the absolute path, leaving backslashes on Windows (Melissa\clip.mp4). file_scan stores rel_paths forward-slash and get_exif_batch matches exactly with no normalization, so the lookup missed and the handler re-hashed the entire video file on every request. Extract rel_path_for_lookup and normalize separators with replace('\\', '/'). Adds tests for Windows/Unix separators, file-at-root, leading separator stripping, and the no-match fallback. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 12:51:44 -04:00
Cameron Cordes	a542ea411b	Exclude inlined image bytes from chat context budget The truncation budget estimated message size by serializing the full ChatMessage array, including the base64 image persisted in the first user message. A 1024px JPEG is hundreds of KB of base64 characters — 8-19x the entire ~24KB text budget at the default num_ctx — and the image lives in the protected prefix that's never dropped. The budget check was therefore essentially always over, dropping all tool history and firing the "trimmed context" banner on every turn for vision backends that inline images. estimate_bytes now strips image payloads before counting and charges a flat IMAGE_TOKENS_EACH per image instead, so the budget reflects real text token pressure. Adds a regression test covering a short conversation with one large image. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 11:51:57 -04:00
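
Illustration for commit 16ae82ba70 above (Normalize video rel_path lookup to forward slashes on Windows). The sketch below is a guess at the shape of that fix, not the code in the repository. Only the helper name rel_path_for_lookup and the replace('\\', '/') normalization come from the commit message. The signature, the string-based arguments, the boundary check, and the fallback of returning the normalized full path when the root does not match are all assumptions.

```rust
/// Hypothetical reconstruction of the helper named in commit 16ae82ba70.
/// Strips `library_root` from `abs_path` and returns a forward-slash
/// relative path suitable for an exact-match lookup.
/// ASSUMPTION: the real signature and fallback behavior may differ.
fn rel_path_for_lookup(abs_path: &str, library_root: &str) -> String {
    // Normalize both sides first so a Windows root ("C:\photos") and a
    // Windows path compare equal to their forward-slash forms.
    let abs = abs_path.replace('\\', "/");
    let root = library_root.replace('\\', "/");
    let root = root.trim_end_matches('/');

    if let Some(rest) = abs.strip_prefix(root) {
        // Only accept the match on a path-component boundary, so that
        // "/mnt/photos2/x" is not treated as living under "/mnt/photos".
        if rest.is_empty() || rest.starts_with('/') {
            // Drop the leading separator left behind by the strip.
            return rest.trim_start_matches('/').to_string();
        }
    }

    // ASSUMPTION: on no match, fall back to the normalized full path.
    abs
}
```

Under these assumptions, rel_path_for_lookup("C:\\photos\\Melissa\\clip.mp4", "C:\\photos") yields Melissa/clip.mp4, which is the forward-slash form the commit says file_scan stores and get_exif_batch matches exactly.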

505 Commits