ImageApi

Author	SHA1	Message	Date
Cameron Cordes	1017fe73af	Include start offset in voice-name window tag Clones that don't start at 0:00 are tagged with where the reference window begins (grandma-at1m32s-30s), so voices cloned from different sections of the same source are distinguishable in the voice list. Zero-start names keep the existing -30s form. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 16:21:41 -04:00
Cameron Cordes	1dec34540d	Add start/duration window selection for voice-clone reference clips Both voice creation endpoints (upload + from-library) now accept optional start_seconds/duration_seconds, threaded to ffmpeg as -ss/-t, so the reference window can target clean speech anywhere in a long recording instead of always the first N seconds. Duration is clamped to the LLAMA_SWAP_TTS_REF_SECONDS cap and the voice-name tag reflects the actual window length. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 16:09:03 -04:00
Cameron Cordes	2e0f78aa1b	Add user-configurable TTS pronunciation overrides A JSON map (TTS_PRONUNCIATIONS_PATH, default tts_pronunciations.json) rewrites mispronounced words — place names, initialisms, dotted abbreviations — to phonetic spellings before synthesis, applied after markdown cleanup in both /tts/speech paths. Whole-word smartcase matching (lowercase keys match any casing, uppercase keys exact), longest key wins, hot-reloaded on mtime change with last-good fallback on parse errors. See tts_pronunciations.example.json. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 23:06:18 -04:00
Cameron Cordes	3fa4fa8501	Strip markdown decoration from model-emitted insight titles Models wrap the title line despite the prompt — "Title: A Day in the Woods", "## Title: ...", bold around just the label — which made parse_title_body's bare "Title:" prefix match fall through to the fallbacks and leak asterisks into the stored title. strip_title_markdown trims bold/italic markers, heading hashes, backticks, and quotes from both ends; applied to the label line, the extracted title, both fallback paths, and generate_photo_title (which previously stripped only quotes). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 22:18:43 -04:00
Cameron Cordes	efd05db523	Make the embedding model swappable via env for A/B testing Trialing Qwen3-Embedding-0.6B (1024-dim, instruct-prefixed queries) against nomic required code changes at every hardcoded seam; now it's a config flip plus a reembed_embeddings run. - EMBEDDING_DIM env (default 768) replaces every hardcoded dim check: daily summary / calendar / search / location DAOs, Ollama batch validation, reembed_embeddings - entities gains the dim guard it never had — a wrong-dim vector silently kills dedup/recall (cosine over mismatched lengths is 0), so store None and warn instead - embed_query / embed_document split with EMBED_QUERY_PREFIX / EMBED_DOCUMENT_PREFIX (literal \n expanded): retrieval models treat the two sides differently — nomic wants search_query:/search_document:, Qwen3 wants Instruct:...\nQuery: on queries only. All query-side call sites and all corpus writers now declare their side. - document the contract in CLAUDE.md: change the model or any of these vars → re-run reembed_embeddings or search is garbage Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 21:40:40 -04:00
Cameron Cordes	b1493f5aca	Wait out TTS GPU hold before the insight job timeout starts The GPU lease keeps per-request reqwest budgets from burning behind a cross-model swap, but the job-level INSIGHT_GENERATION_TIMEOUT_SECS wall-clock started at spawn — an insight queued behind a running TTS synthesis parked its first chat call on the lease and timed out ("timeout after 180s") before chatterbox even finished loading. Acquire-and-drop an LLM read lease before starting the job clock in both insight handlers: the wait for the GPU happens before the timeout begins, mirroring the per-request lease semantics. Dropped immediately — holding it across the generation would deadlock the chat calls' own lease acquisitions. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 19:15:38 -04:00
Cameron Cordes	a022a3d15d	Fix RAG vector-space mismatch and search_rag retrieval quality Queries embedded via llama-swap were searching corpora embedded via Ollama (measured: spaces diverged). Introduce LocalLlm — the local Ollama + llama-swap pair with LLM_BACKEND dispatch baked in — and route all embedding writers through it; anything embedding via a concrete client reintroduces the bug. - search_rag: embed the model's query verbatim (no metadata boilerplate), make date optional — no time-decay when omitted, so "when did X happen?" queries rank purely by similarity across all time - reembed_embeddings bin: re-embed summaries / calendar / search / knowledge entities via the active backend, with old-new cosine report per table and truncate-and-retry for inputs over the embed server's physical batch size - import_calendar, import_search_history: embed through LocalLlm - search_messages / get_sms_messages: render sender → recipient so sent messages are attributable to a conversation - insight job failures: store the one-line anyhow context chain ({:#}) instead of the Debug dump the client was shown verbatim - serialize env_dispatch tests behind a lock (parallel-runner flake) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 19:06:52 -04:00
Cameron Cordes	0accc4ef2f	Add GPU lease coordinating LLM and TTS requests through llama-swap llama-swap runs chat/vision/Chatterbox as a mutually-exclusive set on one GPU and HOLDS a request for a non-resident model until the resident model drains, then swaps. That hold burned the holder's reqwest timeout (measured: a queued TTS lost 77s behind one LLM turn; an LLM request behind a synthesis waited the entire remaining synth), so concurrent insight + read-aloud timed out instead of queueing. ai::gpu adds a fair RwLock lease acquired before each request is sent, so cross-model waits happen before the HTTP timeout starts: chat/vision share the read lease, TTS synthesis and voice-library ops (which spin Chatterbox up) take the write lease, and embeddings take none (the embed slot is in llama-swap's always-resident group). Speech jobs now flip queued->running only after acquiring the GPU, letting the client anchor its poll deadline to that transition. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 18:20:06 -04:00
Cameron Cordes	03699f7413	Add TTS voice deletion, async speech jobs, voice-list cache, ref-seconds name tags - DELETE /tts/voices/{name}: remove a cloned voice via the llama-swap passthrough (upstream chatterbox-tts-api exposes DELETE /voices/{name}). - POST/GET/DELETE /tts/speech/jobs: durable job flow for long syntheses — dispatch returns 202 + job id, the synth queues on the GPU permit instead of fast-failing 429, and clients poll for the result (kept ~10 min). - GET /tts/voices now serves an in-memory cache so listing voices doesn't make llama-swap spin up the TTS model (evicting the resident LLM); invalidated on create/delete, ?refresh=1 forces an upstream re-query. - Created voice names are tagged with LLAMA_SWAP_TTS_REF_SECONDS (e.g. grandma-30s) so the library shows which ref length produced each clone. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 17:36:15 -04:00
Cameron Cordes	31904fef80	Raise chat truncation default num_ctx to 32k, env-overridable The history-truncation budget assumed an 8192-token context whenever a chat request omitted num_ctx, while the llama-swap chat slots serve 20k-131k. Replayed transcripts past ~6k tokens were silently gutted every turn — losing conversation history and destroying llama.cpp KV-cache prefix reuse (full SWA re-prefill per turn). Default is now 32768 (real conversations top out around 16k), with AGENTIC_CHAT_DEFAULT_NUM_CTX to override per deploy, floored at headroom + 1024. Explicit per-request num_ctx still wins. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 19:14:02 -04:00
Cameron Cordes	13f3635db2	Fix clippy lints in backfill and libraries tests Keep `cargo clippy --tests` clean alongside the agentic-loop changes: alias backfill's five-element setup() tuple as SetupFixture (type_complexity) and build the single-library health map via std::slice::from_ref instead of cloning (unnecessary clone-to-slice). No behavior change. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 18:29:44 -04:00
Cameron Cordes	b711252c23	Resolve persona prompts server-side; drop synthetic prompt in chat_turn A request carrying persona_id but no system_prompt used to fall back to the neutral default voice. Both agentic generation (generate_agentic_insight_handler) and chat bootstrap now resolve the persona's stored prompt from the persona store, with precedence: explicit non-blank client system_prompt > persona store lookup > existing default ("default" persona id behaves the same — used if the store has a row, neutral default otherwise). Resolution happens at the handler / bootstrap entry where the DAO is reachable; internals are unchanged. resolve_bootstrap_system_prompt takes the resolved persona prompt as a second argument, with precedence tests. Also in insight_chat: - Sync chat_turn no longer persists the synthetic "Please write your final answer now without calling any more tools." user message pushed on iteration exhaustion — extracted both streaming variants' synthetic_idx pattern into push/remove_synthetic_final_prompt (the remove is a defensive no-op on index drift) and applied it to all three loops; round-trip test included. - Strip leaked <think> blocks from the final content persisted as the reply in chat_turn and both streaming AgenticLoopOutcomes (mid-stream TextDeltas are untouched; the raw transcript keeps the block). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 18:29:35 -04:00
Cameron Cordes	091982bdfc	Add recall_facts_for_entity tool; fix generation gates and tool output Agentic-loop fixes in the generator: - New recall_facts_for_entity tool (always-on, like recall_entities): fetches facts for one entity by id so the model can follow up on entities surfaced by recall_entities that aren't photo-linked (recall_facts_for_photo only covers linked entities). Mirrors that tool's persona scoping (PersonaFilter::Single) and the persona's reviewed_only_facts filter exactly, and renders in the same "Entity: ... / - predicate object" style. Wired through execute_tool and the trajectory summarizer. - Generation now resolves gates persona-aware: current_gate_opts_for_persona(images_inline, Some((user_id, persona_id))) instead of the None-defaulting wrapper, so a persona's allow_agent_corrections opens propose_correction during generation the same way chat turns already did. The now-unused current_gate_opts wrapper is removed. - Strip leaked <think> blocks from the final assistant content before parse_title_body / store_insight (raw training transcript keeps them). - Honest truncation labels: get_sms_messages and get_location_history said "Found N ..." while listing only the first K; found_header now emits "Found N ... (showing first K):" when truncated, and the summarizer still parses the count. - Clamp days_radius in get_calendar_events and get_location_history to 1..=30, matching get_sms_messages. - persona_system_prompt helper (persona store lookup, blank-prompt -> None) for server-side persona resolution; callers land in the next commit. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 18:29:20 -04:00
Cameron Cordes	592dfcb42c	Accumulate streamed tool calls across chunks in Ollama streaming Ollama >=0.8 can stream tool_calls incrementally across NDJSON chunks; chat_with_tools_stream did `tool_calls = Some(tcs)` per chunk, so only the last chunk's calls survived assembly and earlier calls were silently dropped. Append into the accumulator instead. - ollama: append_streamed_tool_calls helper + tests covering two calls arriving in separate chunks and the single-chunk batch case. - llamacpp: the SSE delta assembly was already correct (per-index BTreeMap, same-index argument fragments concatenate, distinct indexes accumulate); extracted it into apply_tool_call_deltas / finalize_tool_calls and added tests pinning that behavior. - llm_client: new shared strip_think_blocks (moved from ollama's private extract_final_answer, which now delegates) so the tool-calling final content paths can reuse it; unit tests for tagged/plain/unclosed/empty cases. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 18:29:06 -04:00
Cameron Cordes	8e4f91561b	Add per-file insight history endpoint and rate-by-id Expose GET /insights/history?path=... returning every generated version of a photo's insight (current plus superseded), newest-first, backing the mobile per-file insight history view. - New get_insight_history_handler; reuses the existing get_insight_history DAO method (removed its dead_code allow). - impl From<PhotoInsight> for PhotoInsightResponse, collapsing the mapping that was duplicated across the single-get and all-insights handlers. - rate_insight_by_id DAO method + optional insight_id on RateInsightRequest so previously generated versions can be approved/rejected (the path-based rate only touches the current row). - DAO tests for history ordering/scoping and id-targeted rating. - cargo fmt normalized a multi-line assert in insight_chat.rs tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 18:28:22 -04:00
Cameron Cordes	412da2ce8e	Collapse blank lines to a single break in TTS text cleaning Chatterbox inserts a long pause — sometimes ~20s of silence — for each blank line it sees, and insight text is markdown full of paragraph breaks. clean_for_tts previously preserved paragraph structure (\n{3,} -> \n\n), so every paragraph boundary still reached the model as a double newline. Now any run of 2+ newlines, including whitespace-only blank lines, collapses to a single newline so the worst pause a break can cause is a normal line-break pause. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 09:12:43 -04:00
Cameron Cordes	cab867da60	Serialize /tts/speech with a single permit; 429 when busy The Chatterbox wrapper has no internal lock or cancellation, so concurrent synth requests contend on the single GPU and abandoned (timed-out) jobs cascade into stacked slowness. Gate synthesis behind a one-permit semaphore and fast-fail concurrent requests with 429 instead of queueing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 14:02:56 -04:00
Cameron Cordes	d8dd260c6b	Give TTS synthesis its own (longer) request timeout Long insights are chunked + synthesized server-side and can run past the shared 180s chat/embedding client timeout, causing spurious timeouts. /tts/speech now uses a per-request timeout from LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS (default 600), overriding the client default without affecting chat/embeddings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 10:25:06 -04:00
Cameron Cordes	ccacfe1113	Instrument TTS handlers with OTel spans (codebase standard) Each /tts handler now opens an http.tts.* span via extract_context_from_request + global_tracer().start_with_context, sets Status::Ok / Status::error on every outcome, and records useful attributes (model, format, voice_name, byte counts) — matching the insight handlers. Prometheus request metrics were already covered by the app-wide actix-web-prom middleware. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 23:10:43 -04:00
Cameron Cordes	62d517dcda	Normalize voice-clone reference audio to WAV via ffmpeg Chatterbox validates the reference clip by file extension and rejects formats like .aac/.opus. Always transcode the reference (upload bytes and library files alike) to mono 24 kHz WAV with ffmpeg before forwarding, so any source format is accepted and the from-library audio/video paths are unified. The reference length cap is now configurable via LLAMA_SWAP_TTS_REF_SECONDS (default 30) — Chatterbox is zero-shot, so a clean ~10-20s clip is the sweet spot. Drops the now-unused mime guesser. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 22:50:08 -04:00
Cameron Cordes	51be5df214	Clean insight text for TTS and pass through Chatterbox tuning knobs /tts/speech now normalizes input before synthesis: unwraps markdown links/images to visible text, drops heading/list/blockquote/emphasis markers and URLs, strips emoji (which non-turbo Chatterbox mispronounces or skips), and collapses whitespace. Centralized in clean_for_tts so the app, WebUI, and curl all get clean audio. Bracketed tags are deliberately preserved for a future Turbo (paralinguistic) switch. Adds optional exaggeration / cfg_weight / temperature to the request, clamped to Chatterbox's documented ranges and forwarded on the speech body. Unit tests cover markdown/emoji/URL stripping and tag preservation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 22:15:05 -04:00
Cameron Cordes	69268d03fe	Add TTS endpoints backed by Chatterbox via llama-swap LlamaCppClient gains text_to_speech (OpenAI /audio/speech), list_voices and create_voice (voice library at the swap-root /upstream/<model>/voices passthrough), plus a tts_model slot configured via LLAMA_SWAP_TTS_MODEL (default "chatterbox"). New Claims-gated routes: - POST /tts/speech -> { audio_base64, format } for data: URI playback - GET /tts/voices -> voice library passthrough - POST /tts/voices/upload -> clone a voice from an uploaded clip (multipart) - POST /tts/voices/from-library -> clone from a library file (ffmpeg-extracts audio from video; audio forwarded as-is) Security: voice_name sanitized to [A-Za-z0-9_-] (it becomes an upstream filename), 25 MB upload cap, library refs restricted to real audio/video, path confined via is_valid_full_path. Adds is_audio_file + unit tests for the sanitizer, mime guesser, and swap-root derivation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 22:04:42 -04:00
Cameron Cordes	b9b6e51af1	Stop ffprobe walking every frame in video stream probe probe_video_stream_meta requested a bare `side_data_list` section in -show_entries. On modern ffprobe that's the frame side-data section, so ffprobe enumerated every frame to collect it — reading the entire mdat. For non-faststart phone clips on the SMB mount this turned a metadata probe into a full-file read: /video/generate took 10-32s per open (0% CPU, time proportional to file size). Switch to `stream_side_data_list`, which reads the Display Matrix rotation from the stream header (moov) without touching frames. Codec, frame rate, and rotation are unchanged; the existing rotation parser already reads streams[0].side_data_list[].rotation. Fixes both the open-path probe and the transcode actor's probe. Cold opens now return near-instantly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 13:19:47 -04:00
Cameron Cordes	16ae82ba70	Normalize video rel_path lookup to forward slashes on Windows generate_video built the rel_path for its image_exif lookup by stripping the library root from the absolute path, leaving backslashes on Windows (Melissa\clip.mp4). file_scan stores rel_paths forward-slash and get_exif_batch matches exactly with no normalization, so the lookup missed and the handler re-hashed the entire video file on every request. Extract rel_path_for_lookup and normalize separators with replace('\\', '/'). Adds tests for Windows/Unix separators, file-at-root, leading separator stripping, and the no-match fallback. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 12:51:44 -04:00
Cameron Cordes	a542ea411b	Exclude inlined image bytes from chat context budget The truncation budget estimated message size by serializing the full ChatMessage array, including the base64 image persisted in the first user message. A 1024px JPEG is hundreds of KB of base64 characters — 8-19x the entire ~24KB text budget at the default num_ctx — and the image lives in the protected prefix that's never dropped. The budget check was therefore essentially always over, dropping all tool history and firing the "trimmed context" banner on every turn for vision backends that inline images. estimate_bytes now strips image payloads before counting and charges a flat IMAGE_TOKENS_EACH per image instead, so the budget reflects real text token pressure. Adds a regression test covering a short conversation with one large image. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 11:51:57 -04:00
Cameron Cordes	962f7bf05c	Add reconnectable async chat-turn flow with in-memory TurnRegistry Replace the one-shot SSE chat stream with an async dispatch + reconnectable replay flow so the mobile client survives backgrounding, network blips, and OS-killed sockets without losing an in-flight agentic turn. - TurnRegistry/TurnEntry: in-memory per-turn event buffer (cap 500, front eviction) shared by the agentic loop (writer) and SSE replay readers. ReplayOutcome + replay_from/next_batch distinguish Events/CaughtUp/Gone; next_batch registers the Notify before reading state (no lost wakeup) and drains every buffered event before signaling terminal, so the final Done/Error is never dropped and the stream closes cleanly. - Endpoints: POST /insights/chat/turn (202 + turn_id), GET /insights/chat/turn/{id} (SSE replay, ?skip_before= resume, per-event seq, 410 on eviction), DELETE /insights/chat/turn/{id} (real task abort + cooperative is_running() check at each loop boundary). - Cancellation actually stops the task (AbortHandle stored on the entry) and emits a Done{cancelled:true}; callers skip persistence on cancel. - Background sweeper drops stale turns; interval clamped to <=300s. - OpenTelemetry spans: ai.chat.turn.execute/replay/cancel. - Legacy POST /insights/chat/stream path preserved unchanged. Tests: registry coverage for terminal delivery (race guard), waiting, Gone, abort, eviction; handler integration tests for 404/410, skip_before, seq stamping, completed replay, and cancel. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 19:50:25 -04:00
Cameron Cordes	cdd981fe64	fix: inline DB error source into DbError struct The previous fix logged the underlying error in a separate log line, but the error that propagated up still showed just "DbError { kind: InsertError }" at the call site. Now the source message is captured on the struct itself, so Debug/Display output at any call site shows the actual Diesel error inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 22:30:19 -04:00
Cameron Cordes	dad0220587	fix: stop swallowing DB errors across the entire DAO layer Every map_err(|_| DbError::new(...)) and map_err(|_| anyhow!("...")) in the database layer was discarding the actual Diesel/SQLite error, making failures impossible to diagnose from logs. - Add DbError::log() that logs the source error before converting - Replace all ~130 swallowed outer map_err closures with DbError::log - Replace all ~47 swallowed inner anyhow closures to include the source error in the message Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:56:48 -04:00
Cameron Cordes	39ad83f55b	fix: surface actual Diesel error in store_insight instead of generic InsertError The previous map_err closures discarded the Diesel error, making failures like missing columns impossible to diagnose from logs. Now the underlying error is logged before converting to DbError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:53:54 -04:00
Cameron Cordes	9654d256f4	fix: persist token counts and fix agentic insight_id mapping - Add prompt_eval_count and eval_count columns to photo_insights so token usage from llama-swap/Ollama is stored and returned by the API - Fix agentic generator return: was (prompt_eval_count, eval_count), handler destructured first element as insight_id — now returns (insight_id, prompt_eval_count, eval_count) - Wire prompt_eval_count/eval_count from DB into PhotoInsightResponse instead of hardcoded None Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:47:57 -04:00
Cameron Cordes	449ce1fda1	chore: resolve all clippy warnings and formatting - Replace impl ToString with impl Display for InsightJobStatus and InsightGenerationType - Rename from_str → parse to avoid confusion with std::str::FromStr - Collapse nested if statements (handlers, insight_chat, insight_generator, image handlers) - Use is_multiple_of() instead of manual modulo checks - Suppress deprecated diesel::dsl::count_distinct (no drop-in replacement available in current Diesel version) - Scope MutexGuard in synthesize_merge to drop before await - Allow dead_code on generate_no_think, enumerate_indexable_files, total_deleted (intended for future use) - Allow type_complexity on Diesel query result tuples Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:13:48 -04:00
Cameron Cordes	a410683edf	fix: fail fast when LLM_BACKEND=llamacpp but LlamaCppClient is unconfigured Previously embed_one() silently fell back to Ollama embeddings, which would load nomic-embed-text into VRAM alongside llama-swap — wasting memory on an unintended model. Now returns an error with an actionable message instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:02:42 -04:00
Cameron Cordes	2818936739	fix: audit fixes for async insight jobs + persist generation params - Fix query param mismatch: rename GenerationStatusQuery.file_path to path so the client's app-resume buildQuery({ path: ... }) resolves correctly instead of always getting 400 - Remove dead _lib_id bindings from both generate handlers - Return 202 Accepted instead of 200 from generate endpoints - Restore OpenTelemetry span instrumentation on generate handlers - Remove stale UNIQUE constraint from initial migration (incompatible with plain-INSERT DAO) - Add tests for status guard: complete_job/fail_job are no-ops when job is already cancelled, and cancel_job by id - Persist generation params (num_ctx, temperature, top_p, top_k, min_p, system_prompt, persona_id) on the photo_insights table for auditing Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:02:15 -04:00
Cameron Cordes	b87eb4e690	feat: async insight generation with SQLite job tracking - Add insight_generation_jobs table migration and DAO - Implement job lifecycle: create_or_get_active, complete, fail, cancel - Refactor POST /insights/generate and /agentic to async spawn with timeout - Add GET /insights/generation/status endpoint with job_id and file_path lookup - Use String for enum fields in Diesel models to avoid private Bound type - Add from_str() helpers on InsightJobStatus and InsightGenerationType - Fix update_training_messages to return Result<usize, DbError> - 7/7 DAO unit tests passing	2026-05-27 10:02:18 -04:00
Cameron Cordes	b03ee60342	fix: prevent hybrid mode from leaking OpenRouter model to local llamacpp client When backend=hybrid with LLM_BACKEND=llamacpp, the user-selected model (an OpenRouter id like "google/gemini-3-flash-preview") was being applied to the local LlamaCppClient's primary_model and vision_model. This caused describe_image to send the OpenRouter model name to llama-swap, which returned 400 because it has no such slot. Guard the local-client model override with !is_hybrid so it only applies in local-only mode (where the user is selecting a different local model). Bump to v1.2.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 09:55:16 -04:00
Cameron Cordes	0a627f4880	Add contact name filter to SMS search tool + misc improvements - sms search tool: accept contact name, trim/validate, skip when contact_id is set, pass to API client - sms_client: new contact field in SmsSearchParams, URL-encode on wire - Tool description clarifies contact_id takes precedence when both given - Add parse_title_body helper for LLM response parsing - llamacpp backend improvements	2026-05-25 21:46:18 -04:00
cameron	b9175e2718	image: add xlarge (4096px) on-demand preview tier New `PhotoSize::XLarge` variant sits between `Large` (2048px) and `Full` (original). On-demand generated and disk-cached at `_xlarge/<hash>.jpg`, same waterfall as `Large` (embedded RAW preview → ffmpeg → image crate). Sources below 4096px serve at native size. Reduces decoded bitmap memory from ~192MB (48MP full) to ~64MB for the mobile viewer's zoom tier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 15:33:03 -04:00
Cameron Cordes	9dba659d1e	test: add llamacpp model-slot consistency and content-null tests Cover the properties that prevent mid-turn model swaps in llama-swap exclusive mode: vision_model defaults to primary, cloned local client mirrors the user-selected model, embeddings stay on their own slot. Also test the content:null serialization for tool-calling messages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 19:29:51 -04:00
Cameron Cordes	208344ad98	ai: mirror chat model on local client to prevent mid-turn model swap When the user selects a model from the picker, the local client's primary_model and vision_model now match the chat model. Prevents llama-swap exclusive mode from swapping models when describe_photo or rerank fires during an agentic turn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 19:27:29 -04:00
Cameron Cordes	a8a661f70a	ai: extract ResolvedBackend, remove ~480 lines of duplicated dispatch Replace 5 copies of the ~80-line backend resolution pattern with a single InsightGenerator::resolve_backend() builder that returns a ResolvedBackend (chat + local clients, BackendKind enum, images_inline flag). Tool dispatch now takes &ResolvedBackend instead of &OllamaClient + model + backend strings. Remove duplicated ollama/openrouter/llamacpp fields from InsightChatService — InsightGenerator owns them and resolve_backend uses them. Delete build_chat_clients (replaced by resolve_backend). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 15:00:50 -04:00
Cameron Cordes	0631820fbf	ai: send images directly to llamacpp chat models + add ResolvedBackend llamacpp models now receive images via OpenAI content-parts instead of the describe-then-inline strategy (hybrid mode unchanged). Fixes assistant messages with tool_calls emitting content: null instead of "" to satisfy strict Jinja template role-alternation checks. Adds debug logging of message role sequences on llamacpp requests. Introduces BackendKind enum, SamplingOverrides, and ResolvedBackend in a new backend.rs module. InsightGenerator::resolve_backend centralises client construction + vision capability detection — next step wires the existing inline dispatch through it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 14:00:37 -04:00
Cameron Cordes	be51421b38	ai: collapse llamacpp into LLM_BACKEND env switch Reverts the per-request backend="llamacpp" value. Chat/vision/embedding backend is now a deploy-time decision (LLM_BACKEND=ollama|llamacpp), applied globally across chat, vision describe, and embeddings, so embedding vectors stay in one space across the index. - Per-request backend whitelist back to "local"|"hybrid". A request arriving with backend="llamacpp" is rejected. - LLM_BACKEND=llamacpp swaps the entire local stack to llama-swap: chat hits the chat slot, describe hits the vision slot, embeddings hit the embed slot. Hybrid mode still routes chat to OpenRouter but uses LLM_BACKEND for the describe pass. - Drops env vars HYBRID_VISION_BACKEND, LLAMA_SWAP_VISION_MODELS, EMBEDDING_BACKEND (the last never shipped). Drops the LlamaCppClient.vision_models allowlist; capability inference now reports has_vision only for the configured vision_model slot. - Drops the /insights/llamacpp/models handler. /insights/models is the single endpoint; returns Ollama servers under LLM_BACKEND=ollama and llama-swap slots (from LLAMA_SWAP_ALLOWED_MODELS) under LLM_BACKEND=llamacpp. Same envelope shape either way. - New ai::embed_one helper routes embeddings through llama-swap when LLM_BACKEND=llamacpp (else Ollama). Wires it into the four insight_generator embedding sites. - Cross-replay matrix simplifies to pre-llamacpp shape (local↔local, hybrid↔hybrid, hybrid→local allowed; local→hybrid rejected).	2026-05-21 11:36:58 -04:00
Cameron Cordes	f0927f5355	ai: add llamacpp backend (llama-swap) as third LLM client Wires a new LlamaCppClient (OpenAI-compatible /v1 wire format) alongside OllamaClient and OpenRouterClient. Per-slot routing for chat/vision/embed via env (LLAMA_SWAP_URL + *_MODEL vars); capability inference uses an env allowlist since /v1/models doesn't report modality. InsightGenerator + InsightChatService gain three-way dispatch on chat_backend = "local" | "hybrid" | "llamacpp". Hybrid and llamacpp share the describe-then-inline path (text-only chat after a separate vision describe). HYBRID_VISION_BACKEND=llamacpp lets hybrid route its describe pass through llama-swap's vision slot while chat still goes to OpenRouter. Cross-replay matrix added (validate_cross_replay): local<->llamacpp and hybrid<->llamacpp allowed; local->hybrid and llamacpp->hybrid rejected. New /insights/llamacpp/models handler mirrors the OpenRouter shape.	2026-05-20 17:52:33 -04:00
Cameron Cordes	19798184f0	image: add on-demand size=large preview tier (~2048px JPEG q85) Adds a third PhotoSize between Thumb (200px) and Full (original). The viewer placeholder and map callout previously upscaled a 200px thumb into a full-screen / full-width view, which looked visibly blocky on 3× devices. The new tier is generated on-demand, disk-cached, and served via the existing /image endpoint. Storage layout mirrors the Thumb branch's lookup chain: 1. hash-keyed: <thumbs>/_large/<hash[..2]>/<hash>.jpg (shared across libraries when content_hash is known) 2. library-scoped legacy: <thumbs>/_large/<lib_id>/<rel_path> Generation pipeline mirrors generate_image_thumbnail: - RAW: decode the embedded JPEG preview, apply EXIF orientation, resize to 2048-long-edge, encode JPEG q85 - HEIC/HEIF: ffmpeg with scale + q:v 5 (≈ q85) - everything else: image crate decode + thumbnail() + JpegEncoder Never upscales — sources below the 2048 cap re-encode at native size. Handler offloads decode/resize to web::block to keep the actix worker free (a 24MP source takes 100–500ms). Writes via tempfile+rename so concurrent readers can't observe a half-written JPEG. On any generation failure, falls through to the Full branch (which itself serves the RAW embedded preview for unrenderable RAW containers). Video requests for size=large fall back to the existing thumb pipeline since there's no useful 2048px video tier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 13:14:49 -04:00
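Two details of the size=large tier above lend themselves to a short sketch: the hash-keyed cache path and the "never upscale" rule. The function names, the constant, and the rounding behaviour below are assumptions made for illustration. Only the `<thumbs>/_large/<hash[..2]>/<hash>.jpg` layout and the 2048px long-edge cap come from the commit message.

```rust
use std::path::{Path, PathBuf};

/// Long-edge cap for the large preview tier (2048px per the commit message).
const LARGE_LONG_EDGE: u32 = 2048;

/// Builds the hash-keyed cache path: <thumbs>/_large/<hash[..2]>/<hash>.jpg.
/// Returns None when the hash is too short to shard (hypothetical handling).
fn large_cache_path(thumbs: &Path, content_hash: &str) -> Option<PathBuf> {
    let shard = content_hash.get(..2)?;
    Some(
        thumbs
            .join("_large")
            .join(shard)
            .join(format!("{content_hash}.jpg")),
    )
}

/// Computes output dimensions for the large tier without ever upscaling:
/// sources at or below the cap keep their native size.
fn large_dimensions(width: u32, height: u32) -> (u32, u32) {
    let long_edge = width.max(height);
    if long_edge <= LARGE_LONG_EDGE {
        return (width, height);
    }
    let scale = LARGE_LONG_EDGE as f64 / long_edge as f64;
    let w = ((width as f64 * scale).round() as u32).max(1);
    let h = ((height as f64 * scale).round() as u32).max(1);
    (w, h)
}
```

A 6000x4000 source would come out at 2048x1365 under this sketch, while a 1000x800 source would be re-encoded at its native size.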
Cameron Cordes	b843a4a366	file_types: filter macOS AppleDouble + .DS_Store from media predicates Symptom: Apollo's logs showed bursts of 422 decode_failed from ImageApi's CLIP backfill — e.g. `._DSC_2182-S.jpg`. macOS writes `._<name>` AppleDouble sidecars when copying to non-HFS volumes (SMB, FAT, exFAT), and they carry the original file's extension even though their bytes are extended-attribute metadata, not the image. ImageApi's walker matched them via the extension predicate, sent them through the ingest pipeline, and accumulated failed rows in face_detections + clip_embedding while pinning Apollo's eviction timer with the 422 burst. Fix: predicate-level guard in is_image_file / is_video_file (and by inheritance is_media_file). Every walker that already gates on these (face_watch, backfill, clip_watch, watcher, files, probe_clip_search) inherits the skip without per-callsite edits. Narrow scope on purpose — `._*` prefix + the exact `.DS_Store` basename — rather than blanket dotfile filtering, because a user could plausibly name a cover image `.cover.jpg`. Existing rows are not cleaned by this change. To purge what already accumulated (one-shot, run from your DB shell after deploying): DELETE FROM image_exif WHERE file_path LIKE '%/._%' OR file_path LIKE '%/.DS_Store'; DELETE FROM face_detections WHERE rel_path LIKE '%/._%' OR rel_path LIKE '%/.DS_Store'; DELETE FROM tagged_photo WHERE file_path LIKE '%/._%' OR file_path LIKE '%/.DS_Store'; DELETE FROM favorites WHERE path LIKE '%/._%' OR path LIKE '%/.DS_Store'; The maintenance pipeline's missing-file scan would NOT catch these on its own — the files exist on disk (they're real macOS metadata, just not images), so stat() returns Ok and the row sticks.	2026-05-17 20:10:16 -04:00
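The predicate-level guard described above might look roughly like the following. This is a minimal sketch, not the repository's code: `is_macos_metadata` is a hypothetical helper name and the extension list is a placeholder. The narrow scope (`._*` prefix plus the exact `.DS_Store` basename, with other dotfiles still allowed) follows the commit message.

```rust
use std::path::Path;

/// Returns true for macOS metadata files that must never be treated as media:
/// AppleDouble sidecars (`._<name>`) and the exact `.DS_Store` basename.
/// Hypothetical helper name; other dotfiles such as `.cover.jpg` pass through.
fn is_macos_metadata(path: &Path) -> bool {
    match path.file_name().and_then(|name| name.to_str()) {
        Some(name) => name.starts_with("._") || name == ".DS_Store",
        None => false,
    }
}

/// Sketch of an image predicate gated by the metadata guard.
/// The extension list here is a placeholder, not the project's real list.
fn is_image_file(path: &Path) -> bool {
    if is_macos_metadata(path) {
        return false;
    }
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    matches!(ext.as_deref(), Some("jpg" | "jpeg" | "png" | "heic"))
}
```

Putting the check inside the predicate means every caller that already filters with it picks up the skip without per-callsite changes.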
Cameron	acdffc1558	cargo fmt: drop trailing blank line in actors.rs Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:14:30 -04:00
Cameron	bd61e10158	chore: add .gitattributes + unit tests for ffprobe rational parser LF normalization across OSes; *.sql pinned to LF for stable diffs. Tests cover the rational frame-rate parser (NTSC 29.97, integer fps, slow-mo 240, ffprobe's 0/0 unknown sentinel, malformed and out-of-range inputs). Extracted the closure into a free fn for the test seam. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:13:06 -04:00
Cameron	1b70a6f0b4	video: probe frame rate via ffprobe and return on /video/generate Adds frame_rate to GenerateVideoResponse so the mobile scrubber can step at the source's real fps instead of a hardcoded 30. probe_video_stream_meta gains a frame_rate field (avg_frame_rate preferred, r_frame_rate fallback, nonsense values rejected) and is now pub so the handler can reuse it. Cost is one ffprobe per /video/generate call; degrades silently to None on probe failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:03:21 -04:00
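As an illustration of the frame-rate handling in the two commits above, here is a minimal sketch of a rational parser with the avg_frame_rate-then-r_frame_rate fallback. The function names and the 1000 fps ceiling are assumptions; the log only says that nonsense and out-of-range values are rejected and that ffprobe's 0/0 means unknown.

```rust
/// Parses an ffprobe rational such as "30000/1001" into frames per second.
/// Returns None for the 0/0 sentinel, malformed input, and values outside
/// an assumed sane range (the 1000 fps ceiling is a guess, not from the source).
fn parse_frame_rate(raw: &str) -> Option<f64> {
    let (num, den) = raw.trim().split_once('/')?;
    let num: u32 = num.trim().parse().ok()?;
    let den: u32 = den.trim().parse().ok()?;
    if den == 0 {
        return None;
    }
    let fps = f64::from(num) / f64::from(den);
    if fps > 0.0 && fps <= 1000.0 {
        Some(fps)
    } else {
        None
    }
}

/// Prefers avg_frame_rate and falls back to r_frame_rate when the first
/// value is missing or rejected. Both arguments are raw ffprobe strings.
fn pick_frame_rate(avg_frame_rate: Option<&str>, r_frame_rate: Option<&str>) -> Option<f64> {
    avg_frame_rate
        .and_then(parse_frame_rate)
        .or_else(|| r_frame_rate.and_then(parse_frame_rate))
}
```

Returning `Option<f64>` keeps the "degrade silently to None" behaviour at the call site: a failed probe and an unparseable rate look the same to the handler.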
Cameron Cordes	87093a63d7	clip-search: accept library_ids (multi-select whitelist) on /photos/search Previously the endpoint only accepted `library=<id>` (single id) — multi- select scopes had to be filtered upstream by Apollo, which kept the filter logic out of FileViewer-React's reach (it calls ImageApi directly and got no scoping for 2+ active libraries). Adds `library_ids` (comma-separated id list, e.g. `?library_ids=1,3`). Parsed inside the existing scope decision: `library_ids` wins when both are supplied; either / both empty falls back to "every enabled library" (historical default). Malformed entries return 400. Dedupes ids while preserving order so a stray `library_ids=1,1,3` doesn't double-pass to the DAO. The single-id path still works unchanged for older clients. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 09:30:46 -04:00
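The `library_ids` parsing described above (comma-separated, order-preserving dedupe, malformed entries rejected) can be sketched as follows. The function name, the `i32` id type, and the `String` error are assumptions; in the real handler the error would map to a 400 response. Skipping empty segments such as the middle of `1,,3` is also an assumption, since the commit message does not say how those are treated.

```rust
use std::collections::HashSet;

/// Parses a comma-separated id list such as "1,3" into a deduplicated Vec,
/// keeping first-seen order. An empty input yields an empty Vec, which the
/// caller can treat as "every enabled library".
fn parse_library_ids(raw: &str) -> Result<Vec<i32>, String> {
    let mut seen = HashSet::new();
    let mut ids = Vec::new();
    for part in raw.split(',') {
        let part = part.trim();
        if part.is_empty() {
            // Assumption: blank segments are ignored rather than rejected.
            continue;
        }
        let id: i32 = part
            .parse()
            .map_err(|_| format!("invalid library id: {part:?}"))?;
        // HashSet::insert returns false for repeats, so duplicates are dropped.
        if seen.insert(id) {
            ids.push(id);
        }
    }
    Ok(ids)
}
```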
Cameron Cordes	922f7df8d3	clip-search: offset-based pagination on /photos/search Adds `offset` query param (default 0) and `total_matching` + `offset` response fields. Backend already computes the full sorted list of above-threshold matches per query; pagination just slices it at [offset, offset+limit) instead of always returning the top window. Offsets past the end return an empty page cleanly so the client can stop fetching naturally. Re-scores on every page rather than caching the sorted list — at personal-library scale (~14k embeddings, 768d) the dot-product loop is sub-100ms and the lack of state means no eviction / staleness concerns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 16:56:10 -04:00
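The slicing behaviour described above, where the full sorted match list is cut at [offset, offset+limit) and offsets past the end yield an empty page, can be sketched like this. The `Page` struct and `paginate` function are hypothetical names; only the `offset` and `total_matching` fields are named in the commit message.

```rust
/// Hypothetical page envelope mirroring the response fields named in the log.
struct Page<'a, T> {
    items: &'a [T],
    total_matching: usize,
    offset: usize,
}

/// Slices an already-sorted match list at [offset, offset + limit).
/// Offsets past the end clamp to the list length, so the page is simply empty.
fn paginate<T>(sorted: &[T], offset: usize, limit: usize) -> Page<'_, T> {
    let start = offset.min(sorted.len());
    let end = start.saturating_add(limit).min(sorted.len());
    Page {
        items: &sorted[start..end],
        total_matching: sorted.len(),
        offset,
    }
}
```

Because the function holds no state between calls, each page request recomputes the sorted list and slices it again, which matches the stateless design choice described in the commit.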


480 Commits