ImageApi

Author	SHA1	Message	Date
Cameron Cordes	31904fef80	Raise chat truncation default num_ctx to 32k, env-overridable The history-truncation budget assumed an 8192-token context whenever a chat request omitted num_ctx, while the llama-swap chat slots serve 20k-131k. Replayed transcripts past ~6k tokens were silently gutted every turn — losing conversation history and destroying llama.cpp KV-cache prefix reuse (full SWA re-prefill per turn). Default is now 32768 (real conversations top out around 16k), with AGENTIC_CHAT_DEFAULT_NUM_CTX to override per deploy, floored at headroom + 1024. Explicit per-request num_ctx still wins. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 19:14:02 -04:00
Cameron Cordes	b711252c23	Resolve persona prompts server-side; drop synthetic prompt in chat_turn A request carrying persona_id but no system_prompt used to fall back to the neutral default voice. Both agentic generation (generate_agentic_insight_handler) and chat bootstrap now resolve the persona's stored prompt from the persona store, with precedence: explicit non-blank client system_prompt > persona store lookup > existing default ("default" persona id behaves the same — used if the store has a row, neutral default otherwise). Resolution happens at the handler / bootstrap entry where the DAO is reachable; internals are unchanged. resolve_bootstrap_system_prompt takes the resolved persona prompt as a second argument, with precedence tests. Also in insight_chat: - Sync chat_turn no longer persists the synthetic "Please write your final answer now without calling any more tools." user message pushed on iteration exhaustion — extracted both streaming variants' synthetic_idx pattern into push/remove_synthetic_final_prompt (the remove is a defensive no-op on index drift) and applied it to all three loops; round-trip test included. - Strip leaked <think> blocks from the final content persisted as the reply in chat_turn and both streaming AgenticLoopOutcomes (mid-stream TextDeltas are untouched; the raw transcript keeps the block). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 18:29:35 -04:00
Cameron Cordes	091982bdfc	Add recall_facts_for_entity tool; fix generation gates and tool output Agentic-loop fixes in the generator: - New recall_facts_for_entity tool (always-on, like recall_entities): fetches facts for one entity by id so the model can follow up on entities surfaced by recall_entities that aren't photo-linked (recall_facts_for_photo only covers linked entities). Mirrors that tool's persona scoping (PersonaFilter::Single) and the persona's reviewed_only_facts filter exactly, and renders in the same "Entity: ... / - predicate object" style. Wired through execute_tool and the trajectory summarizer. - Generation now resolves gates persona-aware: current_gate_opts_for_persona(images_inline, Some((user_id, persona_id))) instead of the None-defaulting wrapper, so a persona's allow_agent_corrections opens propose_correction during generation the same way chat turns already did. The now-unused current_gate_opts wrapper is removed. - Strip leaked <think> blocks from the final assistant content before parse_title_body / store_insight (raw training transcript keeps them). - Honest truncation labels: get_sms_messages and get_location_history said "Found N ..." while listing only the first K; found_header now emits "Found N ... (showing first K):" when truncated, and the summarizer still parses the count. - Clamp days_radius in get_calendar_events and get_location_history to 1..=30, matching get_sms_messages. - persona_system_prompt helper (persona store lookup, blank-prompt -> None) for server-side persona resolution; callers land in the next commit. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 18:29:20 -04:00
Cameron Cordes	592dfcb42c	Accumulate streamed tool calls across chunks in Ollama streaming Ollama >=0.8 can stream tool_calls incrementally across NDJSON chunks; chat_with_tools_stream did `tool_calls = Some(tcs)` per chunk, so only the last chunk's calls survived assembly and earlier calls were silently dropped. Append into the accumulator instead. - ollama: append_streamed_tool_calls helper + tests covering two calls arriving in separate chunks and the single-chunk batch case. - llamacpp: the SSE delta assembly was already correct (per-index BTreeMap, same-index argument fragments concatenate, distinct indexes accumulate); extracted it into apply_tool_call_deltas / finalize_tool_calls and added tests pinning that behavior. - llm_client: new shared strip_think_blocks (moved from ollama's private extract_final_answer, which now delegates) so the tool-calling final content paths can reuse it; unit tests for tagged/plain/unclosed/empty cases. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 18:29:06 -04:00
Cameron Cordes	8e4f91561b	Add per-file insight history endpoint and rate-by-id Expose GET /insights/history?path=... returning every generated version of a photo's insight (current plus superseded), newest-first, backing the mobile per-file insight history view. - New get_insight_history_handler; reuses the existing get_insight_history DAO method (removed its dead_code allow). - impl From<PhotoInsight> for PhotoInsightResponse, collapsing the mapping that was duplicated across the single-get and all-insights handlers. - rate_insight_by_id DAO method + optional insight_id on RateInsightRequest so previously generated versions can be approved/rejected (the path-based rate only touches the current row). - DAO tests for history ordering/scoping and id-targeted rating. - cargo fmt normalized a multi-line assert in insight_chat.rs tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 18:28:22 -04:00
Cameron Cordes	412da2ce8e	Collapse blank lines to a single break in TTS text cleaning Chatterbox inserts a long pause — sometimes ~20s of silence — for each blank line it sees, and insight text is markdown full of paragraph breaks. clean_for_tts previously preserved paragraph structure (\n{3,} -> \n\n), so every paragraph boundary still reached the model as a double newline. Now any run of 2+ newlines, including whitespace-only blank lines, collapses to a single newline so the worst pause a break can cause is a normal line-break pause. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 09:12:43 -04:00
Cameron Cordes	cab867da60	Serialize /tts/speech with a single permit; 429 when busy The Chatterbox wrapper has no internal lock or cancellation, so concurrent synth requests contend on the single GPU and abandoned (timed-out) jobs cascade into stacked slowness. Gate synthesis behind a one-permit semaphore and fast-fail concurrent requests with 429 instead of queueing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 14:02:56 -04:00
Cameron Cordes	d8dd260c6b	Give TTS synthesis its own (longer) request timeout Long insights are chunked + synthesized server-side and can run past the shared 180s chat/embedding client timeout, causing spurious timeouts. /tts/speech now uses a per-request timeout from LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS (default 600), overriding the client default without affecting chat/embeddings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 10:25:06 -04:00
Cameron Cordes	ccacfe1113	Instrument TTS handlers with OTel spans (codebase standard) Each /tts handler now opens an http.tts.* span via extract_context_from_request + global_tracer().start_with_context, sets Status::Ok / Status::error on every outcome, and records useful attributes (model, format, voice_name, byte counts) — matching the insight handlers. Prometheus request metrics were already covered by the app-wide actix-web-prom middleware. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 23:10:43 -04:00
Cameron Cordes	62d517dcda	Normalize voice-clone reference audio to WAV via ffmpeg Chatterbox validates the reference clip by file extension and rejects formats like .aac/.opus. Always transcode the reference (upload bytes and library files alike) to mono 24 kHz WAV with ffmpeg before forwarding, so any source format is accepted and the from-library audio/video paths are unified. The reference length cap is now configurable via LLAMA_SWAP_TTS_REF_SECONDS (default 30) — Chatterbox is zero-shot, so a clean ~10-20s clip is the sweet spot. Drops the now-unused mime guesser. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 22:50:08 -04:00
Cameron Cordes	51be5df214	Clean insight text for TTS and pass through Chatterbox tuning knobs /tts/speech now normalizes input before synthesis: unwraps markdown links/images to visible text, drops heading/list/blockquote/emphasis markers and URLs, strips emoji (which non-turbo Chatterbox mispronounces or skips), and collapses whitespace. Centralized in clean_for_tts so the app, WebUI, and curl all get clean audio. Bracketed tags are deliberately preserved for a future Turbo (paralinguistic) switch. Adds optional exaggeration / cfg_weight / temperature to the request, clamped to Chatterbox's documented ranges and forwarded on the speech body. Unit tests cover markdown/emoji/URL stripping and tag preservation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 22:15:05 -04:00
Cameron Cordes	69268d03fe	Add TTS endpoints backed by Chatterbox via llama-swap LlamaCppClient gains text_to_speech (OpenAI /audio/speech), list_voices and create_voice (voice library at the swap-root /upstream/<model>/voices passthrough), plus a tts_model slot configured via LLAMA_SWAP_TTS_MODEL (default "chatterbox"). New Claims-gated routes: - POST /tts/speech -> { audio_base64, format } for data: URI playback - GET /tts/voices -> voice library passthrough - POST /tts/voices/upload -> clone a voice from an uploaded clip (multipart) - POST /tts/voices/from-library -> clone from a library file (ffmpeg-extracts audio from video; audio forwarded as-is) Security: voice_name sanitized to [A-Za-z0-9_-] (it becomes an upstream filename), 25 MB upload cap, library refs restricted to real audio/video, path confined via is_valid_full_path. Adds is_audio_file + unit tests for the sanitizer, mime guesser, and swap-root derivation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 22:04:42 -04:00
Cameron Cordes	a542ea411b	Exclude inlined image bytes from chat context budget The truncation budget estimated message size by serializing the full ChatMessage array, including the base64 image persisted in the first user message. A 1024px JPEG is hundreds of KB of base64 characters — 8-19x the entire ~24KB text budget at the default num_ctx — and the image lives in the protected prefix that's never dropped. The budget check was therefore essentially always over, dropping all tool history and firing the "trimmed context" banner on every turn for vision backends that inline images. estimate_bytes now strips image payloads before counting and charges a flat IMAGE_TOKENS_EACH per image instead, so the budget reflects real text token pressure. Adds a regression test covering a short conversation with one large image. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 11:51:57 -04:00
Cameron Cordes	962f7bf05c	Add reconnectable async chat-turn flow with in-memory TurnRegistry Replace the one-shot SSE chat stream with an async dispatch + reconnectable replay flow so the mobile client survives backgrounding, network blips, and OS-killed sockets without losing an in-flight agentic turn. - TurnRegistry/TurnEntry: in-memory per-turn event buffer (cap 500, front eviction) shared by the agentic loop (writer) and SSE replay readers. ReplayOutcome + replay_from/next_batch distinguish Events/CaughtUp/Gone; next_batch registers the Notify before reading state (no lost wakeup) and drains every buffered event before signaling terminal, so the final Done/Error is never dropped and the stream closes cleanly. - Endpoints: POST /insights/chat/turn (202 + turn_id), GET /insights/chat/turn/{id} (SSE replay, ?skip_before= resume, per-event seq, 410 on eviction), DELETE /insights/chat/turn/{id} (real task abort + cooperative is_running() check at each loop boundary). - Cancellation actually stops the task (AbortHandle stored on the entry) and emits a Done{cancelled:true}; callers skip persistence on cancel. - Background sweeper drops stale turns; interval clamped to <=300s. - OpenTelemetry spans: ai.chat.turn.execute/replay/cancel. - Legacy POST /insights/chat/stream path preserved unchanged. Tests: registry coverage for terminal delivery (race guard), waiting, Gone, abort, eviction; handler integration tests for 404/410, skip_before, seq stamping, completed replay, and cancel. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 19:50:25 -04:00
Cameron Cordes	9654d256f4	fix: persist token counts and fix agentic insight_id mapping - Add prompt_eval_count and eval_count columns to photo_insights so token usage from llama-swap/Ollama is stored and returned by the API - Fix agentic generator return: was (prompt_eval_count, eval_count), handler destructured first element as insight_id — now returns (insight_id, prompt_eval_count, eval_count) - Wire prompt_eval_count/eval_count from DB into PhotoInsightResponse instead of hardcoded None Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:47:57 -04:00
Cameron Cordes	449ce1fda1	chore: resolve all clippy warnings and formatting - Replace impl ToString with impl Display for InsightJobStatus and InsightGenerationType - Rename from_str → parse to avoid confusion with std::str::FromStr - Collapse nested if statements (handlers, insight_chat, insight_generator, image handlers) - Use is_multiple_of() instead of manual modulo checks - Suppress deprecated diesel::dsl::count_distinct (no drop-in replacement available in current Diesel version) - Scope MutexGuard in synthesize_merge to drop before await - Allow dead_code on generate_no_think, enumerate_indexable_files, total_deleted (intended for future use) - Allow type_complexity on Diesel query result tuples Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:13:48 -04:00
Cameron Cordes	a410683edf	fix: fail fast when LLM_BACKEND=llamacpp but LlamaCppClient is unconfigured Previously embed_one() silently fell back to Ollama embeddings, which would load nomic-embed-text into VRAM alongside llama-swap — wasting memory on an unintended model. Now returns an error with an actionable message instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:02:42 -04:00
Cameron Cordes	2818936739	fix: audit fixes for async insight jobs + persist generation params - Fix query param mismatch: rename GenerationStatusQuery.file_path to path so the client's app-resume buildQuery({ path: ... }) resolves correctly instead of always getting 400 - Remove dead _lib_id bindings from both generate handlers - Return 202 Accepted instead of 200 from generate endpoints - Restore OpenTelemetry span instrumentation on generate handlers - Remove stale UNIQUE constraint from initial migration (incompatible with plain-INSERT DAO) - Add tests for status guard: complete_job/fail_job are no-ops when job is already cancelled, and cancel_job by id - Persist generation params (num_ctx, temperature, top_p, top_k, min_p, system_prompt, persona_id) on the photo_insights table for auditing Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:02:15 -04:00
Cameron Cordes	b87eb4e690	feat: async insight generation with SQLite job tracking - Add insight_generation_jobs table migration and DAO - Implement job lifecycle: create_or_get_active, complete, fail, cancel - Refactor POST /insights/generate and /agentic to async spawn with timeout - Add GET /insights/generation/status endpoint with job_id and file_path lookup - Use String for enum fields in Diesel models to avoid private Bound type - Add from_str() helpers on InsightJobStatus and InsightGenerationType - Fix update_training_messages to return Result<usize, DbError> - 7/7 DAO unit tests passing	2026-05-27 10:02:18 -04:00
Cameron Cordes	b03ee60342	fix: prevent hybrid mode from leaking OpenRouter model to local llamacpp client When backend=hybrid with LLM_BACKEND=llamacpp, the user-selected model (an OpenRouter id like "google/gemini-3-flash-preview") was being applied to the local LlamaCppClient's primary_model and vision_model. This caused describe_image to send the OpenRouter model name to llama-swap, which returned 400 because it has no such slot. Guard the local-client model override with !is_hybrid so it only applies in local-only mode (where the user is selecting a different local model). Bump to v1.2.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 09:55:16 -04:00
Cameron Cordes	0a627f4880	Add contact name filter to SMS search tool + misc improvements - sms search tool: accept contact name, trim/validate, skip when contact_id is set, pass to API client - sms_client: new contact field in SmsSearchParams, URL-encode on wire - Tool description clarifies contact_id takes precedence when both given - Add parse_title_body helper for LLM response parsing - llamacpp backend improvements	2026-05-25 21:46:18 -04:00
Cameron Cordes	9dba659d1e	test: add llamacpp model-slot consistency and content-null tests Cover the properties that prevent mid-turn model swaps in llama-swap exclusive mode: vision_model defaults to primary, cloned local client mirrors the user-selected model, embeddings stay on their own slot. Also test the content:null serialization for tool-calling messages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 19:29:51 -04:00
Cameron Cordes	208344ad98	ai: mirror chat model on local client to prevent mid-turn model swap When the user selects a model from the picker, the local client's primary_model and vision_model now match the chat model. Prevents llama-swap exclusive mode from swapping models when describe_photo or rerank fires during an agentic turn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 19:27:29 -04:00
Cameron Cordes	a8a661f70a	ai: extract ResolvedBackend, remove ~480 lines of duplicated dispatch Replace 5 copies of the ~80-line backend resolution pattern with a single InsightGenerator::resolve_backend() builder that returns a ResolvedBackend (chat + local clients, BackendKind enum, images_inline flag). Tool dispatch now takes &ResolvedBackend instead of &OllamaClient + model + backend strings. Remove duplicated ollama/openrouter/llamacpp fields from InsightChatService — InsightGenerator owns them and resolve_backend uses them. Delete build_chat_clients (replaced by resolve_backend). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 15:00:50 -04:00
Cameron Cordes	0631820fbf	ai: send images directly to llamacpp chat models + add ResolvedBackend llamacpp models now receive images via OpenAI content-parts instead of the describe-then-inline strategy (hybrid mode unchanged). Fixes assistant messages with tool_calls emitting content: null instead of "" to satisfy strict Jinja template role-alternation checks. Adds debug logging of message role sequences on llamacpp requests. Introduces BackendKind enum, SamplingOverrides, and ResolvedBackend in a new backend.rs module. InsightGenerator::resolve_backend centralises client construction + vision capability detection — next step wires the existing inline dispatch through it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 14:00:37 -04:00
Cameron Cordes	be51421b38	ai: collapse llamacpp into LLM_BACKEND env switch Reverts the per-request backend="llamacpp" value. Chat/vision/embedding backend is now a deploy-time decision (LLM_BACKEND=ollama|llamacpp), applied globally across chat, vision describe, and embeddings — so embedding vectors stay in one space across the index. - Per-request backend whitelist back to "local"|"hybrid". A request arriving with backend="llamacpp" is rejected. - LLM_BACKEND=llamacpp swaps the entire local stack to llama-swap: chat hits the chat slot, describe hits the vision slot, embeddings hit the embed slot. Hybrid mode still routes chat to OpenRouter but uses LLM_BACKEND for the describe pass. - Drops env vars HYBRID_VISION_BACKEND, LLAMA_SWAP_VISION_MODELS, EMBEDDING_BACKEND (the last never shipped). Drops the LlamaCppClient.vision_models allowlist — capability inference now reports has_vision only for the configured vision_model slot. - Drops the /insights/llamacpp/models handler. /insights/models is the single endpoint; returns Ollama servers under LLM_BACKEND=ollama and llama-swap slots (from LLAMA_SWAP_ALLOWED_MODELS) under LLM_BACKEND=llamacpp. Same envelope shape either way. - New ai::embed_one helper routes embeddings through llama-swap when LLM_BACKEND=llamacpp (else Ollama). Wires it into the four insight_generator embedding sites. - Cross-replay matrix simplifies to pre-llamacpp shape (local↔local, hybrid↔hybrid, hybrid→local allowed; local→hybrid rejected).	2026-05-21 11:36:58 -04:00
Cameron Cordes	f0927f5355	ai: add llamacpp backend (llama-swap) as third LLM client Wires a new LlamaCppClient (OpenAI-compatible /v1 wire format) alongside OllamaClient and OpenRouterClient. Per-slot routing for chat/vision/embed via env (LLAMA_SWAP_URL + *_MODEL vars); capability inference uses an env allowlist since /v1/models doesn't report modality. InsightGenerator + InsightChatService gain three-way dispatch on chat_backend = "local" | "hybrid" | "llamacpp". Hybrid and llamacpp share the describe-then-inline path (text-only chat after a separate vision describe). HYBRID_VISION_BACKEND=llamacpp lets hybrid route its describe pass through llama-swap's vision slot while chat still goes to OpenRouter. Cross-replay matrix added (validate_cross_replay): local<->llamacpp and hybrid<->llamacpp allowed; local->hybrid and llamacpp->hybrid rejected. New /insights/llamacpp/models handler mirrors the OpenRouter shape.	2026-05-20 17:52:33 -04:00
Cameron Cordes	66267cc345	clip-search: fmt + clippy clamp + test AppState arg Pulls cargo fmt + clippy pass over the new files only — pre-existing files left untouched even though fmt has drift on them. clamp(1,200) swaps a manual min/max chain that clippy flagged. test AppState constructor needed ClipClient::new(None) so the lib-test target compiles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 16:10:52 -04:00
Cameron Cordes	8d9e76cf15	clip-search: migration + client + probe binary Probe-phase scaffolding for CLIP semantic search. Adds the column that will hold per-photo embeddings, the HTTP client to Apollo's inference service, and a throwaway probe binary so we can eyeball search-result quality on the live library before building the persistence layer (backlog drain, /photos/search endpoint, UI). - migrations/2026-05-14-000000_add_clip_embedding/ — adds image_exif.clip_embedding (BLOB) and clip_model_version (TEXT), plus a partial index on (clip_embedding IS NULL AND content_hash IS NOT NULL) for the future backfill drain. - src/database/models.rs — extends ImageExif struct to match. - src/ai/clip_client.rs — encode_image / encode_text / health, same Permanent/Transient/Disabled taxonomy as face_client. - src/bin/probe_clip_search.rs — --query <q> --library N --limit M --top K. Encodes a sample and prints top-K cosine similarities. No DB writes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 16:10:52 -04:00
Cameron	c30cadde02	ai: fix UTF-8 byte-slice panics in insight_generator log/truncation paths Switch four `&s[..N]` / `&s[..s.len().min(N)]` sites to `chars().take(N).collect::<String>()` so truncation lands on character boundaries instead of mid-codepoint. The agentic summary preview log was panicking when generated content hit an em-dash at byte 200; the few-shot passage cap, brief_json_args debug formatter, and a test assertion message had the same latent bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 15:10:02 -04:00
Cameron Cordes	8503ef7884	chore: cargo fmt + clippy --fix sweep across the crate Pure mechanical cleanup of accumulated drift in files outside the HLS-content-hash branch's main change set. No behavior change. - `cargo fmt` on every previously-misformatted file (`ai/insight_generator.rs`, `database/knowledge_dao.rs`, `faces.rs`, `knowledge.rs`, `libraries.rs`). - `cargo clippy --fix`: - `needless_borrow`: `&library` → `library` in `handlers/image.rs` (two sites in the photo-listing path). - Manual clippy pass for warnings clippy emits but can't auto-apply: - `field_reassign_with_default` in `database/reconcile.rs::run` — consolidated into a struct-literal initializer. - `needless_range_loop` in `database/knowledge_dao.rs::union_perceptual_tags` — inner `for b in (a+1)..indices.len() { let ib = indices[b]; ... }` becomes `for &ib in &indices[a + 1..] { ... }`. - Doc-list indentation: continuation lines under nested bullets in `database/mod.rs::get_memories_in_window` and `database/knowledge_dao.rs::build_entity_graph` realigned to the list-item content column. Deliberately not touched (each deserves its own focused commit, with testing, rather than getting bundled into a sweep): - 4× `deprecated count_distinct` in `faces.rs` — diesel API migration to `AggregateExpressionMethods::aggregate_distinct` may shift result types; needs verification against the existing stats queries. - `await_holding_lock` in `knowledge.rs:807` — `std::sync::Mutex` held across `ollama.generate(...).await`. Genuine concurrency bug; fix requires understanding the surrounding flow before just dropping the guard. - 2× `type_complexity` in `database/mod.rs` — cosmetic, would need a `type` alias and corresponding callers updated. - Dead `total_deleted` on `library_maintenance::GcStats` and `file_scan::enumerate_indexable_files` — both are public surface retained for future use; deletion is a separate decision. All 707 tests still pass. Release build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 16:25:05 -04:00
Cameron Cordes	e67e00ef8a	knowledge: predicate-quality nudge + bulk-reject endpoint Two coupled changes to fight the speech-act-predicate problem (facts like (Cameron, expressed, "I'm tempted to...")): 1. System prompt grows an explicit predicate-quality rule. The agent is told to use relationship-shaped verbs (lives_in, works_at, attended, is_friend_of, interested_in), and is given an explicit DON'T list (expressed, said, mentioned, stated, quoted, noted, discussed, thought, wondered). Plus a concrete Bad / Good example contrasting the noise pattern with the structured paraphrase the agent should be writing. Stops the bleed for new insights. 2. Cleanup tools for the legacy noise that's already in the table: - get_predicate_stats(persona, limit) returns [(predicate, count)] sorted desc — feeds the curation UI's PREDICATES tab. - bulk_reject_facts_by_predicate(persona, predicate, audit) flips every ACTIVE fact under that predicate to 'rejected' in one transaction, stamping last_modified_* so the action is attributable + reversible per-fact through the entity detail panel. REVIEWED facts under the same predicate are left alone — the curator may have hand-approved an exception ("interested_in" might be largely noise but a reviewed entry is intentional). New HTTP endpoints: GET /knowledge/predicate-stats?limit= POST /knowledge/predicates/{predicate}/bulk-reject Persona-scoped via the existing X-Persona-Id header. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 21:50:26 -04:00
Cameron Cordes	6dca0c027d	fmt: cargo fmt sweep No logic changes - line reflow, brace placement, and method-chain splits across handlers / personas / state / faces / knowledge / insights_dao / knowledge_dao / populate_knowledge. Picked up incidentally while running fmt for the sms-search work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:21:00 -04:00
Cameron Cordes	7329cc5ce7	insights: push sms search filters server-side, render snippets, expand fts5 docs - Refactor search_messages_with_contact -> search_messages(query, &SmsSearchParams) exposing date_from / date_to / offset / is_mms / has_media; drop the over-fetch + client-side date post-filter that could silently drop in-window hits past position 100. - Surface SMS-API's <mark>-wrapped snippet for MMS messages that only matched via message_parts_fts (attachment text / filename) - pre-snippet, those rendered as a blank body preview to the LLM. - Expose is_mms / has_media on the search_messages tool schema; expand the FTS5 syntax docs with worked examples for phrase / prefix / boolean / NEAR / grouping so the model picks the right operator. - Unit tests for format_search_hits (body fallback, snippet preferred, MMS attachment-only regression, empty-snippet fallback) and strip_mark_tags. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:20:19 -04:00
Cameron Cordes	fd4dd89bbb	knowledge: agent self-correction with audit + per-persona gate + revert Bundles three coupled changes so agent-side mutations stay auditable and reversible: 1. Audit columns on entity_facts — `last_modified_by_model` / `last_modified_by_backend` / `last_modified_at`. Stamped on every mutation path (update_fact, supersede_fact, manual PATCH, manual supersede, the new revert). NULL on rows never touched since creation. Partial index on `last_modified_at WHERE NOT NULL` keeps the "show me recent edits" feed fast without bloating from legacy rows. 2. Per-persona gate `personas.allow_agent_corrections` (BOOLEAN, default 0). Defense in depth at two layers: - build_tool_definitions: when off, `update_fact` and `supersede_fact` aren't in the catalog at all, so even a hallucinated tool call by the model fails fast. - tool_update_fact / tool_supersede_fact: re-checks the persona flag at call time and returns an explicit "corrections disabled" error if it's somehow off (e.g. flag flipped mid-loop). ToolGateOpts grows the flag; current_gate_opts splits into `current_gate_opts` (no persona context, defaults closed) + `current_gate_opts_for_persona` for chat callers that have a persona id. Both call sites in insight_chat are updated. 3. Revert action — new DAO method `revert_supersession` + `POST /knowledge/facts/{id}/restore`. Flips status back to 'active', clears `superseded_by`, clears `valid_until` (we don't track whether it was hand-set vs auto-stamped, so the safe reset is to drop it — user can re-bound after). Stamps `last_modified_*` so the revert itself is attributable. Manual paths (PATCH / supersede via HTTP, plus restore) stamp the audit columns with `("manual", "manual")`. Agent paths stamp the loop-time chat model and backend (mirroring the existing created_by_* convention). FactDetail in the HTTP response now carries the audit triple alongside the existing provenance. Apollo wires the new field set in the matching commit. PersonaView / UpdatePersonaRequest grow `allowAgentCorrections`; the PersonaPatch + InsertPersona + bulk_import paths thread it. 317 lib tests pass, including unchanged update_fact / supersede DAO tests (now passing audit=None — None means "no provenance context to attribute", legacy semantics). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:56:56 -04:00
Cameron Cordes	86c331571d	knowledge: per-persona reviewed-only mode + agent reads include reviewed Two coupled changes to the agent's recall surface: 1. Default scope expanded. recall_facts_for_photo and recall_entities used to filter to status='active' only — which silently dropped 'reviewed' (human-verified) facts. Now they surface active + reviewed by default. Reviewed is strictly more trusted than active and shouldn't have been hidden. Rejected and superseded stay filtered. 2. New persona toggle `reviewed_only_facts` (BOOLEAN, default false, migration 2026-05-10-000400). When set, the agent's recall on that persona returns ONLY facts with status='reviewed' — strict mode for tasks where hallucinated agent claims are particularly costly. Wired: - schema.rs / Persona / InsertPersona / PersonaPatch grow the field. - PersonaView returns it as `reviewedOnlyFacts` (camelCase wire). - PUT /personas/{id} accepts it (mobile editor surfaces it). - InsightGenerator now carries a PersonaDao reference so recall_facts_for_photo can read the active persona's flag at start; one extra read per recall, cheap. Composes with include_all_memories: that operates on the persona scope axis (single vs hive), reviewed_only_facts on the status axis. They're orthogonal. Legacy persona rows pick up the default false on migration; no behavior change unless explicitly toggled. The 4 existing persona construction sites (one production, two tests, one InsertPersona in knowledge_dao tests) all default the field. populate_knowledge bin + state.rs constructors also wire the new persona_dao arg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:21:39 -04:00
Cameron Cordes	f53338923d	knowledge: stamp model + backend on facts for audit Adds two nullable TEXT columns to entity_facts — `created_by_model` (LLM identifier) and `created_by_backend` ("local" / "hybrid" / "manual" / NULL) — so the curator can audit which configurations produce good fact-keeping and which produce noise. photo_insights already carries model_version + backend, and entity_facts.source_insight_id links to it, but: - source_insight_id is set post-loop, so chat-continuation and regenerated-insight facts lose the link. - JOINing per read is more friction than embedding provenance on the row itself. - Manual facts (POST /knowledge/facts) have no insight at all and need their own "manual" provenance marker. Threading: execute_tool grows `model` + `backend` params, passed from the three call sites (agentic insight loop, chat single-turn, chat stream) using the loop-time `chat_backend.primary_model()` + `effective_backend` already in scope. tool_store_fact stamps the new fact accordingly; manual create_fact stamps backend="manual". Legacy rows leave both NULL — pre-tracking data can't be backfilled reliably from training_messages without burning compute. Indexes are partial (WHERE NOT NULL) so legacy rows don't bloat them, and "show me all facts from model X" stays fast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:05:14 -04:00
Cameron Cordes	85f3716379	knowledge: fact supersession + photo-date valid_from Two Phase-2 followups in one commit since they're coupled at the write path: * Agent populates valid_from from the source photo's date_taken when calling store_fact. Loose semantics — date_taken is evidence at that date, not strictly when the fact started being true — but gives the curator a calendar anchor and pairs with supersession to close intervals cleanly. valid_until stays NULL (a single photo can't tell us when something stopped). Honours the existing upsert_fact dedup (corroborated facts keep their first-recorded valid_from). * Supersession: new column entity_facts.superseded_by INTEGER (migration 2026-05-10-000200), new status value 'superseded', new DAO method supersede_fact, new HTTP endpoint POST /knowledge/facts/{id}/supersede. Marking an old fact as replaced by a new one atomically: flips status to 'superseded', sets superseded_by, and stamps valid_until from the new fact's valid_from (when not already set). delete_fact clears dangling supersession pointers in the same transaction so the column never points at a missing row — no FK because SQLite can't ALTER ADD with REFERENCES, but the DAO maintains the invariant. Pairs with conflict detection from the previous slice: once the old fact's valid_until is closed, its interval no longer overlaps the new fact's, so they stop flagging — the supersede action resolves the conflict. Two tests pin the contract: supersede stamps valid_until from new.valid_from while respecting an existing valid_until, and deleting the supersedeR clears the dangling pointer while leaving the old fact's 'superseded' status in place for history. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:47:06 -04:00
Cameron Cordes	01f5ad7527	knowledge: valid-time on facts + interval-aware conflict detection Adds bitemporal support to entity_facts. Existing `created_at` is transaction time (when we recorded the fact); the new `valid_from` / `valid_until` BIGINT columns are valid time (when the fact is/was true in the real world). NULL on either side = unbounded on that side, both NULL = "always-true / unknown" — matches the default state of every legacy row, no backfill needed. The split matters for time-bounded predicates like is_in_relationship_with / lives_in / works_at: recording the fact once doesn't mean the relationship is still ongoing. Same predicate across different windows ("lives_in NYC 2018-2020", "lives_in SF 2020-present") is no longer a conflict — the interval-aware check in get_entity only flags pairs whose windows overlap. Facts with no valid-time data still flag against everything (worst case for legacy rows — user adds dates to suppress). API surface: - POST /knowledge/facts accepts optional valid_from / valid_until. - PATCH /knowledge/facts/{id} accepts both with tri-state semantics: field omitted = leave alone, JSON null = clear to NULL, number = set. Implemented via a small serde helper around Option<Option>. - GET /knowledge/entities/{id} surfaces both fields per fact and uses them in conflict detection. Agent path (insight_generator) writes NULL/NULL for now — deriving valid_from from the source photo's date_taken is slated for a follow-up agent tool alongside Phase 2's supersession. Test pins set + clear semantics via update_fact: setting both bounds, leaving them alone on a subsequent patch, then clearing valid_until back to NULL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:25:55 -04:00
Cameron Cordes	d7aee4f228	knowledge: cosine dedup, fact create endpoint, recall nudge Phase 1 of the knowledge curation work. Three small server-side changes to support an Apollo-side curation surface and reduce the agent's near-duplicate output rate going forward: - upsert_entity grows an embedding-cosine fallback after the exact name match misses. New entities whose embedding sits above ENTITY_DEDUP_COSINE_THRESHOLD (default 0.92) against any same-type active entity collapse onto the existing row. Eliminates the Sarah / Sara / Sarah J. trio the FTS5 prefix check was missing. - POST /knowledge/facts symmetric with the existing PATCH/DELETE so the curation UI can create facts directly. Persona-scoped via X-Persona-Id; validates subject (and optional object) entity existence; reuses KnowledgeDao::upsert_fact so corroboration semantics match the agent path. - One sentence in build_system_content telling the agent to call recall_entities before store_entity when a name resembles something already known. Cheap; complements the DAO-layer guard. Includes upsert_entity_collapses_near_duplicate_by_embedding test covering both the collapse-on-near-match path and the don't-collapse-on-unrelated-embedding path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 15:16:05 -04:00
Cameron Cordes	08a5f46be1	chat: scope insight lookup by library_id to fix regen-shadow bug When a photo exists in more than one library and the user regenerates its insight from library A's chat, the regenerate streams cleanly, store_insight flips library A's old row to is_current=false, and inserts a new is_current=true row tagged (library A, rel_path). On the next history fetch the user sees their old transcript — the regenerate appears to vanish. The cause: get_insight(file_path) filters on rel_path + is_current only, so library B's untouched is_current=true row for the same rel_path satisfies the query and gets returned by SQLite's .first() ahead of A's new row. Because get_insight is also what chat_turn_stream uses to decide bootstrap vs. continuation, the next chat turn after the shadow hit also routes against the wrong insight, so update_training_messages corrupts library B's transcript with library A's chat. Fix: add get_current_insight_for_library(library_id, file_path) filtered on (library_id, rel_path, is_current=true) and route the chat surface (load_history, chat_turn{,_stream}, rewind_history) through it. load_history falls back to the cross-library get_insight when the scoped lookup misses — preserves the "scalar data merges across libraries" intent for the case where the active library has no insight but another does. The path-only get_insight stays for callers that don't have library context (populate_knowledge, the photo-grid metadata fetch). chat_history_handler stops dropping the parsed library on the floor and threads it through. Single-library deploys see no behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 14:03:41 -04:00
Cameron Cordes	b9d9ba0320	chat: route search_messages({date}) to get_sms_messages When the LLM calls search_messages with { date, limit } and no query, it's making the predictable mistake of conflating the two "messages"-shaped tools. The previous behaviour returned an error that pointed it at get_sms_messages — correct, but burning a turn on the misroute. Long photo-chat threads where the user asks "what was happening that weekend?" hit this on small models roughly half the time. Now the date-string-without-query case transparently dispatches to get_sms_messages with the same args (date / limit / days_radius / contact name all pass through unchanged) and prepends a short "(Note: routed to get_sms_messages — prefer it directly next time)" to the result. The model sees real data on its first try while still learning the right tool for next time. Cases that don't have a get_sms_messages equivalent (numeric contact_id, or start_ts / end_ts windows) keep the original error so the model knows to either supply a query or restructure its call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 13:48:13 -04:00
Cameron Cordes	fbd769e475	personas: composite FK + built-in update guard Two persona-infrastructure correctness fixes that go together because the second one (FK with CASCADE) requires the first (preventing the persona row from being mutated out from under its facts). 1. update_persona handler refuses name/systemPrompt edits to built-ins (409). includeAllMemories stays editable — that's a per-user preference, not the persona's identity. Mirrors the existing delete_persona guard. The DAO is intentionally permissive so the guard sits at the HTTP layer; persona_dao test pins that contract. 2. Migration 2026-05-10 adds user_id to entity_facts and a composite FK (user_id, persona_id) -> personas(user_id, persona_id) ON DELETE CASCADE. This closes two issues at once: - Persona orphans: deleting a custom persona used to leave its facts dangling forever, readable only via PersonaFilter::All. CASCADE now wipes them with the persona row. - Multi-user fact leakage: PersonaFilter::Single("default") used to surface every user's default-scoped facts. PersonaFilter is now { user_id, persona_id } and all read paths (get_facts_for_entity, list_facts, get_recent_activity) filter on user_id first. upsert_fact's dedup key extends to user_id so identical claims under shared persona names from different users no longer corroborate-bump each other's confidence. - user_id threads from Claims.sub.parse::<i32>().unwrap_or(1) at the chat / insight handlers through ChatTurnRequest, the streaming agentic loop, execute_tool, and into the leaf tools (tool_store_fact, tool_recall_facts_for_photo). The ".unwrap_or(1)" accommodates Apollo's service token whose sub is non-numeric on legacy mints. - Backfill picks the smallest user_id matching each legacy fact's persona_id so the FK holds for already-stored rows. Five new knowledge_dao tests with FK-on connection: persona scoping isolation, All-variant union per-user, dedup not crossing users, CASCADE delete, FK rejection of unknown personas. Plus dao_update_does_not_block_built_ins documenting where the HTTP-layer guard lives. Apollo coordinates separately — the matching changes there add the /api/personas proxy and start sending persona_id on photo-chat turns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 13:30:35 -04:00
Cameron Cordes	3e2f36a748	personas: elevate to server with per-persona fact scoping Move personas off the mobile client into ImageApi as first-class records, and scope entity_facts by persona so each one builds its own voice over a shared entity graph. The new include_all_memories flag lets a persona opt back into the full hive-mind pool for human browsing of /knowledge/*; agentic generation always stays in-voice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 17:59:20 -04:00
Cameron Cordes	3699e059a2	insight-chat: include Date taken + GPS in bootstrap photo context The bootstrap system message gave the model a file path and (in hybrid mode) a visual description, but no temporal anchor. Models defaulted to today's date when calling get_sms_messages — Nov 2014 photos were getting "2024-03-11" passed as `date`, missing every historical message and leading the model to confidently misreport context. This commit folds two more EXIF-sourced facts into the --- PHOTO CONTEXT --- block: Date taken: <YYYY-MM-DD or "unknown"> GPS: <lat, lon to 4dp> (omitted when no GPS) Resolution waterfall for date_taken matches the documented canonical date pipeline at the EXIF / filename steps, but intentionally stops short of the fs-time fallback `generate_agentic_insight_for_photo` uses — for chat we'd rather show "unknown" than mislead the model with an inode mtime. GPS is taken straight from EXIF when both lat/lon are populated; absent GPS suppresses the line entirely so the model doesn't hallucinate coordinates. InsightGenerator gains a `fetch_exif(file_path)` accessor (crate-visible) so the chat service doesn't need its own ExifDao plumbing. build_bootstrap_system_message picks up two new params (date, gps); existing tests updated and 5 new tests cover: - date present / absent / waterfall (EXIF wins, filename fallback, None when neither source has it) - GPS present / absent - ordering (path → date → visual) Total insight_chat unit tests: 33 (up from 27).	2026-05-08 11:14:39 -04:00
Cameron Cordes	a0ec1a5080	insight-chat: photo context belongs in system msg, not user turn After refresh, the rendered transcript was showing two unwanted artifacts in the initial user bubble: Photo file path: pics/DSC_5171.jpg please tell me about this photo and what was going on around it Please write your final answer now without calling any more tools. Two distinct bugs: 1. Bootstrap was prepending `Photo file path: <path>` (and, in hybrid mode, the visual description block) into the user-turn content. The model needed it to call file_path-keyed tools, but the user could see it in their own bubble on replay. 2. The no-tools fallback ("Please write your final answer now…") was a synthetic user message we never stripped from history, so it persisted into training_messages, rendered as a second user bubble, AND wiped the prior tool-call accumulator inside load_history (user-turn handler clears pending_tools), which is why the tool invocations disappeared from the assistant bubble after refresh. Fixes: - New `build_bootstrap_system_message` helper composes the persona with a `--- PHOTO CONTEXT ---` block (path + optional visual description). Lives in the system message, not the user turn. The user's bubble shows only what they typed. - Streaming agentic loop's no-tools fallback now records its insertion index and removes the synthetic user prompt from `messages` after the model responds. Final assistant content stays — it reads coherently on replay without the synthetic prompt above it. Applies to both bootstrap and continuation. 3 new tests cover the system-message composer (path-only, with visual block, persona-trim). Total insight_chat unit tests: 27.	2026-05-08 11:07:03 -04:00
Cameron Cordes	24ecf2abd4	insight-chat: prepend Photo file path: <path> to bootstrap user turn Bug: bootstrap user_content was just the user's typed message (plus the hybrid visual description). Tools that take a file_path arg — recall_facts_for_photo, get_file_tags, get_faces_in_photo — had no way to learn the canonical path. Small models would invent placeholders like "input_file_0.png" or call the tool with a name guessed from a hidden multimodal input handle, neither of which matched any real photo. Fix: prepend a single-line "Photo file path: <normalized>\n\n" block to user_content. Same shape generate_agentic_insight_for_photo already uses for non-chat callers — kept the bootstrap minimal (no date / GPS / tags pre-stuffing; the agentic loop can fetch those via tools when needed). Hybrid still injects the visual description block between the path block and the user message; local mode just gets path + user text.	2026-05-08 10:59:35 -04:00
Cameron Cordes	a29ff406a1	insight-chat: extract bootstrap resolution helpers + unit-test them resolve_bootstrap_system_prompt and resolve_bootstrap_backend run on every bootstrap turn — they pick the persisted system prompt and the chosen backend label. They were inline conditionals before; pulling them out makes the rules testable without spinning up the full streaming stack. 9 new tests cover: - system prompt fallback to BOOTSTRAP_DEFAULT_SYSTEM_PROMPT for None, empty string, whitespace-only - supplied non-empty prompts pass through verbatim, with interior newlines / spacing preserved (Apollo personas use multi-line tool listings) - backend defaults to "local" for None / empty - "local" / "hybrid" accepted case-insensitively with edge-trim - unknown labels return a descriptive error Total insight_chat tests: 24 (up from 15). No behaviour change.	2026-05-08 10:56:22 -04:00
Cameron Cordes	928efe49f9	insight-chat: bootstrap insight on first Discuss message + regenerate flag Tap-Discuss-on-no-insight previously failed silently: ImageApi's /insights/chat/stream required an existing agentic insight, errored when missing, and emitted the failure as `event: error` — which the frontend SSE consumer ignored (it listens for `error_message`). This commit closes both gaps with a server-side state machine: - /insights/chat/stream now branches on insight presence. Missing insight (or `regenerate: true` in the body) → bootstrap path: builds [System(req.system_prompt), User(req.user_message + image)], runs the agentic loop, generates a title, persists a new row via store_insight (which auto-flips priors). Existing insight → continuation path (unchanged behaviour). - New `regenerate: bool` request field forces bootstrap even when an insight exists. Takes precedence over `amend`. - `done` SSE payload field-name alignment with Apollo's frontend convention: prompt_eval_count → prompt_tokens, eval_count → eval_tokens, num_ctx echo added. - `amended_insight_id` semantics broaden — now populated whenever the turn produced a new row (bootstrap, regenerate, or amend). Existing amend clients keep working unchanged; new clients get the new row's id for free. - `event: error` → `event: error_message` so frontend errors stop silently dropping. Refactor: extracted run_streaming_agentic_loop, build_chat_clients, and generate_title as shared helpers between bootstrap and continuation. Continuation path's outer logic moves to run_continuation_streaming with no behaviour change. Mobile-ready: any client (Apollo backend, mobile, future) sends one request to /insights/chat/stream and gets the right path. Apollo's proxy stays a dumb pipe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 10:41:50 -04:00
Cameron Cordes	8bd1a85070	insight-chat: cargo fmt sweep on the get_faces_in_photo additions Single-line dao lock + reordered faces import. No logic changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 17:53:31 -04:00
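
Illustrative note on commits 01f5ad7527 and 85f3716379 above: the messages describe an interval-aware conflict check (NULL bound = unbounded on that side) and a supersede step that closes the old fact's valid_until from the new fact's valid_from. The sketch below is one way those two rules could fit together, not the repository's code. The names `ValidWindow`, `windows_overlap` and `close_superseded` are hypothetical, and the half-open window convention [valid_from, valid_until) is an assumption, chosen because it makes a superseded fact stop overlapping its replacement as the commit message says it should.

```rust
/// Valid-time window of a fact (hypothetical type, not from the repository).
/// `None` on either side means "unbounded on that side"; both `None` is the
/// legacy "always-true / unknown" state.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ValidWindow {
    valid_from: Option<i64>,
    valid_until: Option<i64>,
}

/// Returns true when the two windows share at least one instant.
/// Assumption: windows are half-open, [valid_from, valid_until).
fn windows_overlap(a: ValidWindow, b: ValidWindow) -> bool {
    // A missing bound can never rule out an overlap, so it counts as `true`.
    let a_starts_before_b_ends = match (a.valid_from, b.valid_until) {
        (Some(from), Some(until)) => from < until,
        _ => true,
    };
    let b_starts_before_a_ends = match (b.valid_from, a.valid_until) {
        (Some(from), Some(until)) => from < until,
        _ => true,
    };
    a_starts_before_b_ends && b_starts_before_a_ends
}

/// Hypothetical stand-in for the valid-time part of a supersede operation:
/// the old window is closed at the new fact's valid_from, unless the old
/// fact already carries a valid_until, which is respected as-is.
fn close_superseded(old: ValidWindow, new: ValidWindow) -> ValidWindow {
    ValidWindow {
        valid_from: old.valid_from,
        valid_until: old.valid_until.or(new.valid_from),
    }
}
```

Under these assumptions, two open-ended windows for the same predicate overlap and would flag as a conflict; after `close_superseded` the old window ends exactly where the new one begins, so `windows_overlap` returns false. A window with both bounds `None` overlaps everything, which matches the "legacy rows flag against everything" behaviour described in 01f5ad7527.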


131 Commits