# Insight Chat improvements — design

**Date:** 2026-05-07
**Branch:** `feature/insight-chat-improvements` (in both `ImageApi/` and `FileViewer-React/`)
**Scope:** ImageApi photo-anchored insight + chat surface, plus the FileViewer-React client. Apollo's free/visit chat is **not** in this cycle.

## Problem

Three concrete gaps in today's insight + chat surface:

1. **Tool drift.** ImageApi exposes 13 tools to the LLM. Some are gated on `apollo_enabled` / `has_vision`, but several optional ones (`search_rag`, `get_calendar_events`, `get_location_history`) are registered unconditionally even when their backing tables are empty. Descriptions vary in quality and a couple have outright bugs.
2. **Inconsistent / incomplete tool descriptions.** Tools like `search_messages` describe their selection rules but omit useful examples; `store_fact` doesn't show the `object_entity_id` vs `object_value` choice; `get_sms_messages` accepts a `days_radius` parameter that the backing client silently ignores. The LLM is being instructed against a slightly wrong reality.
3. **System prompt fights the persona.** Today's generation prompt prepends the user's `custom_system_prompt` and then immediately asserts `"You are a personal photo memory assistant..."`. The user message demands `"a detailed insight with a title and summary"`. Both contradict whatever voice / shape / POV the persona just established. On chat continuation the persona is baked into the stored transcript at generation time and can't be changed without regenerating.

## Goals

- Tool catalog is **representative** — every tool registered for a turn is backed by data the user actually has.
- Tool descriptions are **concise but complete**, with examples for any tool whose param choice has multiple modes or non-obvious interactions.
- Persona / system prompt is **authoritative** for voice, length, and shape — both at generation and during chat continuation.
- Per-turn system prompt overrides on chat work without surprising side-effects on the stored transcript outside `amend` mode.

## Non-goals

- Apollo backend / frontend changes. Separate cycle.
- Refactoring the `generate_photo_title` post-hoc title flow. Already takes `custom_system_prompt`.
- Tool consolidation (e.g. merging `search_messages` + `get_sms_messages`). Considered and deferred — keeps blast radius small.
- Removing knowledge-memory tools (`recall_*` / `store_*`). Audit confirmed they have a live read path via `knowledge.rs` HTTP routes.
- Persisting persona changes to the stored transcript outside `amend` mode. Deliberate — re-opens use the persona currently active in the client, not a sticky historical setting.

---

## Design

### A. System prompt — generation

Today (`insight_generator.rs:3305–3326`):

```
[custom_system_prompt if any]
+ "You are a personal photo memory assistant helping to reconstruct..."
+ {owner_id_note}
+ {fewshot_block}
+ "IMPORTANT INSTRUCTIONS:
   1. You MUST call multiple tools...
   2. When calling get_sms_messages and search_rag...
   3. Use recall_facts_for_photo...
   ...
   8. You have a hard budget of {max_iterations} iterations..."
```

The first concatenation is the bug: `custom` claims one identity, the next line asserts another.

**New structure** — two named blocks, in order:

```
[Identity / voice / format block]   ← persona-controlled (or neutral default)
[Procedural block]                  ← always identity-free
```

**Identity block:**

- When `custom_system_prompt` is supplied: use that string verbatim, no pre/append.
- When not: a neutral default that doesn't fight a future persona. Working text: `"You are reconstructing a memory from a photo. Use the gathered context to write a thoughtful summary; you decide voice, length, and shape."`

**Procedural block** — identity-free, always emitted:

```
Tool-use guidance:
- You have a budget of {max_iterations} tool-calling iterations.
- Call tools to gather context BEFORE writing your final answer; don't answer after one or two calls.
- When calling get_sms_messages or search_rag, make at least one call WITHOUT a contact filter — surrounding events matter even when a contact is known.
- Use recall_facts_for_photo + recall_entities to load any prior knowledge about subjects in the photo.
- When you identify people / places / events / things, use store_entity + store_fact to grow the persistent memory.
- A tool returning no results is informative; continue with the others.

{owner_id_note if applicable}
{fewshot_block if applicable}
```

Differences from today's "IMPORTANT INSTRUCTIONS" block: removed the "you are a personal photo memory assistant" framing and the explicit "at least 5 tool calls" floor (replaced with the softer "don't answer after one or two"). Few-shot stays — it's pattern-of-tool-use, not identity.

### B. User message — generation

Today (line 3357):

```
{visual_block}Please analyze this photo and gather any relevant context from the surrounding weeks.

Photo file path: {file_path}
Date taken: {date}
{contact_info}
{gps_info}
{tags_info}

Use the available tools to gather more context about this moment (messages, calendar events, location history, etc.), then write a detailed insight with a title and summary.
```

Problems: the trailing line bakes in output shape ("title and summary"), and the title from the resulting response is **discarded anyway** — `generate_photo_title` (line 3494) regenerates the title post-hoc from the summary. So the prompt is constraining voice for no data-model benefit.

**New payload** — context-only, no output prescription:

```
{visual_block}Photo file path: {file_path}
Date taken: {date}
{contact_info}
{gps_info}
{tags_info}

Gather context with the available tools, then respond.
```

The persona owns shape. If a user wants "title-then-paragraph" output, their persona prompt says so.

### C. System prompt — chat continuation

Add `system_prompt: Option<String>` to `ChatTurnRequest` (and to its HTTP wrapper `ChatTurnHttpRequest`). It carries through both the non-streaming `chat_turn` and the streaming `chat_turn_stream`.

**Append mode (default, `amend=false`)** — ephemeral swap-and-restore, mirroring the existing `annotate_system_with_budget` pattern:

1. Load stored transcript.
2. If `system_prompt` is `Some(s)`:
   - If the first message is a `system` role: stash the original content, replace with `s`.
   - Else: prepend a synthetic ephemeral system message with `s` (note it's synthetic so the restore step pops it rather than rewriting).
3. Run `annotate_system_with_budget` on top (the existing per-turn budget note appends to whatever's there now).
4. Run the agentic loop.
5. **Before persistence**, restore the original system content (or pop the synthetic one). Run `restore_system_content` for the budget annotation as today.
6. Save.

Result: the model sees the override; the stored transcript is unchanged outside the model's actual reply.

**Amend mode (`amend=true`)**:

- If `system_prompt` is supplied: the override stays in place during the serialization for the new insight row. The new row's `training_messages` system message is the override. `is_current=false` flips on prior rows as today.
- If not supplied: behaves as today (the stored transcript's system message carries forward unchanged).

### D. FileViewer-React — client wiring

`hooks/useInsightChat.tsx`:

- `SendTurnOptions` gains `systemPromptOverride?: string | null`.
- Inside `sendTurn`, before issuing the streaming POST:
  1. Read the active persona's `systemPrompt` from AsyncStorage (already loaded for generation flows — reuse the same accessor).
  2. If a one-shot `systemPromptOverride` is set, append it as a suffix (`${persona}\n\n${override}`) so the persona voice survives and the override tweaks the turn.
  3. Include the resulting string as `system_prompt` on the request body.
- No history-load change.
  The history endpoint still returns the stored transcript.

`components/InsightChatModal.tsx`:

- Add a small "Style note" composer affordance — a one-shot text input that, when filled, becomes the `systemPromptOverride` for the next send. Cleared after send.
- The existing persona chip continues to open `PersonaManagerModal`.

`hooks/usePersonas.tsx` and the bundled defaults:

- Built-in `assistant` and `journal` prompts get audited and rewritten to **explicitly state voice / shape / length** — since the framework no longer guarantees a default shape, the persona must.

### E. Tool catalog — gating

Widen `build_tool_definitions` from `(has_vision: bool, apollo_enabled: bool)` to a single `ToolGateOpts` struct:

```rust
pub struct ToolGateOpts {
    pub has_vision: bool,
    pub apollo_enabled: bool,
    pub daily_summaries_present: bool,
    pub calendar_present: bool,
    pub location_history_present: bool,
}
```

The chat / generation services compute the three new fields lazily per turn via `SELECT 1 FROM {table} LIMIT 1` probes (cheap; cached for the turn's duration). Lazy because operators import data after launch, and we don't want to require a restart for the LLM to discover its new capabilities.

Per-tool gating:

| Tool | Existing gate | New gate |
|---|---|---|
| `describe_photo` | `has_vision` | unchanged |
| `get_personal_place_at` | `apollo_enabled` | unchanged |
| `get_calendar_events` | none | `calendar_present` |
| `get_location_history` | none | `location_history_present` |
| `search_rag` | none | `daily_summaries_present` |

All other tools are always-on. (`get_sms_messages` and `search_messages` fail informatively if SMS-API is unreachable; not worth a startup probe since intermittent failures are the same shape.)

### F. Tool descriptions — convention

Every description follows:

1. One sentence: **what** + **when to call**.
2. Param semantics worth knowing (units, ranges, mode behavior, precedence).
3. **Example invocation** for tools with multiple modes, optional bands, or non-obvious parameter interactions.
4. Cross-references when relevant: `prefer X when both apply`.

Banned: all-caps section headers inside descriptions (`"CONTENT search"`, `"TIME-BASED fetch"`); persona-prescriptive language (`"you are a..."`); behavioral references to other tools by description rather than name.

Tools getting examples: `search_messages`, `search_rag`, `store_fact`, `get_sms_messages`. Trivial tools (`get_current_datetime`, `reverse_geocode`, `get_file_tags`) skip the example.

Sample (`search_messages`):

> Search SMS/MMS message bodies. Modes: `fts5` (keyword + phrase + prefix
> + AND/OR/NOT + NEAR proximity), `semantic` (embedding similarity,
> requires generated embeddings), `hybrid` (RRF merge, recommended;
> degrades to `fts5` when embeddings absent). Optional `start_ts` /
> `end_ts` (real-UTC unix seconds) and `contact_id` filters. For pure
> date / contact browsing without keywords, prefer `get_sms_messages`.
>
> Examples:
> - `{query: "trader joe's"}` — phrase across all time.
> - `{query: "dinner", contact_id: 42, start_ts: 1700000000, end_ts: 1700604800}` — keyword within a contact and a week.
> - `{query: "NEAR(meeting work, 5)"}` — proximity search.

### G. SMS tool fixes

#### `get_sms_messages` — honor `days_radius`

Today: `sms_client::fetch_messages_for_contact(contact, center_ts)` hardcodes `Duration::days(4)` (lines 31–37). The tool accepts `days_radius` and silently ignores it.

**Fix:** widen the signature to `fetch_messages_for_contact(contact, center_ts, days_radius)`. The tool plumbs it through. Default 4 retained for back-compat.

#### `search_messages` — add date and contact_id filters

Today: ImageApi's `search_messages` only forwards `query`, `mode`, `limit` to SMS-API.

**Fix:** add `start_ts`, `end_ts`, `contact_id` parameters.

- `contact_id` forwards directly to SMS-API (`/api/messages/search/?contact_id=`).
- `start_ts` / `end_ts` are not natively accepted by SMS-API's search endpoint. Apply a client-side post-filter on the response (Apollo's pattern: `chat_tools.py:670–680`). Bump the SMS-API `limit` to a larger fetch pool when a date filter is supplied so in-window matches aren't lost to out-of-window FTS rank.

---

## Implementation sequencing

Each step is independently mergeable.

### ImageApi PRs

1. **Split system-prompt assembly + neutralize user message.** Two named blocks; user message context-only. Default identity string added. Tests: golden snapshots of the resulting `system_content` with and without `custom_system_prompt`.
2. **`system_prompt` field on chat request + swap/restore + amend persistence.** Mirrors the `annotate_system_with_budget` pattern. Tests: round-trip system content unchanged in append mode; persisted in amend mode.
3. **`fetch_messages_for_contact` honors `days_radius`.** Tool wires the param through. Tests: window math at the client level.
4. **`ToolGateOpts` + per-tool description rewrites.** Description text changes are the bulk of the diff but no behavior change beyond gating.

### FileViewer-React PR

5. **Chat hook sends `system_prompt`; modal gets style-note input; built-in personas updated to specify shape.** The `useInsightChat.sendTurn` call site picks up the persona and includes it on every chat turn body. The style-note input is a one-shot suffix.

## Testing & verification

**Automated:**

- Unit (Rust): swap-and-restore round-trip preserves the stored transcript.
- Unit (Rust): amend mode persists the override into the new insight row.
- Unit (Rust): `fetch_messages_for_contact(days_radius=N)` produces a window of `2N` days centered on `center_ts`.
- Unit (Rust): `build_tool_definitions(opts)` excludes gated tools when the corresponding flag is false.

**Manual:**

- Run a chat turn against an existing insight without `system_prompt` → output unchanged from baseline.
- Same insight, with override → output reflects the new voice.
- Re-open chat → original baked persona still authoritative (override was ephemeral).
- Regenerate an insight with the journal persona → model's voice matches journal style; no "memory assistant" framing leaks through.
- Toggle data presence (delete a row from `calendar_events`) → tool drops from the catalog on the next turn.

## Risks

- **Default identity wording matters.** A too-neutral default ("Use the gathered context to write a summary") might produce flatter output than today's "personal photo memory assistant" framing for users who never set a persona. Mitigation: tune the default with a small set of test photos before merging.
- **Persona-suffix style notes can contradict persona voice.** A user who picks `journal` (first person, warm) and adds the style note "respond in bullet points" will get a tonal collision. Acceptable — the user expressed a per-turn intent and we honor it. Document the composition rule in the persona-manager UI.
- **Lazy data-presence probes add a per-turn `SELECT 1`.** Negligible on SQLite (sub-millisecond) but it adds up across many turns. Cache the result for the turn's duration; don't re-probe per tool.

## Open questions

None blocking. Items deferred to a possible follow-up cycle:

- Apollo parity for the same per-turn override pattern (already present; just needs RN client wiring on the photo path, which is already proxied).
- Tool consolidation (`search_messages` + `get_sms_messages` → single `search_messages` with optional date filter, Apollo-style). Considered and deferred — separate spec.
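The append-mode swap-and-restore from section C can be sketched as a pair of pure functions. This is a minimal illustration under assumed types — `Message`, its `role` / `content` fields, and the function names are hypothetical, not the real ImageApi code:

```rust
#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: String,    // "system" | "user" | "assistant" (assumed shape)
    content: String,
}

/// What `apply_override` did to the transcript, so the restore can undo it exactly.
enum OverrideState {
    /// First message was a system message; its original content is stashed here.
    Replaced(String),
    /// No system message existed; a synthetic one was prepended and must be popped.
    Synthetic,
    /// No override was requested; nothing to undo.
    Untouched,
}

/// Swap the transcript's system content for the per-turn override (append mode).
fn apply_override(messages: &mut Vec<Message>, override_prompt: Option<&str>) -> OverrideState {
    let Some(s) = override_prompt else {
        return OverrideState::Untouched;
    };
    match messages.first_mut() {
        Some(first) if first.role == "system" => {
            let original = std::mem::replace(&mut first.content, s.to_owned());
            OverrideState::Replaced(original)
        }
        _ => {
            messages.insert(0, Message { role: "system".into(), content: s.to_owned() });
            OverrideState::Synthetic
        }
    }
}

/// Undo the swap before persistence so the stored transcript is unchanged.
fn restore_override(messages: &mut Vec<Message>, state: OverrideState) {
    match state {
        OverrideState::Replaced(original) => messages[0].content = original,
        OverrideState::Synthetic => {
            messages.remove(0);
        }
        OverrideState::Untouched => {}
    }
}
```

Returning an `OverrideState` token from the apply step keeps the restore unambiguous: the branch taken during the swap, not a re-inspection of the mutated transcript, decides whether to rewrite or pop.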
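Both section G fixes reduce to small pure functions. A sketch under stated assumptions — the `Hit` struct and function names are hypothetical, and the real `sms_client` works in `chrono::Duration` rather than raw unix seconds:

```rust
#[derive(Debug, PartialEq)]
struct Hit {
    ts: i64, // real-UTC unix seconds, same convention as start_ts / end_ts
}

/// get_sms_messages fix: derive the fetch window from the caller-supplied
/// days_radius instead of the hardcoded 4-day radius. Returns (start, end)
/// unix seconds spanning 2 * days_radius days centered on center_ts.
fn fetch_window(center_ts: i64, days_radius: i64) -> (i64, i64) {
    const SECS_PER_DAY: i64 = 86_400;
    let half = days_radius * SECS_PER_DAY;
    (center_ts - half, center_ts + half)
}

/// search_messages fix: SMS-API's search endpoint has no native date params,
/// so drop out-of-window hits client-side after fetching a larger pool.
fn post_filter_by_date(hits: Vec<Hit>, start_ts: Option<i64>, end_ts: Option<i64>) -> Vec<Hit> {
    hits.into_iter()
        .filter(|h| {
            start_ts.map_or(true, |s| h.ts >= s) && end_ts.map_or(true, |e| h.ts <= e)
        })
        .collect()
}
```

The post-filter treats an absent bound as open-ended, which is why bumping the upstream `limit` matters: the pool must be large enough that in-window hits survive the FTS-ranked truncation before this filter runs.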