diff --git a/specs/002-insight-chat-improvements.md b/specs/002-insight-chat-improvements.md new file mode 100644 index 0000000..4efa9e3 --- /dev/null +++ b/specs/002-insight-chat-improvements.md @@ -0,0 +1,392 @@ +# Insight Chat improvements — design + +**Date:** 2026-05-07 +**Branch:** `feature/insight-chat-improvements` (in both `ImageApi/` and `FileViewer-React/`) +**Scope:** ImageApi photo-anchored insight + chat surface, plus the +FileViewer-React client. Apollo's free/visit chat is **not** in this cycle. + +## Problem + +Three concrete gaps in today's insight + chat surface: + +1. **Tool drift.** ImageApi exposes 13 tools to the LLM. Some are gated on + `apollo_enabled` / `has_vision`, but several optional ones + (`search_rag`, `get_calendar_events`, `get_location_history`) are + registered unconditionally even when their backing tables are empty. + Descriptions vary in quality and a couple have outright bugs. +2. **Inconsistent / incomplete tool descriptions.** Tools like + `search_messages` describe their selection rules but omit useful + examples; `store_fact` doesn't show the `object_entity_id` vs + `object_value` choice; `get_sms_messages` accepts a `days_radius` + parameter that the backing client silently ignores. The LLM is being + instructed against a slightly wrong reality. +3. **System prompt fights the persona.** Today's generation prompt + prepends the user's `custom_system_prompt` and then immediately asserts + `"You are a personal photo memory assistant..."`. The user message + demands `"a detailed insight with a title and summary"`. Both + contradict whatever voice / shape / POV the persona just established. + On chat continuation the persona is baked into the stored transcript at + generation time and can't be changed without regenerating. + +## Goals + +- Tool catalog is **representative** — every tool registered for a turn is + backed by data the user actually has. +- Tool descriptions are **concise but complete**, with examples for any + tool whose param choice has multiple modes or non-obvious interactions. +- Persona / system prompt is **authoritative** for voice, length, and + shape — both at generation and during chat continuation. +- Per-turn system prompt overrides on chat work without surprising + side-effects on the stored transcript outside `amend` mode. + +## Non-goals + +- Apollo backend / frontend changes. Separate cycle. +- Refactoring the `generate_photo_title` post-hoc title flow. Already + takes `custom_system_prompt`. +- Tool consolidation (e.g. merging `search_messages` + `get_sms_messages`). + Considered and deferred — keeps blast radius small. +- Removing knowledge-memory tools (`recall_*` / `store_*`). Audit + confirmed they have a live read path via `knowledge.rs` HTTP routes. +- Persisting persona changes to the stored transcript outside `amend` + mode. Deliberate — re-opens use the persona currently active in the + client, not a sticky historical setting. + +--- + +## Design + +### A. System prompt — generation + +Today (`insight_generator.rs:3305–3326`): + +``` +[custom_system_prompt if any] + +"You are a personal photo memory assistant helping to reconstruct..." + +{owner_id_note} + +{fewshot_block} + +"IMPORTANT INSTRUCTIONS: +1. You MUST call multiple tools... +2. When calling get_sms_messages and search_rag... +3. Use recall_facts_for_photo... +... +8. You have a hard budget of {max_iterations} iterations..." +``` + +The first concatenation is the bug: `custom` claims one identity, the +next line asserts another. + +**New structure** — two named blocks, in order: + +``` +[Identity / voice / format block] ← persona-controlled (or neutral default) +[Procedural block] ← always identity-free +``` + +**Identity block:** +- When `custom_system_prompt` is supplied: use that string verbatim, no + pre/append. +- When not: a neutral default that doesn't fight a future persona. + Working text: `"You are reconstructing a memory from a photo. Use the + gathered context to write a thoughtful summary; you decide voice, + length, and shape."` + +**Procedural block** — identity-free, always emitted: + +``` +Tool-use guidance: +- You have a budget of {max_iterations} tool-calling iterations. +- Call tools to gather context BEFORE writing your final answer; don't + answer after one or two calls. +- When calling get_sms_messages or search_rag, make at least one call + WITHOUT a contact filter — surrounding events matter even when a + contact is known. +- Use recall_facts_for_photo + recall_entities to load any prior + knowledge about subjects in the photo. +- When you identify people / places / events / things, use store_entity + + store_fact to grow the persistent memory. +- A tool returning no results is informative; continue with the others. + +{owner_id_note if applicable} +{fewshot_block if applicable} +``` + +Differences from today's "IMPORTANT INSTRUCTIONS" block: removed the +"you are a personal photo memory assistant" framing and the explicit +"at least 5 tool calls" floor (replaced with the softer "don't answer +after one or two"). Few-shot stays — it's pattern-of-tool-use, not +identity. + +### B. User message — generation + +Today (line 3357): + +``` +{visual_block}Please analyze this photo and gather any relevant context +from the surrounding weeks. + +Photo file path: {file_path} +Date taken: {date} +{contact_info} +{gps_info} +{tags_info} + +Use the available tools to gather more context about this moment +(messages, calendar events, location history, etc.), then write a +detailed insight with a title and summary. +``` + +Problems: the trailing line bakes in output shape ("title and +summary"), and the title from the resulting response is **discarded +anyway** — `generate_photo_title` (line 3494) regenerates the title +post-hoc from the summary. So the prompt is constraining voice for no +data-model benefit. + +**New payload** — context-only, no output prescription: + +``` +{visual_block}Photo file path: {file_path} +Date taken: {date} +{contact_info} +{gps_info} +{tags_info} + +Gather context with the available tools, then respond. +``` + +The persona owns shape. If a user wants "title-then-paragraph" output, +their persona prompt says so. + +### C. System prompt — chat continuation + +Add `system_prompt: Option` to `ChatTurnRequest` (and to its +HTTP wrapper `ChatTurnHttpRequest`). It carries through both the +non-streaming `chat_turn` and the streaming `chat_turn_stream`. + +**Append mode (default, `amend=false`)** — ephemeral +swap-and-restore, mirroring the existing `annotate_system_with_budget` +pattern: + +1. Load stored transcript. +2. If `system_prompt` is `Some(s)`: + - If first message is a `system` role: stash original content, + replace with `s`. + - Else: prepend a synthetic ephemeral system message with `s` (note + it's synthetic so the restore step pops it rather than rewriting). +3. Run `annotate_system_with_budget` on top (existing per-turn budget + note appends to whatever's there now). +4. Run the agentic loop. +5. **Before persistence**, restore the original system content (or pop + the synthetic one). Run `restore_system_content` for the budget + annotation as today. +6. Save. + +Result: the model sees the override; the stored transcript is +unchanged outside the model's actual reply. + +**Amend mode (`amend=true`)**: + +- If `system_prompt` is supplied: the override stays in place during + the serialization for the new insight row. The new row's + `training_messages` system message is the override. `is_current=false` + flips on prior rows as today. +- If not supplied: behaves as today (stored transcript's system message + carries forward unchanged). + +### D. FileViewer-React — client wiring + +`hooks/useInsightChat.tsx`: +- `SendTurnOptions` gains `systemPromptOverride?: string | null`. +- Inside `sendTurn`, before issuing the streaming POST: + 1. Read the active persona's `systemPrompt` from AsyncStorage + (already loaded for generation flows — reuse the same accessor). + 2. If a one-shot `systemPromptOverride` is set, append as a suffix + (`${persona}\n\n${override}`) so persona voice survives + override + tweaks the turn. + 3. Include the resulting string as `system_prompt` on the request body. +- No history-load change. The history endpoint still returns the stored + transcript. + +`components/InsightChatModal.tsx`: +- Add a small "Style note" composer affordance — a one-shot text input + that, when filled, becomes the `systemPromptOverride` for the next + send. Cleared after send. +- The existing persona chip continues to open `PersonaManagerModal`. + +`hooks/usePersonas.tsx` and the bundled defaults: +- Built-in `assistant` and `journal` prompts get audited and rewritten + to **explicitly state voice / shape / length** — since the framework + no longer guarantees a default shape, the persona must. + +### E. Tool catalog — gating + +Widen `build_tool_definitions` from `(has_vision: bool, apollo_enabled: +bool)` to a single `ToolGateOpts` struct: + +```rust +pub struct ToolGateOpts { + pub has_vision: bool, + pub apollo_enabled: bool, + pub daily_summaries_present: bool, + pub calendar_present: bool, + pub location_history_present: bool, +} +``` + +The chat / generation services compute the three new fields lazily per +turn via `SELECT 1 FROM LIMIT 1` (cheap; cached for the turn's +duration). Lazy because operators import data after launch and we don't +want to require a restart for the LLM to discover its new capabilities. + +Per-tool gating: + +| Tool | Existing gate | New gate | +|---|---|---| +| `describe_photo` | `has_vision` | unchanged | +| `get_personal_place_at` | `apollo_enabled` | unchanged | +| `get_calendar_events` | none | `calendar_present` | +| `get_location_history` | none | `location_history_present` | +| `search_rag` | none | `daily_summaries_present` | + +All other tools always-on. (`get_sms_messages` and `search_messages` +fail informatively if SMS-API is unreachable; not worth a startup probe +since intermittent failures are the same shape.) + +### F. Tool descriptions — convention + +Every description follows: + +1. One sentence: **what** + **when to call**. +2. Param semantics worth knowing (units, ranges, mode behavior, + precedence). +3. **Example invocation** for tools with multiple modes, optional bands, + or non-obvious parameter interactions. +4. Cross-references when relevant: `prefer X when both apply`. + +Banned: all-caps section headers inside descriptions +(`"CONTENT search"`, `"TIME-BASED fetch"`); persona-prescriptive language +(`"you are a..."`); behavioral references to other tools by description +rather than name. + +Tools getting examples: `search_messages`, `search_rag`, `store_fact`, +`get_sms_messages`. Trivial tools (`get_current_datetime`, +`reverse_geocode`, `get_file_tags`) skip the example. + +Sample (`search_messages`): + +> Search SMS/MMS message bodies. Modes: `fts5` (keyword + phrase + prefix +> + AND/OR/NOT + NEAR proximity), `semantic` (embedding similarity, +> requires generated embeddings), `hybrid` (RRF merge, recommended; +> degrades to `fts5` when embeddings absent). Optional `start_ts` / +> `end_ts` (real-UTC unix seconds) and `contact_id` filters. For pure +> date / contact browsing without keywords, prefer `get_sms_messages`. +> +> Examples: +> - `{query: "trader joe's"}` — phrase across all time. +> - `{query: "dinner", contact_id: 42, start_ts: 1700000000, end_ts: 1700604800}` +> — keyword within a contact and a week. +> - `{query: "NEAR(meeting work, 5)"}` — proximity search. + +### G. SMS tool fixes + +#### `get_sms_messages` — honor `days_radius` + +Today: `sms_client::fetch_messages_for_contact(contact, center_ts)` +hardcodes `Duration::days(4)` (lines 31–37). The tool accepts +`days_radius` and silently ignores it. + +**Fix:** widen the signature to +`fetch_messages_for_contact(contact, center_ts, days_radius)`. Tool +plumbs through. Default 4 retained for back-compat. + +#### `search_messages` — add date and contact_id filters + +Today: ImageApi's `search_messages` only forwards `query`, `mode`, +`limit` to SMS-API. + +**Fix:** add `start_ts`, `end_ts`, `contact_id` parameters. +- `contact_id` forwards directly to SMS-API + (`/api/messages/search/?contact_id=`). +- `start_ts` / `end_ts` are not natively accepted by SMS-API's search + endpoint. Apply client-side post-filter on the response (Apollo's + pattern: `chat_tools.py:670–680`). Bump the SMS-API `limit` to a + larger fetch pool when a date filter is supplied so in-window matches + aren't lost to out-of-window FTS rank. + +--- + +## Implementation sequencing + +Each step is independently mergeable. + +### ImageApi PRs + +1. **Split system-prompt assembly + neutralize user message.** Two + named blocks; user message context-only. Default identity string + added. Tests: golden snapshots of the resulting `system_content` + with and without `custom_system_prompt`. +2. **`system_prompt` field on chat request + swap/restore + amend + persistence.** Mirrors `annotate_system_with_budget` pattern. Tests: + round-trip system content unchanged in append mode; persisted in + amend mode. +3. **`fetch_messages_for_contact` honors `days_radius`.** Tool wires + the param through. Tests: window math at the client level. +4. **`ToolGateOpts` + per-tool description rewrites.** Description + text changes are the bulk of the diff but no behavior change beyond + gating. + +### FileViewer-React PR + +5. **Chat hook sends `system_prompt`; modal gets style-note input; + built-in personas updated to specify shape.** The + `useInsightChat.sendTurn` call site picks up the persona and + includes it on every chat turn body. Style-note input is a one-shot + suffix. + +## Testing & verification + +**Automated:** +- Unit (Rust): swap-and-restore round-trip preserves stored transcript. +- Unit (Rust): amend mode persists override into new insight row. +- Unit (Rust): `fetch_messages_for_contact(days_radius=N)` produces a + window of `2N` days centered on `center_ts`. +- Unit (Rust): `build_tool_definitions(opts)` excludes gated tools when + the corresponding flag is false. + +**Manual:** +- Run a chat turn against an existing insight without `system_prompt` → + output unchanged from baseline. +- Same insight, with override → output reflects new voice. +- Re-open chat → original baked persona still authoritative (override + was ephemeral). +- Regenerate an insight with the journal persona → model's voice + matches journal style; no "memory assistant" framing leaks through. +- Toggle data presence (delete a row from `calendar_events`) → tool + drops from the catalog on the next turn. + +## Risks + +- **Default identity wording matters.** A too-neutral default ("Use the + gathered context to write a summary") might produce flatter output + than today's "personal photo memory assistant" framing for users + who never set a persona. Mitigation: tune the default with a small + set of test photos before merging. +- **Persona-suffix style notes can contradict persona voice.** A user + who picks `journal` (first person, warm) and adds the style note + "respond in bullet points" will get a tonal collision. Acceptable — + the user expressed a per-turn intent and we honor it. Document the + composition rule in the persona-manager UI. +- **Lazy data-presence probes add a per-turn `SELECT 1`.** Negligible + on SQLite (sub-millisecond) but adds up across many turns. Cache the + result for the turn's duration; don't re-probe per-tool. + +## Open questions + +None blocking. Items deferred to a possible follow-up cycle: + +- Apollo parity for the same per-turn override pattern (already + present; just needs RN client wiring on the photo path which is + already proxy). +- Tool consolidation (`search_messages` + `get_sms_messages` → + single `search_messages` with optional date filter, Apollo-style). + Considered and deferred — separate spec.