ImageApi/specs/002-insight-chat-improvements.md
Cameron Cordes fbece0ba9a insight-chat: design for tool catalog, system prompt, and SMS fixes
Lays out the cycle: split generation system prompt into identity vs
procedural blocks so personas drive voice/shape, add per-turn
system_prompt override on chat (ephemeral in append mode, persisted
on amend), gate optional tools on data presence, and fix the
days_radius bug in get_sms_messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:04:07 -04:00


Insight Chat improvements — design

Date: 2026-05-07
Branch: feature/insight-chat-improvements (in both ImageApi/ and FileViewer-React/)
Scope: ImageApi photo-anchored insight + chat surface, plus the FileViewer-React client. Apollo's free/visit chat is not in this cycle.

Problem

Three concrete gaps in today's insight + chat surface:

  1. Tool drift. ImageApi exposes 13 tools to the LLM. Some are gated on apollo_enabled / has_vision, but several optional ones (search_rag, get_calendar_events, get_location_history) are registered unconditionally even when their backing tables are empty. Descriptions vary in quality and a couple have outright bugs.
  2. Inconsistent / incomplete tool descriptions. Tools like search_messages describe their selection rules but omit useful examples; store_fact doesn't show the object_entity_id vs object_value choice; get_sms_messages accepts a days_radius parameter that the backing client silently ignores. The LLM is being instructed against a slightly wrong reality.
  3. System prompt fights the persona. Today's generation prompt prepends the user's custom_system_prompt and then immediately asserts "You are a personal photo memory assistant...". The user message demands "a detailed insight with a title and summary". Both contradict whatever voice / shape / POV the persona just established. On chat continuation the persona is baked into the stored transcript at generation time and can't be changed without regenerating.

Goals

  • Tool catalog is representative — every tool registered for a turn is backed by data the user actually has.
  • Tool descriptions are concise but complete, with examples for any tool whose param choice has multiple modes or non-obvious interactions.
  • Persona / system prompt is authoritative for voice, length, and shape — both at generation and during chat continuation.
  • Per-turn system prompt overrides on chat work without surprising side-effects on the stored transcript outside amend mode.

Non-goals

  • Apollo backend / frontend changes. Separate cycle.
  • Refactoring the generate_photo_title post-hoc title flow. Already takes custom_system_prompt.
  • Tool consolidation (e.g. merging search_messages + get_sms_messages). Considered and deferred — keeps blast radius small.
  • Removing knowledge-memory tools (recall_* / store_*). Audit confirmed they have a live read path via knowledge.rs HTTP routes.
  • Persisting persona changes to the stored transcript outside amend mode. Deliberate — re-opens use the persona currently active in the client, not a sticky historical setting.

Design

A. System prompt — generation

Today (insight_generator.rs:3305-3326):

[custom_system_prompt if any] +
"You are a personal photo memory assistant helping to reconstruct..." +
{owner_id_note} +
{fewshot_block} +
"IMPORTANT INSTRUCTIONS:
1. You MUST call multiple tools...
2. When calling get_sms_messages and search_rag...
3. Use recall_facts_for_photo...
...
8. You have a hard budget of {max_iterations} iterations..."

The first concatenation is the bug: custom claims one identity, the next line asserts another.

New structure — two named blocks, in order:

[Identity / voice / format block]    ← persona-controlled (or neutral default)
[Procedural block]                   ← always identity-free

Identity block:

  • When custom_system_prompt is supplied: use that string verbatim, no pre/append.
  • When not: a neutral default that doesn't fight a future persona. Working text: "You are reconstructing a memory from a photo. Use the gathered context to write a thoughtful summary; you decide voice, length, and shape."

Procedural block — identity-free, always emitted:

Tool-use guidance:
- You have a budget of {max_iterations} tool-calling iterations.
- Call tools to gather context BEFORE writing your final answer; don't
  answer after one or two calls.
- When calling get_sms_messages or search_rag, make at least one call
  WITHOUT a contact filter — surrounding events matter even when a
  contact is known.
- Use recall_facts_for_photo + recall_entities to load any prior
  knowledge about subjects in the photo.
- When you identify people / places / events / things, use store_entity
  + store_fact to grow the persistent memory.
- A tool returning no results is informative; continue with the others.

{owner_id_note if applicable}
{fewshot_block if applicable}

Differences from today's "IMPORTANT INSTRUCTIONS" block: removed the "you are a personal photo memory assistant" framing and the explicit "at least 5 tool calls" floor (replaced with the softer "don't answer after one or two"). Few-shot stays — it's pattern-of-tool-use, not identity.
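The two-block assembly can be sketched roughly as follows. This is a minimal illustration, not the code in insight_generator.rs: the function name build_system_prompt and its parameter list are assumptions, and the procedural text is abbreviated to three of the guidance lines.

```rust
/// Neutral default identity, taken from the working text in this spec.
const DEFAULT_IDENTITY: &str = "You are reconstructing a memory from a photo. \
Use the gathered context to write a thoughtful summary; \
you decide voice, length, and shape.";

/// Hypothetical helper: assembles the identity block, then the procedural block.
pub fn build_system_prompt(
    custom_system_prompt: Option<&str>,
    max_iterations: u32,
    owner_id_note: Option<&str>,
    fewshot_block: Option<&str>,
) -> String {
    // Identity block: the persona verbatim, or the neutral default.
    let identity = custom_system_prompt.unwrap_or(DEFAULT_IDENTITY);

    // Procedural block: identity-free, always emitted (abbreviated here).
    let mut procedural = format!(
        "Tool-use guidance:\n\
         - You have a budget of {max_iterations} tool-calling iterations.\n\
         - Call tools to gather context BEFORE writing your final answer.\n\
         - A tool returning no results is informative; continue with the others."
    );
    // Optional trailers are appended only when present.
    for extra in [owner_id_note, fewshot_block].into_iter().flatten() {
        procedural.push_str("\n\n");
        procedural.push_str(extra);
    }

    format!("{identity}\n\n{procedural}")
}
```

With a persona supplied, the output starts with the persona string and never contains the default identity, which is the property the golden snapshot tests in the sequencing section would pin down.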

B. User message — generation

Today (line 3357):

{visual_block}Please analyze this photo and gather any relevant context
from the surrounding weeks.

Photo file path: {file_path}
Date taken: {date}
{contact_info}
{gps_info}
{tags_info}

Use the available tools to gather more context about this moment
(messages, calendar events, location history, etc.), then write a
detailed insight with a title and summary.

Problems: the trailing line bakes in output shape ("title and summary"), and the title from the resulting response is discarded anyway: generate_photo_title (line 3494) regenerates the title post-hoc from the summary. So the prompt is constraining voice for no data-model benefit.

New payload — context-only, no output prescription:

{visual_block}Photo file path: {file_path}
Date taken: {date}
{contact_info}
{gps_info}
{tags_info}

Gather context with the available tools, then respond.

The persona owns shape. If a user wants "title-then-paragraph" output, their persona prompt says so.
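A rough sketch of building the context-only payload follows. The PhotoContext struct and build_user_message function are hypothetical names; skipping absent optional lines is also an assumption, since the real template may interpolate empty strings instead.

```rust
/// Hypothetical input struct; field names follow the template placeholders.
pub struct PhotoContext<'a> {
    pub visual_block: &'a str, // may be empty when no visual description exists
    pub file_path: &'a str,
    pub date: &'a str,
    pub contact_info: Option<&'a str>,
    pub gps_info: Option<&'a str>,
    pub tags_info: Option<&'a str>,
}

/// Builds the user message: context lines only, no output prescription.
pub fn build_user_message(ctx: &PhotoContext) -> String {
    let mut msg = format!(
        "{}Photo file path: {}\nDate taken: {}\n",
        ctx.visual_block, ctx.file_path, ctx.date
    );
    // Optional context lines are skipped when absent (assumption).
    for line in [ctx.contact_info, ctx.gps_info, ctx.tags_info]
        .into_iter()
        .flatten()
    {
        msg.push_str(line);
        msg.push('\n');
    }
    msg.push_str("\nGather context with the available tools, then respond.");
    msg
}
```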

C. System prompt — chat continuation

Add system_prompt: Option<String> to ChatTurnRequest (and to its HTTP wrapper ChatTurnHttpRequest). It carries through both the non-streaming chat_turn and the streaming chat_turn_stream.

Append mode (default, amend=false) — ephemeral swap-and-restore, mirroring the existing annotate_system_with_budget pattern:

  1. Load stored transcript.
  2. If system_prompt is Some(s):
    • If first message is a system role: stash original content, replace with s.
    • Else: prepend a synthetic ephemeral system message with s (flag it as synthetic so the restore step pops it rather than rewriting it).
  3. Run annotate_system_with_budget on top (existing per-turn budget note appends to whatever's there now).
  4. Run the agentic loop.
  5. Before persistence, restore the original system content (or pop the synthetic one). Run restore_system_content for the budget annotation as today.
  6. Save.

Result: the model sees the override; the stored transcript is unchanged outside the model's actual reply.
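The append-mode steps above can be sketched as a swap/restore pair. The Message and SystemSwap types and both function names are illustrative assumptions, not the existing ImageApi types, and the budget annotation step (annotate_system_with_budget / restore_system_content) is left out for brevity.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Records what the restore step must undo.
pub enum SystemSwap {
    None,             // no override supplied
    Replaced(String), // original system content, stashed
    Synthetic,        // a system message was prepended
}

/// Step 2: swap the override in, remembering how to undo it.
pub fn apply_override(transcript: &mut Vec<Message>, system_prompt: Option<&str>) -> SystemSwap {
    let Some(s) = system_prompt else {
        return SystemSwap::None;
    };
    let has_system = transcript.first().map_or(false, |m| m.role == "system");
    if has_system {
        // Stash the original content and replace it with the override.
        let original = std::mem::replace(&mut transcript[0].content, s.to_string());
        SystemSwap::Replaced(original)
    } else {
        // No system message stored: prepend a synthetic one.
        transcript.insert(0, Message { role: "system".into(), content: s.into() });
        SystemSwap::Synthetic
    }
}

/// Step 5: restore before persistence so the stored transcript is unchanged.
pub fn restore_override(transcript: &mut Vec<Message>, swap: SystemSwap) {
    match swap {
        SystemSwap::None => {}
        SystemSwap::Replaced(original) => {
            if let Some(first) = transcript.first_mut() {
                first.content = original;
            }
        }
        SystemSwap::Synthetic => {
            if !transcript.is_empty() {
                transcript.remove(0);
            }
        }
    }
}
```

In amend mode the same apply step would run but the restore step would be skipped, so the override is what gets serialized into the new row.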

Amend mode (amend=true):

  • If system_prompt is supplied: the override stays in place during the serialization for the new insight row. The new row's training_messages system message is the override. is_current=false flips on prior rows as today.
  • If not supplied: behaves as today (stored transcript's system message carries forward unchanged).

D. FileViewer-React — client wiring

hooks/useInsightChat.tsx:

  • SendTurnOptions gains systemPromptOverride?: string | null.
  • Inside sendTurn, before issuing the streaming POST:
    1. Read the active persona's systemPrompt from AsyncStorage (already loaded for generation flows — reuse the same accessor).
    2. If a one-shot systemPromptOverride is set, append as a suffix (${persona}\n\n${override}) so persona voice survives + override tweaks the turn.
    3. Include the resulting string as system_prompt on the request body.
  • No history-load change. The history endpoint still returns the stored transcript.

components/InsightChatModal.tsx:

  • Add a small "Style note" composer affordance — a one-shot text input that, when filled, becomes the systemPromptOverride for the next send. Cleared after send.
  • The existing persona chip continues to open PersonaManagerModal.

hooks/usePersonas.tsx and the bundled defaults:

  • Built-in assistant and journal prompts get audited and rewritten to explicitly state voice / shape / length — since the framework no longer guarantees a default shape, the persona must.

E. Tool catalog — gating

Widen build_tool_definitions from (has_vision: bool, apollo_enabled: bool) to a single ToolGateOpts struct:

pub struct ToolGateOpts {
    pub has_vision: bool,
    pub apollo_enabled: bool,
    pub daily_summaries_present: bool,
    pub calendar_present: bool,
    pub location_history_present: bool,
}

The chat / generation services compute the three new fields lazily per turn via SELECT 1 FROM <table> LIMIT 1 (cheap; cached for the turn's duration). Lazy because operators import data after launch and we don't want to require a restart for the LLM to discover its new capabilities.
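One way to keep the probes to at most one per table per turn is a small cache created at the start of the turn and dropped at the end. The sketch below is an assumption about shape only: TurnProbeCache is a hypothetical type, and the probe closure stands in for the real SELECT 1 query so the example stays free of database dependencies.

```rust
use std::collections::HashMap;

/// Hypothetical per-turn cache. `probe` stands in for the real
/// `SELECT 1 FROM <table> LIMIT 1` query.
pub struct TurnProbeCache<F: FnMut(&str) -> bool> {
    probe: F,
    seen: HashMap<String, bool>,
}

impl<F: FnMut(&str) -> bool> TurnProbeCache<F> {
    pub fn new(probe: F) -> Self {
        Self { probe, seen: HashMap::new() }
    }

    /// Returns whether `table` has at least one row, probing at most once.
    pub fn has_rows(&mut self, table: &str) -> bool {
        if let Some(&hit) = self.seen.get(table) {
            return hit;
        }
        let hit = (self.probe)(table);
        self.seen.insert(table.to_string(), hit);
        hit
    }
}
```

Because the cache lives only for one turn, data imported between turns is picked up on the next turn without a restart.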

Per-tool gating:

Tool                     Existing gate    New gate
describe_photo           has_vision       unchanged
get_personal_place_at    apollo_enabled   unchanged
get_calendar_events      none             calendar_present
get_location_history     none             location_history_present
search_rag               none             daily_summaries_present

All other tools always-on. (get_sms_messages and search_messages fail informatively if SMS-API is unreachable; not worth a startup probe since intermittent failures are the same shape.)
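The gating table can be read as a filter over the catalog. The sketch below returns tool names only and lists just two of the always-on tools; the real build_tool_definitions presumably returns full definitions for all 13 tools, so build_tool_names is a hypothetical stand-in.

```rust
#[derive(Clone, Copy, Default)]
pub struct ToolGateOpts {
    pub has_vision: bool,
    pub apollo_enabled: bool,
    pub daily_summaries_present: bool,
    pub calendar_present: bool,
    pub location_history_present: bool,
}

/// Returns the names of the tools registered for this turn.
/// Hypothetical: names only, and the always-on list is abbreviated.
pub fn build_tool_names(opts: ToolGateOpts) -> Vec<&'static str> {
    // (tool name, gate result); always-on tools use `true`.
    let catalog = [
        ("describe_photo", opts.has_vision),
        ("get_personal_place_at", opts.apollo_enabled),
        ("get_calendar_events", opts.calendar_present),
        ("get_location_history", opts.location_history_present),
        ("search_rag", opts.daily_summaries_present),
        ("get_sms_messages", true),
        ("search_messages", true),
    ];
    catalog
        .into_iter()
        .filter(|(_, enabled)| *enabled)
        .map(|(name, _)| name)
        .collect()
}
```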

F. Tool descriptions — convention

Every description follows:

  1. One sentence: what + when to call.
  2. Param semantics worth knowing (units, ranges, mode behavior, precedence).
  3. Example invocation for tools with multiple modes, optional bands, or non-obvious parameter interactions.
  4. Cross-references when relevant: prefer X when both apply.

Banned: all-caps section headers inside descriptions ("CONTENT search", "TIME-BASED fetch"); persona-prescriptive language ("you are a..."); behavioral references to other tools by description rather than name.

Tools getting examples: search_messages, search_rag, store_fact, get_sms_messages. Trivial tools (get_current_datetime, reverse_geocode, get_file_tags) skip the example.

Sample (search_messages):

Search SMS/MMS message bodies. Modes: fts5 (keyword + phrase + prefix + AND/OR/NOT + NEAR proximity), semantic (embedding similarity, requires generated embeddings), hybrid (RRF merge, recommended; degrades to fts5 when embeddings absent). Optional start_ts / end_ts (real-UTC unix seconds) and contact_id filters. For pure date / contact browsing without keywords, prefer get_sms_messages.

Examples:

  • {query: "trader joe's"} — phrase across all time.
  • {query: "dinner", contact_id: 42, start_ts: 1700000000, end_ts: 1700604800} — keyword within a contact and a week.
  • {query: "NEAR(meeting work, 5)"} — proximity search.

G. SMS tool fixes

get_sms_messages — honor days_radius

Today: sms_client::fetch_messages_for_contact(contact, center_ts) hardcodes Duration::days(4) (lines 31-37). The tool accepts days_radius and silently ignores it.

Fix: widen the signature to fetch_messages_for_contact(contact, center_ts, days_radius). Tool plumbs through. Default 4 retained for back-compat.
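The window math is small enough to isolate and unit-test on its own. A minimal sketch, assuming unix-second timestamps and a hypothetical helper name message_window; clamping negative radii to zero is an assumption the spec does not address.

```rust
const SECS_PER_DAY: i64 = 86_400;
const DEFAULT_DAYS_RADIUS: i64 = 4; // retained for back-compat

/// Returns (start_ts, end_ts) in unix seconds, centered on `center_ts`.
pub fn message_window(center_ts: i64, days_radius: Option<i64>) -> (i64, i64) {
    // Negative radii are clamped to zero (assumption, not from the spec).
    let radius = days_radius.unwrap_or(DEFAULT_DAYS_RADIUS).max(0);
    let half = radius.saturating_mul(SECS_PER_DAY);
    (center_ts.saturating_sub(half), center_ts.saturating_add(half))
}
```

A radius of N yields a window of 2N days, and omitting the parameter keeps today's 8-day window.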

search_messages — add date and contact_id filters

Today: ImageApi's search_messages only forwards query, mode, limit to SMS-API.

Fix: add start_ts, end_ts, contact_id parameters.

  • contact_id forwards directly to SMS-API (/api/messages/search/?contact_id=).
  • start_ts / end_ts are not natively accepted by SMS-API's search endpoint. Apply client-side post-filter on the response (Apollo's pattern: chat_tools.py:670-680). Bump the SMS-API limit to a larger fetch pool when a date filter is supplied so in-window matches aren't lost to out-of-window FTS rank.
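The client-side post-filter and the enlarged fetch pool might look like the sketch below. SearchHit, fetch_pool_size, and post_filter are hypothetical names; the 5x multiplier is a placeholder, and treating both bounds as inclusive is an assumption.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct SearchHit {
    pub id: i64,
    pub ts: i64, // unix seconds
    pub body: String,
}

/// How many rows to request from SMS-API. The 5x multiplier is a
/// placeholder, not a value from the spec.
pub fn fetch_pool_size(limit: usize, start_ts: Option<i64>, end_ts: Option<i64>) -> usize {
    if start_ts.is_some() || end_ts.is_some() {
        limit.saturating_mul(5)
    } else {
        limit
    }
}

/// Keeps hits inside [start_ts, end_ts] (inclusive, by assumption),
/// preserving the rank order of the response, then truncates to `limit`.
pub fn post_filter(
    hits: Vec<SearchHit>,
    start_ts: Option<i64>,
    end_ts: Option<i64>,
    limit: usize,
) -> Vec<SearchHit> {
    hits.into_iter()
        .filter(|h| start_ts.map_or(true, |s| h.ts >= s))
        .filter(|h| end_ts.map_or(true, |e| h.ts <= e))
        .take(limit)
        .collect()
}
```

Filtering before truncation is the point of the larger pool: a high-ranked hit outside the window no longer consumes one of the limit slots.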

Implementation sequencing

Each step is independently mergeable.

ImageApi PRs

  1. Split system-prompt assembly + neutralize user message. Two named blocks; user message context-only. Default identity string added. Tests: golden snapshots of the resulting system_content with and without custom_system_prompt.
  2. system_prompt field on chat request + swap/restore + amend persistence. Mirrors annotate_system_with_budget pattern. Tests: round-trip system content unchanged in append mode; persisted in amend mode.
  3. fetch_messages_for_contact honors days_radius. Tool wires the param through. Tests: window math at the client level.
  4. ToolGateOpts + per-tool description rewrites. Description text changes are the bulk of the diff but no behavior change beyond gating.

FileViewer-React PR

  1. Chat hook sends system_prompt; modal gets style-note input; built-in personas updated to specify shape. The useInsightChat.sendTurn call site picks up the persona and includes it on every chat turn body. Style-note input is a one-shot suffix.

Testing & verification

Automated:

  • Unit (Rust): swap-and-restore round-trip preserves stored transcript.
  • Unit (Rust): amend mode persists override into new insight row.
  • Unit (Rust): fetch_messages_for_contact(days_radius=N) produces a window of 2N days centered on center_ts.
  • Unit (Rust): build_tool_definitions(opts) excludes gated tools when the corresponding flag is false.

Manual:

  • Run a chat turn against an existing insight without system_prompt → output unchanged from baseline.
  • Same insight, with override → output reflects new voice.
  • Re-open chat → original baked persona still authoritative (override was ephemeral).
  • Regenerate an insight with the journal persona → model's voice matches journal style; no "memory assistant" framing leaks through.
  • Toggle data presence (delete a row from calendar_events) → tool drops from the catalog on the next turn.

Risks

  • Default identity wording matters. A too-neutral default ("Use the gathered context to write a summary") might produce flatter output than today's "personal photo memory assistant" framing for users who never set a persona. Mitigation: tune the default with a small set of test photos before merging.
  • Persona-suffix style notes can contradict persona voice. A user who picks journal (first person, warm) and adds the style note "respond in bullet points" will get a tonal collision. Acceptable — the user expressed a per-turn intent and we honor it. Document the composition rule in the persona-manager UI.
  • Lazy data-presence probes add a per-turn SELECT 1. Negligible on SQLite (sub-millisecond) but adds up across many turns. Cache the result for the turn's duration; don't re-probe per-tool.

Open questions

None blocking. Items deferred to a possible follow-up cycle:

  • Apollo parity for the same per-turn override pattern (already present on the Apollo side; it just needs RN client wiring on the photo path, which is already proxied).
  • Tool consolidation (search_messages + get_sms_messages → single search_messages with optional date filter, Apollo-style). Considered and deferred — separate spec.