Insight Chat improvements — design
Date: 2026-05-07
Branch: feature/insight-chat-improvements (in both ImageApi/ and FileViewer-React/)
Scope: ImageApi photo-anchored insight + chat surface, plus the
FileViewer-React client. Apollo's free/visit chat is not in this cycle.
Problem
Three concrete gaps in today's insight + chat surface:
- Tool drift. ImageApi exposes 13 tools to the LLM. Some are gated on apollo_enabled / has_vision, but several optional ones (search_rag, get_calendar_events, get_location_history) are registered unconditionally even when their backing tables are empty. Descriptions vary in quality and a couple have outright bugs.
- Inconsistent / incomplete tool descriptions. Tools like search_messages describe their selection rules but omit useful examples; store_fact doesn't show the object_entity_id vs object_value choice; get_sms_messages accepts a days_radius parameter that the backing client silently ignores. The LLM is being instructed against a slightly wrong reality.
- System prompt fights the persona. Today's generation prompt prepends the user's custom_system_prompt and then immediately asserts "You are a personal photo memory assistant...". The user message demands "a detailed insight with a title and summary". Both contradict whatever voice / shape / POV the persona just established. On chat continuation the persona is baked into the stored transcript at generation time and can't be changed without regenerating.
Goals
- Tool catalog is representative — every tool registered for a turn is backed by data the user actually has.
- Tool descriptions are concise but complete, with examples for any tool whose param choice has multiple modes or non-obvious interactions.
- Persona / system prompt is authoritative for voice, length, and shape — both at generation and during chat continuation.
- Per-turn system prompt overrides on chat work without surprising side-effects on the stored transcript outside amend mode.
Non-goals
- Apollo backend / frontend changes. Separate cycle.
- Refactoring the generate_photo_title post-hoc title flow. Already takes custom_system_prompt.
- Tool consolidation (e.g. merging search_messages + get_sms_messages). Considered and deferred; keeps blast radius small.
- Removing knowledge-memory tools (recall_* / store_*). Audit confirmed they have a live read path via knowledge.rs HTTP routes.
- Persisting persona changes to the stored transcript outside amend mode. Deliberate: re-opens use the persona currently active in the client, not a sticky historical setting.
Design
A. System prompt — generation
Today (insight_generator.rs:3305–3326):
[custom_system_prompt if any] +
"You are a personal photo memory assistant helping to reconstruct..." +
{owner_id_note} +
{fewshot_block} +
"IMPORTANT INSTRUCTIONS:
1. You MUST call multiple tools...
2. When calling get_sms_messages and search_rag...
3. Use recall_facts_for_photo...
...
8. You have a hard budget of {max_iterations} iterations..."
The first concatenation is the bug: the custom prompt claims one identity,
and the very next line asserts another.
New structure — two named blocks, in order:
[Identity / voice / format block] ← persona-controlled (or neutral default)
[Procedural block] ← always identity-free
Identity block:
- When custom_system_prompt is supplied: use that string verbatim, with nothing prepended or appended.
- When not: a neutral default that doesn't fight a future persona.
Working text:
"You are reconstructing a memory from a photo. Use the gathered context to write a thoughtful summary; you decide voice, length, and shape."
Procedural block — identity-free, always emitted:
Tool-use guidance:
- You have a budget of {max_iterations} tool-calling iterations.
- Call tools to gather context BEFORE writing your final answer; don't
answer after one or two calls.
- When calling get_sms_messages or search_rag, make at least one call
WITHOUT a contact filter — surrounding events matter even when a
contact is known.
- Use recall_facts_for_photo + recall_entities to load any prior
knowledge about subjects in the photo.
- When you identify people / places / events / things, use store_entity
+ store_fact to grow the persistent memory.
- A tool returning no results is informative; continue with the others.
{owner_id_note if applicable}
{fewshot_block if applicable}
Differences from today's "IMPORTANT INSTRUCTIONS" block: removed the "you are a personal photo memory assistant" framing and the explicit "at least 5 tool calls" floor (replaced with the softer "don't answer after one or two"). Few-shot stays — it's pattern-of-tool-use, not identity.
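The two-block assembly could look roughly like the sketch below. It is a minimal illustration, not the code in insight_generator.rs: the function names build_system_content and procedural_block are hypothetical, the guidance bullets are abbreviated, and the default identity string is the working text from this section.

```rust
/// Neutral default identity, used only when no persona prompt is supplied.
/// This is the working text from the spec and may still be tuned.
const DEFAULT_IDENTITY: &str = "You are reconstructing a memory from a photo. \
Use the gathered context to write a thoughtful summary; \
you decide voice, length, and shape.";

/// Identity-free procedural block (guidance bullets abbreviated here).
/// `owner_id_note` and `fewshot_block` are appended only when present.
fn procedural_block(
    max_iterations: u32,
    owner_id_note: Option<&str>,
    fewshot_block: Option<&str>,
) -> String {
    let mut out = format!(
        "Tool-use guidance:\n\
         - You have a budget of {max_iterations} tool-calling iterations.\n\
         - Call tools to gather context BEFORE writing your final answer; \
         don't answer after one or two calls.\n\
         - A tool returning no results is informative; continue with the others."
    );
    for extra in [owner_id_note, fewshot_block].iter().flatten() {
        out.push_str("\n\n");
        out.push_str(extra);
    }
    out
}

/// Hypothetical assembler: identity block first, procedural block second.
fn build_system_content(
    custom_system_prompt: Option<&str>,
    max_iterations: u32,
    owner_id_note: Option<&str>,
    fewshot_block: Option<&str>,
) -> String {
    // A supplied persona is used verbatim; nothing is added before or after it.
    let identity = custom_system_prompt.unwrap_or(DEFAULT_IDENTITY);
    let procedural = procedural_block(max_iterations, owner_id_note, fewshot_block);
    format!("{identity}\n\n{procedural}")
}
```

Keeping the identity choice in a single unwrap_or makes the golden-snapshot tests simple: one snapshot with a persona, one without.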
B. User message — generation
Today (line 3357):
{visual_block}Please analyze this photo and gather any relevant context
from the surrounding weeks.
Photo file path: {file_path}
Date taken: {date}
{contact_info}
{gps_info}
{tags_info}
Use the available tools to gather more context about this moment
(messages, calendar events, location history, etc.), then write a
detailed insight with a title and summary.
Problems: the trailing line bakes in output shape ("title and
summary"), and the title from the resulting response is discarded
anyway — generate_photo_title (line 3494) regenerates the title
post-hoc from the summary. So the prompt is constraining voice for no
data-model benefit.
New payload — context-only, no output prescription:
{visual_block}Photo file path: {file_path}
Date taken: {date}
{contact_info}
{gps_info}
{tags_info}
Gather context with the available tools, then respond.
The persona owns shape. If a user wants "title-then-paragraph" output, their persona prompt says so.
C. System prompt — chat continuation
Add system_prompt: Option<String> to ChatTurnRequest (and to its
HTTP wrapper ChatTurnHttpRequest). It carries through both the
non-streaming chat_turn and the streaming chat_turn_stream.
Append mode (default, amend=false) — ephemeral
swap-and-restore, mirroring the existing annotate_system_with_budget
pattern:
- Load stored transcript.
- If system_prompt is Some(s):
  - If first message is a system role: stash original content, replace with s.
  - Else: prepend a synthetic ephemeral system message with s (note it's synthetic so the restore step pops it rather than rewriting).
- Run annotate_system_with_budget on top (existing per-turn budget note appends to whatever's there now).
- Run the agentic loop.
- Before persistence, restore the original system content (or pop the synthetic one). Run restore_system_content for the budget annotation as today.
- Save.
Result: the model sees the override; the stored transcript is unchanged outside the model's actual reply.
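The swap-and-restore steps above can be sketched as follows. This is an illustration under assumptions: Message, SystemSwap, apply_system_override, and restore_system_override are hypothetical names, and the budget annotation handled by annotate_system_with_budget / restore_system_content is left out.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: String,
    content: String,
}

/// Records what was done to the transcript so it can be undone before saving.
enum SystemSwap {
    /// No override was supplied; nothing to undo.
    None,
    /// The first message was a system message; holds its original content.
    Replaced(String),
    /// A synthetic system message was prepended and must be popped.
    Prepended,
}

fn apply_system_override(messages: &mut Vec<Message>, system_prompt: Option<&str>) -> SystemSwap {
    let s = match system_prompt {
        Some(s) => s,
        None => return SystemSwap::None,
    };
    let first_is_system = messages.first().map_or(false, |m| m.role == "system");
    if first_is_system {
        // Stash the stored content and show the model the override instead.
        let original = std::mem::replace(&mut messages[0].content, s.to_string());
        SystemSwap::Replaced(original)
    } else {
        messages.insert(0, Message { role: "system".into(), content: s.to_string() });
        SystemSwap::Prepended
    }
}

fn restore_system_override(messages: &mut Vec<Message>, swap: SystemSwap) {
    match swap {
        SystemSwap::None => {}
        SystemSwap::Replaced(original) => {
            if let Some(first) = messages.first_mut() {
                first.content = original;
            }
        }
        SystemSwap::Prepended => {
            if !messages.is_empty() {
                messages.remove(0);
            }
        }
    }
}
```

Returning the swap state as a value, rather than tracking it in a flag on the request, keeps the restore step from being skipped silently: the caller has to pass the state back in.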
Amend mode (amend=true):
- If system_prompt is supplied: the override stays in place during the serialization for the new insight row. The new row's training_messages system message is the override. is_current=false flips on prior rows as today.
- If not supplied: behaves as today (stored transcript's system message carries forward unchanged).
D. FileViewer-React — client wiring
hooks/useInsightChat.tsx:
- SendTurnOptions gains systemPromptOverride?: string | null.
- Inside sendTurn, before issuing the streaming POST:
  - Read the active persona's systemPrompt from AsyncStorage (already loaded for generation flows; reuse the same accessor).
  - If a one-shot systemPromptOverride is set, append it as a suffix (${persona}\n\n${override}) so persona voice survives + override tweaks the turn.
  - Include the resulting string as system_prompt on the request body.
- No history-load change. The history endpoint still returns the stored transcript.
components/InsightChatModal.tsx:
- Add a small "Style note" composer affordance: a one-shot text input that, when filled, becomes the systemPromptOverride for the next send. Cleared after send.
- The existing persona chip continues to open PersonaManagerModal.
hooks/usePersonas.tsx and the bundled defaults:
- Built-in assistant and journal prompts get audited and rewritten to explicitly state voice / shape / length. Since the framework no longer guarantees a default shape, the persona must.
E. Tool catalog — gating
Widen build_tool_definitions from (has_vision: bool, apollo_enabled: bool) to a single ToolGateOpts struct:
pub struct ToolGateOpts {
pub has_vision: bool,
pub apollo_enabled: bool,
pub daily_summaries_present: bool,
pub calendar_present: bool,
pub location_history_present: bool,
}
The chat / generation services compute the three new fields lazily per
turn via SELECT 1 FROM <table> LIMIT 1 (cheap; cached for the turn's
duration). Lazy because operators import data after launch and we don't
want to require a restart for the LLM to discover its new capabilities.
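A per-turn cache for these probes might look like the sketch below. It is an assumption-laden illustration: PresenceCache is a hypothetical type, and the probe closure stands in for the real SELECT 1 FROM <table> LIMIT 1 query so the example stays free of database dependencies.

```rust
use std::collections::HashMap;

/// Per-turn cache of "does this table have at least one row?" answers.
/// Built fresh for each turn and dropped afterwards, so newly imported
/// data is picked up on the next turn without a restart.
struct PresenceCache<F: FnMut(&str) -> bool> {
    /// Stand-in for the real `SELECT 1 FROM <table> LIMIT 1` probe.
    probe: F,
    seen: HashMap<String, bool>,
}

impl<F: FnMut(&str) -> bool> PresenceCache<F> {
    fn new(probe: F) -> Self {
        Self { probe, seen: HashMap::new() }
    }

    /// Returns the cached answer, probing at most once per table per turn.
    fn has_rows(&mut self, table: &str) -> bool {
        if let Some(&hit) = self.seen.get(table) {
            return hit;
        }
        let present = (self.probe)(table);
        self.seen.insert(table.to_string(), present);
        present
    }
}
```

Because the cache lives only as long as the turn, there is no invalidation logic to get wrong; the cost is one probe per table per turn.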
Per-tool gating:
| Tool | Existing gate | New gate |
|---|---|---|
| describe_photo | has_vision | unchanged |
| get_personal_place_at | apollo_enabled | unchanged |
| get_calendar_events | none | calendar_present |
| get_location_history | none | location_history_present |
| search_rag | none | daily_summaries_present |
All other tools always-on. (get_sms_messages and search_messages
fail informatively if SMS-API is unreachable; not worth a startup probe
since intermittent failures are the same shape.)
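The gating table maps onto a small filter. The sketch below models only the five gated tools and returns their names; the real build_tool_definitions returns full tool definitions, and gated_tool_names plus the derive attributes on ToolGateOpts are assumptions made for illustration.

```rust
#[derive(Clone, Copy, Debug, Default)]
pub struct ToolGateOpts {
    pub has_vision: bool,
    pub apollo_enabled: bool,
    pub daily_summaries_present: bool,
    pub calendar_present: bool,
    pub location_history_present: bool,
}

/// Returns the names of the gated tools to register for this turn.
/// Always-on tools are omitted for brevity.
pub fn gated_tool_names(opts: ToolGateOpts) -> Vec<&'static str> {
    // Each entry pairs a tool name with the flag that gates it.
    let gates = [
        ("describe_photo", opts.has_vision),
        ("get_personal_place_at", opts.apollo_enabled),
        ("get_calendar_events", opts.calendar_present),
        ("get_location_history", opts.location_history_present),
        ("search_rag", opts.daily_summaries_present),
    ];
    gates
        .iter()
        .filter(|(_, enabled)| *enabled)
        .map(|(name, _)| *name)
        .collect()
}
```

Keeping the name-to-flag pairs in one array means the unit test for "excluded when the flag is false" can iterate the same table the spec documents.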
F. Tool descriptions — convention
Every description follows:
- One sentence: what + when to call.
- Param semantics worth knowing (units, ranges, mode behavior, precedence).
- Example invocation for tools with multiple modes, optional bands, or non-obvious parameter interactions.
- Cross-references when relevant: prefer X when both apply.
Banned: all-caps section headers inside descriptions
("CONTENT search", "TIME-BASED fetch"); persona-prescriptive language
("you are a..."); behavioral references to other tools by description
rather than name.
Tools getting examples: search_messages, search_rag, store_fact,
get_sms_messages. Trivial tools (get_current_datetime,
reverse_geocode, get_file_tags) skip the example.
Sample (search_messages):
Search SMS/MMS message bodies. Modes: fts5 (keyword + phrase + prefix + AND/OR/NOT + NEAR proximity), semantic (embedding similarity, requires generated embeddings), hybrid (RRF merge, recommended; degrades to fts5 when embeddings absent). Optional start_ts / end_ts (real-UTC unix seconds) and contact_id filters. For pure date / contact browsing without keywords, prefer get_sms_messages.
Examples:
- {query: "trader joe's"}: phrase across all time.
- {query: "dinner", contact_id: 42, start_ts: 1700000000, end_ts: 1700604800}: keyword within a contact and a week.
- {query: "NEAR(meeting work, 5)"}: proximity search.
G. SMS tool fixes
get_sms_messages — honor days_radius
Today: sms_client::fetch_messages_for_contact(contact, center_ts)
hardcodes Duration::days(4) (lines 31–37). The tool accepts
days_radius and silently ignores it.
Fix: widen the signature to
fetch_messages_for_contact(contact, center_ts, days_radius). Tool
plumbs through. Default 4 retained for back-compat.
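The window math for the fix can be sketched as below. The source uses Duration::days; this version works in plain unix seconds to stay dependency-free, and message_window, the constant names, and the clamping of negative radii are assumptions, not the existing sms_client code.

```rust
const SECONDS_PER_DAY: i64 = 86_400;
/// Matches the radius that is hardcoded today, kept as the default.
const DEFAULT_DAYS_RADIUS: i64 = 4;

/// Returns the `(start_ts, end_ts)` window in unix seconds, spanning
/// `days_radius` days on either side of `center_ts`.
fn message_window(center_ts: i64, days_radius: Option<i64>) -> (i64, i64) {
    // Assumption: a negative radius is clamped to zero rather than
    // producing an inverted window.
    let radius_days = days_radius.unwrap_or(DEFAULT_DAYS_RADIUS).max(0);
    let half_span = radius_days.saturating_mul(SECONDS_PER_DAY);
    (
        center_ts.saturating_sub(half_span),
        center_ts.saturating_add(half_span),
    )
}
```

With days_radius = N the window is 2N days wide and centered on center_ts, which is the property the client-level unit test checks.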
search_messages — add date and contact_id filters
Today: ImageApi's search_messages only forwards query, mode,
limit to SMS-API.
Fix: add start_ts, end_ts, contact_id parameters.
- contact_id forwards directly to SMS-API (/api/messages/search/?contact_id=).
- start_ts / end_ts are not natively accepted by SMS-API's search endpoint. Apply client-side post-filter on the response (Apollo's pattern: chat_tools.py:670–680). Bump the SMS-API limit to a larger fetch pool when a date filter is supplied so in-window matches aren't lost to out-of-window FTS rank.
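The client-side date post-filter and the larger fetch pool could be sketched like this. Everything named here is hypothetical: MessageHit and its timestamp field, the pool factor of 5, and the choice of inclusive bounds are assumptions, since the spec does not fix them.

```rust
#[derive(Clone, Debug, PartialEq)]
struct MessageHit {
    id: i64,
    /// Message timestamp in unix seconds (assumed field name).
    timestamp: i64,
}

/// Hypothetical multiplier applied to `limit` when a date filter is set.
const DATE_FILTER_POOL_FACTOR: usize = 5;

/// How many hits to request upstream for a caller-visible `limit`.
fn upstream_limit(limit: usize, start_ts: Option<i64>, end_ts: Option<i64>) -> usize {
    if start_ts.is_some() || end_ts.is_some() {
        // Over-fetch so in-window matches survive the post-filter.
        limit.saturating_mul(DATE_FILTER_POOL_FACTOR)
    } else {
        limit
    }
}

/// Keeps hits inside the inclusive window, preserving upstream rank order,
/// then truncates to the caller's `limit`.
fn filter_by_window(
    hits: Vec<MessageHit>,
    start_ts: Option<i64>,
    end_ts: Option<i64>,
    limit: usize,
) -> Vec<MessageHit> {
    hits.into_iter()
        .filter(|h| start_ts.map_or(true, |s| h.timestamp >= s))
        .filter(|h| end_ts.map_or(true, |e| h.timestamp <= e))
        .take(limit)
        .collect()
}
```

Filtering before truncating matters: truncating first would reintroduce the problem the larger pool is meant to solve.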
Implementation sequencing
Each step is independently mergeable.
ImageApi PRs
- Split system-prompt assembly + neutralize user message. Two named blocks; user message context-only. Default identity string added. Tests: golden snapshots of the resulting system_content with and without custom_system_prompt.
- system_prompt field on chat request + swap/restore + amend persistence. Mirrors annotate_system_with_budget pattern. Tests: round-trip system content unchanged in append mode; persisted in amend mode.
- fetch_messages_for_contact honors days_radius. Tool wires the param through. Tests: window math at the client level.
- ToolGateOpts + per-tool description rewrites. Description text changes are the bulk of the diff but no behavior change beyond gating.
FileViewer-React PR
- Chat hook sends system_prompt; modal gets style-note input; built-in personas updated to specify shape. The useInsightChat.sendTurn call site picks up the persona and includes it on every chat turn body. Style-note input is a one-shot suffix.
Testing & verification
Automated:
- Unit (Rust): swap-and-restore round-trip preserves stored transcript.
- Unit (Rust): amend mode persists override into new insight row.
- Unit (Rust): fetch_messages_for_contact(days_radius=N) produces a window of 2N days centered on center_ts.
- Unit (Rust): build_tool_definitions(opts) excludes gated tools when the corresponding flag is false.
Manual:
- Run a chat turn against an existing insight without system_prompt → output unchanged from baseline.
- Same insight, with override → output reflects new voice.
- Re-open chat → original baked persona still authoritative (override was ephemeral).
- Regenerate an insight with the journal persona → model's voice matches journal style; no "memory assistant" framing leaks through.
- Toggle data presence (delete a row from calendar_events) → tool drops from the catalog on the next turn.
Risks
- Default identity wording matters. A too-neutral default ("Use the gathered context to write a summary") might produce flatter output than today's "personal photo memory assistant" framing for users who never set a persona. Mitigation: tune the default with a small set of test photos before merging.
- Persona-suffix style notes can contradict persona voice. A user who picks journal (first person, warm) and adds the style note "respond in bullet points" will get a tonal collision. Acceptable: the user expressed a per-turn intent and we honor it. Document the composition rule in the persona-manager UI.
- Lazy data-presence probes add a per-turn SELECT 1. Negligible on SQLite (sub-millisecond) but adds up across many turns. Cache the result for the turn's duration; don't re-probe per-tool.
Open questions
None blocking. Items deferred to a possible follow-up cycle:
- Apollo parity for the same per-turn override pattern (already present; just needs RN client wiring on the photo path, which is already proxied).
- Tool consolidation (search_messages + get_sms_messages → single search_messages with optional date filter, Apollo-style). Considered and deferred; separate spec.