Feature/insight history #104

Merged
cameron merged 6 commits from feature/insight-history into master 2026-06-10 19:01:15 +00:00
Owner
No description provided.
cameron added 6 commits 2026-06-10 19:01:04 +00:00
Expose GET /insights/history?path=... returning every generated version
of a photo's insight (current plus superseded), newest-first, backing the
mobile per-file insight history view.

- New get_insight_history_handler; reuses the existing get_insight_history
  DAO method (removed its dead_code allow).
- impl From<PhotoInsight> for PhotoInsightResponse, collapsing the mapping
  that was duplicated across the single-get and all-insights handlers.
- rate_insight_by_id DAO method + optional insight_id on RateInsightRequest
  so previously generated versions can be approved/rejected (the path-based
  rate only touches the current row).
- DAO tests for history ordering/scoping and id-targeted rating.
- cargo fmt normalized a multi-line assert in insight_chat.rs tests.
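
A minimal sketch of the mapping consolidation described above. The field names on both structs and the history_response helper are hypothetical stand-ins, since the real definitions are not shown in this PR text; only the type names PhotoInsight and PhotoInsightResponse come from the commit message.

```rust
/// Hypothetical row type; the real PhotoInsight fields are not shown here.
#[derive(Debug, Clone)]
pub struct PhotoInsight {
    pub id: i32,
    pub file_path: String,
    pub title: String,
    pub summary: String,
    pub generated_at: i64,
    pub is_current: bool,
}

/// Hypothetical response type mirroring the row.
#[derive(Debug, Clone, PartialEq)]
pub struct PhotoInsightResponse {
    pub id: i32,
    pub file_path: String,
    pub title: String,
    pub summary: String,
    pub generated_at: i64,
    pub is_current: bool,
}

/// One conversion shared by every handler, instead of a copy per handler.
impl From<PhotoInsight> for PhotoInsightResponse {
    fn from(insight: PhotoInsight) -> Self {
        PhotoInsightResponse {
            id: insight.id,
            file_path: insight.file_path,
            title: insight.title,
            summary: insight.summary,
            generated_at: insight.generated_at,
            is_current: insight.is_current,
        }
    }
}

/// Hypothetical helper: order versions newest-first and convert them.
pub fn history_response(mut rows: Vec<PhotoInsight>) -> Vec<PhotoInsightResponse> {
    rows.sort_by(|a, b| b.generated_at.cmp(&a.generated_at));
    rows.into_iter().map(PhotoInsightResponse::from).collect()
}
```

With a single From impl, the single-get, all-insights, and history handlers can all call `.into()` or `PhotoInsightResponse::from` and stay in step when fields change.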

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ollama >=0.8 can stream tool_calls incrementally across NDJSON chunks;
chat_with_tools_stream did `tool_calls = Some(tcs)` per chunk, so only
the last chunk's calls survived assembly and earlier calls were silently
dropped. Append into the accumulator instead.
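
The overwrite-versus-append difference can be sketched as follows. The ToolCall struct is a simplified, hypothetical stand-in for the client's real type, and the helper's signature is an assumption; only the name append_streamed_tool_calls comes from this commit.

```rust
/// Hypothetical, simplified stand-in for the client's tool-call type.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub name: String,
    pub arguments: String,
}

/// Append the tool calls carried by one streamed chunk to the accumulator.
/// The buggy version did the equivalent of `*acc = chunk`, which kept only
/// the calls from the last chunk that carried any.
pub fn append_streamed_tool_calls(
    acc: &mut Option<Vec<ToolCall>>,
    chunk: Option<Vec<ToolCall>>,
) {
    if let Some(calls) = chunk {
        // Assumption: an empty list leaves the accumulator untouched, so
        // `None` keeps meaning "no tool calls were streamed".
        if !calls.is_empty() {
            acc.get_or_insert_with(Vec::new).extend(calls);
        }
    }
}
```

Chunks without tool calls pass `None` and are ignored, so the helper can be called unconditionally for every NDJSON chunk.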

- ollama: append_streamed_tool_calls helper + tests covering two calls
  arriving in separate chunks and the single-chunk batch case.
- llamacpp: the SSE delta assembly was already correct (per-index
  BTreeMap, same-index argument fragments concatenate, distinct indexes
  accumulate); extracted it into apply_tool_call_deltas /
  finalize_tool_calls and added tests pinning that behavior.
- llm_client: new shared strip_think_blocks (moved from ollama's private
  extract_final_answer, which now delegates) so the tool-calling final
  content paths can reuse it; unit tests for tagged/plain/unclosed/empty
  cases.
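
A rough sketch of what a helper like strip_think_blocks might do for the four tested cases. This is not the repository's implementation: the exact tag spelling, the trimming, and in particular the handling of an unclosed tag (here: drop everything after it) are assumptions.

```rust
/// Remove every `<think>...</think>` block from `content` and trim the result.
/// Assumption: an unclosed `<think>` discards everything that follows it.
pub fn strip_think_blocks(content: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";

    let mut out = String::with_capacity(content.len());
    let mut rest = content;

    while let Some(start) = rest.find(OPEN) {
        // Keep the text before the opening tag.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            // Closed block: resume scanning after the closing tag.
            Some(end) => rest = &after_open[end + CLOSE.len()..],
            // Unclosed block: nothing after the tag is kept.
            None => {
                rest = "";
                break;
            }
        }
    }

    out.push_str(rest);
    out.trim().to_string()
}
```

Plain input passes through unchanged apart from trimming, so the helper is safe to apply to every final-content path.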

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Agentic-loop fixes in the generator:

- New recall_facts_for_entity tool (always-on, like recall_entities):
  fetches facts for one entity by id so the model can follow up on
  entities surfaced by recall_entities that aren't photo-linked
  (recall_facts_for_photo only covers linked entities). Mirrors that
  tool's persona scoping (PersonaFilter::Single) and the persona's
  reviewed_only_facts filter exactly, and renders in the same
  "Entity: ... / - predicate object" style. Wired through execute_tool
  and the trajectory summarizer.
- Generation now resolves gates per persona:
  current_gate_opts_for_persona(images_inline, Some((user_id,
  persona_id))) instead of the None-defaulting wrapper, so a persona's
  allow_agent_corrections opens propose_correction during generation the
  same way chat turns already did. The now-unused current_gate_opts
  wrapper is removed.
- Strip leaked <think> blocks from the final assistant content before
  parse_title_body / store_insight (raw training transcript keeps them).
- Honest truncation labels: get_sms_messages and get_location_history
  said "Found N ..." while listing only the first K; found_header now
  emits "Found N ... (showing first K):" when truncated, and the
  summarizer still parses the count.
- Clamp days_radius in get_calendar_events and get_location_history to
  1..=30, matching get_sms_messages.
- persona_system_prompt helper (persona store lookup, blank-prompt ->
  None) for server-side persona resolution; callers land in the next
  commit.
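
The truncation label and radius clamp from the list above could look roughly like this. Only the name found_header and the "Found N ... (showing first K):" wording come from the commit; the parameter order, the untruncated header's punctuation, and the helpers clamp_days_radius and parse_found_count are assumptions made for illustration.

```rust
/// Header for a tool-result listing. `total` is how many items matched,
/// `shown` is how many are actually listed below the header.
pub fn found_header(total: usize, noun: &str, shown: usize) -> String {
    if shown < total {
        format!("Found {total} {noun} (showing first {shown}):")
    } else {
        // Assumption: the untruncated form ends with a colon as well.
        format!("Found {total} {noun}:")
    }
}

/// Hypothetical helper: clamp a caller-supplied radius in days to 1..=30.
pub fn clamp_days_radius(days: i64) -> i64 {
    days.clamp(1, 30)
}

/// Hypothetical summarizer-side parse: read the count back out of a header,
/// whether or not the truncation suffix is present.
pub fn parse_found_count(header: &str) -> Option<usize> {
    header
        .strip_prefix("Found ")?
        .split_whitespace()
        .next()?
        .parse()
        .ok()
}
```

Because the count stays in the same position, a parser that reads the first token after "Found " keeps working for both header forms.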

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A request carrying persona_id but no system_prompt used to fall back to
the neutral default voice. Both agentic generation
(generate_agentic_insight_handler) and chat bootstrap now resolve the
persona's stored prompt from the persona store, with precedence:
explicit non-blank client system_prompt > persona store lookup >
existing default. The "default" persona id behaves the same: its stored
prompt is used if the store has a row, the neutral default otherwise.
Resolution happens at the
handler / bootstrap entry where the DAO is reachable; internals are
unchanged. resolve_bootstrap_system_prompt takes the resolved persona
prompt as a second argument, with precedence tests.
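
The precedence chain above, sketched as a pure function. The function name, the placeholder default text, and the Option-based signature are assumptions; the real resolve_bootstrap_system_prompt signature is not shown in this PR text.

```rust
/// Placeholder for the neutral default voice (the real text is not shown).
pub const NEUTRAL_DEFAULT_PROMPT: &str = "neutral default prompt";

/// Treat missing and whitespace-only prompts the same way.
fn non_blank(prompt: Option<&str>) -> Option<&str> {
    prompt.map(str::trim).filter(|s| !s.is_empty())
}

/// Hypothetical resolver implementing:
/// explicit non-blank client prompt > persona store prompt > default.
pub fn resolve_system_prompt_sketch(
    client_prompt: Option<&str>,
    persona_prompt: Option<&str>,
) -> String {
    non_blank(client_prompt)
        .or_else(|| non_blank(persona_prompt))
        .unwrap_or(NEUTRAL_DEFAULT_PROMPT)
        .to_string()
}
```

Keeping the resolver free of DAO access matches the note that the store lookup happens at the handler or bootstrap entry and only its result is passed inward.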

Also in insight_chat:

- Sync chat_turn no longer persists the synthetic "Please write your
  final answer now without calling any more tools." user message pushed
  on iteration exhaustion. Both streaming variants' synthetic_idx
  pattern is extracted into push/remove_synthetic_final_prompt (the
  remove is a defensive no-op on index drift) and applied to all three
  loops; round-trip test included.
- Strip leaked <think> blocks from the final content persisted as the
  reply in chat_turn and both streaming AgenticLoopOutcomes (mid-stream
  TextDeltas are untouched; the raw transcript keeps the block).
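
A sketch of the push/remove pair for the synthetic final prompt. The Message struct is a hypothetical stand-in for the real chat message type, and the exact drift check (index in range, role and content still match) is an assumption about what "defensive no-op" means here.

```rust
/// Hypothetical, simplified chat message.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

pub const FINAL_PROMPT: &str =
    "Please write your final answer now without calling any more tools.";

/// Push the synthetic user message and return its index for later removal.
pub fn push_synthetic_final_prompt(messages: &mut Vec<Message>) -> usize {
    messages.push(Message {
        role: "user".to_string(),
        content: FINAL_PROMPT.to_string(),
    });
    messages.len() - 1
}

/// Remove the synthetic message before persisting. If the index no longer
/// points at it (index drift), leave the transcript alone and return false.
pub fn remove_synthetic_final_prompt(messages: &mut Vec<Message>, idx: usize) -> bool {
    let still_there = messages
        .get(idx)
        .map(|m| m.role == "user" && m.content == FINAL_PROMPT)
        .unwrap_or(false);
    if still_there {
        messages.remove(idx);
    }
    still_there
}
```

Checking the content before removing means a stale index can never delete a real user or assistant message.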

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Keep `cargo clippy --tests` clean alongside the agentic-loop changes:
alias backfill's five-element setup() tuple as SetupFixture
(type_complexity) and build the single-library health map via
std::slice::from_ref instead of cloning (unnecessary clone-to-slice).
No behavior change.
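
For readers unfamiliar with the second lint fix: std::slice::from_ref turns a `&T` into a one-element `&[T]` without cloning. The Library struct and both functions below are hypothetical stand-ins, not the repository's code.

```rust
use std::collections::HashMap;

/// Hypothetical library record.
#[derive(Debug, Clone)]
pub struct Library {
    pub id: u32,
    pub name: String,
}

/// Hypothetical slice-taking function, standing in for the health-map builder.
pub fn health_map(libraries: &[Library]) -> HashMap<u32, bool> {
    libraries
        .iter()
        .map(|lib| (lib.id, !lib.name.is_empty()))
        .collect()
}

/// Single-library case: borrow the value as a one-element slice instead of
/// writing `&[library.clone()]`.
pub fn single_library_health(library: &Library) -> HashMap<u32, bool> {
    health_map(std::slice::from_ref(library))
}
```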

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The history-truncation budget assumed an 8192-token context whenever a
chat request omitted num_ctx, while the llama-swap chat slots serve
20k-131k. Replayed transcripts past ~6k tokens were silently gutted
every turn — losing conversation history and destroying llama.cpp
KV-cache prefix reuse (full SWA re-prefill per turn).

Default is now 32768 (real conversations top out around 16k), with
AGENTIC_CHAT_DEFAULT_NUM_CTX to override per deploy, floored at
headroom + 1024. Explicit per-request num_ctx still wins.
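
One way the resolution order above might be expressed. The function names and the headroom parameter are assumptions, and applying the floor only to the default path (not to an explicit per-request value) is one reading of "explicit per-request num_ctx still wins".

```rust
/// Built-in default when neither the request nor the environment sets a value.
pub const FALLBACK_NUM_CTX: u32 = 32_768;
/// The default budget never drops below headroom plus this margin.
pub const MIN_MARGIN_ABOVE_HEADROOM: u32 = 1_024;

/// Resolve the deploy-level default. `env_value` is the raw value of
/// AGENTIC_CHAT_DEFAULT_NUM_CTX, if set; unparsable values are ignored.
pub fn default_num_ctx(env_value: Option<&str>, headroom: u32) -> u32 {
    let configured = env_value
        .and_then(|v| v.trim().parse::<u32>().ok())
        .unwrap_or(FALLBACK_NUM_CTX);
    configured.max(headroom.saturating_add(MIN_MARGIN_ABOVE_HEADROOM))
}

/// An explicit per-request num_ctx takes precedence over the default.
pub fn effective_num_ctx(
    request_num_ctx: Option<u32>,
    env_value: Option<&str>,
    headroom: u32,
) -> u32 {
    request_num_ctx.unwrap_or_else(|| default_num_ctx(env_value, headroom))
}
```

A caller would typically pass `std::env::var("AGENTIC_CHAT_DEFAULT_NUM_CTX").ok().as_deref()` as `env_value`; taking the raw string as a parameter keeps the logic testable without touching process environment.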

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
cameron merged commit c78e751743 into master 2026-06-10 19:01:15 +00:00
cameron deleted branch feature/insight-history 2026-06-10 19:01:15 +00:00

Reference: Apps/ImageApi#104