docs: chat continuation endpoints + env vars

Document the four new chat endpoints, SSE event shape, backend routing rules, rewind semantics, amend mode, and the AGENTIC_CHAT_MAX_ITERATIONS cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:32:43 -04:00
parent 079cd4c5b9
commit e51cd564a3
2 changed files with 85 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -169,6 +169,20 @@ POST   /image/tags/batch (bulk tag updates)

 // Memories (week-based grouping)
 GET /memories?path=...&recursive=true
+
+// AI Insights
+POST /insights/generate              (non-agentic single-shot)
+POST /insights/generate/agentic      (tool-calling loop; body: { file_path, backend?, model?, ... })
+GET  /insights?path=...&library=...
+GET  /insights/models                (local Ollama models + capabilities)
+GET  /insights/openrouter/models     (curated OpenRouter allowlist)
+POST /insights/rate                  (thumbs up/down for training data)
+
+// Insight Chat Continuation
+POST /insights/chat                  (single-turn reply, non-streaming)
+POST /insights/chat/stream           (SSE: iteration_start / text / tool_call / tool_result / truncated / done / error_message)
+GET  /insights/chat/history?path=... (rendered transcript with tool invocations)
+POST /insights/chat/rewind           (truncate transcript at a rendered index)
 ```
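
A minimal Rust sketch of the SSE framing used by `POST /insights/chat/stream`, using the event names documented for that endpoint. `ChatEvent` and `encode_sse` are hypothetical names, not the server's code. The sketch assumes `error_message` is sent as its own event name and treats each payload as an opaque string (likely JSON). It only shows standard SSE framing: an `event:` line, one `data:` line per payload line, and a blank line ending the frame.

```rust
/// Event names emitted on the chat stream (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChatEvent {
    IterationStart,
    Text,
    ToolCall,
    ToolResult,
    Truncated,
    Done,
    ErrorMessage,
}

impl ChatEvent {
    /// Wire name written into the SSE `event:` field.
    pub fn name(self) -> &'static str {
        match self {
            ChatEvent::IterationStart => "iteration_start",
            ChatEvent::Text => "text",
            ChatEvent::ToolCall => "tool_call",
            ChatEvent::ToolResult => "tool_result",
            ChatEvent::Truncated => "truncated",
            ChatEvent::Done => "done",
            ChatEvent::ErrorMessage => "error_message",
        }
    }
}

/// Encode one SSE frame. A multi-line payload becomes one `data:` line per
/// line, and the trailing blank line tells the client the frame is complete.
pub fn encode_sse(event: ChatEvent, payload: &str) -> String {
    let mut frame = format!("event: {}\n", event.name());
    for line in payload.split('\n') {
        frame.push_str("data: ");
        frame.push_str(line);
        frame.push('\n');
    }
    frame.push('\n');
    frame
}
```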

 **Request Types:**
@@ -269,6 +283,9 @@ OPENROUTER_BASE_URL=https://openrouter.ai/api/v1     # Override base URL (option
 OPENROUTER_EMBEDDING_MODEL=openai/text-embedding-3-small  # Optional, embeddings stay local today
 OPENROUTER_HTTP_REFERER=https://your-site.example    # Optional attribution header
 OPENROUTER_APP_TITLE=ImageApi                  # Optional attribution header
+
+# Insight Chat Continuation
+AGENTIC_CHAT_MAX_ITERATIONS=6                  # Cap on tool-calling iterations per chat turn (default 6)
 ```

 **AI Insights Fallback Behavior:**
@@ -297,6 +314,56 @@ This allows runtime verification of model availability before generating insight
 - `GET /insights/openrouter/models` returns `{ models, default_model, configured }`
  for client picker UIs.

+**Insight Chat Continuation:**
+
+After an agentic insight is generated, the full `Vec<ChatMessage>` transcript is
+stored in `photo_insights.training_messages` and can be continued via the
+chat endpoints. The `PhotoInsightResponse.has_training_messages` flag tells
+clients whether chat is available for a given insight.
+
+- `POST /insights/chat` runs one turn of the agentic loop against the replayed
+  history. Body: `{ file_path, library?, user_message, model?, backend?, num_ctx?,
+  temperature?, top_p?, top_k?, min_p?, max_iterations?, amend? }`.
+- `POST /insights/chat/stream` is the SSE variant — same request body, response
+  is `text/event-stream` with events: `iteration_start`, `text` (delta), `tool_call`,
+  `tool_result`, `truncated`, `done`, plus a server-emitted `error_message` on
+  failure. Preferred by the mobile client for live tool-chip updates.
+- `GET /insights/chat/history?path=...&library=...` returns the rendered
+  transcript. Each assistant message carries a `tools: [{name, arguments, result,
+  result_truncated?}]` array listing the tool invocations that led up to it.
+  Tool results longer than 2000 chars are truncated and flagged with
+  `result_truncated: true`.
+- `POST /insights/chat/rewind` truncates the transcript at a given rendered
+  index, dropping that message, any tool-call scaffolding that preceded it, and
+  all later turns. Index 0 is protected and cannot be rewound. Used for "try
+  again from here" flows.
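
The history rendering and rewind rules can be modeled on a simplified transcript. This is a minimal sketch, assuming a hypothetical `Msg` enum in place of the real `ChatMessage`, one tool call per assistant scaffolding message, a system prompt that is never rendered, and the 2000-char limit counted in Unicode scalar values. `render` attaches pending tool invocations to the next visible assistant reply and records where that reply's scaffolding begins, so `rewind` can cut the raw history at that point.

```rust
/// Rendered tool results are cut to this many chars.
const MAX_RESULT_CHARS: usize = 2000;

/// Hypothetical stand-in for the stored `ChatMessage`.
#[derive(Debug, Clone, PartialEq)]
pub enum Msg {
    System(String),
    User(String),
    /// Assistant scaffolding that only requests a tool.
    ToolCall { name: String, arguments: String },
    /// Tool output answering the preceding call.
    ToolResult(String),
    /// Assistant reply with visible text.
    Assistant(String),
}

#[derive(Debug, PartialEq)]
pub struct ToolInvocation {
    pub name: String,
    pub arguments: String,
    pub result: String,
    pub result_truncated: bool,
}

#[derive(Debug, PartialEq)]
pub struct Rendered {
    pub role: &'static str,
    pub content: String,
    pub tools: Vec<ToolInvocation>,
    /// Raw index where this message, including its scaffolding, starts.
    pub raw_start: usize,
}

/// Keep the first `max` chars, reporting whether anything was cut.
fn truncate_chars(s: &str, max: usize) -> (String, bool) {
    match s.char_indices().nth(max) {
        Some((byte_idx, _)) => (s[..byte_idx].to_string(), true),
        None => (s.to_string(), false),
    }
}

pub fn render(history: &[Msg]) -> Vec<Rendered> {
    let mut out = Vec::new();
    let mut pending: Vec<ToolInvocation> = Vec::new();
    let mut scaffold_start: Option<usize> = None;
    for (i, msg) in history.iter().enumerate() {
        match msg {
            Msg::System(_) => {} // never rendered
            Msg::User(text) => out.push(Rendered {
                role: "user",
                content: text.clone(),
                tools: Vec::new(),
                raw_start: i,
            }),
            Msg::ToolCall { name, arguments } => {
                if scaffold_start.is_none() {
                    scaffold_start = Some(i);
                }
                pending.push(ToolInvocation {
                    name: name.clone(),
                    arguments: arguments.clone(),
                    result: String::new(),
                    result_truncated: false,
                });
            }
            Msg::ToolResult(result) => {
                if let Some(tool) = pending.last_mut() {
                    let (text, cut) = truncate_chars(result, MAX_RESULT_CHARS);
                    tool.result = text;
                    tool.result_truncated = cut;
                }
            }
            Msg::Assistant(text) => out.push(Rendered {
                role: "assistant",
                content: text.clone(),
                tools: std::mem::take(&mut pending),
                raw_start: scaffold_start.take().unwrap_or(i),
            }),
        }
    }
    out
}

/// Drop the rendered message at `index`, its scaffolding, and everything after.
pub fn rewind(history: &mut Vec<Msg>, index: usize) -> Result<(), String> {
    if index == 0 {
        return Err("index 0 is protected".to_string());
    }
    let start = render(history)
        .get(index)
        .map(|m| m.raw_start)
        .ok_or_else(|| format!("no rendered message at index {index}"))?;
    history.truncate(start);
    Ok(())
}
```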
+
+Backend routing rules (matches agentic-insight generation):
+- Stored `backend` on the insight row is authoritative by default.
+- `request.backend` may override per-turn. `local -> hybrid` is rejected in
+  v1 (would require on-the-fly visual-description rewrite); `hybrid -> local`
+  replays verbatim since the description is already inlined as text.
+- `request.model` overrides the chat model (an Ollama id in local mode, an
+  OpenRouter id in hybrid mode).
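
A minimal sketch of the backend routing rules. `Backend` and `resolve_backend` are hypothetical names for illustration, not identifiers from the codebase.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Local,
    Hybrid,
}

/// Pick the backend for one chat turn from the stored row value and an
/// optional per-request override.
pub fn resolve_backend(stored: Backend, requested: Option<Backend>) -> Result<Backend, String> {
    match (stored, requested) {
        // No override: the stored backend is authoritative.
        (s, None) => Ok(s),
        // Rejected in v1: would need an on-the-fly visual-description rewrite.
        (Backend::Local, Some(Backend::Hybrid)) => {
            Err("local -> hybrid is not supported".to_string())
        }
        // Same backend, or hybrid -> local (description is already inlined text).
        (_, Some(r)) => Ok(r),
    }
}
```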
+
+Persistence:
+- Append mode (default): re-serialize the full history and `UPDATE` the same
+  row's `training_messages`.
+- Amend mode (`amend: true`): regenerate the title, then insert a new insight
+  row via `store_insight`, which automatically sets `is_current=false` on prior
+  rows. The response surfaces the new row's id as `amended_insight_id`.
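
A minimal in-memory sketch of the two persistence modes. `Store`, `InsightRow`, and `persist_turn` are hypothetical. The `store_insight` method here only mimics the documented `is_current` flip, keys rows by `file_path` alone (the real table presumably also scopes by library), and takes the regenerated title as an argument instead of producing it with the model.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct InsightRow {
    pub id: u64,
    pub file_path: String,
    pub title: String,
    pub training_messages: String,
    pub is_current: bool,
}

#[derive(Debug, Default)]
pub struct Store {
    pub rows: Vec<InsightRow>,
    next_id: u64,
}

impl Store {
    /// Mimics `store_insight`: the new row becomes current and earlier rows
    /// for the same file are flipped to `is_current = false`.
    pub fn store_insight(&mut self, file_path: &str, title: &str, messages_json: &str) -> u64 {
        for row in self.rows.iter_mut().filter(|r| r.file_path == file_path) {
            row.is_current = false;
        }
        self.next_id += 1;
        let id = self.next_id;
        self.rows.push(InsightRow {
            id,
            file_path: file_path.to_string(),
            title: title.to_string(),
            training_messages: messages_json.to_string(),
            is_current: true,
        });
        id
    }

    /// Persist a finished turn. Append mode rewrites the same row in place;
    /// amend mode inserts a new row and returns its id (`amended_insight_id`).
    pub fn persist_turn(
        &mut self,
        row_id: u64,
        messages_json: &str,
        amend: bool,
        new_title: &str,
    ) -> Result<Option<u64>, String> {
        let row = self
            .rows
            .iter_mut()
            .find(|r| r.id == row_id)
            .ok_or("unknown insight row")?;
        if amend {
            let path = row.file_path.clone();
            Ok(Some(self.store_insight(&path, new_title, messages_json)))
        } else {
            row.training_messages = messages_json.to_string();
            Ok(None)
        }
    }
}
```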
+
+Per-`(library_id, file_path)` async mutex (`AppState.insight_chat.chat_locks`)
+serializes concurrent turns on the same insight so writes to the
+`training_messages` JSON blob don't race.
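
A sketch of a per-insight lock map. The real `chat_locks` uses an async mutex; this version uses `std::sync::Mutex` so it runs without a runtime (with Tokio, the inner lock would be a `tokio::sync::Mutex` so its guard can be held across `.await`). `ChatLocks`, `lock_for`, and the `i64` library id type are assumptions.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// (library_id, file_path); the id type is assumed.
type ChatKey = (i64, String);

#[derive(Default)]
pub struct ChatLocks {
    locks: Mutex<HashMap<ChatKey, Arc<Mutex<()>>>>,
}

impl ChatLocks {
    /// Return the lock for one insight, creating it on first use. The outer
    /// map lock is held only long enough to look up or insert the entry.
    pub fn lock_for(&self, library_id: i64, file_path: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        let lock = map
            .entry((library_id, file_path.to_string()))
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone();
        lock
    }
}
```

A handler would hold the inner guard for the whole turn, from replaying history to writing `training_messages`, so a second turn on the same insight waits instead of interleaving.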
+
+Context management is a soft bound: if the serialized history exceeds
+`num_ctx - 2048` tokens (estimated with a cheap ~4 bytes/token heuristic), the
+oldest assistant tool_call/tool_result pairs are dropped until under budget.
+The initial user message (with any images) and system prompt are always
+preserved. When a drop occurs, the client is notified via the `truncated` SSE
+event or the `truncated` flag in the non-streaming response.
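
A minimal sketch of the trimming loop. `Role`, `Message`, and `trim_to_budget` are hypothetical, the estimate sums message content bytes rather than the serialized JSON the server measures, and it assumes each tool_call message is immediately followed by its single tool_result. Because only such pairs are ever dropped, the system prompt and the initial user message are never candidates.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    System,
    User,
    AssistantToolCall,
    ToolResult,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub content: String,
}

/// Cheap token estimate: roughly 4 bytes per token.
fn estimate_tokens(history: &[Message]) -> usize {
    history.iter().map(|m| m.content.len()).sum::<usize>() / 4
}

/// Drop the oldest tool_call/tool_result pairs until the estimate fits under
/// `num_ctx - 2048`. Returns true if anything was dropped.
pub fn trim_to_budget(history: &mut Vec<Message>, num_ctx: usize) -> bool {
    let budget = num_ctx.saturating_sub(2048);
    let mut truncated = false;
    while estimate_tokens(history) > budget {
        // Oldest assistant tool_call immediately followed by its tool_result.
        let pair = history.windows(2).position(|w| {
            w[0].role == Role::AssistantToolCall && w[1].role == Role::ToolResult
        });
        match pair {
            Some(i) => {
                history.drain(i..i + 2);
                truncated = true;
            }
            // Nothing droppable is left; this is a soft bound, so stop here.
            None => break,
        }
    }
    truncated
}
```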
+
+Configurable env:
+- `AGENTIC_CHAT_MAX_ITERATIONS` — cap on tool-calling iterations per turn
+  (default 6). Per-request `max_iterations` is clamped to this cap.
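
A sketch of the cap and clamp. It assumes that an unset or unparseable variable falls back to the default of 6, and that a request without `max_iterations` gets the full cap; the doc states only the default and the clamp. The function names are hypothetical.

```rust
use std::env;

const DEFAULT_MAX_ITERATIONS: usize = 6;

/// Read AGENTIC_CHAT_MAX_ITERATIONS, falling back to the default when the
/// variable is unset or not a valid non-negative integer.
pub fn max_iterations_cap() -> usize {
    env::var("AGENTIC_CHAT_MAX_ITERATIONS")
        .ok()
        .and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_MAX_ITERATIONS)
}

/// Clamp a per-request `max_iterations` to the cap (assumed: missing -> cap).
pub fn effective_iterations(requested: Option<usize>, cap: usize) -> usize {
    requested.map_or(cap, |n| n.min(cap))
}
```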
+
 ## Dependencies of Note

 - **actix-web**: HTTP framework