diff --git a/CLAUDE.md b/CLAUDE.md
index 0849e41..004569e 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -169,6 +169,20 @@ POST /image/tags/batch (bulk tag updates)
 // Memories (week-based grouping)
 GET /memories?path=...&recursive=true
+
+// AI Insights
+POST /insights/generate (non-agentic single-shot)
+POST /insights/generate/agentic (tool-calling loop; body: { file_path, backend?, model?, ... })
+GET /insights?path=...&library=...
+GET /insights/models (local Ollama models + capabilities)
+GET /insights/openrouter/models (curated OpenRouter allowlist)
+POST /insights/rate (thumbs up/down for training data)
+
+// Insight Chat Continuation
+POST /insights/chat (single-turn reply, non-streaming)
+POST /insights/chat/stream (SSE: text / tool_call / tool_result / truncated / done)
+GET /insights/chat/history?path=... (rendered transcript with tool invocations)
+POST /insights/chat/rewind (truncate transcript at a rendered index)
 ```
 
 **Request Types:**
@@ -269,6 +283,9 @@ OPENROUTER_BASE_URL=https://openrouter.ai/api/v1 # Override base URL (option
 OPENROUTER_EMBEDDING_MODEL=openai/text-embedding-3-small # Optional, embeddings stay local today
 OPENROUTER_HTTP_REFERER=https://your-site.example # Optional attribution header
 OPENROUTER_APP_TITLE=ImageApi # Optional attribution header
+
+# Insight Chat Continuation
+AGENTIC_CHAT_MAX_ITERATIONS=6 # Cap on tool-calling iterations per chat turn (default 6)
 ```
 
 **AI Insights Fallback Behavior:**
@@ -297,6 +314,56 @@ This allows runtime verification of model availability before generating insight
 - `GET /insights/openrouter/models` returns `{ models, default_model, configured }` for client picker UIs.
+
+**Insight Chat Continuation:**
+
+After an agentic insight is generated, the full `Vec` transcript is
+stored in `photo_insights.training_messages` and can be continued via the
+chat endpoints. The `PhotoInsightResponse.has_training_messages` flag tells
+clients whether chat is available for a given insight.
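The `/insights/chat/stream` endpoint above emits standard SSE frames (`event:` / `data:` field lines separated by a blank line) carrying the `text`, `tool_call`, `tool_result`, `truncated`, and `done` events. A minimal client-side framing parse, as a sketch — the `SseEvent` shape and `parse_sse` name are illustrative, and a real client should prefer a proper SSE library:

```rust
/// One parsed frame from an SSE body (illustrative shape, not the server's types).
#[derive(Debug, PartialEq)]
struct SseEvent {
    event: String,
    data: String,
}

/// Minimal SSE framing: frames are separated by a blank line; each frame
/// carries `event:` and `data:` field lines. Multi-line data is rejoined.
fn parse_sse(body: &str) -> Vec<SseEvent> {
    body.split("\n\n")
        .filter(|f| !f.trim().is_empty())
        .map(|frame| {
            // Per the SSE spec, a frame without an explicit event is "message".
            let mut ev = SseEvent { event: "message".into(), data: String::new() };
            for line in frame.lines() {
                if let Some(v) = line.strip_prefix("event:") {
                    ev.event = v.trim().to_string();
                } else if let Some(v) = line.strip_prefix("data:") {
                    if !ev.data.is_empty() {
                        ev.data.push('\n');
                    }
                    ev.data.push_str(v.trim_start());
                }
            }
            ev
        })
        .collect()
}

fn main() {
    let body = "event: text\ndata: Hel\n\nevent: text\ndata: lo\n\nevent: done\ndata: {}\n\n";
    let events = parse_sse(body);
    assert_eq!(events.len(), 3);
    assert_eq!(events[0].event, "text");
    assert_eq!(events[0].data, "Hel");
    assert_eq!(events[2].event, "done");
}
```

The `data` payload format for each event type is not assumed here; only the framing is shown.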
+
+- `POST /insights/chat` runs one turn of the agentic loop against the replayed
+  history. Body: `{ file_path, library?, user_message, model?, backend?, num_ctx?,
+  temperature?, top_p?, top_k?, min_p?, max_iterations?, amend? }`.
+- `POST /insights/chat/stream` is the SSE variant — same request body, response
+  is `text/event-stream` with events: `iteration_start`, `text` (delta), `tool_call`,
+  `tool_result`, `truncated`, `done`, plus a server-emitted `error_message` on
+  failure. Preferred by the mobile client for live tool-chip updates.
+- `GET /insights/chat/history?path=...&library=...` returns the rendered
+  transcript. Each assistant message carries a `tools: [{name, arguments, result,
+  result_truncated?}]` array with the tool invocations that led up to it. Tool
+  results over 2000 chars are truncated with `result_truncated: true`.
+- `POST /insights/chat/rewind` truncates the transcript at a given rendered
+  index (drops that message + any tool-call scaffolding that preceded it + all
+  later turns). Index 0 is protected. Used for "try again from here" flows.
+
+Backend routing rules (matches agentic-insight generation):
+- Stored `backend` on the insight row is authoritative by default.
+- `request.backend` may override per-turn. `local -> hybrid` is rejected in
+  v1 (would require on-the-fly visual-description rewrite); `hybrid -> local`
+  replays verbatim since the description is already inlined as text.
+- `request.model` overrides the chat model (an Ollama id in local mode, an
+  OpenRouter id in hybrid mode).
+
+Persistence:
+- Append mode (default): re-serialize the full history and `UPDATE` the same
+  row's `training_messages`.
+- Amend mode (`amend: true`): regenerate the title, insert a new insight row
+  via `store_insight` (auto-flips prior rows' `is_current=false`). Response
+  surfaces the new row's id as `amended_insight_id`.
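The backend routing rules above reduce to a small per-turn resolver. This is an illustration of the documented behavior only — `Backend` and `resolve_backend` are illustrative names, not the server's actual types:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Backend {
    Local,  // fully local generation
    Hybrid, // visual description inlined as text + OpenRouter chat model
}

/// Sketch of the per-turn routing rule: the backend stored on the insight row
/// is authoritative unless the request overrides it, and a `local -> hybrid`
/// switch is rejected in v1.
fn resolve_backend(stored: Backend, requested: Option<Backend>) -> Result<Backend, String> {
    match requested {
        // No override: the stored backend wins.
        None => Ok(stored),
        // local -> hybrid would require rewriting the visual description on the fly.
        Some(Backend::Hybrid) if stored == Backend::Local => {
            Err("local -> hybrid is not supported in v1".to_string())
        }
        // hybrid -> local replays verbatim: the description is already inline text.
        Some(b) => Ok(b),
    }
}

fn main() {
    assert_eq!(resolve_backend(Backend::Hybrid, None), Ok(Backend::Hybrid));
    assert_eq!(resolve_backend(Backend::Hybrid, Some(Backend::Local)), Ok(Backend::Local));
    assert!(resolve_backend(Backend::Local, Some(Backend::Hybrid)).is_err());
}
```

Keeping the rejection on the `local -> hybrid` arm (rather than silently falling back) matches the doc's v1 constraint: the caller learns the switch is unsupported instead of getting a degraded replay.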
+
+Per-`(library_id, file_path)` async mutex (`AppState.insight_chat.chat_locks`)
+serialises concurrent turns on the same insight so the JSON blob doesn't race.
+
+Context management is a soft bound: if the serialized history exceeds
+`num_ctx - 2048` tokens (cheap 4-byte/token heuristic), the oldest
+assistant-tool_call + tool_result pairs are dropped until under budget. The
+initial user message (with any images) and system prompt are always preserved.
+The `truncated` event / flag is surfaced to the client when a drop occurred.
+
+Configurable env:
+- `AGENTIC_CHAT_MAX_ITERATIONS` — cap on tool-calling iterations per turn
+  (default 6). Per-request `max_iterations` is clamped to this cap.
+
 ## Dependencies of Note
 
 - **actix-web**: HTTP framework
diff --git a/README.md b/README.md
index 1bd3c9a..b625b04 100644
--- a/README.md
+++ b/README.md
@@ -83,6 +83,24 @@ as a 4xx from the chat call. Pick tool-capable models.
 - Controls how many times the model can invoke tools before being forced to produce a final answer
 - Increase for more thorough context gathering; decrease to limit response time
+
+#### Insight Chat Continuation
+After an agentic insight is generated, the conversation can be continued. Endpoints:
+- `POST /insights/chat` — single-turn reply (non-streaming)
+- `POST /insights/chat/stream` — SSE variant with live `text` deltas and
+  `tool_call` / `tool_result` events. Mobile client uses this.
+- `GET /insights/chat/history?path=...&library=...` — rendered transcript;
+  each assistant message carries a `tools: [{name, arguments, result}]` array
+- `POST /insights/chat/rewind` — truncate transcript at a rendered index
+  (drops that message + any preceding tool scaffolding + later turns). Used
+  for "try again from here" flows. The initial user message is protected.
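The soft context bound described above (estimate tokens cheaply, evict the oldest tool exchanges first, never touch the system prompt or initial user message) can be sketched in a few lines. `Msg`, `Role`, and `enforce_budget` are illustrative names, not the server's actual types:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Role { System, User, Assistant, Tool }

#[derive(Clone, Debug)]
struct Msg { role: Role, content: String }

/// Cheap token estimate: roughly 4 bytes per token.
fn approx_tokens(history: &[Msg]) -> usize {
    history.iter().map(|m| m.content.len() / 4 + 1).sum()
}

/// Drop the oldest assistant/tool messages until the history fits inside
/// `num_ctx - 2048` estimated tokens. System and user messages are never
/// evicted. Returns true if anything was dropped (the `truncated` flag).
fn enforce_budget(history: &mut Vec<Msg>, num_ctx: usize) -> bool {
    let budget = num_ctx.saturating_sub(2048);
    let mut dropped = false;
    while approx_tokens(history) > budget {
        // Oldest droppable message; skips the system prompt and user turns.
        let idx = match history
            .iter()
            .position(|m| matches!(m.role, Role::Assistant | Role::Tool))
        {
            Some(i) => i,
            None => break, // only protected messages remain
        };
        history.remove(idx);
        // Drop the paired tool result if it directly follows the tool call.
        if history.get(idx).map_or(false, |m| m.role == Role::Tool) {
            history.remove(idx);
        }
        dropped = true;
    }
    dropped
}

fn main() {
    let mut h = vec![
        Msg { role: Role::System, content: "system prompt".repeat(8) },
        Msg { role: Role::User, content: "initial user message".repeat(5) },
        Msg { role: Role::Assistant, content: "tool call".repeat(500) },
        Msg { role: Role::Tool, content: "tool result".repeat(500) },
        Msg { role: Role::Assistant, content: "final answer".into() },
    ];
    let truncated = enforce_budget(&mut h, 2200);
    assert!(truncated);
    assert_eq!(h.len(), 3); // oldest tool-call/tool-result pair evicted
    assert_eq!(h[0].role, Role::System);
    assert_eq!(h[1].role, Role::User);
}
```

The boolean return mirrors the `truncated` event/flag surfaced to the client when a drop occurred.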
+
+Amend mode (`amend: true` in the chat request body) regenerates the insight's
+title and inserts a new row instead of appending to the existing transcript,
+so you can rewrite the saved summary from within chat.
+
+- `AGENTIC_CHAT_MAX_ITERATIONS` - Cap on tool-calling iterations per chat turn [default: `6`]
+  - Per-request `max_iterations` (when sent by the client) is clamped to this cap
+
+#### Fallback Behavior
+- Primary server is tried first with 5-second connection timeout
+- On failure, automatically falls back to secondary server (if configured)
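The iteration cap composes with the per-request override as a simple clamp; a sketch under the documented rule (`effective_max_iterations` is an illustrative name):

```rust
/// Default for AGENTIC_CHAT_MAX_ITERATIONS when the env var is unset.
const DEFAULT_CAP: u32 = 6;

/// Per-request `max_iterations` is clamped to the server-configured cap;
/// when absent, the cap itself is used.
fn effective_max_iterations(requested: Option<u32>, cap: u32) -> u32 {
    requested.unwrap_or(cap).min(cap)
}

fn main() {
    assert_eq!(effective_max_iterations(None, DEFAULT_CAP), 6);
    assert_eq!(effective_max_iterations(Some(12), DEFAULT_CAP), 6); // clamped down
    assert_eq!(effective_max_iterations(Some(3), DEFAULT_CAP), 3);  // lower is allowed
}
```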