docs: chat continuation endpoints + env vars

Document the four new chat endpoints, SSE event shape, backend
routing rules, rewind semantics, amend mode, and the
AGENTIC_CHAT_MAX_ITERATIONS cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cameron
2026-04-21 17:32:43 -04:00
parent 079cd4c5b9
commit e51cd564a3
2 changed files with 85 additions and 0 deletions

@@ -83,6 +83,24 @@ as a 4xx from the chat call. Pick tool-capable models.
- Controls how many times the model can invoke tools before being forced to produce a final answer
- Increase for more thorough context gathering; decrease to limit response time
#### Insight Chat Continuation
After an agentic insight is generated, the conversation can be continued. Endpoints:
- `POST /insights/chat` — single-turn reply (non-streaming)
- `POST /insights/chat/stream` — streaming SSE variant emitting live `text`
deltas plus `tool_call` / `tool_result` events; used by the mobile client
- `GET /insights/chat/history?path=...&library=...` — rendered transcript;
each assistant message carries a `tools: [{name, arguments, result}]` array
- `POST /insights/chat/rewind` — truncate the transcript at a rendered index
(drops that message, any preceding tool scaffolding, and all later turns).
Used for "try again from here" flows. The initial user message is protected.
Amend mode (`amend: true` in the chat request body) regenerates the insight's
title and inserts a new row instead of appending to the existing transcript,
so you can rewrite the saved summary from within chat.
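A minimal sketch of a continuation body versus an amend body. Only `amend` is documented above; `path` and `library` are borrowed from the history endpoint's query parameters, and `message` is an assumed field name:

```python
import json

# Hypothetical request bodies; field names other than `amend` are assumptions.
continue_body = {
    "path": "notes/2024-06.md",   # placeholder note path
    "library": "personal",        # placeholder library name
    "message": "Expand on the second point.",
}
# Same turn, but regenerate the saved insight instead of appending a reply.
amend_body = {**continue_body, "amend": True}

print(json.dumps(amend_body))
```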
- `AGENTIC_CHAT_MAX_ITERATIONS` - Cap on tool-calling iterations per chat turn [default: `6`]
- Per-request `max_iterations` (when sent by the client) is clamped to this cap
#### Fallback Behavior
- Primary server is tried first with 5-second connection timeout
- On failure, automatically falls back to secondary server (if configured)
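The routing rule above can be sketched as a simple try-in-order loop; `try_server` (and the `fake_try` stand-in) are hypothetical placeholders for the real HTTP call with its 5-second connection timeout:

```python
PRIMARY_TIMEOUT_S = 5.0  # connection timeout from the docs

def route(servers, try_server):
    """Try each server in order; return the first success, else re-raise."""
    last_err = None
    for url in servers:
        try:
            return try_server(url, timeout=PRIMARY_TIMEOUT_S)
        except ConnectionError as err:
            last_err = err  # fall through to the next server
    raise last_err

def fake_try(url, timeout):
    # Stand-in for the real request: primary is down, secondary answers.
    if "primary" in url:
        raise ConnectionError("primary unreachable")
    return f"ok from {url}"

print(route(["http://primary:8000", "http://backup:8000"], fake_try))
```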