docs: chat continuation endpoints + env vars
Document the four new chat endpoints, SSE event shape, backend routing rules, rewind semantics, amend mode, and the AGENTIC_CHAT_MAX_ITERATIONS cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
67
CLAUDE.md
@@ -169,6 +169,20 @@ POST /image/tags/batch (bulk tag updates)

// Memories (week-based grouping)
GET /memories?path=...&recursive=true

// AI Insights
POST /insights/generate (non-agentic single-shot)
POST /insights/generate/agentic (tool-calling loop; body: { file_path, backend?, model?, ... })
GET /insights?path=...&library=...
GET /insights/models (local Ollama models + capabilities)
GET /insights/openrouter/models (curated OpenRouter allowlist)
POST /insights/rate (thumbs up/down for training data)

// Insight Chat Continuation
POST /insights/chat (single-turn reply, non-streaming)
POST /insights/chat/stream (SSE: text / tool_call / tool_result / truncated / done)
GET /insights/chat/history?path=... (rendered transcript with tool invocations)
POST /insights/chat/rewind (truncate transcript at a rendered index)
```

**Request Types:**

@@ -269,6 +283,9 @@ OPENROUTER_BASE_URL=https://openrouter.ai/api/v1 # Override base URL (option
OPENROUTER_EMBEDDING_MODEL=openai/text-embedding-3-small # Optional, embeddings stay local today
OPENROUTER_HTTP_REFERER=https://your-site.example # Optional attribution header
OPENROUTER_APP_TITLE=ImageApi # Optional attribution header

# Insight Chat Continuation
AGENTIC_CHAT_MAX_ITERATIONS=6 # Cap on tool-calling iterations per chat turn (default 6)
```

**AI Insights Fallback Behavior:**

@@ -297,6 +314,56 @@ This allows runtime verification of model availability before generating insight
- `GET /insights/openrouter/models` returns `{ models, default_model, configured }`
  for client picker UIs.

**Insight Chat Continuation:**

After an agentic insight is generated, the full `Vec<ChatMessage>` transcript is
stored in `photo_insights.training_messages` and can be continued via the
chat endpoints. The `PhotoInsightResponse.has_training_messages` flag tells
clients whether chat is available for a given insight.

- `POST /insights/chat` runs one turn of the agentic loop against the replayed
  history. Body: `{ file_path, library?, user_message, model?, backend?, num_ctx?,
  temperature?, top_p?, top_k?, min_p?, max_iterations?, amend? }`.
- `POST /insights/chat/stream` is the SSE variant — same request body; the response
  is `text/event-stream` with events `iteration_start`, `text` (delta), `tool_call`,
  `tool_result`, `truncated`, and `done`, plus a server-emitted `error_message` on
  failure. The mobile client prefers this endpoint for live tool-chip updates.
- `GET /insights/chat/history?path=...&library=...` returns the rendered
  transcript. Each assistant message carries a `tools: [{name, arguments, result,
  result_truncated?}]` array with the tool invocations that led up to it. Tool
  results over 2000 chars are truncated and marked with `result_truncated: true`.
- `POST /insights/chat/rewind` truncates the transcript at a given rendered
  index, dropping that message, any tool-call scaffolding that preceded it, and all
  later turns. Index 0 is protected. Used for "try again from here" flows.
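The rewind rule can be sketched in Rust; `Msg` and `rewind` here are illustrative stand-ins for the service's actual transcript types, not its real API:

```rust
// Sketch only: dropping the message at a rendered index also drops the
// tool-call scaffolding that produced it and everything after it.
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    System(String),
    User(String),
    AssistantToolCall(String),
    ToolResult(String),
    Assistant(String),
}

/// Truncate `history` so the rendered message at `rendered_index` (counting
/// only User/Assistant messages) and everything after it is removed.
/// Returns None when the index is 0 (protected) or out of range.
fn rewind(history: &[Msg], rendered_index: usize) -> Option<Vec<Msg>> {
    if rendered_index == 0 {
        return None; // index 0 is protected
    }
    // Map the rendered index to a raw index in the full transcript.
    let mut rendered = 0;
    let mut cut = None;
    for (i, m) in history.iter().enumerate() {
        if matches!(m, Msg::User(_) | Msg::Assistant(_)) {
            if rendered == rendered_index {
                cut = Some(i);
                break;
            }
            rendered += 1;
        }
    }
    let mut cut = cut?;
    // Walk back over the tool-call scaffolding that led up to the cut message.
    while cut > 0
        && matches!(history[cut - 1], Msg::AssistantToolCall(_) | Msg::ToolResult(_))
    {
        cut -= 1;
    }
    Some(history[..cut].to_vec())
}
```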

Backend routing rules (matches agentic-insight generation):
- Stored `backend` on the insight row is authoritative by default.
- `request.backend` may override per-turn. `local -> hybrid` is rejected in
  v1 (it would require an on-the-fly rewrite of the visual description);
  `hybrid -> local` replays verbatim, since the description is already inlined as text.
- `request.model` overrides the chat model (an Ollama id in local mode, an
  OpenRouter id in hybrid mode).
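These routing rules amount to a small decision table. A hedged sketch, with `Backend` and `resolve_backend` as hypothetical names:

```rust
// Illustrative stand-in for the service's backend enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Local,
    Hybrid,
}

/// Resolve the backend for one chat turn: the stored backend is authoritative
/// unless the request overrides it, and `local -> hybrid` is rejected in v1.
fn resolve_backend(stored: Backend, requested: Option<Backend>) -> Result<Backend, String> {
    match (stored, requested) {
        // No override: stored backend is the default.
        (s, None) => Ok(s),
        // local -> hybrid would need the visual description rewritten on the fly.
        (Backend::Local, Some(Backend::Hybrid)) => {
            Err("local -> hybrid not supported in v1".to_string())
        }
        // hybrid -> local replays verbatim; same-backend overrides are no-ops.
        (_, Some(r)) => Ok(r),
    }
}
```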

Persistence:
- Append mode (default): re-serialize the full history and `UPDATE` the same
  row's `training_messages`.
- Amend mode (`amend: true`): regenerate the title and insert a new insight row
  via `store_insight` (which auto-flips prior rows' `is_current` to false). The
  response surfaces the new row's id as `amended_insight_id`.

A per-`(library_id, file_path)` async mutex (`AppState.insight_chat.chat_locks`)
serialises concurrent turns on the same insight so the JSON blob doesn't race.
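A minimal sketch of such a per-insight lock table, using std `Mutex` in place of the async mutex the service uses (`ChatLocks` and its fields are illustrative names):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Simplified stand-in for `AppState.insight_chat.chat_locks`: hand out one
/// shared lock per (library_id, file_path) so turns on the same insight serialize.
struct ChatLocks {
    locks: Mutex<HashMap<(i64, String), Arc<Mutex<()>>>>,
}

impl ChatLocks {
    fn new() -> Self {
        Self { locks: Mutex::new(HashMap::new()) }
    }

    /// Get (or lazily create) the lock for one insight; a caller holds the
    /// returned mutex for the duration of its chat turn.
    fn lock_for(&self, library_id: i64, file_path: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry((library_id, file_path.to_string()))
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}
```

Two turns on the same `(library, path)` receive the same `Arc`, so the second blocks until the first finishes; turns on different insights proceed in parallel.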

Context management is a soft bound: if the serialized history exceeds
`num_ctx - 2048` tokens (a cheap 4-bytes-per-token heuristic), the oldest
assistant tool_call + tool_result pairs are dropped until the history fits the
budget. The initial user message (with any images) and the system prompt are
always preserved. The `truncated` event / flag is surfaced to the client
whenever a drop occurs.
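The drop loop can be sketched as follows; `Entry`, `estimated_tokens`, and `enforce_budget` are hypothetical names, and folding each tool_call + tool_result pair into one entry is a simplification:

```rust
// Sketch of the soft context bound: drop the oldest tool exchanges until the
// serialized history fits in `num_ctx - 2048` tokens (4-bytes-per-token estimate).
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    System(String),
    User(String),         // initial user message; never dropped
    ToolExchange(String), // an assistant tool_call + its tool result, paired
    Assistant(String),
}

fn estimated_tokens(history: &[Entry]) -> usize {
    let bytes: usize = history
        .iter()
        .map(|m| match m {
            Entry::System(s) | Entry::User(s) | Entry::ToolExchange(s) | Entry::Assistant(s) => {
                s.len()
            }
        })
        .sum();
    bytes / 4 // cheap ~4 bytes per token heuristic
}

/// Drop the oldest ToolExchange entries until the history fits the budget.
/// Returns true when anything was dropped (surfaced as the `truncated` flag).
fn enforce_budget(history: &mut Vec<Entry>, num_ctx: usize) -> bool {
    let budget = num_ctx.saturating_sub(2048);
    let mut truncated = false;
    while estimated_tokens(history) > budget {
        // Oldest droppable tool exchange; system + initial user always survive.
        match history.iter().position(|m| matches!(m, Entry::ToolExchange(_))) {
            Some(i) => {
                history.remove(i);
                truncated = true;
            }
            None => break, // nothing left to drop; the bound is soft
        }
    }
    truncated
}
```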

Configurable env:
- `AGENTIC_CHAT_MAX_ITERATIONS` — cap on tool-calling iterations per turn
  (default 6). Per-request `max_iterations` is clamped to this cap.
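A sketch of the cap-and-clamp behavior (function names are illustrative):

```rust
use std::env;

/// Read AGENTIC_CHAT_MAX_ITERATIONS, falling back to 6 when unset or unparsable.
fn max_iterations_cap() -> u32 {
    env::var("AGENTIC_CHAT_MAX_ITERATIONS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(6)
}

/// A per-request `max_iterations` is clamped to the cap; absent, the cap is used.
fn effective_iterations(requested: Option<u32>, cap: u32) -> u32 {
    requested.unwrap_or(cap).min(cap)
}
```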

## Dependencies of Note

- **actix-web**: HTTP framework