diff --git a/CLAUDE.md b/CLAUDE.md
index 0849e41..004569e 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -169,6 +169,20 @@ POST /image/tags/batch (bulk tag updates)
 // Memories (week-based grouping)
 GET /memories?path=...&recursive=true
+
+// AI Insights
+POST /insights/generate (non-agentic single-shot)
+POST /insights/generate/agentic (tool-calling loop; body: { file_path, backend?, model?, ... })
+GET /insights?path=...&library=...
+GET /insights/models (local Ollama models + capabilities)
+GET /insights/openrouter/models (curated OpenRouter allowlist)
+POST /insights/rate (thumbs up/down for training data)
+
+// Insight Chat Continuation
+POST /insights/chat (single-turn reply, non-streaming)
+POST /insights/chat/stream (SSE: text / tool_call / tool_result / truncated / done)
+GET /insights/chat/history?path=... (rendered transcript with tool invocations)
+POST /insights/chat/rewind (truncate transcript at a rendered index)
 ```
 
 **Request Types:**
@@ -269,6 +283,9 @@ OPENROUTER_BASE_URL=https://openrouter.ai/api/v1 # Override base URL (option
 OPENROUTER_EMBEDDING_MODEL=openai/text-embedding-3-small # Optional, embeddings stay local today
 OPENROUTER_HTTP_REFERER=https://your-site.example # Optional attribution header
 OPENROUTER_APP_TITLE=ImageApi # Optional attribution header
+
+# Insight Chat Continuation
+AGENTIC_CHAT_MAX_ITERATIONS=6 # Cap on tool-calling iterations per chat turn (default 6)
 ```
 
 **AI Insights Fallback Behavior:**
@@ -297,6 +314,56 @@ This allows runtime verification of model availability before generating insight
 - `GET /insights/openrouter/models` returns `{ models, default_model, configured }` for client picker UIs.
+
+**Insight Chat Continuation:**
+
+After an agentic insight is generated, the full `Vec` transcript is
+stored in `photo_insights.training_messages` and can be continued via the
+chat endpoints. The `PhotoInsightResponse.has_training_messages` flag tells
+clients whether chat is available for a given insight.
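The `/insights/chat/stream` endpoint above emits standard SSE frames (`event:` / `data:` field lines separated by a blank line) carrying the `text`, `tool_call`, `tool_result`, `truncated`, and `done` events. A minimal client-side framing parse, as a sketch — the `SseEvent` shape and `parse_sse` name are illustrative, and a real client should prefer a proper SSE library:

```rust
/// One parsed frame from an SSE body (illustrative shape, not the server's types).
#[derive(Debug, PartialEq)]
struct SseEvent {
    event: String,
    data: String,
}

/// Minimal SSE framing: frames are separated by a blank line; each frame
/// carries `event:` and `data:` field lines. Multi-line data is rejoined.
fn parse_sse(body: &str) -> Vec<SseEvent> {
    body.split("\n\n")
        .filter(|f| !f.trim().is_empty())
        .map(|frame| {
            // Per the SSE spec, a frame without an explicit event is "message".
            let mut ev = SseEvent { event: "message".into(), data: String::new() };
            for line in frame.lines() {
                if let Some(v) = line.strip_prefix("event:") {
                    ev.event = v.trim().to_string();
                } else if let Some(v) = line.strip_prefix("data:") {
                    if !ev.data.is_empty() {
                        ev.data.push('\n');
                    }
                    ev.data.push_str(v.trim_start());
                }
            }
            ev
        })
        .collect()
}

fn main() {
    let body = "event: text\ndata: Hel\n\nevent: text\ndata: lo\n\nevent: done\ndata: {}\n\n";
    let events = parse_sse(body);
    assert_eq!(events.len(), 3);
    assert_eq!(events[0].event, "text");
    assert_eq!(events[0].data, "Hel");
    assert_eq!(events[2].event, "done");
}
```

The `data` payload format for each event type is not assumed here; only the framing is shown.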
+
+- `POST /insights/chat` runs one turn of the agentic loop against the replayed
+  history. Body: `{ file_path, library?, user_message, model?, backend?, num_ctx?,
+  temperature?, top_p?, top_k?, min_p?, max_iterations?, amend? }`.
+- `POST /insights/chat/stream` is the SSE variant — same request body, response
+  is `text/event-stream` with events: `iteration_start`, `text` (delta), `tool_call`,
+  `tool_result`, `truncated`, `done`, plus a server-emitted `error_message` on
+  failure. Preferred by the mobile client for live tool-chip updates.
+- `GET /insights/chat/history?path=...&library=...` returns the rendered
+  transcript. Each assistant message carries a `tools: [{name, arguments, result,
+  result_truncated?}]` array with the tool invocations that led up to it. Tool
+  results over 2000 chars are truncated with `result_truncated: true`.
+- `POST /insights/chat/rewind` truncates the transcript at a given rendered
+  index (drops that message + any tool-call scaffolding that preceded it + all
+  later turns). Index 0 is protected. Used for "try again from here" flows.
+
+Backend routing rules (matches agentic-insight generation):
+- Stored `backend` on the insight row is authoritative by default.
+- `request.backend` may override per-turn. `local -> hybrid` is rejected in
+  v1 (would require on-the-fly visual-description rewrite); `hybrid -> local`
+  replays verbatim since the description is already inlined as text.
+- `request.model` overrides the chat model (an Ollama id in local mode, an
+  OpenRouter id in hybrid mode).
+
+Persistence:
+- Append mode (default): re-serialize the full history and `UPDATE` the same
+  row's `training_messages`.
+- Amend mode (`amend: true`): regenerate the title, insert a new insight row
+  via `store_insight` (auto-flips prior rows' `is_current=false`). Response
+  surfaces the new row's id as `amended_insight_id`.
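The backend routing rules above reduce to a small per-turn resolver. This is an illustration of the documented behavior only — `Backend` and `resolve_backend` are illustrative names, not the server's actual types:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Backend {
    Local,  // fully local generation
    Hybrid, // visual description inlined as text + OpenRouter chat model
}

/// Sketch of the per-turn routing rule: the backend stored on the insight row
/// is authoritative unless the request overrides it, and a `local -> hybrid`
/// switch is rejected in v1.
fn resolve_backend(stored: Backend, requested: Option<Backend>) -> Result<Backend, String> {
    match requested {
        // No override: the stored backend wins.
        None => Ok(stored),
        // local -> hybrid would require rewriting the visual description on the fly.
        Some(Backend::Hybrid) if stored == Backend::Local => {
            Err("local -> hybrid is not supported in v1".to_string())
        }
        // hybrid -> local replays verbatim: the description is already inline text.
        Some(b) => Ok(b),
    }
}

fn main() {
    assert_eq!(resolve_backend(Backend::Hybrid, None), Ok(Backend::Hybrid));
    assert_eq!(resolve_backend(Backend::Hybrid, Some(Backend::Local)), Ok(Backend::Local));
    assert!(resolve_backend(Backend::Local, Some(Backend::Hybrid)).is_err());
}
```

Keeping the rejection on the `local -> hybrid` arm (rather than silently falling back) matches the doc's v1 constraint: the caller learns the switch is unsupported instead of getting a degraded replay.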
+
+Per-`(library_id, file_path)` async mutex (`AppState.insight_chat.chat_locks`)
+serialises concurrent turns on the same insight so the JSON blob doesn't race.
+
+Context management is a soft bound: if the serialized history exceeds
+`num_ctx - 2048` tokens (cheap 4-byte/token heuristic), the oldest
+assistant-tool_call + tool_result pairs are dropped until under budget. The
+initial user message (with any images) and system prompt are always preserved.
+The `truncated` event / flag is surfaced to the client when a drop occurred.
+
+Configurable env:
+- `AGENTIC_CHAT_MAX_ITERATIONS` — cap on tool-calling iterations per turn
+  (default 6). Per-request `max_iterations` is clamped to this cap.
+
 ## Dependencies of Note
 
 - **actix-web**: HTTP framework
diff --git a/README.md b/README.md
index 1bd3c9a..b625b04 100644
--- a/README.md
+++ b/README.md
@@ -83,6 +83,24 @@ as a 4xx from the chat call. Pick tool-capable models.
 - Controls how many times the model can invoke tools before being forced to produce a final answer
 - Increase for more thorough context gathering; decrease to limit response time
+
+#### Insight Chat Continuation
+After an agentic insight is generated, the conversation can be continued. Endpoints:
+- `POST /insights/chat` — single-turn reply (non-streaming)
+- `POST /insights/chat/stream` — SSE variant with live `text` deltas and
+  `tool_call` / `tool_result` events. Mobile client uses this.
+- `GET /insights/chat/history?path=...&library=...` — rendered transcript;
+  each assistant message carries a `tools: [{name, arguments, result}]` array
+- `POST /insights/chat/rewind` — truncate transcript at a rendered index
+  (drops that message + any preceding tool scaffolding + later turns). Used
+  for "try again from here" flows. The initial user message is protected.
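The soft context bound described above (estimate tokens cheaply, evict the oldest tool exchanges first, never touch the system prompt or initial user message) can be sketched in a few lines. `Msg`, `Role`, and `enforce_budget` are illustrative names, not the server's actual types:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Role { System, User, Assistant, Tool }

#[derive(Clone, Debug)]
struct Msg { role: Role, content: String }

/// Cheap token estimate: roughly 4 bytes per token.
fn approx_tokens(history: &[Msg]) -> usize {
    history.iter().map(|m| m.content.len() / 4 + 1).sum()
}

/// Drop the oldest assistant/tool messages until the history fits inside
/// `num_ctx - 2048` estimated tokens. System and user messages are never
/// evicted. Returns true if anything was dropped (the `truncated` flag).
fn enforce_budget(history: &mut Vec<Msg>, num_ctx: usize) -> bool {
    let budget = num_ctx.saturating_sub(2048);
    let mut dropped = false;
    while approx_tokens(history) > budget {
        // Oldest droppable message; skips the system prompt and user turns.
        let idx = match history
            .iter()
            .position(|m| matches!(m.role, Role::Assistant | Role::Tool))
        {
            Some(i) => i,
            None => break, // only protected messages remain
        };
        history.remove(idx);
        // Drop the paired tool result if it directly follows the tool call.
        if history.get(idx).map_or(false, |m| m.role == Role::Tool) {
            history.remove(idx);
        }
        dropped = true;
    }
    dropped
}

fn main() {
    let mut h = vec![
        Msg { role: Role::System, content: "system prompt".repeat(8) },
        Msg { role: Role::User, content: "initial user message".repeat(5) },
        Msg { role: Role::Assistant, content: "tool call".repeat(500) },
        Msg { role: Role::Tool, content: "tool result".repeat(500) },
        Msg { role: Role::Assistant, content: "final answer".into() },
    ];
    let truncated = enforce_budget(&mut h, 2200);
    assert!(truncated);
    assert_eq!(h.len(), 3); // oldest tool-call/tool-result pair evicted
    assert_eq!(h[0].role, Role::System);
    assert_eq!(h[1].role, Role::User);
}
```

The boolean return mirrors the `truncated` event/flag surfaced to the client when a drop occurred.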
+
+Amend mode (`amend: true` in the chat request body) regenerates the insight's
+title and inserts a new row instead of appending to the existing transcript,
+so you can rewrite the saved summary from within chat.
+
+- `AGENTIC_CHAT_MAX_ITERATIONS` - Cap on tool-calling iterations per chat turn [default: `6`]
+  - Per-request `max_iterations` (when sent by the client) is clamped to this cap
+
+#### Fallback Behavior
+- Primary server is tried first with 5-second connection timeout
+- On failure, automatically falls back to secondary server (if configured)
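The iteration cap composes with the per-request override as a simple clamp; a sketch under the documented rule (`effective_max_iterations` is an illustrative name):

```rust
/// Default for AGENTIC_CHAT_MAX_ITERATIONS when the env var is unset.
const DEFAULT_CAP: u32 = 6;

/// Per-request `max_iterations` is clamped to the server-configured cap;
/// when absent, the cap itself is used.
fn effective_max_iterations(requested: Option<u32>, cap: u32) -> u32 {
    requested.unwrap_or(cap).min(cap)
}

fn main() {
    assert_eq!(effective_max_iterations(None, DEFAULT_CAP), 6);
    assert_eq!(effective_max_iterations(Some(12), DEFAULT_CAP), 6); // clamped down
    assert_eq!(effective_max_iterations(Some(3), DEFAULT_CAP), 3);  // lower is allowed
}
```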