OpenRouter Support, Insight Chat and User injection #56

Merged

cameron merged 24 commits from 005-llm-client-trait into master

2026-04-26 23:01:35 +00:00

Author	SHA1	Message	Date
Cameron	21e624da6b	fix(video): sentinel for failed HLS encodes to stop retry loop Previously a corrupt source (e.g. truncated mp4 with no moov atom) would be re-queued on every directory scan: cleanup_partial_hls wipes the temp playlist on ffmpeg failure, leaving no .m3u8 to short-circuit the next pass. Mirrors the thumbnail .unsupported sentinel pattern: on ffmpeg failure, write <playlist>.m3u8.unsupported, and treat its presence as "done" in both the ScanDirectoryMessage filter and the QueueVideosMessage check. Delete the sentinel to force a retry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 01:06:13 -04:00
Cameron	021d1bffc0	chore: ignore db backups and local .idea config files Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 19:13:28 -04:00
Cameron	fa21b0d73d	chore(ai): disable default few-shot insight ids Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 19:12:25 -04:00
Cameron	0e55a6b125	fix(ai): treat rewind at end of history as no-op success The mobile client's regenerate-after-failure flow sends a discard index equal to the server's rendered count (its optimistic user bubble for the failed turn was never persisted). find_raw_cut treated this as out of range, surfacing as "Chat rewind failed: discard_from_rendered_index out of range" and blocking the retry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 19:12:17 -04:00
Cameron	0ebc2e9003	feat(ai): rerank timing + think:false + OpenRouter error detail - search_rag reranker now logs wall-clock time around the ollama.generate call, the candidate count / top-N going in, and the final reordering. The "final indices" + swap-count line is info level so it's always visible; detailed before/after previews stay at debug for when you want to inspect reranker quality. - New OllamaClient::generate_no_think convenience that sets Ollama's top-level think:false on the request, plumbed through try_generate via a new internal generate_with_options. Used only by the reranker today; avoids the chain-of-thought tax on reasoning models (Qwen3/VL, DeepSeek-R1 distills, GPT-OSS) when the task has nothing to reason about. Server-side no-op on non-reasoning models. - OpenRouter chat_with_tools "missing choices[0]" error now includes the actual response body — extracts structured {error: {code, message}} when OpenRouter surfaces it (common for upstream-provider issues like rate limits and content moderation), otherwise falls back to a truncated raw-JSON view. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 16:19:45 -04:00
Cameron	e5781325c6	fix(ai): render tool-call arguments as compact JSON in logs Switch the "Agentic tool call" log from {:?} (Debug) to {} (Display) on serde_json::Value. Display produces compact JSON — `{"date":"2023-08-15"}` instead of `Object {"date": String("2023-08-15")}` — which is what the model actually sent and what a human reading the log wants to see. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 14:25:53 -04:00
Cameron	d43f5fc991	docs: document OLLAMA_REQUEST_TIMEOUT_SECONDS env var Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 13:54:23 -04:00
Cameron	f0ae9f95dc	feat(ai): few-shot exemplars + sticky Ollama preference - Few-shot injection on /insights/generate/agentic: compresses prior training_messages into trajectory blocks (tool calls + result summaries) and injects into the system prompt. Hardcoded default ids with optional request override. - New fewshot_source_ids column on photo_insights (+ migration) to track which exemplars influenced a given row, for downstream training-set filtering. Chat amend rows stamp None with a lineage note. - Ollama client now remembers which server (primary/fallback) most recently succeeded and tries it first on the next call, via a shared Arc<AtomicBool>. Avoids re-404ing the primary on every agent iteration when the chosen model only lives on the fallback. - Demote noisy logs: daily_summary "Summary match" lines to debug; inner chat_with_tools non-2xx body log from error to warn (outer layer owns the terminal-error signal). - Drift-guard tests for summarize_tool_result covering the success / empty / error / unknown shape for every tool. - Tidy: three pre-existing clippy warnings cleaned up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 13:54:06 -04:00
Cameron	29f32b9d22	FFMPEG playlist improvements Better playlist management, .tmp renaming, HLS playlist parameter and concurrency tweaking.	2026-04-24 10:08:03 -04:00
Cameron	13b9d54861	fix(scan): quiet startup scans & thumbnail RAW/HEIC Three recurring issues on every full scan: 1. Video playlist scans re-enqueued every file only to reject it as AlreadyExists. Pre-filter in ScanDirectoryMessage and QueueVideosMessage so we skip videos whose .m3u8 already exists, and demote the leaked AlreadyExists log to debug. 2. image crate was built with only jpeg/png features, so webp/tiff/avif files logged "format not supported" every scan. Enable those features. 3. RAW (ARW/NEF/CR2/...) and HEIC thumbnails weren't generated, so the scan kept retrying them. Try the file's embedded JPEG preview via kamadak-exif first (fast, pure-Rust, works on Sony ARW where ffmpeg's TIFF decoder fails). Fall back to ffmpeg for HEIC/HEIF and RAWs with no preview. Anything still undecodable gets a <thumb>.unsupported sentinel so future scans skip it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:47:13 -04:00
Cameron	dc2a96162e	fix(dates): prefer earliest of fs created/modified as fallback On copied or restored files (e.g. a backup library), the OS stamps created at copy time while modified is preserved from the source, so the earlier of the two is a better proxy for when the content originated. Adds utils::earliest_fs_time and threads it through the three spots that fall back to filesystem dates: photos-list sort, memories grouping, and insight-generation timestamp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:20:12 -04:00
Cameron	d54419e779	style: cargo fmt drift Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:19:59 -04:00
Cameron	aa651d1c7b	feat(ai): iteration budget in prompt + preserve photo-knowledge links - Inject the max-iterations budget into the agentic system prompt for both insight generation and chat turns. Chat does this per-turn by appending a note to the replayed system message and restoring it before persistence so the note doesn't accumulate across turns. - Stop deleting entity_photo_links at the start of agentic insight generation. The clear made recall_facts_for_photo always return empty, wasting a tool call and discarding knowledge from prior runs. Re-linking the same entity is already an INSERT OR IGNORE no-op. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:28:48 -04:00
Cameron	6831f50993	feat(ai): USER_NAME env + shared summary prompt + test-bin knobs Introduces USER_NAME (default "Me") as the single source for the message sender label and the first-person persona across daily summaries, SMS context, insight generation, and chat. Eliminates the "Me:" transcript / "what I did" ambiguity that confused smaller models, and unhardcodes "Cameron" from prompt text + the knowledge-graph owner entity. Set USER_NAME=Cameron in .env to preserve the existing owner entity row (keyed on UNIQUE(name, entity_type)) — otherwise the next run creates a fresh owner entity and orphans the existing facts/photo-links. Also: - search_messages redirect: when the model calls it with date/contact but no query, return a hint pointing at get_sms_messages instead of a bare missing-parameter error (prevents same-turn retry loops) - sharpen search_messages vs get_sms_messages tool descriptions so content-vs-time-based intent is unambiguous - extract build_daily_summary_prompt (+ DAILY_SUMMARY_MESSAGE_LIMIT, DAILY_SUMMARY_SYSTEM_PROMPT) shared by daily_summary_job and test_daily_summary binary — prompt tweaks now land in both - EMBEDDING_MODEL const; fixes both insert sites that stored "mxbai-embed-large:335m" while generate_embeddings actually runs "nomic-embed-text:v1.5" - test_daily_summary: add --num-ctx / --temperature / --top-p / --top-k / --min-p flags wired into OllamaClient setters, and print the configured knobs at the top of each run - OllamaClient::generate now logs prompt/gen token counts and tok/s via log_chat_metrics (symmetric with chat_with_tools) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 23:39:37 -04:00
Cameron	e4a3536f87	feat(ai): search_messages tool + RAG reranker Adds a search_messages tool that hits the Django FTS5/semantic/hybrid endpoint for keyword-quality text search over message bodies, and an LLM-based reranker inside tool_search_rag (gated by SEARCH_RAG_RERANK, default on). Reranker pulls ~3x candidates from the vector index, asks the chat model to rank by relevance, and falls back to vector order on parse failure. The reranker shares the active chat turn's OllamaClient so num_ctx and sampling match — otherwise Ollama unloads/reloads the model on every rerank call. (Unverified end-to-end; caught by inspection, awaiting e2e confirmation.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 10:56:03 -04:00
Cameron	e51cd564a3	docs: chat continuation endpoints + env vars Document the four new chat endpoints, SSE event shape, backend routing rules, rewind semantics, amend mode, and the AGENTIC_CHAT_MAX_ITERATIONS cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 17:32:43 -04:00
Cameron	079cd4c5b9	feat(ai): streaming chat endpoint with live tool events Add LlmClient::chat_with_tools_stream and SSE endpoint POST /insights/chat/stream that emits text deltas, tool_call / tool_result pairs, truncated notice, and a terminal done frame as the agentic loop runs. - Ollama: parses NDJSON from /api/chat stream, accumulates content deltas, emits Done with tool_calls from the final chunk. - OpenRouter: parses OpenAI-compatible SSE, reassembles tool_call argument deltas by index, asks for stream_options.include_usage. - InsightChatService spawns the loop on a tokio task, feeds events through an mpsc channel, persists training_messages at the end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 16:57:41 -04:00
Cameron	c2bd3c08e1	feat(ai): surface tool invocations in chat history load_history now groups preceding tool_call + tool_result scaffolding under each assistant reply as `tools: [{name, arguments, result}]`. Result bodies over 2000 chars are truncated for payload size with a `result_truncated` flag; the full value remains in training_messages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 16:03:53 -04:00
Cameron	65ab10e9a8	feat(ai): chat rewind + ollama metrics logging Rewind: POST /insights/chat/rewind truncates training_messages at a given rendered index, dropping the target message plus any preceding tool-call scaffolding. The initial user prompt is protected. Metrics: log prompt_eval_count/duration and eval_count/duration from every Ollama chat response, rendered as tokens + ms + tok/s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 15:16:32 -04:00
Cameron	0b9528f61e	feat(ai): chat continuation for photo insights (server v1) Adds POST /insights/chat and GET /insights/chat/history. Replays the stored agentic conversation through the same backend the insight was generated with (or a per-turn override), runs a short tool-calling loop, and persists the extended history in append or amend mode. Backend switching: same-backend or hybrid->local replay verbatim; local->hybrid is rejected in v1 (would require on-the-fly vision description rewrite). Per-(library, file) async mutex serialises concurrent turns. Soft context budget drops oldest tool_call+result pairs when the serialized history exceeds num_ctx - 2048 tokens. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 13:00:27 -04:00
Cameron	e2eefbd156	feat(ai): curated OpenRouter model picker for hybrid backend Add OPENROUTER_ALLOWED_MODELS env var and GET /insights/openrouter/models endpoint returning the curated list verbatim. Drop the live capability precheck in hybrid mode — trust the operator's allowlist; bad ids surface as a chat-call error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:36:19 -04:00
Cameron	3ac0cd62eb	feat(ai): hybrid backend mode for agentic insights Adds a `backend` column to photo_insights (default 'local', migration 2026-04-20-000000) and a corresponding optional `backend` field on the agentic request. When a request sets backend=hybrid: - The local Ollama vision model is called once via describe_image to produce a text description. - The description is inlined into the first user message as text — no base64 image is ever sent to the chat model. - The agentic tool-calling loop and title generation route through an OpenRouterClient (dispatched via &dyn LlmClient), letting the user pick any tool-capable model from OpenRouter per request. - describe_photo is removed from the offered tools since the description is already present. Embeddings and vision stay on local Ollama regardless of backend. Hybrid mode requires OPENROUTER_API_KEY; handlers return a clear error when hybrid is requested without it, and also when the selected OpenRouter model lacks tool-calling support. AppState gains an optional openrouter client built from OPENROUTER_API_KEY / OPENROUTER_BASE_URL / OPENROUTER_DEFAULT_MODEL / OPENROUTER_EMBEDDING_MODEL / attribution headers. Default model is anthropic/claude-sonnet-4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 22:30:40 -04:00
Cameron	e799ba716c	feat(ai): add OpenRouterClient implementing LlmClient OpenAI-compatible client for OpenRouter. Translates canonical wire shapes at the boundary: tool-call arguments stringify on send / parse on receive (accepting both string and native-object forms); images rewritten from the base64 images field into content-parts with image_url entries; role=tool messages inherit tool_call_id from the preceding assistant's tool calls. /models parsed into ModelCapabilities via supported_parameters (tool use) and architecture.input_modalities (vision). 15-minute capabilities cache. Bearer auth; HTTP-Referer / X-Title attribution headers optional. Not wired into request routing yet — first consumer arrives with hybrid backend mode. 11 unit tests cover the translation helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 22:18:29 -04:00
Cameron	0073409b3d	refactor: introduce LlmClient trait (no-op) Preparation for a second LLM backend (OpenRouter) and hybrid vision-local / chat-remote mode. Shared wire types (ChatMessage, Tool, ToolCall, etc.) move into a new src/ai/llm_client.rs and are re-exported from ollama.rs so existing imports keep working. OllamaClient now implements LlmClient. No behavior change; callers still hold the concrete OllamaClient. Caller migration to Arc<dyn LlmClient> is deferred to the PR that wires hybrid backend routing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 22:11:05 -04:00

OpenRouter Support, Insight Chat and User injection #56

24 Commits