OpenRouter Support, Insight Chat and User injection #56

Merged
cameron merged 24 commits from 005-llm-client-trait into master 2026-04-26 23:01:35 +00:00

24 Commits

Author SHA1 Message Date
Cameron
21e624da6b fix(video): sentinel for failed HLS encodes to stop retry loop
Previously a corrupt source (e.g. truncated mp4 with no moov atom) would be
re-queued on every directory scan: cleanup_partial_hls wipes the temp
playlist on ffmpeg failure, leaving no .m3u8 to short-circuit the next pass.

Mirrors the thumbnail .unsupported sentinel pattern: on ffmpeg failure,
write <playlist>.m3u8.unsupported, and treat its presence as "done" in both
the ScanDirectoryMessage filter and the QueueVideosMessage check. Delete
the sentinel to force a retry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 01:06:13 -04:00
Cameron
021d1bffc0 chore: ignore db backups and local .idea config files
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:13:28 -04:00
Cameron
fa21b0d73d chore(ai): disable default few-shot insight ids
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:12:25 -04:00
Cameron
0e55a6b125 fix(ai): treat rewind at end of history as no-op success
The mobile client's regenerate-after-failure flow sends a discard index
equal to the server's rendered count (its optimistic user bubble for the
failed turn was never persisted). find_raw_cut treated this as out of
range, surfacing as "Chat rewind failed: discard_from_rendered_index out
of range" and blocking the retry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:12:17 -04:00
Cameron
0ebc2e9003 feat(ai): rerank timing + think:false + OpenRouter error detail
- search_rag reranker now logs wall-clock time around the ollama.generate
  call, the candidate count / top-N going in, and the final reordering.
  The "final indices" + swap-count line is info level so it's always
  visible; detailed before/after previews stay at debug for when you want
  to inspect reranker quality.
- New OllamaClient::generate_no_think convenience that sets Ollama's
  top-level think:false on the request, plumbed through try_generate via
  a new internal generate_with_options. Used only by the reranker today;
  avoids the chain-of-thought tax on reasoning models (Qwen3/VL,
  DeepSeek-R1 distills, GPT-OSS) when the task has nothing to reason
  about. Server-side no-op on non-reasoning models.
- OpenRouter chat_with_tools "missing choices[0]" error now includes the
  actual response body — extracts structured {error: {code, message}}
  when OpenRouter surfaces it (common for upstream-provider issues like
  rate limits and content moderation), otherwise falls back to a
  truncated raw-JSON view.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 16:19:45 -04:00
Cameron
e5781325c6 fix(ai): render tool-call arguments as compact JSON in logs
Switch the "Agentic tool call" log from {:?} (Debug) to {} (Display) on
serde_json::Value. Display produces compact JSON — `{"date":"2023-08-15"}`
instead of `Object {"date": String("2023-08-15")}` — which is what the
model actually sent and what a human reading the log wants to see.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:25:53 -04:00
Cameron
d43f5fc991 docs: document OLLAMA_REQUEST_TIMEOUT_SECONDS env var
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:54:23 -04:00
Cameron
f0ae9f95dc feat(ai): few-shot exemplars + sticky Ollama preference
- Few-shot injection on /insights/generate/agentic: compresses prior
  training_messages into trajectory blocks (tool calls + result summaries)
  and injects into the system prompt. Hardcoded default ids with optional
  request override.
- New fewshot_source_ids column on photo_insights (+ migration) to track
  which exemplars influenced a given row, for downstream training-set
  filtering. Chat amend rows stamp None with a lineage note.
- Ollama client now remembers which server (primary/fallback) most
  recently succeeded and tries it first on the next call, via a shared
  Arc<AtomicBool>. Avoids re-404ing the primary on every agent iteration
  when the chosen model only lives on the fallback.
- Demote noisy logs: daily_summary "Summary match" lines to debug;
  inner chat_with_tools non-2xx body log from error to warn (outer
  layer owns the terminal-error signal).
- Drift-guard tests for summarize_tool_result covering the success /
  empty / error / unknown shape for every tool.
- Tidy: three pre-existing clippy warnings cleaned up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:54:06 -04:00
Cameron
29f32b9d22 FFMPEG playlist improvements
Better playlist management, .tmp renaming, HLS playlist parameter and concurrency tweaking.
2026-04-24 10:08:03 -04:00
Cameron
13b9d54861 fix(scan): quiet startup scans & thumbnail RAW/HEIC
Three recurring issues on every full scan:

1. Video playlist scans re-enqueued every file only to reject it as
   AlreadyExists. Pre-filter in ScanDirectoryMessage and QueueVideosMessage
   so we skip videos whose .m3u8 already exists, and demote the leaked
   AlreadyExists log to debug.

2. image crate was built with only jpeg/png features, so webp/tiff/avif
   files logged "format not supported" every scan. Enable those features.

3. RAW (ARW/NEF/CR2/...) and HEIC thumbnails weren't generated, so the
   scan kept retrying them. Try the file's embedded JPEG preview via
   kamadak-exif first (fast, pure-Rust, works on Sony ARW where ffmpeg's
   TIFF decoder fails). Fall back to ffmpeg for HEIC/HEIF and RAWs with
   no preview. Anything still undecodable gets a <thumb>.unsupported
   sentinel so future scans skip it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:47:13 -04:00
Cameron
dc2a96162e fix(dates): prefer earliest of fs created/modified as fallback
On copied or restored files (e.g. a backup library), the OS stamps
created at copy time while modified is preserved from the source, so
the earlier of the two is a better proxy for when the content
originated. Adds utils::earliest_fs_time and threads it through the
three spots that fall back to filesystem dates: photos-list sort,
memories grouping, and insight-generation timestamp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:20:12 -04:00
Cameron
d54419e779 style: cargo fmt drift
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:19:59 -04:00
Cameron
aa651d1c7b feat(ai): iteration budget in prompt + preserve photo-knowledge links
- Inject the max-iterations budget into the agentic system prompt for
  both insight generation and chat turns. Chat does this per-turn by
  appending a note to the replayed system message and restoring it
  before persistence so the note doesn't accumulate across turns.
- Stop deleting entity_photo_links at the start of agentic insight
  generation. The clear made recall_facts_for_photo always return
  empty, wasting a tool call and discarding knowledge from prior runs.
  Re-linking the same entity is already an INSERT OR IGNORE no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:28:48 -04:00
Cameron
6831f50993 feat(ai): USER_NAME env + shared summary prompt + test-bin knobs
Introduces USER_NAME (default "Me") as the single source for the message
sender label and the first-person persona across daily summaries, SMS
context, insight generation, and chat. Eliminates the "Me:" transcript /
"what I did" ambiguity that confused smaller models, and unhardcodes
"Cameron" from prompt text + the knowledge-graph owner entity. Set
USER_NAME=Cameron in .env to preserve the existing owner entity row
(keyed on UNIQUE(name, entity_type)) — otherwise the next run creates
a fresh owner entity and orphans the existing facts/photo-links.

Also:
- search_messages redirect: when the model calls it with date/contact
  but no query, return a hint pointing at get_sms_messages instead of
  a bare missing-parameter error (prevents same-turn retry loops)
- sharpen search_messages vs get_sms_messages tool descriptions so
  content-vs-time-based intent is unambiguous
- extract build_daily_summary_prompt (+ DAILY_SUMMARY_MESSAGE_LIMIT,
  DAILY_SUMMARY_SYSTEM_PROMPT) shared by daily_summary_job and
  test_daily_summary binary — prompt tweaks now land in both
- EMBEDDING_MODEL const; fixes both insert sites that stored
  "mxbai-embed-large:335m" while generate_embeddings actually runs
  "nomic-embed-text:v1.5"
- test_daily_summary: add --num-ctx / --temperature / --top-p /
  --top-k / --min-p flags wired into OllamaClient setters, and print
  the configured knobs at the top of each run
- OllamaClient::generate now logs prompt/gen token counts and tok/s
  via log_chat_metrics (symmetric with chat_with_tools)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 23:39:37 -04:00
Cameron
e4a3536f87 feat(ai): search_messages tool + RAG reranker
Adds a search_messages tool that hits the Django FTS5/semantic/hybrid
endpoint for keyword-quality text search over message bodies, and an
LLM-based reranker inside tool_search_rag (gated by SEARCH_RAG_RERANK,
default on). Reranker pulls ~3x candidates from the vector index, asks
the chat model to rank by relevance, and falls back to vector order on
parse failure.

The reranker shares the active chat turn's OllamaClient so num_ctx and
sampling match — otherwise Ollama unloads/reloads the model on every
rerank call. (Unverified end-to-end; caught by inspection, awaiting
e2e confirmation.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 10:56:03 -04:00
Cameron
e51cd564a3 docs: chat continuation endpoints + env vars
Document the four new chat endpoints, SSE event shape, backend
routing rules, rewind semantics, amend mode, and the
AGENTIC_CHAT_MAX_ITERATIONS cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:32:43 -04:00
Cameron
079cd4c5b9 feat(ai): streaming chat endpoint with live tool events
Add LlmClient::chat_with_tools_stream and SSE endpoint
POST /insights/chat/stream that emits text deltas, tool_call /
tool_result pairs, truncated notice, and a terminal done frame as the
agentic loop runs.

- Ollama: parses NDJSON from /api/chat stream, accumulates content
  deltas, emits Done with tool_calls from the final chunk.
- OpenRouter: parses OpenAI-compatible SSE, reassembles tool_call
  argument deltas by index, asks for stream_options.include_usage.
- InsightChatService spawns the loop on a tokio task, feeds events
  through an mpsc channel, persists training_messages at the end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:57:41 -04:00
Cameron
c2bd3c08e1 feat(ai): surface tool invocations in chat history
load_history now groups preceding tool_call + tool_result scaffolding
under each assistant reply as `tools: [{name, arguments, result}]`.
Result bodies over 2000 chars are truncated for payload size with a
`result_truncated` flag; the full value remains in training_messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:03:53 -04:00
Cameron
65ab10e9a8 feat(ai): chat rewind + ollama metrics logging
Rewind: POST /insights/chat/rewind truncates training_messages at a
given rendered index, dropping the target message plus any preceding
tool-call scaffolding. The initial user prompt is protected.

Metrics: log prompt_eval_count/duration and eval_count/duration from
every Ollama chat response, rendered as tokens + ms + tok/s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 15:16:32 -04:00
Cameron
0b9528f61e feat(ai): chat continuation for photo insights (server v1)
Adds POST /insights/chat and GET /insights/chat/history. Replays the
stored agentic conversation through the same backend the insight was
generated with (or a per-turn override), runs a short tool-calling
loop, and persists the extended history in append or amend mode.

Backend switching: same-backend or hybrid->local replay verbatim;
local->hybrid is rejected in v1 (would require on-the-fly vision
description rewrite).

Per-(library, file) async mutex serialises concurrent turns. Soft
context budget drops oldest tool_call+result pairs when the
serialized history exceeds num_ctx - 2048 tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 13:00:27 -04:00
Cameron
e2eefbd156 feat(ai): curated OpenRouter model picker for hybrid backend
Add OPENROUTER_ALLOWED_MODELS env var and GET /insights/openrouter/models
endpoint returning the curated list verbatim. Drop the live capability
precheck in hybrid mode — trust the operator's allowlist; bad ids surface
as a chat-call error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:36:19 -04:00
Cameron
3ac0cd62eb feat(ai): hybrid backend mode for agentic insights
Adds a `backend` column to photo_insights (default 'local', migration
2026-04-20-000000) and a corresponding optional `backend` field on the
agentic request. When a request sets backend=hybrid:

- The local Ollama vision model is called once via describe_image to
  produce a text description.
- The description is inlined into the first user message as text —
  no base64 image is ever sent to the chat model.
- The agentic tool-calling loop and title generation route through an
  OpenRouterClient (dispatched via &dyn LlmClient), letting the user
  pick any tool-capable model from OpenRouter per request.
- describe_photo is removed from the offered tools since the description
  is already present.

Embeddings and vision stay on local Ollama regardless of backend.
Hybrid mode requires OPENROUTER_API_KEY; handlers return a clear error
when hybrid is requested without it, and also when the selected
OpenRouter model lacks tool-calling support.

AppState gains an optional openrouter client built from
OPENROUTER_API_KEY / OPENROUTER_BASE_URL / OPENROUTER_DEFAULT_MODEL /
OPENROUTER_EMBEDDING_MODEL / attribution headers. Default model is
anthropic/claude-sonnet-4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 22:30:40 -04:00
Cameron
e799ba716c feat(ai): add OpenRouterClient implementing LlmClient
OpenAI-compatible client for OpenRouter. Translates canonical wire shapes at
the boundary: tool-call arguments stringify on send / parse on receive
(accepting both string and native-object forms); images rewritten from the
base64 images field into content-parts with image_url entries; role=tool
messages inherit tool_call_id from the preceding assistant's tool calls.

/models parsed into ModelCapabilities via supported_parameters (tool use)
and architecture.input_modalities (vision). 15-minute capabilities cache.
Bearer auth; HTTP-Referer / X-Title attribution headers optional.

Not wired into request routing yet — first consumer arrives with hybrid
backend mode. 11 unit tests cover the translation helpers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 22:18:29 -04:00
Cameron
0073409b3d refactor: introduce LlmClient trait (no-op)
Preparation for a second LLM backend (OpenRouter) and hybrid vision-local /
chat-remote mode. Shared wire types (ChatMessage, Tool, ToolCall, etc.) move
into a new src/ai/llm_client.rs and are re-exported from ollama.rs so
existing imports keep working. OllamaClient now implements LlmClient.

No behavior change; callers still hold the concrete OllamaClient. Caller
migration to Arc<dyn LlmClient> is deferred to the PR that wires hybrid
backend routing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 22:11:05 -04:00