- search_rag reranker now logs wall-clock time around the ollama.generate
call, the candidate count / top-N going in, and the final reordering.
The "final indices" + swap-count line is info level so it's always
visible; detailed before/after previews stay at debug for when you want
to inspect reranker quality.
- New OllamaClient::generate_no_think convenience that sets Ollama's
top-level think:false on the request, plumbed through try_generate via
a new internal generate_with_options. Used only by the reranker today;
avoids the chain-of-thought tax on reasoning models (Qwen3/VL,
DeepSeek-R1 distills, GPT-OSS) when the task has nothing to reason
about. Server-side no-op on non-reasoning models.
- OpenRouter chat_with_tools "missing choices[0]" error now includes the
actual response body — extracts structured {error: {code, message}}
when OpenRouter surfaces it (common for upstream-provider issues like
rate limits and content moderation), otherwise falls back to a
truncated raw-JSON view.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Few-shot injection on /insights/generate/agentic: compresses prior
training_messages into trajectory blocks (tool calls + result summaries)
and injects into the system prompt. Hardcoded default ids with optional
request override.
- New fewshot_source_ids column on photo_insights (+ migration) to track
which exemplars influenced a given row, for downstream training-set
filtering. Chat amend rows stamp None with a lineage note.
- Ollama client now remembers which server (primary/fallback) most
recently succeeded and tries it first on the next call, via a shared
Arc<AtomicBool>. Avoids re-404ing the primary on every agent iteration
when the chosen model only lives on the fallback.
- Demote noisy logs: daily_summary "Summary match" lines to debug;
inner chat_with_tools non-2xx body log from error to warn (outer
layer owns the terminal-error signal).
- Drift-guard tests for summarize_tool_result covering the success /
empty / error / unknown shape for every tool.
- Tidy: three pre-existing clippy warnings cleaned up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces USER_NAME (default "Me") as the single source for the message
sender label and the first-person persona across daily summaries, SMS
context, insight generation, and chat. Eliminates the "Me:" transcript /
"what I did" ambiguity that confused smaller models, and unhardcodes
"Cameron" from prompt text + the knowledge-graph owner entity. Set
USER_NAME=Cameron in .env to preserve the existing owner entity row
(keyed on UNIQUE(name, entity_type)) — otherwise the next run creates
a fresh owner entity and orphans the existing facts/photo-links.
Also:
- search_messages redirect: when the model calls it with date/contact
but no query, return a hint pointing at get_sms_messages instead of
a bare missing-parameter error (prevents same-turn retry loops)
- sharpen search_messages vs get_sms_messages tool descriptions so
content-vs-time-based intent is unambiguous
- extract build_daily_summary_prompt (+ DAILY_SUMMARY_MESSAGE_LIMIT,
DAILY_SUMMARY_SYSTEM_PROMPT) shared by daily_summary_job and
test_daily_summary binary — prompt tweaks now land in both
- EMBEDDING_MODEL const; fixes both insert sites that stored
"mxbai-embed-large:335m" while generate_embeddings actually runs
"nomic-embed-text:v1.5"
- test_daily_summary: add --num-ctx / --temperature / --top-p /
--top-k / --min-p flags wired into OllamaClient setters, and print
the configured knobs at the top of each run
- OllamaClient::generate now logs prompt/gen token counts and tok/s
via log_chat_metrics (symmetric with chat_with_tools)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add LlmClient::chat_with_tools_stream and SSE endpoint
POST /insights/chat/stream that emits text deltas, tool_call /
tool_result pairs, truncated notice, and a terminal done frame as the
agentic loop runs.
- Ollama: parses NDJSON from /api/chat stream, accumulates content
deltas, emits Done with tool_calls from the final chunk.
- OpenRouter: parses OpenAI-compatible SSE, reassembles tool_call
argument deltas by index, asks for stream_options.include_usage.
- InsightChatService spawns the loop on a tokio task, feeds events
through an mpsc channel, persists training_messages at the end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewind: POST /insights/chat/rewind truncates training_messages at a
given rendered index, dropping the target message plus any preceding
tool-call scaffolding. The initial user prompt is protected.
Metrics: log prompt_eval_count/duration and eval_count/duration from
every Ollama chat response, rendered as tokens + ms + tok/s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Preparation for a second LLM backend (OpenRouter) and hybrid vision-local /
chat-remote mode. Shared wire types (ChatMessage, Tool, ToolCall, etc.) move
into a new src/ai/llm_client.rs and are re-exported from ollama.rs so
existing imports keep working. OllamaClient now implements LlmClient.
No behavior change; callers still hold the concrete OllamaClient. Caller
migration to Arc<dyn LlmClient> is deferred to the PR that wires hybrid
backend routing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Silence forward-looking dead_code on unused DAO modules, annotate
individual placeholder items, rewrite tautological assert!(true/false)
in token tests as panic! arms, and pick up fmt drift.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Expose Ollama sampling params through the insight generation endpoints
so users can tune creativity/determinism per request. All four are
optional — omitted values fall through to the model's server-side
defaults.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a standalone binary that walks a directory and runs the agentic
insight loop over every image/video, skipping files already processed.
Supports --path, --model, --max-iterations, --timeout-secs, --num-ctx,
and --reprocess flags for flexible overnight/VPS batch runs.
Also adds OllamaClient::with_request_timeout() builder method so slow
large models are not cut off by the default 120s limit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Captures prompt_eval_count and eval_count from Ollama /api/chat responses
during the agentic loop and returns them in POST /insights/generate/agentic
so the frontend can display context window usage to the user.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Sanitise tool call arguments before re-sending in conversation history: non-object values (bool, string, null) that some models produce are normalised to {} to prevent Ollama 500s
- Map 'error parsing tool call' Ollama 500 to HTTP 400 with a descriptive message listing compatible models (llama3.1, llama3.2, qwen2.5, mistral-nemo)
- Add reverse_geocode tool backed by existing Nominatim helper; description hints model can chain it after get_location_history results
- Make get_sms_messages contact parameter optional (was required, forcing the model to guess); executor now passes None to fall back to all-contacts search
- Log tool result outcomes at warn level for errors/empty results, info for successes; log SMS API errors with full detail; log full request body on Ollama 500
- Strengthen system prompt to require 3-4 tool calls before final answer
- Try fallback server when checking model capabilities (primary-only check caused 500 for models only on fallback)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- cargo fmt applied across all modified source files
- Collapse nested if let Some / if !is_empty into a single let-chain (clippy::collapsible_match)
- All other warnings are pre-existing dead-code lint on unused trait methods
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>