Expose GET /insights/history?path=... returning every generated version
of a photo's insight (current plus superseded), newest-first, backing the
mobile per-file insight history view.
- New get_insight_history_handler; reuses the existing get_insight_history
DAO method (removed its dead_code allow).
- impl From<PhotoInsight> for PhotoInsightResponse, collapsing the mapping
that was duplicated across the single-get and all-insights handlers.
- rate_insight_by_id DAO method + optional insight_id on RateInsightRequest
so previously generated versions can be approved/rejected (the path-based
rate only touches the current row).
- DAO tests for history ordering/scoping and id-targeted rating.
- cargo fmt normalized a multi-line assert in insight_chat.rs tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LlamaCppClient gains text_to_speech (OpenAI /audio/speech), list_voices and
create_voice (voice library at the swap-root /upstream/<model>/voices
passthrough), plus a tts_model slot configured via LLAMA_SWAP_TTS_MODEL
(default "chatterbox").
New Claims-gated routes:
- POST /tts/speech -> { audio_base64, format } for data: URI playback
- GET /tts/voices -> voice library passthrough
- POST /tts/voices/upload -> clone a voice from an uploaded clip (multipart)
- POST /tts/voices/from-library -> clone from a library file (ffmpeg-extracts
audio from video; audio forwarded as-is)
Security: voice_name sanitized to [A-Za-z0-9_-] (it becomes an upstream
filename), 25 MB upload cap, library refs restricted to real audio/video,
path confined via is_valid_full_path. Adds is_audio_file + unit tests for the
sanitizer, mime guesser, and swap-root derivation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the one-shot SSE chat stream with an async dispatch + reconnectable
replay flow so the mobile client survives backgrounding, network blips, and
OS-killed sockets without losing an in-flight agentic turn.
- TurnRegistry/TurnEntry: in-memory per-turn event buffer (cap 500, front
eviction) shared by the agentic loop (writer) and SSE replay readers.
ReplayOutcome + replay_from/next_batch distinguish Events/CaughtUp/Gone;
next_batch registers the Notify before reading state (no lost wakeup) and
drains every buffered event before signaling terminal, so the final
Done/Error is never dropped and the stream closes cleanly.
- Endpoints: POST /insights/chat/turn (202 + turn_id), GET
/insights/chat/turn/{id} (SSE replay, ?skip_before= resume, per-event seq,
410 on eviction), DELETE /insights/chat/turn/{id} (real task abort +
cooperative is_running() check at each loop boundary).
- Cancellation actually stops the task (AbortHandle stored on the entry) and
emits a Done{cancelled:true}; callers skip persistence on cancel.
- Background sweeper drops stale turns; interval clamped to <=300s.
- OpenTelemetry spans: ai.chat.turn.execute/replay/cancel.
- Legacy POST /insights/chat/stream path preserved unchanged.
Tests: registry coverage for terminal delivery (race guard), waiting, Gone,
abort, eviction; handler integration tests for 404/410, skip_before, seq
stamping, completed replay, and cancel.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Previously embed_one() silently fell back to Ollama embeddings,
which would load nomic-embed-text into VRAM alongside llama-swap —
wasting memory on an unintended model. Now returns an error with
an actionable message instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Fix query param mismatch: rename GenerationStatusQuery.file_path to
path so the client's app-resume buildQuery({ path: ... }) resolves
correctly instead of always getting 400
- Remove dead _lib_id bindings from both generate handlers
- Return 202 Accepted instead of 200 from generate endpoints
- Restore OpenTelemetry span instrumentation on generate handlers
- Remove stale UNIQUE constraint from initial migration (incompatible
with plain-INSERT DAO)
- Add tests for status guard: complete_job/fail_job are no-ops when
job is already cancelled, and cancel_job by id
- Persist generation params (num_ctx, temperature, top_p, top_k, min_p,
system_prompt, persona_id) on the photo_insights table for auditing
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add insight_generation_jobs table migration and DAO
- Implement job lifecycle: create_or_get_active, complete, fail, cancel
- Refactor POST /insights/generate and /agentic to async spawn with timeout
- Add GET /insights/generation/status endpoint with job_id and file_path lookup
- Use String for enum fields in Diesel models to avoid private Bound type
- Add from_str() helpers on InsightJobStatus and InsightGenerationType
- Fix update_training_messages to return Result<usize, DbError>
- 7/7 DAO unit tests passing
When backend=hybrid with LLM_BACKEND=llamacpp, the user-selected model
(an OpenRouter id like "google/gemini-3-flash-preview") was being applied
to the local LlamaCppClient's primary_model and vision_model. This caused
describe_image to send the OpenRouter model name to llama-swap, which
returned 400 because it has no such slot.
Guard the local-client model override with !is_hybrid so it only applies
in local-only mode (where the user is selecting a different local model).
Bump to v1.2.0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
llamacpp models now receive images via OpenAI content-parts instead of
the describe-then-inline strategy (hybrid mode unchanged). Fixes
assistant messages with tool_calls emitting content: null instead of ""
to satisfy strict Jinja template role-alternation checks. Adds debug
logging of message role sequences on llamacpp requests.
Introduces BackendKind enum, SamplingOverrides, and ResolvedBackend in
a new backend.rs module. InsightGenerator::resolve_backend centralises
client construction + vision capability detection — next step wires the
existing inline dispatch through it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reverts the per-request backend="llamacpp" value. Chat/vision/embedding
backend is now a deploy-time decision (LLM_BACKEND=ollama|llamacpp),
applied globally across chat, vision describe, and embeddings — so
embedding vectors stay in one space across the index.
- Per-request backend whitelist back to "local"|"hybrid". A request
arriving with backend="llamacpp" is rejected.
- LLM_BACKEND=llamacpp swaps the entire local stack to llama-swap:
chat hits the chat slot, describe hits the vision slot, embeddings
hit the embed slot. Hybrid mode still routes chat to OpenRouter
but uses LLM_BACKEND for the describe pass.
- Drops env vars HYBRID_VISION_BACKEND, LLAMA_SWAP_VISION_MODELS,
EMBEDDING_BACKEND (the last never shipped). Drops the
LlamaCppClient.vision_models allowlist — capability inference now
reports has_vision only for the configured vision_model slot.
- Drops the /insights/llamacpp/models handler. /insights/models is
the single endpoint; returns Ollama servers under LLM_BACKEND=ollama
and llama-swap slots (from LLAMA_SWAP_ALLOWED_MODELS) under
LLM_BACKEND=llamacpp. Same envelope shape either way.
- New ai::embed_one helper routes embeddings through llama-swap when
LLM_BACKEND=llamacpp (else Ollama). Wires it into the four
insight_generator embedding sites.
- Cross-replay matrix simplifies to pre-llamacpp shape (local↔local,
hybrid↔hybrid, hybrid→local allowed; local→hybrid rejected).
Wires a new LlamaCppClient (OpenAI-compatible /v1 wire format) alongside
OllamaClient and OpenRouterClient. Per-slot routing for chat/vision/embed
via env (LLAMA_SWAP_URL + *_MODEL vars); capability inference uses an
env allowlist since /v1/models doesn't report modality.
InsightGenerator + InsightChatService gain three-way dispatch on
chat_backend = "local" | "hybrid" | "llamacpp". Hybrid and llamacpp
share the describe-then-inline path (text-only chat after a separate
vision describe). HYBRID_VISION_BACKEND=llamacpp lets hybrid route its
describe pass through llama-swap's vision slot while chat still goes
to OpenRouter.
Cross-replay matrix added (validate_cross_replay): local<->llamacpp
and hybrid<->llamacpp allowed; local->hybrid and llamacpp->hybrid
rejected. New /insights/llamacpp/models handler mirrors the OpenRouter
shape.
Probe-phase scaffolding for CLIP semantic search. Adds the column
that will hold per-photo embeddings, the HTTP client to Apollo's
inference service, and a throwaway probe binary so we can eyeball
search-result quality on the live library before building the
persistence layer (backlog drain, /photos/search endpoint, UI).
- migrations/2026-05-14-000000_add_clip_embedding/ — adds
image_exif.clip_embedding (BLOB) and clip_model_version (TEXT),
plus a partial index on (clip_embedding IS NULL AND content_hash
IS NOT NULL) for the future backfill drain.
- src/database/models.rs — extends ImageExif struct to match.
- src/ai/clip_client.rs — encode_image / encode_text / health,
same Permanent/Transient/Disabled taxonomy as face_client.
- src/bin/probe_clip_search.rs — --query <q> --library N --limit M
--top K. Encodes a sample and prints top-K cosine similarities.
No DB writes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Land the persistence model and HTTP surface for local face recognition.
Inference still lives in Apollo (Phase 1); this side adds the data home
plus every endpoint Apollo's UI and FileViewer-React will consume.
Schema (new migration 2026-04-29-000000_add_faces):
- persons: visual identities. Optional entity_id bridges to the
existing knowledge-graph entities table; auto-bridging is left to
the management UI (we don't muddy LLM provenance from face rows).
UNIQUE(name COLLATE NOCASE) so 'alice' / 'Alice' fold to one row.
- face_detections: keyed on content_hash (cross-library dedup), with
status='detected' carrying bbox + 512-d embedding BLOB, and
'no_faces' / 'failed' marker rows that tell Phase 3's file watcher
not to re-scan. Marker invariant enforced via CHECK; partial UNIQUE
on content_hash WHERE status='no_faces' guards against double-marks.
Schema regenerated with `diesel print-schema` against a clean migration
run; joinables added for face_detections → libraries / persons and
persons → entities.
face_client.rs (sibling of apollo_client.rs):
- reqwest multipart, 60 s timeout (CPU inference on a backlog can be
slow; bounded threadpool on Apollo serializes calls anyway).
- FaceDetectError::{Permanent, Transient, Disabled} — Phase 3 keys
its marker-row decision on this. 422 → mark failed, 5xx → defer.
- APOLLO_FACE_API_BASE_URL falls back to APOLLO_API_BASE_URL when
unset; both unset = is_enabled() false, callers no-op.
faces.rs (DAO + handlers):
- SqliteFaceDao implements the full FaceDao trait; person face counts
go through sql_query because diesel's BoxedSelectStatement +
group_by trips trait-resolver recursion.
- merge_persons re-points face rows in a transaction, copies notes
when target's are empty, deletes src.
- manual POST /image/faces resolves content_hash through image_exif,
crops the user-drawn bbox with 10% padding (detector wants context
around ears/jaw), POSTs the crop to face_client.embed for a real
ArcFace vector, then inserts source='manual'.
- Cluster-suggest (Phase 6) gets its data from
GET /faces/embeddings — base64-encoded paged BLOBs so Apollo's
DBSCAN can stream them without ImageApi pre-aggregating.
Endpoints registered alongside add_*_services in main.rs:
GET /faces/stats?library=
GET /faces/embeddings?library=&unassigned=&limit=&offset=
GET /image/faces?path=&library=
POST /image/faces (manual create via embed)
PATCH /image/faces/{id}
DELETE /image/faces/{id}
GET /persons?library=
POST /persons
GET /persons/{id}
PATCH /persons/{id}
DELETE /persons/{id}?cascade=set_null|delete (set_null default)
POST /persons/{id}/merge
GET /persons/{id}/faces?library=
The file-watch hook (Phase 3) and the rerun-on-one-photo handler
(Phase 6) live behind the FaceDao methods marked dead_code today —
they're called only when those phases land. Same shape for the trait
methods that aren't reached by Phase 2 routes.
Tests: 3 DAO unit tests cover person CRUD + case-insensitive uniqueness,
marker-row idempotency (mark_status is a no-op when any row exists),
and merge re-pointing faces.
Cargo.toml: reqwest gains the `multipart` feature.
cargo build / cargo test --lib / cargo fmt / cargo clippy --all-targets
all clean for the new code; the two pre-existing test_path_excluder
failures and the pre-existing sort_by clippy warnings are unrelated and
present on master.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Optional integration with the sibling Apollo project's user-defined
Places (name + lat/lon + radius_m + description + category). When
APOLLO_API_BASE_URL is set, the per-photo location resolver folds the
most-specific containing Place into the LLM prompt's location string —
"Home (My house in Cambridge) — near Cambridge, MA" rather than the
city name alone. Smallest-radius wins; Apollo sorts server-side via
/api/places/contains, so the carousel badge in Apollo and the prompt
string here always agree.
Adds an agentic tool `get_personal_place_at(latitude, longitude)` that
the LLM can call during chat continuation. Tool description tells the
model the call returns the user's free-text notes, not just a name.
Deliberately narrow — no enumerate-all variant, lat/lon required.
Unset APOLLO_API_BASE_URL = legacy Nominatim-only path, tool is not
registered. 5 s timeout; all errors degrade silently.
Tests: 5 unit tests for compose_location_string (Apollo only, Nominatim
only, both, both-with-description, neither).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces USER_NAME (default "Me") as the single source for the message
sender label and the first-person persona across daily summaries, SMS
context, insight generation, and chat. Eliminates the "Me:" transcript /
"what I did" ambiguity that confused smaller models, and unhardcodes
"Cameron" from prompt text + the knowledge-graph owner entity. Set
USER_NAME=Cameron in .env to preserve the existing owner entity row
(keyed on UNIQUE(name, entity_type)) — otherwise the next run creates
a fresh owner entity and orphans the existing facts/photo-links.
Also:
- search_messages redirect: when the model calls it with date/contact
but no query, return a hint pointing at get_sms_messages instead of
a bare missing-parameter error (prevents same-turn retry loops)
- sharpen search_messages vs get_sms_messages tool descriptions so
content-vs-time-based intent is unambiguous
- extract build_daily_summary_prompt (+ DAILY_SUMMARY_MESSAGE_LIMIT,
DAILY_SUMMARY_SYSTEM_PROMPT) shared by daily_summary_job and
test_daily_summary binary — prompt tweaks now land in both
- EMBEDDING_MODEL const; fixes both insert sites that stored
"mxbai-embed-large:335m" while generate_embeddings actually runs
"nomic-embed-text:v1.5"
- test_daily_summary: add --num-ctx / --temperature / --top-p /
--top-k / --min-p flags wired into OllamaClient setters, and print
the configured knobs at the top of each run
- OllamaClient::generate now logs prompt/gen token counts and tok/s
via log_chat_metrics (symmetric with chat_with_tools)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add LlmClient::chat_with_tools_stream and SSE endpoint
POST /insights/chat/stream that emits text deltas, tool_call /
tool_result pairs, truncated notice, and a terminal done frame as the
agentic loop runs.
- Ollama: parses NDJSON from /api/chat stream, accumulates content
deltas, emits Done with tool_calls from the final chunk.
- OpenRouter: parses OpenAI-compatible SSE, reassembles tool_call
argument deltas by index, asks for stream_options.include_usage.
- InsightChatService spawns the loop on a tokio task, feeds events
through an mpsc channel, persists training_messages at the end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewind: POST /insights/chat/rewind truncates training_messages at a
given rendered index, dropping the target message plus any preceding
tool-call scaffolding. The initial user prompt is protected.
Metrics: log prompt_eval_count/duration and eval_count/duration from
every Ollama chat response, rendered as tokens + ms + tok/s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds POST /insights/chat and GET /insights/chat/history. Replays the
stored agentic conversation through the same backend the insight was
generated with (or a per-turn override), runs a short tool-calling
loop, and persists the extended history in append or amend mode.
Backend switching: same-backend or hybrid->local replay verbatim;
local->hybrid is rejected in v1 (would require on-the-fly vision
description rewrite).
Per-(library, file) async mutex serialises concurrent turns. Soft
context budget drops oldest tool_call+result pairs when the
serialized history exceeds num_ctx - 2048 tokens.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add OPENROUTER_ALLOWED_MODELS env var and GET /insights/openrouter/models
endpoint returning the curated list verbatim. Drop the live capability
precheck in hybrid mode — trust the operator's allowlist; bad ids surface
as a chat-call error.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OpenAI-compatible client for OpenRouter. Translates canonical wire shapes at
the boundary: tool-call arguments stringify on send / parse on receive
(accepting both string and native-object forms); images rewritten from the
base64 images field into content-parts with image_url entries; role=tool
messages inherit tool_call_id from the preceding assistant's tool calls.
/models parsed into ModelCapabilities via supported_parameters (tool use)
and architecture.input_modalities (vision). 15-minute capabilities cache.
Bearer auth; HTTP-Referer / X-Title attribution headers optional.
Not wired into request routing yet — first consumer arrives with hybrid
backend mode. 11 unit tests cover the translation helpers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Preparation for a second LLM backend (OpenRouter) and hybrid vision-local /
chat-remote mode. Shared wire types (ChatMessage, Tool, ToolCall, etc.) move
into a new src/ai/llm_client.rs and are re-exported from ollama.rs so
existing imports keep working. OllamaClient now implements LlmClient.
No behavior change; callers still hold the concrete OllamaClient. Caller
migration to Arc<dyn LlmClient> is deferred to the PR that wires hybrid
backend routing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Register the agentic insight endpoint that validates tool-calling capability,
runs the agentic loop, and returns the stored PhotoInsightResponse. Returns 400
for unsupported models, 500 for other errors. Max iterations configurable via
AGENTIC_MAX_ITERATIONS env var (default 10).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>