Compare commits

300 Commits

Author SHA1 Message Date
cameron 98274c3301 Merge pull request 'Feature/tts voice management' (#105) from feature/tts-voice-management into master
Reviewed-on: #105
2026-06-13 02:01:37 +00:00
Cameron Cordes 1017fe73af Include start offset in voice-name window tag
Clones that don't start at 0:00 are tagged with where the reference
window begins (grandma-at1m32s-30s), so voices cloned from different
sections of the same source are distinguishable in the voice list.
Zero-start names keep the existing -30s form.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 16:21:41 -04:00
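The naming rule in 1017fe73af can be sketched as below. This is an illustration only: `tagged_voice_name` and its signature are hypothetical, and the commit does not say how a start offset under one minute is written, so the sketch assumes minutes are always included.

```rust
/// Builds a voice name tagged with its reference window.
/// Hypothetical helper; the real function name and signature are not shown
/// in the commit message.
fn tagged_voice_name(base: &str, start_seconds: u32, duration_seconds: u32) -> String {
    if start_seconds == 0 {
        // Zero-start clones keep the existing "-30s" form.
        return format!("{base}-{duration_seconds}s");
    }
    let (minutes, seconds) = (start_seconds / 60, start_seconds % 60);
    // Assumption: minutes are always written, even when they are zero.
    format!("{base}-at{minutes}m{seconds}s-{duration_seconds}s")
}
```

With these assumptions, a 30-second window starting at 1:32 yields `grandma-at1m32s-30s`, matching the example in the commit message.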
Cameron Cordes 1dec34540d Add start/duration window selection for voice-clone reference clips
Both voice creation endpoints (upload + from-library) now accept optional
start_seconds/duration_seconds, threaded to ffmpeg as -ss/-t, so the
reference window can target clean speech anywhere in a long recording
instead of always the first N seconds. Duration is clamped to the
LLAMA_SWAP_TTS_REF_SECONDS cap and the voice-name tag reflects the
actual window length.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 16:09:03 -04:00
Cameron Cordes 2e0f78aa1b Add user-configurable TTS pronunciation overrides
A JSON map (TTS_PRONUNCIATIONS_PATH, default tts_pronunciations.json)
rewrites mispronounced words — place names, initialisms, dotted
abbreviations — to phonetic spellings before synthesis, applied after
markdown cleanup in both /tts/speech paths. Whole-word smartcase
matching (lowercase keys match any casing, uppercase keys exact),
longest key wins, hot-reloaded on mtime change with last-good fallback
on parse errors. See tts_pronunciations.example.json.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 23:06:18 -04:00
Cameron Cordes 3fa4fa8501 Strip markdown decoration from model-emitted insight titles
Models wrap the title line despite the prompt — "**Title: A Day in the
Woods**", "## Title: ...", bold around just the label — which made
parse_title_body's bare "Title:" prefix match fall through to the
fallbacks and leak asterisks into the stored title.

strip_title_markdown trims bold/italic markers, heading hashes,
backticks, and quotes from both ends; applied to the label line, the
extracted title, both fallback paths, and generate_photo_title (which
previously stripped only quotes).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 22:18:43 -04:00
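A minimal sketch of the trimming described in 3fa4fa8501, assuming the decoration set is exactly asterisks, underscores, hashes, backticks, and quotes. `strip_title_markdown` is named in the commit, but its body here is a guess, and `extract_title` is a hypothetical stand-in for the "Title:" branch of parse_title_body.

```rust
/// Trims markdown decoration and whitespace from both ends of a line.
/// The character set is an assumption based on the commit message.
fn strip_title_markdown(s: &str) -> &str {
    s.trim_matches(|c: char| {
        c.is_whitespace() || matches!(c, '*' | '_' | '#' | '`' | '"' | '\'')
    })
}

/// Hypothetical caller: pulls the title out of a "Title: ..." line,
/// cleaning both the label line and the extracted title.
fn extract_title(line: &str) -> Option<&str> {
    let cleaned = strip_title_markdown(line);
    let rest = cleaned.strip_prefix("Title:")?;
    Some(strip_title_markdown(rest))
}
```

Cleaning the line before the prefix match is what lets a wrapped label such as `**Title: ...**` reach the "Title:" branch instead of falling through to the fallbacks.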
Cameron Cordes efd05db523 Make the embedding model swappable via env for A/B testing
Trialing Qwen3-Embedding-0.6B (1024-dim, instruct-prefixed queries)
against nomic required code changes at every hardcoded seam; now it's a
config flip plus a reembed_embeddings run.

- EMBEDDING_DIM env (default 768) replaces every hardcoded dim check:
  daily summary / calendar / search / location DAOs, Ollama batch
  validation, reembed_embeddings
- entities gains the dim guard it never had — a wrong-dim vector
  silently kills dedup/recall (cosine over mismatched lengths is 0),
  so store None and warn instead
- embed_query / embed_document split with EMBED_QUERY_PREFIX /
  EMBED_DOCUMENT_PREFIX (literal \n expanded): retrieval models treat
  the two sides differently — nomic wants search_query:/search_document:,
  Qwen3 wants Instruct:...\nQuery: on queries only. All query-side
  call sites and all corpus writers now declare their side.
- document the contract in CLAUDE.md: change the model or any of these
  vars → re-run reembed_embeddings or search is garbage

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 21:40:40 -04:00
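Two pieces of efd05db523 lend themselves to a short sketch: expanding a literal `\n` in a configured prefix, and the dimension guard that stores None on a mismatch. Both function names are hypothetical, and the warning goes to stderr here only to keep the example dependency-free.

```rust
/// Prepends a configured prefix to the text, expanding a literal
/// backslash-n (as it would arrive from an env var) into a newline.
fn apply_prefix(prefix: &str, text: &str) -> String {
    format!("{}{}", prefix.replace("\\n", "\n"), text)
}

/// Keeps an embedding only when it has the configured dimension;
/// otherwise warns and returns None so a wrong-dim vector is never stored.
fn guard_dim(vector: Vec<f32>, expected_dim: usize) -> Option<Vec<f32>> {
    if vector.len() == expected_dim {
        Some(vector)
    } else {
        eprintln!(
            "embedding has dim {}, expected {}; storing None",
            vector.len(),
            expected_dim
        );
        None
    }
}
```

Here `expected_dim` stands in for the EMBEDDING_DIM value (default 768 per the commit).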
Cameron Cordes b1493f5aca Wait out TTS GPU hold before the insight job timeout starts
The GPU lease keeps per-request reqwest budgets from burning behind a
cross-model swap, but the job-level INSIGHT_GENERATION_TIMEOUT_SECS
wall-clock started at spawn — an insight queued behind a running TTS
synthesis parked its first chat call on the lease and timed out
("timeout after 180s") before chatterbox even finished loading.

Acquire-and-drop an LLM read lease before starting the job clock in
both insight handlers: the wait for the GPU happens before the
timeout begins, mirroring the per-request lease semantics. Dropped
immediately — holding it across the generation would deadlock the
chat calls' own lease acquisitions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:15:38 -04:00
Cameron Cordes a022a3d15d Fix RAG vector-space mismatch and search_rag retrieval quality
Queries embedded via llama-swap were searching corpora embedded via
Ollama (measured: spaces diverged). Introduce LocalLlm — the local
Ollama + llama-swap pair with LLM_BACKEND dispatch baked in — and route
all embedding writers through it; anything embedding via a concrete
client reintroduces the bug.

- search_rag: embed the model's query verbatim (no metadata boilerplate),
  make date optional — no time-decay when omitted, so "when did X
  happen?" queries rank purely by similarity across all time
- reembed_embeddings bin: re-embed summaries / calendar / search /
  knowledge entities via the active backend, with old-new cosine report
  per table and truncate-and-retry for inputs over the embed server's
  physical batch size
- import_calendar, import_search_history: embed through LocalLlm
- search_messages / get_sms_messages: render sender → recipient so sent
  messages are attributable to a conversation
- insight job failures: store the one-line anyhow context chain ({:#})
  instead of the Debug dump the client was shown verbatim
- serialize env_dispatch tests behind a lock (parallel-runner flake)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:06:52 -04:00
Cameron Cordes 0accc4ef2f Add GPU lease coordinating LLM and TTS requests through llama-swap
llama-swap runs chat/vision/Chatterbox as a mutually-exclusive set on
one GPU and HOLDS a request for a non-resident model until the resident
model drains, then swaps. That hold burned the holder's reqwest timeout
(measured: a queued TTS lost 77s behind one LLM turn; an LLM request
behind a synthesis waited the entire remaining synth), so concurrent
insight + read-aloud timed out instead of queueing.

ai::gpu adds a fair RwLock lease acquired before each request is sent,
so cross-model waits happen before the HTTP timeout starts: chat/vision
share the read lease, TTS synthesis and voice-library ops (which spin
Chatterbox up) take the write lease, and embeddings take none (the
embed slot is in llama-swap's always-resident group). Speech jobs now
flip queued->running only after acquiring the GPU, letting the client
anchor its poll deadline to that transition.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 18:20:06 -04:00
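The sharing rules of the lease in 0accc4ef2f can be illustrated with the standard library. This is a simplified sketch: `GpuLease` and its methods are hypothetical, std's RwLock is synchronous and makes no fairness guarantee, and the real ai::gpu module presumably uses an async, fair lock.

```rust
use std::sync::{RwLock, RwLockReadGuard, RwLockWriteGuard};

/// Shared/exclusive lease over the single GPU (illustrative only).
struct GpuLease {
    lock: RwLock<()>,
}

impl GpuLease {
    fn new() -> Self {
        Self { lock: RwLock::new(()) }
    }

    /// Chat and vision requests share the read side of the lease.
    fn try_llm(&self) -> Option<RwLockReadGuard<'_, ()>> {
        self.lock.try_read().ok()
    }

    /// TTS synthesis and voice-library ops take the exclusive write side.
    fn try_tts(&self) -> Option<RwLockWriteGuard<'_, ()>> {
        self.lock.try_write().ok()
    }
}
```

The point of the design is ordering: the lease is acquired before the request is sent, so time spent waiting for the other model class is not charged against the HTTP timeout.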
Cameron Cordes 03699f7413 Add TTS voice deletion, async speech jobs, voice-list cache, ref-seconds name tags
- DELETE /tts/voices/{name}: remove a cloned voice via the llama-swap
  passthrough (upstream chatterbox-tts-api exposes DELETE /voices/{name}).
- POST/GET/DELETE /tts/speech/jobs: durable job flow for long syntheses —
  dispatch returns 202 + job id, the synth queues on the GPU permit instead
  of fast-failing 429, and clients poll for the result (kept ~10 min).
- GET /tts/voices now serves an in-memory cache so listing voices doesn't
  make llama-swap spin up the TTS model (evicting the resident LLM);
  invalidated on create/delete, ?refresh=1 forces an upstream re-query.
- Created voice names are tagged with LLAMA_SWAP_TTS_REF_SECONDS (e.g.
  grandma-30s) so the library shows which ref length produced each clone.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 17:36:15 -04:00
cameron c78e751743 Merge pull request 'Feature/insight history' (#104) from feature/insight-history into master
Reviewed-on: #104
2026-06-10 19:01:14 +00:00
Cameron Cordes 31904fef80 Raise chat truncation default num_ctx to 32k, env-overridable
The history-truncation budget assumed an 8192-token context whenever a
chat request omitted num_ctx, while the llama-swap chat slots serve
20k-131k. Replayed transcripts past ~6k tokens were silently gutted
every turn — losing conversation history and destroying llama.cpp
KV-cache prefix reuse (full SWA re-prefill per turn).

Default is now 32768 (real conversations top out around 16k), with
AGENTIC_CHAT_DEFAULT_NUM_CTX to override per deploy, floored at
headroom + 1024. Explicit per-request num_ctx still wins.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 19:14:02 -04:00
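The precedence described in 31904fef80 might look like the following. The function name and the `headroom` parameter are hypothetical, and falling back to the built-in default on an unparsable override is an assumption the commit does not confirm.

```rust
const BUILT_IN_DEFAULT_NUM_CTX: u32 = 32_768;

/// Picks the context size used for the truncation budget.
/// `env_override` stands in for the AGENTIC_CHAT_DEFAULT_NUM_CTX value.
fn effective_num_ctx(request: Option<u32>, env_override: Option<&str>, headroom: u32) -> u32 {
    if let Some(explicit) = request {
        // An explicit per-request num_ctx still wins.
        return explicit;
    }
    let default = env_override
        .and_then(|raw| raw.trim().parse::<u32>().ok())
        // Assumption: an unparsable override falls back to the built-in default.
        .unwrap_or(BUILT_IN_DEFAULT_NUM_CTX);
    // The default is floored at headroom + 1024.
    default.max(headroom + 1024)
}
```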
Cameron Cordes 13f3635db2 Fix clippy lints in backfill and libraries tests
Keep `cargo clippy --tests` clean alongside the agentic-loop changes:
alias backfill's five-element setup() tuple as SetupFixture
(type_complexity) and build the single-library health map via
std::slice::from_ref instead of cloning (unnecessary clone-to-slice).
No behavior change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:29:44 -04:00
Cameron Cordes b711252c23 Resolve persona prompts server-side; drop synthetic prompt in chat_turn
A request carrying persona_id but no system_prompt used to fall back to
the neutral default voice. Both agentic generation
(generate_agentic_insight_handler) and chat bootstrap now resolve the
persona's stored prompt from the persona store, with precedence:
explicit non-blank client system_prompt > persona store lookup >
existing default ("default" persona id behaves the same — used if the
store has a row, neutral default otherwise). Resolution happens at the
handler / bootstrap entry where the DAO is reachable; internals are
unchanged. resolve_bootstrap_system_prompt takes the resolved persona
prompt as a second argument, with precedence tests.

Also in insight_chat:

- Sync chat_turn no longer persists the synthetic "Please write your
  final answer now without calling any more tools." user message pushed
  on iteration exhaustion — extracted both streaming variants'
  synthetic_idx pattern into push/remove_synthetic_final_prompt (the
  remove is a defensive no-op on index drift) and applied it to all
  three loops; round-trip test included.
- Strip leaked <think> blocks from the final content persisted as the
  reply in chat_turn and both streaming AgenticLoopOutcomes (mid-stream
  TextDeltas are untouched; the raw transcript keeps the block).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:29:35 -04:00
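The prompt precedence in b711252c23 (explicit non-blank client prompt, then persona store, then default) reduces to a small chain. `resolve_system_prompt` is a hypothetical name; the commit's resolve_bootstrap_system_prompt has a signature that is not shown, and treating a blank persona prompt as absent follows the persona_system_prompt helper described in 091982bdfc.

```rust
/// Treats a missing or whitespace-only prompt as absent.
fn non_blank(prompt: Option<&str>) -> Option<&str> {
    prompt.filter(|s| !s.trim().is_empty())
}

/// Precedence: client system_prompt > persona store prompt > default.
fn resolve_system_prompt(
    client_prompt: Option<&str>,
    persona_prompt: Option<&str>,
    default_prompt: &str,
) -> String {
    non_blank(client_prompt)
        .or(non_blank(persona_prompt))
        .unwrap_or(default_prompt)
        .to_owned()
}
```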
Cameron Cordes 091982bdfc Add recall_facts_for_entity tool; fix generation gates and tool output
Agentic-loop fixes in the generator:

- New recall_facts_for_entity tool (always-on, like recall_entities):
  fetches facts for one entity by id so the model can follow up on
  entities surfaced by recall_entities that aren't photo-linked
  (recall_facts_for_photo only covers linked entities). Mirrors that
  tool's persona scoping (PersonaFilter::Single) and the persona's
  reviewed_only_facts filter exactly, and renders in the same
  "Entity: ... / - predicate object" style. Wired through execute_tool
  and the trajectory summarizer.
- Generation now resolves gates persona-aware:
  current_gate_opts_for_persona(images_inline, Some((user_id,
  persona_id))) instead of the None-defaulting wrapper, so a persona's
  allow_agent_corrections opens propose_correction during generation the
  same way chat turns already did. The now-unused current_gate_opts
  wrapper is removed.
- Strip leaked <think> blocks from the final assistant content before
  parse_title_body / store_insight (raw training transcript keeps them).
- Honest truncation labels: get_sms_messages and get_location_history
  said "Found N ..." while listing only the first K; found_header now
  emits "Found N ... (showing first K):" when truncated, and the
  summarizer still parses the count.
- Clamp days_radius in get_calendar_events and get_location_history to
  1..=30, matching get_sms_messages.
- persona_system_prompt helper (persona store lookup, blank-prompt ->
  None) for server-side persona resolution; callers land in the next
  commit.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:29:20 -04:00
Cameron Cordes 592dfcb42c Accumulate streamed tool calls across chunks in Ollama streaming
Ollama >=0.8 can stream tool_calls incrementally across NDJSON chunks;
chat_with_tools_stream did `tool_calls = Some(tcs)` per chunk, so only
the last chunk's calls survived assembly and earlier calls were silently
dropped. Append into the accumulator instead.

- ollama: append_streamed_tool_calls helper + tests covering two calls
  arriving in separate chunks and the single-chunk batch case.
- llamacpp: the SSE delta assembly was already correct (per-index
  BTreeMap, same-index argument fragments concatenate, distinct indexes
  accumulate); extracted it into apply_tool_call_deltas /
  finalize_tool_calls and added tests pinning that behavior.
- llm_client: new shared strip_think_blocks (moved from ollama's private
  extract_final_answer, which now delegates) so the tool-calling final
  content paths can reuse it; unit tests for tagged/plain/unclosed/empty
  cases.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:29:06 -04:00
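The bug fixed in 592dfcb42c is the difference between assigning and appending. A minimal sketch follows; `append_streamed_tool_calls` is named in the commit, but the signature and the `ToolCall` type here are assumptions.

```rust
/// Simplified stand-in for the client's tool-call type.
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String,
}

/// Appends one chunk's tool calls to the accumulator.
/// The buggy version did `*acc = Some(chunk)`, keeping only the last chunk.
fn append_streamed_tool_calls(acc: &mut Option<Vec<ToolCall>>, chunk: Vec<ToolCall>) {
    if chunk.is_empty() {
        return; // an empty chunk should not turn None into Some(vec![])
    }
    acc.get_or_insert_with(Vec::new).extend(chunk);
}
```

Calling this once per NDJSON chunk preserves calls that arrive in separate chunks as well as a batch delivered in a single chunk.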
Cameron Cordes 8e4f91561b Add per-file insight history endpoint and rate-by-id
Expose GET /insights/history?path=... returning every generated version
of a photo's insight (current plus superseded), newest-first, backing the
mobile per-file insight history view.

- New get_insight_history_handler; reuses the existing get_insight_history
  DAO method (removed its dead_code allow).
- impl From<PhotoInsight> for PhotoInsightResponse, collapsing the mapping
  that was duplicated across the single-get and all-insights handlers.
- rate_insight_by_id DAO method + optional insight_id on RateInsightRequest
  so previously generated versions can be approved/rejected (the path-based
  rate only touches the current row).
- DAO tests for history ordering/scoping and id-targeted rating.
- cargo fmt normalized a multi-line assert in insight_chat.rs tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 18:28:22 -04:00
cameron 750a8de6b1 Merge pull request 'Feature/tts integration' (#103) from feature/tts-integration into master
Reviewed-on: #103
2026-06-07 21:35:49 +00:00
Cameron Cordes 412da2ce8e Collapse blank lines to a single break in TTS text cleaning
Chatterbox inserts a long pause — sometimes ~20s of silence — for each
blank line it sees, and insight text is markdown full of paragraph
breaks. clean_for_tts previously preserved paragraph structure
(\n{3,} -> \n\n), so every paragraph boundary still reached the model
as a double newline. Now any run of 2+ newlines, including
whitespace-only blank lines, collapses to a single newline so the
worst pause a break can cause is a normal line-break pause.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 09:12:43 -04:00
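The collapsing rule in 412da2ce8e can be approximated without a regex by dropping every whitespace-only line. `collapse_blank_lines` is a hypothetical name, not the code inside clean_for_tts, and this version also removes leading and trailing blank lines and any trailing newline.

```rust
/// Collapses every run of blank (or whitespace-only) lines so that
/// consecutive text lines are separated by exactly one newline.
fn collapse_blank_lines(text: &str) -> String {
    text.lines()
        .filter(|line| !line.trim().is_empty())
        .collect::<Vec<_>>()
        .join("\n")
}
```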
Cameron Cordes dec6f21af9 Bump version to 1.3.0
TTS feature release: /tts/speech + voice library endpoints (Chatterbox via
llama-swap), input cleaning, tuning knobs, WAV-normalized voice cloning,
OTel spans, dedicated synth timeout, and single-flight serialization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 14:07:10 -04:00
Cameron Cordes cab867da60 Serialize /tts/speech with a single permit; 429 when busy
The Chatterbox wrapper has no internal lock or cancellation, so concurrent
synth requests contend on the single GPU and abandoned (timed-out) jobs
cascade into stacked slowness. Gate synthesis behind a one-permit semaphore
and fast-fail concurrent requests with 429 instead of queueing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 14:02:56 -04:00
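The fast-fail gate in cab867da60 can be sketched with an atomic flag and a drop guard. The commit mentions a one-permit semaphore but does not show it, so every name below is hypothetical and the real handler most likely uses an async semaphore instead.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// One-permit gate: at most one synthesis runs at a time.
struct SynthGate {
    busy: AtomicBool,
}

/// Releases the permit when dropped, even on an early return.
struct SynthPermit<'a> {
    gate: &'a SynthGate,
}

impl SynthGate {
    fn new() -> Self {
        Self { busy: AtomicBool::new(false) }
    }

    /// Returns a permit if the gate is free; never waits.
    fn try_acquire(&self) -> Option<SynthPermit<'_>> {
        self.busy
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .ok()
            .map(|_| SynthPermit { gate: self })
    }
}

impl Drop for SynthPermit<'_> {
    fn drop(&mut self) {
        self.gate.busy.store(false, Ordering::Release);
    }
}

/// Hypothetical handler core: 200 when the permit was free, 429 when busy.
fn status_for(gate: &SynthGate) -> u16 {
    match gate.try_acquire() {
        Some(_permit) => 200, // synthesis would run here while the permit is held
        None => 429,
    }
}
```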
Cameron Cordes d8dd260c6b Give TTS synthesis its own (longer) request timeout
Long insights are chunked + synthesized server-side and can run past the shared
180s chat/embedding client timeout, causing spurious timeouts. /tts/speech now
uses a per-request timeout from LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS
(default 600), overriding the client default without affecting chat/embeddings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 10:25:06 -04:00
Cameron Cordes 9978b28b52 Document TTS endpoints + env in CLAUDE.md
Sync CLAUDE.md with the Chatterbox TTS feature: the /tts/* endpoints and the
LLAMA_SWAP_TTS_MODEL / _VOICE / _REF_SECONDS env vars (TTS only needs LLAMA_SWAP_URL).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 23:15:39 -04:00
Cameron Cordes ccacfe1113 Instrument TTS handlers with OTel spans (codebase standard)
Each /tts handler now opens an http.tts.* span via extract_context_from_request
+ global_tracer().start_with_context, sets Status::Ok / Status::error on every
outcome, and records useful attributes (model, format, voice_name, byte counts)
— matching the insight handlers. Prometheus request metrics were already
covered by the app-wide actix-web-prom middleware.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 23:10:43 -04:00
Cameron Cordes 62d517dcda Normalize voice-clone reference audio to WAV via ffmpeg
Chatterbox validates the reference clip by file extension and rejects formats
like .aac/.opus. Always transcode the reference (upload bytes and library
files alike) to mono 24 kHz WAV with ffmpeg before forwarding, so any source
format is accepted and the from-library audio/video paths are unified.

The reference length cap is now configurable via LLAMA_SWAP_TTS_REF_SECONDS
(default 30) — Chatterbox is zero-shot, so a clean ~10-20s clip is the sweet
spot. Drops the now-unused mime guesser.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 22:50:08 -04:00
Cameron Cordes 35c5ecb427 Document TTS endpoints and env in README + .env.example
Adds the /tts/speech and /tts/voices* endpoints plus LLAMA_SWAP_TTS_MODEL /
LLAMA_SWAP_TTS_VOICE (TTS only needs LLAMA_SWAP_URL, not LLM_BACKEND=llamacpp).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 22:34:34 -04:00
Cameron Cordes 51be5df214 Clean insight text for TTS and pass through Chatterbox tuning knobs
/tts/speech now normalizes input before synthesis: unwraps markdown
links/images to visible text, drops heading/list/blockquote/emphasis
markers and URLs, strips emoji (which non-turbo Chatterbox mispronounces
or skips), and collapses whitespace. Centralized in clean_for_tts so the
app, WebUI, and curl all get clean audio. Bracketed tags are deliberately
preserved for a future Turbo (paralinguistic) switch.

Adds optional exaggeration / cfg_weight / temperature to the request,
clamped to Chatterbox's documented ranges and forwarded on the speech
body. Unit tests cover markdown/emoji/URL stripping and tag preservation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 22:15:05 -04:00
Cameron Cordes 69268d03fe Add TTS endpoints backed by Chatterbox via llama-swap
LlamaCppClient gains text_to_speech (OpenAI /audio/speech), list_voices and
create_voice (voice library at the swap-root /upstream/<model>/voices
passthrough), plus a tts_model slot configured via LLAMA_SWAP_TTS_MODEL
(default "chatterbox").

New Claims-gated routes:
- POST /tts/speech        -> { audio_base64, format } for data: URI playback
- GET  /tts/voices        -> voice library passthrough
- POST /tts/voices/upload -> clone a voice from an uploaded clip (multipart)
- POST /tts/voices/from-library -> clone from a library file (ffmpeg-extracts
  audio from video; audio forwarded as-is)

Security: voice_name sanitized to [A-Za-z0-9_-] (it becomes an upstream
filename), 25 MB upload cap, library refs restricted to real audio/video,
path confined via is_valid_full_path. Adds is_audio_file + unit tests for the
sanitizer, mime guesser, and swap-root derivation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 22:04:42 -04:00
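The voice_name restriction in 69268d03fe can be sketched as a character filter. Whether the real sanitizer drops, replaces, or rejects disallowed characters is not stated, so dropping them and rejecting an empty result are both assumptions, as is the function name.

```rust
/// Restricts a voice name to [A-Za-z0-9_-] because it becomes an
/// upstream filename. Returns None when nothing usable remains.
fn sanitize_voice_name(raw: &str) -> Option<String> {
    let cleaned: String = raw
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '_' || *c == '-')
        .collect();
    if cleaned.is_empty() {
        None
    } else {
        Some(cleaned)
    }
}
```

Dropping dots and slashes removes any way to express a path component such as `../`, which is the concern when the name ends up as a filename upstream.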
cameron 015dc976e3 Merge pull request 'feature/insight-jobs' (#102) from feature/insight-jobs into master
Reviewed-on: #102
2026-06-02 23:41:36 +00:00
Cameron Cordes b9b6e51af1 Stop ffprobe walking every frame in video stream probe
probe_video_stream_meta requested a bare `side_data_list` section in
-show_entries. On modern ffprobe that's the *frame* side-data section,
so ffprobe enumerated every frame to collect it — reading the entire
mdat. For non-faststart phone clips on the SMB mount this turned a
metadata probe into a full-file read: /video/generate took 10-32s per
open (0% CPU, time proportional to file size).

Switch to `stream_side_data_list`, which reads the Display Matrix
rotation from the stream header (moov) without touching frames. Codec,
frame rate, and rotation are unchanged; the existing rotation parser
already reads streams[0].side_data_list[].rotation. Fixes both the
open-path probe and the transcode actor's probe. Cold opens now return
near-instantly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 13:19:47 -04:00
Cameron Cordes 16ae82ba70 Normalize video rel_path lookup to forward slashes on Windows
generate_video built the rel_path for its image_exif lookup by stripping
the library root from the absolute path, leaving backslashes on Windows
(Melissa\clip.mp4). file_scan stores rel_paths forward-slash and
get_exif_batch matches exactly with no normalization, so the lookup
missed and the handler re-hashed the entire video file on every request.

Extract rel_path_for_lookup and normalize separators with replace('\\',
'/'). Adds tests for Windows/Unix separators, file-at-root, leading
separator stripping, and the no-match fallback.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 12:51:44 -04:00
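A string-based sketch of the normalization in 16ae82ba70. `rel_path_for_lookup` is named in the commit, but the signature is assumed, and the no-match fallback (returning the normalized input) is a guess. Note that a plain string prefix would also match a sibling directory such as `C:\photos2`; the real code may work on path components.

```rust
/// Strips the library root, normalizes separators to forward slashes,
/// and removes any leading separator.
fn rel_path_for_lookup(abs_path: &str, library_root: &str) -> String {
    match abs_path.strip_prefix(library_root) {
        Some(rel) => rel
            .replace('\\', "/")
            .trim_start_matches('/')
            .to_string(),
        // Assumption: with no matching root, return the path with
        // separators normalized and nothing stripped.
        None => abs_path.replace('\\', "/"),
    }
}
```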
Cameron Cordes a542ea411b Exclude inlined image bytes from chat context budget
The truncation budget estimated message size by serializing the full
ChatMessage array, including the base64 image persisted in the first
user message. A 1024px JPEG is hundreds of KB of base64 characters —
8-19x the entire ~24KB text budget at the default num_ctx — and the
image lives in the protected prefix that's never dropped. The budget
check was therefore essentially always over, dropping all tool history
and firing the "trimmed context" banner on every turn for vision
backends that inline images.

estimate_bytes now strips image payloads before counting and charges a
flat IMAGE_TOKENS_EACH per image instead, so the budget reflects real
text token pressure. Adds a regression test covering a short
conversation with one large image.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 11:51:57 -04:00
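The budgeting change in a542ea411b can be sketched as follows. The `ChatMessage` shape, both constant values, and the bytes-per-token conversion are placeholders; the commit names IMAGE_TOKENS_EACH but gives no value, and the real estimate_bytes works on the serialized message array rather than on content length alone.

```rust
// Placeholder values: the real constants are not given in the commit.
const IMAGE_TOKENS_EACH: usize = 1024;
const BYTES_PER_TOKEN: usize = 3;

/// Simplified message: text plus base64-encoded inline images.
struct ChatMessage {
    content: String,
    images: Vec<String>,
}

/// Counts text bytes and charges a flat cost per image, ignoring the
/// size of the base64 payload itself.
fn estimate_bytes(messages: &[ChatMessage]) -> usize {
    messages
        .iter()
        .map(|m| m.content.len() + m.images.len() * IMAGE_TOKENS_EACH * BYTES_PER_TOKEN)
        .sum()
}
```

Under these placeholder numbers, a short message with one 500 KB image payload is estimated at a few kilobytes, well under the ~24KB budget quoted in the commit.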
Cameron Cordes 962f7bf05c Add reconnectable async chat-turn flow with in-memory TurnRegistry
Replace the one-shot SSE chat stream with an async dispatch + reconnectable
replay flow so the mobile client survives backgrounding, network blips, and
OS-killed sockets without losing an in-flight agentic turn.

- TurnRegistry/TurnEntry: in-memory per-turn event buffer (cap 500, front
  eviction) shared by the agentic loop (writer) and SSE replay readers.
  ReplayOutcome + replay_from/next_batch distinguish Events/CaughtUp/Gone;
  next_batch registers the Notify before reading state (no lost wakeup) and
  drains every buffered event before signaling terminal, so the final
  Done/Error is never dropped and the stream closes cleanly.
- Endpoints: POST /insights/chat/turn (202 + turn_id), GET
  /insights/chat/turn/{id} (SSE replay, ?skip_before= resume, per-event seq,
  410 on eviction), DELETE /insights/chat/turn/{id} (real task abort +
  cooperative is_running() check at each loop boundary).
- Cancellation actually stops the task (AbortHandle stored on the entry) and
  emits a Done{cancelled:true}; callers skip persistence on cancel.
- Background sweeper drops stale turns; interval clamped to <=300s.
- OpenTelemetry spans: ai.chat.turn.execute/replay/cancel.
- Legacy POST /insights/chat/stream path preserved unchanged.

Tests: registry coverage for terminal delivery (race guard), waiting, Gone,
abort, eviction; handler integration tests for 404/410, skip_before, seq
stamping, completed replay, and cancel.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 19:50:25 -04:00
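The bounded event buffer in 962f7bf05c can be approximated synchronously. This sketch omits the Notify-based waiting and terminal-event handling entirely; `TurnBuffer`, the `(seq, payload)` event shape, and the exact meaning of the resume point are assumptions. In particular, Gone here models a reader whose resume point was evicted from the front, which may differ from what triggers 410 in the real registry.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum ReplayOutcome {
    Events(Vec<(u64, String)>),
    CaughtUp,
    Gone,
}

/// Per-turn event buffer with a fixed capacity and front eviction.
struct TurnBuffer {
    events: VecDeque<(u64, String)>,
    next_seq: u64,
    cap: usize,
}

impl TurnBuffer {
    fn new(cap: usize) -> Self {
        Self { events: VecDeque::new(), next_seq: 0, cap }
    }

    /// Stamps the event with the next sequence number, evicting the
    /// oldest buffered event when the buffer is full.
    fn push(&mut self, event: &str) {
        if self.events.len() == self.cap {
            self.events.pop_front();
        }
        self.events.push_back((self.next_seq, event.to_string()));
        self.next_seq += 1;
    }

    /// Returns every buffered event with seq >= `from_seq`.
    fn replay_from(&self, from_seq: u64) -> ReplayOutcome {
        match self.events.front() {
            // The requested resume point was already evicted.
            Some((oldest, _)) if *oldest > from_seq => ReplayOutcome::Gone,
            _ => {
                let batch: Vec<_> = self
                    .events
                    .iter()
                    .filter(|(seq, _)| *seq >= from_seq)
                    .cloned()
                    .collect();
                if batch.is_empty() {
                    ReplayOutcome::CaughtUp
                } else {
                    ReplayOutcome::Events(batch)
                }
            }
        }
    }
}
```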
Cameron Cordes 0c1c1c6792 fix: split token count columns into separate migration
A previous commit added prompt_eval_count and eval_count to the
existing 2026-05-27-000002_add_insight_generation_params migration,
but Diesel won't re-run an already-applied migration. Environments
that applied the original version of 000002 never got these two
columns, causing "no such column: photo_insights.prompt_eval_count"
on every insight read.

- Revert 000002 up.sql to its original 7-column form
- Add 000003_add_insight_token_counts for the two missing columns

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 22:34:44 -04:00
Cameron Cordes cdd981fe64 fix: inline DB error source into DbError struct
The previous fix logged the underlying error in a separate log line,
but the error that propagated up still showed just "DbError { kind:
InsertError }" at the call site. Now the source message is captured
on the struct itself, so Debug/Display output at any call site shows
the actual Diesel error inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 22:30:19 -04:00
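The change in cdd981fe64 amounts to carrying the source message on the error itself. A minimal sketch, with a hypothetical `with_source` constructor and a plain String standing in for the Diesel error; the real DbError fields are not shown in the commit.

```rust
use std::fmt;

#[derive(Debug)]
enum DbErrorKind {
    InsertError,
}

/// Error that keeps the underlying cause inline, so Debug/Display at any
/// call site shows it.
#[derive(Debug)]
struct DbError {
    kind: DbErrorKind,
    source: String,
}

impl DbError {
    fn with_source(kind: DbErrorKind, source: impl fmt::Display) -> Self {
        Self { kind, source: source.to_string() }
    }
}

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.kind, self.source)
    }
}

/// Hypothetical DAO call: the old `map_err(|_| ...)` discarded `e`.
fn insert_row(result: Result<usize, String>) -> Result<usize, DbError> {
    result.map_err(|e| DbError::with_source(DbErrorKind::InsertError, e))
}
```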
Cameron Cordes dad0220587 fix: stop swallowing DB errors across the entire DAO layer
Every map_err(|_| DbError::new(...)) and map_err(|_| anyhow!("..."))
in the database layer was discarding the actual Diesel/SQLite error,
making failures impossible to diagnose from logs.

- Add DbError::log() that logs the source error before converting
- Replace all ~130 swallowed outer map_err closures with DbError::log
- Replace all ~47 swallowed inner anyhow closures to include the
  source error in the message

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:56:48 -04:00
Cameron Cordes 39ad83f55b fix: surface actual Diesel error in store_insight instead of generic InsertError
The previous map_err closures discarded the Diesel error, making
failures like missing columns impossible to diagnose from logs.
Now the underlying error is logged before converting to DbError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:53:54 -04:00
Cameron Cordes 9654d256f4 fix: persist token counts and fix agentic insight_id mapping
- Add prompt_eval_count and eval_count columns to photo_insights so
  token usage from llama-swap/Ollama is stored and returned by the API
- Fix agentic generator return: was (prompt_eval_count, eval_count),
  handler destructured first element as insight_id — now returns
  (insight_id, prompt_eval_count, eval_count)
- Wire prompt_eval_count/eval_count from DB into PhotoInsightResponse
  instead of hardcoded None

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:47:57 -04:00
Cameron Cordes 449ce1fda1 chore: resolve all clippy warnings and formatting
- Replace impl ToString with impl Display for InsightJobStatus and
  InsightGenerationType
- Rename from_str → parse to avoid confusion with std::str::FromStr
- Collapse nested if statements (handlers, insight_chat, insight_generator,
  image handlers)
- Use is_multiple_of() instead of manual modulo checks
- Suppress deprecated diesel::dsl::count_distinct (no drop-in replacement
  available in current Diesel version)
- Scope MutexGuard in synthesize_merge to drop before await
- Allow dead_code on generate_no_think, enumerate_indexable_files,
  total_deleted (intended for future use)
- Allow type_complexity on Diesel query result tuples

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:13:48 -04:00
Cameron Cordes a410683edf fix: fail fast when LLM_BACKEND=llamacpp but LlamaCppClient is unconfigured
Previously embed_one() silently fell back to Ollama embeddings,
which would load nomic-embed-text into VRAM alongside llama-swap —
wasting memory on an unintended model. Now returns an error with
an actionable message instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:02:42 -04:00
Cameron Cordes 2818936739 fix: audit fixes for async insight jobs + persist generation params
- Fix query param mismatch: rename GenerationStatusQuery.file_path to
  path so the client's app-resume buildQuery({ path: ... }) resolves
  correctly instead of always getting 400
- Remove dead _lib_id bindings from both generate handlers
- Return 202 Accepted instead of 200 from generate endpoints
- Restore OpenTelemetry span instrumentation on generate handlers
- Remove stale UNIQUE constraint from initial migration (incompatible
  with plain-INSERT DAO)
- Add tests for the status guard (complete_job/fail_job are no-ops when the
  job is already cancelled) and for cancel_job by id
- Persist generation params (num_ctx, temperature, top_p, top_k, min_p,
  system_prompt, persona_id) on the photo_insights table for auditing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:02:15 -04:00
Cameron Cordes b87eb4e690 feat: async insight generation with SQLite job tracking
- Add insight_generation_jobs table migration and DAO
- Implement job lifecycle: create_or_get_active, complete, fail, cancel
- Refactor POST /insights/generate and /agentic to async spawn with timeout
- Add GET /insights/generation/status endpoint with job_id and file_path lookup
- Use String for enum fields in Diesel models to avoid private Bound type
- Add from_str() helpers on InsightJobStatus and InsightGenerationType
- Fix update_training_messages to return Result<usize, DbError>
- 7/7 DAO unit tests passing
2026-05-27 10:02:18 -04:00
cameron 5a75d1a28c Merge pull request 'feature/llamacpp-backend' (#101) from feature/llamacpp-backend into master
Reviewed-on: #101
2026-05-26 18:58:47 +00:00
Cameron Cordes b03ee60342 fix: prevent hybrid mode from leaking OpenRouter model to local llamacpp client
When backend=hybrid with LLM_BACKEND=llamacpp, the user-selected model
(an OpenRouter id like "google/gemini-3-flash-preview") was being applied
to the local LlamaCppClient's primary_model and vision_model. This caused
describe_image to send the OpenRouter model name to llama-swap, which
returned 400 because it has no such slot.

Guard the local-client model override with !is_hybrid so it only applies
in local-only mode (where the user is selecting a different local model).
Bump to v1.2.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 09:55:16 -04:00
Cameron Cordes 0a627f4880 Add contact name filter to SMS search tool + misc improvements
- sms search tool: accept contact name, trim/validate, skip when
  contact_id is set, pass to API client
- sms_client: new contact field in SmsSearchParams, URL-encode on wire
- Tool description clarifies contact_id takes precedence when both given
- Add parse_title_body helper for LLM response parsing
- llamacpp backend improvements
2026-05-25 21:46:18 -04:00
cameron b9175e2718 image: add xlarge (4096px) on-demand preview tier
New `PhotoSize::XLarge` variant sits between `Large` (2048px) and
`Full` (original). On-demand generated and disk-cached at
`_xlarge/<hash>.jpg`, same waterfall as `Large` (embedded RAW preview
→ ffmpeg → image crate). Sources below 4096px serve at native size.

Reduces decoded bitmap memory from ~192MB (48MP full) to ~64MB for
the mobile viewer's zoom tier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 15:33:03 -04:00
Cameron Cordes 9dba659d1e test: add llamacpp model-slot consistency and content-null tests
Cover the properties that prevent mid-turn model swaps in llama-swap
exclusive mode: vision_model defaults to primary, cloned local client
mirrors the user-selected model, embeddings stay on their own slot.
Also test the content:null serialization for tool-calling messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 19:29:51 -04:00
Cameron Cordes 208344ad98 ai: mirror chat model on local client to prevent mid-turn model swap
When the user selects a model from the picker, the local client's
primary_model and vision_model now match the chat model. Prevents
llama-swap exclusive mode from swapping models when describe_photo
or rerank fires during an agentic turn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 19:27:29 -04:00
Cameron Cordes fb388c29d7 docs: update env + CLAUDE.md for direct-vision llamacpp + ResolvedBackend
llamacpp models now receive images directly instead of
describe-then-inline. LLAMA_SWAP_VISION_MODEL defaults to the
primary model. Document the ResolvedBackend dispatch pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:03:12 -04:00
Cameron Cordes a8a661f70a ai: extract ResolvedBackend, remove ~480 lines of duplicated dispatch
Replace 5 copies of the ~80-line backend resolution pattern with a
single InsightGenerator::resolve_backend() builder that returns a
ResolvedBackend (chat + local clients, BackendKind enum, images_inline
flag). Tool dispatch now takes &ResolvedBackend instead of
&OllamaClient + model + backend strings.

Remove duplicated ollama/openrouter/llamacpp fields from
InsightChatService — InsightGenerator owns them and resolve_backend
uses them. Delete build_chat_clients (replaced by resolve_backend).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:00:50 -04:00
Cameron Cordes 0631820fbf ai: send images directly to llamacpp chat models + add ResolvedBackend
llamacpp models now receive images via OpenAI content-parts instead of
the describe-then-inline strategy (hybrid mode unchanged). Assistant
messages with tool_calls now serialize content: null instead of "",
which satisfies strict Jinja template role-alternation checks. Adds debug
logging of message role sequences on llamacpp requests.

Introduces BackendKind enum, SamplingOverrides, and ResolvedBackend in
a new backend.rs module. InsightGenerator::resolve_backend centralises
client construction + vision capability detection — next step wires the
existing inline dispatch through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 14:00:37 -04:00
Cameron Cordes be51421b38 ai: collapse llamacpp into LLM_BACKEND env switch
Reverts the per-request backend="llamacpp" value. Chat/vision/embedding
backend is now a deploy-time decision (LLM_BACKEND=ollama|llamacpp),
applied globally across chat, vision describe, and embeddings — so
embedding vectors stay in one space across the index.

- Per-request backend whitelist back to "local"|"hybrid". A request
  arriving with backend="llamacpp" is rejected.
- LLM_BACKEND=llamacpp swaps the entire local stack to llama-swap:
  chat hits the chat slot, describe hits the vision slot, embeddings
  hit the embed slot. Hybrid mode still routes chat to OpenRouter
  but uses LLM_BACKEND for the describe pass.
- Drops env vars HYBRID_VISION_BACKEND, LLAMA_SWAP_VISION_MODELS,
  EMBEDDING_BACKEND (the last never shipped). Drops the
  LlamaCppClient.vision_models allowlist — capability inference now
  reports has_vision only for the configured vision_model slot.
- Drops the /insights/llamacpp/models handler. /insights/models is
  the single endpoint; returns Ollama servers under LLM_BACKEND=ollama
  and llama-swap slots (from LLAMA_SWAP_ALLOWED_MODELS) under
  LLM_BACKEND=llamacpp. Same envelope shape either way.
- New ai::embed_one helper routes embeddings through llama-swap when
  LLM_BACKEND=llamacpp (else Ollama). Wires it into the four
  insight_generator embedding sites.
- Cross-replay matrix simplifies to pre-llamacpp shape (local↔local,
  hybrid↔hybrid, hybrid→local allowed; local→hybrid rejected).
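
The simplified cross-replay matrix could be expressed as a small predicate. This is a sketch only: `cross_replay_allowed` is a hypothetical name (the real check lives in `validate_cross_replay`), and it assumes the arrow reads "original backend → replay backend".

```rust
/// Returns true when an insight generated on `original` may be replayed on
/// `replay`. Assumed reading of the matrix: original -> replay.
fn cross_replay_allowed(original: &str, replay: &str) -> bool {
    matches!(
        (original, replay),
        ("local", "local") | ("hybrid", "hybrid") | ("hybrid", "local")
    )
}
```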
2026-05-21 11:36:58 -04:00
Cameron Cordes d14df63f19 env.example: document LLAMA_SWAP_* + HYBRID_VISION_BACKEND vars
Mirrors the section added to CLAUDE.md so deploys can opt into the
llamacpp backend from the template alone.
2026-05-20 17:54:08 -04:00
Cameron Cordes f0927f5355 ai: add llamacpp backend (llama-swap) as third LLM client
Wires a new LlamaCppClient (OpenAI-compatible /v1 wire format) alongside
OllamaClient and OpenRouterClient. Per-slot routing for chat/vision/embed
via env (LLAMA_SWAP_URL + *_MODEL vars); capability inference uses an
env allowlist since /v1/models doesn't report modality.

InsightGenerator + InsightChatService gain three-way dispatch on
chat_backend = "local" | "hybrid" | "llamacpp". Hybrid and llamacpp
share the describe-then-inline path (text-only chat after a separate
vision describe). HYBRID_VISION_BACKEND=llamacpp lets hybrid route its
describe pass through llama-swap's vision slot while chat still goes
to OpenRouter.

Cross-replay matrix added (validate_cross_replay): local<->llamacpp
and hybrid<->llamacpp allowed; local->hybrid and llamacpp->hybrid
rejected. New /insights/llamacpp/models handler mirrors the OpenRouter
shape.
2026-05-20 17:52:33 -04:00
cameron d04b86e32c Merge pull request 'image: add on-demand size=large preview tier (~2048px JPEG q85)' (#100) from feature/image-large-preview into master
Reviewed-on: #100
2026-05-19 21:51:08 +00:00
Cameron Cordes 19798184f0 image: add on-demand size=large preview tier (~2048px JPEG q85)
Adds a third PhotoSize between Thumb (200px) and Full (original). The
viewer placeholder and map callout previously upscaled a 200px thumb
into a full-screen / full-width view, which looked visibly blocky on
3× devices. The new tier is generated on-demand, disk-cached, and
served via the existing /image endpoint.

Storage layout mirrors the Thumb branch's lookup chain:
  1. hash-keyed: <thumbs>/_large/<hash[..2]>/<hash>.jpg (shared across
     libraries when content_hash is known)
  2. library-scoped legacy: <thumbs>/_large/<lib_id>/<rel_path>

Generation pipeline mirrors generate_image_thumbnail:
  - RAW: decode the embedded JPEG preview, apply EXIF orientation,
         resize to 2048-long-edge, encode JPEG q85
  - HEIC/HEIF: ffmpeg with scale + q:v 5 (≈ q85)
  - everything else: image crate decode + thumbnail() + JpegEncoder
Never upscales — sources below the 2048 cap re-encode at native size.

Handler offloads decode/resize to web::block to keep the actix worker
free (a 24MP source takes 100–500ms). Writes via tempfile+rename so
concurrent readers can't observe a half-written JPEG. On any
generation failure, falls through to the Full branch (which itself
serves the RAW embedded preview for unrenderable RAW containers).
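
The tempfile+rename write can be sketched with the standard library alone. `write_atomically` and the `.<pid>.tmp` suffix are illustrative assumptions, not the handler's actual code; the point is that readers only ever see the destination path after the rename, which is atomic when both paths are on the same filesystem.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write `bytes` to a sibling temp file, then rename it over `dest`.
/// Hypothetical helper; the temp-file naming scheme is an assumption.
fn write_atomically(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut tmp_name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "no file name"))?
        .to_os_string();
    tmp_name.push(format!(".{}.tmp", std::process::id()));
    let tmp = dest.with_file_name(tmp_name);

    // A concurrent reader of `dest` never observes this partial write.
    fs::write(&tmp, bytes)?;
    // Publish the finished file in one step.
    fs::rename(&tmp, dest)
}
```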

Video requests for size=large fall back to the existing thumb pipeline
since there's no useful 2048px video tier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 13:14:49 -04:00
cameron c3c6cd03db Merge pull request 'file_types: filter macOS AppleDouble + .DS_Store from media predicates' (#99) from feature/filter-fs-metadata into master
Reviewed-on: #99
2026-05-18 17:12:42 +00:00
Cameron Cordes b843a4a366 file_types: filter macOS AppleDouble + .DS_Store from media predicates
Symptom: Apollo's logs showed bursts of 422 decode_failed from
ImageApi's CLIP backfill — e.g. `._DSC_2182-S.jpg`. macOS writes
`._<name>` AppleDouble sidecars when copying to non-HFS volumes
(SMB, FAT, exFAT), and they carry the original file's extension
even though their bytes are extended-attribute metadata, not the
image. ImageApi's walker matched them via the extension predicate,
sent them through the ingest pipeline, and accumulated failed rows
in face_detections + clip_embedding while pinning Apollo's eviction
timer with the 422 burst.

Fix: predicate-level guard in is_image_file / is_video_file (and
by inheritance is_media_file). Every walker that already gates on
these (face_watch, backfill, clip_watch, watcher, files,
probe_clip_search) inherits the skip without per-callsite edits.
Narrow scope on purpose — `._*` prefix + the exact `.DS_Store`
basename — rather than blanket dotfile filtering, because a user
could plausibly name a cover image `.cover.jpg`.
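
A minimal sketch of the narrow guard described above. `is_fs_metadata` is a hypothetical name; the actual check sits inside `is_image_file` / `is_video_file`.

```rust
use std::path::Path;

/// True for macOS AppleDouble sidecars (`._*`) and the exact `.DS_Store`
/// basename. Other dotfiles such as `.cover.jpg` are deliberately allowed.
fn is_fs_metadata(path: &Path) -> bool {
    match path.file_name().and_then(|name| name.to_str()) {
        Some(name) => name.starts_with("._") || name == ".DS_Store",
        None => false,
    }
}
```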

Existing rows are not cleaned by this change. To purge what
already accumulated (one-shot, run from your DB shell after
deploying):

  DELETE FROM image_exif
   WHERE file_path LIKE '%/._%' OR file_path LIKE '%/.DS_Store';
  DELETE FROM face_detections
   WHERE rel_path LIKE '%/._%' OR rel_path LIKE '%/.DS_Store';
  DELETE FROM tagged_photo
   WHERE file_path LIKE '%/._%' OR file_path LIKE '%/.DS_Store';
  DELETE FROM favorites
   WHERE path LIKE '%/._%' OR path LIKE '%/.DS_Store';

The maintenance pipeline's missing-file scan would NOT catch these
on its own — the files exist on disk (they're real macOS metadata,
just not images), so stat() returns Ok and the row sticks.
2026-05-17 20:10:16 -04:00
cameron d275150db6 Merge pull request 'feature/video-frame-rate' (#98) from feature/video-frame-rate into master
Reviewed-on: #98
2026-05-18 00:09:35 +00:00
Cameron acdffc1558 cargo fmt: drop trailing blank line in actors.rs
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:14:30 -04:00
Cameron bd61e10158 chore: add .gitattributes + unit tests for ffprobe rational parser
LF normalization across OSes; *.sql pinned to LF for stable diffs.

Tests cover the rational frame-rate parser (NTSC 29.97, integer fps,
slow-mo 240, ffprobe's 0/0 unknown sentinel, malformed and out-of-range
inputs). Extracted the closure into a free fn for the test seam.
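
For illustration, a rational frame-rate parser of this kind might look like the sketch below. The name `parse_rational_fps` and the `MAX_FPS` bound are assumptions; the commit does not state the real function name or the accepted range.

```rust
/// Assumed upper bound for a plausible frame rate.
const MAX_FPS: f64 = 1000.0;

/// Parse ffprobe-style rationals such as "30000/1001" or "30/1".
/// Returns None for the "0/0" sentinel, malformed input, and values
/// outside (0, MAX_FPS].
fn parse_rational_fps(raw: &str) -> Option<f64> {
    let raw = raw.trim();
    let (num, den) = match raw.split_once('/') {
        Some((n, d)) => (n.parse::<f64>().ok()?, d.parse::<f64>().ok()?),
        // Tolerate a bare number by treating it as n/1.
        None => (raw.parse::<f64>().ok()?, 1.0),
    };
    if den == 0.0 {
        return None;
    }
    let fps = num / den;
    (fps.is_finite() && fps > 0.0 && fps <= MAX_FPS).then_some(fps)
}
```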

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:13:06 -04:00
Cameron 1b70a6f0b4 video: probe frame rate via ffprobe and return on /video/generate
Adds frame_rate to GenerateVideoResponse so the mobile scrubber can step
at the source's real fps instead of a hardcoded 30. probe_video_stream_meta
gains a frame_rate field (avg_frame_rate preferred, r_frame_rate fallback,
nonsense values rejected) and is now pub so the handler can reuse it.
Cost is one ffprobe per /video/generate call; degrades silently to None
on probe failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:03:21 -04:00
cameron 3162a4f477 Merge pull request 'clip-search: accept library_ids (multi-select whitelist) on /photos/search' (#97) from feature/clip-search-library-ids into master
Reviewed-on: #97
2026-05-16 13:38:00 +00:00
Cameron Cordes 87093a63d7 clip-search: accept library_ids (multi-select whitelist) on /photos/search
Previously the endpoint only accepted `library=<id>` (single id) — multi-
select scopes had to be filtered upstream by Apollo, which kept the
filter logic out of FileViewer-React's reach (it calls ImageApi
directly and got no scoping for 2+ active libraries).

Adds `library_ids` (comma-separated id list, e.g. `?library_ids=1,3`).
Parsed inside the existing scope decision: `library_ids` wins when
both are supplied; either / both empty falls back to "every enabled
library" (historical default). Malformed entries return 400.

Dedupes ids while preserving order so a stray `library_ids=1,1,3`
doesn't double-pass to the DAO. The single-id path still works
unchanged for older clients.
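
A sketch of the parsing rules above, under stated assumptions: `parse_library_ids` is a hypothetical name, ids are taken to be `i32`, an empty result stands for "every enabled library", and an empty element between commas counts as malformed.

```rust
use std::collections::HashSet;

/// Parse a comma-separated id list, deduping while preserving first-seen
/// order. An Err would map to a 400 in the handler.
fn parse_library_ids(raw: &str) -> Result<Vec<i32>, String> {
    if raw.trim().is_empty() {
        // Caller falls back to every enabled library.
        return Ok(Vec::new());
    }
    let mut seen = HashSet::new();
    let mut ids = Vec::new();
    for part in raw.split(',') {
        let id: i32 = part
            .trim()
            .parse()
            .map_err(|_| format!("invalid library id: {part:?}"))?;
        // `insert` returns false for a repeat, so duplicates are skipped.
        if seen.insert(id) {
            ids.push(id);
        }
    }
    Ok(ids)
}
```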

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 09:30:46 -04:00
cameron dd7b4befb6 Merge pull request 'feature/clip-semantic-search' (#96) from feature/clip-semantic-search into master
Reviewed-on: #96
2026-05-16 00:32:32 +00:00
Cameron Cordes 922f7df8d3 clip-search: offset-based pagination on /photos/search
Adds `offset` query param (default 0) and `total_matching` + `offset`
response fields. Backend already computes the full sorted list of
above-threshold matches per query; pagination just slices it at
[offset, offset+limit) instead of always returning the top window.
Offsets past the end return an empty page cleanly so the client can
stop fetching naturally.
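
The slicing step can be sketched as follows. `page` is a hypothetical helper; it assumes the matches are already sorted and returns the window together with the total count, clamping so that an offset past the end yields an empty page rather than a panic.

```rust
/// Return the [offset, offset + limit) window of `sorted` plus the total
/// number of matches.
fn page<T: Clone>(sorted: &[T], offset: usize, limit: usize) -> (Vec<T>, usize) {
    let total = sorted.len();
    let start = offset.min(total);
    let end = start.saturating_add(limit).min(total);
    (sorted[start..end].to_vec(), total)
}
```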

Re-scores on every page rather than caching the sorted list — at
personal-library scale (~14k embeddings, 768d) the dot-product loop
is sub-100ms and the lack of state means no eviction / staleness
concerns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:56:10 -04:00
Cameron Cordes ee2ed3005b clip-search: document env knobs in .env.example
APOLLO_CLIP_API_BASE_URL (falls back to APOLLO_API_BASE_URL),
CLIP_BACKLOG_MAX_PER_TICK, CLIP_ENCODE_CONCURRENCY, and
CLIP_REQUEST_TIMEOUT_SEC — all of which the code already reads.
Apollo's side was documented earlier; this closes the parity gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:10:52 -04:00
Cameron Cordes 66267cc345 clip-search: fmt + clippy clamp + test AppState arg
Runs cargo fmt + clippy over the new files only; pre-existing files are
left untouched even though fmt has drift on them. clamp(1, 200)
replaces a manual min/max chain that clippy flagged. The test AppState
constructor needed ClipClient::new(None) so the lib-test target
compiles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:10:52 -04:00
Cameron Cordes 32195ed89e clip-search: backlog drain + /photos/search endpoint
Wires the persistence layer for CLIP semantic search. The watcher's
per-tick drain encodes any image_exif row with a known content_hash
but no clip_embedding via Apollo (cap CLIP_BACKLOG_MAX_PER_TICK,
default 32). On a query, /photos/search encodes the text via Apollo
and reranks every stored embedding in-memory.

ExifDao additions:
- list_clip_unencoded_candidates — partial-index scan for drain
- backfill_clip_embedding — touches only the two new columns
- list_clip_index — dedup'd (hash, embedding) pull for search

clip_watch::run_clip_encoding_pass is the parallel fan-out — tokio
runtime per pass with CLIP_ENCODE_CONCURRENCY (default 4). No marker
rows for permanent failures yet; per-tick cap bounds the retry cost.

/photos/search params: q, limit, threshold (default 0.20), library,
model_version. Response is intentionally minimal (path + score) so
the frontend joins against existing photo-metadata routes lazily.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:10:52 -04:00
Cameron Cordes 8d9e76cf15 clip-search: migration + client + probe binary
Probe-phase scaffolding for CLIP semantic search. Adds the column
that will hold per-photo embeddings, the HTTP client to Apollo's
inference service, and a throwaway probe binary so we can eyeball
search-result quality on the live library before building the
persistence layer (backlog drain, /photos/search endpoint, UI).

- migrations/2026-05-14-000000_add_clip_embedding/ — adds
  image_exif.clip_embedding (BLOB) and clip_model_version (TEXT),
  plus a partial index on (clip_embedding IS NULL AND content_hash
  IS NOT NULL) for the future backfill drain.
- src/database/models.rs — extends ImageExif struct to match.
- src/ai/clip_client.rs — encode_image / encode_text / health,
  same Permanent/Transient/Disabled taxonomy as face_client.
- src/bin/probe_clip_search.rs — --query <q> --library N --limit M
  --top K. Encodes a sample and prints top-K cosine similarities.
  No DB writes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:10:52 -04:00
cameron 26ffc15c8b Merge pull request 'feature/hls-content-hash' (#95) from feature/hls-content-hash into master
Reviewed-on: #95
2026-05-15 20:09:48 +00:00
Cameron Cordes 0168a4b574 hls: remove legacy /video/stream + /video/{path} routes
The hash-keyed `/video/hls/{hash}/{file}` route fully covers HLS
playback now and both clients (Apollo, FileViewer-React) have
shipped updates that use it directly. Keeping the basename-keyed
fallback only encouraged stale URLs to keep flowing — every legacy
file was deleted by the startup migration, so the routes were
guaranteed 404 machines.

Dropped:
- `stream_video` handler (`GET /video/stream?path=…`) — the original
  basename-keyed playlist serve.
- `get_video_part` handler (`GET /video/{path}`) — bare-filename
  segment serve. The new layout's segments live in
  `<shard>/<hash>/segment_NNN.ts` and reach the client via
  `stream_hls_file`.
- `legacy_path` field on `GenerateVideoResponse` (serialised as
  `playlist`). The field always pointed at a file the migration had
  deleted; current clients ignore it entirely.
- Their service registrations in `main.rs`.
- The body-side `filename` extraction in `generate_video` (existed
  only to construct `legacy_path`) and the now-unused `global`
  opentelemetry import in `handlers/video.rs`.

All 707 tests still pass. The same hand-rolled validators (`is_valid_hash`
/ `is_allowed_hls_filename`) keep the new route's defense-in-depth
intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:00:19 -04:00
Cameron c30cadde02 ai: fix UTF-8 byte-slice panics in insight_generator log/truncation paths
Switch four `&s[..N]` / `&s[..s.len().min(N)]` sites to
`chars().take(N).collect::<String>()` so truncation lands on character
boundaries instead of mid-codepoint. The agentic summary preview log
was panicking when generated content hit an em-dash at byte 200; the
few-shot passage cap, brief_json_args debug formatter, and a test
assertion message had the same latent bug.
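
A minimal illustration of the failure mode and the fix. `truncate_chars` is a hypothetical wrapper around the `chars().take(N)` pattern named above. Byte-slicing with `&s[..N]` panics when `N` falls inside a multi-byte character, whereas counting characters cannot.

```rust
/// Truncate to at most `max_chars` characters (not bytes), so the cut
/// always lands on a character boundary.
fn truncate_chars(s: &str, max_chars: usize) -> String {
    s.chars().take(max_chars).collect::<String>()
}

/// True when `&s[..n]` would be safe, i.e. `n` is a char boundary.
fn byte_slice_is_safe(s: &str, n: usize) -> bool {
    s.is_char_boundary(n)
}
```

In the string `"ab\u{2014}cd"` the em-dash occupies bytes 2 to 4, so a byte slice ending at index 3 would panic, while `truncate_chars(s, 3)` keeps the whole character.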

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 15:10:02 -04:00
Cameron Cordes 8503ef7884 chore: cargo fmt + clippy --fix sweep across the crate
Pure mechanical cleanup of accumulated drift in files outside the
HLS-content-hash branch's main change set. No behavior change.

- `cargo fmt` on every previously-misformatted file
  (`ai/insight_generator.rs`, `database/knowledge_dao.rs`,
  `faces.rs`, `knowledge.rs`, `libraries.rs`).
- `cargo clippy --fix`:
  - `needless_borrow`: `&library` → `library` in `handlers/image.rs`
    (two sites in the photo-listing path).
- Manual clippy pass for warnings clippy emits but can't auto-apply:
  - `field_reassign_with_default` in `database/reconcile.rs::run` —
    consolidated into a struct-literal initializer.
  - `needless_range_loop` in `database/knowledge_dao.rs::union_perceptual_tags`
    — inner `for b in (a+1)..indices.len() { let ib = indices[b]; ... }`
    becomes `for &ib in &indices[a + 1..] { ... }`.
  - Doc-list indentation: continuation lines under nested bullets in
    `database/mod.rs::get_memories_in_window` and
    `database/knowledge_dao.rs::build_entity_graph` realigned to the
    list-item content column.

Deliberately not touched (each deserves its own focused commit, with
testing, rather than getting bundled into a sweep):
- 4× `deprecated count_distinct` in `faces.rs` — diesel API migration
  to `AggregateExpressionMethods::aggregate_distinct` may shift result
  types; needs verification against the existing stats queries.
- `await_holding_lock` in `knowledge.rs:807` — `std::sync::Mutex` held
  across `ollama.generate(...).await`. Genuine concurrency bug; fix
  requires understanding the surrounding flow before just dropping
  the guard.
- 2× `type_complexity` in `database/mod.rs` — cosmetic, would need a
  `type` alias and corresponding callers updated.
- Dead `total_deleted` on `library_maintenance::GcStats` and
  `file_scan::enumerate_indexable_files` — both are public surface
  retained for future use; deletion is a separate decision.

All 707 tests still pass. Release build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:25:05 -04:00
Cameron Cordes 8c91bf554b hls: cargo fmt + clippy::cloned_ref_to_slice_refs
Pure mechanical pass on the files this branch added/modified:
rustfmt reflow of a few long lines / chains, and the one
non-pre-existing clippy warning — replacing
`&[rel_path.clone()]` with `std::slice::from_ref(&rel_path)` in
`handlers::video::generate_video` to avoid the alloc + clone for a
single-element slice.
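
The clippy suggestion can be shown in isolation. `count_paths` and `queue_one` are hypothetical stand-ins for the real callee and call site; `std::slice::from_ref` borrows the value as a one-element slice without cloning it.

```rust
/// Stand-in for a function that takes a slice of paths.
fn count_paths(paths: &[String]) -> usize {
    paths.len()
}

fn queue_one(rel_path: String) -> usize {
    // Before: count_paths(&[rel_path.clone()]) allocated a copy of the String.
    // After: view the existing String as a slice of length 1.
    count_paths(std::slice::from_ref(&rel_path))
}
```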

All 707 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:01:16 -04:00
Cameron Cordes 7cd1ea3cf8 hls: per-library readiness gauges + GET /hls/stats endpoint
The hash-keyed pipeline transcodes lazily, so a freshly mounted (or
freshly upgraded) library is "mostly pending" for the first hour
while the watcher works through the backlog. The operator wants a
live read on remaining work so they can tune `HLS_CONCURRENCY` and
know when to stop waiting.

Adds:

- `src/hls_stats.rs` — pure compute path (`stats_from_rows`) and an
  Arc<Mutex<dyn ExifDao>> wrapper (`compute_and_publish`). Per
  library: `total`, `with_playlist`, `pending`, `unsupported`,
  `hashless_videos`. Dedup is by content_hash so duplicate-bytes-at-
  N-paths counts once (same domain rule as `faces::stats`).
  `hashless_videos` is a separate counter so the operator can see
  the "hash backfill, then transcode" pipeline depth instead of
  having NULL-hash rows just hide.

- Prometheus gauges labeled by library name:
  `imageserver_hls_videos_total`, `..._with_playlist`, `..._pending`,
  `..._unsupported`. Updated by the watcher at the end of every full-
  scan tick *and* on every `/hls/stats` hit, so whichever surface the
  operator is watching stays fresh. Registered in `main` alongside
  the existing image/video gauges.

- `GET /hls/stats` — Claims-protected JSON snapshot of the same data
  plus a top-level cross-library aggregate. Runs on a blocking pool
  so it doesn't pin the actix worker; per-call cost is one
  `list_paths_and_hashes_for_library` SQL query per library plus a
  `stat()` per distinct video hash. Bounded — never invoked from
  middleware, only from the explicit endpoint and the full-scan
  tick. The watcher's end-of-tick `info!` summary line mirrors the
  endpoint output for operators tailing the log.

- New `ExifDao::list_paths_and_hashes_for_library` method:
  `SELECT rel_path, content_hash FROM image_exif WHERE library_id =
  ?`. Single round-trip; callers filter to video extensions
  client-side because the schema doesn't carry media-type. Mock
  impl in `files.rs` returns an empty vec.

Tests in `hls_stats::tests` exercise stats_from_rows directly (videos-
only filter, hash dedup, playlist vs sentinel decision, NULL-hash
hashless counting) plus a publish_gauges round-trip that reads the
gauge value back. Full suite (347 lib + 360 bin = 707) passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:58:46 -04:00
Cameron Cordes 7c153596fe hls: hash-keyed HTTP routes for /video/generate and serving
`POST /video/generate` is reshaped to return a JSON object instead of
a bare string. New fields:

- `playlist_url`: stable hash-keyed URL of the form
  `/video/hls/<hash>/playlist.m3u8`. Use this with hls.js / native
  players — relative segment refs inside the playlist resolve to
  `/video/hls/<hash>/segment_NNN.ts` because the URL is path-based.
- `content_hash`: the blake3 hex digest that identifies the bytes.
  Stable across libraries, archive ingests, renames; clients can
  cache the URL by hash.
- `ready`: true iff the playlist file is already on disk. False means
  a transcode was just queued; the client should retry the URL after
  a short delay (or rely on hls.js's built-in retry).
- `playlist` (legacy): basename-keyed path string, echoed under the
  old field name so clients that destructure `response.playlist` keep
  working during the rollout. The startup migration deletes the
  underlying file, so this URL will 404; clients should migrate to
  `playlist_url`. The field is slated for removal once Apollo and
  FileViewer-React ship the update.

The handler:
- resolves the source path across libraries (same logic as before),
- looks up `image_exif.content_hash` for that (library_id, rel_path),
- falls back to inline `content_hash::compute` when the row is mid-
  backfill — pure read, no library mutation,
- sends a single-element `QueueVideosMessage` to `VideoPlaylistManager`
  if the playlist isn't already on disk and there's no
  `playlist.unsupported` sentinel,
- returns the URL immediately. The actor pipeline owns transcoding.

New route `GET /video/hls/{hash}/{file}`:
- strict validation: hash must be 64 ascii-hex chars; file must be
  `playlist.m3u8` or `segment_NNN.ts` (digits only). Anything else
  returns 400 so we never have to rely on path canonicalisation
  alone to defend against traversal,
- belt-and-suspenders canonicalize() guard verifies the resolved
  file lives under `$VIDEO_PATH`,
- serves with the standard `NamedFile::into_response` machinery.
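
The two validators might look roughly like this. The function names come from the commit, but the bodies are a sketch: accepting either hex case and requiring at least one digit in the segment name are assumptions.

```rust
/// 64 ASCII hex characters, nothing else.
fn is_valid_hash(hash: &str) -> bool {
    hash.len() == 64 && hash.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Only `playlist.m3u8` or `segment_<digits>.ts` may be served.
fn is_allowed_hls_filename(file: &str) -> bool {
    if file == "playlist.m3u8" {
        return true;
    }
    match file
        .strip_prefix("segment_")
        .and_then(|rest| rest.strip_suffix(".ts"))
    {
        Some(digits) => !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()),
        None => false,
    }
}
```

Because both checks are allowlists over the raw string, traversal attempts and internal artifacts are rejected before any path is built.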

Cleanup in `actors.rs`:
- `ProcessMessage` + its `StreamActor` handler had no senders after
  the rewire — removed. `StreamActor` itself stays (still handles
  `RefreshThumbnailsMessage` from `files.rs`).
- `create_playlist`, `playlist_file_for`,
  `playlist_unsupported_sentinel` are gone — the legacy on-demand
  transcode helper and the migration-only path helpers had no
  remaining users (the migration uses its own classify() function).
- Imports tightened: dropped `Child`, `ExitStatus`, `trace`.

Tests cover both new validators (`is_valid_hash`,
`is_allowed_hls_filename`) including the strings that motivated the
defence-in-depth (traversal attempts, internal `.tmp`/`.unsupported`
artifacts, malformed segment names).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:51:01 -04:00
Cameron Cordes 78fabc2b32 hls: retire legacy basename-keyed HLS files on startup
Adds `video::legacy_migration::retire_legacy_hls_output`, called once
from `main` right after the diesel migrations run and before the
actor pipeline starts. Walks `$VIDEO_PATH` at depth 1, deletes every
`.m3u8` / `.m3u8.tmp` / `.m3u8.unsupported` / `.ts` file at root, and
logs a single info line with per-class counts. Skips directories
(the new layout's `<shard>/<hash>/` lives there) and unknown
extensions, so an operator's stashed README or `.tmp` from a
different tool is safe.

Why this needs its own one-shot pass rather than letting the rewritten
`cleanup_orphaned_playlists` handle it: the cleanup walk deliberately
only looks at `<shard>/<hash>/` dirs (so it can't accidentally `rm`
operator-stashed content), so without this migration the legacy files
would sit at root forever, never served, never refreshed. The
operator-reported IMG_NNNN.MOV collisions show the scale: ~10
duplicate-basename hits on one library alone, and the total .m3u8 count
was 699 against a much larger video count, i.e. the loser of every
collision was a permanent orphan. This pass collects all of them, then
the running watcher writes hash-keyed playlists going forward.

Idempotent — a second boot finds nothing and reports zero deletions,
so the call site can stay in `main` across releases until the module
is removed in a later cleanup commit. Tests cover the happy path
(legacy artifacts gone, hash dir untouched, unrelated files left
alone), idempotency, and the missing-directory case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:43:13 -04:00
Cameron Cordes b8e17e05b7 hls: rewrite orphan cleanup for hash-keyed layout
The cleanup walk previously looked for `$VIDEO_PATH/<basename>.m3u8`
and matched each file's stem against a recursive walk of every
library. With the hash-keyed layout now in place, every playlist's
file_stem is the literal string "playlist" — the old logic would
treat every hash-keyed playlist as orphaned on its next run and wipe
them all in one tick (default cleanup interval is 24h, so this is a
24-hour bomb on top of the prior commit).

New approach: orphan-ness is decided in the database, not on the
filesystem. The cleanup loop:

- Snapshots every distinct non-NULL `image_exif.content_hash` into a
  HashSet (new `ExifDao::list_distinct_content_hashes` method —
  `SELECT DISTINCT content_hash WHERE content_hash IS NOT NULL`).
- Walks `$VIDEO_PATH` two levels deep: top-level entries are filtered
  to 2-char lowercase hex shard dirs, each shard's children to 64-char
  hex hash dirs. Anything else (legacy `.m3u8` at root from the
  pre-content-hash era, operator-stashed dirs, partial writes) is left
  alone.
- Hash dirs whose hash isn't in the alive set are `remove_dir_all`'d.
  Shard dirs that emptied as a result are reaped on the same pass via
  `remove_dir` (no-op if non-empty).
- The library-stale safety gate is preserved: a stale library skips
  the cycle even though the orphan decision is DB-only, because the
  upstream missing-file scan that retires `image_exif` rows itself
  pauses for stale libraries. Belt-and-suspenders — keeping a hash
  dir for one extra 24h cycle is cheaper than wiping one whose source
  was briefly unreachable. The gate now also filters disabled
  libraries out of the stale set (they're intentionally absent from
  the health map).
- The legacy `excluded_dirs` parameter is preserved on the function
  signature but unused (the walk no longer crosses library trees);
  flagged with a leading underscore. Callers in `main.rs` stay
  unchanged.
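
The DB-driven orphan decision can be sketched as a pure function over directory names. `is_hash_shard` and `is_full_hash` are named in this commit, but their bodies here are assumptions, and `orphaned_hash_dirs` is a hypothetical helper that takes `(shard, hash_dir)` name pairs instead of walking the filesystem.

```rust
use std::collections::HashSet;

fn is_lower_hex(s: &str) -> bool {
    s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// 2-char lowercase hex shard directory name.
fn is_hash_shard(name: &str) -> bool {
    name.len() == 2 && is_lower_hex(name)
}

/// 64-char lowercase hex hash directory name.
fn is_full_hash(name: &str) -> bool {
    name.len() == 64 && is_lower_hex(name)
}

/// Hash dirs that look like ours but whose hash is not in the alive set.
/// Anything that fails the name checks is left alone.
fn orphaned_hash_dirs<'a>(
    entries: &[(&'a str, &'a str)],
    alive: &HashSet<String>,
) -> Vec<&'a str> {
    let mut orphans = Vec::new();
    for &(shard, hash) in entries {
        if !is_hash_shard(shard) || !is_full_hash(hash) {
            continue;
        }
        if !alive.contains(hash) {
            orphans.push(hash);
        }
    }
    orphans
}
```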

`MockExifDao` in `files.rs` grows the new method (returns empty);
unit tests for the new `is_hash_shard` / `is_full_hash` validators
guard against an operator's stashed directory under VIDEO_PATH ever
matching the orphan-rm path. Both pass.

A follow-up commit handles the one-shot startup migration that
retires the legacy basename-keyed `.m3u8` / `.ts` files at
`$VIDEO_PATH` root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:41:04 -04:00
Cameron Cordes d1667099c3 hls: rewire queue + generator to write hash-keyed playlists
Switches the watcher → VideoPlaylistManager → PlaylistGenerator path
from the basename-keyed layout
(`$VIDEO_PATH/{basename}.m3u8`) to the hash-keyed layout
(`$VIDEO_PATH/{hash[..2]}/{hash}/playlist.m3u8`) introduced in the
prior commit. Source videos that share a basename across libraries
(or across subdirs of one library) no longer overwrite each other's
playlists. The legacy HTTP endpoints `/video/generate` and
`/video/stream` still use the basename layout; they move in a
follow-up commit alongside the stable streaming URL.

actors.rs:
- `QueueVideosMessage.video_paths: Vec<PathBuf>` →
  `videos: Vec<VideoToQueue>`. The queue handler dedups against the
  hash-keyed playlist + sentinel and forwards `GeneratePlaylistMessage`
  carrying the hash.
- `GeneratePlaylistMessage` now carries `content_hash: String`; the
  legacy `playlist_path: String` field is gone.
- `PlaylistGenerator` takes a `video_dir: PathBuf` at construction,
  computes the hash dir + playlist + sentinel + segment template via
  `hls_paths`, `mkdir -p`s the shard/hash dir before ffmpeg runs, and
  cleans up partial output on failure by walking the hash dir.
- `ScanDirectoryMessage` and its handler are retired entirely; their
  startup-walk role is taken over by the watcher's first tick (see
  `watcher.rs` below). Dropping it avoids threading an `ExifDao` into
  `VideoPlaylistManager` just so the actor can resolve hashes.
- Legacy `playlist_file_for` / `playlist_unsupported_sentinel` are
  retained behind `#[allow(dead_code)]` for the upcoming migration
  pass that retires pre-content-hash output.

watcher.rs:
- `process_new_files` keeps `content_hash` in the EXIF-batch result
  (it was previously discarded). Videos with `image_exif.content_hash =
  NULL` — mid-backfill rows — are skipped this tick rather than
  falling back to a basename-colliding playlist; they get picked up
  after `backfill_unhashed_backlog` populates the hash on a
  subsequent tick. Skipped count is logged at debug.
- The video staleness check now uses `hls_paths::playlist_for_hash`
  instead of `$VIDEO_PATH/{basename}.m3u8`.
- `last_full_scan` initialises to `UNIX_EPOCH` so the watcher's first
  tick is treated as a full scan. That covers the catch-up gap left
  by removing `ScanDirectoryMessage` — every library's existing media
  is checked once at watcher boot (≈60s after startup) instead of
  waiting up to `WATCH_FULL_INTERVAL_SECONDS` (1h default).

main.rs: removes the `ScanDirectoryMessage` import and the per-library
`do_send` loop, with a comment pointing at the watcher's first-tick
behavior.

state.rs: `PlaylistGenerator::new` now takes the video dir.

Tests: existing `video::hls_paths` (4) and `watcher::tests` (4) pass.
The basename-keyed `/video/generate` endpoint still compiles and
serves; behavior change there is deferred to the follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:36:01 -04:00
Cameron Cordes c71e1cdce0 hls: add hash-keyed path helpers + VideoToQueue type
Foundation for migrating HLS playlist output from basename-keyed
(`$VIDEO_PATH/{basename}.m3u8`) to content-hash-keyed
(`$VIDEO_PATH/{hash[..2]}/{hash}/playlist.m3u8`). The basename layout
collides whenever two source videos share a filename — common with
iPhone-style sequential naming (`IMG_NNNN.MOV`) across libraries — so
the loser's playlist gets overwritten and ffmpeg keeps re-queueing the
file every scan.
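
To illustrate the sharded layout, here is a rough sketch of the path math. The function names mirror the helpers named in this commit, but the signatures and the `Option` return are assumptions made for the example; only the `{hash[..2]}/{hash}/playlist.m3u8` shape is taken from the message.

```rust
use std::path::{Path, PathBuf};

/// Assumed shape: `<video_dir>/<first two chars of hash>/<hash>`.
/// Returns None when the hash is too short to shard.
fn hls_dir(video_dir: &Path, hash: &str) -> Option<PathBuf> {
    let shard = hash.get(..2)?;
    Some(video_dir.join(shard).join(hash))
}

/// Playlist path inside the hash directory.
fn playlist_for_hash(video_dir: &Path, hash: &str) -> Option<PathBuf> {
    Some(hls_dir(video_dir, hash)?.join("playlist.m3u8"))
}
```

Two `IMG_0001.MOV` files from different libraries hash to different values, so they land in different directories and no longer overwrite each other.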

This commit adds the path layout and type plumbing without touching the
actor pipeline, watcher, or HTTP handlers yet:

- `src/video/hls_paths.rs`: `playlist_for_hash`, `sentinel_for_hash`,
  `segment_template_for_hash` built on top of `content_hash::hls_dir`,
  with constants for the filenames inside the hash dir. Unit tests
  cover the sharded layout and the playlist/sentinel/segment paths
  all landing in the same directory (so HLS relative refs resolve).
- `src/content_hash::hls_dir` is now in use; it was dead code waiting for this branch.
- `VideoToQueue` struct in `actors.rs`: pairs a source path with its
  content hash so callers that lack a hash (rows mid-backfill) skip
  the video rather than fabricate one.
- `playlist_file_for` / `playlist_unsupported_sentinel` retained as
  migration-only helpers — they're only needed by the one-shot startup
  pass that retires pre-content-hash output.
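
As a sketch of the "skip rather than fabricate" rule: the field names on `VideoToQueue` and the `build_queue` helper below are hypothetical, chosen only to show how rows without a hash can be dropped while counting the skips.

```rust
use std::path::PathBuf;

/// Hypothetical field layout; the commit only says the struct pairs a
/// source path with its content hash.
#[derive(Debug, Clone, PartialEq)]
struct VideoToQueue {
    path: PathBuf,
    content_hash: String,
}

/// Keeps rows that already have a hash and reports how many were skipped.
fn build_queue(rows: Vec<(PathBuf, Option<String>)>) -> (Vec<VideoToQueue>, usize) {
    let total = rows.len();
    let queue: Vec<VideoToQueue> = rows
        .into_iter()
        .filter_map(|(path, hash)| {
            hash.map(|content_hash| VideoToQueue { path, content_hash })
        })
        .collect();
    let skipped = total - queue.len();
    (queue, skipped)
}
```

Because the hash is a plain `String` rather than an `Option`, a caller cannot construct a queue entry for a row that is still mid-backfill.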

Follow-ups (separate commits on this branch): wire `hls_paths` through
the queue handler + `PlaylistGenerator`, update the watcher's
`process_new_files` to build `VideoToQueue`, switch `/video/generate`
and `/video/stream` to resolve path→hash and return stable URLs, add
the legacy-layout migration, rewrite `cleanup_orphaned_playlists` for
the new dir shape, and surface progress via Prometheus + `/hls/stats`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:23:31 -04:00
cameron 22ce1a20e7 Merge pull request 'feature/library-patch-endpoint' (#94) from feature/library-patch-endpoint into master
Reviewed-on: #94
2026-05-13 13:44:36 +00:00
Cameron Cordes 7ec156fc05 libraries: accept newline as an excluded_dirs separator
Splits parse_excluded_dirs_column on `,`, `\n`, AND `\r` so a textarea
submit with one entry per line works the same as comma-separated.
Mixed input (`a, b\nc`) parses cleanly too — the frontend can paste
from any source without preprocessing.
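
A minimal sketch of multi-separator splitting, assuming entries are trimmed and empties dropped (the message does not spell out those details). `parse_excluded_dirs` is a stand-in name, not the real `parse_excluded_dirs_column`.

```rust
/// Splits on comma, LF, and CR; trims each entry and drops empties.
fn parse_excluded_dirs(raw: &str) -> Vec<String> {
    raw.split(|c: char| matches!(c, ',' | '\n' | '\r'))
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(String::from)
        .collect()
}
```

Treating `\r` as its own separator means a CRLF pair simply produces an empty entry between the two characters, which the filter removes.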

Motivated by the "forgot the comma" footgun: typing
`.thumbnails .thumbnails2` in a single-line input today stores a
never-matching component pattern. With newlines as a first-class
separator and the frontend switching to a textarea, the natural
one-per-line UX makes that mistake impossible.

The DB store form stays comma-joined (normalize_excluded_dirs_input
hasn't changed) so existing rows are unaffected and no migration is
needed. Newline support matters mostly for the inbound write path;
mirroring it on the read side keeps the parser round-trip safe in
case anything writes a newline form directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 09:23:51 -04:00
Cameron Cordes 439532377d libraries: validate excluded_dirs entries on write
Reject the silent-footgun shapes that PathExcluder would store but
never match. The watcher would still walk past every photo as if the
exclude wasn't there, and the operator would have no signal that
their entry is dead. Caught at PATCH time with a descriptive 422.

Rules:
- Backslash anywhere → "use forward slashes" (catches \photos,
  photos\2024, \\server\share — Windows-typed entries land in the
  component-pattern bucket and never fire).
- Drive-letter prefix (Z:, Z:/...) → "relative to library root" —
  excludes are root-relative, not absolute system paths.
- Multi-segment name with no leading slash (photos/2024) →
  "did you mean /photos/2024?" — the common "I forgot the slash"
  typo, today silently stored as a component pattern that never hits.
- `..` segments in a path entry → "doesn't normalise". base.join()
  doesn't canonicalise, so the resulting prefix never matches.
- Bare "/" → "almost certainly a typo" for the library root.

Trailing slashes on path entries are stripped silently. Eight new
tests cover each rejection plus the trailing-slash normalisation
and the all-or-nothing failure mode of normalize_excluded_dirs_input
(one bad entry aborts the whole patch rather than silently applying
N-1 of N changes).
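
The rules above could be sketched roughly as follows. This is an illustration under assumptions, not the project's validator: the function name, the error strings, and the order of checks are hypothetical, and empty entries are assumed to be dropped before this step.

```rust
/// Validates one excluded_dirs entry; returns the normalised entry or a
/// short reason for rejection.
fn validate_entry(raw: &str) -> Result<String, &'static str> {
    let entry = raw.trim();
    if entry.contains('\\') {
        return Err("use forward slashes");
    }
    let bytes = entry.as_bytes();
    if bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':' {
        return Err("entries are relative to the library root");
    }
    if !entry.starts_with('/') {
        // No leading slash: only a single-segment component pattern is valid.
        if entry.contains('/') {
            return Err("multi-segment entry needs a leading slash");
        }
        return Ok(entry.to_string());
    }
    // Path entry: strip trailing slashes, then reject the bare root.
    let path = entry.trim_end_matches('/');
    if path.is_empty() {
        return Err("bare / is almost certainly a typo");
    }
    if path.split('/').any(|segment| segment == "..") {
        return Err(".. segments do not normalise");
    }
    Ok(path.to_string())
}
```

An all-or-nothing wrapper would collect the results with `collect::<Result<Vec<_>, _>>()`, so a single bad entry fails the whole patch.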

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 09:02:29 -04:00
Cameron Cordes ce9fa94cb4 libraries: surface globals, normalise excluded_dirs on write
Two follow-ups to the PATCH endpoint:

1. GET /libraries now returns `global_excluded_dirs` alongside the
   library list — the union-with-globals semantics is invisible from
   the per-library row alone, and the admin UI needs to show what's
   already being skipped before the operator adds entries that would
   duplicate.

2. PATCH /libraries/{id} canonicalises the excluded_dirs string on
   write via the new `normalize_excluded_dirs_input`: trims per
   entry, drops empties, dedupes preserving first-occurrence order,
   comma-joins without inner whitespace. Empty / whitespace-only →
   NULL. Round-trip stable so re-saving an entry produces an
   identical row.
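
A small sketch of the canonicalisation described in item 2, assuming comma-only splitting as of this commit. The function name is shortened and the `Option<String>` return stands in for the nullable column; both are assumptions.

```rust
/// Trims entries, drops empties, dedupes keeping the first occurrence,
/// and comma-joins. Returns None (SQL NULL) when nothing is left.
fn normalize_excluded_dirs(raw: &str) -> Option<String> {
    let mut seen: Vec<&str> = Vec::new();
    for entry in raw.split(',').map(str::trim) {
        if !entry.is_empty() && !seen.contains(&entry) {
            seen.push(entry);
        }
    }
    if seen.is_empty() {
        None
    } else {
        Some(seen.join(","))
    }
}
```

Feeding the output back in returns the same string, which is the round-trip stability the message refers to.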

Five new tests cover the empty / whitespace, trim, dedup, round-trip,
and overlap-with-globals cases. effective_excluded_dirs continues to
keep overlapping entries between globals and per-library on purpose —
PathExcluder accepts repeats and there's no behavioural reason to
dedupe at merge time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 08:58:04 -04:00
Cameron Cordes b3124437ec libraries: PATCH /libraries/{id} with live-apply
Adds an HTTP mutation surface for `libraries.enabled` and
`libraries.excluded_dirs`, replacing the SQL-only workflow noted in
CLAUDE.md. Apollo's Settings panel calls this from the LIBRARIES
section so the operator no longer has to ssh + sqlite3 to flip a
library off or edit its excludes.

Live-apply (no restart) via a new `live_libraries: Arc<RwLock<Vec<
Library>>>` field on AppState. The existing immutable `libraries`
Vec stays for hot-path handlers that only need stable id → root_path
lookups, avoiding a 19-call-site refactor. The watcher and
cleanup_orphaned_playlists now take the lock instead of a Vec
snapshot and re-read at the top of each tick, so `enabled` /
`excluded_dirs` changes are picked up within one
WATCH_QUICK_INTERVAL_SECONDS. The GET /libraries handler also reads
through the live view.
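
A rough sketch of the live-apply pattern, assuming `std::sync::RwLock` and a trimmed-down `Library` struct (the real struct and lock type may differ). The helper names are hypothetical.

```rust
use std::sync::{Arc, RwLock};

/// Simplified stand-in for the real Library row.
#[derive(Debug, Clone, PartialEq)]
struct Library {
    id: i32,
    enabled: bool,
}

type LiveLibraries = Arc<RwLock<Vec<Library>>>;

/// What a PATCH handler might do: mutate in place under the write lock.
fn set_enabled(live: &LiveLibraries, id: i32, enabled: bool) -> bool {
    let mut guard = live.write().unwrap_or_else(|e| e.into_inner());
    match guard.iter_mut().find(|lib| lib.id == id) {
        Some(lib) => {
            lib.enabled = enabled;
            true
        }
        None => false,
    }
}

/// What a watcher tick might do: re-read the current view each time.
fn enabled_ids(live: &LiveLibraries) -> Vec<i32> {
    let guard = live.read().unwrap_or_else(|e| e.into_inner());
    guard.iter().filter(|lib| lib.enabled).map(|lib| lib.id).collect()
}
```

Because the watcher takes the read lock at the top of each tick instead of holding a snapshot, a change made by the handler is visible on the next tick.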

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 08:47:35 -04:00
cameron 74bf693878 Merge pull request 'feature/date-backfill-null-only' (#93) from feature/date-backfill-null-only into master
Reviewed-on: #93
2026-05-12 18:42:21 +00:00
Cameron Cordes 2d56047497 Drop fs_time from date-backfill eligibility
The drain queried `date_taken IS NULL OR date_taken_source = 'fs_time'`
ORDER BY id ASC LIMIT 500 every watcher tick. The resolver is
deterministic on file bytes + filename + fs metadata, so any row that
landed on fs_time once landed there again on every retry — the drain
spun on the same lowest-id rows in perpetuity, never advancing to
rows 501+ while still logging more_remain=true.

Side effect: 500 auto-commit UPDATEs per tick sustained the SQLite
write lock long enough that other writers on separate DAO connections
hit the 5s busy_timeout. Manifested as intermittent 500s on
PATCH /image/faces/{id} that succeeded on retry.

Narrow the partial index and query predicate to `date_taken IS NULL`.
If exiftool installs or a new filename regex lands, an operator can
re-resolve fs_time rows out-of-band rather than re-introducing the
steady-state churn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:37:36 -04:00
Cameron Cordes 3427c2916c Log 500-return paths in PATCH /image/faces/{id}
The four 500-return paths in update_face_handler returned e.to_string()
in the body but never logged. When a face PATCH failed with a 16-byte
body and no log entry, the cause (SQLITE_BUSY from cross-DAO writer
contention exhausting the 5s busy_timeout) was invisible. Surface the
full anyhow chain via {:#} on each path so the diesel cause is in the
log even when the response body only shows the top-level context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:37:26 -04:00
cameron 6a3e37b7dc Merge pull request 'feature/split-main-rs' (#92) from feature/split-main-rs into master
Reviewed-on: #92
2026-05-12 17:02:06 +00:00
Cameron Cordes 9f8a69fc6d Split main.rs: extract watcher loop into src/watcher.rs
main.rs drops from 1200 → 346 lines (90% smaller than the pre-branch
3542). What's left is the startup wiring it was always meant to be:
.env, migrations, AppState construction, route registration, server
bind. The four background-loop functions move into src/watcher.rs:

- watch_files (310 lines) — quick/full scan tick, per-library probe,
  backfill drain dispatch, missing-file scan, back-ref refresh,
  orphan GC.
- process_new_files (351 lines) — file walk → EXIF write →
  face-candidate build → HLS / preview-clip queueing →
  reconciliation. The "biggest untested chunk" from the earlier
  audit.
- cleanup_orphaned_playlists (167 lines) — separate slower-tick
  thread.
- playlist_needs_generation — small mtime-comparison helper.

Plus 4 unit tests for playlist_needs_generation (covers missing
playlist, newer playlist, newer video, video-missing-metadata
fallback).

main.rs's imports correspondingly shrink — Addr, HashSet, WalkDir,
Utc, InsertImageExif, and the bulk of video::actors all leave with
the watcher. CLAUDE.md updated to reflect the new module layout
(layered architecture box + module map for the face-detection
section).

cargo test --bin image-api: 329 passing (no regression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:54:37 -04:00
Cameron Cordes bdb69c7d37 Split main.rs: extract HTTP handlers into src/handlers/
main.rs drops from 2935 → 1200 lines, freed for startup wiring +
the watcher. The 16 route handlers move into three domain-grouped
files under src/handlers/:

- handlers/favorites.rs (128 lines): favorites, put_add_favorite,
  delete_favorite.

- handlers/video.rs (665 lines): generate_video, stream_video,
  get_video_part, get_video_preview, get_preview_status. The 5
  pre-existing get_preview_status integration tests move with the
  handler (still pass against TestPreviewDao + AppState::test_state).

- handlers/image.rs (1003 lines): get_image (with the
  hash/library-scoped/bare-legacy thumb lookup), upload_image,
  get_file_metadata, set_image_gps, get_full_exif, set_image_date,
  clear_image_date. Helpers (create_circular_thumbnail,
  build_metadata_response_for_date_mutation) and request structs
  (SetGpsRequest, SetDateRequest, ClearDateRequest, UploadQuery)
  travel with them.

main.rs's import block shrinks from ~50 lines to ~22 as everything
HTTP-specific (NamedFile, mp::Multipart, BytesMut, Span, KeyValue,
StreamExt, …) moves with the handlers. The is_video_file wrapper
also goes — remaining callers in watch_files / cleanup use
file_types::is_video_file directly.

cargo test --bin image-api: 325 passing (no regression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:38:17 -04:00
Cameron Cordes bec9857426 Split main.rs: extract backfill drains and thumbnails into modules
main.rs drops from 3542 → ~2930 lines by moving:

- src/backfill.rs (new): backfill_unhashed_backlog,
  backfill_missing_date_taken, backfill_missing_content_hashes,
  build_face_candidates, process_face_backlog. Now unit-tested for
  the first time — 5 tests covering cap behavior, library-id
  filtering, missing-on-disk skip, and the video/unhashed/scanned
  filters on face-candidate selection.

- src/thumbnails.rs (new): unsupported_thumbnail_sentinel,
  generate_image_thumbnail, create_thumbnails, update_media_counts,
  is_image, is_video, plus the IMAGE_GAUGE / VIDEO_GAUGE Prometheus
  metrics. Replaces the no-op stubs that used to live in lib.rs.
  4 new unit tests for the sentinel path math and the
  walker-counts-images-vs-videos smoke path.

Supporting:
- SqliteExifDao::from_shared (test-only) so an SqliteExifDao and
  SqliteFaceDao can share one in-memory connection — required to
  test build_face_candidates against the real join.
- files.rs / video/{mod,actors}.rs import from crate::thumbnails::*
  instead of the now-removed stubs in lib.rs.

cargo test --bin image-api: 325 passing (was 314).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:22:02 -04:00
cameron 05ec5d0c70 Merge pull request 'feature/knowledge-curation' (#91) from feature/knowledge-curation into master
Reviewed-on: #91
2026-05-12 15:40:55 +00:00
Cameron Cordes e67e00ef8a knowledge: predicate-quality nudge + bulk-reject endpoint
Two coupled changes to fight the speech-act-predicate problem
(facts like (Cameron, expressed, "I'm tempted to...")):

1. System prompt grows an explicit predicate-quality rule. The
   agent is told to use relationship-shaped verbs (lives_in,
   works_at, attended, is_friend_of, interested_in), and is
   given an explicit DON'T list (expressed, said, mentioned,
   stated, quoted, noted, discussed, thought, wondered). Plus a
   concrete Bad / Good example contrasting the noise pattern
   with the structured paraphrase the agent should be writing.
   Stops the bleed for new insights.

2. Cleanup tools for the legacy noise that's already in the
   table:
   - get_predicate_stats(persona, limit) returns
     [(predicate, count)] sorted desc — feeds the curation UI's
     PREDICATES tab.
   - bulk_reject_facts_by_predicate(persona, predicate, audit)
     flips every ACTIVE fact under that predicate to 'rejected'
     in one transaction, stamping last_modified_* so the action
     is attributable + reversible per-fact through the entity
     detail panel. REVIEWED facts under the same predicate are
     left alone — the curator may have hand-approved an
     exception ("interested_in" might be largely noise but a
     reviewed entry is intentional).
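
An in-memory sketch of the bulk-reject rule, leaving out the transaction and audit stamping. The `Fact` struct and the function signature are simplified assumptions; only the "active flips, reviewed stays" behaviour is taken from the description.

```rust
/// Simplified fact row for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Fact {
    predicate: String,
    status: String,
}

/// Flips every active fact under `predicate` to rejected and returns the
/// number of rows changed. Reviewed facts are left untouched.
fn bulk_reject_by_predicate(facts: &mut [Fact], predicate: &str) -> usize {
    let mut flipped = 0;
    for fact in facts.iter_mut() {
        if fact.predicate == predicate && fact.status == "active" {
            fact.status = "rejected".to_string();
            flipped += 1;
        }
    }
    flipped
}
```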

New HTTP endpoints:
   GET  /knowledge/predicate-stats?limit=
   POST /knowledge/predicates/{predicate}/bulk-reject

Persona-scoped via the existing X-Persona-Id header.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 21:50:26 -04:00
Cameron Cordes fb078b4906 knowledge: normalize legacy entity_type values
One-shot migration that re-applies the synonym map from
`normalize_entity_type` over every existing row, so legacy
entries written before that helper landed in upsert_entity stop
needing client-side workarounds.

  person ← person | people | human | individual | contact
  place  ← place | location | venue | site | area | landmark
  event  ← event | occasion | activity | celebration
  thing  ← thing | object | item | product

Unknown types ("friend", "family", etc.) get a lowercase+trim
sweep so at minimum case variants collapse — the curator can
merge or rename them via the curation UI from there.
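
The synonym map above can be expressed as a single match. This sketch reuses the `normalize_entity_type` name from the message, but the signature and the lowercase-and-trim fallback are written from the description rather than copied from the code.

```rust
/// Maps known synonyms to a canonical type; unknown values are only
/// lowercased and trimmed.
fn normalize_entity_type(raw: &str) -> String {
    let lowered = raw.trim().to_lowercase();
    let canonical = match lowered.as_str() {
        "person" | "people" | "human" | "individual" | "contact" => "person",
        "place" | "location" | "venue" | "site" | "area" | "landmark" => "place",
        "event" | "occasion" | "activity" | "celebration" => "event",
        "thing" | "object" | "item" | "product" => "thing",
        other => other,
    };
    canonical.to_string()
}
```

Applying the function to its own output changes nothing, which matches the idempotence the migration relies on.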

`UPDATE OR IGNORE` skips rows that would violate UNIQUE(name,
entity_type) after the rewrite (e.g. an existing ("Sarah",
"person") + ("Sarah", "Person") pair). The duplicate survives
unchanged so it can be merged through the normal curation flow
rather than silently disappearing.

Idempotent: every UPDATE is conditional on `entity_type !=
canonical`, so re-running the migration is a no-op. The down
migration is intentionally inert — we don't have per-row
history of the original strings and the rewritten values stay
semantically correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 21:42:51 -04:00
Cameron Cordes d123cde333 knowledge: entity-graph endpoint for force-directed view
New GET /knowledge/graph?type=&limit= returns the data the
curation UI's graph tab needs:
  - nodes = entities with at least one in-scope fact (rejected /
    superseded excluded). Carries fact_count for visual sizing.
    Top-N by count desc; default cap 200 (clamped 1..1000).
  - edges = relational facts (object_entity_id set) grouped by
    (subject, object, predicate) so 3 "is_friend_of" facts
    between the same pair collapse into one edge with count=3.
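
As an illustration of the edge grouping, here is the same collapse done in memory instead of SQL. The tuple layout and the `group_edges` name are assumptions for the sketch.

```rust
use std::collections::BTreeMap;

/// Groups (subject, object, predicate) facts into edges with a count.
/// A BTreeMap keeps the output order deterministic.
fn group_edges(facts: &[(i32, i32, &str)]) -> Vec<(i32, i32, String, usize)> {
    let mut counts: BTreeMap<(i32, i32, &str), usize> = BTreeMap::new();
    for &(subject, object, predicate) in facts {
        *counts.entry((subject, object, predicate)).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .map(|((subject, object, predicate), count)| {
            (subject, object, predicate.to_string(), count)
        })
        .collect()
}
```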

Two raw SQL queries: an INNER JOIN onto a persona-scoped fact-
count subquery for nodes (skips 0-fact entities entirely so the
sim doesn't waste time on disconnected islands), then a follow-
up GROUP BY over the persona-scoped fact set restricted to the
node id set via IN clauses (ids are i32 so inlining is safe).

Pairs with the Apollo-side GraphPanel that runs d3-force over
the returned payload and renders SVG with click-to-open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 21:26:02 -04:00
Cameron Cordes 6dca0c027d fmt: cargo fmt sweep
No logic changes - line reflow, brace placement, and method-chain splits
across handlers / personas / state / faces / knowledge / insights_dao /
knowledge_dao / populate_knowledge. Picked up incidentally while running
fmt for the sms-search work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:21:00 -04:00
Cameron Cordes 7329cc5ce7 insights: push sms search filters server-side, render snippets, expand fts5 docs
- Refactor search_messages_with_contact -> search_messages(query, &SmsSearchParams)
  exposing date_from / date_to / offset / is_mms / has_media; drop the over-fetch
  + client-side date post-filter that could silently drop in-window hits past
  position 100.
- Surface SMS-API's <mark>-wrapped snippet for MMS messages that only matched
  via message_parts_fts (attachment text / filename) - pre-snippet, those
  rendered as a blank body preview to the LLM.
- Expose is_mms / has_media on the search_messages tool schema; expand the
  FTS5 syntax docs with worked examples for phrase / prefix / boolean / NEAR
  / grouping so the model picks the right operator.
- Unit tests for format_search_hits (body fallback, snippet preferred, MMS
  attachment-only regression, empty-snippet fallback) and strip_mark_tags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:20:19 -04:00
Cameron Cordes 6620fa48d7 knowledge: consolidation proposals endpoint
Finds near-duplicate entities the upsert-time cosine guard didn't
catch — typically legacy data from before that guard landed, or
pairs whose embeddings sit between 0.85 (default proposal floor)
and 0.92 (auto-collapse threshold). Pure read-side feature; the
actual merging still goes through the existing
/knowledge/entities/merge action.

New DAO method `find_consolidation_proposals(threshold,
max_groups)`:
  - Loads every non-rejected entity with an embedding.
  - Partitions by entity_type so a person can't cluster with a
    place.
  - Pairwise cosine, edges above threshold feed a union-find for
    transitive grouping (Sara → Sarah → Sarah J. all land in one
    cluster).
  - Tracks min/max cosine per component so the UI can show "how
    tight" each cluster is before clicking in.
  - Returns groups of >= 2 sorted by size desc then max cosine
    desc; trimmed to `max_groups`.
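
The steps above can be sketched for a single entity_type partition as pairwise cosine plus union-find. All names here are hypothetical, entities are identified by their index in the slice, and the min/max cosine tracking and `max_groups` trimming are omitted to keep the example short.

```rust
use std::collections::BTreeMap;

/// Cosine similarity; 0.0 for mismatched lengths or zero vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Union-find root lookup with path halving.
fn find(parent: &mut [usize], mut i: usize) -> usize {
    while parent[i] != i {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    i
}

/// Returns groups of at least two indices, largest group first.
fn cluster(embeddings: &[Vec<f32>], threshold: f32) -> Vec<Vec<usize>> {
    let n = embeddings.len();
    let mut parent: Vec<usize> = (0..n).collect();
    for i in 0..n {
        for j in (i + 1)..n {
            if cosine(&embeddings[i], &embeddings[j]) > threshold {
                let (root_i, root_j) = (find(&mut parent, i), find(&mut parent, j));
                if root_i != root_j {
                    parent[root_j] = root_i;
                }
            }
        }
    }
    let mut groups: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for i in 0..n {
        let root = find(&mut parent, i);
        groups.entry(root).or_default().push(i);
    }
    let mut out: Vec<Vec<usize>> =
        groups.into_values().filter(|g| g.len() >= 2).collect();
    out.sort_by(|a, b| b.len().cmp(&a.len()));
    out
}
```

The union-find is what makes grouping transitive: if A is close to B and B is close to C, all three end up in one group even when A and C fall below the threshold on their own.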

New endpoint `GET /knowledge/consolidation-proposals?threshold=
&limit=` accepts the threshold (clamped 0.5–0.99 to prevent the
"every entity in one mega-cluster" case) and returns groups with
per-entity persona fact-count breakdowns baked in — saves the UI
a separate query per cluster member.

ConsolidationGroup is exported through database/mod.rs so the
handler can use it without depending on knowledge_dao internals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 18:43:11 -04:00
Cameron Cordes 89d0a6527c knowledge: per-entity persona breakdown for list + detail
Entities are global; facts are persona-scoped. Under the active
persona an entity can read as "0 facts" while having plenty under
other personas the user owns — the curation UI had no way to
surface that gap. Adds a batched DAO method
`get_persona_breakdowns_for_entities` that returns
{entity_id → [(persona_id, count)]} in one query (group by
subject + persona, user-scoped, status != rejected), and wires it
into both /knowledge/entities list rows and
GET /knowledge/entities/{id}.

EntitySummary grows an optional `persona_breakdown` field
(skipped on serialization when None — keeps PATCH responses
unchanged). EntityDetailResponse carries the breakdown as a
non-optional Vec since the detail endpoint always populates it.

One extra query per list page (50 entities → 50 subject ids
batched in one IN clause); single-entity GET adds one round trip.
Indexed by (subject_entity_id, persona_id) implicitly via the
existing user-persona indexes on entity_facts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 18:29:20 -04:00
Cameron Cordes f200466508 knowledge: forbid markdown in synthesized merge descriptions
System prompt now explicitly enumerates the markdown forms the
model shouldn't emit (bold, italics, headings, bullets, lists,
code fences) on top of the existing "no preamble, no quotes"
constraints. Some local models default to markdown-shaped
output for descriptions and the curation UI is plain-text,
which would render the asterisks and hashes literally.

The output cleaning step picks up a parallel sweep: strip code
fences, leading bullets / headings, wrapping quotes, and naive
inline emphasis markers (** and __). Rare enough that the
plain-replace is fine; not trying to parse markdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 16:49:02 -04:00
Cameron Cordes afac02cade knowledge: synthesize-merge endpoint for LLM-curated descriptions
New POST /knowledge/entities/synthesize-merge { source_id,
target_id } that calls the local Ollama with both entities' names
+ descriptions and returns a synthesized merged-description draft.
Read-only on the database — the curation UI uses the response as
the editable seed in the merge picker; the actual merge still
requires a follow-up PATCH-target-description + POST /merge.

The handler drops the KnowledgeDao lock before the LLM call so
other knowledge reads aren't blocked while generation runs
(typically seconds). Failure mode is 503 with an explicit hint
that the UI should fall back to skip-synthesis — keeps the merge
action working when the model is offline.

Output is lightly cleaned (leading "Merged description:" /
surrounding quotes stripped) since small models reach for those
patterns even with explicit "no preamble" guidance. Heavier
parsing isn't worth it — the curator edits anyway.
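
A minimal sketch of the light cleaning step, assuming a case-sensitive prefix match and ASCII double quotes. The function name `clean_draft` is hypothetical.

```rust
/// Strips a leading "Merged description:" label and surrounding quotes.
fn clean_draft(raw: &str) -> String {
    let mut text = raw.trim();
    if let Some(rest) = text.strip_prefix("Merged description:") {
        text = rest.trim();
    }
    text.trim_matches('"').trim().to_string()
}
```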

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 16:37:26 -04:00
Cameron Cordes fd4dd89bbb knowledge: agent self-correction with audit + per-persona gate + revert
Bundles three coupled changes so agent-side mutations stay
auditable and reversible:

1. Audit columns on entity_facts —
   `last_modified_by_model` / `last_modified_by_backend` /
   `last_modified_at`. Stamped on every mutation path
   (update_fact, supersede_fact, manual PATCH, manual supersede,
   the new revert). NULL on rows never touched since creation.
   Partial index on `last_modified_at WHERE NOT NULL` keeps the
   "show me recent edits" feed fast without bloating from legacy
   rows.

2. Per-persona gate `personas.allow_agent_corrections` (BOOLEAN,
   default 0). Defense in depth at two layers:
   - build_tool_definitions: when off, `update_fact` and
     `supersede_fact` aren't in the catalog at all, so even a
     hallucinated tool call by the model fails fast.
   - tool_update_fact / tool_supersede_fact: re-checks the persona
     flag at call time and returns an explicit "corrections
     disabled" error if it's somehow off (e.g. flag flipped mid-
     loop).
   ToolGateOpts grows the flag; current_gate_opts splits into
   `current_gate_opts` (no persona context, defaults closed) +
   `current_gate_opts_for_persona` for chat callers that have a
   persona id. Both call sites in insight_chat are updated.

3. Revert action — new DAO method `revert_supersession` +
   `POST /knowledge/facts/{id}/restore`. Flips status back to
   'active', clears `superseded_by`, clears `valid_until` (we
   don't track whether it was hand-set vs auto-stamped, so the
   safe reset is to drop it — user can re-bound after). Stamps
   `last_modified_*` so the revert itself is attributable.
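
The two-layer gate from item 2 might look roughly like this. The function names and the exact error text are assumptions; the tool names are the ones mentioned in these commit messages.

```rust
/// Layer 1: correction tools are only listed when the persona allows them.
fn tool_catalog(allow_agent_corrections: bool) -> Vec<&'static str> {
    let mut tools = vec!["store_fact", "recall_entities"];
    if allow_agent_corrections {
        tools.push("update_fact");
        tools.push("supersede_fact");
    }
    tools
}

/// Layer 2: re-check at call time in case the flag changed mid-loop.
fn check_correction_allowed(allow_agent_corrections: bool) -> Result<(), String> {
    if allow_agent_corrections {
        Ok(())
    } else {
        Err("corrections disabled for this persona".to_string())
    }
}
```

The second check is redundant in the common case, but it covers a tool call that was planned while the flag was on and executed after it was turned off.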

Manual paths (PATCH / supersede via HTTP, plus restore) stamp the
audit columns with `("manual", "manual")`. Agent paths stamp the
loop-time chat model and backend (mirroring the existing
created_by_* convention).

FactDetail in the HTTP response now carries the audit triple
alongside the existing provenance. Apollo wires the new field set
in the matching commit.

PersonaView / UpdatePersonaRequest grow `allowAgentCorrections`;
the PersonaPatch + InsertPersona + bulk_import paths thread it.

317 lib tests pass, including unchanged update_fact / supersede
DAO tests (now passing audit=None — None means "no provenance
context to attribute", legacy semantics).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:56:56 -04:00
Cameron Cordes 86c331571d knowledge: per-persona reviewed-only mode + agent reads include reviewed
Two coupled changes to the agent's recall surface:

1. Default scope expanded. recall_facts_for_photo and recall_entities
   used to filter to status='active' only — which silently dropped
   'reviewed' (human-verified) facts. Now they surface active +
   reviewed by default. Reviewed is strictly more trusted than
   active and shouldn't have been hidden. Rejected and superseded
   stay filtered.

2. New persona toggle `reviewed_only_facts` (BOOLEAN, default false,
   migration 2026-05-10-000400). When set, the agent's recall on
   that persona returns ONLY facts with status='reviewed' — strict
   mode for tasks where hallucinated agent claims are particularly
   costly. Wired:
   - schema.rs / Persona / InsertPersona / PersonaPatch grow the
     field.
   - PersonaView returns it as `reviewedOnlyFacts` (camelCase wire).
   - PUT /personas/{id} accepts it (mobile editor surfaces it).
   - InsightGenerator now carries a PersonaDao reference so
     recall_facts_for_photo can read the active persona's flag at
     start; one extra read per recall, cheap.
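
A small sketch of how the status filter could depend on the toggle. The constant and function names are hypothetical; the status values are the ones named in this message.

```rust
const DEFAULT_STATUSES: &[&str] = &["active", "reviewed"];
const STRICT_STATUSES: &[&str] = &["reviewed"];

/// Statuses the agent's recall should include for a persona.
fn recall_statuses(reviewed_only_facts: bool) -> &'static [&'static str] {
    if reviewed_only_facts {
        STRICT_STATUSES
    } else {
        DEFAULT_STATUSES
    }
}

/// Convenience check for a single fact status.
fn is_recalled(status: &str, reviewed_only_facts: bool) -> bool {
    recall_statuses(reviewed_only_facts).contains(&status)
}
```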

Composes with include_all_memories: that operates on the persona
*scope* axis (single vs hive), reviewed_only_facts on the *status*
axis. They're orthogonal.

Legacy persona rows pick up the default false on migration; no
behavior change unless explicitly toggled. The 4 existing persona
construction sites (one production, two tests, one InsertPersona in
knowledge_dao tests) all default the field. populate_knowledge bin
+ state.rs constructors also wire the new persona_dao arg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:21:39 -04:00
Cameron Cordes f53338923d knowledge: stamp model + backend on facts for audit
Adds two nullable TEXT columns to entity_facts —
`created_by_model` (LLM identifier) and `created_by_backend`
("local" / "hybrid" / "manual" / NULL) — so the curator can audit
which configurations produce good fact-keeping and which produce
noise.

photo_insights already carries model_version + backend, and
entity_facts.source_insight_id links to it, but:
  - source_insight_id is set post-loop, so chat-continuation and
    regenerated-insight facts lose the link.
  - JOINing per read is more friction than embedding provenance on
    the row itself.
  - Manual facts (POST /knowledge/facts) have no insight at all and
    need their own "manual" provenance marker.

Threading: execute_tool grows `model` + `backend` params, passed
from the three call sites (agentic insight loop, chat single-turn,
chat stream) using the loop-time `chat_backend.primary_model()` +
`effective_backend` already in scope. tool_store_fact stamps the
new fact accordingly; manual create_fact stamps backend="manual".
Legacy rows leave both NULL — pre-tracking data can't be back-
filled reliably from training_messages without burning compute.

Indexes are partial (WHERE NOT NULL) so legacy rows don't bloat
them, and "show me all facts from model X" stays fast.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:05:14 -04:00
Cameron Cordes 85f3716379 knowledge: fact supersession + photo-date valid_from
Two Phase-2 followups in one commit since they're coupled at the
write path:

* Agent populates valid_from from the source photo's date_taken
  when calling store_fact. Loose semantics — date_taken is *evidence
  at that date*, not strictly when the fact started being true — but
  gives the curator a calendar anchor and pairs with supersession to
  close intervals cleanly. valid_until stays NULL (a single photo
  can't tell us when something stopped). Honours the existing
  upsert_fact dedup (corroborated facts keep their first-recorded
  valid_from).

* Supersession: new column entity_facts.superseded_by INTEGER
  (migration 2026-05-10-000200), new status value 'superseded',
  new DAO method supersede_fact, new HTTP endpoint
  POST /knowledge/facts/{id}/supersede.

  Marking an old fact as replaced by a new one atomically: flips
  status to 'superseded', sets superseded_by, and stamps
  valid_until from the new fact's valid_from (when not already
  set). delete_fact clears dangling supersession pointers in the
  same transaction so the column never points at a missing row —
  no FK because SQLite can't ALTER ADD with REFERENCES, but the
  DAO maintains the invariant.
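
An in-memory sketch of the supersession rules, without the transaction. The `Fact` struct is a simplified assumption and the function names are illustrative rather than the DAO's.

```rust
/// Simplified fact row for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Fact {
    id: i32,
    status: String,
    superseded_by: Option<i32>,
    valid_from: Option<i64>,
    valid_until: Option<i64>,
}

/// Marks `old` as replaced by `new`; an existing valid_until is respected.
fn supersede(old: &mut Fact, new: &Fact) {
    old.status = "superseded".to_string();
    old.superseded_by = Some(new.id);
    if old.valid_until.is_none() {
        old.valid_until = new.valid_from;
    }
}

/// Clears pointers to a deleted fact; status is left as-is for history.
fn clear_dangling(facts: &mut [Fact], deleted_id: i32) {
    for fact in facts.iter_mut() {
        if fact.superseded_by == Some(deleted_id) {
            fact.superseded_by = None;
        }
    }
}
```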

Pairs with conflict detection from the previous slice: once the
old fact's valid_until is closed, its interval no longer overlaps
the new fact's, so they stop flagging — the supersede action
resolves the conflict.

Two tests pin the contract: supersede stamps valid_until from
new.valid_from while respecting an existing valid_until, and
deleting the superseding fact clears the dangling pointer while leaving
the old fact's 'superseded' status in place for history.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:47:06 -04:00
Cameron Cordes 01f5ad7527 knowledge: valid-time on facts + interval-aware conflict detection
Adds bitemporal support to entity_facts. Existing `created_at` is
transaction time (when we recorded the fact); the new
`valid_from` / `valid_until` BIGINT columns are valid time (when the
fact is/was true in the real world). NULL on either side = unbounded
on that side, both NULL = "always-true / unknown" — matches the
default state of every legacy row, no backfill needed.

The split matters for time-bounded predicates like
is_in_relationship_with / lives_in / works_at: recording the fact
once doesn't mean the relationship is still ongoing. Same predicate
across different windows ("lives_in NYC 2018-2020", "lives_in SF
2020-present") is no longer a conflict — the interval-aware check
in get_entity only flags pairs whose windows overlap. Facts with no
valid-time data still flag against everything (worst case for legacy
rows — user adds dates to suppress).
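
A minimal sketch of the overlap test, assuming half-open windows `[valid_from, valid_until)` so that a window ending at 2020 does not overlap one starting at 2020. The function name is hypothetical, and plain year numbers are used below purely for readability.

```rust
/// Windows are (valid_from, valid_until); None means unbounded on that side.
fn windows_overlap(a: (Option<i64>, Option<i64>), b: (Option<i64>, Option<i64>)) -> bool {
    // True when a window starting at `from` begins before `until` closes.
    fn starts_before_end(from: Option<i64>, until: Option<i64>) -> bool {
        match (from, until) {
            (Some(f), Some(u)) => f < u,
            _ => true, // an unbounded side never rules out overlap
        }
    }
    starts_before_end(a.0, b.1) && starts_before_end(b.0, a.1)
}
```

A fact with no valid-time data is `(None, None)`, which overlaps every other window. That is the "flags against everything" worst case for legacy rows.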

API surface:
- POST /knowledge/facts accepts optional valid_from / valid_until.
- PATCH /knowledge/facts/{id} accepts both with tri-state semantics:
  field omitted = leave alone, JSON null = clear to NULL, number =
  set. Implemented via a small serde helper around `Option<Option<T>>`.
- GET /knowledge/entities/{id} surfaces both fields per fact and
  uses them in conflict detection.
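
To show the tri-state semantics without pulling in serde, here is a sketch of applying an already-deserialized patch value to a nullable column. `apply_patch` is a hypothetical helper; the outer `Option` models "field omitted" and the inner one models JSON null.

```rust
/// None = leave alone, Some(None) = clear to NULL, Some(Some(v)) = set.
fn apply_patch(current: Option<i64>, patch: Option<Option<i64>>) -> Option<i64> {
    match patch {
        None => current,
        Some(None) => None,
        Some(Some(value)) => Some(value),
    }
}
```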

Agent path (insight_generator) writes NULL/NULL for now — deriving
valid_from from the source photo's date_taken is slated for a
follow-up agent tool alongside Phase 2's supersession.

Test pins set + clear semantics via update_fact: setting both
bounds, leaving them alone on a subsequent patch, then clearing
valid_until back to NULL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:25:55 -04:00
Cameron Cordes bcd5312953 knowledge: detect same-predicate object conflicts at read time
GET /knowledge/entities/{id} now flags facts as `in_conflict` when
another active fact shares the same predicate but disagrees on the
object (entity id or text value). Pure read-time computation in the
handler — group facts by predicate, distinct-object count > 1 flags
all members. No schema change; same shape as `is_current` on photo
insights.
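
The grouping rule could look roughly like the sketch below. The `Fact` and
`Object` types are simplified stand-ins for the real rows, and
`flag_conflicts` is a hypothetical name:

```rust
use std::collections::{HashMap, HashSet};

/// Simplified object side of a fact: an entity id or a text value.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Object {
    Entity(i32),
    Text(String),
}

/// Simplified stand-in for an active entity_facts row.
#[derive(Debug, Clone)]
struct Fact {
    predicate: String,
    object: Object,
}

/// Returns one `in_conflict` flag per input fact, in input order.
/// A fact is flagged when its predicate has more than one distinct object.
fn flag_conflicts(facts: &[Fact]) -> Vec<bool> {
    let mut objects_by_predicate: HashMap<&str, HashSet<&Object>> = HashMap::new();
    for fact in facts {
        objects_by_predicate
            .entry(fact.predicate.as_str())
            .or_default()
            .insert(&fact.object);
    }
    facts
        .iter()
        .map(|fact| objects_by_predicate[fact.predicate.as_str()].len() > 1)
        .collect()
}
```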

The flag is intentionally a *signal*, not a hard constraint. Some
predicates are legitimately multi-valued (friend_of, tagged_in,
appears_in) — the curator UI surfaces the amber accent and lets the
user reject the stale fact, accept both, or supersede one later
once the supersession column lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:14:58 -04:00
Cameron Cordes 0b8478a5e4 knowledge: list sort + persona-scoped fact_count per entity
Two related additions to /knowledge/entities:

- New EntitySort enum (UpdatedDesc default, NameAsc, FactCountDesc)
  surfaced via `?sort=updated|name|count`. NameAsc clusters near-
  duplicate names so dupes stand out at a glance; FactCountDesc
  surfaces heavily-used entities and demotes 0-fact noise to the
  bottom.

- New `list_entities_with_fact_counts` DAO method that returns each
  entity alongside a persona-scoped count of its non-rejected facts
  (subject side). Persona scope follows X-Persona-Id via the
  existing resolve_persona_filter chain — Single filters on
  (user_id, persona_id), All unions across the user's personas.
  Implemented as one raw SQL query with a LEFT JOIN to a fact-count
  subquery and ORDER BY tied to the chosen sort, so count-sort needs
  no second round trip.
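
A sketch of how the `?sort=` values might map onto the enum. The variant
names come from the description above; `parse_sort` and its handling of
unknown values (returning None so the caller can decide) are assumptions:

```rust
/// Sort orders for the entity list; UpdatedDesc is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum EntitySort {
    #[default]
    UpdatedDesc,
    NameAsc,
    FactCountDesc,
}

/// Maps the raw query-string value to a sort order.
/// A missing param yields the default; an unknown value yields None.
fn parse_sort(param: Option<&str>) -> Option<EntitySort> {
    match param {
        None => Some(EntitySort::default()),
        Some("updated") => Some(EntitySort::UpdatedDesc),
        Some("name") => Some(EntitySort::NameAsc),
        Some("count") => Some(EntitySort::FactCountDesc),
        Some(_) => None,
    }
}
```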

The agent's existing list_entities call site is unchanged — it
doesn't need persona-scoped counts and the trait method stays cheap.
EntitySummary grows an Option<i64> fact_count (skip_serializing_if
none) so PATCH responses stay shaped as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 16:04:13 -04:00
Cameron Cordes 0e2b18224f knowledge: pre-delete relational facts so entity delete succeeds
DELETE /knowledge/entities/{id} was 500ing on any entity that was the
object of a relational fact. entity_facts.object_entity_id has
ON DELETE SET NULL, but the table also has
CHECK (object_entity_id IS NOT NULL OR object_value IS NOT NULL) —
purely relational facts (subject + predicate + object_entity_id, no
object_value, like "Alice is_friend_of Bob") would have both NULL
after SET NULL fired, the CHECK would abort, and the whole DELETE
would fail with a CHECK violation. The user just saw QueryError
because the DAO swallowed the diesel error string.

Wrap delete_entity in a transaction that first deletes facts where
the entity is the object AND object_value is null, then deletes the
entity. Surviving siblings (typed facts about the entity as subject)
are CASCADE'd by the FK as before. Also start surfacing the actual
diesel error in a warn log before collapsing to DbErrorKind so future
similar issues don't masquerade as the opaque QueryError.

A schema-level fix (changing object FK to ON DELETE CASCADE via a
table-rebuild migration) is the cleaner long-term resolution and is
slated for Phase 2; the DAO-side pre-delete is sufficient and less
invasive in the meantime.

Test pins the contract: a relational fact pointing at the deleted
entity is removed, an unrelated typed fact about an unrelated entity
survives, and the entity itself is deleted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 15:44:38 -04:00
Cameron Cordes f7ce3d2b22 knowledge: include library_id in photo_links response
The PhotoLinkDetail in /knowledge/entities/{id} was dropping the
library_id field, leaving consumers no way to construct a
content-routed thumbnail URL. Apollo's curation screen was falling
through to library=0 (the FastAPI default) and getting 400s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 15:19:37 -04:00
Cameron Cordes d7aee4f228 knowledge: cosine dedup, fact create endpoint, recall nudge
Phase 1 of the knowledge curation work. Three small server-side changes
to support an Apollo-side curation surface and reduce the agent's near-
duplicate output rate going forward:

- upsert_entity grows an embedding-cosine fallback after the exact name
  match misses. New entities whose embedding sits above
  ENTITY_DEDUP_COSINE_THRESHOLD (default 0.92) against any same-type
  active entity collapse onto the existing row. Eliminates the Sarah /
  Sara / Sarah J. trio the FTS5 prefix check was missing.
- POST /knowledge/facts symmetric with the existing PATCH/DELETE so the
  curation UI can create facts directly. Persona-scoped via X-Persona-Id;
  validates subject (and optional object) entity existence; reuses
  KnowledgeDao::upsert_fact so corroboration semantics match the agent
  path.
- One sentence in build_system_content telling the agent to call
  recall_entities before store_entity when a name resembles something
  already known. Cheap; complements the DAO-layer guard.

Includes upsert_entity_collapses_near_duplicate_by_embedding test
covering both the collapse-on-near-match path and the don't-collapse-on-
unrelated-embedding path.
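
For illustration, the cosine fallback might be sketched as follows. The
function names, the in-memory candidate list, and the strict `>` comparison
against the threshold are assumptions; the real DAO works against stored
embeddings:

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns None for mismatched lengths, empty input, or a zero vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Returns the id of the most similar existing entity whose similarity
/// exceeds `threshold`, if any. `existing` holds (entity_id, embedding).
fn find_near_duplicate(
    candidate: &[f32],
    existing: &[(i32, Vec<f32>)],
    threshold: f32,
) -> Option<i32> {
    let mut best: Option<(i32, f32)> = None;
    for (id, embedding) in existing {
        if let Some(score) = cosine_similarity(candidate, embedding) {
            if score > threshold && best.map_or(true, |(_, s)| score > s) {
                best = Some((*id, score));
            }
        }
    }
    best.map(|(id, _)| id)
}
```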

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 15:16:05 -04:00
cameron 827a78dd79 Merge pull request 'feature/persona-fk-and-guard' (#90) from feature/persona-fk-and-guard into master
Reviewed-on: #90
2026-05-10 18:42:27 +00:00
Cameron Cordes 08a5f46be1 chat: scope insight lookup by library_id to fix regen-shadow bug
When a photo exists in more than one library and the user
regenerates its insight from library A's chat, the regenerate
streams cleanly, store_insight flips library A's old row to
is_current=false, and inserts a new is_current=true row tagged
(library A, rel_path). On the next history fetch the user sees
their old transcript — the regenerate appears to vanish.

The cause: get_insight(file_path) filters on rel_path + is_current
only, so library B's untouched is_current=true row for the same
rel_path satisfies the query and gets returned by Diesel's .first()
ahead of A's new row. Because get_insight is also what
chat_turn_stream uses to decide bootstrap vs. continuation, the
next chat turn after the shadow hit also routes against the
wrong insight, so update_training_messages corrupts library B's
transcript with library A's chat.

Fix: add get_current_insight_for_library(library_id, file_path)
filtered on (library_id, rel_path, is_current=true) and route the
chat surface (load_history, chat_turn{,_stream}, rewind_history)
through it. load_history falls back to the cross-library
get_insight when the scoped lookup misses — preserves the
"scalar data merges across libraries" intent for the case where
the active library has no insight but another does. The path-only
get_insight stays for callers that don't have library context
(populate_knowledge, the photo-grid metadata fetch).
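
The shadowing and the scoped fix can be modelled over an in-memory slice.
This is only a sketch: the `Insight` struct is a simplified stand-in, the
signatures are assumptions, and `load_history_insight` is a hypothetical
name for the scoped-then-fallback lookup:

```rust
/// Simplified stand-in for a photo insight row.
#[derive(Debug, Clone, PartialEq)]
struct Insight {
    id: i32,
    library_id: i32,
    rel_path: String,
    is_current: bool,
}

/// Path-only lookup: the first current row for the path, in any library.
fn get_insight<'a>(rows: &'a [Insight], rel_path: &str) -> Option<&'a Insight> {
    rows.iter().find(|r| r.is_current && r.rel_path == rel_path)
}

/// Scoped lookup: filters on (library_id, rel_path, is_current = true).
fn get_current_insight_for_library<'a>(
    rows: &'a [Insight],
    library_id: i32,
    rel_path: &str,
) -> Option<&'a Insight> {
    rows.iter()
        .find(|r| r.is_current && r.library_id == library_id && r.rel_path == rel_path)
}

/// Scoped lookup first, cross-library fallback only when it misses.
fn load_history_insight<'a>(
    rows: &'a [Insight],
    library_id: i32,
    rel_path: &str,
) -> Option<&'a Insight> {
    get_current_insight_for_library(rows, library_id, rel_path)
        .or_else(|| get_insight(rows, rel_path))
}
```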

chat_history_handler stops dropping the parsed library on the
floor and threads it through. Single-library deploys see no
behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 14:03:41 -04:00
Cameron Cordes b9d9ba0320 chat: route search_messages({date}) to get_sms_messages
When the LLM calls search_messages with { date, limit } and no
query, it's making the predictable mistake of conflating the two
"messages"-shaped tools. The previous behaviour returned an error
that pointed it at get_sms_messages — correct, but burning a turn
on the misroute. Long photo-chat threads where the user asks
"what was happening that weekend?" hit this on small models
roughly half the time.

Now the date-string-without-query case transparently dispatches
to get_sms_messages with the same args (date / limit / days_radius
/ contact name all pass through unchanged) and prepends a short
"(Note: routed to get_sms_messages — prefer it directly next time)"
to the result. The model sees real data on its first try while
still learning the right tool for next time. Cases that don't have
a get_sms_messages equivalent (numeric contact_id, or start_ts /
end_ts windows) keep the original error so the model knows to
either supply a query or restructure its call.
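
The dispatch decision might be sketched like this. `SearchArgs`, `Route`,
and `route_search_messages` are hypothetical names, and treating a date
combined with contact_id or a timestamp window as the error case is an
assumption the text above does not spell out:

```rust
/// Simplified view of the tool arguments (hypothetical struct).
#[derive(Debug, Default, Clone)]
struct SearchArgs {
    query: Option<String>,
    date: Option<String>,
    contact_id: Option<i64>,
    start_ts: Option<i64>,
    end_ts: Option<i64>,
}

#[derive(Debug, PartialEq, Eq)]
enum Route {
    /// Normal search path: a non-empty query was supplied.
    Search,
    /// Date without query: dispatch to get_sms_messages with the same args.
    RerouteToGetSmsMessages,
    /// No query and no get_sms_messages equivalent: keep the original error.
    Error,
}

fn route_search_messages(args: &SearchArgs) -> Route {
    let has_query = args
        .query
        .as_deref()
        .map_or(false, |q| !q.trim().is_empty());
    if has_query {
        return Route::Search;
    }
    // Assumption: these filters cannot be expressed as a get_sms_messages call.
    let unmappable =
        args.contact_id.is_some() || args.start_ts.is_some() || args.end_ts.is_some();
    if args.date.is_some() && !unmappable {
        Route::RerouteToGetSmsMessages
    } else {
        Route::Error
    }
}
```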

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 13:48:13 -04:00
Cameron Cordes fbd769e475 personas: composite FK + built-in update guard
Two persona-infrastructure correctness fixes that go together because
the second one (FK with CASCADE) requires the first (preventing the
persona row from being mutated out from under its facts).

1. update_persona handler refuses name/systemPrompt edits to built-ins
   (409). includeAllMemories stays editable — that's a per-user
   preference, not the persona's identity. Mirrors the existing
   delete_persona guard. The DAO is intentionally permissive so the
   guard sits at the HTTP layer; persona_dao test pins that contract.

2. Migration 2026-05-10 adds user_id to entity_facts and a composite
   FK (user_id, persona_id) -> personas(user_id, persona_id) ON DELETE
   CASCADE. This closes two issues at once:

   - Persona orphans: deleting a custom persona used to leave its
     facts dangling forever, readable only via PersonaFilter::All.
     CASCADE now wipes them with the persona row.

   - Multi-user fact leakage: PersonaFilter::Single("default") used
     to surface every user's default-scoped facts. PersonaFilter is
     now { user_id, persona_id } and all read paths
     (get_facts_for_entity, list_facts, get_recent_activity) filter
     on user_id first. upsert_fact's dedup key extends to user_id so
     identical claims under shared persona names from different
     users no longer corroborate-bump each other's confidence.

   - user_id threads from Claims.sub.parse::<i32>().unwrap_or(1) at
     the chat / insight handlers through ChatTurnRequest, the
     streaming agentic loop, execute_tool, and into the leaf tools
     (tool_store_fact, tool_recall_facts_for_photo). The ".unwrap_or(1)"
     accommodates Apollo's service token whose sub is non-numeric on
     legacy mints.

   - Backfill picks the smallest user_id matching each legacy fact's
     persona_id so the FK holds for already-stored rows.

Five new knowledge_dao tests with FK-on connection: persona scoping
isolation, All-variant union per-user, dedup not crossing users,
CASCADE delete, FK rejection of unknown personas. Plus
dao_update_does_not_block_built_ins documenting where the
HTTP-layer guard lives.

Apollo coordinates separately — the matching changes there add the
/api/personas proxy and start sending persona_id on photo-chat turns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 13:30:35 -04:00
cameron 79a1168724 Merge pull request 'faces: add person_id filter to /faces/embeddings; remove tag-bootstrap' (#89) from feature/faces-tab into master
Reviewed-on: #89
2026-05-10 15:49:18 +00:00
Cameron Cordes a079065ae9 faces: add person_id filter to /faces/embeddings; remove tag-bootstrap
Pairs with the Apollo FACES-tab change. The new
POST /api/persons/{id}/similar-unassigned route on Apollo needs to
fetch one person's embeddings cheaply to compute the centroid;
adding a person_id query param to /faces/embeddings keeps that to a
single round-trip instead of paging the whole detected set
client-side. When both person_id and unassigned=true are supplied,
person_id wins (the explicit filter is the more specific intent).

Tag-bootstrap removal: bootstrap_candidates_handler,
bootstrap_persons_handler, /persons/bootstrap and
/tags/people-bootstrap-candidates route registrations, and the
heuristic helpers (is_plausible_name_token, looks_like_person) plus
their tests. Only Apollo called these; the migration is complete.
The persons.created_from_tag column stays - it's informational on
existing rows and removing it would be a destructive migration for
no benefit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 11:30:37 -04:00
cameron 25233904aa Merge pull request 'personas: elevate to server with per-persona fact scoping' (#88) from feature/persona-knowledge-segmentation into master
Reviewed-on: #88
2026-05-10 03:44:26 +00:00
cameron 8c377324a1 Merge pull request 'video: handle unknown/short durations in thumb + preview gen' (#87) from fix/video-thumb-preview-edge-cases into master
Reviewed-on: #87
2026-05-10 03:12:58 +00:00
Cameron Cordes 5476ed8ac4 video: handle unknown/short durations in thumb + preview gen
`get_duration_seconds` now returns `Option<f64>` and falls back from
`format=duration` to `stream=duration`. Empty stdout no longer
parse-panics with "cannot parse float from empty string", which was
poisoning the preview-clip row with status=failed and re-queueing every
full scan (notably for GoPro LRV files). `generate_preview_clip` handles
the unknown-duration case by transcoding the whole file (capped at 10s).

`generate_video_thumbnail` seeks to ~50% of the probed duration instead
of a hardcoded `-ss 3`, with a first-frame fallback when the probe
returns nothing. Fixes the loop where short Snapchat clips (<3s) got
"missing thumbnail" logged on every scan because ffmpeg exited 0
without writing a frame, and never wrote the .unsupported sentinel
either.

Adds unit tests for `parse_ffprobe_duration` covering the empty-output,
N/A, multi-line, non-positive, and non-finite cases.
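
A sketch of the parsing rules those tests describe. The function name comes
from the commit, but the signature and the multi-line policy (first line
that parses as a float wins) are assumptions; `thumbnail_seek_seconds` is a
hypothetical helper for the ~50% seek with first-frame fallback:

```rust
/// Parses ffprobe's duration output into seconds.
/// Empty output, "N/A", non-positive, and non-finite values yield None.
fn parse_ffprobe_duration(stdout: &str) -> Option<f64> {
    stdout
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        // Assumption: the first line that parses as a float is the duration.
        .find_map(|line| line.parse::<f64>().ok())
        .filter(|secs| secs.is_finite() && *secs > 0.0)
}

/// Seek offset for the thumbnail frame: halfway through a known duration,
/// or the first frame when the probe returned nothing.
fn thumbnail_seek_seconds(duration: Option<f64>) -> f64 {
    duration.map_or(0.0, |d| d / 2.0)
}
```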

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 23:08:16 -04:00
cameron 7350f1916a Merge pull request 'fix/manual-date-update' (#86) from fix/manual-date-update into master
Reviewed-on: #86
2026-05-10 02:53:20 +00:00
Cameron Cordes 9871c685b4 date-override: cargo fmt
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 21:23:11 -04:00
Cameron Cordes 108bbeb029 date-override: union semantics across libraries + slash forms
The date-override path used to look up `image_exif` strictly by
`(library_id, rel_path)` with only the forward-slash form, while
`/image/metadata`'s `get_exif` falls back across libraries and tries
both slash forms. A photo whose row sat under a different library_id
than its filesystem-resolved one — or whose rel_path was stored with
backslashes — rendered fine in the modal but 404'd on save.

`set_manual_date_taken` / `clear_manual_date_taken` now share a
`locate_image_exif_row` helper that mirrors `get_exif`'s union
semantics (scoped lookup first, library-agnostic fallback by rel_path
in both slash forms), then update by primary key so the write hits
exactly the row read. Inner anyhow errors are logged with
`(library_id, rel_path)` so the next failure mode is debuggable.

Handler-side: `resolve_library_param` errors no longer silently fall
back to the primary library (which would have masked the original bug
with a different "row not found"); a malformed library param now
returns 400. New `DbErrorKind::NotFound` lets the handler distinguish
genuine misses (404) from real DB failures (500).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 21:21:25 -04:00
Cameron Cordes 3e2f36a748 personas: elevate to server with per-persona fact scoping
Move personas off the mobile client into ImageApi as first-class
records, and scope entity_facts by persona so each one builds its own
voice over a shared entity graph. The new include_all_memories flag
lets a persona opt back into the full hive-mind pool for human
browsing of /knowledge/*; agentic generation always stays in-voice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 17:59:20 -04:00
cameron 55a986c249 Merge pull request 'feature/streaming-insights' (#85) from feature/streaming-insights into master
Reviewed-on: #85
2026-05-09 20:57:16 +00:00
cameron c52a646be2 Merge pull request 'memories: restore early-era Snapchat unix-epoch filenames' (#84) from feature/snapchat-early-era-dates into master
Reviewed-on: #84
2026-05-08 20:23:35 +00:00
Cameron Cordes d32a7d7c3a memories: restore early-era Snapchat unix-epoch filenames
The recent blanket "snapchat-" prefix denylist (43f8f83) rejected ALL
Snapchat-prefixed filenames from timestamp parsing, which fixed the
sequential-ID false positives but also broke real unix-second
filenames from Snapchat's early era. `Snapchat-1383929602.jpg`
(2013-11-08 16:53:22 UTC) now falls through to fs_time — and on files
with broken filesystem metadata, fs_time pins to 1970.

Replace the blanket prefix denial with a tighter discriminator:
  - exactly 10 captured digits AND timestamp >= 2011-09-23 (Snapchat
    launch) → real unix epoch, accept
  - any other length under this prefix → sequential ID, reject

This keeps the existing rejections intact:
  Snapchat-1021849065.mp4          → 10 digits, 2002 < launch → reject
  Snapchat-1751031586660373917.jpg → 19 digits, truncated to 16 → reject
And restores the regression case:
  Snapchat-1383929602.jpg          → 10 digits, 2013 ≥ launch → accept
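
The discriminator can be sketched as below. This version takes the full
digit run after the prefix rather than the regex capture, and it assumes the
launch cutoff is midnight UTC on 2011-09-23; the constant and function names
are hypothetical:

```rust
/// 2011-09-23 00:00:00 UTC (assumed cutoff for the Snapchat launch date).
const SNAPCHAT_LAUNCH_UNIX: i64 = 1_316_736_000;

/// Interprets the digits after the "Snapchat-" prefix as unix seconds.
/// Accepts only exactly 10 ASCII digits at or after the launch date;
/// any other length is treated as a sequential ID and rejected.
fn snapchat_epoch_from_digits(digits: &str) -> Option<i64> {
    if digits.len() != 10 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let ts: i64 = digits.parse().ok()?;
    (ts >= SNAPCHAT_LAUNCH_UNIX).then_some(ts)
}
```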

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:22:57 -04:00
Cameron Cordes 3699e059a2 insight-chat: include Date taken + GPS in bootstrap photo context
The bootstrap system message gave the model a file path and (in
hybrid mode) a visual description, but no temporal anchor. Models
defaulted to today's date when calling get_sms_messages — Nov 2014
photos were getting "2024-03-11" passed as `date`, missing every
historical message and leading the model to confidently misreport
context.

This commit folds two more EXIF-sourced facts into the
--- PHOTO CONTEXT --- block:

  Date taken: <YYYY-MM-DD or "unknown">
  GPS: <lat, lon to 4dp>           (omitted when no GPS)

Resolution waterfall for date_taken matches the documented canonical
date pipeline at the EXIF / filename steps, but intentionally stops
short of the fs-time fallback `generate_agentic_insight_for_photo`
uses — for chat we'd rather show "unknown" than mislead the model
with an inode mtime. GPS is taken straight from EXIF when both
lat/lon are populated; absent GPS suppresses the line entirely so
the model doesn't hallucinate coordinates.

InsightGenerator gains a `fetch_exif(file_path)` accessor (crate-
visible) so the chat service doesn't need its own ExifDao plumbing.

build_bootstrap_system_message picks up two new params (date,
gps); existing tests updated and 5 new tests cover:
- date present / absent / waterfall (EXIF wins, filename fallback,
  None when neither source has it)
- GPS present / absent
- ordering (path → date → visual)
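
A rough sketch of composing the context block. The "Date taken" and "GPS"
lines follow the format shown above; the function name, the "Photo file
path" and "Visual description" labels, and placing GPS directly after the
date are assumptions:

```rust
/// Builds the photo-context block for the bootstrap system message.
/// `date_taken` is a pre-formatted YYYY-MM-DD string; `gps` is (lat, lon).
fn build_photo_context(
    path: &str,
    date_taken: Option<&str>,
    gps: Option<(f64, f64)>,
    visual: Option<&str>,
) -> String {
    let mut out = String::from("--- PHOTO CONTEXT ---\n");
    out.push_str(&format!("Photo file path: {path}\n"));
    out.push_str(&format!("Date taken: {}\n", date_taken.unwrap_or("unknown")));
    // The GPS line is omitted entirely when coordinates are absent.
    if let Some((lat, lon)) = gps {
        out.push_str(&format!("GPS: {lat:.4}, {lon:.4}\n"));
    }
    if let Some(description) = visual {
        out.push_str(&format!("Visual description: {description}\n"));
    }
    out
}
```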

Total insight_chat unit tests: 33 (up from 27).
2026-05-08 11:14:39 -04:00
Cameron Cordes a0ec1a5080 insight-chat: photo context belongs in system msg, not user turn
After refresh, the rendered transcript was showing two unwanted
artifacts in the initial user bubble:

  Photo file path: pics/DSC_5171.jpg
  please tell me about this photo and what was going on around it

  Please write your final answer now without calling any more tools.

Two distinct bugs:

1. Bootstrap was prepending `Photo file path: <path>` (and, in
   hybrid mode, the visual description block) into the user-turn
   content. The model needed it to call file_path-keyed tools, but
   the user could see it in their own bubble on replay.

2. The no-tools fallback ("Please write your final answer now…")
   was a synthetic user message we never stripped from history,
   so it persisted into training_messages, rendered as a second
   user bubble, AND wiped the prior tool-call accumulator inside
   load_history (user-turn handler clears pending_tools), which
   is why the tool invocations disappeared from the assistant
   bubble after refresh.

Fixes:

- New `build_bootstrap_system_message` helper composes the persona
  with a `--- PHOTO CONTEXT ---` block (path + optional visual
  description). Lives in the system message, not the user turn.
  The user's bubble shows only what they typed.
- Streaming agentic loop's no-tools fallback now records its
  insertion index and removes the synthetic user prompt from
  `messages` after the model responds. Final assistant content
  stays — it reads coherently on replay without the synthetic
  prompt above it. Applies to both bootstrap and continuation.
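
The record-index-then-remove step might look like this sketch. `Message`,
`Role`, and `finalize_without_tools` are simplified hypothetical names, and
the model call is replaced by a closure:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: Role,
    content: String,
}

const FALLBACK_PROMPT: &str =
    "Please write your final answer now without calling any more tools.";

/// Pushes the synthetic prompt, asks the model for a final answer, then
/// removes the synthetic prompt so only the assistant reply is persisted.
fn finalize_without_tools(
    messages: &mut Vec<Message>,
    respond: impl FnOnce(&[Message]) -> String,
) {
    // Record where the synthetic user prompt lands before inserting it.
    let synthetic_index = messages.len();
    messages.push(Message {
        role: Role::User,
        content: FALLBACK_PROMPT.to_string(),
    });
    let answer = respond(messages.as_slice());
    messages.remove(synthetic_index);
    messages.push(Message {
        role: Role::Assistant,
        content: answer,
    });
}
```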

3 new tests cover the system-message composer (path-only, with
visual block, persona-trim). Total insight_chat unit tests: 27.
2026-05-08 11:07:03 -04:00
Cameron Cordes 24ecf2abd4 insight-chat: prepend Photo file path: <path> to bootstrap user turn
Bug: bootstrap user_content was just the user's typed message (plus
the hybrid visual description). Tools that take a file_path arg —
recall_facts_for_photo, get_file_tags, get_faces_in_photo — had no
way to learn the canonical path. Small models would invent
placeholders like "input_file_0.png" or call the tool with a name
guessed from a hidden multimodal input handle, neither of which
matched any real photo.

Fix: prepend a single-line "Photo file path: <normalized>\n\n" block
to user_content. Same shape generate_agentic_insight_for_photo
already uses for non-chat callers — kept the bootstrap minimal
(no date / GPS / tags pre-stuffing; the agentic loop can fetch
those via tools when needed).

Hybrid still injects the visual description block between the path
block and the user message; local mode just gets path + user text.
2026-05-08 10:59:35 -04:00
Cameron Cordes a29ff406a1 insight-chat: extract bootstrap resolution helpers + unit-test them
resolve_bootstrap_system_prompt and resolve_bootstrap_backend run on
every bootstrap turn — they pick the persisted system prompt and the
chosen backend label. They were inline conditionals before; pulling
them out makes the rules testable without spinning up the full
streaming stack.

9 new tests cover:
- system prompt fallback to BOOTSTRAP_DEFAULT_SYSTEM_PROMPT for None,
  empty string, whitespace-only
- supplied non-empty prompts pass through verbatim, with interior
  newlines / spacing preserved (Apollo personas use multi-line tool
  listings)
- backend defaults to "local" for None / empty
- "local" / "hybrid" accepted case-insensitively with edge-trim
- unknown labels return a descriptive error
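
The rules listed above could be sketched as follows. The helper names come
from the commit; the signatures, the `Backend` enum, the placeholder default
prompt text, and treating a whitespace-only backend label as empty are
assumptions:

```rust
/// Placeholder text; the real default prompt is not shown in this log.
const BOOTSTRAP_DEFAULT_SYSTEM_PROMPT: &str = "(default system prompt)";

/// Falls back to the default for None, empty, or whitespace-only input.
/// A non-empty prompt passes through verbatim, interior spacing included.
fn resolve_bootstrap_system_prompt(supplied: Option<&str>) -> &str {
    match supplied {
        Some(prompt) if !prompt.trim().is_empty() => prompt,
        _ => BOOTSTRAP_DEFAULT_SYSTEM_PROMPT,
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Local,
    Hybrid,
}

/// Defaults to Local; accepts "local" / "hybrid" case-insensitively after
/// trimming the edges; anything else is a descriptive error.
fn resolve_bootstrap_backend(label: Option<&str>) -> Result<Backend, String> {
    let trimmed = label.map(str::trim).unwrap_or("");
    if trimmed.is_empty() {
        return Ok(Backend::Local);
    }
    match trimmed.to_ascii_lowercase().as_str() {
        "local" => Ok(Backend::Local),
        "hybrid" => Ok(Backend::Hybrid),
        other => Err(format!(
            "unknown backend '{other}' (expected 'local' or 'hybrid')"
        )),
    }
}
```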

Total insight_chat tests: 24 (up from 15). No behaviour change.
2026-05-08 10:56:22 -04:00
Cameron Cordes 928efe49f9 insight-chat: bootstrap insight on first Discuss message + regenerate flag
Tap-Discuss-on-no-insight previously failed silently: ImageApi's
/insights/chat/stream required an existing agentic insight, errored
when missing, and emitted the failure as `event: error` — which the
frontend SSE consumer ignored (it listens for `error_message`).

This commit closes both gaps with a server-side state machine:

- /insights/chat/stream now branches on insight presence. Missing
  insight (or `regenerate: true` in the body) → bootstrap path:
  builds [System(req.system_prompt), User(req.user_message + image)],
  runs the agentic loop, generates a title, persists a new row via
  store_insight (which auto-flips priors). Existing insight →
  continuation path (unchanged behaviour).
- New `regenerate: bool` request field forces bootstrap even when an
  insight exists. Takes precedence over `amend`.
- `done` SSE payload field-name alignment with Apollo's frontend
  convention: prompt_eval_count → prompt_tokens, eval_count →
  eval_tokens, num_ctx echo added.
- `amended_insight_id` semantics broaden — now populated whenever the
  turn produced a new row (bootstrap, regenerate, or amend). Existing
  amend clients keep working unchanged; new clients get the new row's
  id for free.
- `event: error` → `event: error_message` so frontend errors stop
  silently dropping.

Refactor: extracted run_streaming_agentic_loop, build_chat_clients,
and generate_title as shared helpers between bootstrap and
continuation. Continuation path's outer logic moves to
run_continuation_streaming with no behaviour change.

Mobile-ready: any client (Apollo backend, mobile, future) sends one
request to /insights/chat/stream and gets the right path. Apollo's
proxy stays a dumb pipe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:41:50 -04:00
cameron bdafd39546 Merge pull request 'feature/insight-chat-improvements' (#83) from feature/insight-chat-improvements into master
Reviewed-on: #83
2026-05-07 22:19:12 +00:00
Cameron Cordes 8bd1a85070 insight-chat: cargo fmt sweep on the get_faces_in_photo additions
Single-line dao lock + reordered faces import. No logic changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:53:31 -04:00
Cameron Cordes 6f0c15d0c5 insight-chat: code-review polish on get_faces_in_photo
- Drop redundant `use anyhow::Context` inside has_any_faces (already
  imported at the module level).
- Drop dead `.unwrap_or("?")` on bound faces — the vec is filtered to
  is_some() so the fallback can never fire.
- Reorder the face_dao constructor param + initializer to match the
  struct declaration (between tag_dao and knowledge_dao). Update both
  state.rs call sites and populate_knowledge.rs to match.
- Hold face_dao lock once across the library-resolver loop instead of
  reacquiring per iteration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:48:22 -04:00
Cameron Cordes b64a5bec28 insight-chat: add get_faces_in_photo agentic tool
The LLM had no path to see face_detections data — get_file_tags
returns user-applied tags, but a face that's been detected and bound
to a person via the embedding-cluster auto-bind path doesn't always
have a matching tag. The new tool joins face_detections with persons
by content_hash and returns bound names + bboxes, plus unidentified
faces (so smaller models can count people in the photo without
inferring from a visual description).

Gated on face_detections being non-empty via the same has_any_*
pattern as daily_summaries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:43:16 -04:00
Cameron Cordes 388eb22cd2 Remove full plan file, just keep spec
2026-05-07 17:29:04 -04:00
Cameron Cordes eef41d4172 thumbnails: align video ffmpeg args with the image path so non-yuvj420p sources work
The bare 'ffmpeg -ss 3 -i in -vframes 1 -f image2 out' command failed on
sources whose decoded pix_fmt isn't yuvj420p (e.g. older Samsung phone
videos in yuv420p). With no -vf filter chain, the decoded frame goes
straight to the mjpeg encoder, which rejects it with 'Non full-range
YUV is non-standard' and exits non-zero.

generate_image_thumbnail_ffmpeg already handles the same class of
source for HEIC/RAW by adding -vf scale=200:-1 -c:v mjpeg — the filter
chain lets ffmpeg auto-insert the pix_fmt converter the encoder needs.
Adopt the same args here. Side benefit: video thumbnails are now 200px
wide on disk, matching image thumbnails (previously full-resolution).

Pre-existing .unsupported sentinels for videos that hit this failure
will need to be deleted manually to retry — they're under
$THUMBNAILS/<lib_id>/.../*.unsupported.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:20:05 -04:00
Cameron Cordes b42acbb3f3 fmt: cargo fmt sweep across drifted files
No behavior change — purely whitespace/line-break cleanup that had
accumulated since the last format run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:42:41 -04:00
Cameron Cordes 2a273a3ed9 thumbnails: stop video failures from re-logging every watcher tick
generate_video_thumbnail used .output().expect(...), which only catches
spawn failure — non-zero ffmpeg exits were silently discarded. With no
thumbnail and no .unsupported sentinel left behind, the watcher
re-detected the file as missing every quick-scan tick and re-logged
"New file detected (missing thumbnail)" forever.

Mirror the image branch: return io::Result, check status.success(),
and write the sentinel from create_thumbnails on failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:41:24 -04:00
Cameron Cordes a8433c2e01 insight-chat: document the new system_prompt field in CLAUDE.md
Add system_prompt to the /insights/chat body schema with a one-paragraph
note on the append-vs-amend semantics so future readers find the
contract alongside the rest of the chat-continuation docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:26:32 -04:00
Cameron Cordes 1cdc0f6eb9 insight-chat: drop the dead SmsApiClient::search_messages wrapper
The post-PR-4 delegation kept it as a convenience for callers that
don't filter by contact, but nothing actually uses it. Delete to clear
the dead_code warning. search_messages_with_contact remains as the
single entry point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:10:31 -04:00
Cameron Cordes e539c083c9 insight-chat: code-review polish on the tool-gating PR
- search_messages now delegates to search_messages_with_contact(.., None)
  so the two methods share a single HTTP path. Drops the dead-code
  warning and the ~30-line duplication.
- DailySummaryDao gains has_any_summaries (LIMIT 1 existence probe)
  used by current_gate_opts; the SELECT COUNT(*) get_total_summary_count
  added in the prior commit is removed (it had no other caller).
- current_gate_opts doc comment corrected to describe what the probes
  actually do.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:07:57 -04:00
Cameron Cordes f50d32667b insight-chat: ToolGateOpts + per-tool description rewrites
Tools whose backing tables are empty (calendar, location_history,
daily_summaries) drop out of the catalog so the LLM doesn't waste
iteration budget calling them only to receive "no results found".
Vision and apollo gates already existed; this generalizes the pattern.

search_messages gains start_ts/end_ts/contact_id filters (date filter
is a client-side post-filter; SMS-API only accepts contact_id natively
on the search endpoint).

Descriptions follow a consistent convention: one sentence (what +
when), param semantics, examples for tools with non-obvious param
choices. No more all-caps headers, no more identity-prescriptive
language inside descriptions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:56:58 -04:00
Cameron Cordes b02da0d0cc insight-chat: code-review polish on the days_radius fix
- Bind effective_radius once in fetch_messages_for_contact so the log
  output and window math share a single source of truth for the clamp.
- Clamp tool-supplied days_radius to [1, 30] at the tool boundary so a
  runaway LLM value can't produce a thousand-day window.
- Split the negative-input test into a real negative-input case
  alongside the zero-input case.
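
The clamp at the tool boundary amounts to something like the sketch below.
The default of 4 and the [1, 30] range come from these commits; the function
and constant names are hypothetical:

```rust
/// Default radius kept at the tool level for back-compat.
const DEFAULT_DAYS_RADIUS: i64 = 4;

/// Resolves the tool-supplied radius to a safe value in [1, 30].
fn effective_days_radius(requested: Option<i64>) -> i64 {
    requested.unwrap_or(DEFAULT_DAYS_RADIUS).clamp(1, 30)
}
```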

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:47:46 -04:00
Cameron Cordes 659e7bd973 insight-chat: get_sms_messages tool now honors days_radius
The agentic tool definition advertised a days_radius parameter but
sms_client::fetch_messages_for_contact was hardcoded to ±4 days,
silently ignoring whatever value the LLM chose. Plumb the parameter
through; default 4 retained at the tool level for back-compat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:42:42 -04:00
Cameron Cordes 428f24b0f8 insight-chat: code-review polish on the chat system_prompt override
- Trim the override input once via Option::map(str::trim).filter(...).
- Use matches!() in restore_system_prompt_override's Prepended arm so
  it reads consistently with the Replaced arm.
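
The single-pass trim reads roughly like this; `normalize_override` is a
hypothetical name and the non-empty check inside `filter` is an assumption
about what the elided closure does:

```rust
/// Trims the override once and treats empty or whitespace-only input
/// the same as no override at all.
fn normalize_override(input: Option<&str>) -> Option<&str> {
    input.map(str::trim).filter(|s| !s.is_empty())
}
```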

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:40:04 -04:00
Cameron Cordes faa289882f insight-chat: per-turn system_prompt override on chat continuation
Append mode: applied ephemerally — original system message restored
before persistence so re-opens see the baked persona. Amend mode:
override stays in place and becomes the new insight row's system
message. Pattern mirrors annotate_system_with_budget.

Adds system_prompt field on both ChatTurnHttpRequest and ChatTurnRequest;
plumbs through chat_turn and chat_turn_stream identically.
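
A rough model of the persistence rule for the two modes, under the assumption that it reduces to choosing which system message gets written. `ContinuationMode` and `system_message_to_persist` are illustrative names, not the project's API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContinuationMode {
    Append,
    Amend,
}

/// Pick the system message that is written to storage after the turn.
/// Append: the override was ephemeral, so the baked persona is persisted.
/// Amend: the override becomes the new row's system message.
fn system_message_to_persist<'a>(
    baked: &'a str,
    override_prompt: Option<&'a str>,
    mode: ContinuationMode,
) -> &'a str {
    match (mode, override_prompt) {
        (ContinuationMode::Amend, Some(custom)) => custom,
        _ => baked,
    }
}
```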

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:34:08 -04:00
Cameron Cordes 177187f6a2 insight-chat: code-review polish on the system-prompt split
- Use Option::map instead of manual match-on-Option (drops clippy::manual_map).
- Drop redundant `max_iterations = max_iterations` from the format! call.
- Use captured identifiers consistently in the user_content format!.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:27:59 -04:00
Cameron Cordes 8ae4099d46 insight-chat: split generation system prompt into identity + procedural blocks
The framework no longer asserts "you are a personal photo memory
assistant" alongside a user-supplied custom_system_prompt — the
persona is the authoritative identity. The procedural block (tool-use
guidance, iteration budget) stays identity-free.

The user message also stops asking for "a detailed insight with a
title and summary" since the title is regenerated post-hoc anyway and
the wording was constraining voice for no data-model benefit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:20:45 -04:00
Cameron Cordes 204428b0c0 insight-chat: implementation plan for the spec
Five sequenced PRs:
  1. Split generation system prompt + neutralize user message
  2. system_prompt field on chat request (ephemeral / amend-persisted)
  3. fetch_messages_for_contact honors days_radius
  4. ToolGateOpts + per-tool description rewrites + search_messages
     gains start_ts/end_ts/contact_id
  5. FileViewer-React: persona system_prompt on every turn + style note

Each PR independently mergeable. Tests inline TDD per task.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:15:09 -04:00
Cameron Cordes fbece0ba9a insight-chat: design for tool catalog, system prompt, and SMS fixes
Lays out the cycle: split generation system prompt into identity vs
procedural blocks so personas drive voice/shape, add per-turn
system_prompt override on chat (ephemeral in append mode, persisted
on amend), gate optional tools on data presence, and fix the
days_radius bug in get_sms_messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:04:07 -04:00
cameron 22e157411c Merge pull request 'date_resolver: drop -fast2 so MP4 moov-at-end files resolve' (#82) from fix/exiftool-mp4-moov-trailer into master
Reviewed-on: #82
2026-05-07 16:42:08 +00:00
Cameron Cordes c128596470 date_resolver: drop -fast2 so MP4 moov-at-end files resolve
For QuickTime/MP4 files whose `moov` atom sits at the end of the
file (non-faststart — common for Snapchat exports and any MP4
muxed without `-movflags +faststart`), `-fast2` causes exiftool
to skip the trailer and return no `CreateDate` /
`MediaCreateDate`, dropping the resolver to the `fs_time`
fallback for files that actually have a real capture date.

Reported cases:
  Snapchat-477624257.mp4
    fs_time: 2026-05-04 (today, file was just modified)
    real:    QuickTime CreateDate 2018-09-02
  action_compound_cc92e65b709d1deb895b4c2a9484fc6a.mp4
    fs_time: 2026-05-04
    real:    MediaCreateDate 2018-03-01

The waterfall pre-filters to files kamadak-exif couldn't read, so
the JPEG fast-path is already covered without `-fast2`. Paying
full-scan cost on the residual is the right trade. The per-tick
drain re-resolves `source = 'fs_time'` rows, so existing rows
recover automatically on the next watcher tick after deploy — no
SQL migration needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:40:50 -04:00
cameron ac8d17fb22 Merge pull request 'memories: deny Snapchat-prefixed filenames from timestamp parsing' (#81) from feature/filename-date-snapchat-denylist into master
Reviewed-on: #81
2026-05-07 16:20:06 +00:00
Cameron Cordes 43f8f83d80 memories: deny Snapchat-prefixed filenames from timestamp parsing
Snapchat assigns sequential IDs that happen to overlap real epoch
values, so the 10-16 digit timestamp regex matched and produced
2002-era dates for files actually saved in 2016/2021. The digits
themselves are indistinguishable from a unix timestamp, so we
dispatch on the source-app prefix instead. Case-insensitive,
extensible for future apps that exhibit the same pattern.

Reported cases:
  Snapchat-1021849065.mp4          → 2002-05-19 (actual 2021)
  Snapchat-1751031586660373917.jpg → 2002-09-09 (actual 2016)
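
A sketch of a case-insensitive prefix denylist, assuming it runs before the timestamp regex. `DENIED_PREFIXES` and `is_timestamp_parse_denied` are hypothetical names; the only prefix taken from this commit is `Snapchat-`.

```rust
/// Lowercased source-app prefixes whose digits must not be read as epochs.
/// Extend this list for other apps that show the same pattern.
const DENIED_PREFIXES: &[&str] = &["snapchat-"];

/// True when the filename starts with a denied prefix, ignoring ASCII case.
fn is_timestamp_parse_denied(filename: &str) -> bool {
    let lower = filename.to_ascii_lowercase();
    DENIED_PREFIXES.iter().any(|prefix| lower.starts_with(*prefix))
}
```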

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:17:40 -04:00
cameron e55f6a5961 Merge pull request 'memories: reject implausible filename-derived timestamps' (#80) from feature/filename-date-plausibility into master
Reviewed-on: #80
2026-05-07 16:02:50 +00:00
Cameron Cordes feaae9b6d3 memories: reject implausible filename-derived timestamps
Filenames like `000227580005.jpg` (film-scan ID) and
`IMG_21323906751390.jpeg` were matched by the 10-16 digit timestamp
regex and resolved to 1970 / 2037, then written into
`image_exif.date_taken` with `source = 'filename'`. EXIF-less
photos showed up under those bogus dates everywhere date_taken is
read.

Two new guards in `extract_date_from_filename`:
- leading zero → reject (real epoch values don't have one at any
  sane resolution).
- resolved year outside [1995, now+1y] → reject.

Both let the date_resolver waterfall fall through to fs_time,
which is a much better proxy for content age than a fake epoch
date. Regression tests cover the two reported filenames.
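
The two guards could look roughly like this. This is a simplified sketch: it treats the digit run as whole seconds only (the real regex accepts 10-16 digits, so other resolutions exist), approximates the year window with second bounds, and takes `now_secs` as a parameter. All names are hypothetical.

```rust
/// 1995-01-01T00:00:00Z in unix seconds (lower plausibility bound).
const MIN_PLAUSIBLE_SECS: i64 = 788_918_400;
/// Approximation of "one year" used for the now+1y upper bound.
const ONE_YEAR_SECS: i64 = 366 * 86_400;

/// Return the epoch seconds only if the digit run passes both guards.
fn plausible_filename_epoch(digits: &str, now_secs: i64) -> Option<i64> {
    // Guard 1: real epoch values do not start with a zero.
    if digits.starts_with('0') {
        return None;
    }
    let secs: i64 = digits.parse().ok()?;
    // Guard 2: reject anything outside [1995, now + 1y].
    if secs < MIN_PLAUSIBLE_SECS || secs > now_secs + ONE_YEAR_SECS {
        return None;
    }
    Some(secs)
}
```

Returning `None` lets the caller fall through to the next waterfall step instead of persisting a bogus date.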

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:02:07 -04:00
cameron 95e21c8128 Merge pull request 'feature/manual-date-override' (#79) from feature/manual-date-override into master
Reviewed-on: #79
2026-05-07 15:10:37 +00:00
Cameron Cordes 7e1c4ab318 backfill_date_taken: surface the actual diesel error in warnings
The DAO swallowed every diesel::update failure as a flat
`anyhow!("Update error")`, then trace_db_call further reduced it to
`DbError { kind: UpdateError }`. Operators saw "update failed for lib
2 Snapchat/foo.mp4: DbError { kind: UpdateError }" with no clue why
(constraint violation? type mismatch? row vanished mid-flight? DB
locked?).

Three changes:
- Preserve the diesel error in the anyhow chain along with the input
  params (lib, rel_path, date_taken, source) so the cause is visible.
- Log the chain at warn-level inside the DAO before the trace wrapper
  collapses it to DbErrorKind::UpdateError, so the warning at the
  call site finally has something diagnosable next to it.
- Treat zero-row updates as a debug-level "row likely retired by the
  missing-file scan" rather than a hard failure — that case is benign
  and shouldn't poison the drain's error tally.
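
To illustrate what preserving the cause buys, here is a std-only sketch of an error that keeps its source and input params. The project uses anyhow and diesel; `BackfillUpdateError` and `render_chain` are hypothetical stand-ins, not the actual DAO types.

```rust
use std::error::Error;
use std::fmt;

/// Carries the input params plus the underlying cause instead of a flat string.
#[derive(Debug)]
struct BackfillUpdateError {
    library_id: i32,
    rel_path: String,
    source: Box<dyn Error>,
}

impl fmt::Display for BackfillUpdateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "update failed for lib {} {}", self.library_id, self.rel_path)
    }
}

impl Error for BackfillUpdateError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Walk the source chain so the log line shows every cause, outermost first.
fn render_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}
```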

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 11:07:17 -04:00
Cameron Cordes 65af7d999e memories: parse filename dates as UTC, not server local
`extract_date_from_filename` was calling `Local::from_local_datetime`
on the parsed YYYY-MM-DD-HH-MM-SS components, then `.timestamp()` was
shifting the result by the SERVER's TZ offset to produce real UTC
seconds. That made filename-sourced timestamps disagree with EXIF-
sourced timestamps by hours: kamadak-exif's `DateTimeOriginal` is a
naive string parsed AS-IF-UTC (the project's load-bearing
"naive local reinterpreted as UTC" convention), and Apollo's photo
matcher re-anchors that naive value through the BROWSER's TZ when
matching to the track. Anything stamped in server-local instead got
double-shifted on its way through the matcher and through any
`formatNaive*` display path on the client.

Visible symptom in the Apollo DETAILS modal: a photo's CURRENT date
read correctly (1:25 AM via exif) while FROM FILENAME read 4 hours
ahead (5:25 AM in EDT) for the same `IMG_20160710_012515.jpg`.

Switch to `Utc::from_utc_datetime` so `.timestamp()` returns the
wall-clock-as-UTC unix seconds — same convention as the EXIF path.
The /memories endpoint, the canonical-date waterfall (which feeds
`image_exif.date_taken` for filename-only files), and Apollo's
DETAILS modal `filename_date` field all now line up.
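
The wall-clock-as-UTC convention can be checked with a std-only sketch. The real code uses chrono's `Utc::from_utc_datetime`; the helpers below are hypothetical and use the well-known days-from-civil arithmetic for the proleptic Gregorian calendar.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Shift the year so it starts in March; leap days then fall at year end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y.rem_euclid(400);
    let month_from_march = (month + 9) % 12;
    let day_of_year = (153 * month_from_march + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// Interpret naive components as if they were UTC: no server TZ offset applied.
fn naive_as_utc_secs(y: i64, mo: i64, d: i64, h: i64, mi: i64, s: i64) -> i64 {
    days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + s
}
```

For `IMG_20160710_012515.jpg` this yields 1468113915, which renders as 01:25:15 when read back through the same naive convention.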

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 20:43:18 -04:00
Cameron Cordes 16d6586b7d exif: GET /image/exif/full — exiftool dump for the DETAILS modal
The curated `image_exif` columns are a small slice of what exiftool
can read (camera/lens/GPS/capture/dates). Apollo's DETAILS modal wants
to surface everything — white balance, metering, MakerNotes, IPTC,
ICC profile, Composite tags, the lot — for an operator inspecting a
photo's provenance.

`read_full_exif_via_exiftool(path)` shells out to `exiftool -j -G -n`:
JSON output, group-prefixed keys (`EXIF:Make`, `MakerNotes:LensInfo`),
numeric values (callers can reformat). Spawned via web::block to keep
it off the actix worker — RAW with rich MakerNotes can take a few
seconds.

The endpoint is on-demand only; the indexer / file watcher does NOT
call it. Falls back to 503 with a clear message when exiftool isn't
on PATH so Apollo can render an "install exiftool" hint. Multi-library
union resolution mirrors set_image_gps / get_file_metadata.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 19:42:41 -04:00
Cameron Cordes 832b50d587 image_exif: manual date_taken override (set/clear endpoints)
Add `POST /image/exif/date` and `POST /image/exif/date/clear` so an
operator can correct a row whose canonical-date waterfall landed on the
wrong value (camera clock reset, fs_time fallback for a copied-from-
backup file, etc). New `original_date_taken` / `original_date_taken_source`
columns snapshot the prior value on first override so revert is lossless.

The waterfall source set is now `'exif' | 'exiftool' | 'filename' | 'fs_time' | 'manual'`.
The existing `idx_image_exif_date_backfill` partial index already filters
to `date_taken IS NULL OR date_taken_source = 'fs_time'`, so manual rows
are naturally excluded from the per-tick drain — no index change needed.

`ExifMetadata` now exposes `date_taken_source` + originals so a UI can
render "manually set; was X via filename".
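
A sketch of the snapshot-on-first-override behaviour, modelled in memory. `DateFields`, `set_manual_date`, and `clear_manual_date` are hypothetical; the field names follow the columns named above, and treating "source is already manual" as the first-override test is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
struct DateFields {
    date_taken: Option<i64>,
    date_taken_source: Option<String>,
    original_date_taken: Option<i64>,
    original_date_taken_source: Option<String>,
}

/// Apply a manual date, snapshotting the prior value only on the first override.
fn set_manual_date(row: &mut DateFields, new_date: i64) {
    if row.date_taken_source.as_deref() != Some("manual") {
        row.original_date_taken = row.date_taken;
        row.original_date_taken_source = row.date_taken_source.clone();
    }
    row.date_taken = Some(new_date);
    row.date_taken_source = Some("manual".to_string());
}

/// Revert a manual override by restoring the snapshot and clearing it.
fn clear_manual_date(row: &mut DateFields) {
    if row.date_taken_source.as_deref() == Some("manual") {
        row.date_taken = row.original_date_taken.take();
        row.date_taken_source = row.original_date_taken_source.take();
    }
}
```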

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 19:26:43 -04:00
cameron 2acc525e73 Merge pull request 'otel: revert HTTP transport, keep gRPC' (#78) from fix/otlp-revert-to-grpc into master
Reviewed-on: #78
2026-05-06 22:36:09 +00:00
Cameron Cordes ecd49fd053 otel: revert HTTP transport, keep gRPC
The HTTP/protobuf exporter never sent any traffic in prod (tcpdump
on port 4318 showed nothing) despite the receiver path being correct
and the bridge wiring being intact (logs reached journalctl via the
stdout exporter). Likely the BatchLogProcessor + reqwest-client combo
isn't getting the right runtime context, but debugging that on a live
deployment isn't worth holding up the rest of the speedups.

Restoring grpc-tonic transport so prod observability comes back. The
remaining build-time wins on this branch (mold linker, system sqlite3,
profile.dev tweaks, lockfile-only dep refresh) deliver most of the
original savings without touching telemetry. Operator: revert
OTLP_OTLS_ENDPOINT in prod from port 4318 back to 4317.

HTTP transport remains a viable follow-up — needs to be debugged
against a local SigNoz instance with internal SDK error visibility
enabled, on its own branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 18:33:37 -04:00
cameron c7bd2226cc Merge pull request 'build: speed up debug compile loop' (#77) from feature/build-time-speedups into master
Reviewed-on: #77
2026-05-06 21:41:19 +00:00
Cameron Cordes f73db58771 build: speed up debug compile loop
- Drop libsqlite3-sys 'bundled' on Linux/macOS so the SQLite C source
  isn't recompiled every clean build; Windows keeps 'bundled' via a
  cfg(windows) target override.
- Switch opentelemetry-otlp from grpc-tonic to http-proto + reqwest-client.
  Removes the tonic + h2 + hyper-h2 stack from the build graph; reqwest
  was already a dependency. Updates otel.rs to call .with_http().
- Add [profile.dev] debug = "line-tables-only" to shrink linker work
  while keeping panics/backtraces useful.
- Add .cargo/config.toml selecting mold via gcc on x86_64-linux-gnu.
  Requires `apt install mold`. Other platforms use the default linker.
- cargo update: lockfile-only refresh of all minor/patch bumps within
  existing version constraints.

Cold debug build: ~1m 37s; touch-one-file rebuild: ~5s on Linux.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 17:36:42 -04:00
cameron 06fdcadf67 Merge pull request 'feature/canonical-date-taken' (#76) from feature/canonical-date-taken into master
Reviewed-on: #76
2026-05-06 21:15:57 +00:00
Cameron Cordes 9f1b3f6d9a date_taken_source: backfill 'exif' on legacy rows
Pre-resolver rows already had a populated `date_taken` from the old
kamadak-exif-only ingest path. The column-add migration left their
`date_taken_source` as NULL, and the drain's eligibility predicate
(`date_taken IS NULL OR date_taken_source = 'fs_time'`) skips them —
so they remain unlabelled forever and never benefit from the
resolver's exiftool fallback even if they're videos that should
upgrade.

Label them all `'exif'` in a one-shot UPDATE. Safe because every
write path that populated `date_taken` before the resolver landed was
a kamadak-exif read. Idempotent (the WHERE matches nothing on a
second run). Down.sql is a no-op — the labels stay correct under any
schema state, and the column-add migration is the right place to
revert if needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 17:05:00 -04:00
Cameron Cordes 7f12890f4b memories: single-SQL rewrite + 20-year lookback
Replaces the EXIF-loop + WalkDir-fallback pipeline that powered
`/memories` with a single per-library SQL query
(`get_memories_in_window`) that uses `strftime('%m-%d' | '%W' | '%m',
date_taken, 'unixepoch', tz_offset)` for calendar matching in the
client's timezone, plus a `years_back` lower bound and a
no-future-dates upper bound. Returns only the matching rows; the
handler applies per-library `PathExcluder` post-query and sorts.

Drops:
- `collect_exif_memories` — replaced by the single SQL query.
- `collect_filesystem_memories` — the canonical-date pipeline now
  populates `date_taken` for every row at ingest, so the WalkDir
  fallback that scanned 14k+ files each request is no longer needed.
- `get_memory_date_with_priority` and friends — request-time waterfall
  superseded by `date_resolver` running at ingest. The associated
  three priority-tests are dropped; their replacement lives in
  `date_resolver::tests`.

On a ~14k-file library this drops `/memories` from 10–15 s
(dominated by `fs::metadata` per row) to single-digit ms.

Bumps `DEFAULT_YEARS_BACK` from 15 → 20 to surface deeper archives
on matching anniversaries.

Note vs. ISO weeks: the original Rust used `chrono::iso_week().week()`
for week-span matching. SQLite's `%W` is Monday-anchored but uses week
0 for days before the first Monday, so it can disagree with ISO at
year boundaries by ±1. Acceptable for nostalgia browsing.

Adds 3 new DAO tests covering month-span filter, library scoping, and
the unknown-span-token guard. Also adds a CLAUDE.md section describing
the canonical-date pipeline end-to-end and the new
`DATE_BACKFILL_MAX_PER_TICK` env var.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 16:04:09 -04:00
Cameron Cordes 54e0635a98 date_backfill: per-tick drain for unresolved date_taken rows
Adds two ExifDao methods (`get_rows_needing_date_backfill` /
`backfill_date_taken`) and a `backfill_missing_date_taken` watcher pass
that runs on every tick alongside `backfill_unhashed_backlog`.

The drain queries the partial index for rows where `date_taken IS NULL`
or `date_taken_source = 'fs_time'`, batches up to
`DATE_BACKFILL_MAX_PER_TICK` paths (default 500), and feeds them through
`date_resolver::resolve_dates_batch` — a single exiftool subprocess
covers the whole tick. Rows that newly resolve to `exiftool` /
`filename` / `fs_time` get persisted via `backfill_date_taken` (touches
only `date_taken` + `date_taken_source` so EXIF / hash / perceptual
columns survive).

`filename`-sourced rows are intentionally not re-resolved — the regex
is authoritative when it matches and re-running exiftool wouldn't
change the answer. Files that have disappeared from disk are skipped
so a ghost row doesn't loop through the drain forever; the
missing-file scan in `library_maintenance` retires those separately.

Comes with two DAO unit tests (eligibility filter + column-isolation).
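
The eligibility predicate and per-tick cap can be modelled in memory as below. `ExifRow` and both function names are hypothetical; the real drain runs the predicate in SQL against the partial index.

```rust
struct ExifRow {
    rel_path: String,
    date_taken: Option<i64>,
    date_taken_source: Option<String>,
}

/// Mirrors `date_taken IS NULL OR date_taken_source = 'fs_time'`.
fn needs_date_backfill(date_taken: Option<i64>, source: Option<&str>) -> bool {
    date_taken.is_none() || source == Some("fs_time")
}

/// Pick at most `max_per_tick` eligible paths for one resolver batch.
fn select_backfill_batch(rows: &[ExifRow], max_per_tick: usize) -> Vec<&str> {
    rows.iter()
        .filter(|r| needs_date_backfill(r.date_taken, r.date_taken_source.as_deref()))
        .take(max_per_tick)
        .map(|r| r.rel_path.as_str())
        .collect()
}
```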

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 16:03:03 -04:00
Cameron Cordes 2d14291733 ingest: stamp canonical date_taken on every InsertImageExif
Wires `date_resolver::resolve_date_taken` into the three call sites
that build `InsertImageExif`:

- `process_new_files` (file watcher) — every newly-registered file gets
  the resolver's verdict so videos and EXIF-stripped images land with a
  real date instead of NULL.
- Upload handler — same waterfall on the post-multipart-write path.
- GPS-write handler — re-runs the waterfall after exiftool writes GPS
  and re-reads the EXIF, in case a previously fs_time-sourced row now
  has a real EXIF date to upgrade to.

This is a behavior change vs. the pre-rewrite `/memories` request-time
priority: EXIF now beats filename when both are present. A photo
named `Screenshot_2014-06-01.png` whose EXIF `DateTime` is 2021 now
appears under 2021. The reverse case (no EXIF, parseable filename) is
unchanged and continues to surface the filename date with
`date_taken_source = 'filename'`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 16:00:14 -04:00
Cameron Cordes 79e258eccd date_resolver: canonical date_taken waterfall with exiftool fallback
New module that consolidates the four-step ingest waterfall:
kamadak-exif (already in process via the caller's prior result) →
exiftool fallback → filename regex → earliest_fs_time. Each step is
tagged with a `DateSource` so the caller can persist provenance.

The exiftool fallback is what makes videos and MakerNote-hosted dates
land at all — kamadak-exif can't read QuickTime/MP4 or Nikon-style
sub-IFDs. Single-file mode shells out per call; batch mode pipes paths
on stdin via `-@ -` and fans the result through one subprocess so the
upcoming per-tick drain doesn't pay startup cost per row. The
`exiftool` PATH check is cached in a `OnceLock` to keep the drain
short-circuited on deploys without exiftool installed.

`SubSecDateTimeOriginal` and `ContentCreateDate` are pulled alongside
the standard tags to capture iPhone's sub-second precision and Apple's
preferred capture-time tag respectively. `FileModifyDate` is
deliberately *not* in the tag list — it's a filesystem-derived value
the resolver already covers via the `fs_time` step, and pulling it
through exiftool would mask "no real EXIF date" with a misleading
`source = exiftool` row.

Module is registered in both `lib.rs` and `main.rs` (sibling-module
pattern the rest of the bin uses); no callers wired in yet — that
lands in the next commit. Comes with 9 unit tests covering JSON
parsing edge cases, source-priority short-circuiting, and the
fs_time-when-no-exif path.
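
The four-step priority can be sketched as a lazy `or_else` chain. This is an illustration only: the signature is hypothetical, and the closures stand in for the exiftool call and the filename regex so later steps are skipped once an earlier one succeeds.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DateSource {
    Exif,
    Exiftool,
    Filename,
    FsTime,
}

/// Resolve a canonical date, tagging which step produced it.
fn resolve_date_taken(
    exif: Option<i64>,
    exiftool: impl FnOnce() -> Option<i64>,
    filename: impl FnOnce() -> Option<i64>,
    earliest_fs_time: i64,
) -> (i64, DateSource) {
    exif.map(|t| (t, DateSource::Exif))
        // Each fallback runs only if every earlier step returned None.
        .or_else(|| exiftool().map(|t| (t, DateSource::Exiftool)))
        .or_else(|| filename().map(|t| (t, DateSource::Filename)))
        .unwrap_or((earliest_fs_time, DateSource::FsTime))
}
```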

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 15:59:02 -04:00
Cameron Cordes 84326501a9 image_exif: add date_taken_source column
New nullable TEXT column tracks which step of the canonical-date
waterfall (kamadak-exif → exiftool → filename → fs_time) populated
`date_taken`. Lets a later per-tick drain re-resolve weak sources
(`fs_time`) once stronger ones become available, and gives the UI/debug
surface a way to answer "why does this photo show up under this date?".

Adds the column at all `InsertImageExif` construction sites with `None`
placeholders (the resolver wiring lands in a follow-up commit), and
extends the `update_exif` SET tuple so the column survives the GPS-write
re-read path. Partial index `idx_image_exif_date_backfill` is created
for the upcoming drain query.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 15:57:49 -04:00
cameron 5de9a322ac Merge pull request 'duplicates: folder-pair view of exact dups' (#75) from feature/folder-pair-duplicates into master
Reviewed-on: #75
2026-05-06 18:27:12 +00:00
Cameron Cordes 67cf0c7f73 duplicates: folder-pair view of exact dups
Bucket exact-dup rows by (library_id, dirname) pair on each side, then
filter by coverage = shared / min(folder_a_total, folder_b_total) and
an absolute floor on shared count. Surfaces "this folder is mostly
contained in that folder" matches that the per-file EXACT view buries
under one row each — e.g. an old phone-backup tree shadowing the
organized library, or a topic-grouped folder duplicating a date-grouped
one within the same library.

New endpoint: GET /duplicates/folder-pairs?library=&include_resolved=
&min_coverage=&min_shared=. Cached 5 min keyed on (library, include_resolved);
the user-tunable thresholds filter the cached unfiltered pair list so
slider drags don't re-bucket. Shares the resolve / unresolve flow with
the existing tabs — the frontend fans out N parallel /resolve calls,
one per shared content_hash.

Folder names carry no signal (BMW lives under Night Photos, not BMW_backup),
so bucketing is purely on (library_id, dirname) co-occurrence in
exact-dup groups. Within-folder dups (same hash twice in the same
folder) are skipped — those belong to the EXACT tab.
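
The coverage formula and the two thresholds can be written out as a small sketch. Function names are hypothetical; the parameters mirror `min_coverage` and `min_shared` from the endpoint's query string.

```rust
/// coverage = shared / min(folder_a_total, folder_b_total)
fn coverage(shared: usize, folder_a_total: usize, folder_b_total: usize) -> f64 {
    let smaller = folder_a_total.min(folder_b_total);
    if smaller == 0 {
        0.0
    } else {
        shared as f64 / smaller as f64
    }
}

/// A pair surfaces only if it clears both the ratio and the absolute floor.
fn passes_thresholds(
    shared: usize,
    folder_a_total: usize,
    folder_b_total: usize,
    min_coverage: f64,
    min_shared: usize,
) -> bool {
    shared >= min_shared && coverage(shared, folder_a_total, folder_b_total) >= min_coverage
}
```

Dividing by the smaller folder is what lets a 50-file backup folder that is 90% contained in a 4000-file folder score 0.9 rather than about 0.01.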

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:43:29 -04:00
cameron 9ccb48233f Merge pull request 'exif: preserve filesystem mtime on GPS write' (#74) from fix/exif-preserve-mtime into master
Reviewed-on: #74
2026-05-04 20:12:08 +00:00
Cameron Cordes 1ddbca3413 exif: preserve filesystem mtime on GPS write
Pass -P to exiftool so write_gps doesn't bump the file's modification
time. For phone photos with no embedded EXIF datetime, the filesystem
mtime is often the only timestamp we have — losing it on every GPS
backfill would be data loss.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 16:09:21 -04:00
cameron 82dd21b205 Merge pull request 'feature/duplicate-detection' (#73) from feature/duplicate-detection into master
Reviewed-on: #73
2026-05-03 22:34:49 +00:00
Cameron Cordes 57b7bad086 duplicates: library-aware visibility — only hide a demoted row when its survivor is reachable
Soft-marked rows used to disappear from /photos globally, including
from a library-scoped view that didn't contain the survivor at all.
A user browsing lib A who'd promoted a file from lib B as the
survivor would silently lose visibility on their own copy in lib A,
even though lib B's file isn't reachable from lib A's view.

Library-scoped queries now keep a demoted row visible when its
survivor lives in a library outside the current scope. Implemented
as a NOT EXISTS subquery against the same image_exif table aliased
as `survivor`. The unscoped (all-libraries) view is unchanged — every
survivor is reachable, so demoted rows stay hidden as before.
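
An in-memory model of the visibility rule, as a rough sketch. The real implementation is a NOT EXISTS subquery in SQL; `PhotoRow` and `is_visible` are hypothetical, and treating a survivor as "a non-demoted row carrying the target hash" is an assumption.

```rust
struct PhotoRow {
    library_id: i32,
    content_hash: String,
    duplicate_of_hash: Option<String>,
}

/// `scope` is None for the all-libraries view, Some(id) for a scoped view.
fn is_visible(row: &PhotoRow, all_rows: &[PhotoRow], scope: Option<i32>) -> bool {
    let survivor_hash = match &row.duplicate_of_hash {
        None => return true, // not demoted, always shown
        Some(hash) => hash,
    };
    match scope {
        // Unscoped view: every survivor is reachable, so demoted rows hide.
        None => false,
        // Scoped view: hide only if a survivor exists inside this library.
        Some(lib) => !all_rows.iter().any(|s| {
            s.library_id == lib
                && &s.content_hash == survivor_hash
                && s.duplicate_of_hash.is_none()
        }),
    }
}
```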

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:24:07 -04:00
Cameron Cordes 98057c98a1 duplicates: tighten perceptual cluster — entropy band, asymmetric dHash, medoid prune
Three changes against "still too loose at lowest sensitivity":

- Popcount entropy band tightened from [8, 56] to [16, 48]. The wider
  band let too much low-frequency content through (skies, scans,
  faded film) where pHash collapses to near-uniform values that sit
  within trivial Hamming distance of hundreds of unrelated images.
- dHash check now uses an asymmetric stricter threshold
  (dhash_threshold = max(2, threshold/2)). pHash is the candidate-
  discovery signal; dHash is validation. Splitting the budget means
  a real near-dup survives both while incidental pHash collisions
  on uniform content get vetoed. Missing dHash on either side now
  rejects the edge (was: trust pHash alone).
- Single-link union-find can chain weakly-similar images via
  transitive edges. Added a medoid-validation pass: per cluster,
  pick the member with smallest summed distance to others, then
  drop any whose distance to it exceeds threshold. Two new tests
  pin both invariants.
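
The three rules above can be sketched over 64-bit hashes. This is not the project's clustering code: function names are hypothetical, the comparisons are assumed to be inclusive (`<=`), and applying the entropy band to pHash only is an assumption.

```rust
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Popcount band [16, 48]: reject near-uniform hashes.
fn in_entropy_band(hash: u64) -> bool {
    (16..=48).contains(&hash.count_ones())
}

/// dhash_threshold = max(2, threshold / 2)
fn dhash_threshold(threshold: u32) -> u32 {
    (threshold / 2).max(2)
}

/// pHash proposes the edge, dHash validates it; a missing dHash rejects.
fn edge_ok(phash: (u64, u64), dhash: (Option<u64>, Option<u64>), threshold: u32) -> bool {
    if !in_entropy_band(phash.0) || !in_entropy_band(phash.1) {
        return false;
    }
    if hamming(phash.0, phash.1) > threshold {
        return false;
    }
    match dhash {
        (Some(a), Some(b)) => hamming(a, b) <= dhash_threshold(threshold),
        _ => false,
    }
}

/// Keep only members within `threshold` of the medoid (the member with the
/// smallest summed distance to the others; first one wins on ties).
fn prune_to_medoid(cluster: &[u64], threshold: u32) -> Vec<u64> {
    let medoid = match cluster
        .iter()
        .copied()
        .min_by_key(|&c| cluster.iter().map(|&o| hamming(c, o)).sum::<u32>())
    {
        Some(m) => m,
        None => return Vec::new(),
    };
    cluster
        .iter()
        .copied()
        .filter(|&member| hamming(member, medoid) <= threshold)
        .collect()
}
```

The prune step is what breaks single-link chains: a member reachable only through a run of weak links ends up too far from the medoid and is dropped.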

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:19:48 -04:00
Cameron Cordes 7ca888e95d duplicates: filter low-entropy hashes + dHash double-check, fix backfill loop
The perceptual cluster was producing one giant first group that
contained hundreds of unrelated images. Two causes:
- Solid-colour images (skies, black frames, monochrome scans) all
  hash to near-zero pHashes that Hamming-distance-zero to each other.
- Single-link clustering on pHash alone is too permissive — a chain
  of weakly-similar images all collapses into one cluster.

Fixed by skipping hashes outside the popcount [8, 56] band (uniform
content) and requiring dHash agreement within threshold before
unioning a candidate edge from the BK-tree. Two new tests pin both
invariants.

Separate fix in the backfill bin: decode-failed rows kept phash_64=NULL
and got re-pulled by every batch, infinite-looping on a queue of
undecodable formats. Persist a 0/0 sentinel on decode failure so
the row leaves the candidate set; the all-zero hash is excluded
from clustering by the same entropy filter so it doesn't pollute
results.
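
A sketch of why the sentinel ends the loop, assuming the candidate set is "rows whose pHash is still NULL". `HashRow`, `backfill_batch`, and `clusterable` are hypothetical, and the hashes are modelled as `u64` regardless of how the columns are stored.

```rust
struct HashRow {
    rel_path: String,
    phash_64: Option<u64>,
    dhash_64: Option<u64>,
}

/// Process every row still missing a hash; returns how many were touched.
fn backfill_batch(rows: &mut [HashRow], decode: impl Fn(&str) -> Option<(u64, u64)>) -> usize {
    let mut processed = 0;
    for row in rows.iter_mut().filter(|r| r.phash_64.is_none()) {
        // On decode failure persist the 0/0 sentinel so the row leaves the queue.
        let (phash, dhash) = decode(&row.rel_path).unwrap_or((0, 0));
        row.phash_64 = Some(phash);
        row.dhash_64 = Some(dhash);
        processed += 1;
    }
    processed
}

/// Popcount band [8, 56]; the all-zero sentinel has popcount 0 and is excluded.
fn clusterable(phash: u64) -> bool {
    (8..=56).contains(&phash.count_ones())
}
```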

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:08:05 -04:00
Cameron Cordes 7584cd8792 duplicates: perceptual hash + soft-mark resolution + upload 409
Adds pHash + dHash columns alongside the existing blake3 content_hash so
near-duplicates (re-encoded, resized, format-converted copies) become
queryable. /duplicates/{exact,perceptual} return groups; /duplicates/
{resolve,unresolve} flip a duplicate_of_hash soft-mark on losing rows
and union perceptual-only tag sets onto the survivor. The default
/photos listing filters duplicate_of_hash IS NULL so demoted siblings
stop cluttering the grid; include_duplicates=true opts back in for
Apollo's review modal. Upload now hashes bytes pre-write and returns
409 with the canonical sibling when a file's bytes already exist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:36:01 -04:00
cameron 4340b164eb Merge pull request 'perf/faces-embeddings-no-clone' (#72) from perf/faces-embeddings-no-clone into master
Reviewed-on: #72
2026-05-01 23:09:22 +00:00
Cameron Cordes fb4df4b195 style: cargo fmt sweep
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:01:00 -04:00
Cameron Cordes 1d9b9a0bc4 faces: avoid 40 MB row clone in /faces/embeddings
list_embeddings cloned the full FaceDetectionRow inside the filter_map
just to pair it with the base64-encoded embedding. The 2 KB BLOB was
already on the row — at 20k unassigned faces that's 40 MB of pointless
heap traffic per Apollo cluster-suggest run. Move the bytes out via
Option::take() so the row drops the BLOB instead of duplicating it.
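
The move-instead-of-clone idiom, as a minimal sketch. `FaceRow` is a hypothetical two-field stand-in for `FaceDetectionRow`, and the base64 encoding step is omitted.

```rust
struct FaceRow {
    id: i64,
    embedding: Option<Vec<u8>>,
}

/// Move each BLOB out of its row; rows without an embedding are skipped.
fn take_embeddings(rows: &mut [FaceRow]) -> Vec<(i64, Vec<u8>)> {
    rows.iter_mut()
        // take() leaves None behind, so the bytes are moved, never copied.
        .filter_map(|row| row.embedding.take().map(|bytes| (row.id, bytes)))
        .collect()
}
```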

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:00:55 -04:00
cameron 7998a0c9b0 Merge pull request 'feature/per-library-excluded-dirs' (#71) from feature/per-library-excluded-dirs into master
Reviewed-on: #71
2026-05-01 20:11:10 +00:00
Cameron Cordes 58f010f302 docs(claude): pin excluded_dirs entry-form syntax
The two entry shapes for libraries.excluded_dirs / EXCLUDED_DIRS
are not symmetric:
  - /sub/path → multi-segment, library-root-anchored, recursive
  - name     → single component anywhere in the tree

Without this pinned, a reasonable read of the column doc would be
"any path-like string works" — but a multi-segment string without a
leading slash silently never matches (the no-slash form scans path
components for exact string equality, and components are
slash-free).

No code change; just documentation.
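
For illustration, the asymmetry between the two entry forms can be modelled like this. `entry_matches` is a hypothetical helper that works on library-relative paths; it is not the project's `PathExcluder`.

```rust
/// `rel_path` is relative to the library root, e.g. "archive/old/a.jpg".
fn entry_matches(rel_path: &str, entry: &str) -> bool {
    let rel = rel_path.trim_matches('/');
    if let Some(anchored) = entry.strip_prefix('/') {
        // "/sub/path": anchored at the library root, matches recursively.
        let anchored = anchored.trim_end_matches('/');
        rel == anchored
            || rel
                .strip_prefix(anchored)
                .map_or(false, |rest| rest.starts_with('/'))
    } else {
        // "name": exact equality against any single path component.
        rel.split('/').any(|component| component == entry)
    }
}
```

The last assertion in the checks for this sketch is the trap described above: `archive/old` without a leading slash is compared against individual components, which never contain a slash, so it never matches.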

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 20:05:58 +00:00
Cameron Cordes 814066551e multi-library: per-library excluded_dirs
Adds a nullable comma-separated TEXT column to the libraries table.
Effective excludes for a walk = (env-var globals) ∪
(library.excluded_dirs). Empty / NULL = no library-specific
extras; the global env var still applies.

Migration (2026-05-01-110000_libraries_excluded_dirs)

  ALTER TABLE libraries ADD COLUMN excluded_dirs TEXT. NULL on every
  existing row — no behavior change on upgrade.

Library struct + helpers (libraries.rs)

  - Library gains excluded_dirs: Vec<String>, parsed from the column
    by parse_excluded_dirs_column (drops empties / whitespace,
    matches the env-var parser).
  - Library::effective_excluded_dirs(globals) returns the union.
  - From<LibraryRow> hydrates the field on AppState construction so
    /libraries surfaces it.
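
  A sketch of the two helpers, written as free functions for brevity (the commit describes `effective_excluded_dirs` as a method on `Library`). The de-duplicating, order-preserving union is an assumption about how the union is built.

```rust
/// Parse the nullable comma-separated column, dropping empty entries.
fn parse_excluded_dirs_column(raw: Option<&str>) -> Vec<String> {
    raw.unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Effective excludes = env-var globals plus library-specific extras.
fn effective_excluded_dirs(globals: &[String], library: &[String]) -> Vec<String> {
    let mut out: Vec<String> = globals.to_vec();
    for dir in library {
        if !out.contains(dir) {
            out.push(dir.clone());
        }
    }
    out
}
```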

Watcher / walkers / memories

  Every per-library walker now consults the effective set:
    - process_new_files (file-watch ingest, RAW/EXIF/face)
    - process_face_backlog (filter_excluded inherits)
    - create_thumbnails (startup + new-file branch)
    - update_media_counts (Prometheus gauge)
    - cleanup_orphaned_playlists (per-library source-existence check)
    - memories endpoint (PathExcluder)

  Effective set is computed once per per-library iteration in the
  watcher tick and threaded through; called functions retain their
  flat &[String] signature (no per-library awareness needed inside
  the walker primitives).

Use case: mount a parent directory while a sibling library covers
a child subtree, and exclude the child subtree from the parent so
the libraries don't double-walk / double-write image_exif. With
hash-keyed derived data (Branches B/C), avoiding that duplicate work
is the only gain; face / tag / insight sharing was
already correct via content_hash.

Tests: 228 pass (226 from previous + 2 new in libraries::tests:
parse_excluded_dirs_column edge cases,
effective_excluded_dirs_unions_global_and_per_library).

CLAUDE.md gains a "Per-library excludes" subsection of the
multi-library data model.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:54:17 +00:00
cameron 4f17af688e Merge pull request 'multi-library: operator kill switch via libraries.enabled' (#70) from feature/library-enabled-flag into master
Reviewed-on: #70
2026-05-01 19:15:20 +00:00
Cameron Cordes 3598bb2cfe multi-library: operator kill switch via libraries.enabled
A small follow-up to Branches A/B/C. Adds a NOT NULL, default-1
boolean column to the `libraries` table that controls whether the
watcher considers the library at all. Useful for staging a new
mount before committing to ingest, and as a maintenance kill
switch when a library needs to be quiet without being unmounted.

Migration (2026-05-01-100000_libraries_enabled_flag)

  ALTER TABLE libraries ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT 1.
  Existing rows stay enabled — no behavior change on upgrade.

Watcher gate (main.rs)

  At the top of the per-library loop, if !lib.enabled { continue; }
  — runs BEFORE the availability probe. Disabled libraries don't
  enter the health map, don't get probed, don't get ingest, don't
  get any maintenance pass. The initial sweep before the loop's
  first sleep also skips disabled libraries.

Orphan-GC consensus (library_maintenance.rs)

  all_libraries_online filters disabled libraries out of the
  consensus check — they're treated as out-of-scope, not as
  blockers. Otherwise flipping enabled=false would permanently
  halt orphan GC for the rest of the system, which is the opposite
  of the intended kill-switch semantics.
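
A minimal sketch of the consensus filter described above. The types
are simplified stand-ins (the real LibraryHealth also carries a
`since` field), and treating a missing health entry as "not online"
is an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum LibraryHealth {
    Online,
    Stale { reason: String },
}

// Hypothetical, trimmed-down library row for illustration.
struct Library {
    id: i32,
    enabled: bool,
}

/// True when every *enabled* library is Online. Disabled libraries are
/// out of scope: they neither help nor block the consensus.
fn all_libraries_online(libs: &[Library], health: &HashMap<i32, LibraryHealth>) -> bool {
    libs.iter()
        .filter(|lib| lib.enabled)
        // ASSUMPTION: an enabled library with no health entry counts as not online.
        .all(|lib| matches!(health.get(&lib.id), Some(LibraryHealth::Online)))
}
```

With this shape, an explicit Stale entry on a disabled library is
never consulted, which is the property the new test pins.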

Cross-library duplicates: safe by construction. Hash-keyed derived
data (face_detections, tagged_photo with hash, photo_insights with
hash) is anchored by ANY image_exif row carrying the hash. Disabling
a library does NOT delete its image_exif rows, so a hash referenced
by a disabled library's row stays anchored — derived data survives.
collect_orphan_hashes deliberately doesn't filter image_exif by
library.enabled for exactly this reason.

No HTTP endpoint. Library mutation is rare enough infra work that a
SQL toggle is fine, and a public mutation endpoint without a role /
permission story would be needless exposure for a single-user tool.
Documented in CLAUDE.md.

Tests: 226 pass (225 from Branch C + 1 new
all_libraries_online_treats_disabled_as_out_of_scope, which proves
that even an explicit Stale entry on a disabled library doesn't
block the consensus).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:10:24 +00:00
cameron 23448cf5e6 Merge pull request 'feature/library-handoff-and-gc' (#69) from feature/library-handoff-and-gc into master
Reviewed-on: #69
2026-05-01 18:27:40 +00:00
Cameron Cordes d809ddee44 library_maintenance: clarify orphan-gc log wording
"marked 2 new" parses as "2 new files" on first read — but the
unit is content_hashes, and the action is observing them as
orphaned (becoming-deleted, not appearing). Reword:

  "{} new orphan hash(es) marked, {} revived"

instead of "marked {} new, revived {}". Also pluralize the deleted
counts ("row(s)") and append the pending-set size to the success
log so a tick that both deletes and re-marks doesn't lose the
trailing-state context.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 18:01:01 +00:00
Cameron Cordes fa98d147be library_maintenance: log orphan-gc decisions in stale-library path too
run_orphan_gc returned early on the !all_online branch before the
final debug/info log line, so the GC was effectively invisible
whenever any library was Stale — exactly the dry-run scenario where
operators most want to confirm the safety gate is firing. Add the
same conditional log inside the early-return branch (plus a
"deferred — at least one library Stale" hint in the info-level
variant when there's something newly marked).

No behavior change beyond observability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:14:09 +00:00
Cameron Cordes 5f247be1f1 docs(claude): note in-place edit gap as future Branch D
The maintenance pipeline added in Branch C assumes (library_id,
rel_path) bytes are stable for as long as the file lives at that
path. In-place edits (crop, re-export to same name) bypass
process_new_files's already-indexed check, so the row's
content_hash stays pinned to the original bytes — tags / faces /
insights remain attached to that hash silently.

Document the gap and the proposed shape of the fix:
  - Stale-content detection pass: compare last_modified / size_bytes
    to fs::metadata, re-hash on mismatch, update image_exif.
  - "Content branched" semantics on hash change: faces re-run, tags
    migrate forward (user intent survives a crop), insights migrate
    + flag for re-generation, favorites follow path.
  - Apollo derived.db cache invalidation belongs in the same design
    cycle, not after.

Captured here so the design intent is clear before someone hits the
case in real life. No code change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:53:08 +00:00
Cameron Cordes 263e27e108 multi-library: handoff + orphan GC with two-tick consensus
Branch C of the multi-library data-model rollout. Implements the
operational maintenance pipeline pinned in CLAUDE.md → "Multi-library
data model" / "Library availability and safety". Branches A and B
land first; this branch builds on top.

New module: src/library_maintenance.rs

Three idempotent passes the watcher runs every tick after the
per-library ingest loop:

1. Missing-file scan (per online library)

   For each Online library, load a paginated page of image_exif rows
   (IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE, default 500), stat() each one,
   and delete rows whose source file is NotFound. Permission/IO
   errors are skipped, never deleted. Capped at
   IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK (default 200) per library
   per tick — so a pathological mount that returns NotFound for
   everything can't wipe the table in one cycle. Cursor advances
   across ticks, wraps on partial-page returns, and naturally cycles
   through the entire library over many minutes. Skipped wholesale
   for Stale libraries via the existing probe gate.

2. Back-ref refresh (DB-only)

   For face_detections / tagged_photo / photo_insights: any
   hash-keyed row whose (library_id, rel_path) no longer matches an
   image_exif row, but whose content_hash does, is repointed at a
   surviving image_exif location. Pure SQL with EXISTS guards so
   rows whose hash is fully orphaned are left alone (the orphan GC
   handles those). Idempotent; no availability gate needed.

   This is what makes a recent → archive move invisible to readers:
   when pass 1 retires the lib-A row, pass 2 pivots tags / faces /
   insights to lib-B's surviving path before any client notices.

3. Orphan GC (destructive)

   Hash-keyed derived rows whose content_hash has no image_exif
   referent are GC-eligible. Two-tick consensus: a hash must be
   observed orphaned on two consecutive ticks AND every library must
   be Online for both. A single Stale tick within the window cancels
   all pending deletes (they remain marked but won't be promoted) —
   they're re-evaluated next tick. The pending set lives in
   OrphanGcState (in-memory); a watcher restart resets it, which can
   only delay a delete, never cause one. Hashes that re-appear in
   image_exif between ticks are "revived" from the pending set
   (handles transient share unmount / remount).
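
The two-tick consensus in pass 3 can be sketched as an in-memory
state machine. This is an illustration under assumptions, not the
module's actual code: OrphanGcState is named in the text, but its
fields, the `prev_all_online` flag, and the GcTick result type are
hypothetical.

```rust
use std::collections::HashSet;

#[derive(Default)]
struct OrphanGcState {
    pending: HashSet<String>, // hashes observed orphaned on an earlier tick
    prev_all_online: bool,    // ASSUMPTION: tracks the previous tick's availability
}

#[derive(Debug, Default, PartialEq)]
struct GcTick {
    newly_marked: usize,
    revived: usize,
    deleted: Vec<String>,
}

fn run_orphan_gc_tick(
    state: &mut OrphanGcState,
    orphaned_now: &HashSet<String>,
    all_online: bool,
) -> GcTick {
    let mut out = GcTick::default();

    // Revive: pending hashes that re-appeared in image_exif leave the set.
    let before = state.pending.len();
    state.pending.retain(|h| orphaned_now.contains(h));
    out.revived = before - state.pending.len();

    // Promote only when this tick AND the previous one were fully online.
    if all_online && state.prev_all_online {
        out.deleted = state.pending.drain().collect();
        out.deleted.sort();
    }

    // Mark hashes seen orphaned for the first time.
    for h in orphaned_now {
        if !out.deleted.contains(h) && state.pending.insert(h.clone()) {
            out.newly_marked += 1;
        }
    }

    state.prev_all_online = all_online;
    out
}
```

A fresh state starts with an empty pending set, so a restart can only
postpone a delete, matching the safety argument above.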

New ExifDao method:
  - list_rel_paths_for_library_page(library_id, limit, offset) for
    the paginated missing-file scan.
  - (count_for_library landed in Branch A.)
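
A rough sketch of one page of the missing-file scan (pass 1), assuming
the page of rel_paths has already been loaded through the paginated
DAO call. The function name, the ScanOutcome type, and the exact
cursor arithmetic are hypothetical.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

struct ScanOutcome {
    to_delete: Vec<String>, // rel_paths whose source file is NotFound
    next_offset: usize,     // cursor for the next tick
}

fn missing_file_scan_page(
    root: &Path,
    page: &[String],
    offset: usize,
    page_size: usize,
    delete_cap: usize,
) -> ScanOutcome {
    let mut to_delete = Vec::new();
    for rel in page {
        if to_delete.len() >= delete_cap {
            break; // per-tick cap: a bad mount cannot wipe the table in one cycle
        }
        match fs::metadata(root.join(rel)) {
            Ok(_) => {}
            Err(e) if e.kind() == ErrorKind::NotFound => to_delete.push(rel.clone()),
            Err(_) => {} // permission / IO errors are skipped, never deleted
        }
    }
    // A partial page means the end of the table was reached, so wrap.
    // ASSUMPTION: deleted rows shift later rows down, so they are subtracted.
    let next_offset = if page.len() < page_size {
        0
    } else {
        offset + page.len() - to_delete.len()
    };
    ScanOutcome { to_delete, next_offset }
}
```

Rows left unchecked because the cap was hit are simply revisited once
the cursor wraps around.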

Watcher wiring (main.rs)

Per-library: missing-file scan inside the existing per-library
loop, after process_new_files, gated by the same probe check that
already protects ingest. After the loop: reconcile (Branch B),
back-ref refresh, then run_orphan_gc. The maintenance connection is
opened once per tick (image_api::database::connect), used by all
three DB-only passes, and dropped at end of tick.

CLAUDE.md gains a "Maintenance pipeline" subsection that describes
the three passes and their interaction with the existing
availability-and-safety policy.

Tests: 225 pass (217 from Branch B + 8 new in library_maintenance
covering back-ref refresh including the fully-orphaned no-op case,
two-tick GC consensus, Stale-tick consensus reset, image_exif
re-appearance revival, multi-table delete, and the
all_libraries_online helper).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:27:53 +00:00
cameron a0283a6362 Merge pull request 'multi-library: hash-keyed tagged_photo + photo_insights with reconciliation' (#68) from feature/hash-keyed-derived-data into master
Reviewed-on: #68
2026-05-01 16:16:38 +00:00
Cameron Cordes 48cac8c285 multi-library: hash-keyed tagged_photo + photo_insights with reconciliation
Branch B of the multi-library data-model rollout. tagged_photo and
photo_insights now follow the bytes (content_hash), not the path,
matching the policy pinned in CLAUDE.md "Multi-library data model".
Branch A's availability probe and EXIF scoping land first; this
branch builds on top.

Migration (2026-05-01-000000_hash_keyed_derived_data)

  Adds nullable content_hash columns to tagged_photo and photo_insights,
  with partial indexes on the non-null subset to keep the index small
  during the transitional window. The migration backfills from
  image_exif:
    * tagged_photo joins on rel_path alone (no library_id available);
    * photo_insights joins on (library_id, rel_path), unambiguous.
  Rows whose image_exif hash isn't known yet stay null and the runtime
  reconciliation pass populates them as the hash backlog drains.

Insert-time population

  TagDao::tag_file looks up image_exif.content_hash by rel_path before
  inserting; the hash is written into the new column.
  InsightDao::store_insight does the same scoped to (library_id,
  rel_path). Caller-supplied hash on InsertPhotoInsight wins; otherwise
  the DAO does the lookup. Both paths fall back to None if the hash
  isn't known yet — reconciliation backfills.

Reconciliation (database/reconcile.rs)

  Three idempotent passes the watcher runs once per tick after the
  per-library backfill loop:
    1. tagged_photo NULL hashes → populate from image_exif by rel_path.
    2. photo_insights NULL hashes → populate by (library_id, rel_path).
    3. photo_insights scalar merge — when multiple is_current rows
       share a content_hash, keep the earliest generated_at as
       current; demote the rest. Demoted rows keep their data so
       /insights/history is unaffected; only the "current" pointer
       narrows to one per hash.

  No filesystem dependency, so reconcile doesn't need the availability
  gate; runs every tick. Logs once when something changed, debug
  otherwise.

  Tags are set-valued under the policy (union on read, already
  DISTINCT in queries), so there is no analogous tag-collapse pass —
  duplicate (tag_id, content_hash) rows across libraries are
  harmless.
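
  The earliest-wins scalar merge (pass 3) can be illustrated in memory.
  This is a sketch only: the real pass is described as running against
  the database, and the InsightRow struct, the integer timestamps, and
  the id tie-break are assumptions.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct InsightRow {
    id: i64,
    content_hash: Option<String>,
    generated_at: i64, // ASSUMPTION: a sortable timestamp
    is_current: bool,
}

/// Keep the earliest current row per content_hash and demote the rest.
/// Demoted rows keep their data. Returns the number of rows demoted.
fn collapse_current_insights(rows: &mut [InsightRow]) -> usize {
    // hash -> (generated_at, id) of the winning row; id breaks ties.
    let mut winners: HashMap<String, (i64, i64)> = HashMap::new();
    for r in rows.iter().filter(|r| r.is_current) {
        if let Some(h) = &r.content_hash {
            let cand = (r.generated_at, r.id);
            winners
                .entry(h.clone())
                .and_modify(|w| {
                    if cand < *w {
                        *w = cand;
                    }
                })
                .or_insert(cand);
        }
    }
    let mut demoted = 0;
    for r in rows.iter_mut().filter(|r| r.is_current) {
        if let Some(h) = &r.content_hash {
            if winners[h] != (r.generated_at, r.id) {
                r.is_current = false;
                demoted += 1;
            }
        }
    }
    demoted
}
```

  Running it a second time demotes nothing, which is the idempotency
  the pass relies on. Rows without a hash are left untouched.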

Read paths are unchanged in this branch — lookup_tags_batch's
existing rel_path-via-hash-sibling expansion still produces the
correct merge. A follow-up can simplify reads to use the new column
directly for performance.

Tests: 217 pass (212 pre-existing + 5 new in reconcile covering
NULL-fill, hash-not-yet-known no-op, library scoping on insights,
earliest-wins collapse, idempotency).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:52:16 +00:00
cameron cce8f0c1b7 Merge pull request 'feature/multi-library-data-model' (#67) from feature/multi-library-data-model into master
Reviewed-on: #67
2026-05-01 14:40:16 +00:00
Cameron Cordes 48ed7be5d9 libraries: initial availability sweep before watcher's first sleep
new_health_map seeds every library as Online, and the watcher's tick
loop sleeps WATCH_QUICK_INTERVAL_SECONDS (default 60s) before its
first probe — meaning /libraries reported the optimistic default for
up to a minute after boot, even when a share was clearly unmounted.

Run the same refresh_health pass once at the top of the watcher
thread before entering the sleep loop. /libraries is then truthful
within milliseconds of the watcher thread starting (effectively from
the first HTTP request, since the watcher spawns well before the
server binds).

The per-tick gate inside the loop is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:33:45 +00:00
Cameron Cordes eea1bf3181 multi-library: availability probe + scoped EXIF queries + collision fixes
Branch A of the multi-library data-model rollout. Three threads of
correctness/safety work that ship together because the new mount
needs all three before it can land:

1. Library availability probe (libraries.rs, state.rs, main.rs)

   New LibraryHealth (Online | Stale { reason, since }) and a shared
   LibraryHealthMap on AppState. Probe checks root_path exists +
   is_dir + readable + non-empty (relative to a "had_data" signal so
   fresh mounts aren't downgraded). The watcher tick begins with a
   refresh_health() per library; stale libraries skip ingest, the
   hash backfill, and face-detection backlog drains for that tick.
   The orphaned-playlist cleanup also gates on every library being
   online — a missing source on a stale library is indistinguishable
   from a transient unmount, and the cleanup is destructive.

   /libraries now returns each library with its current health
   state. Logs only on Online↔Stale transitions so a long outage
   doesn't spam.

   New ExifDao::count_for_library is the "had_data" signal.

2. EXIF queries scoped by library_id (database/mod.rs, files.rs,
   main.rs, tags.rs)

   query_by_exif gains an Option<i32> library filter; /photos and
   /photos/exif now pass it. Without this, an EXIF-filtered request
   scoped to ?library=N returned cross-library results because the
   handler resolved the library but didn't push it through to SQL.

   get_exif_batch gains the same option. The watcher's per-library
   ingest, face-candidate build, and content-hash backfill all
   scope to their library; the union-mode /photos date-sort path
   and the library-agnostic tag fan-out (lookup_tags_batch, by
   design) keep using None.

3. Derivative-path collision fixes (content_hash.rs, main.rs)

   New content_hash::library_scoped_legacy_path helper:
   <derivative_dir>/<library_id>/<rel_path>. Thumbnail generation
   (startup walk + watcher needs-thumb check) and serving now use
   it; serving falls back to the bare-legacy mirrored path so
   pre-multi-library deployments keep working without
   regeneration. Without this, lib2 with the same rel_path as lib1
   would have its thumbnail request short-circuit to lib1's image.

   Orphaned-playlist cleanup walks every library when checking for
   the source video (was: BASE_PATH only). Without this, mounting
   a 2nd library and waiting 24h would delete every playlist whose
   source lived only in the 2nd library.

   The HLS playlist write path collision (filename-only basename,
   not rel_path) is left as a known issue with a TODO at the call
   site — the actor-pipeline rewrite belongs in Branch B/C.
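
   The library-scoped derivative path and its serving fallback might
   look like the sketch below. library_scoped_legacy_path is the helper
   named above, but its signature here is a guess; resolve_thumbnail
   and the injected `exists` predicate are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// <derivative_dir>/<library_id>/<rel_path>
fn library_scoped_legacy_path(derivative_dir: &Path, library_id: i32, rel_path: &str) -> PathBuf {
    derivative_dir.join(library_id.to_string()).join(rel_path)
}

/// Serving order: library-scoped path first, then the bare-legacy
/// mirrored path so pre-multi-library deployments keep working.
/// `exists` is injected so the lookup can be exercised without a real
/// filesystem.
fn resolve_thumbnail(
    derivative_dir: &Path,
    library_id: i32,
    rel_path: &str,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    let scoped = library_scoped_legacy_path(derivative_dir, library_id, rel_path);
    if exists(&scoped) {
        return Some(scoped);
    }
    let legacy = derivative_dir.join(rel_path);
    if exists(&legacy) {
        Some(legacy)
    } else {
        None
    }
}
```

   Two libraries sharing a rel_path now resolve to different files
   because the library id is part of the scoped path.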

Tests: 212 pass (cargo test --lib). New tests cover the probe
states (online / missing root / non-dir / empty-with-prior-data),
refresh_health transitions, query_by_exif scoping, get_exif_batch
keying on (library_id, rel_path), library_scoped_legacy_path, and
count_for_library.
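
The probe states those tests cover can be sketched with std::fs alone.
This is an approximation under assumptions: the real LibraryHealth
also records `since`, and the function name, reason strings, and exact
check order are hypothetical.

```rust
use std::fs;
use std::path::Path;

#[derive(Debug, PartialEq)]
enum LibraryHealth {
    Online,
    Stale { reason: String },
}

/// `had_data` stands in for the "had_data" signal (for example a
/// non-zero count_for_library): an empty root is only suspicious when
/// the library has previously been indexed.
fn probe_library(root: &Path, had_data: bool) -> LibraryHealth {
    let meta = match fs::metadata(root) {
        Ok(m) => m,
        Err(e) => {
            return LibraryHealth::Stale { reason: format!("root not accessible: {}", e) }
        }
    };
    if !meta.is_dir() {
        return LibraryHealth::Stale { reason: "root is not a directory".to_string() };
    }
    let mut entries = match fs::read_dir(root) {
        Ok(it) => it,
        Err(e) => {
            return LibraryHealth::Stale { reason: format!("root not readable: {}", e) }
        }
    };
    if had_data && entries.next().is_none() {
        return LibraryHealth::Stale {
            reason: "root is empty but the library previously had rows".to_string(),
        };
    }
    LibraryHealth::Online
}
```

A fresh, empty mount with no indexed rows stays Online, so staging a
new library does not trip the gate.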

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:12:49 +00:00
Cameron Cordes 2f91891459 docs(claude): pin multi-library data model + availability/safety policy
Adds a "Multi-library data model" section that classifies each table as
intrinsic-to-bytes (hash-keyed), user-intent-about-a-photo (hash-keyed),
or library-administrative ((library_id, rel_path)). Spells out merge
semantics on read (union for set-valued, earliest-wins for scalar),
write attribution (binds to bytes, not to current library), the
transitional-state rules for hash-less rows, library handoff behavior
on archive moves, and orphan GC.

Adds a "Library availability and safety" subsection: every watcher
tick begins with a presence probe; destructive paths (move-handoff
re-keying, orphan GC) require both/all libraries online and
confirmed-clean for two consecutive ticks. A NAS reboot, USB pull, or
VPN drop must never trigger destruction — the worst case is that
derived-data work pauses until the share returns.

The face_detections table is referenced as the existing reference
implementation of the policy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:11:42 +00:00
cameron 3d162105f7 Merge pull request 'feature/edit-tag' (#66) from feature/edit-tag into master
Reviewed-on: #66
2026-05-01 01:03:40 +00:00
Cameron 98601973f7 faces: log at the three 503 paths in update_face_handler
PATCH /image/faces/{id} can return 503 from three places (face client
disabled, transient embed error, mid-flight disable) and none of them
were logging — operator sees the status code but nothing in the Rust
log explaining why. Add warn! lines at each so future bbox-edit
failures aren't silent. Response body is unchanged so existing clients
keep working.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:57:51 -04:00
Cameron 862917b0d1 gitignore: SQLite WAL runtime + local docs/specs dirs
*.db-shm / *.db-wal show up in the working tree whenever the server
runs (the WAL/journal pragmas in connect()), and /docs and /specs
hold per-feature design notes that stay local per the project's
"spec docs not in git" convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:31:19 -04:00
Cameron 44d677528e tags: add edit + delete endpoints, enable FK enforcement
PUT /image/tags/{id} renames a tag globally; DELETE /image/tags/{id}
removes a tag and every photo's reference. Rename returns 200/404/409
(case-insensitive name conflict) / 400 (empty name); delete returns
204/404. New migration adds a UNIQUE COLLATE NOCASE index on
tags.name with a pre-flight pass that collapses existing case-
insensitive duplicates onto the lowest id.

The connection setup now sets PRAGMA foreign_keys = ON. The schema
already declares ON DELETE CASCADE / SET NULL on several tables —
those clauses were documentation-only because SQLite has FK
enforcement off per-connection by default. Audited every
diesel::delete site; each touches either no inbound FKs or has a
matching policy. delete_tag relies on the tagged_photo cascade
instead of doing manual cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:26:35 -04:00
cameron 89b743ba54 Merge pull request 'faces: count distinct content_hash in stats total_photos' (#65) from face-stats-dedup-hash into master
Reviewed-on: #65
2026-04-30 22:43:58 +00:00
Cameron Cordes 323097c650 faces: count distinct content_hash in stats total_photos
face_detections is keyed on content_hash (one row per unique bytes,
shared across libraries / duplicate paths) but total_photos was
COUNT(*) over image_exif rows. A file present at multiple rel_paths or
across libraries inflated the denominator without inflating the
numerator, leaving a permanent gap (e.g. 1101/1103 with nothing
actually pending detection).

Switch total_photos to COUNT(DISTINCT content_hash) so numerator and
denominator live in the same domain. Exclude rows with NULL
content_hash from the count — they're held in the hash-backfill
backlog, not the detection backlog, and counting them pins the bar
below 100% for the duration of that pass.
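
To illustrate the domain rule, here is an in-memory equivalent of
COUNT(DISTINCT content_hash) with NULLs excluded. The function name
and the slice-of-options input are hypothetical; the real change is a
SQL query.

```rust
use std::collections::HashSet;

/// Count unique non-NULL hashes, mirroring COUNT(DISTINCT content_hash).
/// Rows with no hash yet (None) are excluded from the denominator.
fn total_photos(hashes: &[Option<&str>]) -> usize {
    hashes.iter().flatten().collect::<HashSet<_>>().len()
}
```

For rows `[a, a, b, NULL]` a plain COUNT(*) would report 4, while this
reports 2, the same domain the face_detections numerator lives in.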

CLAUDE.md: document the stats domain rule next to the rest of the
face-detection notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:41:20 +00:00
cameron d0833177c7 Merge pull request 'feature/face-stats-exclude-videos' (#64) from feature/face-stats-exclude-videos into master
Reviewed-on: #64
2026-04-30 21:17:19 +00:00
Cameron Cordes 67abd8d8ff style: cargo fmt
Pre-existing whitespace drift in test bodies, normalized by rustfmt.
No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:16:34 +00:00
Cameron Cordes 0840d55c70 faces: exclude videos from backlog drain and SCANNED denominator
list_unscanned_candidates pulled every hashed image_exif row, including
videos. filter_excluded then dropped them client-side without writing a
marker, so the same set re-appeared every watcher tick — emitting the
"backlog drain — running detection on N candidate(s)" log forever and
producing no progress.

face_stats.total_photos counted the same video rows in the denominator,
so the SCANNED percentage was structurally capped below 100%.

Add an image-extension SQL predicate (case-insensitive, sourced from
file_types::IMAGE_EXTENSIONS) and apply it to both queries. Videos
never enter the candidate set, total_photos counts only what can
actually be scanned, and 100% becomes reachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:16:30 +00:00
cameron dbb046dfa8 Merge pull request 'indexer: prune EXCLUDED_DIRS at WalkDir time, extract enumerate_indexable_files' (#63) from feature/exclude-dirs-at-index-time into master
Reviewed-on: #63
2026-04-30 20:24:18 +00:00
Cameron Cordes f50655fb21 indexer: apply EXCLUDED_DIRS to remaining WalkDir callers
Audit follow-up to 5bf4956. The same `@eaDir` pruning that protects
the indexer also needs to protect the other walks under library roots:

- `create_thumbnails` walks every file in every library to generate
  thumbnails. Without EXCLUDED_DIRS, it would generate thumbnails of
  Synology's `SYNOFILE_THUMB_*.jpg` thumbnails (thumbnails of thumbnails).
- `update_media_counts` walks for the prometheus IMAGE / VIDEO gauges.
  Without EXCLUDED_DIRS, the gauges over-count by however many phantom
  `@eaDir` images live alongside the real photos.
- `cleanup_orphaned_playlists` walks BASE_PATH searching for source
  videos by filename. EXCLUDED_DIRS isn't a behavior change for typical
  Synology mounts (no .mp4 in @eaDir), but it's a correctness win for
  any operator-defined exclude that happens to contain video.

Refactor: add `walk_library_files(base, excluded_dirs) -> Vec<DirEntry>`
to file_scan.rs as the shared primitive. `enumerate_indexable_files`
now layers media-type + mtime filters on top of it. One new test
covers the lower-level helper (returns all extensions, prunes excluded
subtrees).

`generate_video_gifs` (currently `#[allow(dead_code)]`, not reachable
from main) gets the `update_media_counts` signature update and reads
EXCLUDED_DIRS from env so a future revival isn't broken — but its
WalkDir walk stays raw because the dual lib/bin compile makes the
file_scan module path non-trivial there. Tagged with a comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:21:17 +00:00
Cameron Cordes 5bf49568f1 indexer: prune EXCLUDED_DIRS at WalkDir time, extract enumerate_indexable_files
Synology drops `@eaDir/.../SYNOFILE_THUMB_*.jpg` files alongside every
photo. The face-detect pipeline already filters those out via
`face_watch::filter_excluded`, but the filter runs *after* the indexer
has already inserted rows into `image_exif`. Result: phantom rows whose
content_hash never matches a `face_detections` row, so the anti-join in
`list_unscanned_candidates` returns them every tick. They're filtered
out at runtime, no marker is written, and the cycle repeats forever —
log spam, wrong stats denominator, and on a real Synology library the
phantom rows balloon into the hundreds of thousands.

Move the exclusion to the WalkDir pass, where filter_entry can prune
whole subtrees instead of walking and discarding leaves. Extract the
pre-existing 30-line walker chain in main.rs::process_new_files into
`file_scan::enumerate_indexable_files` so it's testable in isolation.
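
The difference between pruning a subtree and discarding its leaves can
be sketched without dependencies. The commit itself uses WalkDir's
filter_entry; the version below substitutes plain std::fs recursion so
it stays self-contained, and matching excludes by exact directory name
is a simplifying assumption (the real patterns also cover nested and
absolute-under-base forms).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect files under `dir`, never descending into excluded directories.
fn walk_pruned(dir: &Path, excluded_names: &[&str], out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if entry.file_type()?.is_dir() {
            let name = entry.file_name();
            if excluded_names.iter().any(|ex| name.to_str() == Some(*ex)) {
                continue; // prune: nothing below this directory is visited
            }
            walk_pruned(&path, excluded_names, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}
```

Because the excluded directory is skipped before recursion, files such
as SYNOFILE_THUMB_*.jpg are never enumerated and cannot reach
image_exif.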

Six tests cover the bug (eadir prune), nested patterns, absolute-under-base
syntax, non-media filtering, modified_since semantics, and forward-slash
rel_path normalization.

Out of scope (other WalkDir callers in main.rs that don't yet apply
EXCLUDED_DIRS — thumbnail gen at 1309, media scan at 1377, video
playlist scan at 1685, and two nested walks at 1709 / 1743): separate
audit PR.

Operator note: existing phantom rows still need a one-shot cleanup —
  DELETE FROM face_detections WHERE content_hash IN (
    SELECT content_hash FROM image_exif WHERE rel_path LIKE '%/@eaDir/%'
  );
  DELETE FROM image_exif WHERE rel_path LIKE '%/@eaDir/%' OR rel_path LIKE '@eaDir/%';
Run before attaching a fresh Synology-sourced library.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 19:29:37 +00:00
cameron f358e83050 Merge pull request 'sqlite: enable WAL + busy_timeout in connect(); 408/413/429 transient' (#62) from feature/sqlite-wal-and-413-transient into master
Reviewed-on: #62
2026-04-30 18:16:38 +00:00
Cameron Cordes db9dc63e5e sqlite: enable WAL + busy_timeout in connect(); 408/413/429 transient
The DB connection helper now sets `journal_mode=WAL`, `busy_timeout=5000`,
and `synchronous=NORMAL` on every connection. 13+ DAOs each open their
own connection through this helper and share one SQLite file — without
WAL, a writer's exclusive lock blocks readers and `load_persons` racing
the face-watch write storm errored instantly with "database is locked".
GPU face inference made this visible by speeding detect ~10× and
flooding the writer side. WAL persists in the file once set so the
debug binaries that bypass connect() inherit it automatically.

Also widen face_client.rs's classifier: 408 / 413 / 429 are now Transient
instead of Permanent. These are operator-fixable proxy/infra errors;
marking them Permanent poisons every affected photo with status='failed'
and requires manual SQL to recover. Specifically, Apollo's nginx
defaulted to a 1 MB body cap and silently rejected normal-size photos
before they reached the backend — the deferred-and-retry contract is
the right behavior for that class of fault.
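
A sketch of the widened classifier, assuming it keys off the HTTP
status code alone. Only the 408 / 413 / 429 arm comes from this
commit; the enum, the function name, and the handling of 5xx and
remaining 4xx codes are assumptions.

```rust
#[derive(Debug, PartialEq)]
enum Classification {
    Ok,
    Transient, // defer and retry on a later tick
    Permanent, // write a status='failed' marker
}

fn classify_status(status: u16) -> Classification {
    match status {
        200..=299 => Classification::Ok,
        // Operator-fixable proxy / infra errors: retry instead of poisoning the row.
        408 | 413 | 429 => Classification::Transient,
        // ASSUMPTION: server-side errors were already treated as transient.
        500..=599 => Classification::Transient,
        // ASSUMPTION: everything else (e.g. 422 decode_failed) stays permanent.
        _ => Classification::Permanent,
    }
}
```

With this mapping, a photo rejected by a proxy body cap (413) is
retried after the operator raises the limit, with no manual SQL
needed.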

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:13:15 +00:00
cameron 9443c91f88 Merge pull request 'Face Recognition / People Integration' (#61) from feature/face-recog-phase3-file-watch into master
Reviewed-on: #61
2026-04-30 17:22:08 +00:00
Cameron Cordes 96c539764c docs: face detection system section + per-tick backlog drain env vars
CLAUDE.md gets an "Important Patterns → Face detection system" entry
covering the schema (why content_hash and not (library_id, rel_path)),
the file-watch hook + per-tick backlog drains, auto-bind on tag-name
match, manual-face create with EXIF orientation handling, and the
rerun-preserves-manual-rows contract. README's face section adds
the two new env vars (FACE_BACKLOG_MAX_PER_TICK and
FACE_HASH_BACKFILL_MAX_PER_TICK) shipped this cycle so operators
know they're tunable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 14:06:42 +00:00
Cameron Cordes 675b4a4849 faces: add .env.example template covering all documented env vars
The face-recognition plan and CLAUDE.md document the full env-var
surface (face detection knobs, Apollo / Ollama / OpenRouter / SMS
integrations, watch intervals, RAG flags), but no example file
existed — operators copying the project to a new deploy had nothing
to start from. Group by section, comment out optional integrations
so a minimal copy boots without external services.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 13:51:45 +00:00
Cameron Cordes 5e1bad3179 faces: filter videos out of detection candidate set
The backlog drain pulls every hashed image_exif row, which includes videos.
Sending them to Apollo just produces 422 decode_failed → status='failed'
markers, burning a round-trip per video and inflating the FAILED stat.

Widen filter_excluded to also drop anything is_image_file rejects. Covers
both call sites (file-watch hook and per-tick backlog drain) without
plumbing a second filter through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 12:45:55 +00:00
Cameron Cordes 1971eeccd6 faces: drain backfill + detection backlog every tick, not just full scans
Symptom: ImageApi restart, then ~60 minutes of silence — no
face_watch lines at all. Cause: backfill + face-detection candidate
build were both gated inside process_new_files, which during quick
scans (every 60s) only walks files modified in the last interval.
The pre-existing unhashed / unscanned backlog never entered the
candidate set, so it only drained on the full-scan path (default
once per hour). Surfaced as "scan stuck at 1101/13118" — most of
those rows were waiting on the next full scan.

Two new per-tick passes that work directly off the DB:

(1) backfill_unhashed_backlog uses ExifDao::get_rows_missing_hash to
    pull unhashed rows in id order, capped (FACE_HASH_BACKFILL_MAX_PER_TICK
    default 2000), and writes content_hash for each. No filesystem
    walk — the walk was the gating filter that hid the backlog.

(2) process_face_backlog uses a new FaceDao::list_unscanned_candidates
    (LEFT-anti-join on content_hash via raw SQL, GROUP BY hash so
    duplicates fire one detect call) to pull a capped batch of
    hashed-but-unscanned rows (FACE_BACKLOG_MAX_PER_TICK default 64)
    and runs the existing face_watch detection pipeline on them.

Both run only when face_client.is_enabled(). The cap on (2) is small
because each candidate is a real Apollo round-trip — 64/tick at 60s
quick interval ≈ 64 detections/min, which paces an 8-core CPU
inference comfortably while keeping a steady flow visible in logs.
process_new_files's own backfill stays in place for the same-tick
flow (a brand-new upload gets hashed AND face-scanned in the tick
where it's discovered) but is now belt-and-suspenders.

Test backstop pinning the new DAO method's filter contract: only
hashed, unscanned, in-library rows are returned; scanned rows,
unhashed rows, and other-library rows are filtered out.
2026-04-30 01:46:49 +00:00
Cameron Cordes c2c1fe5b8b faces: bbox crop respects EXIF orientation + pads enough for RetinaFace
Two reasons manually-drawn bboxes were never resolving a face on
re-detection:

(1) The bbox arrives in display space (browser already applied EXIF
    orientation when rendering the carousel), but the `image` crate
    in crop_image_to_bbox opens raw pre-rotation pixels. For any
    phone photo with Orientation 6/8/etc., applying the bbox without
    rotating first crops a completely different region of the image
    — landing on background, hair, or empty pixels. Now reads the
    EXIF Orientation tag and applies it before indexing into the
    canonical-oriented dims.

(2) Padding was 10 % on each side. A typical 200×250 face bbox +
    10 % becomes ~240×300; insightface resizes that to det_size=640,
    so the face fills ~95 % of the input. RetinaFace's anchors
    expect faces at 20–60 % of input dimensions; at 95 % it
    routinely returns zero detections. Bumped to 50 % padding so the
    crop is 2× the bbox dims and the face occupies ~50 % of the
    input — anchor-friendly. Bbox is still clamped to image bounds,
    so edge-of-image cases just get less padding on the clipped
    side.
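
The padding and clamping arithmetic in (2) can be checked with a small
sketch. The Rect type, the function name, and the use of integer pixel
coordinates in the already-oriented image are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x: u32,
    y: u32,
    w: u32,
    h: u32,
}

/// Grow `bbox` by `pad_frac` of its own size on every side, then clamp
/// to the image bounds. pad_frac = 0.5 yields a crop 2x the bbox dims
/// when nothing is clipped.
fn pad_and_clamp(bbox: Rect, img_w: u32, img_h: u32, pad_frac: f32) -> Rect {
    let pad_x = (bbox.w as f32 * pad_frac).round() as u32;
    let pad_y = (bbox.h as f32 * pad_frac).round() as u32;
    let x0 = bbox.x.saturating_sub(pad_x);
    let y0 = bbox.y.saturating_sub(pad_y);
    let x1 = (bbox.x + bbox.w + pad_x).min(img_w);
    let y1 = (bbox.y + bbox.h + pad_y).min(img_h);
    Rect { x: x0, y: y0, w: x1.saturating_sub(x0), h: y1.saturating_sub(y0) }
}
```

For a 200×250 bbox well inside the image the result is 400×500. The
same bbox at the top-left corner gets 300×375, since the clipped sides
receive no padding.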

Together these explain why bbox-edit re-embed practically always
fell into the "no face detected" branch (and why bbox edits reverted
before the recent soft-fallback commit). Per-photo embedding
quality also improves slightly — same face, more context, better
landmarks for ArcFace.
2026-04-30 01:06:08 +00:00
Cameron Cordes 5a2f406429 faces: bbox edits survive when re-detection finds no face
Moving a tagged bbox off-center (to fine-tune position, or onto a
back-of-head the operator already manually tagged) made
update_face_handler 422 because the re-embed step ran detection on
the new crop and found nothing. Frontend's catch then reverted the
optimistic update — visible as the bbox snapping back the moment the
user released their drag.

The re-embed is a soft contract: a fresh ArcFace vector is preferable,
but the operator's bbox edit is sacred. Now:

  - empty faces[] → keep old embedding, apply the bbox, log info
  - permanent embed error → keep old embedding, apply the bbox, log info
  - bad-bytes embedding → keep old embedding, apply the bbox, log warn
  - transient failure (cuda_oom, engine unavailable) still 503s so
    the operator can retry — those are recoverable and we don't want
    to silently drift cluster math on retries that succeed later

Cost: a slightly stale embedding for the row, which marginally
affects clustering / auto-bind cosine for files re-detected against
this person. Accepted because dropping the user's manual drag every
time the new crop happens to lose detection is a much worse UX —
especially for the force-create rows (back of head, profile) where
re-detection will *always* fail.
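
The soft contract above reduces to a small decision table. The sketch
below is illustrative only; both enums and the function name are
hypothetical, and the handler's real types are not shown in this log.

```rust
#[derive(Debug, PartialEq)]
enum EmbedOutcome {
    Embedded(Vec<f32>),
    NoFace,         // empty faces[]
    PermanentError, // permanent embed error
    BadBytes,       // embedding bytes failed validation
    Transient,      // e.g. cuda_oom, engine unavailable
}

#[derive(Debug, PartialEq)]
enum BboxEditDecision {
    ApplyWithEmbedding(Vec<f32>),
    ApplyKeepingOldEmbedding,
    RetryLater, // surfaced to the client as 503
}

fn decide_bbox_edit(outcome: EmbedOutcome) -> BboxEditDecision {
    match outcome {
        EmbedOutcome::Embedded(v) => BboxEditDecision::ApplyWithEmbedding(v),
        // The operator's bbox wins: keep the stale embedding rather than revert.
        EmbedOutcome::NoFace | EmbedOutcome::PermanentError | EmbedOutcome::BadBytes => {
            BboxEditDecision::ApplyKeepingOldEmbedding
        }
        // Recoverable faults still fail loudly so the client can retry.
        EmbedOutcome::Transient => BboxEditDecision::RetryLater,
    }
}
```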
2026-04-30 01:01:07 +00:00
Cameron Cordes 6a6a4a6a46 tags: batch lookup expands content-hash siblings cross-library
The first cut matched by rel_path only — fine for single-library
deploys but wrong for multi-library setups where the same content
lives under different rel_paths (e.g. a backup mount holding copies
of the primary library). A tag applied under library A would silently
not appear in the library-B grid badge even though the carousel's
per-path /image/tags would resolve it correctly via siblings.

The batch handler now does the expansion server-side in three queries
regardless of input size:

  1. image_exif batch lookup → query path → content_hash
  2. image_exif JOIN by content_hash → all sibling rel_paths sharing
     each hash (paths are deduped across libraries)
  3. tagged_photo + tags JOIN over the union of (query + sibling)
     rel_paths

Tags are then aggregated back to query paths via a sibling→originals
reverse map, deduped by tag id. Files without a content_hash (just
indexed, hash compute pending, etc.) skip step 2 and only get tags
from their own rel_path — same fallback the per-path handler uses.

Adds ExifDao::get_rel_paths_for_hashes (batch counterpart of
get_rel_paths_by_hash) chunked at 500 to stay under SQLite's
SQLITE_LIMIT_VARIABLE_NUMBER. Five queries for a 4k-photo grid is
still ~800x cheaper than per-path HTTP fan-out.
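
The aggregation step can be sketched in memory. This is an illustration only: the three maps stand in for the three SQL queries, tag ids are plain integers, and the function name is hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Aggregates tags back onto each query path, including tags that were
/// applied to any sibling sharing the same content hash.
fn tags_with_siblings(
    query_paths: &[&str],
    hash_of: &BTreeMap<&str, &str>,       // step 1: rel_path -> content_hash
    paths_of: &BTreeMap<&str, Vec<&str>>, // step 2: content_hash -> sibling rel_paths
    tags_of: &BTreeMap<&str, Vec<u32>>,   // step 3: rel_path -> tag ids
) -> BTreeMap<String, BTreeSet<u32>> {
    let mut out = BTreeMap::new();
    for &q in query_paths {
        // A path without a content hash falls back to its own rel_path only.
        let mut candidates: BTreeSet<&str> = BTreeSet::new();
        candidates.insert(q);
        if let Some(hash) = hash_of.get(q) {
            if let Some(siblings) = paths_of.get(hash) {
                candidates.extend(siblings.iter().copied());
            }
        }
        // BTreeSet dedups tags that reach the path via several siblings.
        let tags: BTreeSet<u32> = candidates
            .iter()
            .filter_map(|p| tags_of.get(p))
            .flatten()
            .copied()
            .collect();
        if !tags.is_empty() {
            out.insert(q.to_string(), tags);
        }
    }
    out
}
```

Omitting untagged paths from the result is an assumption carried over from the earlier batch endpoint's response shape.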
2026-04-30 00:36:44 +00:00
Cameron Cordes 3112260dc8 tags: batch lookup endpoint to collapse photo-match fan-out
Apollo's photo-match enrichment fanned out one ``GET /image/tags?path=``
per record (bounded concurrency 20) — for a 4k-photo time window that
meant ~4000 round-trips, each briefly contending the tag-dao mutex.
The cost dwarfed the actual SQL.

Add a single ``POST /image/tags/lookup`` body ``{paths: [...]}``
returning ``{path: [tag, ...]}`` with only paths that have at least
one tag. SqliteTagDao gains ``get_tags_grouped_by_paths`` which JOINs
tagged_photo + tags and chunks the IN clause at 500 (safely under
SQLite's variable limit). Five queries for a 4k-photo grid is ~800x
cheaper than 4k HTTP calls.

Trade-off: the batch matches by rel_path directly and does not do the
cross-library content-hash sibling expansion that the per-path
``GET /image/tags`` does. For Apollo's grid that's accepted as
deliberate — single-library deploys see no difference, multi-library
deploys with rel_path-divergent siblings might miss a tag in the grid
badge but the carousel still resolves full sibling tags via the
per-path endpoint when opened. If sibling sharing in the grid becomes
load-bearing, extend the handler to JOIN image_exif on content_hash.
2026-04-30 00:28:33 +00:00
Cameron Cordes 16abacf4c5 faces: backfill no longer stalls on chronic-error files at the front
The content-hash backfill capped at 500/tick AND counted errors
against that cap. So a pocket of files that errored every time
(vanished mid-scan, permission denied, unreadable) at the head of the
exif_records iteration order burned the entire budget every tick and
the rest of the backlog never advanced — surfacing as a face-scan
stuck at e.g. 44% with no progress. Without a content_hash, those
photos never become face-detection candidates, so it looks like
detection is broken when really it's the prerequisite hash that
isn't filling.

Two fixes:

  - Cap on successes only. Errors still get counted and logged but
    don't burn the per-tick budget; the loop keeps moving past them
    to the working files behind. Errors are bounded by the unhashed
    backlog size (each record walked at most once per tick), so this
    can't run away.

  - Always log the unhashed backlog count when non-zero. Previously
    "stuck at 44%" looked silent from the outside; now every tick
    surfaces "backfilled N/M; K still need backfill" so an operator
    can tell backfill is making progress (or isn't).

Also bumps the default cap from 500 to 2000. Hashing is cheap (blake3
+ one DB UPDATE), and 500 was conservative for a personal-scale
library where 10k+ unhashed files is a normal first-run state.
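
A minimal sketch of the "cap on successes only" loop, assuming a hypothetical `backfill_tick` helper where `try_hash` stands in for the blake3 hash plus DB UPDATE:

```rust
/// Walks the unhashed backlog once and returns (successes, errors).
/// Only successes count against the per-tick budget, so a run of failing
/// files at the front cannot starve the files behind it.
fn backfill_tick<T, F>(unhashed: &[T], max_successes: usize, mut try_hash: F) -> (usize, usize)
where
    F: FnMut(&T) -> Result<(), String>,
{
    let mut ok = 0;
    let mut errors = 0;
    for record in unhashed {
        if ok >= max_successes {
            break;
        }
        match try_hash(record) {
            Ok(()) => ok += 1,
            // Counted for logging, but does not consume the budget.
            Err(_) => errors += 1,
        }
    }
    (ok, errors)
}
```

Because each record is visited at most once per call, the error count is bounded by the backlog size.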
2026-04-30 00:03:26 +00:00
Cameron Cordes 891a9982ef faces: force-create path for regions the detector can't see
Adds an opt-in 'force' flag to POST /image/faces. When set, the handler
skips the Apollo embed call entirely and stores the row with a
2048-byte zero-vector embedding under the sentinel model_version
'manual_no_embed'. The row participates as a browse-by-person tag but
is excluded from clustering and auto-bind:

- face_clustering._decode_b64_embedding filters norm<=0 (already)
- cluster suggester groups by model_version, so the sentinel never
  mixes with real buffalo_l rows
- cosine_similarity with a zero vector resolves to 0/NaN, never
  crossing the 0.4 auto-bind threshold

Use case: tag someone looking away from the camera, profile shot,
heavily-occluded face — anywhere the detector returns no_face_in_crop
on the user's drawn region. The frontend only sets force=true after a
422 from a strict create plus an explicit operator confirmation, so
the normal "draw a centered face" UX still gets a real ArcFace
embedding.
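
A rough sketch of the sentinel row and the norm filter, assuming 512 little-endian f32 values per embedding. The helper names are hypothetical, and the real code applies the model_version and norm guards in separate places rather than in one function.

```rust
const EMBED_DIM: usize = 512; // 512 f32 values = 2048 bytes
const MANUAL_NO_EMBED: &str = "manual_no_embed";

/// The placeholder stored for force-created rows.
fn zero_embedding_bytes() -> Vec<u8> {
    vec![0u8; EMBED_DIM * 4]
}

fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Combined view of two of the guards: sentinel model_version and norm <= 0.
fn usable_for_clustering(model_version: &str, embedding: &[f32]) -> bool {
    model_version != MANUAL_NO_EMBED && l2_norm(embedding) > 0.0
}
```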
2026-04-29 23:49:34 +00:00
Cameron Cordes 0eaf27d2d3 faces: cover hydrate_face_with_person — assigned + unassigned branches
Two unit tests pinning the response shape that PATCH/POST /image/faces
relies on. They use the existing in-memory SQLite harness and exercise
the helper directly:

- assigned: person_name resolves through the persons join and bbox /
  source / person_id round-trip cleanly.
- unassigned: person_name is None (not stale, not omitted), person_id
  is None.

These would have caught the prior regression — when the handlers
returned a bare FaceDetectionRow, person_name was structurally absent
from the response shape. A test that asserts person_name is populated
when person_id is set forces the join (or any equivalent) to exist.

A dangling-person_id case isn't covered: the FK on face_detections
makes that state structurally impossible at rest (ON DELETE SET NULL
zeroes the column when a person is removed), so there's nothing to
defend against.
2026-04-29 23:41:52 +00:00
Cameron Cordes 0c2f421a1f faces: PATCH/POST /image/faces returns person_name with the row
Both create_face_handler and update_face_handler returned the bare
FaceDetectionRow, so PATCH /image/faces/{id} (used by both bbox edits
and person assignment) replied without person_name. The carousel
overlay does an optimistic replace on this row — replacing the joined
FaceWithPerson with a row that has person_name = undefined visibly
dropped the VFD label off the bbox after every save.

Add a small hydrate_face_with_person helper that does the persons
lookup and assembles a FaceWithPerson, used by both handlers. The
list endpoint already does the join, so the PATCH/POST shape now
matches it.
2026-04-29 23:38:24 +00:00
Cameron Cordes 43cb60d3ad faces: re-embed on bbox edit instead of leaving the embedding stale
Phase 2 stored the new bbox on PATCH /image/faces/{id} but logged
"embedding now stale (Phase 3 will re-embed)" and moved on. That left
the embedding column pointing at the *old* face area while the bbox
described a new one — auto-bind cosine similarity and the cluster
suggester would silently rank the row as "the same face it was before
the edit" forever after, even though the geometry no longer matched.

Now: when the PATCH includes a bbox, the handler:
  1. Looks up the row to find its photo (library_id + rel_path).
  2. Crops the new bbox region with the same crop_image_to_bbox helper
     manual-create uses (10% pad on each side so the detector has
     ear/jaw context).
  3. POSTs the crop to face_client.embed for a fresh ArcFace vector.
  4. Stores both the new bbox AND the new embedding in one
     update_face transaction.
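
The padded crop in step 2 can be sketched as follows. This assumes pixel-space coordinates and a hypothetical `Rect` type; the real crop_image_to_bbox signature is not shown in this log.

```rust
/// Pixel-space rectangle; hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
struct Rect {
    x: u32,
    y: u32,
    w: u32,
    h: u32,
}

/// Expands the bbox by `pad_frac` of its size on each side, then clamps the
/// result to the image. Returns None for an empty or out-of-image bbox.
fn padded_crop(b: &Rect, img_w: u32, img_h: u32, pad_frac: f32) -> Option<Rect> {
    if b.w == 0 || b.h == 0 || b.x >= img_w || b.y >= img_h {
        return None;
    }
    let pad_x = (b.w as f32 * pad_frac).round() as u32;
    let pad_y = (b.h as f32 * pad_frac).round() as u32;
    let x0 = b.x.saturating_sub(pad_x);
    let y0 = b.y.saturating_sub(pad_y);
    let x1 = b.x.saturating_add(b.w).saturating_add(pad_x).min(img_w);
    let y1 = b.y.saturating_add(b.h).saturating_add(pad_y).min(img_h);
    Some(Rect { x: x0, y: y0, w: x1 - x0, h: y1 - y0 })
}
```

A bbox touching the image edge simply gets less padding on the clipped side.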

Errors map cleanly:
  - face_client disabled → 503 (bbox edit needs Apollo).
  - decode failure / no face in crop → 422.
  - Apollo CUDA OOM / unavailable → 503 transient.
  - Underlying row missing → 404.

About 100-500ms per edit on CPU, dominated by Apollo's inference call.
Acceptable for a manual operator action; the alternative (stale
embedding) silently broke the rest of the face stack.

Prerequisite for the upcoming carousel-side draw/resize bbox UI —
without re-embed, every operator-driven bbox tweak would corrode the
clustering/auto-bind quality. ApiPatchFaceBody on Apollo's side
already passes bbox through verbatim, so no Apollo change needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 23:10:25 +00:00
Cameron Cordes 7303fb8aa3 faces: ignore/junk bucket — DB schema + lazy-create endpoint
A single global "Ignored" person row, marked is_ignored=true, that the
frontend lazily creates on first use to hold strangers, false
detections, and faces the user doesn't want bound to a real person.

Schema (new migration 2026-04-29-000200_add_is_ignored):
  - persons.is_ignored BOOLEAN NOT NULL DEFAULT 0
  - Partial index on (is_ignored) WHERE is_ignored = 1; small WHERE
    set means a tiny index that only ever services the bucket lookup.

Why a real persons row instead of a separate table or status enum:
  - face_detections.person_id stays a clean foreign key — no special
    code paths for "ignored faces" anywhere else in the schema.
  - The cluster-suggester already filters by `person_id IS NULL`, so
    bound-to-ignored faces are naturally excluded from re-clustering
    without any change.
  - merge / rename / delete all work on it with the existing routes
    (the management UI just hides it from default views).

DAO additions / changes:
  - get_or_create_ignored_person (idempotent; race-safe via the
    UNIQUE COLLATE NOCASE on persons.name + retry-on-409 fallback).
  - list_persons gains an include_ignored parameter; default false
    so the management screen hides the bucket unless asked.
  - find_persons_by_names_ci filters is_ignored=0 in SQL so the
    auto-bind path can NEVER target the bucket — even if the user
    happens to tag photos as "Ignored", the heuristic look-up skips
    it. Bucket assignment is always an explicit operator action.
  - update_person accepts is_ignored: Option<bool> so a person can
    be moved into / out of the bucket without a delete + recreate.

Routes:
  - POST /persons/ignore-bucket — returns the bucket, creating it on
    first call. Frontend uses this lazily right before binding.
  - GET /persons gains ?include_ignored=true; default behavior
    unchanged.
  - PATCH /persons/{id} now accepts is_ignored.

Tests: ignore_bucket_idempotent_and_filters_auto_bind covers the
contract: bucket is idempotent across calls, find_persons_by_names_ci
skips it (even on exact name match), default list_persons hides it,
include_ignored=true surfaces it. All other tests updated to pass
the new is_ignored: false / Option<bool> fields explicitly.

cargo test --lib: 181/0; fmt + clippy clean for new code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 22:48:16 +00:00
Cameron Cordes 0e160f5d22 faces: include bbox on /faces/embeddings response
Apollo's cluster suggester wants to render a *face*-cropped thumbnail
for each cluster's representative — a multi-person photo with the
cluster about 'one' of them was unreadable when the thumb showed the
whole image. Plumbing bbox through means the UI can crop to the rep
face without an extra round-trip per cluster.

FaceEmbeddingRow gains bbox_x/y/w/h (Option<f32>, mirrors the column
nullability — for status='detected' rows the CHECK constraint
guarantees they're populated, but the type stays nullable as
documentation). list_embeddings already loaded these from the
underlying FaceDetectionRow; this commit just stops dropping them
when constructing the response.

No DB changes; no behavior change for existing callers (the new
fields are additive).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 21:01:58 +00:00
Cameron Cordes a24fac5511 faces: backfill missing content_hash from the file watcher
Photos indexed before content-hashing landed (or where the hash compute
failed silently on insert) end up in image_exif with NULL content_hash.
build_face_candidates keys on content_hash, so those rows would never
become face candidates without backfill — symptom: face detection logs
nothing despite photos being in the library and the watcher running.

The dedicated `backfill_hashes` binary already handles this; this
commit lets the watcher self-heal during full scans so the deploy
'just works' for face recognition without operator action.

Idempotent — subsequent scans see populated hashes and no-op. Bounded
per tick by FACE_HASH_BACKFILL_MAX_PER_TICK (default 500) so a watcher
tick on a 50k-photo legacy library doesn't blake3 every file in one
shot. For very large backlogs the dedicated binary is still faster
(no DAO mutex contention with the watcher loop).

Only runs when face_client.is_enabled(), so legacy deploys without
APOLLO_FACE_API_BASE_URL keep the same behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:41:08 +00:00
Cameron Cordes 23f4941471 faces: surface enabled/disabled state + per-tick candidate count
Manual deploy debugging: 'Saved thumbnail' logs were visible (boot-time
thumbnail backfill) but no face_watch logs were appearing, with no
obvious way to tell whether the integration was disabled, hadn't reached
a full scan yet, or had simply seen no new files.

Two log lines:
  - watch_files startup: 'Face detection: ENABLED' / 'DISABLED (set
    APOLLO_FACE_API_BASE_URL or APOLLO_API_BASE_URL to enable)' so
    you can tell at a glance whether the env wired through.
  - process_new_files (debug-level): 'face_watch: scan tick — N image
    file(s) walked, M candidate(s) (library 'main', modified_since=...)'
    so an empty-candidate scan is distinguishable from a misconfigured
    or skipped one without bumping log level for the rest of the
    watcher.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:19:17 +00:00
Cameron Cordes 41f93d70d1 faces: tighten bootstrap candidate filter, bump to 1.1.0
Filter <3-char tags and emoji/symbol-bearing tags out of the bootstrap
candidate list before grouping. Manual testing surfaced these as noise
the operator never ticks; they pushed real candidates lower in the
list and made the UI harder to scan. This is a hard filter (drop from
candidates entirely), not a heuristic flag — looks_like_person still
governs the default-checked decision for the rows that *do* survive.

is_plausible_name_token rules:
  - >= 3 chars after trimming (rejects "AB", "OK", whitespace-only)
  - Each char is alphabetic (any script — covers Renée, José, 田中太郎),
    whitespace, name-punctuation (' - . _ U+2019), or ASCII digit
  - Anything else (emoji, symbols, math, arrows, control codes) drops
    the whole tag
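
One way the rules above could be written, as a sketch rather than the committed implementation:

```rust
/// Hard filter for bootstrap candidates: at least 3 chars after trimming,
/// and every char is a letter (any script), whitespace, name punctuation,
/// or an ASCII digit.
fn is_plausible_name_token(tag: &str) -> bool {
    let trimmed = tag.trim();
    if trimmed.chars().count() < 3 {
        return false;
    }
    trimmed.chars().all(|c| {
        c.is_alphabetic()
            || c.is_whitespace()
            || matches!(c, '\'' | '-' | '.' | '_' | '\u{2019}')
            || c.is_ascii_digit()
    })
}
```

Counting chars rather than bytes keeps the length floor meaningful for non-Latin scripts.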

Digits stay allowed at this layer; looks_like_person handles "Trip 2018"
on the heuristic side. Lets a "Sarah2" alias still appear so the
operator can spot and confirm it manually, just unticked by default.

Cargo version bump 1.0.0 → 1.1.0 marks the face-recog feature surface
landing — Phase 2's schema + endpoints, Phase 3's file-watch hook, and
Phase 4's bootstrap + auto-bind are all behind APOLLO_FACE_API_BASE_URL,
so legacy 1.0 deploys without that env see no behavior change.

Tests: 1 new (faces::tests::is_plausible_name_token_filters_short_and_emoji)
covers the accept-list (Latin/accented/Asian scripts, hyphenated and
apostrophe names) and the reject-list (length floor, emoji classes,
symbols, leading/trailing whitespace handling).

cargo test --lib: 180 / 0; fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 19:05:04 +00:00
Cameron Cordes 1859399759 faces: phase 4 — people-tag bootstrap + auto-bind on detection
Wires the existing string people-tags into the new persons table and
auto-binds new detections to a same-named person when the photo carries
exactly one matching tag. ImageApi has no notion of which tags are
people-tags today (purely a user mental model), so this is operator-
confirmed: the suggester surfaces candidates with a heuristic flag, the
operator confirms, then bootstrap creates persons rows. Auto-bind
follows on every detection thereafter.

New endpoints:
  GET  /tags/people-bootstrap-candidates
       Per case-insensitive name group: display name (most-frequent
       capitalization), normalized lowercase, summed usage_count,
       looks_like_person heuristic flag, already_exists check against
       the persons table. Sorted persons-likely-first then by count.
  POST /persons/bootstrap
       Body: {names: [string]}. Idempotent — pre-fetches the existing-
       name set so a duplicate request reports per-row "already exists"
       instead of 409-ing each insert. Created rows get
       created_from_tag=true; failed rows surface in `skipped` with a
       reason.

looks_like_person heuristic — conservative on purpose because the
operator confirms in the UI:
  - 1–2 whitespace-separated words
  - Each word starts uppercase, no digits anywhere
  - Single-word names not on a small denylist (cat, christmas, beach,
    sunset, untagged, ...). Two-word names skip the denylist so
    "Sarah Smith" is never false-rejected.

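A sketch of the looks_like_person heuristic as described above. The denylist shown is only the handful of entries named in this message, and the case-insensitive denylist comparison is an assumption.

```rust
/// Partial denylist; the real one is longer.
const DENYLIST: &[&str] = &["cat", "christmas", "beach", "sunset", "untagged"];

fn looks_like_person(name: &str) -> bool {
    let words: Vec<&str> = name.split_whitespace().collect();
    // 1-2 whitespace-separated words.
    if words.is_empty() || words.len() > 2 {
        return false;
    }
    // No digits anywhere.
    if name.chars().any(|c| c.is_ascii_digit()) {
        return false;
    }
    // Each word starts with an uppercase letter.
    if !words
        .iter()
        .all(|w| w.chars().next().map_or(false, char::is_uppercase))
    {
        return false;
    }
    // Only single-word names are checked against the denylist.
    if words.len() == 1 && DENYLIST.contains(&words[0].to_lowercase().as_str()) {
        return false;
    }
    true
}
```

Note that a two-word tag such as "Beach Day" passes, which is why the operator still confirms every candidate.
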
FaceDao additions:
  - find_persons_by_names_ci — bulk lowercase-name → person_id lookup
    via sql_query (Diesel's BoxedSelectStatement + LOWER() doesn't
    play well with the type system).
  - person_reference_embedding — L2-normalized mean of a person's
    detected embeddings, *filtered by model_version* so a future
    buffalo_xl row can never contaminate an in-flight buffalo_l auto-
    bind decision. Returns None when the person has no faces yet.
  - assign_face_to_person — sets face_detections.person_id and, only
    when persons.cover_face_id is NULL, claims this face as cover. The
    UI's hand-picked cover survives later auto-binds.
  - decode_embedding_bytes / cosine_similarity helpers — pub(crate)
    so face_watch can decode the wire bytes once and feed them through
    the cosine threshold.
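
The embedding helpers can be sketched with the standard library alone. Assumptions in this sketch: embeddings are little-endian f32 on the wire, degenerate cosine inputs return 0.0, and rows are passed as (model_version, vector) pairs instead of being read from SQLite.

```rust
/// Decodes raw bytes into `dim` f32 values; None on a size mismatch.
fn decode_embedding_bytes(bytes: &[u8], dim: usize) -> Option<Vec<f32>> {
    if bytes.len() != dim * 4 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
            .collect(),
    )
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        return 0.0;
    }
    dot / (na * nb)
}

/// L2-normalized mean of the rows matching `model_version`. Normalizing the
/// sum gives the same direction as normalizing the mean.
fn reference_embedding(rows: &[(&str, Vec<f32>)], model_version: &str) -> Option<Vec<f32>> {
    let picked: Vec<&Vec<f32>> = rows
        .iter()
        .filter(|(mv, _)| *mv == model_version)
        .map(|(_, e)| e)
        .collect();
    let first = picked.first()?;
    let mut sum = vec![0.0f32; first.len()];
    for e in &picked {
        if e.len() != sum.len() {
            return None;
        }
        for (s, x) in sum.iter_mut().zip(e.iter()) {
            *s += x;
        }
    }
    let norm = sum.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return None;
    }
    Some(sum.into_iter().map(|x| x / norm).collect())
}
```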

Auto-bind in face_watch::process_one:
  After every successful detect, for each newly-stored auto face we
  pull the photo's tags, look up which (if any) map to existing
  persons, and:
    - skip when zero or multiple distinct persons are matched
      (multi-match is genuinely ambiguous; cluster suggester handles it)
    - on first face for a person: bind unconditionally so bootstrap can
      produce a usable reference at all
    - thereafter: bind iff cosine(new_emb, person_ref) >=
      FACE_AUTOBIND_MIN_COS (default 0.4, env-tunable to 0..=1)
  The reference embedding comes from person_reference_embedding under
  the same model_version as the candidate, so a model upgrade never
  silently re-anchors a person's centroid.
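
The auto-bind rule reduces to a small pure function once the tag lookup and cosine are computed. A sketch, with a hypothetical `Bind` type and with the cosine against the person's reference passed in as `None` when the person has no faces yet:

```rust
#[derive(Debug, PartialEq)]
enum Bind {
    Skip,
    To(i64),
}

/// `matched_person_ids`: persons whose names match the photo's tags.
/// `cos_to_reference`: cosine against the person's reference embedding,
/// or None when no reference exists yet (first face for that person).
fn auto_bind(matched_person_ids: &[i64], cos_to_reference: Option<f32>, min_cos: f32) -> Bind {
    let mut distinct = matched_person_ids.to_vec();
    distinct.sort_unstable();
    distinct.dedup();
    match (distinct.as_slice(), cos_to_reference) {
        // First face: bind unconditionally.
        ([id], None) => Bind::To(*id),
        // Later faces: bind only at or above the threshold.
        ([id], Some(cos)) if cos >= min_cos => Bind::To(*id),
        // Zero matches, several distinct matches, or a low cosine.
        _ => Bind::Skip,
    }
}
```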

Plumbing: watch_files now constructs its own SqliteTagDao alongside the
other watcher DAOs and threads it through process_new_files →
run_face_detection_pass → process_one. The handler-side TagDao
registration in main.rs already covers bootstrap_candidates_handler;
no extra app_data wiring needed.

Tests: 8 new (faces.rs):
  - looks_like_person accepts/rejects/two-word-skips-denylist (3)
  - cosine_similarity on identical / orthogonal / opposite / mismatch /
    zero / empty inputs
  - decode_embedding_bytes round-trip + size validation
  - find_persons_by_names_ci groups case + handles empty input
  - person_reference_embedding filters by model_version (buffalo_l ref
    must not include buffalo_xl rows)
  - assign_face_to_person sets cover when unset, doesn't overwrite

cargo test --lib: 179 / 0; fmt + clippy clean for new code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:55:01 +00:00
Cameron Cordes f985a0d658 faces: surface UNIQUE constraint as 409, not 500
Manual smoke test caught a bug: POST /persons with a duplicate name
returned 500 with the body 'insert person Cameron' instead of the
intended 409 Conflict.

Root cause: the handler keyed on `format!("{}", e).contains("unique")`,
but anyhow's plain Display only renders the *outermost* context
("insert person Cameron") and hides the diesel error nested below
('UNIQUE constraint failed: persons.name'). The string check was a
false negative on every duplicate.

Fix: walk the source chain and downcast for
diesel::result::Error::DatabaseError(UniqueViolation, _) — exposed
via a shared `is_unique_violation` helper used by both
create_person_handler and update_person_handler. Error bodies for
non-unique failures now use `{:#}` so the body actually carries the
underlying cause when the user surfaces it.

merge_persons_handler also moves to `{:#}` for richer error bodies;
the "itself" check was already structural and unaffected.
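
The failure mode and the fix can be reproduced with the standard library only. In this sketch `DbError` and `Context` are stand-ins for the diesel error and the anyhow context wrapper; the real helper downcasts to diesel's error type instead.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the database error at the bottom of the chain.
#[derive(Debug)]
enum DbError {
    UniqueViolation(String),
    Other(String),
}

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DbError::UniqueViolation(m) | DbError::Other(m) => write!(f, "{m}"),
        }
    }
}

impl Error for DbError {}

/// Stand-in for a context wrapper: Display shows only the outer message.
#[derive(Debug)]
struct Context {
    msg: String,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for Context {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.msg)
    }
}

impl Error for Context {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Walks the source chain and downcasts each link instead of string-matching.
fn is_unique_violation(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if let Some(DbError::UniqueViolation(_)) = e.downcast_ref::<DbError>() {
            return true;
        }
        current = e.source();
    }
    false
}
```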

Regression test (faces::tests::is_unique_violation_walks_chain) pins
both the bug shape ({} doesn't surface UNIQUE) and the fix
(is_unique_violation correctly downcasts the chain), so a future
refactor of error handling can't silently re-bury this.

cargo test --lib: 171 / 0; fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:44:10 +00:00
Cameron Cordes 4dee7b6f73 faces: phase 3 — file-watch hook drives auto detection
Wire face detection into ImageApi's existing scan loop so new uploads
pick up faces automatically and the initial backlog grinds through on
full-scan ticks. No new job system; Phase 2's already_scanned check
makes the work implicitly idempotent (one face_detections row per
content_hash, including no_faces / failed marker rows).

face_watch.rs (new):
  - run_face_detection_pass(library, excluded_dirs, face_client,
    face_dao, candidates) — sync entry point. Builds a per-pass tokio
    runtime and fans out detect calls bounded by FACE_DETECT_CONCURRENCY
    (default 8). The watcher thread itself stays sync.
  - filter_excluded — applies the same PathExcluder /memories uses, so
    @eaDir / .thumbnails / EXCLUDED_DIRS-listed paths skip detection
    before we burn a detect call (and Apollo's GPU memory) on junk.
  - read_image_bytes_for_detect — RAW/HEIC route through
    extract_embedded_jpeg_preview because opencv-python-headless can't
    decode either; everything else gets a plain std::fs::read so EXIF
    orientation reaches Apollo's exif_transpose intact.
  - process_one — translates Apollo's response into the Phase 2 marker
    contract: faces[] empty → no_faces; FaceDetectError::Permanent →
    failed (don't retry); Transient → no marker (next scan retries);
    success with N faces → N detected rows with the embeddings unpacked.
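
The marker contract can be sketched as two small functions. The types here are hypothetical, and the status classification ignores the response body, which the real classifier also inspects.

```rust
#[derive(Debug, PartialEq)]
enum DetectResult {
    Faces(usize),
    Permanent,
    Transient,
}

#[derive(Debug, PartialEq)]
enum Marker {
    Detected(usize),
    NoFaces,
    Failed,
    Defer, // write nothing; the next scan retries
}

/// Assumed mapping for error statuses: 5xx is retryable, anything else is not.
fn classify_status(status: u16) -> DetectResult {
    if (500..600).contains(&status) {
        DetectResult::Transient
    } else {
        DetectResult::Permanent
    }
}

fn marker_for(result: DetectResult) -> Marker {
    match result {
        DetectResult::Faces(0) => Marker::NoFaces,
        DetectResult::Faces(n) => Marker::Detected(n),
        DetectResult::Permanent => Marker::Failed,
        DetectResult::Transient => Marker::Defer,
    }
}
```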

main.rs (process_new_files + watch_files):
  - watch_files now also takes face_client + excluded_dirs; the watcher
    thread builds a SqliteFaceDao the same way it builds ExifDao /
    PreviewDao.
  - After the EXIF write loop, build_face_candidates queries image_exif
    for the just-walked image paths' content_hashes (covers new uploads
    and pre-existing backlog), filters out anything already_scanned, and
    hands the rest to face_watch::run_face_detection_pass.
  - Bypassed wholesale when face_client.is_enabled() is false — keeps
    the watcher usable on legacy deploys where Apollo isn't configured.

Tests: 5 face_watch unit tests cover the parts that don't need a real
Apollo:
  - filter_excluded drops dir-component patterns (@eaDir) without
    matching substring file names (eaDir-not-a-thing.jpg is kept).
  - filter_excluded drops absolute-under-base subtrees (/private).
  - empty EXCLUDED_DIRS short-circuits cleanly.
  - read_image_bytes_for_detect passes JPEG bytes through verbatim
    (orientation must reach Apollo unmodified).
  - read_image_bytes_for_detect falls through to plain read when a
    RAW-extension file has no embedded preview, so Apollo gets a chance
    to 422 and we mark failed rather than infinitely-retrying.

cargo test --lib: 170 / 0; fmt and clippy clean for new code.
End-to-end (drop a photo → face_detections row appears) needs Apollo
running and is deferred to deploy-time verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:21:19 +00:00
Cameron Cordes f77e44b34d faces: fix PathExcluder false-positive + cover face_client/crop in tests
PathExcluder was iterating every component of the absolute path,
including the system prefix. Two of the existing memories tests had
been failing on master because tempdir() lives under /tmp on Linux
and a pattern like "tmp" then matched the system /tmp component
rather than anything the user actually asked to exclude. Phase 3's
file-watch hook will use the same code to skip @eaDir / .thumbnails
under each library's BASE_PATH, so the bug would hide every photo
on a host whose BASE_PATH passes through a directory named the same
as a user pattern.

Fix: store base in PathExcluder and strip it before scanning
components. A path that lives outside base falls through to the
no-match branch (defensive — nothing legit hits that today).
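
A minimal sketch of the fix, covering only exact directory-component patterns; the real PathExcluder also handles absolute-under-base patterns, and the struct layout here is an assumption.

```rust
use std::path::{Component, Path, PathBuf};

struct PathExcluder {
    base: PathBuf,
    patterns: Vec<String>,
}

impl PathExcluder {
    fn is_excluded(&self, path: &Path) -> bool {
        // Strip the library base first so system prefixes like /tmp never match.
        let rel = match path.strip_prefix(&self.base) {
            Ok(rel) => rel,
            // Outside base: fall through to the no-match branch.
            Err(_) => return false,
        };
        rel.components().any(|c| match c {
            Component::Normal(name) => self
                .patterns
                .iter()
                .any(|p| name.to_str() == Some(p.as_str())),
            _ => false,
        })
    }
}
```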

Also extracted the face_client error classification into a pure
classify_error_response(status, body) so the marker-row contract
with Apollo (422 → Permanent / 'failed', 5xx → Transient / defer)
is unit-testable without spinning up an HTTP server.

New tests:
  memories::tests::test_path_excluder_*           — 2 previously
    failing tests now pass.
  ai::face_client::tests::classify_*              — 4 cases:
    422 decode_failed → Permanent, 503 cuda_oom → Transient
    (handles both string and {code:..} detail shapes), 5xx →
    Transient + other 4xx → Permanent, unparseable HTML body still
    classifies on status.
  faces::tests::crop_*                            — 3 cases:
    invalid bbox rejected, valid bbox round-trips through JPEG
    decode, corner crop with 10% padding clamps inside source.

cargo test --lib: 165 passed / 0 failed (was 156 / 2 failed).
cargo fmt and clippy on new code clean. The remaining
sort_by clippy warnings in pre-existing files (memories.rs,
files.rs, exif.rs) are unrelated and present on master.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:09:44 +00:00
Cameron Cordes 860169032b faces: phase 2 — schema + manual face/person CRUD
Land the persistence model and HTTP surface for local face recognition.
Inference still lives in Apollo (Phase 1); this side adds the data home
plus every endpoint Apollo's UI and FileViewer-React will consume.

Schema (new migration 2026-04-29-000000_add_faces):
  - persons: visual identities. Optional entity_id bridges to the
    existing knowledge-graph entities table; auto-bridging is left to
    the management UI (we don't muddy LLM provenance from face rows).
    UNIQUE(name COLLATE NOCASE) so 'alice' / 'Alice' fold to one row.
  - face_detections: keyed on content_hash (cross-library dedup), with
    status='detected' carrying bbox + 512-d embedding BLOB, and
    'no_faces' / 'failed' marker rows that tell Phase 3's file watcher
    not to re-scan. Marker invariant enforced via CHECK; partial UNIQUE
    on content_hash WHERE status='no_faces' guards against double-marks.

Schema regenerated with `diesel print-schema` against a clean migration
run; joinables added for face_detections → libraries / persons and
persons → entities.

face_client.rs (sibling of apollo_client.rs):
  - reqwest multipart, 60 s timeout (CPU inference on a backlog can be
    slow; bounded threadpool on Apollo serializes calls anyway).
  - FaceDetectError::{Permanent, Transient, Disabled} — Phase 3 keys
    its marker-row decision on this. 422 → mark failed, 5xx → defer.
  - APOLLO_FACE_API_BASE_URL falls back to APOLLO_API_BASE_URL when
    unset; both unset = is_enabled() false, callers no-op.

faces.rs (DAO + handlers):
  - SqliteFaceDao implements the full FaceDao trait; person face counts
    go through sql_query because diesel's BoxedSelectStatement +
    group_by trips trait-resolver recursion.
  - merge_persons re-points face rows in a transaction, copies notes
    when target's are empty, deletes src.
  - manual POST /image/faces resolves content_hash through image_exif,
    crops the user-drawn bbox with 10% padding (detector wants context
    around ears/jaw), POSTs the crop to face_client.embed for a real
    ArcFace vector, then inserts source='manual'.
  - Cluster-suggest (Phase 6) gets its data from
    GET /faces/embeddings — base64-encoded paged BLOBs so Apollo's
    DBSCAN can stream them without ImageApi pre-aggregating.

Endpoints registered alongside add_*_services in main.rs:
  GET    /faces/stats?library=
  GET    /faces/embeddings?library=&unassigned=&limit=&offset=
  GET    /image/faces?path=&library=
  POST   /image/faces                        (manual create via embed)
  PATCH  /image/faces/{id}
  DELETE /image/faces/{id}
  GET    /persons?library=
  POST   /persons
  GET    /persons/{id}
  PATCH  /persons/{id}
  DELETE /persons/{id}?cascade=set_null|delete   (set_null default)
  POST   /persons/{id}/merge
  GET    /persons/{id}/faces?library=

The file-watch hook (Phase 3) and the rerun-on-one-photo handler
(Phase 6) live behind the FaceDao methods marked dead_code today —
they're called only when those phases land. Same shape for the trait
methods that aren't reached by Phase 2 routes.

Tests: 3 DAO unit tests cover person CRUD + case-insensitive uniqueness,
marker-row idempotency (mark_status is a no-op when any row exists),
and merge re-pointing faces.

Cargo.toml: reqwest gains the `multipart` feature.

cargo build / cargo test --lib / cargo fmt / cargo clippy --all-targets
all clean for the new code; the two pre-existing test_path_excluder
failures and the pre-existing sort_by clippy warnings are unrelated and
present on master.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:03:42 +00:00
cameron 6642db3c8b Merge pull request 'feat/apollo-places-tool and Geo Tagging Exif' (#60) from feat/apollo-places-tool into master
Reviewed-on: #60
2026-04-28 23:09:33 +00:00
Cameron Cordes 57fb0bcd3c EXIF GPS write: POST /image/exif/gps via exiftool
New endpoint accepts {path, library, latitude, longitude} and shells
out to exiftool to write GPSLatitude/GPSLongitude (with N/S, E/W refs)
into the file's EXIF in place. After the write, the handler
re-extracts EXIF and updates the image_exif row so the DB stays in
sync — the response carries the updated metadata block in one
round-trip. Falls through to store_exif if the row is missing.

`exif::write_gps` is the small helper. `-overwrite_original` so no
.orig sidecar is left behind. Validates lat/lon range + supports_exif
before spawning exiftool. Format support matches the existing read
path (JPEG / TIFF / RAW / HEIF / PNG / WebP) — videos still need a
different writer and aren't covered.
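
A sketch of the validation and argument construction, assuming a hypothetical `gps_exiftool_args` helper. The committed `exif::write_gps` may build its arguments differently and also checks supports_exif, which is omitted here.

```rust
/// Validates the coordinates and builds the exiftool argument list.
/// exiftool stores unsigned values plus N/S and E/W reference tags.
fn gps_exiftool_args(path: &str, lat: f64, lon: f64) -> Result<Vec<String>, String> {
    // NaN fails both range checks and is rejected as well.
    if !(-90.0..=90.0).contains(&lat) || !(-180.0..=180.0).contains(&lon) {
        return Err(format!("coordinates out of range: {lat}, {lon}"));
    }
    let lat_ref = if lat >= 0.0 { "N" } else { "S" };
    let lon_ref = if lon >= 0.0 { "E" } else { "W" };
    Ok(vec![
        "-overwrite_original".to_string(), // no .orig sidecar left behind
        format!("-GPSLatitude={}", lat.abs()),
        format!("-GPSLatitudeRef={lat_ref}"),
        format!("-GPSLongitude={}", lon.abs()),
        format!("-GPSLongitudeRef={lon_ref}"),
        path.to_string(),
    ])
}
```

The resulting list could be passed to `std::process::Command::new("exiftool").args(...)`.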

Apollo's "+ PIN" carousel button (separate commit on the Apollo side)
calls this through /api/photos/exif/gps. Drive-by: cargo fmt one-line
collapse on apollo_client.rs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:25:40 +00:00
Cameron Cordes 4ae7be35e9 Apollo Places: enrich insights with personal place name + notes
Optional integration with the sibling Apollo project's user-defined
Places (name + lat/lon + radius_m + description + category). When
APOLLO_API_BASE_URL is set, the per-photo location resolver folds the
most-specific containing Place into the LLM prompt's location string —
"Home (My house in Cambridge) — near Cambridge, MA" rather than the
city name alone. Smallest-radius wins; Apollo sorts server-side via
/api/places/contains, so the carousel badge in Apollo and the prompt
string here always agree.

Adds an agentic tool `get_personal_place_at(latitude, longitude)` that
the LLM can call during chat continuation. Tool description tells the
model the call returns the user's free-text notes, not just a name.
Deliberately narrow — no enumerate-all variant, lat/lon required.

Unset APOLLO_API_BASE_URL = legacy Nominatim-only path, tool is not
registered. 5 s timeout; all errors degrade silently.

Tests: 5 unit tests for compose_location_string (Apollo only, Nominatim
only, both, both-with-description, neither).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 19:11:12 +00:00
cameron 9d58547ce3 Merge pull request 'feat/raw-thumb-embedded-preview' (#59) from feat/raw-thumb-embedded-preview into master
Reviewed-on: #59
2026-04-28 17:21:27 +00:00
Cameron Cordes 6521a328bf RAW preview: exiftool fallback for MakerNote / SubIFD previews
kamadak-exif's In::PRIMARY / In::THUMBNAIL only address IFD0 and IFD1.
On modern Nikon NEFs the full-res review JPEG lives in the MakerNote's
PreviewIFD (and many Canon CR2s / DNGs put theirs in a SubIFD chain) —
both unreachable through the existing reader, so the previous patch
still produced no preview for those files and the pipeline fell through
to ffmpeg, which writes black frames when it can't decode the RAW.

Add a slow-path layer in extract_embedded_jpeg_preview that shells out
to exiftool for PreviewImage / JpgFromRaw / OtherImage (one process per
tag). All candidates from both layers are pooled and the largest valid
JPEG wins. exiftool not on PATH degrades to fast-path-only behavior
rather than breaking — the fallback is a strict superset.
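
The "pool every candidate, largest valid JPEG wins" rule can be sketched as
follows. This is an illustration under assumptions: `pick_largest_preview`
and `looks_like_jpeg` are hypothetical names, and the SOI/EOI marker check
stands in for whatever validity test the real extractor applies (real
previews may carry trailing padding after EOI, which this check would
reject).

```rust
/// Cheap structural check: starts with the JPEG SOI marker (FF D8) and
/// ends with the EOI marker (FF D9). This is not a full decode.
fn looks_like_jpeg(bytes: &[u8]) -> bool {
    bytes.len() >= 4 && bytes.starts_with(&[0xFF, 0xD8]) && bytes.ends_with(&[0xFF, 0xD9])
}

/// Pool candidates from every layer (fast path and exiftool fallback) and
/// keep the largest one that passes the structural check.
pub fn pick_largest_preview(candidates: Vec<Vec<u8>>) -> Option<Vec<u8>> {
    candidates
        .into_iter()
        .filter(|c| looks_like_jpeg(c))
        .max_by_key(|c| c.len())
}
```

Because the fallback only adds candidates to the pool, a missing exiftool
simply leaves the fast-path candidates to compete among themselves.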

Documented the new optional dependency in README.md and CLAUDE.md with
install commands for apt / brew / winget / choco.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 17:13:36 +00:00
Cameron Cordes 00b3c80141 RAW: try IFD0 + IFD1 for embedded preview, serve at full size
The thumbnail pipeline's embedded-JPEG extractor only checked IFD1
(THUMBNAIL), which on many Nikon NEFs is missing or zero-length even
when IFD0 (PRIMARY) carries a perfectly good 1-2 MP reduced-resolution
preview the camera writes for in-body review. The previous behavior
produced black thumbs on disk: the buggy IFD1 pointer resolved to a
short byte sequence that happened to satisfy the SOI sanity check,
image::load_from_memory accepted it, and the resize path quietly wrote
a black JPEG.

Now both IFDs are checked and the larger valid JPEG wins. Format-
agnostic: applies to every TIFF-based RAW (NEF / ARW / CR2 / DNG / RAF /
ORF / RW2 / PEF / SRW / TIFF). is_tiff_raw is now pub so main.rs can
gate its full-size handler on it.

Also extends the /image handler so size=full requests for RAW formats
serve the embedded preview as image/jpeg instead of NamedFile-streaming
the original RAW bytes - browsers can't decode a .nef container, so
<img src=...> would otherwise land as a broken image. Falls through to
NamedFile if no preview is present, preserving the historical behavior
for callers that genuinely want the original bytes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 16:52:10 +00:00
cameron a53c3ae514 Merge pull request 'feature/exif-batch-endpoint for Apollo' (#58) from feature/exif-batch-endpoint into master
Reviewed-on: #58
2026-04-28 12:58:30 +00:00
Cameron Cordes 7621282419 Thumb orientation + library filter on /photos/exif
Two follow-ups on the same feature branch:

1. Bake EXIF orientation into generated thumbnails. The `image` crate
   doesn't apply Orientation on load, and `save_with_format(..Jpeg)`
   drops EXIF — so portrait phone shots ended up sideways in any client
   that displays the cached thumb directly (no EXIF tag for the browser
   to compensate from). New `exif::read_orientation` reads the tag
   cheaply (no full EXIF parse) and `exif::apply_orientation` does the
   rotate/flip via image's existing `rotate90/180/270` + `fliph/flipv`.
   Applied in both branches of `generate_image_thumbnail` (RAW embedded-
   JPEG path and the regular `image::open` path). Existing thumbnails
   in the cache are still wrong-orientation; wipe the thumb dir or run
   a one-off backfill once this lands.

2. Optional `library` query param on `/photos/exif`. Accepts numeric id
   or name (same shape as `/image?library=...`), resolved via the
   existing `resolve_library_param` helper so a bad value 400s before
   we touch the DAO. Filter is applied post-query in the handler
   rather than pushed into `query_by_exif` to keep the DAO trait
   (and its test mocks) unchanged. Cheap enough at typical library
   counts; can be moved into SQL later if it ever isn't.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:29:36 -04:00
Cameron Cordes c6f82ebaba Batch EXIF endpoint: GET /photos/exif
Adds a single round-trip projection of `image_exif` for every photo whose
`date_taken` falls in `[date_from, date_to]`. Wraps the existing
`ExifDao::query_by_exif` DAO method which already handles the SQL filter
in one query against the covering index — the only missing piece was
HTTP plumbing.

Designed for window-scoped consumers like Apollo's photo-to-track
matcher, which currently does N+1 (one `/photos` listing + one
`/image/metadata` per photo). Because `/image/metadata` serializes on
`Data<Mutex<dyn ExifDao>>`, that pattern can take 10s+ for windows with
hundreds of photos. The new endpoint takes one mutex acquisition for
the whole batch.

Response shape:
  { photos: [
      { file_path, library_id, library_name,
        camera_model, width, height,
        gps_latitude, gps_longitude, date_taken } ],
    total: N }

Two notes on scope:
- Photos with NULL `date_taken` are excluded by `query_by_exif`'s
  semantics. Filename-extracted dates are not synthesized here; rare
  callers that need that fallback can still hit `/image/metadata`.
- GPS columns are stored as f32 in image_exif to keep row size small;
  the JSON shape widens to f64 so clients don't have to know about the
  on-disk precision.

Library names are pre-mapped from `app_state.libraries` once and
stamped on each row, avoiding an O(rows × libraries) linear scan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:38:53 -04:00
cameron 9cf3af383d Merge pull request '006-bin-cleanup-and-progress' (#57) from 006-bin-cleanup-and-progress into master
Reviewed-on: #57
2026-04-27 20:28:32 +00:00
Cameron b9d5578653 feat(bins): multi-library populate_knowledge + progress UX
populate_knowledge now loads real libraries from the DB instead of
fabricating a single library_id=1 row from BASE_PATH. Adds --library
<id|name> to restrict the walk and validates --path against the selected
library roots. The full library set is still passed to InsightGenerator so
resolve_full_path can probe every root when an insight resolves to a
different library than the one being walked.

Adds indicatif progress bars across the long-running utility binaries via
a shared src/bin_progress.rs helper (determinate bar + open-ended spinner
with consistent styling). Per-batch info! noise is replaced by the bar's
throughput/ETA; warnings and errors route through pb.println so they
scroll above the bar instead of fighting with it.

  populate_knowledge   spinner during scan, determinate bar over all libs
  backfill_hashes      spinner with running hashed/missing/errors counts
  import_calendar      determinate bar; embedding/store failures inline
  import_location_*    determinate bar advancing by chunk size
  import_search_*      determinate bar; pb cloned into the spawn task
  cleanup_files P1     determinate bar over DB paths
  cleanup_files P2     determinate bar; pb.suspend() around y/n/a/s prompt

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:55:33 -04:00
Cameron d5f944c7b6 chore(bins): retire unused migrate_exif
Single-library hardcoded (library_id=1) and missing content_hash/size_bytes
backfill, so the watcher's full-scan path subsumes everything it does.
Removed the binary and its CLAUDE.md reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:55:06 -04:00
cameron 2db611e1c1 Merge pull request 'OpenRouter Support, Insight Chat and User injection' (#56) from 005-llm-client-trait into master
Reviewed-on: #56
2026-04-26 23:01:33 +00:00
Cameron 21e624da6b fix(video): sentinel for failed HLS encodes to stop retry loop
Previously a corrupt source (e.g. truncated mp4 with no moov atom) would be
re-queued on every directory scan: cleanup_partial_hls wipes the temp
playlist on ffmpeg failure, leaving no .m3u8 to short-circuit the next pass.

Mirrors the thumbnail .unsupported sentinel pattern: on ffmpeg failure,
write <playlist>.m3u8.unsupported, and treat its presence as "done" in both
the ScanDirectoryMessage filter and the QueueVideosMessage check. Delete
the sentinel to force a retry.
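
A minimal sketch of the sentinel check, assuming the sentinel is simply the
playlist path with ".unsupported" appended. `sentinel_path` and `is_done`
are hypothetical helper names, not the functions used by the actual
message handlers.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Sentinel path for a playlist: the full playlist path with
/// ".unsupported" appended, e.g. "clip.m3u8" -> "clip.m3u8.unsupported".
pub fn sentinel_path(playlist: &Path) -> PathBuf {
    let mut s: OsString = playlist.as_os_str().to_owned();
    s.push(".unsupported");
    PathBuf::from(s)
}

/// A video counts as "done" when either the playlist or its sentinel
/// exists, so a failed encode is not re-queued on the next scan.
pub fn is_done(playlist: &Path) -> bool {
    playlist.exists() || sentinel_path(playlist).exists()
}
```

Appending to the whole path, rather than swapping the extension, keeps the
sentinel next to the playlist it describes.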

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 01:06:13 -04:00
Cameron 021d1bffc0 chore: ignore db backups and local .idea config files
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:13:28 -04:00
Cameron fa21b0d73d chore(ai): disable default few-shot insight ids
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:12:25 -04:00
Cameron 0e55a6b125 fix(ai): treat rewind at end of history as no-op success
The mobile client's regenerate-after-failure flow sends a discard index
equal to the server's rendered count (its optimistic user bubble for the
failed turn was never persisted). find_raw_cut treated this as out of
range, surfacing as "Chat rewind failed: discard_from_rendered_index out
of range" and blocking the retry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:12:17 -04:00
Cameron 0ebc2e9003 feat(ai): rerank timing + think:false + OpenRouter error detail
- search_rag reranker now logs wall-clock time around the ollama.generate
  call, the candidate count / top-N going in, and the final reordering.
  The "final indices" + swap-count line is info level so it's always
  visible; detailed before/after previews stay at debug for when you want
  to inspect reranker quality.
- New OllamaClient::generate_no_think convenience that sets Ollama's
  top-level think:false on the request, plumbed through try_generate via
  a new internal generate_with_options. Used only by the reranker today;
  avoids the chain-of-thought tax on reasoning models (Qwen3/VL,
  DeepSeek-R1 distills, GPT-OSS) when the task has nothing to reason
  about. Server-side no-op on non-reasoning models.
- OpenRouter chat_with_tools "missing choices[0]" error now includes the
  actual response body — extracts structured {error: {code, message}}
  when OpenRouter surfaces it (common for upstream-provider issues like
  rate limits and content moderation), otherwise falls back to a
  truncated raw-JSON view.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 16:19:45 -04:00
Cameron e5781325c6 fix(ai): render tool-call arguments as compact JSON in logs
Switch the "Agentic tool call" log from {:?} (Debug) to {} (Display) on
serde_json::Value. Display produces compact JSON — `{"date":"2023-08-15"}`
instead of `Object {"date": String("2023-08-15")}` — which is what the
model actually sent and what a human reading the log wants to see.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:25:53 -04:00
Cameron d43f5fc991 docs: document OLLAMA_REQUEST_TIMEOUT_SECONDS env var
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:54:23 -04:00
Cameron f0ae9f95dc feat(ai): few-shot exemplars + sticky Ollama preference
- Few-shot injection on /insights/generate/agentic: compresses prior
  training_messages into trajectory blocks (tool calls + result summaries)
  and injects into the system prompt. Hardcoded default ids with optional
  request override.
- New fewshot_source_ids column on photo_insights (+ migration) to track
  which exemplars influenced a given row, for downstream training-set
  filtering. Chat amend rows stamp None with a lineage note.
- Ollama client now remembers which server (primary/fallback) most
  recently succeeded and tries it first on the next call, via a shared
  Arc<AtomicBool>. Avoids re-404ing the primary on every agent iteration
  when the chosen model only lives on the fallback.
- Demote noisy logs: daily_summary "Summary match" lines to debug;
  inner chat_with_tools non-2xx body log from error to warn (outer
  layer owns the terminal-error signal).
- Drift-guard tests for summarize_tool_result covering the success /
  empty / error / unknown shapes for every tool.
- Tidy: three pre-existing clippy warnings cleaned up.
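
The sticky server preference from the list above can be sketched with a
shared Arc<AtomicBool>. Everything here is illustrative: `StickyPreference`,
`Server`, and `call` are hypothetical names, and the real client's retry
and error handling are certainly richer than this.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Server {
    Primary,
    Fallback,
}

/// Clones share one flag, so every holder sees the latest preference.
#[derive(Clone)]
pub struct StickyPreference {
    prefer_fallback: Arc<AtomicBool>,
}

impl StickyPreference {
    pub fn new() -> Self {
        Self { prefer_fallback: Arc::new(AtomicBool::new(false)) }
    }

    /// Order in which servers should be tried for the next call.
    pub fn order(&self) -> [Server; 2] {
        if self.prefer_fallback.load(Ordering::Relaxed) {
            [Server::Fallback, Server::Primary]
        } else {
            [Server::Primary, Server::Fallback]
        }
    }

    fn record_success(&self, server: Server) {
        self.prefer_fallback
            .store(server == Server::Fallback, Ordering::Relaxed);
    }

    /// Try the preferred server first, then the other one; remember
    /// whichever succeeded so the next call starts there.
    pub fn call<T, E>(&self, mut attempt: impl FnMut(Server) -> Result<T, E>) -> Result<T, E> {
        let [first, second] = self.order();
        match attempt(first) {
            Ok(v) => {
                self.record_success(first);
                Ok(v)
            }
            Err(_) => {
                let v = attempt(second)?;
                self.record_success(second);
                Ok(v)
            }
        }
    }
}
```

A single boolean is enough because there are only two servers; Relaxed
ordering suffices since the flag is a hint, not a synchronization point.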

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:54:06 -04:00
Cameron 29f32b9d22 FFMPEG playlist improvements
Better playlist management, .tmp renaming, and tweaks to HLS playlist parameters and concurrency.
2026-04-24 10:08:03 -04:00
Cameron 13b9d54861 fix(scan): quiet startup scans & thumbnail RAW/HEIC
Three recurring issues on every full scan:

1. Video playlist scans re-enqueued every file only to reject it as
   AlreadyExists. Pre-filter in ScanDirectoryMessage and QueueVideosMessage
   so we skip videos whose .m3u8 already exists, and demote the leaked
   AlreadyExists log to debug.

2. image crate was built with only jpeg/png features, so webp/tiff/avif
   files logged "format not supported" every scan. Enable those features.

3. RAW (ARW/NEF/CR2/...) and HEIC thumbnails weren't generated, so the
   scan kept retrying them. Try the file's embedded JPEG preview via
   kamadak-exif first (fast, pure-Rust, works on Sony ARW where ffmpeg's
   TIFF decoder fails). Fall back to ffmpeg for HEIC/HEIF and RAWs with
   no preview. Anything still undecodable gets a <thumb>.unsupported
   sentinel so future scans skip it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:47:13 -04:00
Cameron dc2a96162e fix(dates): prefer earliest of fs created/modified as fallback
On copied or restored files (e.g. a backup library), the OS stamps
created at copy time while modified is preserved from the source, so
the earlier of the two is a better proxy for when the content
originated. Adds utils::earliest_fs_time and threads it through the
three spots that fall back to filesystem dates: photos-list sort,
memories grouping, and insight-generation timestamp.
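
The fallback rule can be sketched as below. The signature is an
assumption: the real utils::earliest_fs_time may take a Metadata or a
path rather than two optional timestamps.

```rust
use std::time::SystemTime;

/// Return the earlier of the two filesystem timestamps, or whichever one
/// is available when the platform only reports one of them.
pub fn earliest_fs_time(
    created: Option<SystemTime>,
    modified: Option<SystemTime>,
) -> Option<SystemTime> {
    match (created, modified) {
        (Some(c), Some(m)) => Some(c.min(m)),
        (c, m) => c.or(m),
    }
}
```

For a file copied today from a 2019 original, created is today and
modified is 2019, so the 2019 timestamp is returned.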

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:20:12 -04:00
Cameron d54419e779 style: cargo fmt drift
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:19:59 -04:00
Cameron aa651d1c7b feat(ai): iteration budget in prompt + preserve photo-knowledge links
- Inject the max-iterations budget into the agentic system prompt for
  both insight generation and chat turns. Chat does this per-turn by
  appending a note to the replayed system message and restoring it
  before persistence so the note doesn't accumulate across turns.
- Stop deleting entity_photo_links at the start of agentic insight
  generation. The clear made recall_facts_for_photo always return
  empty, wasting a tool call and discarding knowledge from prior runs.
  Re-linking the same entity is already an INSERT OR IGNORE no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:28:48 -04:00
Cameron 6831f50993 feat(ai): USER_NAME env + shared summary prompt + test-bin knobs
Introduces USER_NAME (default "Me") as the single source for the message
sender label and the first-person persona across daily summaries, SMS
context, insight generation, and chat. Eliminates the "Me:" transcript /
"what I did" ambiguity that confused smaller models, and unhardcodes
"Cameron" from prompt text + the knowledge-graph owner entity. Set
USER_NAME=Cameron in .env to preserve the existing owner entity row
(keyed on UNIQUE(name, entity_type)) — otherwise the next run creates
a fresh owner entity and orphans the existing facts/photo-links.

Also:
- search_messages redirect: when the model calls it with date/contact
  but no query, return a hint pointing at get_sms_messages instead of
  a bare missing-parameter error (prevents same-turn retry loops)
- sharpen search_messages vs get_sms_messages tool descriptions so
  content-vs-time-based intent is unambiguous
- extract build_daily_summary_prompt (+ DAILY_SUMMARY_MESSAGE_LIMIT,
  DAILY_SUMMARY_SYSTEM_PROMPT) shared by daily_summary_job and
  test_daily_summary binary — prompt tweaks now land in both
- EMBEDDING_MODEL const; fixes both insert sites that stored
  "mxbai-embed-large:335m" while generate_embeddings actually runs
  "nomic-embed-text:v1.5"
- test_daily_summary: add --num-ctx / --temperature / --top-p /
  --top-k / --min-p flags wired into OllamaClient setters, and print
  the configured knobs at the top of each run
- OllamaClient::generate now logs prompt/gen token counts and tok/s
  via log_chat_metrics (symmetric with chat_with_tools)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 23:39:37 -04:00
Cameron e4a3536f87 feat(ai): search_messages tool + RAG reranker
Adds a search_messages tool that hits the Django FTS5/semantic/hybrid
endpoint for keyword-quality text search over message bodies, and an
LLM-based reranker inside tool_search_rag (gated by SEARCH_RAG_RERANK,
default on). Reranker pulls ~3x candidates from the vector index, asks
the chat model to rank by relevance, and falls back to vector order on
parse failure.

The reranker shares the active chat turn's OllamaClient so num_ctx and
sampling match — otherwise Ollama unloads/reloads the model on every
rerank call. (Unverified end-to-end; caught by inspection, awaiting
e2e confirmation.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 10:56:03 -04:00
Cameron e51cd564a3 docs: chat continuation endpoints + env vars
Document the four new chat endpoints, SSE event shape, backend
routing rules, rewind semantics, amend mode, and the
AGENTIC_CHAT_MAX_ITERATIONS cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:32:43 -04:00
Cameron 079cd4c5b9 feat(ai): streaming chat endpoint with live tool events
Add LlmClient::chat_with_tools_stream and SSE endpoint
POST /insights/chat/stream that emits text deltas, tool_call /
tool_result pairs, truncated notice, and a terminal done frame as the
agentic loop runs.

- Ollama: parses NDJSON from /api/chat stream, accumulates content
  deltas, emits Done with tool_calls from the final chunk.
- OpenRouter: parses OpenAI-compatible SSE, reassembles tool_call
  argument deltas by index, asks for stream_options.include_usage.
- InsightChatService spawns the loop on a tokio task, feeds events
  through an mpsc channel, persists training_messages at the end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:57:41 -04:00
Cameron c2bd3c08e1 feat(ai): surface tool invocations in chat history
load_history now groups preceding tool_call + tool_result scaffolding
under each assistant reply as `tools: [{name, arguments, result}]`.
Result bodies over 2000 chars are truncated for payload size with a
`result_truncated` flag; the full value remains in training_messages.
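
A sketch of the payload truncation, assuming the 2000 limit counts Unicode
scalar values rather than bytes (the commit does not say which).
`truncate_result` and `MAX_RESULT_CHARS` are hypothetical names.

```rust
/// Assumed cap on the result body returned to clients.
pub const MAX_RESULT_CHARS: usize = 2000;

/// Return the (possibly shortened) result plus a flag that corresponds to
/// `result_truncated`. Cutting at a char boundary avoids splitting a
/// multi-byte UTF-8 sequence.
pub fn truncate_result(result: &str) -> (String, bool) {
    match result.char_indices().nth(MAX_RESULT_CHARS) {
        // A char exists past the cap: keep everything before it.
        Some((byte_idx, _)) => (result[..byte_idx].to_string(), true),
        // At or under the cap: return the body unchanged.
        None => (result.to_string(), false),
    }
}
```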

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:03:53 -04:00
Cameron 65ab10e9a8 feat(ai): chat rewind + ollama metrics logging
Rewind: POST /insights/chat/rewind truncates training_messages at a
given rendered index, dropping the target message plus any preceding
tool-call scaffolding. The initial user prompt is protected.

Metrics: log prompt_eval_count/duration and eval_count/duration from
every Ollama chat response, rendered as tokens + ms + tok/s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 15:16:32 -04:00
Cameron 0b9528f61e feat(ai): chat continuation for photo insights (server v1)
Adds POST /insights/chat and GET /insights/chat/history. Replays the
stored agentic conversation through the same backend the insight was
generated with (or a per-turn override), runs a short tool-calling
loop, and persists the extended history in append or amend mode.

Backend switching: same-backend or hybrid->local replay verbatim;
local->hybrid is rejected in v1 (would require on-the-fly vision
description rewrite).

Per-(library, file) async mutex serialises concurrent turns. Soft
context budget drops oldest tool_call+result pairs when the
serialized history exceeds num_ctx - 2048 tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 13:00:27 -04:00
Cameron e2eefbd156 feat(ai): curated OpenRouter model picker for hybrid backend
Add OPENROUTER_ALLOWED_MODELS env var and GET /insights/openrouter/models
endpoint returning the curated list verbatim. Drop the live capability
precheck in hybrid mode — trust the operator's allowlist; bad ids surface
as a chat-call error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:36:19 -04:00
Cameron 3ac0cd62eb feat(ai): hybrid backend mode for agentic insights
Adds a `backend` column to photo_insights (default 'local', migration
2026-04-20-000000) and a corresponding optional `backend` field on the
agentic request. When a request sets backend=hybrid:

- The local Ollama vision model is called once via describe_image to
  produce a text description.
- The description is inlined into the first user message as text —
  no base64 image is ever sent to the chat model.
- The agentic tool-calling loop and title generation route through an
  OpenRouterClient (dispatched via &dyn LlmClient), letting the user
  pick any tool-capable model from OpenRouter per request.
- describe_photo is removed from the offered tools since the description
  is already present.

Embeddings and vision stay on local Ollama regardless of backend.
Hybrid mode requires OPENROUTER_API_KEY; handlers return a clear error
when hybrid is requested without it, and also when the selected
OpenRouter model lacks tool-calling support.

AppState gains an optional openrouter client built from
OPENROUTER_API_KEY / OPENROUTER_BASE_URL / OPENROUTER_DEFAULT_MODEL /
OPENROUTER_EMBEDDING_MODEL / attribution headers. Default model is
anthropic/claude-sonnet-4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 22:30:40 -04:00
Cameron e799ba716c feat(ai): add OpenRouterClient implementing LlmClient
OpenAI-compatible client for OpenRouter. Translates canonical wire shapes at
the boundary: tool-call arguments stringify on send / parse on receive
(accepting both string and native-object forms); images rewritten from the
base64 images field into content-parts with image_url entries; role=tool
messages inherit tool_call_id from the preceding assistant's tool calls.

/models parsed into ModelCapabilities via supported_parameters (tool use)
and architecture.input_modalities (vision). 15-minute capabilities cache.
Bearer auth; HTTP-Referer / X-Title attribution headers optional.

Not wired into request routing yet — first consumer arrives with hybrid
backend mode. 11 unit tests cover the translation helpers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 22:18:29 -04:00
Cameron 0073409b3d refactor: introduce LlmClient trait (no-op)
Preparation for a second LLM backend (OpenRouter) and hybrid vision-local /
chat-remote mode. Shared wire types (ChatMessage, Tool, ToolCall, etc.) move
into a new src/ai/llm_client.rs and are re-exported from ollama.rs so
existing imports keep working. OllamaClient now implements LlmClient.

No behavior change; callers still hold the concrete OllamaClient. Caller
migration to Arc<dyn LlmClient> is deferred to the PR that wires hybrid
backend routing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 22:11:05 -04:00
cameron 702aa8078c Merge pull request '004 Multi-library Support' (#54) from 004-multi-library into master
Reviewed-on: #54
2026-04-21 01:55:22 +00:00
Cameron bffe604527 Remove potentially confusing TZ from insight generator 2026-04-21 01:55:07 +00:00
Cameron 39c212b0e6 Bump to 1.0.0 for multi-library support 2026-04-21 01:55:07 +00:00
Cameron a35b45fd36 feat: expand insight tool result caps and render timestamps in local time
Doubled default row caps for search_rag/get_sms_messages/get_calendar_events/recall_entities and exposed an optional `limit` parameter on each so the agent can tune per call. Render all LLM-facing timestamps as server-local time with explicit offset so smaller models stop misreading UTC as wall-clock time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron 3027a3ffda perf: DB-backed recursive /photos + watcher reconciliation
Recursive listings now query image_exif instead of walking disk, taking
union-mode /photos from ~17s to sub-second on a 10k-file library. The
watcher's full scan prunes stale image_exif rows so the DB stays in
parity with the filesystem when files are deleted externally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron 4a775b5e9b test: cover resolve_library_param and per-library ExifDao filter
Adds 9 unit tests around the library plumbing:
- resolve_library_param branches (absent, empty/whitespace, numeric id,
  name, unknown id, unknown name)
- Library::resolve symmetry with strip_root
- ExifDao::get_all_with_date_taken in union and scoped modes

Introduces SqliteExifDao::from_connection test constructor mirroring the
existing preview_dao pattern so DAO tests can drive an in-memory SQLite.
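
For orientation, the branches listed above can be sketched like this. It
is a guess at the shape, not the real helper: the `Library` struct is
reduced to two fields, the String error type is a placeholder, and
treating an absent or blank value as "no library selected" is an
assumption.

```rust
/// Reduced stand-in for the real Library type.
pub struct Library {
    pub id: i32,
    pub name: String,
}

/// Resolve an optional `library` query value to a configured library.
/// Absent or blank -> Ok(None); numeric -> lookup by id; otherwise ->
/// lookup by name; no match -> Err.
pub fn resolve_library_param<'a>(
    libraries: &'a [Library],
    param: Option<&str>,
) -> Result<Option<&'a Library>, String> {
    let raw = match param.map(str::trim) {
        None | Some("") => return Ok(None),
        Some(s) => s,
    };
    let found = match raw.parse::<i32>() {
        Ok(id) => libraries.iter().find(|l| l.id == id),
        Err(_) => libraries.iter().find(|l| l.name == raw),
    };
    found
        .map(Some)
        .ok_or_else(|| format!("unknown library: {raw}"))
}
```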

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron b04dd8b601 fix: demote path-not-exists validation errors to debug
The /image cross-library fallback tries the resolved library first and falls
back to any library holding the rel_path. The first attempt emitted error-level
noise on every grid tile in union mode. Split the validation error so only
traversal attempts log at error; missing-file cases log at debug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron 2c8de8dcc6 feat: union /photos and /memories across libraries
When `library` is omitted, both endpoints now walk every configured
library root, interleave the results, and tag each row with its source
library via the parallel `photo_libraries` / per-row `library_id`
arrays. Previously the handlers fell back to the primary library,
silently hiding the rest.

Threads a parallel `file_libraries: Vec<i32>` through the sort/paginate
helpers so library attribution survives sorting and pagination.
Directory names are de-duplicated across libraries.

`get_all_with_date_taken` grows an optional library filter so memories
can scope its EXIF query per-library during the union walk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron 586b735af5 feat: include per-photo library id in /photos response
Adds a parallel `photo_libraries: Vec<i32>` array alongside `photos`
in `PhotosResponse` so clients can render per-thumbnail badges.
Populated with the scoped library id at the two main return sites;
left empty for `/favorites` since favorites are library-agnostic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron c2ee3996be chore: apply cargo fmt + clippy cleanup across crate
Silence forward-looking dead_code on unused DAO modules, annotate
individual placeholder items, rewrite tautological assert!(true/false)
in token tests as panic! arms, and pick up fmt drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron a0f3bfab5f fix: validate gps-summary path against every library
The /photos/gps-summary handler validated the incoming path against
the primary library's root with new_file=false, which requires the
path to exist on disk. For a viewer opened on a file from a
non-primary library, tapping the GPS link produced activePath =
<folder from lib 2>, the primary-only check failed, and the server
400'd — so the map came up empty.

Validation here is purely a traversal guard (the DAO does a prefix
LIKE against rel_path), so we now accept the path as long as any
configured library can resolve it without escaping its root.

Also applies cargo fmt drift on files touched this session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron 54a1df60b8 fix: resolve preview clip rel_path against all libraries
PreviewClipGenerator stripped a single base_path, so videos in a
non-primary library ended up with the absolute path as 'relative'.
On Windows, PathBuf::from(preview_clips_dir).join(absolute) discards the
base and returns the absolute path, and .with_extension("mp4") on a .mp4 input
yields the input path — ffmpeg then errors out with 'cannot edit
existing files in place'.

The generator now holds Vec<Library> and strips whichever root
actually contains the video, with separator normalization to match
the rest of the code.
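
A sketch of the root-stripping step, with plain PathBuf roots standing in
for Vec<Library>. `rel_path_in_libraries` is a hypothetical name, and
returning None when no root matches is an assumption about how the
generator treats files outside every library.

```rust
use std::path::{Path, PathBuf};

/// Strip whichever library root actually contains `video`, then normalize
/// separators to '/' so the result matches stored rel_paths.
pub fn rel_path_in_libraries(roots: &[PathBuf], video: &Path) -> Option<String> {
    roots
        .iter()
        .find_map(|root| video.strip_prefix(root).ok())
        .map(|rel| rel.to_string_lossy().replace('\\', "/"))
}
```

Returning a genuinely relative path matters because Path::join replaces
the base whenever its argument is absolute, which is how the output path
ended up identical to the input.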

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron 7becbc0737 fix: normalize rel_path separators in non-recursive /photos listing
On Windows, strip_prefix preserves backslashes, so the non-recursive
branch was looking up tags for 'Melissa\img1.jpg' while tagged_photo
stores 'Melissa/img1.jpg' — every file was filtered out. Normalize to
'/' to match the watcher and populate_knowledge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron e6ee38edec fix: resolve media across libraries for video, metadata, and insights
The /video/generate and /image/metadata handlers assumed files live under
the resolved library only, which broke when a mobile client passed no
library (union mode) but the file lived in a non-primary library. Both
now fall back to scanning every configured library for an existing file.

InsightGenerator held a single base_path, so vision-model loads and
filename-date fallbacks failed for non-primary libraries. It now takes
Vec<Library> and probes each root in resolve_full_path.

/image/metadata responses now carry library_id/library_name so the
mobile viewer can surface which library a file belongs to.

Thumbnail generation at startup is now spawned on a background thread
so the HTTP server can accept traffic while large libraries backfill.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron 2d942a9926 feat: content-hash-aware tag/insight sharing + library scoping
Tags and insights now follow content across libraries via content_hash
lookups on the read path, so the same file indexed at different rel_paths
in multiple libraries shares its annotations. Recursive tag search scopes
hits to the selected library by checking each tagged rel_path against
the library's disk (with a content-hash sibling fallback so tags attached
under one library's rel_path still match a content-equivalent file in
another). The /image and /image/metadata handlers fall back across
libraries when the file isn't under the resolved one, so union-mode
search results (which carry no library attribution in the response)
still serve correctly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron c01a0479b7 fix: honor library param in /image, /photos, /memories
The Phase 3 plumbing accepted `library=` but didn't actually route
requests through the scoped library once it was resolved. Three
concrete bugs surfaced when testing against a second mounted library:

- `/image` always resolved paths against AppState.base_path (primary),
  so thumbnails for non-primary libraries 400'd when their rel_paths
  didn't exist under primary. Now resolves against the scoped library
  and defaults to primary when the param is omitted.

- `/memories` walked the scoped library correctly but its helper
  functions hardcoded `library_id: PRIMARY_LIBRARY_ID` on every
  MemoryItem, causing clients to route thumbnails back to primary
  regardless of which library the memory actually came from.

- `/photos` non-recursive listing delegated to a `RealFileSystem`
  constructed from AppState.base_path at startup, so walks always
  hit primary even when `library=2` was passed. The non-primary
  path now uses list_files against the scoped library's root;
  primary still goes through FileSystemAccess to preserve the
  existing test mock plumbing.

Also adds `library` to ThumbnailRequest so the /image query param
is actually parsed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron 0aaea91cc2 feat: add content_hash backfill + register every media file
Adds blake3 content hashing as the basis for derivative dedup
(thumbnails, HLS) across libraries. Computed inline by the watcher on
ingest and by a new `backfill_hashes` binary for historical rows.

Key changes:
- `content_hash` and `size_bytes` are now populated on new image_exif
  rows; a new ExifDao surface (`get_rows_missing_hash`,
  `backfill_content_hash`, `find_by_content_hash`) supports backfill and
  future hash-keyed lookups.
- The watcher now registers every image/video in image_exif, not just
  files with parseable EXIF. EXIF becomes optional enrichment; videos
  and other non-EXIF files still get a hashed row. This also makes
  DB-indexed sort/filter cover the full library.
- `/image` thumbnail serving looks up the hash-keyed path first, then
  falls back to the legacy mirrored layout.
- Upload flow accepts `?library=` query param + hashes uploaded files.
- `store_exif` logs the underlying Diesel error on insert failure so
  constraint violations surface instead of hiding behind a generic
  InsertError.
- New migration normalizes rel_path separators to forward slash across
  all tables, deduplicating any rows that collide after normalization.
  Fixes spurious UNIQUE violations from mixed backslash/forward-slash
  paths on Windows ingest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron ce5b337582 feat: make file watcher, thumbnails, and upload library-aware
`watch_files` and `create_thumbnails` now iterate every configured
library, tagging rows with the correct `library_id`. `process_new_files`
takes a `&Library` so InsertImageExif no longer hardcodes the primary
library. Upload accepts an optional `library` query param to pick a
target library; omitted still defaults to primary for backwards
compatibility.

Hash-keyed thumbnail/HLS storage with dual-lookup fallback is deferred
to Phase 5, where it's bundled with the content hash backfill that
actually makes the hash-keyed paths meaningful. Until hashes are
populated, changing the legacy mirrored layout would be a no-op.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron 48e5de6eab feat: add GET /libraries and library query param plumbing
New `/libraries` endpoint returns configured libraries so clients can
discover them. `FilesRequest` and `MemoriesRequest` gain an optional
`library` param (accepts name or numeric id). Unknown values are
rejected with 400; absent values span all libraries. `/memories`
now scopes its filesystem walk + EXIF query to the resolved library.
`MemoryItem` carries `library_id` so union-mode clients can render a
per-item source badge.

Behavior is unchanged in single-library mode: omitting `library` still
returns results from the primary library, which is the only one
configured until a second row is added to the libraries table.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron ffcddbb843 feat: multi-library foundation (schema + libraries module)
Adds a `libraries` registry table and threads library_id through
per-instance metadata tables (image_exif, photo_insights,
entity_photo_links, video_preview_clips). File-path columns renamed to
rel_path to make the relative-to-root semantics explicit. Adds
content_hash + size_bytes on image_exif to support future hash-keyed
thumbnail/HLS dedup. Tags and favorites stay library-agnostic so they
share across libraries by rel_path.

Behavior is unchanged: a single primary library (id=1) is seeded from
BASE_PATH on first boot; all handlers and DAOs route through it as a
transitional shim until the API gains a library query param.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
cameron 2f4edba08c Merge pull request '003-knowledge-memory' (#55) from 003-knowledge-memory into master
Reviewed-on: #55
2026-04-21 01:54:34 +00:00
148 changed files with 46099 additions and 5420 deletions
+3
@@ -0,0 +1,3 @@
[target.x86_64-unknown-linux-gnu]
linker = "/usr/bin/gcc"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
+141
@@ -0,0 +1,141 @@
# ImageApi configuration template. Copy to `.env` and fill in for your
# deploy. Comments mirror the canonical docs in CLAUDE.md — see there
# for the full picture (especially the AI-Insights / Apollo / face
# integration sections).
# ── Required ────────────────────────────────────────────────────────────
DATABASE_URL=./database.db
BASE_PATH=/path/to/media
THUMBNAILS=/path/to/thumbnails
VIDEO_PATH=/path/to/video/hls
GIFS_DIRECTORY=/path/to/gifs
PREVIEW_CLIPS_DIRECTORY=/path/to/preview-clips
BIND_URL=0.0.0.0:8080
CORS_ALLOWED_ORIGINS=http://localhost:3000
SECRET_KEY=replace-me-with-a-long-random-secret
RUST_LOG=info
# ── File watching ───────────────────────────────────────────────────────
# Quick scan = recently-modified-files only; full scan = comprehensive walk.
WATCH_QUICK_INTERVAL_SECONDS=60
WATCH_FULL_INTERVAL_SECONDS=3600
# Comma-separated path prefixes / component names to skip in /memories
# AND in face detection (e.g. @eaDir, .thumbnails, /private).
EXCLUDED_DIRS=
# ── Video / HLS ─────────────────────────────────────────────────────────
HLS_CONCURRENCY=2
HLS_TIMEOUT_SECONDS=900
PLAYLIST_CLEANUP_INTERVAL_SECONDS=86400
# ── Telemetry (release builds only) ─────────────────────────────────────
# OTLP_OTLS_ENDPOINT=http://localhost:4317
# ── AI Insights — Ollama (local LLM) ────────────────────────────────────
OLLAMA_PRIMARY_URL=http://localhost:11434
OLLAMA_PRIMARY_MODEL=nemotron-3-nano:30b
# Optional fallback server tried on connection failure.
# OLLAMA_FALLBACK_URL=http://server:11434
# OLLAMA_FALLBACK_MODEL=llama3.2:3b
OLLAMA_REQUEST_TIMEOUT_SECONDS=120
# Cap on tool-calling iterations per chat turn / agentic insight.
AGENTIC_MAX_ITERATIONS=6
AGENTIC_CHAT_MAX_ITERATIONS=6
# ── AI Insights — OpenRouter (hybrid backend, optional) ─────────────────
# Set OPENROUTER_API_KEY to enable the hybrid backend (vision stays
# local on Ollama, chat routes to OpenRouter).
# OPENROUTER_API_KEY=sk-or-...
# OPENROUTER_DEFAULT_MODEL=anthropic/claude-sonnet-4
# OPENROUTER_ALLOWED_MODELS=openai/gpt-4o-mini,anthropic/claude-haiku-4-5,google/gemini-2.5-flash
# OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
# OPENROUTER_EMBEDDING_MODEL=openai/text-embedding-3-small
# OPENROUTER_HTTP_REFERER=https://your-site.example
# OPENROUTER_APP_TITLE=ImageApi
# ── AI Insights — local backend switch ──────────────────────────────────
# Picks which local LLM stack the server uses for chat, vision describe,
# and embeddings. `ollama` (default) uses the OLLAMA_* settings above;
# `llamacpp` uses the LLAMA_SWAP_* settings below. The switch is global
# and applies to both `backend=local` and `backend=hybrid` (hybrid keeps
# chat on OpenRouter but still uses this stack for the describe pass).
# Don't flip mid-deploy without re-embedding existing index rows —
# mixed vector spaces break similarity search.
# LLM_BACKEND=ollama
# ── AI Insights — llama.cpp / llama-swap (optional) ─────────────────────
# Set LLAMA_SWAP_URL plus LLM_BACKEND=llamacpp to swap the local stack
# off Ollama. Talks OpenAI-compatible /v1 to a llama-swap proxy fronting
# per-slot llama-server instances. Chat models receive images directly
# via content-parts (vision-capable models assumed); a separate vision
# slot is used only by the describe_photo tool and describe-image utility.
# LLAMA_SWAP_URL=http://localhost:9292/v1
# LLAMA_SWAP_PRIMARY_MODEL=chat
# Optional dedicated vision slot for describe_image. Defaults to
# PRIMARY_MODEL so describe_photo works without extra config.
# LLAMA_SWAP_VISION_MODEL=vision
# LLAMA_SWAP_EMBEDDING_MODEL=embed
# Comma-separated allowlist surfaced by /insights/models when
# LLM_BACKEND=llamacpp. All report has_vision=true.
# LLAMA_SWAP_ALLOWED_MODELS=chat,vision,embed
# LLAMA_SWAP_REQUEST_TIMEOUT_SECONDS=180
# ── Text-to-speech (optional, requires LLAMA_SWAP_URL) ───────────────────
# TTS routes through the same llama-swap proxy (a Chatterbox model id), so it
# only needs LLAMA_SWAP_URL — it does NOT require LLM_BACKEND=llamacpp.
# Powers POST /tts/speech and the /tts/voices* endpoints (read-aloud insights
# + voice cloning in the mobile app).
# LLAMA_SWAP_TTS_MODEL=chatterbox # TTS model id in config.yaml
# LLAMA_SWAP_TTS_VOICE=m # default voice when a request omits one
# LLAMA_SWAP_TTS_REF_SECONDS=30 # max voice-clone reference clip length (s)
# LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS=600 # synth timeout (long chunked text)
# ── AI Insights — sibling services (optional) ───────────────────────────
# Apollo (places, face inference, CLIP encoders). Single-Apollo deploys
# typically set only APOLLO_API_BASE_URL and let the face + CLIP
# clients fall back to it.
# APOLLO_API_BASE_URL=http://apollo.lan:8000
# APOLLO_FACE_API_BASE_URL=http://apollo.lan:8000
# APOLLO_CLIP_API_BASE_URL=http://apollo.lan:8000
# SMS_API_URL=http://localhost:8000
# SMS_API_TOKEN=
# Display name used in agentic prompts when the LLM refers to "you".
USER_NAME=
# ── Face detection (Phase 3+) ───────────────────────────────────────────
# Cosine-sim floor for auto-binding a detected face to an existing
# same-named person on detection. 0.4 ≈ moderate-confidence match.
FACE_AUTOBIND_MIN_COS=0.4
# Per-scan-tick fan-out into Apollo's detect endpoint. Apollo's GPU
# pool serializes server-side; this just overlaps file-IO with
# inference RTT.
FACE_DETECT_CONCURRENCY=8
# Per-detect HTTP timeout. CPU-only Apollo deploys may need higher.
FACE_DETECT_TIMEOUT_SEC=60
# Per-tick caps on the two backlog drains (independent of WATCH_*
# quick / full scans). Tune up if you have a large unscanned backlog
# and want it to clear faster; tune down if Apollo is overloaded.
FACE_BACKLOG_MAX_PER_TICK=64
FACE_HASH_BACKFILL_MAX_PER_TICK=2000
# ── CLIP semantic photo search ──────────────────────────────────────────
# ImageApi calls Apollo's /api/internal/clip/{encode_image,encode_text}
# to populate per-photo embeddings during the watcher's backlog drain
# and to encode user queries at /photos/search time. Disabled when
# neither APOLLO_CLIP_API_BASE_URL nor APOLLO_API_BASE_URL is set.
#
# Per-watcher-tick cap on the encode drain. Default 32 ≈ ~1 photo/sec
# on CPU, ~30 photos/sec on a single-GPU host (Apollo's threadpool
# is 1 on CUDA, so concurrency is bounded server-side regardless of
# our setting). Bump on a fresh deploy to clear the backlog faster.
CLIP_BACKLOG_MAX_PER_TICK=32
# Client-side parallel encode calls per drain pass. Apollo's GPU pool
# serializes server-side; this just overlaps file-IO with inference.
CLIP_ENCODE_CONCURRENCY=4
# Per-encode HTTP timeout. CPU-only Apollo deploys may need higher.
CLIP_REQUEST_TIMEOUT_SEC=60
# ── RAG / search ────────────────────────────────────────────────────────
# Set to `1` to enable cross-encoder reranking on /search results.
SEARCH_RAG_RERANK=0
+9
@@ -0,0 +1,9 @@
# Normalize line endings in the repo to LF. Windows checkouts can still
# present working-copy files as CRLF; this just keeps the committed history
# stable so contributors on any OS don't see whitespace-only diffs every
# time someone touches a file.
* text=auto eol=lf
# Migrations and SQL must be LF — SQLite parsers don't care, but diffing
# is much cleaner with stable endings.
*.sql text eol=lf
+9
@@ -1,12 +1,21 @@
/target
database/target
*.db
*.db.bak
*.db-shm
*.db-wal
.env
# Server-local TTS pronunciation overrides (tts_pronunciations.example.json is the template)
/tts_pronunciations.json
/tmp
/docs
/specs
# Default ignored files
.idea/shelf/
.idea/workspace.xml
.idea/inspectionProfiles/
.idea/markdown.xml
# Datasource local storage ignored files
.idea/dataSources*
.idea/dataSources.local.xml
+593 -5
@@ -69,9 +69,6 @@ cargo fix
```bash
# Two-phase cleanup: resolve missing files and validate file types
cargo run --bin cleanup_files -- --base-path /path/to/media --database-url ./database.db
# Batch extract EXIF for existing files
cargo run --bin migrate_exif
```
## Architecture Overview
@@ -79,7 +76,10 @@ cargo run --bin migrate_exif
### Core Components
**Layered Architecture:**
- **HTTP Layer** (`main.rs`): Route handlers for images, videos, metadata, tags, favorites, memories
- **Startup wiring** (`main.rs`): only ~350 lines — env load, migrations, AppState, route registration, server bind. Background jobs are kicked off here but defined elsewhere.
- **HTTP Layer** (`handlers/{image,video,favorites}.rs`, `files.rs`, `tags.rs`, `faces.rs`, `memories.rs`, `ai/handlers.rs`): the route handlers, grouped by domain.
- **Background loops** (`watcher.rs`): the file-watcher tick (`watch_files`, `process_new_files`) and the orphaned-playlist cleanup (`cleanup_orphaned_playlists`). Per-tick drains are factored into `backfill.rs` (`backfill_unhashed_backlog`, `backfill_missing_date_taken`, `backfill_missing_content_hashes`, `process_face_backlog`, `build_face_candidates`).
- **Thumbnails** (`thumbnails.rs`): generation pipeline + the `IMAGE_GAUGE` / `VIDEO_GAUGE` Prometheus metrics.
- **Auth Layer** (`auth.rs`): JWT token validation, Claims extraction via FromRequest trait
- **Service Layer** (`files.rs`, `exif.rs`, `memories.rs`): Business logic for file operations and EXIF extraction
- **DAO Layer** (`database/mod.rs`): Trait-based data access (ExifDao, UserDao, FavoriteDao, TagDao)
@@ -107,6 +107,242 @@ All database access goes through trait-based DAOs (e.g., `ExifDao`, `SqliteExifD
- `query_by_exif()`: Complex filtering by camera, GPS bounds, date ranges
- Batch operations minimize DB hits during file watching
### Multi-library data model
ImageApi supports more than one library (a library = a `(name, root_path)`
row in the `libraries` table that maps to a mounted directory tree). The
same bytes may exist under more than one library — typical case is an
"active" library plus an "archive" library that ingests files as they age
out — and the data model is designed so that derived data follows the
**bytes**, not the path, and user-managed data (tags, favorites) does too.
**The principle.** A photo's identity is its `content_hash` (blake3, see
`src/content_hash.rs`). Anything we compute from or attach to a photo is
keyed on that hash so it survives:
- the same file appearing in a second library (backup / archive / mirror),
- the file moving between libraries (recent → archive handoff),
- the file moving within a library (re-organized rel_path),
- intra-library duplicates (same bytes at two paths).
**Table classification.** Three categories drive the keying decision:
| Category | Key | Rationale | Tables |
|---|---|---|---|
| Intrinsic to bytes | `content_hash` | Rerunning is wasted work (or LLM cost) | `face_detections` ✓, `image_exif` (target), `photo_insights` (target), `video_preview_clips` (target) |
| User intent about a photo | `content_hash` | "Tag this photo" means the bytes, not a path | `tagged_photo` (target), `favorites` (target) |
| Library administrative | `(library_id, rel_path)` | Tied to a specific filesystem location | `libraries`, `entity_photo_links`, the `rel_path` back-ref columns on hash-keyed tables |
✓ = already implemented this way. *(target)* = today still keyed on
`(library_id, rel_path)` and slated for migration. The migration adds a
nullable `content_hash` column and populates it from `image_exif` where
known; read paths fall back to rel_path while the hash is null.
**Carrying a `rel_path` even when hash-keyed.** Hash-keyed tables retain
`(library_id, rel_path)` columns as a denormalized **back-reference**, not
as the key. This lets a single query answer "what is at this path right
now" without joining through `image_exif`, and supports the path-only
endpoints that predate the hash. `face_detections` is the reference
implementation: hash is the truth, path is a hint.
**Merge semantics on read.** When the same hash has rows under more than
one library:
- Set-valued data (tags, favorites, faces, entity links) → **union**.
- Scalar data (current insight, EXIF row, video preview clip) → earliest
`generated_at` / `created_time` wins. The historical lib1 row beats a
re-generated lib2 row, so the user's curated insight isn't shadowed by
a re-run on archive ingest.
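
The two merge rules above can be sketched as follows. This is a minimal illustration, not the project's implementation: `LibraryRow`, `merged_tags`, and `merged_insight` are hypothetical names, and timestamps are assumed to be Unix seconds.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-library view of one photo's annotations (not a real table struct).
#[derive(Debug, Clone)]
pub struct LibraryRow {
    pub library_id: i32,
    pub tags: Vec<String>,
    /// (generated_at in Unix seconds, insight text), if this library has one.
    pub insight: Option<(i64, String)>,
}

/// Set-valued data: union across every row that shares the content hash.
pub fn merged_tags(rows: &[LibraryRow]) -> BTreeSet<String> {
    rows.iter().flat_map(|r| r.tags.iter().cloned()).collect()
}

/// Scalar data: the row with the earliest `generated_at` wins.
pub fn merged_insight(rows: &[LibraryRow]) -> Option<&str> {
    rows.iter()
        .filter_map(|r| r.insight.as_ref())
        .min_by_key(|(generated_at, _)| *generated_at)
        .map(|(_, text)| text.as_str())
}
```

With a lib1 row generated earlier than a lib2 row, `merged_insight` returns the lib1 text while `merged_tags` returns the tags of both.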
**Write attribution.** A new tag/favorite/insight created while viewing
under lib2 binds to the bytes, not to lib2 — so it shows up under lib1
too. This is by design, but it's the most surprising rule on first
encounter; clients should not assume tags are library-scoped.
**Hash-less rows (transitional state).** During and immediately after a
new mount, `image_exif.content_hash` is being populated by
`backfill_unhashed_backlog` (capped per tick). Rules during this window:
- Writes: if the hash is known, write hash-keyed. If not, write
`(library_id, rel_path)`-keyed and let the reconciliation job collapse
duplicates once the hash lands.
- Reads: prefer hash key, fall back to `(library_id, rel_path)`.
- Reconciliation: a one-shot pass after every backfill tick collapses
rows that now share a hash, applying the merge semantics above.
Idempotent — safe to re-run.
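
A rough sketch of the transitional read/write rules, assuming a hypothetical `RowKey` enum; the real DAOs may model the key differently.

```rust
/// Hypothetical key used to address a derived row during the transition.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RowKey {
    Hash(String),
    Path { library_id: i32, rel_path: String },
}

/// Write rule: hash-keyed when the hash is known, path-keyed otherwise.
pub fn write_key(content_hash: Option<&str>, library_id: i32, rel_path: &str) -> RowKey {
    match content_hash {
        Some(h) => RowKey::Hash(h.to_string()),
        None => RowKey::Path { library_id, rel_path: rel_path.to_string() },
    }
}

/// Read rule: try the hash key first, then fall back to the path key.
pub fn read_keys(content_hash: Option<&str>, library_id: i32, rel_path: &str) -> Vec<RowKey> {
    let mut keys = Vec::with_capacity(2);
    if let Some(h) = content_hash {
        keys.push(RowKey::Hash(h.to_string()));
    }
    keys.push(RowKey::Path { library_id, rel_path: rel_path.to_string() });
    keys
}
```

A caller would probe the keys returned by `read_keys` in order and stop at the first hit; reconciliation later collapses any path-keyed row into its hash-keyed equivalent.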
**Library handoff (recent → archive).** When a file moves between
libraries (e.g. operator moves `~/photos/2024/IMG.nef` to the archive
mount), the file watcher sees the disappearance under lib1 and the
appearance under lib2. Hash-keyed rows don't need migration; the
`(library_id, rel_path)` back-ref columns are updated to point to the new
location. Library administrative rows (`entity_photo_links`,
`(library_id, rel_path)` rows in `image_exif` for hash-less items) are
re-keyed by the move detector, which matches a disappearance to an
appearance by `content_hash` within a configurable window.
**Orphans (source deleted while a copy survives).** When an
`image_exif` row for a hash is deleted (file removed from disk), the
hash-keyed derived rows survive **as long as another `image_exif` row
references the same hash**. If the last reference is gone, derived rows
are eligible for GC (deferred: the GC job runs on a slow schedule so
that a brief unmount or rename doesn't wipe history).
**Stats and counts.** When reporting "how many photos do you have," count
`DISTINCT content_hash` over `image_exif`, not row count. Faces stats
already does this (`FaceDao::stats` in `src/faces.rs`); other counters
should follow suit. Numerator and denominator must live in the same
domain — see the face-stats commentary below for the cautionary tale.
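
To make the counting rule concrete, here is a small in-memory sketch. `ExifRow` and `distinct_photo_count` are illustrative names only; the real counters run in SQL. Rows without a hash are skipped, mirroring how `COUNT(DISTINCT col)` ignores NULLs.

```rust
use std::collections::HashSet;

/// Hypothetical slice of an `image_exif` row.
pub struct ExifRow {
    pub library_id: i32,
    pub content_hash: Option<String>,
}

/// Counts photos by distinct content hash rather than by row.
pub fn distinct_photo_count(rows: &[ExifRow]) -> usize {
    rows.iter()
        .filter_map(|r| r.content_hash.as_deref())
        .collect::<HashSet<_>>()
        .len()
}
```

The same file indexed under two libraries contributes two rows but only one photo to the count.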
**Per-library scoping when the user asks for it.** A request scoped to
`?library=N` filters the `image_exif` view to that library, and the
hash-keyed derived data is joined through that view. The user sees only
photos that have a copy under lib N, but the derived data attached to
those photos is the merged hash-keyed view. This is the answer to "show
me archive photos with their original tags."
**Operator kill switch (`libraries.enabled`).** Setting `enabled=0` on a
library is a hard pause: the watcher skips it entirely — before the
probe, before ingest, before any maintenance pass — and the orphan-GC
all-online consensus check filters disabled libraries out (they don't
keep the GC window closed). Reads / serving are unaffected; nothing
prevents `/image?path=...` from resolving against a disabled library's
root if the file is on disk. The existing `image_exif` rows for a
disabled library are **not deleted** — they continue to anchor
hash-keyed derived data, so cross-library duplicates survive the
disable. Toggle via SQL; there is intentionally no HTTP endpoint for
library mutation (single-user tool, no role / permission story).
Typical workflows: stage a new mount with `enabled=0` then flip to `1`;
quiet a flaky NAS during maintenance without disturbing the rest of
the system.
**Per-library excludes (`libraries.excluded_dirs`).** A
comma-separated column, same shape as the global `EXCLUDED_DIRS` env
var, that's applied **in union** with the env-var globals when a
walker scans this library. Use case: mount a parent directory as a
new library while a sibling library covers a child subtree, and
exclude that child subtree from the parent so the two libraries
don't double-walk and double-write `image_exif`. Two entry forms
(parsed by `memories::PathExcluder`):
- `/sub/path` — leading slash flags it as a path under the library
root. Joins to root + matches by `path.starts_with(...)`. Works
at any depth (`/photos`, `/media/2024/raw`).
- `name` — no leading slash flags it as a component name to skip
anywhere in the tree (`@eaDir`, `.thumbnails`). Single segment
only — `media/photos/a` without a leading slash never matches
anything.

Hash-keyed derived data (faces, tags, insights) is unaffected either
way (those follow the bytes), but `image_exif` row count, walker CPU,
and thumbnail disk usage all drop to 1× instead of 2× for the overlap.
Affects: file-watch ingest (`process_new_files`), thumbnail
generation, media-count gauges, the orphaned-playlist cleanup walk,
and the `/memories` endpoint. The face-detection backlog drain
inherits via `face_watch::filter_excluded`. NULL = no extras (only
the global env var applies).
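
The two entry forms and the union with the global list can be sketched like this. `ExcludeRules` is an illustrative stand-in, not the real `memories::PathExcluder`, and its constructor signature is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Illustrative stand-in for `memories::PathExcluder`; not the real type.
pub struct ExcludeRules {
    prefixes: Vec<PathBuf>,
    names: Vec<String>,
}

impl ExcludeRules {
    /// `global` is the EXCLUDED_DIRS value; `per_library` is the column value (None = NULL).
    pub fn new(root: &Path, global: &str, per_library: Option<&str>) -> Self {
        let mut prefixes = Vec::new();
        let mut names = Vec::new();
        // Union of both comma-separated lists.
        let entries = global.split(',').chain(per_library.unwrap_or("").split(','));
        for entry in entries.map(str::trim).filter(|e| !e.is_empty()) {
            match entry.strip_prefix('/') {
                // Leading slash: a path under the library root.
                Some(rest) => prefixes.push(root.join(rest)),
                // No leading slash: a single component name.
                None => names.push(entry.to_string()),
            }
        }
        Self { prefixes, names }
    }

    pub fn is_excluded(&self, path: &Path) -> bool {
        self.prefixes.iter().any(|p| path.starts_with(p))
            || path.components().any(|c| {
                self.names.iter().any(|n| c.as_os_str().to_str() == Some(n.as_str()))
            })
    }
}
```

Because a path component can never contain a separator, a multi-segment entry without a leading slash simply never matches, which is the behavior described above.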
**Library availability and safety.** Libraries can be on network shares
or removable media; the file watcher must not interpret a temporary
unavailability as a mass-deletion event. Every tick begins with a
**presence probe** per library: the library is considered online iff
its `root_path` exists, is readable, and a top-level scan returns at
least one expected entry (or matches a recent file-count high-water
mark within a tolerance). The probe result gates which actions are safe
to run on that library this tick:
| Action | Requires online? |
|---|---|
| Quick / full scan ingest of new files | yes |
| EXIF / face / insight backlog drains | yes — but the work runs against any online library |
| Move-handoff detection (lib1 disappearance ↔ lib2 appearance match) | **both** libraries online |
| `(library_id, rel_path)` re-keying on detected move | **both** libraries online |
| Orphan GC of hash-keyed derived data | all libraries that have *ever* held the hash must be online and confirmed-clean for two consecutive ticks |
| Reads / serving | always allowed; falls back to whichever library is online |
A library that fails the probe enters a "stale" state: writes scoped to
it are paused, its rows are flagged stale (not deleted) in
`/libraries` status, and the watcher logs at `warn` once per
state-transition (not per tick). A library that recovers re-enters the
online set automatically; no operator action required for transient
outages. The intent is that pulling a USB drive, rebooting a NAS, or
losing a VPN never triggers a destructive code path — the worst case is
that derived-data work pauses until the share returns.
The same rule constrains the move-handoff matcher: a disappearance
under lib1 only counts as a "move" if there is a matching appearance
under another **online** library within the window. A bare
disappearance with no matching appearance is treated as
"unavailable-or-deleted, defer judgment" — it does not re-key any rows
and does not enqueue GC.
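
The gating table can be read as a pure function from probe results to permitted actions. The sketch below uses hypothetical `Health` and `Action` enums and simplifies the orphan-GC row to "every library online this tick"; the two-tick consensus is a separate concern.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Online,
    Stale,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Ingest,
    MoveHandoff,
    OrphanGc,
    Serve,
}

/// `target` is the library the action is scoped to, `peer` is the other side
/// of a move (if any), and `all` is the probe result for every enabled library.
pub fn is_allowed(action: Action, target: Health, peer: Option<Health>, all: &[Health]) -> bool {
    match action {
        // Reads are always allowed.
        Action::Serve => true,
        Action::Ingest => target == Health::Online,
        // Both sides of a move must be online.
        Action::MoveHandoff => target == Health::Online && peer == Some(Health::Online),
        // Destructive: simplified here to "no library is stale".
        Action::OrphanGc => all.iter().all(|h| *h == Health::Online),
    }
}
```

Keeping the decision in one function makes it easy to audit that no destructive action is reachable while any involved library is stale.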
**Maintenance pipeline (`src/library_maintenance.rs`).** The watcher
runs three maintenance passes per tick that together implement the
move/handoff and orphan rules:
1. **Missing-file scan** — per online library, paginated. A page of
`image_exif` rows is loaded (`IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE`,
default 500), each row's `(root_path/rel_path)` is `stat()`-ed,
and confirmed-not-found rows are deleted from `image_exif`
(capped at `IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK`, default 200).
Permission/IO errors are skipped, never deleted — only `NotFound`
triggers a deletion. The cursor wraps every time a partial page
comes back, so the whole library is swept across consecutive ticks.
Skipped wholesale for Stale libraries via the per-library probe
gate at the top of the loop iteration.
2. **Back-ref refresh** — DB-only. For `face_detections`,
`tagged_photo`, and `photo_insights`: any hash-keyed row whose
`(library_id, rel_path)` no longer matches an `image_exif` row
*but whose `content_hash` does* is repointed at the surviving
`image_exif` location. Idempotent SQL; no health gate needed.
This is what makes the recent → archive handoff invisible to
read paths: when the missing-file scan retires the lib-A row,
tags/faces/insights pivot to lib-B's path before any user
notices.
3. **Orphan GC** — destructive. Hash-keyed derived rows whose
`content_hash` no longer has any `image_exif` row are eligible.
Two-tick consensus: a hash must be observed orphaned on two
consecutive ticks AND every library must be online for both. A
single Stale tick within the window cancels all pending deletes.
The pending set is held in memory (`OrphanGcState`) — restart
resets it, which only delays a delete, never causes one. Tags,
faces, and insights for orphaned hashes are deleted in one batch
per tick.
A backup library that briefly disappears, then returns within two
ticks, never loses any derived data. A move from lib-A to lib-B
without disappearance flips through pass 1 (lib-A row retired) and
pass 2 (back-refs follow), with pass 3 noting nothing because the
hash is still present in `image_exif` (lib-B's row).
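
The two-tick consensus in pass 3 can be sketched as a small in-memory state machine. `PendingOrphans` is an illustrative stand-in for `OrphanGcState`; the real struct's fields and method names are not shown in this document and may differ.

```rust
use std::collections::HashSet;

/// Illustrative stand-in for `OrphanGcState`; the real struct may differ.
#[derive(Default)]
pub struct PendingOrphans {
    seen_last_tick: HashSet<String>,
}

impl PendingOrphans {
    /// Returns the hashes that are safe to delete on this tick.
    pub fn tick(&mut self, orphaned_now: HashSet<String>, all_online: bool) -> Vec<String> {
        if !all_online {
            // A stale tick cancels every pending delete.
            self.seen_last_tick.clear();
            return Vec::new();
        }
        // Orphaned on the previous tick and again now: confirmed.
        let mut confirmed: Vec<String> = orphaned_now
            .intersection(&self.seen_last_tick)
            .cloned()
            .collect();
        confirmed.sort();
        // Remember only first-time sightings for the next tick.
        self.seen_last_tick = orphaned_now
            .into_iter()
            .filter(|h| !confirmed.contains(h))
            .collect();
        confirmed
    }
}
```

Since the pending set lives only in memory, a restart starts from an empty set, which can delay a delete by a tick but cannot cause one.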
**Known gap: in-place content changes (future Branch D).** The
maintenance pipeline assumes a `(library_id, rel_path)`'s bytes are
stable for as long as the file exists at that path. If a user edits
a file in place (crop, re-export) without renaming, the watcher's
quick scan walks the file (mtime is recent) but `process_new_files`
short-circuits because `(library_id, rel_path)` already has an
`image_exif` row — no re-hash, no re-EXIF, no face redetection. The
row's `content_hash` keeps pointing at the original bytes. Tags /
faces / insights stay attached to the original hash and continue to
display because the rel_path back-ref still resolves; new faces
introduced by the edit are never detected.
The right place to fix this is a **stale-content detection pass**
that compares `image_exif.last_modified` / `size_bytes` to
`fs::metadata` for rows the quick scan would otherwise skip. On
mismatch, recompute the hash, update `image_exif`, and apply the
"content branched" semantics:
- **Faces** re-run (faces are fully derived from bytes).
- **Tags** migrate to the new hash (user intent: "this photo is
vacation" survives a crop).
- **Insights** migrate forward as a starting point and are flagged
for re-generation.
- **Favorites** (when migrated to hash-keyed) follow the path /
user intent.
The interesting case is the operator who keeps an unedited copy in
the archive library and edits the local copy: post-detection, the
archive copy stays on the original hash, the local copy branches to
the new hash, and the two histories cleanly split. Apollo's
`derived.db` cache will need an invalidation hook for the changed
hash — design it alongside Branch D.
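
Since Branch D is not implemented yet, the following is only a sketch of what the mismatch check might look like. `StoredStat` and `content_looks_stale` are hypothetical, and storing `last_modified` as Unix seconds is an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Hypothetical slice of an `image_exif` row; field names follow the text above.
pub struct StoredStat {
    pub size_bytes: u64,
    /// Seconds since the Unix epoch (assumed storage format).
    pub last_modified: i64,
}

/// Returns Ok(true) when the on-disk file no longer matches the stored row
/// and should be re-hashed. IO errors are propagated, never treated as a change.
pub fn content_looks_stale(path: &Path, stored: &StoredStat) -> io::Result<bool> {
    let meta = fs::metadata(path)?;
    let mtime = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs() as i64)
        .unwrap_or(0);
    Ok(meta.len() != stored.size_bytes || mtime != stored.last_modified)
}
```

A size or mtime mismatch is only a cheap trigger; the hash recomputation that follows decides whether the content actually branched.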
### File Processing Pipeline
**Thumbnail Generation:**
@@ -114,6 +350,15 @@ All database access goes through trait-based DAOs (e.g., `ExifDao`, `SqliteExifD
2. Creates 200x200 thumbnails in THUMBNAILS directory (mirrors source structure)
3. Videos: extracts frame at 3-second mark via ffmpeg
4. Images: uses `image` crate for JPEG/PNG processing
5. RAW formats (NEF/CR2/ARW/DNG/etc.): the `image` crate can't decode RAW
pixel data, so the pipeline pulls an embedded JPEG preview instead. Fast
path is `exif::read_jpeg_at_ifd` against IFD0 (PRIMARY) and IFD1
(THUMBNAIL) — covers most older bodies and DNGs. Slow-path fallback shells
out to **`exiftool`** for `PreviewImage` / `JpgFromRaw` / `OtherImage`,
which reaches MakerNote / SubIFD-hosted previews kamadak-exif can't see
(e.g. Nikon's `PreviewIFD`, where modern Nikon bodies store the full-res
review JPEG). All candidates are pooled and the largest valid JPEG wins.
See `src/exif.rs::extract_embedded_jpeg_preview`.
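
The "pool the candidates, largest valid JPEG wins" step in item 5 might look roughly like the sketch below. The validity check used here (SOI and EOI markers) is an assumption for illustration; the check in `extract_embedded_jpeg_preview` may be stricter.

```rust
/// Assumed validity check: a JPEG stream starts with SOI (FF D8) and ends with EOI (FF D9).
fn looks_like_jpeg(bytes: &[u8]) -> bool {
    bytes.len() >= 4 && bytes.starts_with(&[0xFF, 0xD8]) && bytes.ends_with(&[0xFF, 0xD9])
}

/// Picks the largest candidate that passes the validity check, if any.
pub fn pick_largest_preview(candidates: Vec<Vec<u8>>) -> Option<Vec<u8>> {
    candidates
        .into_iter()
        .filter(|c| looks_like_jpeg(c))
        .max_by_key(|c| c.len())
}
```

Filtering before comparing sizes keeps a large but corrupt blob from shadowing a smaller, usable preview.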
**File Watching:**
Runs in background thread with two-tier strategy:
@@ -122,6 +367,60 @@ Runs in background thread with two-tier strategy:
- Batch queries EXIF DB to detect new files
- Configurable via `WATCH_QUICK_INTERVAL_SECONDS` and `WATCH_FULL_INTERVAL_SECONDS`
**Canonical date_taken pipeline (`src/date_resolver.rs`).** Every row's
`image_exif.date_taken` is populated at ingest by a four-step waterfall;
which step won is recorded in `image_exif.date_taken_source` so the
per-tick drain can re-resolve weak entries when better tools become
available, and so the UI/debug surface can answer "why did this photo
land on this date?". Order:
1. **`exif`** — kamadak-exif `DateTime` / `DateTimeOriginal`. Fast,
in-process, image-only.
2. **`exiftool`** — shell-out fallback for tags kamadak can't reach:
QuickTime/MP4 (`MediaCreateDate`, `TrackCreateDate`, `CreateDate`),
Apple's `ContentCreateDate`, MakerNote sub-IFDs. Required for
videos to land a real date. Single-file at ingest; the per-tick
drain feeds the whole batch through one `exiftool -@ -` subprocess.
Degrades silently when `exiftool` isn't on PATH (resolver caches the
"available" check via `OnceLock`).
3. **`filename`** — `extract_date_from_filename` in `memories.rs`
matches screenshot, chat-export, and timestamp-named patterns.
4. **`fs_time`** — `earliest_fs_time(metadata)` (earlier of created /
modified). Last resort.
Notable behavior change vs. the pre-2026-05 request-time logic:
**EXIF beats filename when both are present.** A photo named
`Screenshot_2014-06-01.png` whose EXIF `DateTime` is 2021 now appears
under 2021, not 2014 — on the theory that EXIF is more reliable than
import-named filenames. The reverse case (no EXIF, filename has a
date) is unchanged.
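
The waterfall order, including the "EXIF beats filename" rule, can be sketched as a first-match chain. `Candidates` and `resolve` are hypothetical names and do not reflect the actual `date_resolver.rs` API; only the four source labels come from the text above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DateSource {
    Exif,
    Exiftool,
    Filename,
    FsTime,
}

impl DateSource {
    /// Label stored in `image_exif.date_taken_source`.
    pub fn as_str(self) -> &'static str {
        match self {
            DateSource::Exif => "exif",
            DateSource::Exiftool => "exiftool",
            DateSource::Filename => "filename",
            DateSource::FsTime => "fs_time",
        }
    }
}

/// Candidate timestamps (Unix seconds) from each step; None means the step found nothing.
#[derive(Default)]
pub struct Candidates {
    pub exif: Option<i64>,
    pub exiftool: Option<i64>,
    pub filename: Option<i64>,
    pub fs_time: Option<i64>,
}

/// The first step that produced a value wins, in waterfall order.
pub fn resolve(c: &Candidates) -> Option<(i64, DateSource)> {
    c.exif
        .map(|t| (t, DateSource::Exif))
        .or(c.exiftool.map(|t| (t, DateSource::Exiftool)))
        .or(c.filename.map(|t| (t, DateSource::Filename)))
        .or(c.fs_time.map(|t| (t, DateSource::FsTime)))
}
```

When both `exif` and `filename` are populated the result carries `DateSource::Exif`; with only a filename date and an fs time, the filename still wins over `fs_time`.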
The `backfill_missing_date_taken` drain (`src/backfill.rs`) runs every
watcher tick alongside `backfill_unhashed_backlog` (also `src/backfill.rs`). It loads up to
`DATE_BACKFILL_MAX_PER_TICK` rows (default 500) where
`date_taken IS NULL` (backed by the `idx_image_exif_date_backfill`
partial index), runs the waterfall batch via `resolve_dates_batch`,
and writes results via the `backfill_date_taken` DAO method (touches
only `date_taken` + `date_taken_source` so EXIF / hash / perceptual
columns are preserved). Resolved rows — including the ones the
waterfall could only resolve via `fs_time` — are not re-eligible:
the resolver is deterministic on file bytes + filename + fs metadata,
so re-running on the same inputs lands on the same source every time.
An earlier version included `date_taken_source = 'fs_time'` in the
eligibility predicate, but with `ORDER BY id ASC LIMIT 500` it spun on
the same lowest-id rows in perpetuity and held the SQLite write lock
long enough to starve face-PATCH writers (5s busy_timeout → 500). If
a stronger tool comes online (exiftool install, new filename regex),
re-resolve out-of-band rather than re-introducing the steady-state
eligibility.
`/memories` is a single SQL query against this column
(`get_memories_in_window` in `src/database/mod.rs`), using
`strftime('%m-%d' | '%W' | '%m', date_taken, 'unixepoch', tz)` for
calendar matching with the client's timezone offset. The pre-rewrite
version stat'd every row and walked the entire library tree — at
~14k photos this took 10-15 s; the rewrite is single-digit ms.
**EXIF Extraction:**
- Uses `kamadak-exif` crate
- Supports: JPEG, TIFF, RAW (NEF, CR2, CR3), HEIF/HEIC, PNG, WebP
@@ -169,6 +468,26 @@ POST /image/tags/batch (bulk tag updates)
// Memories (week-based grouping)
GET /memories?path=...&recursive=true
// AI Insights
POST /insights/generate (non-agentic single-shot)
POST /insights/generate/agentic (tool-calling loop; body: { file_path, backend?, model?, ... })
GET /insights?path=...&library=...
GET /insights/models (local-backend models + capabilities; Ollama OR llama-swap based on LLM_BACKEND)
GET /insights/openrouter/models (curated OpenRouter allowlist)
POST /insights/rate (thumbs up/down for training data)
// Text-to-Speech (Chatterbox via llama-swap; needs LLAMA_SWAP_URL)
POST /tts/speech (read-aloud: { text, voice?, ... } -> { audio_base64, format })
GET /tts/voices (Chatterbox voice library)
POST /tts/voices/upload (clone a voice from an uploaded clip; multipart)
POST /tts/voices/from-library (clone a voice from a library audio/video file)
// Insight Chat Continuation
POST /insights/chat (single-turn reply, non-streaming)
POST /insights/chat/stream (SSE: text / tool_call / tool_result / truncated / done)
GET /insights/chat/history?path=... (rendered transcript with tool invocations)
POST /insights/chat/rewind (truncate transcript at a rendered index)
```
**Request Types:**
@@ -190,7 +509,38 @@ Centralized in `file_types.rs` with constants `IMAGE_EXTENSIONS` and `VIDEO_EXTE
All database operations and HTTP handlers wrapped in spans. In release builds, exports to OTLP endpoint via `OTLP_OTLS_ENDPOINT`. Debug builds use basic logger.
**Memory Exclusion:**
`PathExcluder` in `memories.rs` filters out directories from memories API via `EXCLUDED_DIRS` environment variable (comma-separated paths or substring patterns).
`PathExcluder` in `memories.rs` filters out directories from memories API via `EXCLUDED_DIRS` environment variable (comma-separated paths or substring patterns). The same excluder is applied to face-detection candidates (`face_watch::filter_excluded`) so junk directories like `@eaDir` / `.thumbnails` don't burn detect calls on Apollo.
### Face detection system
ImageApi owns the face data; Apollo (sibling repo) hosts the insightface inference service. Inference is triggered automatically by the file watcher and persisted into two tables:
- `persons(id, name UNIQUE COLLATE NOCASE, cover_face_id, entity_id, created_from_tag, notes, ...)` — operator-managed, name is the user-visible identity.
- `face_detections(id, library_id, content_hash, rel_path, bbox_*, embedding BLOB, confidence, source, person_id, status, model_version, ...)` — keyed on `content_hash` so a photo duplicated across libraries is detected once. Marker rows for `status IN ('no_faces','failed')` carry NULL bbox/embedding (CHECK constraint enforces this).
**Why content_hash and not (library_id, rel_path):** ties face data to the bytes, not the path. A backup mount that copies files from the primary library naturally inherits the existing detections without re-running inference. This is the reference implementation of the multi-library data model — see "Multi-library data model" above.
**File-watch hook** (`src/watcher.rs::process_new_files`): for each photo with a populated `content_hash`, check `FaceDao::already_scanned(hash)`; if not, send bytes (or embedded JPEG preview for RAW via `exif::extract_embedded_jpeg_preview`) to Apollo's `/api/internal/faces/detect`. K=`FACE_DETECT_CONCURRENCY` (default 8) parallel calls per scan tick; Apollo serializes them via its single-worker GPU pool. `face_watch.rs` is the Tokio orchestration layer.
**Per-tick backlog drain** (`src/backfill.rs`): two passes that run on every watcher tick regardless of quick-vs-full scan:
- `backfill_unhashed_backlog` — populates `image_exif.content_hash` for photos that arrived before the hash field was retroactive. Capped by `FACE_HASH_BACKFILL_MAX_PER_TICK` (default 2000); errors don't burn the cap.
- `process_face_backlog` — runs detection on photos that have a hash but no `face_detections` row. Capped by `FACE_BACKLOG_MAX_PER_TICK` (default 64). Selected via a SQL anti-join (`FaceDao::list_unscanned_candidates`); videos and EXCLUDED_DIRS paths filtered out client-side via `face_watch::filter_excluded` so they never reach Apollo.
**Auto-bind on detection:** when a photo carries a tag whose name matches a `persons.name` (case-insensitive), the new face binds automatically iff cosine similarity to the person's existing-face mean is ≥ `FACE_AUTOBIND_MIN_COS` (default 0.4). Persons with no existing faces bind unconditionally and the new face becomes the cover.
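The auto-bind rule can be sketched as follows. The function names (`cosine`, `mean_embedding`, `should_autobind`) and the `Vec<f32>` embedding representation are assumptions for illustration, not the actual `FaceDao` API; only the rule itself (mean of existing faces, threshold from `FACE_AUTOBIND_MIN_COS`, unconditional bind for empty persons) comes from the text above.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Element-wise mean of the person's existing face embeddings; None when empty.
fn mean_embedding(faces: &[Vec<f32>]) -> Option<Vec<f32>> {
    let first = faces.first()?;
    let mut mean = vec![0.0f32; first.len()];
    for face in faces {
        for (m, v) in mean.iter_mut().zip(face) {
            *m += v;
        }
    }
    let count = faces.len() as f32;
    mean.iter_mut().for_each(|m| *m /= count);
    Some(mean)
}

/// Persons with no faces bind unconditionally; otherwise compare to the mean.
fn should_autobind(existing: &[Vec<f32>], candidate: &[f32], min_cos: f32) -> bool {
    match mean_embedding(existing) {
        None => true,
        Some(mean) => cosine(&mean, candidate) >= min_cos,
    }
}
```

Note that a zero-vector embedding (as written for `force`-saved faces) scores 0.0 here, so it would never pass a positive threshold.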
**Manual face create** (`POST /image/faces`): crops the image to the user-supplied bbox, applies EXIF orientation via `exif::apply_orientation` (the `image` crate hands raw pre-rotation pixels — without this, manually-drawn bboxes never resolved a face on re-detection), pads to ~50% of bbox dims (RetinaFace anchor scales need ~50% face-fill at det_size=640), then calls Apollo's embed endpoint. A `force` flag lets the operator save a face the detector couldn't see (e.g. profile shots, occluded faces) — the row gets a zero-vector embedding so it's manually-bound only and won't participate in clustering.
**Rerun preserves manual rows** (`POST /image/faces/{id}/rerun`): only `source='auto'` rows are deleted before re-running detection. `already_scanned` returns true on ANY row, so a photo whose only faces are manually drawn never auto-redetects.
**Stats domain — content_hash, not file rows** (`FaceDao::stats` in `src/faces.rs`): `total_photos` counts `DISTINCT content_hash` over `image_exif` (filtered to image extensions, `content_hash IS NOT NULL`), and so do `scanned` / `with_faces` / `no_faces` / `failed` over `face_detections`. Numerator and denominator must live in the same domain — `face_detections` is keyed on content_hash, so the same JPEG present at two rel_paths or in two libraries scans once. Counting `image_exif` rows in the denominator inflated total by one per duplicate file and produced a permanent gap (e.g. 1101/1103 with nothing actually pending). Hash-less rows are excluded from total_photos while they sit in the `backfill_unhashed_backlog` queue; otherwise the bar pins below 100% for the duration of that backfill even though those rows aren't pending detection yet — they're pending hashing.
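A toy model of the counting domain, assuming `(rel_path, content_hash)` tuples in place of `image_exif` rows (the `ExifRow` alias and both function names are hypothetical). It shows why a row count overstates the denominator while a distinct-hash count lines up with `face_detections`.

```rust
use std::collections::HashSet;

/// (rel_path, content_hash) pairs standing in for image_exif rows.
type ExifRow = (&'static str, Option<&'static str>);

/// Denominator: distinct hashes only; hash-less rows are excluded.
fn total_photos(rows: &[ExifRow]) -> usize {
    rows.iter()
        .filter_map(|(_, hash)| *hash)
        .collect::<HashSet<_>>()
        .len()
}

/// Numerator: distinct hashes that have at least one face_detections row.
fn scanned(detection_hashes: &[&str]) -> usize {
    detection_hashes.iter().collect::<HashSet<_>>().len()
}
```

For four file rows where two share a hash and one is still unhashed, `total_photos` is 2, matching a `scanned` count of 2. Counting rows instead would report 2/4 with nothing pending detection.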
Module map:
- `src/faces.rs` — `FaceDao` trait + `SqliteFaceDao` impl, route handlers for `/faces/*`, `/image/faces/*`, `/persons/*`. Mirror of `tags.rs` layout.
- `src/face_watch.rs` — Tokio orchestration for the file-watch detect pass; `filter_excluded` (PathExcluder + image-extension filter), `read_image_bytes_for_detect` (RAW preview fallback).
- `src/backfill.rs` — per-tick drains (unhashed-hash, date_taken, face-backlog, etc.) called from `watcher::watch_files` and `watcher::process_new_files`.
- `src/watcher.rs` — the watcher loop itself and `process_new_files` (file walk → EXIF write → face-candidate build).
- `src/ai/face_client.rs` — HTTP client for Apollo's inference. Configured by `APOLLO_FACE_API_BASE_URL`, falls back to `APOLLO_API_BASE_URL`. Both unset → feature disabled, file-watch hook is a no-op.
- `migrations/2026-04-29-000000_add_faces/` — schema.
### Startup Sequence
@@ -249,6 +599,7 @@ Optional:
```bash
WATCH_QUICK_INTERVAL_SECONDS=60 # Quick scan interval
WATCH_FULL_INTERVAL_SECONDS=3600 # Full scan interval
DATE_BACKFILL_MAX_PER_TICK=500 # Cap on canonical-date drain per watcher tick
OTLP_OTLS_ENDPOINT=http://... # OpenTelemetry collector (release builds)
# AI Insights Configuration
@@ -256,8 +607,85 @@ OLLAMA_PRIMARY_URL=http://desktop:11434 # Primary Ollama server (e.g., de
OLLAMA_FALLBACK_URL=http://server:11434 # Fallback Ollama server (optional, always-on)
OLLAMA_PRIMARY_MODEL=nemotron-3-nano:30b # Model for primary server (default: nemotron-3-nano:30b)
OLLAMA_FALLBACK_MODEL=llama3.2:3b # Model for fallback server (optional, uses primary if not set)
OLLAMA_REQUEST_TIMEOUT_SECONDS=120 # Per-request generation timeout (default 120). Increase for slow CPU-offloaded models.
SMS_API_URL=http://localhost:8000 # SMS message API endpoint (default: localhost:8000)
SMS_API_TOKEN=your-api-token # SMS API authentication token (optional)
# Apollo Places integration (optional). When set, photo-insight enrichment
# folds the user's personal place name (Home, Work, Cabin, ...) into the
# location string fed to the LLM, and the agentic loop gains a
# `get_personal_place_at` tool. Unset = legacy Nominatim-only path.
APOLLO_API_BASE_URL=http://apollo.lan:8000 # Base URL of the sibling Apollo backend
# Face inference (optional). Apollo also hosts the insightface inference
# service; ImageApi calls it from the file-watch hook (Phase 3) and from
# the manual face-create endpoint. Falls back to APOLLO_API_BASE_URL when
# unset (typical single-Apollo deploy). Both unset = feature disabled.
APOLLO_FACE_API_BASE_URL=http://apollo.lan:8000 # Override if face service runs separately
FACE_AUTOBIND_MIN_COS=0.4 # Phase 3: cosine-sim floor for tag-name auto-bind
FACE_DETECT_CONCURRENCY=8 # Phase 3: per-scan-tick parallel detect calls
FACE_DETECT_TIMEOUT_SEC=60 # reqwest client timeout (CPU inference can be slow)
# OpenRouter (Hybrid Backend) - keeps embeddings + vision local, routes chat to OpenRouter
OPENROUTER_API_KEY=sk-or-... # Required to enable hybrid backend
OPENROUTER_DEFAULT_MODEL=anthropic/claude-sonnet-4 # Used when client doesn't pick a model
OPENROUTER_ALLOWED_MODELS=openai/gpt-4o-mini,anthropic/claude-haiku-4-5,google/gemini-2.5-flash
# Curated allowlist exposed to clients via
# GET /insights/openrouter/models. Empty = no picker.
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1 # Override base URL (optional)
OPENROUTER_EMBEDDING_MODEL=openai/text-embedding-3-small # Optional, embeddings stay local today
OPENROUTER_HTTP_REFERER=https://your-site.example # Optional attribution header
OPENROUTER_APP_TITLE=ImageApi # Optional attribution header
# Local LLM backend switch. `ollama` (default) keeps the OLLAMA_* settings
# above; `llamacpp` swaps the entire local stack (chat + vision describe +
# embeddings) over to llama-swap. The switch is global and applies to
# `backend=local` requests and to `backend=hybrid`'s describe pass (hybrid
# chat still goes to OpenRouter). Don't flip mid-deploy without
# re-embedding — mixed vector spaces break similarity search.
LLM_BACKEND=ollama
# Embedding model contract. Corpus and queries must be embedded by the same
# model with matching prefixes — after changing the embed model or any of
# these, run `cargo run --bin reembed_embeddings` (all tables) or search is
# garbage. Prefix values may contain a literal \n (expanded to a newline).
EMBEDDING_DIM=768 # 768 = nomic-embed-text v1.5; 1024 = Qwen3-Embedding-0.6B
EMBED_QUERY_PREFIX= # nomic: "search_query: " | Qwen3: "Instruct: <task>\nQuery: "
EMBED_DOCUMENT_PREFIX= # nomic: "search_document: " | Qwen3: leave empty
# llama.cpp / llama-swap (used when LLM_BACKEND=llamacpp). OpenAI-compatible
# proxy hosting one or more llama-server processes. Chat models receive
# images directly via content-parts (all models assumed vision-capable).
LLAMA_SWAP_URL=http://localhost:9292/v1 # Required when LLM_BACKEND=llamacpp
LLAMA_SWAP_PRIMARY_MODEL=chat # Chat slot id (matches config.yaml)
LLAMA_SWAP_VISION_MODEL= # Dedicated vision slot for describe_image / describe_photo
# tool. Defaults to PRIMARY_MODEL when unset.
LLAMA_SWAP_EMBEDDING_MODEL=embed # Embedding slot id
LLAMA_SWAP_ALLOWED_MODELS=chat,coder # Curated allowlist surfaced by GET /insights/models
# when LLM_BACKEND=llamacpp. All report has_vision=true.
# Empty = picker shows only the configured primary model.
LLAMA_SWAP_REQUEST_TIMEOUT_SECONDS=180 # Per-request timeout; bump for slow CPU offload
# Text-to-speech (Chatterbox served behind llama-swap). Only needs
# LLAMA_SWAP_URL — independent of LLM_BACKEND. Powers /tts/speech (read-aloud)
# and /tts/voices* (voice cloning). Reference audio is ffmpeg-normalized to WAV
# server-side, so any source format works.
LLAMA_SWAP_TTS_MODEL=chatterbox # TTS model id in config.yaml (default: chatterbox)
LLAMA_SWAP_TTS_VOICE=m # Default voice when /tts/speech omits one (optional)
LLAMA_SWAP_TTS_REF_SECONDS=30 # Max voice-clone reference clip length, seconds
# (Chatterbox is zero-shot; ~10-20s clean ref is ideal)
LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS=600 # Per-request synth timeout (long chunked insights take
# minutes); overrides the shared client timeout for /tts/speech
TTS_PRONUNCIATIONS_PATH=tts_pronunciations.json # JSON map of pronunciation overrides applied before synth
# (see tts_pronunciations.example.json); hot-reloaded on change
# Insight Chat Continuation
AGENTIC_CHAT_MAX_ITERATIONS=6 # Cap on tool-calling iterations per chat turn (default 6)
AGENTIC_CHAT_DEFAULT_NUM_CTX=32768 # Assumed context window for the history-truncation budget
# when a chat request omits num_ctx (default 32768). Size to
# the smallest context among the chat models actually served;
# too small silently guts replayed history every turn (and
# destroys llama.cpp KV-cache prefix reuse).
```
**AI Insights Fallback Behavior:**
@@ -275,8 +703,153 @@ The `OllamaClient` provides methods to query available models:
This allows runtime verification of model availability before generating insights.
**Local backend switch (`LLM_BACKEND`):**
One env var decides which "local" stack the server runs against — `ollama`
(default) or `llamacpp`. It's global on purpose: chat, vision, and
embeddings all route through the same backend, so the embedding-vector
column in SQLite stays in one vector space. Don't flip mid-deploy without
re-embedding the affected rows — similarity search will collapse.
- `LLM_BACKEND=ollama`: chat, vision, and embeddings use Ollama. Vision
capability is probed per-model via `/api/show`.
- `LLM_BACKEND=llamacpp`: chat models receive images directly via OpenAI
content-parts (all models assumed vision-capable). Embeddings hit the
`embed` slot. A dedicated `LLAMA_SWAP_VISION_MODEL` slot (defaults to
the chat model) handles `describe_image` for the `describe_photo` tool.
Requires `LLAMA_SWAP_URL`.
The per-request `backend=hybrid` override is orthogonal: it always sends
chat to OpenRouter (text-only, images are pre-described and inlined), but
the describe + embed passes still route through whichever `LLM_BACKEND`
is configured.
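A minimal sketch of how the switch might be parsed and validated at startup. `LocalBackend`, `parse_llm_backend`, and `validate` are hypothetical names, and rejecting unknown values is an assumption; the text above only fixes the two accepted values, the `ollama` default, and the `LLAMA_SWAP_URL` requirement.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LocalBackend {
    Ollama,
    LlamaCpp,
}

/// Parses the LLM_BACKEND value; unset or empty falls back to the default.
fn parse_llm_backend(raw: Option<&str>) -> Result<LocalBackend, String> {
    match raw.map(str::trim) {
        None | Some("") | Some("ollama") => Ok(LocalBackend::Ollama),
        Some("llamacpp") => Ok(LocalBackend::LlamaCpp),
        // Assumption: unknown values are a configuration error.
        Some(other) => Err(format!("unknown LLM_BACKEND: {other}")),
    }
}

/// llamacpp needs LLAMA_SWAP_URL; ollama does not.
fn validate(backend: LocalBackend, llama_swap_url: Option<&str>) -> Result<(), String> {
    match (backend, llama_swap_url) {
        (LocalBackend::LlamaCpp, None) => {
            Err("LLM_BACKEND=llamacpp requires LLAMA_SWAP_URL".to_string())
        }
        _ => Ok(()),
    }
}
```

In a real binary the inputs would come from `std::env::var("LLM_BACKEND").ok()` and friends; they are passed in here so the logic stays testable.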
**Backend dispatch (`ResolvedBackend`):**
`InsightGenerator::resolve_backend(kind, overrides)` is the single entry
point that builds clients for a request. Returns a `ResolvedBackend` with
two roles: `.chat()` (the agentic/chat client) and `.local()` (local-only
utility calls: rerank, describe_image, embeddings). `BackendKind` is an
enum (`Local` | `Hybrid`) replacing the stringly-typed `"local"` /
`"hybrid"` labels. `SamplingOverrides` groups model/ctx/temp/top_p/top_k/
min_p per-request overrides. All downstream code (`execute_tool`,
`run_streaming_agentic_loop`, etc.) takes `&ResolvedBackend` rather than
individual client references.
`GET /insights/models` returns the local-backend models with capabilities
in the same envelope shape regardless of `LLM_BACKEND`: Ollama servers
when `ollama`, llama-swap slots (from `LLAMA_SWAP_ALLOWED_MODELS`) when
`llamacpp`. No `/insights/llamacpp/models` — the picker reads a single
endpoint.
**Hybrid Backend (OpenRouter):**
- Per-request opt-in via `backend=hybrid` on `POST /insights/generate/agentic`.
- Vision describe happens before the agentic loop; the description is inlined
into the chat prompt and the agentic loop runs on OpenRouter. Vision
routes through whichever `LLM_BACKEND` is configured.
- `request.model` (if provided) overrides `OPENROUTER_DEFAULT_MODEL` for that
call. The mobile picker reads from `OPENROUTER_ALLOWED_MODELS`.
- No live capability precheck — the operator-curated allowlist is trusted.
A bad model id surfaces as a chat-call error.
- `GET /insights/openrouter/models` returns `{ models, default_model, configured }`
for client picker UIs.
**Cross-replay matrix (chat continuation):**
- `local → local` allowed (whether served by Ollama or llama-swap; that's
a deploy-time decision, not a request-time one).
- `hybrid → hybrid` allowed.
- `hybrid → local` allowed (the inlined description replays as text).
- `local → hybrid` rejected — the stored transcript has raw images in the
first user message and OpenRouter providers don't accept that shape
consistently. Regenerate the insight in hybrid mode instead.
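The matrix reduces to a single rejected pair. The sketch below reuses the `BackendKind` name from the text; `replay_allowed` and its error string are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendKind {
    Local,
    Hybrid,
}

/// Checks whether a transcript stored under `stored` may be continued on `requested`.
fn replay_allowed(stored: BackendKind, requested: BackendKind) -> Result<(), &'static str> {
    match (stored, requested) {
        // The stored transcript carries raw images in the first user message.
        (BackendKind::Local, BackendKind::Hybrid) => {
            Err("local -> hybrid replay is rejected; regenerate the insight in hybrid mode")
        }
        _ => Ok(()),
    }
}
```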
**Insight Chat Continuation:**
After an agentic insight is generated, the full `Vec<ChatMessage>` transcript is
stored in `photo_insights.training_messages` and can be continued via the
chat endpoints. The `PhotoInsightResponse.has_training_messages` flag tells
clients whether chat is available for a given insight.
- `POST /insights/chat` runs one turn of the agentic loop against the replayed
history. Body: `{ file_path, library?, user_message, model?, backend?, num_ctx?,
temperature?, top_p?, top_k?, min_p?, max_iterations?, system_prompt?, amend? }`.
`system_prompt` is a per-turn override: in append mode (default) it's applied
ephemerally — the original system message is restored before persistence so
the stored transcript keeps its baked persona. In amend mode the override
stays in place and becomes the new insight row's system message. Mirrors the
internal `annotate_system_with_budget` swap-and-restore pattern.
- `POST /insights/chat/stream` is the SSE variant — same request body, response
is `text/event-stream` with events: `iteration_start`, `text` (delta), `tool_call`,
`tool_result`, `truncated`, `done`, plus a server-emitted `error_message` on
failure. Preferred by the mobile client for live tool-chip updates.
- `GET /insights/chat/history?path=...&library=...` returns the rendered
transcript. Each assistant message carries a `tools: [{name, arguments, result,
result_truncated?}]` array with the tool invocations that led up to it. Tool
results over 2000 chars are truncated with `result_truncated: true`.
- `POST /insights/chat/rewind` truncates the transcript at a given rendered
index (drops that message + any tool-call scaffolding that preceded it + all
later turns). Index 0 is protected. Used for "try again from here" flows.
Backend routing rules (matches agentic-insight generation):
- Stored `backend` on the insight row is authoritative by default.
- `request.backend` may override per-turn. `local -> hybrid` is rejected in
v1 (would require on-the-fly visual-description rewrite); `hybrid -> local`
replays verbatim since the description is already inlined as text.
- `request.model` overrides the chat model (an Ollama id in local mode, an
OpenRouter id in hybrid mode).
Persistence:
- Append mode (default): re-serialize the full history and `UPDATE` the same
row's `training_messages`.
- Amend mode (`amend: true`): regenerate the title, insert a new insight row
via `store_insight` (auto-flips prior rows' `is_current=false`). Response
surfaces the new row's id as `amended_insight_id`.
Per-`(library_id, file_path)` async mutex (`AppState.insight_chat.chat_locks`)
serialises concurrent turns on the same insight so the JSON blob doesn't race.
Context management is a soft bound: if the serialized history exceeds
`num_ctx - 2048` tokens (cheap 4-byte/token heuristic; `num_ctx` defaults
to `AGENTIC_CHAT_DEFAULT_NUM_CTX`, 32768, when the request omits it), the
oldest assistant-tool_call + tool_result pairs are dropped until under budget. The
initial user message (with any images) and system prompt are always preserved.
The `truncated` event / flag is surfaced to the client when a drop occurred.
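The soft bound can be sketched like this. `Msg` is a simplified stand-in for `ChatMessage`, and the estimate counts only message text (the real heuristic runs over the serialized history, so it would also count JSON overhead). The 2048-token reserve and 4-byte/token ratio come from the paragraph above.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    System(String),
    User(String),
    AssistantToolCall(String),
    ToolResult(String),
    Assistant(String),
}

impl Msg {
    fn text(&self) -> &str {
        match self {
            Msg::System(t)
            | Msg::User(t)
            | Msg::AssistantToolCall(t)
            | Msg::ToolResult(t)
            | Msg::Assistant(t) => t,
        }
    }
}

/// Cheap estimate: 4 bytes per token.
fn estimate_tokens(history: &[Msg]) -> usize {
    history.iter().map(|m| m.text().len()).sum::<usize>() / 4
}

/// Drops the oldest tool_call + tool_result pairs until under budget.
/// Returns true when anything was dropped (the `truncated` flag).
fn truncate_history(history: &mut Vec<Msg>, num_ctx: usize) -> bool {
    let budget = num_ctx.saturating_sub(2048);
    let mut dropped = false;
    while estimate_tokens(history) > budget {
        let oldest_pair = history
            .windows(2)
            .position(|w| matches!(w, [Msg::AssistantToolCall(_), Msg::ToolResult(_)]));
        // No pairs left: give up rather than touch system/user messages.
        let Some(i) = oldest_pair else { break };
        history.drain(i..i + 2);
        dropped = true;
    }
    dropped
}
```

Because only tool pairs are candidates for removal, the system prompt and the initial user message survive even when the history still exceeds the budget, which is why the bound is soft.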
Configurable env:
- `AGENTIC_CHAT_MAX_ITERATIONS` — cap on tool-calling iterations per turn
(default 6). Per-request `max_iterations` is clamped to this cap.
- `AGENTIC_CHAT_DEFAULT_NUM_CTX` — assumed context window for the truncation
budget when the request omits `num_ctx` (default 32768).
**Apollo Places integration (optional):**
The sibling Apollo project (personal location-history viewer) owns
user-defined Places: `name + lat/lon + radius_m + description (+ optional
category)`. When `APOLLO_API_BASE_URL` is set, ImageApi queries
`/api/places/contains?lat=&lon=` to enrich the LLM prompt's location
string. See `src/ai/apollo_client.rs` and `src/ai/insight_generator.rs`:
- **Auto-enrichment** (always on when configured): the per-photo location
resolver folds the most-specific containing Place ("Home — near
Cambridge, MA" or "Home (My house in Cambridge) — near Cambridge, MA"
when a description is set) into the location field of `combine_contexts`.
Smallest-radius wins — Apollo sorts server-side, this code takes `[0]`.
- **Agentic tool** `get_personal_place_at(latitude, longitude)`: registered
alongside `reverse_geocode` only when `apollo_enabled()` returns true.
Returns "- Name [category]: description (radius N m)" lines, smallest
radius first. The tool is **deliberately narrow** — no enumerate-all
variant; auto-enrichment covers the photo-context path and the agentic
tool covers ad-hoc lat/lon questions in chat continuation.
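The tool's output shape can be illustrated as below. The `Place` struct fields mirror the ones listed above, but the concrete types, the `format_places` name, and the rendering used when `category` is absent are assumptions. Sorting locally is redundant if Apollo already sorts server-side; it is included so the sketch stands alone.

```rust
struct Place {
    name: &'static str,
    category: Option<&'static str>,
    description: &'static str,
    radius_m: u32,
}

/// Renders "- Name [category]: description (radius N m)" lines, smallest radius first.
fn format_places(mut places: Vec<Place>) -> Vec<String> {
    places.sort_by_key(|p| p.radius_m);
    places
        .iter()
        .map(|p| match p.category {
            Some(category) => format!(
                "- {} [{}]: {} (radius {} m)",
                p.name, category, p.description, p.radius_m
            ),
            // Assumption: the bracketed part is omitted when no category is set.
            None => format!("- {}: {} (radius {} m)", p.name, p.description, p.radius_m),
        })
        .collect()
}
```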
Failure modes degrade silently to the legacy Nominatim path: 5 s timeout,
errors logged at `warn`, empty results returned. Apollo's routes are
unauthenticated (single-user, LAN-trust); add JWT auth here + on Apollo's
side if exposing beyond a trusted network.
## Dependencies of Note
### Rust crates
- **actix-web**: HTTP framework
- **diesel**: ORM for SQLite
- **jsonwebtoken**: JWT implementation
@@ -287,3 +860,18 @@ This allows runtime verification of model availability before generating insight
- **opentelemetry**: Distributed tracing
- **bcrypt**: Password hashing
- **infer**: Magic number file type detection
### External binaries (must be on `PATH`)
- **`ffmpeg`** — video thumbnail extraction (`StreamActor`, HLS pipeline) and
the HEIF/HEIC/NEF/ARW thumbnail fallback in `generate_image_thumbnail_ffmpeg`.
Required for any deploy that holds video or HEIF files.
- **`exiftool`** — optional but strongly recommended for RAW-heavy libraries.
The thumbnail pipeline shells out to it as the slow-path fallback for
embedded preview extraction (Nikon MakerNote `PreviewIFD`, Canon SubIFDs,
etc. — anything kamadak-exif's IFD0/IFD1 readers can't reach). Without
exiftool installed, RAWs whose preview lives outside IFD0/IFD1 will fall
through to ffmpeg, which often produces black thumbnails. Install via
package manager: `apt install libimage-exiftool-perl`,
`brew install exiftool`, `winget install OliverBetz.ExifTool`, or
`choco install exiftool`.
Generated
+1268 -831
File diff suppressed because it is too large
+21 -4
@@ -1,6 +1,6 @@
[package]
name = "image-api"
version = "0.5.2"
version = "1.3.0"
authors = ["Cameron Cordes <cameronc.dev@gmail.com>"]
edition = "2024"
@@ -9,6 +9,9 @@ edition = "2024"
[profile.release]
lto = "thin"
[profile.dev]
debug = "line-tables-only"
[dependencies]
actix = "0.13.1"
actix-web = "4"
@@ -23,13 +26,13 @@ jsonwebtoken = "9.3.0"
serde = "1"
serde_json = "1"
diesel = { version = "2.2.10", features = ["sqlite"] }
libsqlite3-sys = { version = "0.35", features = ["bundled"] }
libsqlite3-sys = "0.35"
diesel_migrations = "2.2.0"
chrono = "0.4"
clap = { version = "4.5", features = ["derive"] }
dotenv = "0.15"
bcrypt = "0.17.1"
image = { version = "0.25.5", default-features = false, features = ["jpeg", "png", "rayon"] }
image = { version = "0.25.5", default-features = false, features = ["jpeg", "png", "rayon", "webp", "tiff", "avif"] }
infer = "0.16"
walkdir = "2.4.0"
rayon = "1.5"
@@ -49,9 +52,23 @@ opentelemetry-appender-log = "0.31.0"
tempfile = "3.20.0"
regex = "1.11.1"
exif = { package = "kamadak-exif", version = "0.6.1" }
reqwest = { version = "0.12", features = ["json"] }
reqwest = { version = "0.12", features = ["json", "stream", "multipart"] }
async-stream = "0.3"
tokio-util = { version = "0.7", features = ["io"] }
bytes = "1"
urlencoding = "2.1"
zerocopy = "0.8"
ical = "0.11"
scraper = "0.20"
base64 = "0.22"
blake3 = "1.5"
image_hasher = "3.0"
bk-tree = "0.5"
async-trait = "0.1"
indicatif = "0.17"
uuid = { version = "1.10", features = ["v4", "serde"] }
# Windows lacks system sqlite3, so re-enable the bundled C build there.
# Linux/macOS use the system library (faster builds, smaller binary).
[target.'cfg(windows)'.dependencies]
libsqlite3-sys = { version = "0.35", features = ["bundled"] }
+170 -2
@@ -14,14 +14,60 @@ Upon first run it will generate thumbnails for all images and videos at `BASE_PA
- **RAG-based Context Retrieval** - Semantic search over daily conversation summaries
- **Automatic Daily Summaries** - LLM-generated summaries of daily conversations with embeddings
## External Dependencies
### ffmpeg (required)
`ffmpeg` must be on `PATH`. It is used for:
- **HLS video streaming** — transcoding/segmenting source videos into `.m3u8` + `.ts` playlists
- **Video thumbnails** — extracting a frame at the 3-second mark
- **Video preview clips** — short looping previews for the Video Wall
- **HEIC / HEIF thumbnails** — decoding Apple's HEIC format (your ffmpeg build must include
`libheif`; most modern builds do)
Builds used in development: the `gyan.dev` full build on Windows, and distro `ffmpeg`
packages on Linux work fine. If HEIC thumbnails silently fail, check
`ffmpeg -formats | grep heif` to confirm HEIF support.
### RAW photo thumbnails
RAW formats (ARW, NEF, CR2, CR3, DNG, RAF, ORF, RW2, PEF, SRW, TIFF) are thumbnailed
by reading an embedded JPEG preview out of the TIFF container — no external RAW
decoder (libraw / dcraw) is involved. The pipeline tries two layers in order and
keeps the largest valid JPEG:
1. **Fast path (no extra dependency)** — `kamadak-exif` reads
`JPEGInterchangeFormat` from IFD0 / IFD1 directly. Covers older bodies and
most DNGs.
2. **`exiftool` fallback (recommended for RAW-heavy libraries)** — shells out
to extract `PreviewImage` / `JpgFromRaw` / `OtherImage`, which reaches
MakerNote and SubIFD-hosted previews kamadak-exif can't see (e.g. Nikon's
`PreviewIFD`, where modern Nikon bodies stash the full-res review JPEG).
If `exiftool` isn't on `PATH` this layer is skipped silently and only the
fast-path result is used.
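The "keep the largest valid JPEG" step might look like the following. Both function names are hypothetical, and the validity check (SOI marker `FF D8` at the start, EOI marker `FF D9` at the end) is an assumption about what "valid" means here; the real pipeline may decode the candidate instead.

```rust
/// Cheap structural check: JPEG start-of-image and end-of-image markers.
fn looks_like_jpeg(bytes: &[u8]) -> bool {
    bytes.len() >= 4 && bytes.starts_with(&[0xFF, 0xD8]) && bytes.ends_with(&[0xFF, 0xD9])
}

/// Picks the largest candidate that passes the check, across both layers.
fn pick_preview(candidates: Vec<Vec<u8>>) -> Option<Vec<u8>> {
    candidates
        .into_iter()
        .filter(|c| looks_like_jpeg(c))
        .max_by_key(|c| c.len())
}
```

Candidates from the fast path and from `exiftool` would both be pushed into the same list, so a missing `exiftool` simply means fewer entries to choose from.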
Install `exiftool` via your package manager:
- macOS: `brew install exiftool`
- Linux (Debian/Ubuntu): `apt install libimage-exiftool-perl`
- Windows: `winget install OliverBetz.ExifTool` or `choco install exiftool`
Files where neither layer produces a valid preview fall back to ffmpeg. Anything
that still can't be decoded is marked with a `<thumb>.unsupported` sentinel in
the thumbnail directory so we don't retry it every scan. Delete those sentinels
(and any cached black thumbnails) to force retries after a tooling upgrade.
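A sketch of how the sentinel path could be derived, assuming the thumbnail mirrors the source path under the thumbnail root and the sentinel is the thumbnail path with `.unsupported` appended. The function names are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Mirrors `source` (which must live under `base`) into the thumbnail root.
fn thumb_path(thumbs_root: &Path, base: &Path, source: &Path) -> Option<PathBuf> {
    let relative = source.strip_prefix(base).ok()?;
    Some(thumbs_root.join(relative))
}

/// Appends ".unsupported" to the full thumbnail path, keeping the original extension.
fn sentinel_path(thumb: &Path) -> PathBuf {
    let mut raw = thumb.as_os_str().to_os_string();
    raw.push(".unsupported");
    PathBuf::from(raw)
}

/// A scan would skip the file when the sentinel already exists.
fn should_skip(thumb: &Path) -> bool {
    sentinel_path(thumb).exists()
}
```

Appending to the OS string rather than calling `set_extension` matters: `set_extension` would replace `.arw` instead of producing `foo.arw.unsupported`.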
## Environment
There are a handful of required environment variables to have the API run.
They should be defined where the binary is located or above it in an `.env` file.
You must have `ffmpeg` installed for streaming video and generating video thumbnails.
- `DATABASE_URL` is a path or url to a database (currently only SQLite is tested)
- `BASE_PATH` is the root from which you want to serve images and videos
- `THUMBNAILS` is a path where generated thumbnails should be stored
- `THUMBNAILS` is a path where generated thumbnails should be stored. Thumbnails
mirror the source tree under `BASE_PATH` and keep the source's original
extension (e.g. `foo.arw` or `bar.mp4`), though the file contents are always
JPEG bytes — browsers content-sniff. Files that can't be thumbnailed by the
`image` crate, ffmpeg, or an embedded RAW preview get a zero-byte
`<thumb_path>.unsupported` sentinel in this directory so subsequent scans
skip them. Delete the `*.unsupported` files to force retries (for example
after upgrading ffmpeg or adding libheif)
- `VIDEO_PATH` is a path where HLS playlists and video parts should be stored
- `GIFS_DIRECTORY` is a path where generated video GIF thumbnails should be stored
- `BIND_URL` is the url and port to bind to (typically your own IP address)
@@ -50,6 +96,29 @@ The following environment variables configure AI-powered photo insights and dail
- `OLLAMA_URL` - Used if `OLLAMA_PRIMARY_URL` not set
- `OLLAMA_MODEL` - Used if `OLLAMA_PRIMARY_MODEL` not set
#### OpenRouter Configuration (Hybrid Backend)
The hybrid agentic backend keeps embeddings + vision local (Ollama) while routing
chat + tool-calling to OpenRouter. Enabled per-request when the client sends
`backend=hybrid`.
- `OPENROUTER_API_KEY` - OpenRouter API key. Required to enable the hybrid backend.
- `OPENROUTER_DEFAULT_MODEL` - Model id used when the client doesn't specify one
[default: `anthropic/claude-sonnet-4`]
- Example: `openai/gpt-4o-mini`, `google/gemini-2.5-flash`
- `OPENROUTER_ALLOWED_MODELS` - Comma-separated curated allowlist exposed to
clients via `GET /insights/openrouter/models`. The mobile picker shows only
these. Empty/unset = no picker, server default is used.
- Example: `openai/gpt-4o-mini,anthropic/claude-haiku-4-5,google/gemini-2.5-flash`
- `OPENROUTER_BASE_URL` - Override base URL [default: `https://openrouter.ai/api/v1`]
- `OPENROUTER_EMBEDDING_MODEL` - Embedding model for OpenRouter
[default: `openai/text-embedding-3-small`]. Only used if/when embeddings are
routed through OpenRouter (currently embeddings stay local).
- `OPENROUTER_HTTP_REFERER` - Optional `HTTP-Referer` for OpenRouter attribution
- `OPENROUTER_APP_TITLE` - Optional `X-Title` for OpenRouter attribution
Capability checks are skipped for the curated allowlist — bad model ids surface
as a 4xx from the chat call. Pick tool-capable models.
#### SMS API Configuration
- `SMS_API_URL` - URL to SMS message API [default: `http://localhost:8000`]
- Used to fetch conversation data for context in insights
@@ -60,6 +129,74 @@ The following environment variables configure AI-powered photo insights and dail
- Controls how many times the model can invoke tools before being forced to produce a final answer
- Increase for more thorough context gathering; decrease to limit response time
#### Insight Chat Continuation
After an agentic insight is generated, the conversation can be continued. Endpoints:
- `POST /insights/chat` — single-turn reply (non-streaming)
- `POST /insights/chat/stream` — SSE variant with live `text` deltas and
`tool_call` / `tool_result` events. Mobile client uses this.
- `GET /insights/chat/history?path=...&library=...` — rendered transcript;
each assistant message carries a `tools: [{name, arguments, result}]` array
- `POST /insights/chat/rewind` — truncate transcript at a rendered index
(drops that message + any preceding tool scaffolding + later turns). Used
for "try again from here" flows. The initial user message is protected.
Amend mode (`amend: true` in the chat request body) regenerates the insight's
title and inserts a new row instead of appending to the existing transcript,
so you can rewrite the saved summary from within chat.
- `AGENTIC_CHAT_MAX_ITERATIONS` - Cap on tool-calling iterations per chat turn [default: `6`]
- Per-request `max_iterations` (when sent by the client) is clamped to this cap
#### Text-to-Speech (Optional)
Reads insights aloud and manages cloned voices via a Chatterbox model served
behind the same llama-swap proxy. Only requires `LLAMA_SWAP_URL` (the TTS client
is built whenever that's set — independent of `LLM_BACKEND`). Endpoints:
- `POST /tts/speech` — body `{ text, voice?, format?, exaggeration?, cfg_weight?,
temperature? }`; returns `{ audio_base64, format }`. Input is cleaned
server-side (markdown + emoji stripped, then pronunciation overrides applied —
see below) and the generation knobs are clamped
to Chatterbox's ranges. Synthesis is serialized (one at a time — the upstream
has no GPU lock of its own); a concurrent request gets a fast `429`.
- `POST /tts/speech/jobs` — durable variant for long syntheses: same body as
`/tts/speech`, returns `202 { job_id, status }` immediately. Jobs queue on the
GPU permit instead of fast-failing `429`.
- `GET /tts/speech/jobs/{id}` — poll a job: `{ job_id, status, format,
audio_base64?, error? }` with status `queued|running|done|error|cancelled`.
Results are kept in memory ~10 min after completion, then the job 404s.
- `DELETE /tts/speech/jobs/{id}` — cancel a queued/running job.
- `GET /tts/voices` — list the voice library. Served from an in-memory cache
(so the listing doesn't make llama-swap spin up the TTS model and evict the
resident LLM); pass `?refresh=1` to force an upstream re-query. The cache is
invalidated by voice create/delete.
- `POST /tts/voices/upload` — multipart `voice_name` + `voice_file`; clone a
voice from an uploaded clip (≤25 MB).
- `POST /tts/voices/from-library` — body `{ voice_name, path, library? }`; clone
from a library file (audio forwarded as-is; video has its audio extracted via
ffmpeg).
- `DELETE /tts/voices/{name}` — remove a cloned voice from the library.
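
A client using the durable job endpoints would typically submit once and then poll. The sketch below shows only the polling half and leaves the HTTP call out: `get_job` is a hypothetical callable that performs `GET /tts/speech/jobs/{id}` and returns the decoded JSON, and the interval and timeout defaults are assumptions.

```python
import time

# Statuses after which the job will not change again.
TERMINAL_STATUSES = {"done", "error", "cancelled"}


def wait_for_job(get_job, job_id, interval=1.0, timeout=600.0, sleep=time.sleep):
    """Poll a TTS job until it reaches a terminal status and return its payload.

    `get_job(job_id)` is a hypothetical helper wrapping
    GET /tts/speech/jobs/{id}; it must return a dict with a "status" key.
    """
    waited = 0.0
    while True:
        job = get_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {job['status']} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Because finished results are only kept in memory for about 10 minutes, a real client should fetch `audio_base64` promptly once the status is `done`.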
Created voice names are tagged with the ref-clip cap in effect (e.g.
`grandma-30s`) so the library shows which reference length produced each clone.
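
As a rough illustration of the naming scheme, the sketch below builds the tag from the clip window. It is modeled on the two examples in this comparison (`grandma-30s`, and `grandma-at1m32s-30s` from the commit log); the function name, the handling of offsets under one minute, and the truncation to whole seconds are assumptions.

```python
def tag_voice_name(name, duration_seconds, start_seconds=0, cap_seconds=30):
    """Append a reference-window tag to a voice name.

    Hypothetical helper: `cap_seconds` stands in for
    LLAMA_SWAP_TTS_REF_SECONDS, and the exact tag format for
    offsets under one minute is a guess.
    """
    # The window never exceeds the configured cap.
    duration = min(duration_seconds, cap_seconds)
    tag = f"{int(duration)}s"
    if start_seconds > 0:
        minutes, seconds = divmod(int(start_seconds), 60)
        # Non-zero starts are prefixed with where the window begins.
        tag = f"at{minutes}m{seconds}s-{tag}"
    return f"{name}-{tag}"
```

With these assumptions, a clone starting at 92 seconds with a 30-second window is named `grandma-at1m32s-30s`, and a zero-start clone keeps the plain `grandma-30s` form.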
Words the model mispronounces (place names, initialisms) can be rewritten
before synthesis via a JSON map — copy `tts_pronunciations.example.json` to
`tts_pronunciations.json` and edit; changes apply without a restart. Full
matching rules are documented in `src/ai/pronunciation.rs`.
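
To give a feel for how such a map might be applied, here is a minimal sketch, not a port of `src/ai/pronunciation.rs`. It follows the rules summarized in the commit log (whole-word matching, lowercase keys match any casing, keys containing uppercase match exactly, longest key wins); the function name and the lookaround-based word boundary are assumptions.

```python
import re


def apply_pronunciations(text, overrides):
    """Rewrite whole words in `text` using a {word: phonetic spelling} map.

    Hypothetical helper illustrating smartcase, longest-key-first matching.
    """
    if not overrides:
        return text
    # Longest keys first so "new york" beats "york" at the same position.
    keys = sorted(overrides, key=len, reverse=True)
    parts = []
    for key in keys:
        escaped = re.escape(key)
        # All-lowercase keys match any casing; other keys match exactly.
        parts.append(f"(?i:{escaped})" if key == key.lower() else escaped)
    # Lookarounds instead of \b so keys ending in "." (e.g. "e.g.") still work.
    pattern = re.compile(r"(?<!\w)(?:" + "|".join(parts) + r")(?!\w)")

    def substitute(match):
        found = match.group(0)
        if found in overrides:
            return overrides[found]
        return overrides.get(found.lower(), found)

    return pattern.sub(substitute, text)
```

For example, with `{"nginx": "engine x", "SQL": "sequel"}` the word `Nginx` is rewritten but a lowercase `sql` is left alone, because the uppercase key only matches its exact casing.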
Env:
- `TTS_PRONUNCIATIONS_PATH` - pronunciation-override JSON file
[default: `tts_pronunciations.json` in the working directory]
- `LLAMA_SWAP_TTS_MODEL` - TTS model id in llama-swap's `config.yaml` [default: `chatterbox`]
- `LLAMA_SWAP_TTS_VOICE` - default voice used when a `/tts/speech` request omits `voice` (optional)
- `LLAMA_SWAP_TTS_REF_SECONDS` - max voice-clone reference clip length in seconds
[default: `30`]. Reference audio is ffmpeg-normalized to mono 24 kHz WAV (so any
source format works); Chatterbox is zero-shot, so a clean ~10-20s sample is the
sweet spot — more rarely helps.
- `LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS` - per-request synthesis timeout in
seconds [default: `600`]. Long insights are chunked + synthesized server-side
and can take minutes; this is separate from (and overrides, for `/tts/speech`)
the shared `LLAMA_SWAP_REQUEST_TIMEOUT_SECONDS`.
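
The reference-clip normalization can be pictured as an ffmpeg command line. The sketch below only builds the argument list and does not run anything. The flags themselves (`-ss`, `-t`, `-vn`, `-ac`, `-ar`) are standard ffmpeg options, but the function name, the argument order, and the paths are assumptions rather than the server's actual invocation.

```python
def ffmpeg_ref_clip_args(src, dst, start_seconds=0.0, duration_seconds=None, cap_seconds=30.0):
    """Build an ffmpeg argv that extracts a mono 24 kHz WAV reference clip.

    Hypothetical helper: `cap_seconds` stands in for
    LLAMA_SWAP_TTS_REF_SECONDS; `dst` is expected to end in ".wav".
    """
    # Requested duration is clamped to the cap; no request means "use the cap".
    duration = cap_seconds if duration_seconds is None else min(duration_seconds, cap_seconds)
    args = ["ffmpeg", "-y"]
    if start_seconds > 0:
        # Placed before -i so ffmpeg seeks in the input.
        args += ["-ss", str(start_seconds)]
    args += [
        "-i", src,
        "-t", str(duration),  # limit output length
        "-vn",                # drop any video stream
        "-ac", "1",           # mono
        "-ar", "24000",       # 24 kHz sample rate
        dst,
    ]
    return args
```

The returned list could be handed to `subprocess.run(args, check=True)` on a machine with ffmpeg installed.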
#### Fallback Behavior
- Primary server is tried first with 5-second connection timeout
- On failure, automatically falls back to secondary server (if configured)
@@ -72,3 +209,34 @@ Daily conversation summaries are generated automatically on server startup. Conf
- Contacts to process
- Model version used for embeddings: `nomic-embed-text:v1.5`
### Apollo + Face Recognition (Optional)
Apollo (sibling project) hosts both the Places API and the local insightface
inference service. Both integrations are optional and degrade gracefully when
unset.
- `APOLLO_API_BASE_URL` - Base URL of the sibling Apollo backend.
- When set, photo-insight enrichment folds the user's personal place name
(Home, Work, Cabin, ...) into the location string, and the agentic loop
gains a `get_personal_place_at` tool. Unset = legacy Nominatim-only path.
- `APOLLO_FACE_API_BASE_URL` - Base URL for the face-detection service.
- Falls back to `APOLLO_API_BASE_URL` when unset (typical single-Apollo
deploy). Both unset = face feature disabled (file-watch hook and
manual-face endpoints short-circuit silently).
- `FACE_AUTOBIND_MIN_COS` (Phase 3) - Cosine-sim floor for auto-binding a
detected face to an existing same-named person via people-tag bootstrap
[default: `0.4`].
- `FACE_DETECT_CONCURRENCY` (Phase 3) - Per-scan-tick concurrent detect
calls fired by the file watcher [default: `8`]. Apollo serializes them
via its single-worker GPU pool.
- `FACE_DETECT_TIMEOUT_SEC` - reqwest client timeout per detect call
[default: `60`]. CPU inference on a backlog can take many seconds.
- `FACE_BACKLOG_MAX_PER_TICK` - Cap on the per-tick backlog drain (photos
with a content_hash but no face_detections row) [default: `64`]. Runs
every watcher tick regardless of quick-vs-full scan, so the unscanned
set drains independently of the file walk.
- `FACE_HASH_BACKFILL_MAX_PER_TICK` - Cap on the per-tick content_hash
backfill, which retroactively hashes photos that were registered before
the hash field was populated [default: `2000`]. Errors don't count
against the cap; only successful hashes do.
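
To make the `FACE_AUTOBIND_MIN_COS` floor concrete, the sketch below compares two embeddings by cosine similarity and applies the default threshold of `0.4`. It is only an illustration: the function names are hypothetical, embeddings are treated as plain lists of floats, and what the reference embedding represents (for example, a face already bound to the person) is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_autobind(face_embedding, reference_embedding, min_cos=0.4):
    """Hypothetical check: bind only when similarity reaches the floor.

    `min_cos` stands in for FACE_AUTOBIND_MIN_COS.
    """
    return cosine_similarity(face_embedding, reference_embedding) >= min_cos
```

Raising the floor makes auto-binding more conservative, at the cost of leaving more faces for manual review.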
@@ -0,0 +1,155 @@
-- Revert multi-library support.
-- Drops library_id/content_hash/size_bytes, renames rel_path back to the
-- original column names, and drops the libraries table. Rows originally
-- from non-primary libraries (id > 1) would be orphaned, so the rollback
-- keeps only rows from library_id=1.
PRAGMA foreign_keys=OFF;
-- tagged_photo: rel_path → photo_name.
DROP INDEX IF EXISTS idx_tagged_photo_relpath_tag;
DROP INDEX IF EXISTS idx_tagged_photo_rel_path;
ALTER TABLE tagged_photo RENAME COLUMN rel_path TO photo_name;
CREATE INDEX IF NOT EXISTS idx_tagged_photo_photo_name ON tagged_photo(photo_name);
CREATE INDEX IF NOT EXISTS idx_tagged_photo_count ON tagged_photo(photo_name, tag_id);
-- favorites: rel_path → path.
DROP INDEX IF EXISTS idx_favorites_unique;
DROP INDEX IF EXISTS idx_favorites_rel_path;
ALTER TABLE favorites RENAME COLUMN rel_path TO path;
CREATE INDEX IF NOT EXISTS idx_favorites_path ON favorites(path);
CREATE UNIQUE INDEX IF NOT EXISTS idx_favorites_unique ON favorites(userid, path);
-- video_preview_clips: drop library_id, rel_path → file_path.
CREATE TABLE video_preview_clips_old (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
file_path TEXT NOT NULL UNIQUE,
status TEXT NOT NULL DEFAULT 'pending',
duration_seconds REAL,
file_size_bytes INTEGER,
error_message TEXT,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
INSERT INTO video_preview_clips_old (
id, file_path, status, duration_seconds, file_size_bytes,
error_message, created_at, updated_at
)
SELECT
id, rel_path, status, duration_seconds, file_size_bytes,
error_message, created_at, updated_at
FROM video_preview_clips
WHERE library_id = 1;
DROP TABLE video_preview_clips;
ALTER TABLE video_preview_clips_old RENAME TO video_preview_clips;
CREATE INDEX idx_preview_clips_file_path ON video_preview_clips(file_path);
CREATE INDEX idx_preview_clips_status ON video_preview_clips(status);
-- entity_photo_links: drop library_id, rel_path → file_path.
CREATE TABLE entity_photo_links_old (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
entity_id INTEGER NOT NULL,
file_path TEXT NOT NULL,
role TEXT NOT NULL,
CONSTRAINT fk_epl_entity FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE CASCADE,
UNIQUE(entity_id, file_path, role)
);
INSERT INTO entity_photo_links_old (id, entity_id, file_path, role)
SELECT id, entity_id, rel_path, role
FROM entity_photo_links
WHERE library_id = 1;
DROP TABLE entity_photo_links;
ALTER TABLE entity_photo_links_old RENAME TO entity_photo_links;
CREATE INDEX idx_entity_photo_links_entity ON entity_photo_links(entity_id);
CREATE INDEX idx_entity_photo_links_photo ON entity_photo_links(file_path);
-- photo_insights: drop library_id, rel_path → file_path.
CREATE TABLE photo_insights_old (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
file_path TEXT NOT NULL,
title TEXT NOT NULL,
summary TEXT NOT NULL,
generated_at BIGINT NOT NULL,
model_version TEXT NOT NULL,
is_current BOOLEAN NOT NULL DEFAULT 0,
training_messages TEXT,
approved BOOLEAN
);
INSERT INTO photo_insights_old (
id, file_path, title, summary, generated_at, model_version, is_current,
training_messages, approved
)
SELECT
id, rel_path, title, summary, generated_at, model_version, is_current,
training_messages, approved
FROM photo_insights
WHERE library_id = 1;
DROP TABLE photo_insights;
ALTER TABLE photo_insights_old RENAME TO photo_insights;
CREATE INDEX idx_photo_insights_file_path ON photo_insights(file_path);
CREATE INDEX idx_photo_insights_current ON photo_insights(file_path, is_current);
-- image_exif: drop library_id/content_hash/size_bytes, rel_path → file_path.
CREATE TABLE image_exif_old (
id INTEGER PRIMARY KEY NOT NULL,
file_path TEXT NOT NULL UNIQUE,
camera_make TEXT,
camera_model TEXT,
lens_model TEXT,
width INTEGER,
height INTEGER,
orientation INTEGER,
gps_latitude REAL,
gps_longitude REAL,
gps_altitude REAL,
focal_length REAL,
aperture REAL,
shutter_speed TEXT,
iso INTEGER,
date_taken BIGINT,
created_time BIGINT NOT NULL,
last_modified BIGINT NOT NULL
);
INSERT INTO image_exif_old (
id, file_path,
camera_make, camera_model, lens_model,
width, height, orientation,
gps_latitude, gps_longitude, gps_altitude,
focal_length, aperture, shutter_speed, iso, date_taken,
created_time, last_modified
)
SELECT
id, rel_path,
camera_make, camera_model, lens_model,
width, height, orientation,
gps_latitude, gps_longitude, gps_altitude,
focal_length, aperture, shutter_speed, iso, date_taken,
created_time, last_modified
FROM image_exif
WHERE library_id = 1;
DROP TABLE image_exif;
ALTER TABLE image_exif_old RENAME TO image_exif;
CREATE INDEX idx_image_exif_file_path ON image_exif(file_path);
CREATE INDEX idx_image_exif_camera ON image_exif(camera_make, camera_model);
CREATE INDEX idx_image_exif_gps ON image_exif(gps_latitude, gps_longitude);
CREATE INDEX idx_image_exif_date_taken ON image_exif(date_taken);
CREATE INDEX idx_image_exif_date_path ON image_exif(date_taken DESC, file_path);
-- Finally, drop the libraries registry.
DROP TABLE libraries;
PRAGMA foreign_keys=ON;
ANALYZE;
@@ -0,0 +1,216 @@
-- Multi-library support.
-- Adds `libraries` registry table and a `library_id` column on per-instance
-- metadata tables. Renames `file_path` / `photo_name` to `rel_path` for
-- semantic clarity (values already stored relative to BASE_PATH).
-- Adds `content_hash` + `size_bytes` to `image_exif` to support
-- content-based dedup of thumbnails and HLS output across libraries.
--
-- SQLite cannot alter column constraints in place, so per-instance tables
-- are recreated following the idiom established in
-- 2026-04-02-000000_photo_insights_history/up.sql. Existing row `id`s are
-- preserved so foreign keys (entity_facts.source_insight_id, etc.) remain
-- valid after migration.
PRAGMA foreign_keys=OFF;
-- ---------------------------------------------------------------------------
-- 1. Libraries registry.
-- Seeded with a placeholder for the primary library; AppState patches
-- `root_path` from the BASE_PATH env var on first boot. Subsequent
-- prod-to-dev DB syncs update this row via a single SQL UPDATE.
-- ---------------------------------------------------------------------------
CREATE TABLE libraries (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
name TEXT NOT NULL UNIQUE,
root_path TEXT NOT NULL,
created_at BIGINT NOT NULL
);
INSERT INTO libraries (id, name, root_path, created_at)
VALUES (1, 'main', 'BASE_PATH_PLACEHOLDER', strftime('%s','now'));
-- ---------------------------------------------------------------------------
-- 2. image_exif: + library_id, file_path → rel_path, + content_hash/size_bytes.
-- ---------------------------------------------------------------------------
CREATE TABLE image_exif_new (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
library_id INTEGER NOT NULL REFERENCES libraries(id),
rel_path TEXT NOT NULL,
-- Camera information
camera_make TEXT,
camera_model TEXT,
lens_model TEXT,
-- Image properties
width INTEGER,
height INTEGER,
orientation INTEGER,
-- GPS
gps_latitude REAL,
gps_longitude REAL,
gps_altitude REAL,
-- Capture settings
focal_length REAL,
aperture REAL,
shutter_speed TEXT,
iso INTEGER,
date_taken BIGINT,
-- Housekeeping
created_time BIGINT NOT NULL,
last_modified BIGINT NOT NULL,
-- Content identity (backfilled by the `backfill_hashes` binary and by the watcher for new files)
content_hash TEXT,
size_bytes BIGINT,
UNIQUE(library_id, rel_path)
);
INSERT INTO image_exif_new (
id, library_id, rel_path,
camera_make, camera_model, lens_model,
width, height, orientation,
gps_latitude, gps_longitude, gps_altitude,
focal_length, aperture, shutter_speed, iso, date_taken,
created_time, last_modified
)
SELECT
id, 1, file_path,
camera_make, camera_model, lens_model,
width, height, orientation,
gps_latitude, gps_longitude, gps_altitude,
focal_length, aperture, shutter_speed, iso, date_taken,
created_time, last_modified
FROM image_exif;
DROP TABLE image_exif;
ALTER TABLE image_exif_new RENAME TO image_exif;
CREATE INDEX idx_image_exif_rel_path ON image_exif(rel_path);
CREATE INDEX idx_image_exif_camera ON image_exif(camera_make, camera_model);
CREATE INDEX idx_image_exif_gps ON image_exif(gps_latitude, gps_longitude);
CREATE INDEX idx_image_exif_date_taken ON image_exif(date_taken);
CREATE INDEX idx_image_exif_date_path ON image_exif(date_taken DESC, rel_path);
CREATE INDEX idx_image_exif_lib_date ON image_exif(library_id, date_taken);
CREATE INDEX idx_image_exif_content_hash ON image_exif(content_hash);
-- ---------------------------------------------------------------------------
-- 3. photo_insights: + library_id, file_path → rel_path.
-- Preserve `id` so entity_facts.source_insight_id FKs remain valid.
-- ---------------------------------------------------------------------------
CREATE TABLE photo_insights_new (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
library_id INTEGER NOT NULL REFERENCES libraries(id),
rel_path TEXT NOT NULL,
title TEXT NOT NULL,
summary TEXT NOT NULL,
generated_at BIGINT NOT NULL,
model_version TEXT NOT NULL,
is_current BOOLEAN NOT NULL DEFAULT 0,
training_messages TEXT,
approved BOOLEAN
);
INSERT INTO photo_insights_new (
id, library_id, rel_path, title, summary, generated_at, model_version,
is_current, training_messages, approved
)
SELECT
id, 1, file_path, title, summary, generated_at, model_version,
is_current, training_messages, approved
FROM photo_insights;
DROP TABLE photo_insights;
ALTER TABLE photo_insights_new RENAME TO photo_insights;
CREATE INDEX idx_photo_insights_rel_path ON photo_insights(rel_path);
CREATE INDEX idx_photo_insights_current ON photo_insights(library_id, rel_path, is_current);
-- ---------------------------------------------------------------------------
-- 4. entity_photo_links: + library_id, file_path → rel_path.
-- Preserves entity FK; UNIQUE now includes library_id to allow the same
-- rel_path to link entities in multiple libraries independently.
-- ---------------------------------------------------------------------------
CREATE TABLE entity_photo_links_new (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
entity_id INTEGER NOT NULL,
library_id INTEGER NOT NULL REFERENCES libraries(id),
rel_path TEXT NOT NULL,
role TEXT NOT NULL,
CONSTRAINT fk_epl_entity FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE CASCADE,
UNIQUE(entity_id, library_id, rel_path, role)
);
INSERT INTO entity_photo_links_new (id, entity_id, library_id, rel_path, role)
SELECT id, entity_id, 1, file_path, role FROM entity_photo_links;
DROP TABLE entity_photo_links;
ALTER TABLE entity_photo_links_new RENAME TO entity_photo_links;
CREATE INDEX idx_entity_photo_links_entity ON entity_photo_links(entity_id);
CREATE INDEX idx_entity_photo_links_photo ON entity_photo_links(library_id, rel_path);
-- ---------------------------------------------------------------------------
-- 5. video_preview_clips: + library_id, file_path → rel_path.
-- ---------------------------------------------------------------------------
CREATE TABLE video_preview_clips_new (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
library_id INTEGER NOT NULL REFERENCES libraries(id),
rel_path TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
duration_seconds REAL,
file_size_bytes INTEGER,
error_message TEXT,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL,
UNIQUE(library_id, rel_path)
);
INSERT INTO video_preview_clips_new (
id, library_id, rel_path, status, duration_seconds, file_size_bytes,
error_message, created_at, updated_at
)
SELECT
id, 1, file_path, status, duration_seconds, file_size_bytes,
error_message, created_at, updated_at
FROM video_preview_clips;
DROP TABLE video_preview_clips;
ALTER TABLE video_preview_clips_new RENAME TO video_preview_clips;
CREATE INDEX idx_preview_clips_rel_path ON video_preview_clips(rel_path);
CREATE INDEX idx_preview_clips_status ON video_preview_clips(status);
-- ---------------------------------------------------------------------------
-- 6. favorites: path → rel_path. Library-agnostic (cross-library sharing).
-- ---------------------------------------------------------------------------
ALTER TABLE favorites RENAME COLUMN path TO rel_path;
DROP INDEX IF EXISTS idx_favorites_path;
DROP INDEX IF EXISTS idx_favorites_unique;
CREATE INDEX idx_favorites_rel_path ON favorites(rel_path);
CREATE UNIQUE INDEX idx_favorites_unique ON favorites(userid, rel_path);
-- ---------------------------------------------------------------------------
-- 7. tagged_photo: photo_name → rel_path. Library-agnostic.
-- Dedup first so the (rel_path, tag_id) unique index can be created safely.
-- ---------------------------------------------------------------------------
ALTER TABLE tagged_photo RENAME COLUMN photo_name TO rel_path;
DELETE FROM tagged_photo
WHERE id NOT IN (
SELECT MIN(id) FROM tagged_photo GROUP BY rel_path, tag_id
);
DROP INDEX IF EXISTS idx_tagged_photo_photo_name;
DROP INDEX IF EXISTS idx_tagged_photo_count;
CREATE INDEX idx_tagged_photo_rel_path ON tagged_photo(rel_path);
CREATE UNIQUE INDEX idx_tagged_photo_relpath_tag ON tagged_photo(rel_path, tag_id);
PRAGMA foreign_keys=ON;
ANALYZE;
@@ -0,0 +1,4 @@
-- No-op: there's no sensible way to recover which rows originally used
-- backslashes, and there's no reason to want backslashes back. The
-- deleted duplicates are also gone.
SELECT 1;
@@ -0,0 +1,85 @@
-- Normalize `rel_path` columns to forward slashes. Windows ingest
-- historically produced a mix of `\` and `/`, which broke lookups and
-- caused spurious UNIQUE-constraint violations on re-registration.
--
-- SQLite enforces UNIQUE per-row during UPDATE, so we have to drop
-- losing duplicates BEFORE normalizing. For each table that has a
-- UNIQUE on rel_path, we delete rows whose normalized form already
-- exists in canonical (forward-slash) form — keeping the existing
-- forward-slash row as the survivor. Then a flat UPDATE finishes the
-- job for remaining backslash rows.
-- image_exif: UNIQUE(library_id, rel_path)
DELETE FROM image_exif
WHERE rel_path LIKE '%\%'
AND EXISTS (
SELECT 1 FROM image_exif AS other
WHERE other.library_id = image_exif.library_id
AND other.rel_path = REPLACE(image_exif.rel_path, '\', '/')
AND other.id != image_exif.id
);
UPDATE image_exif
SET rel_path = REPLACE(rel_path, '\', '/')
WHERE rel_path LIKE '%\%';
-- favorites: UNIQUE(userid, rel_path)
DELETE FROM favorites
WHERE rel_path LIKE '%\%'
AND EXISTS (
SELECT 1 FROM favorites AS other
WHERE other.userid = favorites.userid
AND other.rel_path = REPLACE(favorites.rel_path, '\', '/')
AND other.id != favorites.id
);
UPDATE favorites
SET rel_path = REPLACE(rel_path, '\', '/')
WHERE rel_path LIKE '%\%';
-- tagged_photo: UNIQUE(rel_path, tag_id)
DELETE FROM tagged_photo
WHERE rel_path LIKE '%\%'
AND EXISTS (
SELECT 1 FROM tagged_photo AS other
WHERE other.tag_id = tagged_photo.tag_id
AND other.rel_path = REPLACE(tagged_photo.rel_path, '\', '/')
AND other.id != tagged_photo.id
);
UPDATE tagged_photo
SET rel_path = REPLACE(rel_path, '\', '/')
WHERE rel_path LIKE '%\%';
-- entity_photo_links: UNIQUE(entity_id, library_id, rel_path, role)
DELETE FROM entity_photo_links
WHERE rel_path LIKE '%\%'
AND EXISTS (
SELECT 1 FROM entity_photo_links AS other
WHERE other.entity_id = entity_photo_links.entity_id
AND other.library_id = entity_photo_links.library_id
AND other.role = entity_photo_links.role
AND other.rel_path = REPLACE(entity_photo_links.rel_path, '\', '/')
AND other.id != entity_photo_links.id
);
UPDATE entity_photo_links
SET rel_path = REPLACE(rel_path, '\', '/')
WHERE rel_path LIKE '%\%';
-- video_preview_clips: UNIQUE(library_id, rel_path)
DELETE FROM video_preview_clips
WHERE rel_path LIKE '%\%'
AND EXISTS (
SELECT 1 FROM video_preview_clips AS other
WHERE other.library_id = video_preview_clips.library_id
AND other.rel_path = REPLACE(video_preview_clips.rel_path, '\', '/')
AND other.id != video_preview_clips.id
);
UPDATE video_preview_clips
SET rel_path = REPLACE(rel_path, '\', '/')
WHERE rel_path LIKE '%\%';
-- photo_insights has no UNIQUE on rel_path (history table), so a plain
-- normalize is safe.
UPDATE photo_insights
SET rel_path = REPLACE(rel_path, '\', '/')
WHERE rel_path LIKE '%\%';
ANALYZE;
@@ -0,0 +1,23 @@
-- SQLite can't DROP COLUMN cleanly on older versions; rebuild the table.
CREATE TABLE photo_insights_backup AS
SELECT id, library_id, rel_path, title, summary, generated_at, model_version,
is_current, training_messages, approved
FROM photo_insights;
DROP TABLE photo_insights;
CREATE TABLE photo_insights (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
library_id INTEGER NOT NULL REFERENCES libraries(id),
rel_path TEXT NOT NULL,
title TEXT NOT NULL,
summary TEXT NOT NULL,
generated_at BIGINT NOT NULL,
model_version TEXT NOT NULL,
is_current BOOLEAN NOT NULL DEFAULT TRUE,
training_messages TEXT,
approved BOOLEAN
);
INSERT INTO photo_insights
SELECT id, library_id, rel_path, title, summary, generated_at, model_version,
is_current, training_messages, approved
FROM photo_insights_backup;
DROP TABLE photo_insights_backup;
@@ -0,0 +1 @@
ALTER TABLE photo_insights ADD COLUMN backend TEXT NOT NULL DEFAULT 'local';
@@ -0,0 +1,24 @@
-- SQLite can't DROP COLUMN cleanly on older versions; rebuild the table.
CREATE TABLE photo_insights_backup AS
SELECT id, library_id, rel_path, title, summary, generated_at, model_version,
is_current, training_messages, approved, backend
FROM photo_insights;
DROP TABLE photo_insights;
CREATE TABLE photo_insights (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
library_id INTEGER NOT NULL REFERENCES libraries(id),
rel_path TEXT NOT NULL,
title TEXT NOT NULL,
summary TEXT NOT NULL,
generated_at BIGINT NOT NULL,
model_version TEXT NOT NULL,
is_current BOOLEAN NOT NULL DEFAULT TRUE,
training_messages TEXT,
approved BOOLEAN,
backend TEXT NOT NULL DEFAULT 'local'
);
INSERT INTO photo_insights
SELECT id, library_id, rel_path, title, summary, generated_at, model_version,
is_current, training_messages, approved, backend
FROM photo_insights_backup;
DROP TABLE photo_insights_backup;
@@ -0,0 +1 @@
ALTER TABLE photo_insights ADD COLUMN fewshot_source_ids TEXT;
@@ -0,0 +1,2 @@
DROP TABLE IF EXISTS face_detections;
DROP TABLE IF EXISTS persons;
@@ -0,0 +1,67 @@
-- Local face recognition tables.
--
-- `persons` are visual identities (the "who" of a face). The optional
-- `entity_id` bridges to the existing knowledge graph `entities` table —
-- when set, this person is the visual side of an LLM-extracted entity.
-- Don't auto-create entities from persons; the entity table represents
-- LLM-extracted knowledge with its own confidence semantics, and silently
-- filling it from face detections muddies the provenance.
--
-- `face_detections` carries one row per detected face on a content_hash,
-- plus marker rows with `status='no_faces'` or `status='failed'` so the
-- file watcher knows not to re-scan a hash. Keying on `content_hash`
-- (cross-library dedup) rather than `(library_id, rel_path)` means the
-- same JPEG in two libraries is scanned once. The denormalized `rel_path`
-- carries the most-recently-seen path — useful for cluster-thumb URL
-- generation; canonical path lookup goes through image_exif.
CREATE TABLE persons (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
name TEXT NOT NULL,
cover_face_id INTEGER, -- backfilled when the first face binds
entity_id INTEGER, -- optional bridge to entities(id)
created_from_tag BOOLEAN NOT NULL DEFAULT 0,
notes TEXT,
created_at BIGINT NOT NULL,
updated_at BIGINT NOT NULL,
CONSTRAINT fk_persons_entity FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE SET NULL,
UNIQUE(name COLLATE NOCASE)
);
CREATE INDEX idx_persons_entity ON persons(entity_id);
CREATE TABLE face_detections (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
library_id INTEGER NOT NULL,
content_hash TEXT NOT NULL, -- canonical key (cross-library dedup)
rel_path TEXT NOT NULL, -- denormalized; most recently seen
bbox_x REAL, -- normalized 0..1; NULL on marker rows
bbox_y REAL,
bbox_w REAL,
bbox_h REAL,
embedding BLOB, -- 512×f32 = 2048 bytes; NULL on marker rows
confidence REAL, -- detector score
source TEXT NOT NULL, -- 'auto' | 'manual'
person_id INTEGER,
status TEXT NOT NULL DEFAULT 'detected', -- 'detected' | 'no_faces' | 'failed'
model_version TEXT NOT NULL, -- e.g. 'buffalo_l'; embedding lineage
created_at BIGINT NOT NULL,
CONSTRAINT fk_fd_library FOREIGN KEY (library_id) REFERENCES libraries(id),
CONSTRAINT fk_fd_person FOREIGN KEY (person_id) REFERENCES persons(id) ON DELETE SET NULL,
-- Detected rows carry geometry + embedding; marker rows ('no_faces',
-- 'failed') carry neither. CHECK enforces the invariant so manual
-- inserts can't slip through with half a row.
CONSTRAINT chk_marker CHECK (
(status = 'detected' AND bbox_x IS NOT NULL AND embedding IS NOT NULL)
OR (status IN ('no_faces','failed') AND bbox_x IS NULL AND embedding IS NULL)
)
);
CREATE INDEX idx_face_detections_hash ON face_detections(content_hash);
CREATE INDEX idx_face_detections_lib_path ON face_detections(library_id, rel_path);
CREATE INDEX idx_face_detections_person ON face_detections(person_id);
CREATE INDEX idx_face_detections_status ON face_detections(status);
-- One marker row per (content_hash, status='no_faces') so the file watcher
-- doesn't double-mark when a hash is seen on multiple full-scan passes.
CREATE UNIQUE INDEX idx_face_detections_no_faces_unique
ON face_detections(content_hash) WHERE status = 'no_faces';
@@ -0,0 +1,2 @@
DROP INDEX IF EXISTS idx_persons_is_ignored;
ALTER TABLE persons DROP COLUMN is_ignored;
@@ -0,0 +1,20 @@
-- IGNORE / junk bucket for the face recognition feature.
--
-- An "Ignored" person is the destination for strangers, faces the user
-- doesn't want tagged, and false detections. It looks like any other
-- person row (so face_detections.person_id stays a clean foreign key)
-- but `is_ignored=1` flags it for special UI treatment:
-- - hidden from the persons list by default
-- - excluded from `find_persons_by_names_ci` so a tag-name match
-- can never auto-bind a real face to the ignore bucket
-- - cluster-suggest already filters by `person_id IS NULL`, so faces
-- bound to an ignored person are naturally excluded from future
-- re-clustering
--
-- Partial index because the set of rows matching the WHERE clause is small
-- (typically 1 row), and we only ever query for `is_ignored = 1` to find the bucket.
ALTER TABLE persons ADD COLUMN is_ignored BOOLEAN NOT NULL DEFAULT 0;
CREATE INDEX idx_persons_is_ignored
ON persons(is_ignored) WHERE is_ignored = 1;
@@ -0,0 +1 @@
DROP INDEX IF EXISTS idx_tags_name_nocase;
@@ -0,0 +1,28 @@
-- Tag-name uniqueness was only enforced in application code (the add_tag
-- handler looks up by name before inserting). The schema itself accepted dupes,
-- so a divergent code path could land two tags with the same name. Now
-- that we expose a rename endpoint we want a hard guarantee: case-
-- insensitive UNIQUE on tags.name.
-- Pre-flight: collapse exact-name duplicates (case-insensitive) onto the
-- lowest-id row before adding the constraint, otherwise the index
-- creation fails on any DB that ever produced dupes. On a clean DB this
-- is a no-op.
UPDATE tagged_photo
SET tag_id = (
SELECT MIN(t2.id) FROM tags t2
WHERE LOWER(t2.name) = LOWER((SELECT name FROM tags WHERE id = tagged_photo.tag_id))
)
WHERE tag_id IN (
SELECT t.id FROM tags t
WHERE t.id <> (
SELECT MIN(t2.id) FROM tags t2 WHERE LOWER(t2.name) = LOWER(t.name)
)
);
DELETE FROM tags
WHERE id <> (
SELECT MIN(t2.id) FROM tags t2 WHERE LOWER(t2.name) = LOWER(tags.name)
);
CREATE UNIQUE INDEX idx_tags_name_nocase ON tags (name COLLATE NOCASE);
@@ -0,0 +1,5 @@
DROP INDEX IF EXISTS idx_photo_insights_content_hash;
ALTER TABLE photo_insights DROP COLUMN content_hash;
DROP INDEX IF EXISTS idx_tagged_photo_content_hash;
ALTER TABLE tagged_photo DROP COLUMN content_hash;
@@ -0,0 +1,64 @@
-- Phase B of the multi-library data-model rollout: add a nullable
-- `content_hash` column to derived/user-intent tables that should follow
-- the bytes rather than the path. Reads will prefer hash-key joins and
-- fall back to rel_path while the column is null. A separate
-- reconciliation pass collapses duplicates as the column populates.
--
-- See CLAUDE.md → "Multi-library data model" for the policy. The
-- reference implementation is `face_detections`, which has been
-- hash-keyed since it was introduced.
--
-- Tables in this migration:
-- * tagged_photo — user-intent (tags follow the bytes)
-- * photo_insights — intrinsic to bytes (LLM-generated description)
--
-- favorites is the natural third candidate but its DAO is barely used in
-- v1 and the row count is tiny; deferring lets this migration stay
-- focused on the high-volume tables that drive cross-library overhead.
-- ---------------------------------------------------------------------------
-- tagged_photo
-- ---------------------------------------------------------------------------
ALTER TABLE tagged_photo ADD COLUMN content_hash TEXT;
-- Backfill: for each tagged_photo row, find the content_hash for its
-- rel_path. tagged_photo doesn't carry a library_id, so a rel_path that
-- exists under multiple libraries with different content is genuinely
-- ambiguous — we take the first matching image_exif row. The
-- reconciliation pass at runtime cleans up any rows that resolve
-- differently once a hash is known per library.
UPDATE tagged_photo
SET content_hash = (
SELECT content_hash FROM image_exif
WHERE image_exif.rel_path = tagged_photo.rel_path
AND image_exif.content_hash IS NOT NULL
LIMIT 1
)
WHERE content_hash IS NULL;
-- Hash-key index. Partial (only non-null rows) to keep the index small
-- during the transitional window where most rows are still null.
CREATE INDEX idx_tagged_photo_content_hash
ON tagged_photo (content_hash)
WHERE content_hash IS NOT NULL;
-- ---------------------------------------------------------------------------
-- photo_insights
-- ---------------------------------------------------------------------------
ALTER TABLE photo_insights ADD COLUMN content_hash TEXT;
-- Backfill keyed on (library_id, rel_path) — photo_insights already
-- carries library_id, so the resolution is unambiguous.
UPDATE photo_insights
SET content_hash = (
SELECT content_hash FROM image_exif
WHERE image_exif.library_id = photo_insights.library_id
AND image_exif.rel_path = photo_insights.rel_path
AND image_exif.content_hash IS NOT NULL
LIMIT 1
)
WHERE content_hash IS NULL;
CREATE INDEX idx_photo_insights_content_hash
ON photo_insights (content_hash)
WHERE content_hash IS NOT NULL;
@@ -0,0 +1,2 @@
-- Requires SQLite 3.35+ for ALTER TABLE DROP COLUMN.
ALTER TABLE libraries DROP COLUMN enabled;
@@ -0,0 +1,14 @@
-- Operator-controlled kill switch for a library. When `enabled = 0` the
-- watcher tick skips that library entirely — before the availability
-- probe, before ingest, before any maintenance pass — and the orphan-GC
-- all-online check treats it as out-of-scope rather than as a blocker.
--
-- The intended workflow is staging a new mount: insert with enabled=0,
-- verify the row appears in /libraries with enabled=false, then UPDATE
-- to 1 to start ingest. Same toggle works as a maintenance kill switch
-- after the fact ("don't keep probing this NAS while I'm rebooting it").
--
-- Default 1 so every existing library stays running on upgrade — no
-- behavior change without an explicit flip.
ALTER TABLE libraries ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT 1;
@@ -0,0 +1,2 @@
-- Requires SQLite 3.35+ for ALTER TABLE DROP COLUMN.
ALTER TABLE libraries DROP COLUMN excluded_dirs;
@@ -0,0 +1,14 @@
-- Per-library excluded directories.
--
-- The global EXCLUDED_DIRS env var is the right knob for excludes that
-- every library shares (Synology @eaDir, .thumbnails, etc.). It's a
-- poor fit for "exclude this subtree from THIS library only". The
-- natural use case for that is mounting a parent directory while
-- another library already covers a child subtree underneath.
--
-- This column is parsed comma-separated, same shape as the env var,
-- and the watcher / memories / thumbnail walks each apply
-- (env_globals ∪ library.excluded_dirs) when scanning the library.
-- NULL = no extra excludes; the global env var still applies.
ALTER TABLE libraries ADD COLUMN excluded_dirs TEXT;
@@ -0,0 +1,8 @@
DROP INDEX IF EXISTS idx_image_exif_duplicate_of_hash;
DROP INDEX IF EXISTS idx_image_exif_dhash;
DROP INDEX IF EXISTS idx_image_exif_phash;
ALTER TABLE image_exif DROP COLUMN duplicate_decided_at;
ALTER TABLE image_exif DROP COLUMN duplicate_of_hash;
ALTER TABLE image_exif DROP COLUMN dhash_64;
ALTER TABLE image_exif DROP COLUMN phash_64;
@@ -0,0 +1,41 @@
-- Adds perceptual-hash signals + soft-mark resolution state to image_exif so
-- the duplicates surface in Apollo can group near-duplicates (re-encoded,
-- resized, format-converted copies) and let the user demote losers without
-- touching the file on disk. Image-only for v1: phash_64/dhash_64 are NULL
-- on videos and on images that fail to decode. See Apollo CLAUDE.md →
-- Duplicate detection / Caching layer for the policy.
--
-- Soft-mark columns are media-type-agnostic — when video perceptual hashing
-- arrives, it lives in a separate hash-keyed companion table and reuses the
-- same duplicate_of_hash / duplicate_decided_at machinery.
-- pHash (DCT, 64-bit) packed as i64 for fast XOR + popcount Hamming.
ALTER TABLE image_exif ADD COLUMN phash_64 BIGINT;
-- dHash (gradient, 64-bit). Cheap, robust to compression/resize. Stored
-- alongside pHash so the query layer can fall back if either is null.
ALTER TABLE image_exif ADD COLUMN dhash_64 BIGINT;
-- When non-null, this row is a soft-marked duplicate of the row whose
-- content_hash matches. The duplicate file stays on disk; the default
-- /photos listing filters it out. /photos?include_duplicates=true opts
-- back in (the Apollo duplicates modal uses this).
ALTER TABLE image_exif ADD COLUMN duplicate_of_hash TEXT;
-- Unix seconds of the resolve. Distinguishes "never reviewed" from
-- "reviewed and resolved" for the Apollo include_resolved toggle.
ALTER TABLE image_exif ADD COLUMN duplicate_decided_at BIGINT;
-- Partial indexes — the columns are NULL for the vast majority of rows
-- during the transitional window and forever for videos / decode failures.
CREATE INDEX idx_image_exif_phash
ON image_exif (phash_64)
WHERE phash_64 IS NOT NULL;
CREATE INDEX idx_image_exif_dhash
ON image_exif (dhash_64)
WHERE dhash_64 IS NOT NULL;
CREATE INDEX idx_image_exif_duplicate_of_hash
ON image_exif (duplicate_of_hash)
WHERE duplicate_of_hash IS NOT NULL;
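The migration stores each 64-bit hash as a signed BIGINT so the query layer can compare hashes with XOR and popcount. The sketch below shows that comparison in Rust. The function names and the `max_distance` parameter are illustrative assumptions; the migration does not state what threshold Apollo uses.

```rust
/// Reinterpret an unsigned 64-bit hash as i64 for storage in a BIGINT
/// column. The cast preserves every bit; only the sign interpretation changes.
pub fn pack_hash(hash: u64) -> i64 {
    hash as i64
}

/// Hamming distance between two packed hashes: XOR, then count set bits.
pub fn hamming_distance(a: i64, b: i64) -> u32 {
    (a ^ b).count_ones()
}

/// Hypothetical near-duplicate check; the cutoff is supplied by the
/// caller because the real threshold is not given in the migration.
pub fn is_near_duplicate(a: i64, b: i64, max_distance: u32) -> bool {
    hamming_distance(a, b) <= max_distance
}
```

Because the XOR works on the raw bit pattern, hashes with the top bit set (negative as i64) compare correctly without any unpacking step.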
@@ -0,0 +1,2 @@
DROP INDEX IF EXISTS idx_image_exif_date_backfill;
ALTER TABLE image_exif DROP COLUMN date_taken_source;
@@ -0,0 +1,24 @@
-- Tracks where a row's `date_taken` was sourced so the canonical-date
-- waterfall (kamadak-exif → exiftool → filename → earliest_fs_time) is
-- visible to debugging and to the per-tick backfill drain that re-runs
-- weak sources once stronger ones become available (e.g. exiftool gets
-- installed on a deploy that didn't have it). See CLAUDE.md → Memories
-- canonical-date pipeline.
--
-- Values:
-- 'exif' — kamadak-exif read DateTime/DateTimeOriginal directly
-- 'exiftool' — exiftool fallback caught a video / MakerNote / QuickTime tag
-- 'filename' — extract_date_from_filename matched a known pattern
-- 'fs_time' — fell through to earliest_fs_time(metadata)
--
-- NULL when `date_taken` itself is NULL (no source resolved the date).
ALTER TABLE image_exif ADD COLUMN date_taken_source TEXT;
-- Partial index for the per-tick backfill drain: targets rows that need
-- re-resolution (no date yet, or only the weakest source resolved it).
-- Filename-sourced rows are intentionally excluded — the regex is
-- authoritative when it matches and re-running exiftool wouldn't change
-- the answer.
CREATE INDEX idx_image_exif_date_backfill
ON image_exif (library_id, id)
WHERE date_taken IS NULL OR date_taken_source = 'fs_time';
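The waterfall order described above (kamadak-exif, then exiftool, then filename, then filesystem time) can be sketched as a first-match chain. This is an illustration under assumptions, not the resolver itself: the `DateSource` enum and `resolve_date` are hypothetical names, and each input is modelled as an already-computed optional Unix timestamp.

```rust
/// Source labels matching the `date_taken_source` values listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DateSource {
    Exif,
    Exiftool,
    Filename,
    FsTime,
}

impl DateSource {
    /// The string stored in the TEXT column.
    pub fn as_str(self) -> &'static str {
        match self {
            DateSource::Exif => "exif",
            DateSource::Exiftool => "exiftool",
            DateSource::Filename => "filename",
            DateSource::FsTime => "fs_time",
        }
    }
}

/// Return the first available timestamp together with its source,
/// or None when no source resolved a date (both columns stay NULL).
pub fn resolve_date(
    exif: Option<i64>,
    exiftool: Option<i64>,
    filename: Option<i64>,
    fs_time: Option<i64>,
) -> Option<(i64, DateSource)> {
    exif.map(|t| (t, DateSource::Exif))
        .or_else(|| exiftool.map(|t| (t, DateSource::Exiftool)))
        .or_else(|| filename.map(|t| (t, DateSource::Filename)))
        .or_else(|| fs_time.map(|t| (t, DateSource::FsTime)))
}
```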
@@ -0,0 +1,9 @@
-- Reverting this migration is a no-op: the labels we wrote in `up.sql`
-- are correct under any state of the schema (every dated row was indeed
-- exif-sourced before the resolver landed), and there's no signal that
-- distinguishes "labelled by this migration" from "labelled by the
-- ingest path post-resolver". Clearing them would break the drain's
-- eligibility filter again.
--
-- The companion migration `2026-05-06-000000_add_date_taken_source` is
-- the one to revert if you need to remove the column entirely.
@@ -0,0 +1,20 @@
-- Backfill `date_taken_source` for rows that pre-date the canonical-date
-- pipeline. Before the resolver landed, `image_exif.date_taken` could
-- only be populated via `exif::extract_exif_from_path` (kamadak-exif)
-- on the file-watcher, upload, or GPS-write paths. The resolver column
-- migration added `date_taken_source` defaulting to NULL, so every
-- historical row with a date is currently unlabelled — and the
-- per-tick drain skips them because its eligibility predicate is
-- `date_taken IS NULL OR date_taken_source = 'fs_time'`.
--
-- Label them `'exif'` once and let the drain take over from here. Safe
-- because every code path that wrote `date_taken` prior to the
-- resolver was a kamadak-exif read — there was no other source.
--
-- Idempotent: re-running this migration on a DB that has already been
-- backfilled is a no-op (the WHERE clause matches nothing the second
-- time around).
UPDATE image_exif
SET date_taken_source = 'exif'
WHERE date_taken IS NOT NULL
AND date_taken_source IS NULL;
@@ -0,0 +1,2 @@
ALTER TABLE image_exif DROP COLUMN original_date_taken_source;
ALTER TABLE image_exif DROP COLUMN original_date_taken;
@@ -0,0 +1,15 @@
-- Manual date_taken override: when an operator overrides a row's date via
-- POST /image/exif/date, the prior `(date_taken, date_taken_source)` is
-- snapshotted into these columns and the live columns hold the new value
-- with `date_taken_source = 'manual'`. POST /image/exif/date/clear restores
-- the pair and nulls the originals.
--
-- The waterfall source-name set is now:
-- 'exif' | 'exiftool' | 'filename' | 'fs_time' | 'manual'
--
-- The `idx_image_exif_date_backfill` partial index already filters to
-- `date_taken IS NULL OR date_taken_source = 'fs_time'`, so 'manual' rows
-- are naturally excluded from the per-tick backfill drain — no index
-- change needed.
ALTER TABLE image_exif ADD COLUMN original_date_taken BIGINT;
ALTER TABLE image_exif ADD COLUMN original_date_taken_source TEXT;
@@ -0,0 +1,43 @@
-- Drop the persona-scoping column on entity_facts via the table-rebuild
-- dance for SQLite-version portability (matches the pattern in
-- 2026-04-20-000000_add_backend_to_insights/down.sql).
DROP INDEX IF EXISTS idx_entity_facts_persona;
CREATE TABLE entity_facts_backup AS
SELECT id, subject_entity_id, predicate, object_entity_id, object_value,
source_photo, source_insight_id, confidence, status, created_at
FROM entity_facts;
DROP TABLE entity_facts;
CREATE TABLE entity_facts (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
subject_entity_id INTEGER NOT NULL,
predicate TEXT NOT NULL,
object_entity_id INTEGER,
object_value TEXT,
source_photo TEXT,
source_insight_id INTEGER,
confidence REAL NOT NULL DEFAULT 0.6,
status TEXT NOT NULL DEFAULT 'active',
created_at BIGINT NOT NULL,
CONSTRAINT fk_ef_subject FOREIGN KEY (subject_entity_id) REFERENCES entities(id) ON DELETE CASCADE,
CONSTRAINT fk_ef_object FOREIGN KEY (object_entity_id) REFERENCES entities(id) ON DELETE SET NULL,
CONSTRAINT fk_ef_insight FOREIGN KEY (source_insight_id) REFERENCES photo_insights(id) ON DELETE SET NULL,
CHECK (object_entity_id IS NOT NULL OR object_value IS NOT NULL)
);
INSERT INTO entity_facts
SELECT id, subject_entity_id, predicate, object_entity_id, object_value,
source_photo, source_insight_id, confidence, status, created_at
FROM entity_facts_backup;
DROP TABLE entity_facts_backup;
CREATE INDEX idx_entity_facts_subject ON entity_facts(subject_entity_id);
CREATE INDEX idx_entity_facts_predicate ON entity_facts(predicate);
CREATE INDEX idx_entity_facts_status ON entity_facts(status);
CREATE INDEX idx_entity_facts_source_photo ON entity_facts(source_photo);
DROP INDEX IF EXISTS idx_personas_user;
DROP TABLE IF EXISTS personas;
@@ -0,0 +1,64 @@
-- Personas live server-side now (mobile previously stored them in
-- AsyncStorage only). Each user gets the three built-ins seeded; custom
-- personas land here too via POST /personas or POST /personas/migrate.
--
-- `entity_facts` gains a persona_id so each persona accumulates its own
-- voice over a shared entity graph (entities themselves stay unscoped).
-- Existing rows backfill to 'default' via the column DEFAULT — that
-- becomes the historical baseline. The `include_all_memories` flag on
-- personas lets any persona opt back into reading the full pool.
CREATE TABLE personas (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
user_id INTEGER NOT NULL,
persona_id TEXT NOT NULL,
name TEXT NOT NULL,
system_prompt TEXT NOT NULL,
is_built_in BOOLEAN NOT NULL DEFAULT FALSE,
include_all_memories BOOLEAN NOT NULL DEFAULT FALSE,
created_at BIGINT NOT NULL,
updated_at BIGINT NOT NULL,
UNIQUE(user_id, persona_id),
CONSTRAINT fk_personas_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_personas_user ON personas(user_id);
-- Seed built-ins for every existing user. System prompts copied verbatim
-- from FileViewer-React/hooks/usePersonas.tsx so server and client agree
-- on the canonical voice for each built-in.
INSERT INTO personas (user_id, persona_id, name, system_prompt, is_built_in, created_at, updated_at)
SELECT
u.id,
'default',
'Default Assistant',
'You are my long-term memory assistant. Use only the information provided. Do not invent details. Respond in 3–6 sentences in third person, leading with the most concrete moment from the photo and the surrounding context. Plain prose, no headings.',
TRUE,
strftime('%s', 'now') * 1000,
strftime('%s', 'now') * 1000
FROM users u
UNION ALL
SELECT
u.id,
'journal',
'Personal Journal',
'You are a personal journal writer. Write in first person, present tense, with warmth and reflection — focusing on emotions and meaningful moments. Use only the information provided; do not invent details. Aim for 4–8 sentences in a single flowing paragraph, no headings.',
TRUE,
strftime('%s', 'now') * 1000,
strftime('%s', 'now') * 1000
FROM users u
UNION ALL
SELECT
u.id,
'factual',
'Factual Reporter',
'You are a factual memory recorder. Be precise, objective, and concise. Lead with the date and place, then list what / when / who in 2–4 short sentences. Use only the information provided; if a detail is unknown, say so rather than guessing.',
TRUE,
strftime('%s', 'now') * 1000,
strftime('%s', 'now') * 1000
FROM users u;
-- Persona scoping on facts only. Entities and entity_photo_links stay
-- shared (real-world referents and shared photo ↔ entity associations).
ALTER TABLE entity_facts ADD COLUMN persona_id TEXT NOT NULL DEFAULT 'default';
CREATE INDEX idx_entity_facts_persona ON entity_facts(persona_id);
@@ -0,0 +1,47 @@
-- Reverse 2026-05-10-000000_entity_facts_persona_fk: drop the
-- composite FK and the user_id column via the same rebuild pattern.
DROP INDEX IF EXISTS idx_entity_facts_user_persona;
DROP INDEX IF EXISTS idx_entity_facts_persona;
DROP INDEX IF EXISTS idx_entity_facts_source_photo;
DROP INDEX IF EXISTS idx_entity_facts_status;
DROP INDEX IF EXISTS idx_entity_facts_predicate;
DROP INDEX IF EXISTS idx_entity_facts_subject;
ALTER TABLE entity_facts RENAME TO entity_facts_old;
CREATE TABLE entity_facts (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
subject_entity_id INTEGER NOT NULL,
predicate TEXT NOT NULL,
object_entity_id INTEGER,
object_value TEXT,
source_photo TEXT,
source_insight_id INTEGER,
confidence REAL NOT NULL DEFAULT 0.6,
status TEXT NOT NULL DEFAULT 'active',
created_at BIGINT NOT NULL,
persona_id TEXT NOT NULL DEFAULT 'default',
CONSTRAINT fk_ef_subject FOREIGN KEY (subject_entity_id) REFERENCES entities(id) ON DELETE CASCADE,
CONSTRAINT fk_ef_object FOREIGN KEY (object_entity_id) REFERENCES entities(id) ON DELETE SET NULL,
CONSTRAINT fk_ef_insight FOREIGN KEY (source_insight_id) REFERENCES photo_insights(id) ON DELETE SET NULL,
CHECK (object_entity_id IS NOT NULL OR object_value IS NOT NULL)
);
INSERT INTO entity_facts
(id, subject_entity_id, predicate, object_entity_id, object_value,
source_photo, source_insight_id, confidence, status, created_at,
persona_id)
SELECT
id, subject_entity_id, predicate, object_entity_id, object_value,
source_photo, source_insight_id, confidence, status, created_at,
persona_id
FROM entity_facts_old;
DROP TABLE entity_facts_old;
CREATE INDEX idx_entity_facts_subject ON entity_facts(subject_entity_id);
CREATE INDEX idx_entity_facts_predicate ON entity_facts(predicate);
CREATE INDEX idx_entity_facts_status ON entity_facts(status);
CREATE INDEX idx_entity_facts_source_photo ON entity_facts(source_photo);
CREATE INDEX idx_entity_facts_persona ON entity_facts(persona_id);
@@ -0,0 +1,82 @@
-- Add a real foreign key from entity_facts to personas. Until now,
-- entity_facts.persona_id was a free-form string with no integrity
-- guarantee — deleting a persona orphaned its facts, which then sat
-- forever in the readable-only-via-PersonaFilter::All hive-mind view.
--
-- personas is keyed (user_id, persona_id) so the FK has to be
-- composite. That requires entity_facts to carry user_id too, which
-- has the side benefit of fixing multi-user fact leakage on the read
-- path (without it, two users with the same 'default' persona would
-- see each other's default-scoped facts).
--
-- SQLite can't ALTER TABLE to add an FK; the table-rebuild dance is
-- the only way. Pattern matches 2026-05-09's down.sql and the older
-- 2026-04-20-000000 migration.
DROP INDEX IF EXISTS idx_entity_facts_subject;
DROP INDEX IF EXISTS idx_entity_facts_predicate;
DROP INDEX IF EXISTS idx_entity_facts_status;
DROP INDEX IF EXISTS idx_entity_facts_source_photo;
DROP INDEX IF EXISTS idx_entity_facts_persona;
ALTER TABLE entity_facts RENAME TO entity_facts_old;
CREATE TABLE entity_facts (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
subject_entity_id INTEGER NOT NULL,
predicate TEXT NOT NULL,
object_entity_id INTEGER,
object_value TEXT,
source_photo TEXT,
source_insight_id INTEGER,
confidence REAL NOT NULL DEFAULT 0.6,
status TEXT NOT NULL DEFAULT 'active',
created_at BIGINT NOT NULL,
persona_id TEXT NOT NULL DEFAULT 'default',
user_id INTEGER NOT NULL DEFAULT 1,
CONSTRAINT fk_ef_subject FOREIGN KEY (subject_entity_id) REFERENCES entities(id) ON DELETE CASCADE,
CONSTRAINT fk_ef_object FOREIGN KEY (object_entity_id) REFERENCES entities(id) ON DELETE SET NULL,
CONSTRAINT fk_ef_insight FOREIGN KEY (source_insight_id) REFERENCES photo_insights(id) ON DELETE SET NULL,
CONSTRAINT fk_ef_persona FOREIGN KEY (user_id, persona_id) REFERENCES personas(user_id, persona_id) ON DELETE CASCADE,
CHECK (object_entity_id IS NOT NULL OR object_value IS NOT NULL)
);
-- Backfill: assign each legacy fact to the user that owns the matching
-- persona. Built-ins are seeded per-user with the same persona_id
-- string for everyone, so MIN(user_id) deterministically picks the
-- earliest registered user (typically user 1, the operator). Custom
-- persona_ids exist for at most one user, so MIN is also unique.
-- Falls back to user_id=1 when no matching persona row exists; in that
-- case the FK below would still fail, but legacy rows shouldn't be in
-- that state because 2026-05-09 ADD COLUMN defaulted persona_id to
-- 'default', which is seeded for every user.
INSERT INTO entity_facts
(id, subject_entity_id, predicate, object_entity_id, object_value,
source_photo, source_insight_id, confidence, status, created_at,
persona_id, user_id)
SELECT
old.id,
old.subject_entity_id,
old.predicate,
old.object_entity_id,
old.object_value,
old.source_photo,
old.source_insight_id,
old.confidence,
old.status,
old.created_at,
old.persona_id,
COALESCE(
(SELECT MIN(p.user_id) FROM personas p WHERE p.persona_id = old.persona_id),
1
)
FROM entity_facts_old old;
DROP TABLE entity_facts_old;
CREATE INDEX idx_entity_facts_subject ON entity_facts(subject_entity_id);
CREATE INDEX idx_entity_facts_predicate ON entity_facts(predicate);
CREATE INDEX idx_entity_facts_status ON entity_facts(status);
CREATE INDEX idx_entity_facts_source_photo ON entity_facts(source_photo);
CREATE INDEX idx_entity_facts_persona ON entity_facts(persona_id);
CREATE INDEX idx_entity_facts_user_persona ON entity_facts(user_id, persona_id);
@@ -0,0 +1,5 @@
-- SQLite can drop columns since 3.35 (March 2021); embedded
-- libsqlite3-sys is well past that. Drop in reverse insert order so
-- a partial down still leaves the schema valid.
ALTER TABLE entity_facts DROP COLUMN valid_until;
ALTER TABLE entity_facts DROP COLUMN valid_from;
@@ -0,0 +1,25 @@
-- Add valid-time columns to entity_facts.
--
-- entity_facts already has created_at — *transaction time*, the
-- moment WE recorded the fact. That's not the same as the real-world
-- period the fact was true. "Cameron is_in_relationship_with X" was
-- only true during a window; recording it in 2026 doesn't make it
-- true today. Without the distinction, every former relationship,
-- former job, former address reads as currently-true.
--
-- Adding two BIGINT NULL columns: valid_from / valid_until (unix
-- seconds). NULL means "unbounded on that side" — `valid_from IS
-- NULL` reads as "always-true-back-to-the-beginning",
-- `valid_until IS NULL` as "still-true-now-or-unknown". Both NULL =
-- temporal validity unknown (current state of all legacy rows).
--
-- Conflict detection refines accordingly: same-predicate facts with
-- different objects stop flagging when their intervals are disjoint
-- ("lives_in NYC 2018-2020" and "lives_in SF 2020-present" are both
-- valid, just at different times).
ALTER TABLE entity_facts ADD COLUMN valid_from BIGINT;
ALTER TABLE entity_facts ADD COLUMN valid_until BIGINT;
-- Optional partial index for time-bounded scans. Skipped for now —
-- conflict detection runs per-entity (small N) and doesn't need it.
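The disjoint-interval rule for conflict detection can be sketched as an overlap test in which NULL means unbounded on that side. The sketch assumes half-open intervals `[valid_from, valid_until)`, which is what makes "NYC 2018-2020" and "SF 2020-present" non-conflicting when they share the boundary year. The function name is hypothetical, and the real DAO logic is not shown in this diff.

```rust
/// True when two valid-time windows share at least one instant.
/// `None` models a NULL column: unbounded on that side.
/// Assumes half-open intervals [from, until).
pub fn intervals_overlap(
    a_from: Option<i64>,
    a_until: Option<i64>,
    b_from: Option<i64>,
    b_until: Option<i64>,
) -> bool {
    let lo = |v: Option<i64>| v.unwrap_or(i64::MIN);
    let hi = |v: Option<i64>| v.unwrap_or(i64::MAX);
    lo(a_from) < hi(b_until) && lo(b_from) < hi(a_until)
}
```

Two legacy facts with all four values NULL overlap under this test, so they would still be flagged, which fits the comment's reading of "temporal validity unknown".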
@@ -0,0 +1,2 @@
DROP INDEX IF EXISTS idx_entity_facts_superseded_by;
ALTER TABLE entity_facts DROP COLUMN superseded_by;
@@ -0,0 +1,31 @@
-- Add a supersession pointer to entity_facts.
--
-- Status alone is a one-way trapdoor: 'rejected' loses the link
-- between the rejected fact and the one that replaced it. For
-- evolving facts (Cameron's relationship, employer, address) the
-- curator wants to *replace* a stale fact with a new one and keep
-- the history readable: "from 2018 until 2022 this was true, then
-- it became this other thing".
--
-- A nullable INTEGER column pointing at another entity_facts.id,
-- with no FK constraint declared (SQLite's ADD COLUMN accepts a
-- REFERENCES clause only under restrictions such as a NULL default);
-- the DAO's delete_fact clears dangling pointers in the same
-- transaction as the parent delete to keep the column honest.
--
-- A status of 'superseded' on the old fact (alongside the existing
-- active / reviewed / rejected) signals "replaced by a newer
-- claim". Read paths already filter 'rejected' out of the active
-- view; the curation UI will treat 'superseded' the same way for
-- conflict detection so they don't keep flagging.
--
-- Pairs with the valid-time columns from 2026-05-10-000100: the
-- supersede action auto-stamps the old fact's `valid_until` from
-- the new fact's `valid_from`, closing the interval cleanly.
ALTER TABLE entity_facts ADD COLUMN superseded_by INTEGER;
-- Helpful index for "show me what superseded this fact" walks
-- (rare today; cheap to add now while the table is small).
CREATE INDEX idx_entity_facts_superseded_by
ON entity_facts(superseded_by)
WHERE superseded_by IS NOT NULL;
@@ -0,0 +1,4 @@
DROP INDEX IF EXISTS idx_entity_facts_created_by_backend;
DROP INDEX IF EXISTS idx_entity_facts_created_by_model;
ALTER TABLE entity_facts DROP COLUMN created_by_backend;
ALTER TABLE entity_facts DROP COLUMN created_by_model;
@@ -0,0 +1,30 @@
-- Track which model + backend generated each fact so the curator
-- can audit which configurations produce trustworthy knowledge.
--
-- photo_insights already carries `model_version` + `backend`, and
-- entity_facts.source_insight_id links to it — but:
-- 1. source_insight_id is only set after an insight is stored
-- (post-loop), so chat-continuation facts and facts whose insight
-- was regenerated lose the link.
-- 2. JOINing for every read is more friction than just embedding the
-- provenance on the fact row itself.
-- 3. Manual facts (POST /knowledge/facts) have no insight at all and
-- need to record "manual" as their provenance.
--
-- Two nullable TEXT columns are enough for the audit use case: model
-- (e.g. "qwen2.5:7b", "anthropic/claude-sonnet-4") and backend
-- ("local", "hybrid", "manual"). Pre-existing rows leave both NULL —
-- legacy facts predate this tracking and can't be back-filled
-- reliably from training_messages without burning compute.
ALTER TABLE entity_facts ADD COLUMN created_by_model TEXT;
ALTER TABLE entity_facts ADD COLUMN created_by_backend TEXT;
-- Indexes are cheap and useful for "show me all facts from model X"
-- audit queries — partial so the legacy NULL rows don't bloat them.
CREATE INDEX idx_entity_facts_created_by_model
ON entity_facts(created_by_model)
WHERE created_by_model IS NOT NULL;
CREATE INDEX idx_entity_facts_created_by_backend
ON entity_facts(created_by_backend)
WHERE created_by_backend IS NOT NULL;
@@ -0,0 +1 @@
ALTER TABLE personas DROP COLUMN reviewed_only_facts;
@@ -0,0 +1,16 @@
-- Per-persona toggle: when true, agent reads only see facts whose
-- status is exactly 'reviewed' (human-verified). When false (the
-- default), agent reads see 'active' OR 'reviewed' — everything not
-- rejected or superseded.
--
-- The mobile app surfaces this as "Strict mode" on the persona
-- editor: useful when you want a persona's chat to be grounded
-- exclusively on the curated subset, e.g. for tasks where
-- hallucinated agent claims are particularly costly.
--
-- Note: this is separate from `include_all_memories` (which unions
-- across personas for hive-mind reads). Reviewed-only operates on
-- the status axis; include_all_memories operates on the persona-
-- scope axis. They compose freely.
ALTER TABLE personas ADD COLUMN reviewed_only_facts BOOLEAN NOT NULL DEFAULT 0;
@@ -0,0 +1,5 @@
ALTER TABLE personas DROP COLUMN allow_agent_corrections;
DROP INDEX IF EXISTS idx_entity_facts_last_modified_at;
ALTER TABLE entity_facts DROP COLUMN last_modified_at;
ALTER TABLE entity_facts DROP COLUMN last_modified_by_backend;
ALTER TABLE entity_facts DROP COLUMN last_modified_by_model;
@@ -0,0 +1,30 @@
-- Three coupled changes for agent self-correction safety:
--
-- 1. `entity_facts.last_modified_by_*` + `last_modified_at` track who
-- most recently mutated each fact. `created_by_*` from migration
-- 2026-05-10-000300 records who first wrote the row; this records
-- who last *changed* it. Separate columns so the create vs update
-- audit is independently grep-able ("show me every fact gpt-5
-- altered last week" stays a single index scan).
--
-- 2. `personas.allow_agent_corrections` is the gate for the new
-- agent-side `update_fact` / `supersede_fact` tools. Default OFF —
-- a fresh persona's agent can create but can't alter or replace.
-- Operator opts in per-persona after the model has earned trust,
-- typically via the strict-mode flow (curate, then ratchet up
-- agent autonomy as confidence rises). Parallel in shape to
-- `reviewed_only_facts` from 2026-05-10-000400; they compose.
--
-- 3. Index on `last_modified_at` (partial, NOT NULL) for the
-- audit-feed reads in the curation UI ("show recent agent edits
-- sorted newest first").
ALTER TABLE entity_facts ADD COLUMN last_modified_by_model TEXT;
ALTER TABLE entity_facts ADD COLUMN last_modified_by_backend TEXT;
ALTER TABLE entity_facts ADD COLUMN last_modified_at BIGINT;
CREATE INDEX idx_entity_facts_last_modified_at
ON entity_facts(last_modified_at)
WHERE last_modified_at IS NOT NULL;
ALTER TABLE personas ADD COLUMN allow_agent_corrections BOOLEAN NOT NULL DEFAULT 0;
@@ -0,0 +1,6 @@
-- Irreversible: we collapsed multiple raw entity_type strings to
-- canonical forms and don't have a per-row record of the original.
-- The down migration is intentionally a no-op (the rewritten values
-- are still semantically correct), and the up migration is safe to
-- re-run because every UPDATE is conditional on `!= canonical`.
SELECT 1;
@@ -0,0 +1,43 @@
-- Canonicalize `entities.entity_type` so legacy rows from before
-- `normalize_entity_type` landed in upsert_entity stop polluting
-- client-side filters. Mirrors the synonym map in
-- `src/database/knowledge_dao.rs::normalize_entity_type`:
-- person ← person | people | human | individual | contact
-- place ← place | location | venue | site | area | landmark
-- event ← event | occasion | activity | celebration
-- thing ← thing | object | item | product
-- Types outside the synonym set (e.g. "friend", "family") are not
-- recognized as canonical and get a lowercase+trim pass instead, so
-- at minimum case variants collapse.
--
-- `UPDATE OR IGNORE` skips rows that would violate UNIQUE(name,
-- entity_type) after the rewrite. Two rows like ("Sarah", "person")
-- + ("Sarah", "Person") would otherwise collide — the duplicate
-- survives unchanged so the curator can merge it via the curation
-- UI rather than have the migration silently delete data.
UPDATE OR IGNORE entities
SET entity_type = 'person'
WHERE LOWER(TRIM(entity_type)) IN ('person', 'people', 'human', 'individual', 'contact')
AND entity_type != 'person';
UPDATE OR IGNORE entities
SET entity_type = 'place'
WHERE LOWER(TRIM(entity_type)) IN ('place', 'location', 'venue', 'site', 'area', 'landmark')
AND entity_type != 'place';
UPDATE OR IGNORE entities
SET entity_type = 'event'
WHERE LOWER(TRIM(entity_type)) IN ('event', 'occasion', 'activity', 'celebration')
AND entity_type != 'event';
UPDATE OR IGNORE entities
SET entity_type = 'thing'
WHERE LOWER(TRIM(entity_type)) IN ('thing', 'object', 'item', 'product')
AND entity_type != 'thing';
-- Anything left ("Friend" vs "friend") gets a lowercase+trim sweep
-- so at least case variants of the same custom type collapse.
UPDATE OR IGNORE entities
SET entity_type = LOWER(TRIM(entity_type))
WHERE entity_type != LOWER(TRIM(entity_type));
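For reference, the synonym map in the header comment can be written as a small Rust function. This is a sketch reconstructed from the comment alone; the real `normalize_entity_type` in `src/database/knowledge_dao.rs` is not part of this diff and may differ. One known difference from the SQL above: SQLite's built-in `LOWER` folds only ASCII letters and `TRIM` strips only spaces by default, while Rust's `to_lowercase` and `trim` are Unicode-aware.

```rust
/// Map a raw entity type to its canonical form, following the synonym
/// table in the migration comment. Unknown types are lowercased and
/// trimmed so that case variants collapse.
pub fn normalize_entity_type(raw: &str) -> String {
    let cleaned = raw.trim().to_lowercase();
    let canonical: &str = match cleaned.as_str() {
        "person" | "people" | "human" | "individual" | "contact" => "person",
        "place" | "location" | "venue" | "site" | "area" | "landmark" => "place",
        "event" | "occasion" | "activity" | "celebration" => "event",
        "thing" | "object" | "item" | "product" => "thing",
        other => other,
    };
    canonical.to_string()
}
```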
@@ -0,0 +1,5 @@
DROP INDEX IF EXISTS idx_image_exif_date_backfill;
CREATE INDEX idx_image_exif_date_backfill
ON image_exif (library_id, id)
WHERE date_taken IS NULL OR date_taken_source = 'fs_time';
@@ -0,0 +1,18 @@
-- Narrow the date-backfill partial index to NULL-only rows.
--
-- The original index (2026-05-06-000000_add_date_taken_source) also matched
-- `date_taken_source = 'fs_time'` so the drain could "re-resolve weak
-- entries when better tools become available." In practice the resolver
-- is deterministic on file bytes + filename + fs metadata: a row that
-- landed on fs_time once will land on fs_time again on every subsequent
-- tick. With `ORDER BY id ASC LIMIT 500`, the drain spun on the same
-- lowest-id fs_time rows in perpetuity, never advancing, while hammering
-- the SQLite write lock once per row and starving other writers (face
-- PATCHes were hitting busy_timeout and returning 500). Drop fs_time
-- from the eligibility set; if exiftool / a new filename pattern ever
-- comes online, a one-shot operator command can re-resolve.
DROP INDEX IF EXISTS idx_image_exif_date_backfill;
CREATE INDEX idx_image_exif_date_backfill
ON image_exif (library_id, id)
WHERE date_taken IS NULL;
@@ -0,0 +1,3 @@
DROP INDEX IF EXISTS idx_image_exif_clip_backfill;
ALTER TABLE image_exif DROP COLUMN clip_model_version;
ALTER TABLE image_exif DROP COLUMN clip_embedding;
@@ -0,0 +1,27 @@
-- CLIP semantic photo search: store a per-photo image embedding so
-- text queries can rerank against the live library via cosine
-- similarity. Apollo encodes the bytes via its CLIP service; ImageApi
-- writes the resulting blob here.
--
-- `clip_embedding` is the raw little-endian float32 buffer of an
-- L2-normalized vector (length depends on the model: 768 floats × 4
-- bytes for ViT-L/14, 512 floats × 4 bytes for ViT-B/32). Apollo always
-- returns the normalized form so the search-time dot product reduces
-- to a plain cosine similarity.
--
-- `clip_model_version` echoes the upstream `APOLLO_CLIP_MODEL` (e.g.
-- "ViT-L/14"). A model swap shouldn't silently mix geometries: the
-- backfill drain will make rows eligible again when their stored
-- model_version differs from the live engine's, and the search route
-- refuses to mix rows from two model_versions in the same response.
ALTER TABLE image_exif ADD COLUMN clip_embedding BLOB;
ALTER TABLE image_exif ADD COLUMN clip_model_version TEXT;
-- Partial index for the backfill drain. Mirrors the shape of
-- `idx_image_exif_date_backfill`: candidate rows are those with a
-- known content_hash (so we don't race the unhashed backlog) but no
-- embedding yet. SELECT cost stays O(missing rows) instead of full
-- table scan once the column is mostly populated.
CREATE INDEX IF NOT EXISTS idx_image_exif_clip_backfill
ON image_exif (id)
WHERE clip_embedding IS NULL AND content_hash IS NOT NULL;
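The blob layout described above (little-endian float32 values of an L2-normalized vector) can be decoded and scored as in the sketch below. The function names are hypothetical and do not come from ImageApi or Apollo. The dot product equals cosine similarity only because both vectors are assumed to be unit length, as the comment states.

```rust
/// Serialize a vector as consecutive little-endian f32 values.
pub fn encode_embedding(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

/// Decode a BLOB back into f32 values. Returns None when the byte
/// length is not a multiple of 4 (a corrupt or truncated blob).
pub fn decode_embedding(blob: &[u8]) -> Option<Vec<f32>> {
    if blob.len() % 4 != 0 {
        return None;
    }
    Some(
        blob.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Dot product of two embeddings. Returns None on a dimension
/// mismatch, e.g. rows produced by two different CLIP models.
pub fn dot(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| x * y).sum())
}
```

Returning `None` on a length mismatch is one way to surface the mixed-geometry case the `clip_model_version` column is meant to prevent.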
@@ -0,0 +1,3 @@
DROP INDEX IF EXISTS idx_insight_gen_jobs_status_cleanup;
DROP INDEX IF EXISTS idx_insight_gen_jobs_file;
DROP TABLE IF EXISTS insight_generation_jobs;
@@ -0,0 +1,23 @@
-- Track async insight generation jobs so the client can poll for
-- completion after the server returns 202 Accepted. Each generation
-- creates a new row; the application layer cancels prior running
-- jobs before inserting.
CREATE TABLE insight_generation_jobs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
library_id INTEGER NOT NULL DEFAULT 1,
file_path TEXT NOT NULL,
generation_type TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'running',
started_at INTEGER NOT NULL,
completed_at INTEGER,
result_insight_id INTEGER,
error_message TEXT
);
-- For the status endpoint: fast lookup by (library_id, file_path)
CREATE INDEX idx_insight_gen_jobs_file
ON insight_generation_jobs(library_id, file_path);
-- For startup cleanup (future): prune old completed/failed jobs
CREATE INDEX idx_insight_gen_jobs_status_cleanup
ON insight_generation_jobs(status, started_at);
@@ -0,0 +1,28 @@
-- Restore UNIQUE constraint
CREATE TABLE insight_generation_jobs_new (
id INTEGER PRIMARY KEY AUTOINCREMENT,
library_id INTEGER NOT NULL DEFAULT 1,
file_path TEXT NOT NULL,
generation_type TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'running',
started_at INTEGER NOT NULL,
completed_at INTEGER,
result_insight_id INTEGER,
error_message TEXT,
UNIQUE(library_id, file_path, generation_type)
);
INSERT INTO insight_generation_jobs_new
SELECT id, library_id, file_path, generation_type, status, started_at, completed_at, result_insight_id, error_message
FROM insight_generation_jobs;
DROP TABLE insight_generation_jobs;
ALTER TABLE insight_generation_jobs_new RENAME TO insight_generation_jobs;
CREATE INDEX idx_insight_gen_jobs_file
ON insight_generation_jobs(library_id, file_path);
CREATE INDEX idx_insight_gen_jobs_status_cleanup
ON insight_generation_jobs(status, started_at);
@@ -0,0 +1,30 @@
-- Remove UNIQUE(library_id, file_path, generation_type) constraint to allow
-- multiple job rows per file. This enables proper cancel/regenerate semantics:
-- a new job is always inserted on regenerate, and the old job is cancelled
-- independently. The application layer prevents concurrent running jobs.
CREATE TABLE insight_generation_jobs_new (
id INTEGER PRIMARY KEY AUTOINCREMENT,
library_id INTEGER NOT NULL DEFAULT 1,
file_path TEXT NOT NULL,
generation_type TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'running',
started_at INTEGER NOT NULL,
completed_at INTEGER,
result_insight_id INTEGER,
error_message TEXT
);
INSERT INTO insight_generation_jobs_new
SELECT id, library_id, file_path, generation_type, status, started_at, completed_at, result_insight_id, error_message
FROM insight_generation_jobs;
DROP TABLE insight_generation_jobs;
ALTER TABLE insight_generation_jobs_new RENAME TO insight_generation_jobs;
CREATE INDEX idx_insight_gen_jobs_file
ON insight_generation_jobs(library_id, file_path);
CREATE INDEX idx_insight_gen_jobs_status_cleanup
ON insight_generation_jobs(status, started_at);
@@ -0,0 +1,11 @@
-- SQLite doesn't support DROP COLUMN before 3.35.0; recreate the table
-- without the new columns. This is only needed for rollback.
CREATE TABLE photo_insights_old AS
SELECT id, library_id, rel_path, title, summary, generated_at,
model_version, is_current, training_messages, approved,
backend, fewshot_source_ids, content_hash
FROM photo_insights;
DROP TABLE photo_insights;
ALTER TABLE photo_insights_old RENAME TO photo_insights;
@@ -0,0 +1,8 @@
-- Persist generation parameters on each insight row for auditing.
ALTER TABLE photo_insights ADD COLUMN num_ctx INTEGER;
ALTER TABLE photo_insights ADD COLUMN temperature REAL;
ALTER TABLE photo_insights ADD COLUMN top_p REAL;
ALTER TABLE photo_insights ADD COLUMN top_k INTEGER;
ALTER TABLE photo_insights ADD COLUMN min_p REAL;
ALTER TABLE photo_insights ADD COLUMN system_prompt TEXT;
ALTER TABLE photo_insights ADD COLUMN persona_id TEXT;
@@ -0,0 +1,13 @@
-- SQLite doesn't support DROP COLUMN before 3.35.0; recreate the table
-- without the token-count columns. This is only needed for rollback.
CREATE TABLE photo_insights_old AS
SELECT id, library_id, rel_path, title, summary, generated_at,
model_version, is_current, training_messages, approved,
backend, fewshot_source_ids, content_hash,
num_ctx, temperature, top_p, top_k, min_p,
system_prompt, persona_id
FROM photo_insights;
DROP TABLE photo_insights;
ALTER TABLE photo_insights_old RENAME TO photo_insights;
@@ -0,0 +1,6 @@
-- Persist token usage on each insight row. Split from
-- 2026-05-27-000002_add_insight_generation_params because that
-- migration was already applied on some environments before these
-- columns were added.
ALTER TABLE photo_insights ADD COLUMN prompt_eval_count INTEGER;
ALTER TABLE photo_insights ADD COLUMN eval_count INTEGER;
+392
@@ -0,0 +1,392 @@
# Insight Chat improvements — design
**Date:** 2026-05-07
**Branch:** `feature/insight-chat-improvements` (in both `ImageApi/` and `FileViewer-React/`)
**Scope:** ImageApi photo-anchored insight + chat surface, plus the
FileViewer-React client. Apollo's free/visit chat is **not** in this cycle.
## Problem
Three concrete gaps in today's insight + chat surface:
1. **Tool drift.** ImageApi exposes 13 tools to the LLM. Some are gated on
`apollo_enabled` / `has_vision`, but several optional ones
(`search_rag`, `get_calendar_events`, `get_location_history`) are
registered unconditionally even when their backing tables are empty.
Descriptions vary in quality and a couple have outright bugs.
2. **Inconsistent / incomplete tool descriptions.** Tools like
`search_messages` describe their selection rules but omit useful
examples; `store_fact` doesn't show the `object_entity_id` vs
`object_value` choice; `get_sms_messages` accepts a `days_radius`
parameter that the backing client silently ignores. The LLM is being
instructed against a slightly wrong reality.
3. **System prompt fights the persona.** Today's generation prompt
prepends the user's `custom_system_prompt` and then immediately asserts
`"You are a personal photo memory assistant..."`. The user message
demands `"a detailed insight with a title and summary"`. Both
contradict whatever voice / shape / POV the persona just established.
On chat continuation the persona is baked into the stored transcript at
generation time and can't be changed without regenerating.
## Goals
- Tool catalog is **representative** — every tool registered for a turn is
backed by data the user actually has.
- Tool descriptions are **concise but complete**, with examples for any
tool whose param choice has multiple modes or non-obvious interactions.
- Persona / system prompt is **authoritative** for voice, length, and
shape — both at generation and during chat continuation.
- Per-turn system prompt overrides on chat work without surprising
side-effects on the stored transcript outside `amend` mode.
## Non-goals
- Apollo backend / frontend changes. Separate cycle.
- Refactoring the `generate_photo_title` post-hoc title flow. Already
takes `custom_system_prompt`.
- Tool consolidation (e.g. merging `search_messages` + `get_sms_messages`).
Considered and deferred — keeps blast radius small.
- Removing knowledge-memory tools (`recall_*` / `store_*`). Audit
confirmed they have a live read path via `knowledge.rs` HTTP routes.
- Persisting persona changes to the stored transcript outside `amend`
mode. Deliberate — re-opens use the persona currently active in the
client, not a sticky historical setting.
---
## Design
### A. System prompt — generation
Today (`insight_generator.rs:3305–3326`):
```
[custom_system_prompt if any] +
"You are a personal photo memory assistant helping to reconstruct..." +
{owner_id_note} +
{fewshot_block} +
"IMPORTANT INSTRUCTIONS:
1. You MUST call multiple tools...
2. When calling get_sms_messages and search_rag...
3. Use recall_facts_for_photo...
...
8. You have a hard budget of {max_iterations} iterations..."
```
The first concatenation is the bug: `custom` claims one identity, the
next line asserts another.
**New structure** — two named blocks, in order:
```
[Identity / voice / format block] ← persona-controlled (or neutral default)
[Procedural block] ← always identity-free
```
**Identity block:**
- When `custom_system_prompt` is supplied: use that string verbatim, no
pre/append.
- When not: a neutral default that doesn't fight a future persona.
Working text: `"You are reconstructing a memory from a photo. Use the
gathered context to write a thoughtful summary; you decide voice,
length, and shape."`
**Procedural block** — identity-free, always emitted:
```
Tool-use guidance:
- You have a budget of {max_iterations} tool-calling iterations.
- Call tools to gather context BEFORE writing your final answer; don't
answer after one or two calls.
- When calling get_sms_messages or search_rag, make at least one call
WITHOUT a contact filter — surrounding events matter even when a
contact is known.
- Use recall_facts_for_photo + recall_entities to load any prior
knowledge about subjects in the photo.
- When you identify people / places / events / things, use store_entity
+ store_fact to grow the persistent memory.
- A tool returning no results is informative; continue with the others.
{owner_id_note if applicable}
{fewshot_block if applicable}
```
Differences from today's "IMPORTANT INSTRUCTIONS" block: removed the
"you are a personal photo memory assistant" framing and the explicit
"at least 5 tool calls" floor (replaced with the softer "don't answer
after one or two"). Few-shot stays — it's pattern-of-tool-use, not
identity.
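A minimal sketch of the two-block assembly described above, using only the standard library. The function name `build_system_content` and its signature are hypothetical, and the procedural text is abbreviated to two of the guidance lines; only the default identity string is taken from the working text in this spec.

```rust
/// Neutral default identity (working text from this spec).
pub const DEFAULT_IDENTITY: &str = "You are reconstructing a memory from a photo. Use the gathered context to write a thoughtful summary; you decide voice, length, and shape.";

/// Hypothetical helper: joins the identity block and the procedural block.
pub fn build_system_content(
    custom_system_prompt: Option<&str>,
    max_iterations: u32,
    owner_id_note: Option<&str>,
    fewshot_block: Option<&str>,
) -> String {
    // Identity block: the persona prompt verbatim, or the neutral default.
    let identity = custom_system_prompt.unwrap_or(DEFAULT_IDENTITY);

    // Procedural block: identity-free and always emitted (abbreviated here).
    let mut procedural = format!(
        "Tool-use guidance:\n\
         - You have a budget of {max_iterations} tool-calling iterations.\n\
         - Call tools to gather context BEFORE writing your final answer."
    );

    // Optional notes are appended to the procedural block, never to identity.
    for extra in [owner_id_note, fewshot_block].iter().flatten() {
        procedural.push('\n');
        procedural.push_str(extra);
    }

    format!("{identity}\n\n{procedural}")
}
```

Keeping the optional notes inside the procedural block means a persona prompt is never followed by a competing identity statement, which is the bug this section sets out to remove.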
### B. User message — generation
Today (line 3357):
```
{visual_block}Please analyze this photo and gather any relevant context
from the surrounding weeks.
Photo file path: {file_path}
Date taken: {date}
{contact_info}
{gps_info}
{tags_info}
Use the available tools to gather more context about this moment
(messages, calendar events, location history, etc.), then write a
detailed insight with a title and summary.
```
Problems: the trailing line bakes in output shape ("title and
summary"), and the title from the resulting response is **discarded
anyway** — `generate_photo_title` (line 3494) regenerates the title
post-hoc from the summary. So the prompt is constraining voice for no
data-model benefit.
**New payload** — context-only, no output prescription:
```
{visual_block}Photo file path: {file_path}
Date taken: {date}
{contact_info}
{gps_info}
{tags_info}
Gather context with the available tools, then respond.
```
The persona owns shape. If a user wants "title-then-paragraph" output,
their persona prompt says so.
### C. System prompt — chat continuation
Add `system_prompt: Option<String>` to `ChatTurnRequest` (and to its
HTTP wrapper `ChatTurnHttpRequest`). It carries through both the
non-streaming `chat_turn` and the streaming `chat_turn_stream`.
**Append mode (default, `amend=false`)** — ephemeral
swap-and-restore, mirroring the existing `annotate_system_with_budget`
pattern:
1. Load stored transcript.
2. If `system_prompt` is `Some(s)`:
- If first message is a `system` role: stash original content,
replace with `s`.
- Else: prepend a synthetic ephemeral system message with `s` (note
it's synthetic so the restore step pops it rather than rewriting).
3. Run `annotate_system_with_budget` on top (existing per-turn budget
note appends to whatever's there now).
4. Run the agentic loop.
5. **Before persistence**, restore the original system content (or pop
the synthetic one). Run `restore_system_content` for the budget
annotation as today.
6. Save.
Result: the model sees the override; the stored transcript is
unchanged outside the model's actual reply.
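The append-mode steps above could be sketched roughly as follows. `Message`, `SystemSwap`, `apply_system_override`, and `restore_system_override` are illustrative names, not the types in ImageApi; the existing budget annotation (`annotate_system_with_budget` / `restore_system_content`) is left out and would wrap around these calls.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Records what has to be undone before the transcript is persisted.
pub enum SystemSwap {
    /// No override was supplied; nothing to restore.
    None,
    /// An existing system message was replaced; holds its original content.
    Replaced(String),
    /// A synthetic system message was prepended and must be popped.
    Synthetic,
}

/// Step 2: swap in the per-turn override, remembering how to undo it.
pub fn apply_system_override(
    messages: &mut Vec<Message>,
    system_prompt: Option<&str>,
) -> SystemSwap {
    let Some(s) = system_prompt else {
        return SystemSwap::None;
    };
    let has_system = messages.first().map_or(false, |m| m.role == "system");
    if has_system {
        let original = std::mem::replace(&mut messages[0].content, s.to_string());
        SystemSwap::Replaced(original)
    } else {
        messages.insert(
            0,
            Message { role: "system".to_string(), content: s.to_string() },
        );
        SystemSwap::Synthetic
    }
}

/// Step 5: undo the swap so the stored transcript keeps its original system message.
pub fn restore_system_override(messages: &mut Vec<Message>, swap: SystemSwap) {
    match swap {
        SystemSwap::None => {}
        SystemSwap::Replaced(original) => {
            if let Some(first) = messages.first_mut() {
                first.content = original;
            }
        }
        SystemSwap::Synthetic => {
            if !messages.is_empty() {
                messages.remove(0);
            }
        }
    }
}
```

Both restore paths touch only index 0, so replies appended by the agentic loop between the two calls are preserved.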
**Amend mode (`amend=true`)**:
- If `system_prompt` is supplied: the override stays in place during
the serialization for the new insight row. The new row's
`training_messages` system message is the override. `is_current=false`
flips on prior rows as today.
- If not supplied: behaves as today (stored transcript's system message
carries forward unchanged).
### D. FileViewer-React — client wiring
`hooks/useInsightChat.tsx`:
- `SendTurnOptions` gains `systemPromptOverride?: string | null`.
- Inside `sendTurn`, before issuing the streaming POST:
1. Read the active persona's `systemPrompt` from AsyncStorage
(already loaded for generation flows — reuse the same accessor).
2. If a one-shot `systemPromptOverride` is set, append as a suffix
(`${persona}\n\n${override}`) so persona voice survives + override
tweaks the turn.
3. Include the resulting string as `system_prompt` on the request body.
- No history-load change. The history endpoint still returns the stored
transcript.
`components/InsightChatModal.tsx`:
- Add a small "Style note" composer affordance — a one-shot text input
that, when filled, becomes the `systemPromptOverride` for the next
send. Cleared after send.
- The existing persona chip continues to open `PersonaManagerModal`.
`hooks/usePersonas.tsx` and the bundled defaults:
- Built-in `assistant` and `journal` prompts get audited and rewritten
to **explicitly state voice / shape / length** — since the framework
no longer guarantees a default shape, the persona must.
### E. Tool catalog — gating
Widen `build_tool_definitions` from `(has_vision: bool, apollo_enabled:
bool)` to a single `ToolGateOpts` struct:
```rust
pub struct ToolGateOpts {
pub has_vision: bool,
pub apollo_enabled: bool,
pub daily_summaries_present: bool,
pub calendar_present: bool,
pub location_history_present: bool,
}
```
The chat / generation services compute the three new fields lazily per
turn via `SELECT 1 FROM <table> LIMIT 1` (cheap; cached for the turn's
duration). Lazy because operators import data after launch and we don't
want to require a restart for the LLM to discover its new capabilities.
Per-tool gating:
| Tool | Existing gate | New gate |
|---|---|---|
| `describe_photo` | `has_vision` | unchanged |
| `get_personal_place_at` | `apollo_enabled` | unchanged |
| `get_calendar_events` | none | `calendar_present` |
| `get_location_history` | none | `location_history_present` |
| `search_rag` | none | `daily_summaries_present` |
All other tools always-on. (`get_sms_messages` and `search_messages`
fail informatively if SMS-API is unreachable; not worth a startup probe
since intermittent failures are the same shape.)
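As a rough illustration of the gating table, the sketch below maps `ToolGateOpts` to the list of tool names registered for a turn. The function `enabled_tool_names` is hypothetical, the `Default` derive is added only for the example, and the always-on list is a deliberately short subset rather than the full catalog.

```rust
#[derive(Debug, Clone, Copy, Default)]
pub struct ToolGateOpts {
    pub has_vision: bool,
    pub apollo_enabled: bool,
    pub daily_summaries_present: bool,
    pub calendar_present: bool,
    pub location_history_present: bool,
}

/// Hypothetical helper: names of the tools that would be registered for a turn.
pub fn enabled_tool_names(opts: &ToolGateOpts) -> Vec<&'static str> {
    // Always-on tools (illustrative subset only).
    let mut names = vec!["get_sms_messages", "search_messages"];

    // (tool name, whether its gate is open), following the table above.
    let gated = [
        ("describe_photo", opts.has_vision),
        ("get_personal_place_at", opts.apollo_enabled),
        ("get_calendar_events", opts.calendar_present),
        ("get_location_history", opts.location_history_present),
        ("search_rag", opts.daily_summaries_present),
    ];
    names.extend(
        gated
            .iter()
            .filter(|(_, open)| *open)
            .map(|(name, _)| *name),
    );
    names
}
```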
### F. Tool descriptions — convention
Every description follows:
1. One sentence: **what** + **when to call**.
2. Param semantics worth knowing (units, ranges, mode behavior,
precedence).
3. **Example invocation** for tools with multiple modes, optional bands,
or non-obvious parameter interactions.
4. Cross-references when relevant: `prefer X when both apply`.
Banned: all-caps section headers inside descriptions
(`"CONTENT search"`, `"TIME-BASED fetch"`); persona-prescriptive language
(`"you are a..."`); behavioral references to other tools by description
rather than name.
Tools getting examples: `search_messages`, `search_rag`, `store_fact`,
`get_sms_messages`. Trivial tools (`get_current_datetime`,
`reverse_geocode`, `get_file_tags`) skip the example.
Sample (`search_messages`):
> Search SMS/MMS message bodies. Modes: `fts5` (keyword + phrase + prefix
> + AND/OR/NOT + NEAR proximity), `semantic` (embedding similarity,
> requires generated embeddings), `hybrid` (RRF merge, recommended;
> degrades to `fts5` when embeddings absent). Optional `start_ts` /
> `end_ts` (real-UTC unix seconds) and `contact_id` filters. For pure
> date / contact browsing without keywords, prefer `get_sms_messages`.
>
> Examples:
> - `{query: "trader joe's"}` — phrase across all time.
> - `{query: "dinner", contact_id: 42, start_ts: 1700000000, end_ts: 1700604800}`
> — keyword within a contact and a week.
> - `{query: "NEAR(meeting work, 5)"}` — proximity search.
### G. SMS tool fixes
#### `get_sms_messages` — honor `days_radius`
Today: `sms_client::fetch_messages_for_contact(contact, center_ts)`
hardcodes `Duration::days(4)` (lines 31–37). The tool accepts
`days_radius` and silently ignores it.
**Fix:** widen the signature to
`fetch_messages_for_contact(contact, center_ts, days_radius)`. Tool
plumbs through. Default 4 retained for back-compat.
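The window math behind this fix might look like the sketch below. It uses plain unix seconds instead of the `Duration` type in the real client, and `message_window` is a hypothetical name; the default of 4 days comes from the text above, while clamping negative radii to zero is an assumption.

```rust
const SECS_PER_DAY: i64 = 86_400;
const DEFAULT_DAYS_RADIUS: i64 = 4;

/// Returns the (start, end) unix-second bounds of a window of
/// `2 * days_radius` days centered on `center_ts`.
pub fn message_window(center_ts: i64, days_radius: Option<i64>) -> (i64, i64) {
    // Fall back to the legacy radius; treat negative input as zero (assumption).
    let radius = days_radius.unwrap_or(DEFAULT_DAYS_RADIUS).max(0);
    let half = radius.saturating_mul(SECS_PER_DAY);
    (center_ts.saturating_sub(half), center_ts.saturating_add(half))
}
```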
#### `search_messages` — add date and contact_id filters
Today: ImageApi's `search_messages` only forwards `query`, `mode`,
`limit` to SMS-API.
**Fix:** add `start_ts`, `end_ts`, `contact_id` parameters.
- `contact_id` forwards directly to SMS-API
(`/api/messages/search/?contact_id=`).
- `start_ts` / `end_ts` are not natively accepted by SMS-API's search
endpoint. Apply client-side post-filter on the response (Apollo's
pattern: `chat_tools.py:670–680`). Bump the SMS-API `limit` to a
larger fetch pool when a date filter is supplied so in-window matches
aren't lost to out-of-window FTS rank.
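A sketch of the client-side post-filter and the enlarged fetch pool, under stated assumptions: `SearchHit`, `upstream_limit`, `filter_by_window`, and the pool factor of 5 are all hypothetical, and the bounds are treated as inclusive. Rank order from the upstream response is preserved.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct SearchHit {
    pub id: i64,
    /// Message timestamp in unix seconds.
    pub ts: i64,
    pub body: String,
}

/// Hypothetical multiplier for the upstream fetch pool when a date filter is set.
const DATE_FILTER_POOL_FACTOR: usize = 5;

/// Limit to request from SMS-API: larger when results will be post-filtered.
pub fn upstream_limit(limit: usize, start_ts: Option<i64>, end_ts: Option<i64>) -> usize {
    if start_ts.is_some() || end_ts.is_some() {
        limit.saturating_mul(DATE_FILTER_POOL_FACTOR)
    } else {
        limit
    }
}

/// Keeps hits inside the (inclusive) window, in upstream rank order,
/// then truncates to the caller's requested limit.
pub fn filter_by_window(
    hits: Vec<SearchHit>,
    start_ts: Option<i64>,
    end_ts: Option<i64>,
    limit: usize,
) -> Vec<SearchHit> {
    hits.into_iter()
        .filter(|h| start_ts.map_or(true, |s| h.ts >= s))
        .filter(|h| end_ts.map_or(true, |e| h.ts <= e))
        .take(limit)
        .collect()
}
```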
---
## Implementation sequencing
Each step is independently mergeable.
### ImageApi PRs
1. **Split system-prompt assembly + neutralize user message.** Two
named blocks; user message context-only. Default identity string
added. Tests: golden snapshots of the resulting `system_content`
with and without `custom_system_prompt`.
2. **`system_prompt` field on chat request + swap/restore + amend
persistence.** Mirrors `annotate_system_with_budget` pattern. Tests:
round-trip system content unchanged in append mode; persisted in
amend mode.
3. **`fetch_messages_for_contact` honors `days_radius`.** Tool wires
the param through. Tests: window math at the client level.
4. **`ToolGateOpts` + per-tool description rewrites.** Description
text changes are the bulk of the diff but no behavior change beyond
gating.
### FileViewer-React PR
5. **Chat hook sends `system_prompt`; modal gets style-note input;
built-in personas updated to specify shape.** The
`useInsightChat.sendTurn` call site picks up the persona and
includes it on every chat turn body. Style-note input is a one-shot
suffix.
## Testing & verification
**Automated:**
- Unit (Rust): swap-and-restore round-trip preserves stored transcript.
- Unit (Rust): amend mode persists override into new insight row.
- Unit (Rust): `fetch_messages_for_contact(days_radius=N)` produces a
window of `2N` days centered on `center_ts`.
- Unit (Rust): `build_tool_definitions(opts)` excludes gated tools when
the corresponding flag is false.
**Manual:**
- Run a chat turn against an existing insight without `system_prompt`
→ output unchanged from baseline.
- Same insight, with override → output reflects new voice.
- Re-open chat → original baked persona still authoritative (override
was ephemeral).
- Regenerate an insight with the journal persona → model's voice
matches journal style; no "memory assistant" framing leaks through.
- Toggle data presence (delete a row from `calendar_events`) → tool
drops from the catalog on the next turn.
## Risks
- **Default identity wording matters.** A too-neutral default ("Use the
gathered context to write a summary") might produce flatter output
than today's "personal photo memory assistant" framing for users
who never set a persona. Mitigation: tune the default with a small
set of test photos before merging.
- **Persona-suffix style notes can contradict persona voice.** A user
who picks `journal` (first person, warm) and adds the style note
"respond in bullet points" will get a tonal collision. Acceptable —
the user expressed a per-turn intent and we honor it. Document the
composition rule in the persona-manager UI.
- **Lazy data-presence probes add a per-turn `SELECT 1`.** Negligible
on SQLite (sub-millisecond) but adds up across many turns. Cache the
result for the turn's duration; don't re-probe per-tool.
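One way to keep the probe to a single query per table per turn is a small memo that lives for the duration of the turn. This is only a sketch: `TurnPresenceCache` is a hypothetical type, and the `probe` closure stands in for the real `SELECT 1 FROM <table> LIMIT 1` call.

```rust
use std::collections::HashMap;

/// Per-turn memo of data-presence probes; create one per turn and drop it after.
#[derive(Default)]
pub struct TurnPresenceCache {
    seen: HashMap<String, bool>,
}

impl TurnPresenceCache {
    /// Returns the cached answer for `table`, running `probe` only on first use.
    pub fn is_present<F>(&mut self, table: &str, probe: F) -> bool
    where
        F: FnOnce(&str) -> bool,
    {
        if let Some(&hit) = self.seen.get(table) {
            return hit;
        }
        let present = probe(table);
        self.seen.insert(table.to_string(), present);
        present
    }
}
```

Because the cache is dropped at the end of the turn, data imported after launch is still picked up on the next turn without a restart.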
## Open questions
None blocking. Items deferred to a possible follow-up cycle:
- Apollo parity for the same per-turn override pattern (already
present; just needs RN client wiring on the photo path, which is
already proxied).
- Tool consolidation (`search_messages` + `get_sms_messages` →
single `search_messages` with optional date filter, Apollo-style).
Considered and deferred — separate spec.
+110
@@ -0,0 +1,110 @@
//! Thin async HTTP client for Apollo's `/api/places/*` endpoints.
//!
//! Apollo (the personal location-history viewer at the sibling repo) owns
//! user-defined Places: `name + lat/lon + radius_m + description (+ optional
//! category)`. We consume them in two places:
//!
//! 1. Automatic enrichment in [`crate::ai::insight_generator`] — the always-on
//! path that folds the most-specific containing Place into the location
//! string fed to the LLM.
//! 2. The agentic `get_personal_place_at` tool — lets the LLM ask "what
//! user-defined place contains this lat/lon" during chat continuation.
//!
//! Apollo does the haversine. This client is plumbing only — no geometry,
//! no caching at the moment. If insight throughput ever makes per-photo
//! HTTP latency a problem, swap to a small `Mutex<HashMap>` TTL cache here.
//!
//! Configured via `APOLLO_API_BASE_URL`. When unset, the client is
//! constructed as a no-op shell: every method returns empty / `None`, the enrichment
//! path silently falls through to the legacy Nominatim-only output, and the
//! tool registration in `insight_generator` reports "integration disabled."
use anyhow::Result;
use reqwest::Client;
use serde::Deserialize;
use std::time::Duration;
// Public fields — `id`, `lat`, `lon` aren't read from the current tool
// output but are part of the wire model and useful for future tool
// extensions / debugging.
#[allow(dead_code)]
#[derive(Debug, Clone, Deserialize)]
pub struct ApolloPlace {
pub id: i32,
pub name: String,
#[serde(default)]
pub description: String,
pub lat: f64,
pub lon: f64,
pub radius_m: i32,
#[serde(default)]
pub category: Option<String>,
}
#[derive(Deserialize)]
struct PlacesResponse {
places: Vec<ApolloPlace>,
}
#[derive(Clone)]
pub struct ApolloClient {
client: Client,
/// `None` means the integration is disabled — every method returns
/// empty so the rest of insight generation runs unchanged.
base_url: Option<String>,
}
impl ApolloClient {
pub fn new(base_url: Option<String>) -> Self {
// 5 s timeout: Apollo runs on the LAN. If it doesn't answer in
// five seconds, treat the call as failed and fall back to the
// legacy Nominatim path rather than block the whole insight.
let client = Client::builder()
.timeout(Duration::from_secs(5))
.build()
.expect("reqwest client build");
Self { client, base_url }
}
/// Convenience for callers that need to know whether to register the
/// `get_personal_place_at` tool (or to short-circuit enrichment).
pub fn is_enabled(&self) -> bool {
self.base_url.is_some()
}
/// Server-side haversine: returns places whose radius contains
/// (lat, lon), already sorted smallest-radius-first by Apollo. The
/// caller can take `[0]` for the most-specific match (matches
/// Apollo's `primaryPlaceFor` rule on the frontend, so the carousel
/// badge and the LLM prompt always agree).
pub async fn places_containing(&self, lat: f64, lon: f64) -> Vec<ApolloPlace> {
let Some(base) = self.base_url.as_deref() else {
return Vec::new();
};
match self.fetch_places_containing(base, lat, lon).await {
Ok(places) => places,
Err(err) => {
log::warn!("apollo_client: places_containing({lat:.4}, {lon:.4}) failed: {err}");
Vec::new()
}
}
}
async fn fetch_places_containing(
&self,
base: &str,
lat: f64,
lon: f64,
) -> Result<Vec<ApolloPlace>> {
let url = format!("{}/api/places/contains", base.trim_end_matches('/'));
let resp = self
.client
.get(&url)
.query(&[("lat", lat), ("lon", lon)])
.send()
.await?
.error_for_status()?;
let body: PlacesResponse = resp.json().await?;
Ok(body.places)
}
}
+140
@@ -0,0 +1,140 @@
use anyhow::{Result, anyhow};
use crate::ai::llm_client::LlmClient;
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
Local,
Hybrid,
}
impl BackendKind {
pub fn parse(s: &str) -> Result<Self> {
match s.trim().to_lowercase().as_str() {
"local" | "" => Ok(Self::Local),
"hybrid" => Ok(Self::Hybrid),
other => Err(anyhow!(
"unknown backend '{}'; expected 'local' or 'hybrid'",
other
)),
}
}
pub fn as_str(&self) -> &'static str {
match self {
Self::Local => "local",
Self::Hybrid => "hybrid",
}
}
}
impl std::fmt::Display for BackendKind {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.write_str(self.as_str())
}
}
pub struct SamplingOverrides {
pub model: Option<String>,
pub num_ctx: Option<i32>,
pub temperature: Option<f32>,
pub top_p: Option<f32>,
pub top_k: Option<i32>,
pub min_p: Option<f32>,
}
impl SamplingOverrides {
pub fn has_sampling(&self) -> bool {
self.temperature.is_some()
|| self.top_p.is_some()
|| self.top_k.is_some()
|| self.min_p.is_some()
}
}
pub struct ResolvedBackend {
chat: Box<dyn LlmClient>,
local: Box<dyn LlmClient>,
pub kind: BackendKind,
/// `true` when the chat model receives images directly (Ollama with
/// vision, or llamacpp). `false` for hybrid where we describe-then-inline.
pub images_inline: bool,
}
impl ResolvedBackend {
pub fn new(
chat: Box<dyn LlmClient>,
local: Box<dyn LlmClient>,
kind: BackendKind,
images_inline: bool,
) -> Self {
Self {
chat,
local,
kind,
images_inline,
}
}
pub fn chat(&self) -> &dyn LlmClient {
self.chat.as_ref()
}
pub fn local(&self) -> &dyn LlmClient {
self.local.as_ref()
}
pub fn model(&self) -> &str {
self.chat.primary_model()
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn parse_backend_kind() {
assert_eq!(BackendKind::parse("local").unwrap(), BackendKind::Local);
assert_eq!(BackendKind::parse("hybrid").unwrap(), BackendKind::Hybrid);
assert_eq!(BackendKind::parse(" Local ").unwrap(), BackendKind::Local);
assert_eq!(BackendKind::parse("HYBRID").unwrap(), BackendKind::Hybrid);
assert_eq!(BackendKind::parse("").unwrap(), BackendKind::Local);
assert!(BackendKind::parse("vllm").is_err());
}
#[test]
fn backend_kind_as_str_roundtrips() {
assert_eq!(
BackendKind::parse(BackendKind::Local.as_str()).unwrap(),
BackendKind::Local
);
assert_eq!(
BackendKind::parse(BackendKind::Hybrid.as_str()).unwrap(),
BackendKind::Hybrid
);
}
#[test]
fn sampling_overrides_has_sampling() {
let empty = SamplingOverrides {
model: None,
num_ctx: None,
temperature: None,
top_p: None,
top_k: None,
min_p: None,
};
assert!(!empty.has_sampling());
let with_temp = SamplingOverrides {
model: None,
num_ctx: Some(4096),
temperature: Some(0.7),
top_p: None,
top_k: None,
min_p: None,
};
assert!(with_temp.has_sampling());
}
}
+392
@@ -0,0 +1,392 @@
//! Thin async HTTP client for Apollo's `/api/internal/clip/*` endpoints.
//!
//! Apollo hosts the OpenAI CLIP inference service (ViT-L/14 by default,
//! configurable via `APOLLO_CLIP_MODEL`). This client is the ImageApi side
//! of the contract: shove image bytes through `/encode_image` to populate
//! `image_exif.clip_embedding` during backfill, and call `/encode_text` to
//! encode a user's natural-language query at search time. The actual
//! cosine-similarity rerank runs locally in ImageApi.
//!
//! Mirrors `face_client.rs` / `tag_client.rs` shape: optional base URL
//! (None = disabled — feature off, drain and search no-op), reqwest
//! client with a generous timeout because GPU inference under a backlog
//! can queue server-side (Apollo's threadpool is bounded to 1 worker on
//! CUDA).
//!
//! Configured via `APOLLO_CLIP_API_BASE_URL`, falling back to
//! `APOLLO_API_BASE_URL` when the dedicated var is unset (single-Apollo
//! deploys are the common case).
//!
//! Wire format:
//! - `/encode_image`: multipart/form-data with `file=<bytes>` and
//! `meta=<json>` (content_hash / library_id / rel_path for logging).
//! - `/encode_text`: JSON `{"text": "<query>"}`.
//!
//! Both return `{model_version, embedding_dim, duration_ms, embedding}`
//! where `embedding` is base64 of `dim×4` little-endian float32 bytes,
//! L2-normalized so the rerank reduces to a plain dot product.
//!
//! Error mapping (reflected in [`ClipError`]):
//! - 422 `decode_failed` / `empty_text` → permanent: ImageApi marks the
//! row failed or surfaces the empty-query error to the search caller.
//! - 503 `cuda_oom` / `engine_unavailable` → defer-and-retry: no marker.
//! - Any other 5xx / network error → defer.
use anyhow::{Context, Result};
use base64::Engine;
use reqwest::Client;
use serde::{Deserialize, Serialize};
use std::time::Duration;
#[derive(Debug, Clone, Serialize)]
pub struct EncodeImageMeta {
pub content_hash: String,
pub library_id: i32,
pub rel_path: String,
}
#[derive(Debug, Clone, Deserialize)]
#[allow(dead_code)] // duration_ms logged by the backfill drain
pub struct EncodeResponse {
pub model_version: String,
pub embedding_dim: i32,
pub duration_ms: i64,
/// base64 of `embedding_dim * 4` bytes (LE float32). ImageApi stores
/// the decoded bytes verbatim as a BLOB.
pub embedding: String,
}
impl EncodeResponse {
/// Decode the wire-format embedding back into raw bytes for storage.
/// Validates the buffer is `embedding_dim * 4` bytes long so a
/// malformed response surfaces here rather than as a downstream
/// silent length mismatch.
pub fn decode_embedding(&self) -> Result<Vec<u8>> {
let bytes = base64::engine::general_purpose::STANDARD
.decode(self.embedding.as_bytes())
.context("clip embedding base64 decode")?;
let expected = (self.embedding_dim as usize) * 4;
if bytes.len() != expected {
anyhow::bail!(
"clip embedding wrong size: got {} bytes, expected {} ({} * 4)",
bytes.len(),
expected,
self.embedding_dim
);
}
Ok(bytes)
}
}
#[derive(Debug, Clone, Deserialize)]
#[allow(dead_code)] // load_error consumed by future health probe
pub struct ClipHealth {
pub loaded: bool,
pub device: String,
pub model_version: String,
pub embedding_dim: i32,
#[serde(default)]
pub load_error: Option<String>,
}
#[derive(Debug)]
pub enum ClipError {
/// Apollo refused for a reason that won't change on retry (decode
/// failure on /encode_image, empty text on /encode_text).
Permanent(anyhow::Error),
/// Apollo couldn't process this turn but might next time (CUDA OOM,
/// engine not loaded, network hiccup).
Transient(anyhow::Error),
/// Feature is disabled (no `APOLLO_CLIP_API_BASE_URL` /
/// `APOLLO_API_BASE_URL`).
Disabled,
}
impl std::fmt::Display for ClipError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
ClipError::Permanent(e) => write!(f, "permanent: {e}"),
ClipError::Transient(e) => write!(f, "transient: {e}"),
ClipError::Disabled => write!(f, "clip client disabled"),
}
}
}
impl std::error::Error for ClipError {}
#[derive(Clone)]
pub struct ClipClient {
client: Client,
base_url: Option<String>,
}
impl ClipClient {
pub fn new(base_url: Option<String>) -> Self {
let timeout_secs = std::env::var("CLIP_REQUEST_TIMEOUT_SEC")
.ok()
.and_then(|s| s.parse::<u64>().ok())
.unwrap_or(60);
let client = Client::builder()
.timeout(Duration::from_secs(timeout_secs))
.build()
.expect("reqwest client build");
Self {
client,
base_url: base_url.map(|u| u.trim_end_matches('/').to_string()),
}
}
/// Read both standard env vars. `APOLLO_CLIP_API_BASE_URL` wins;
/// fallback to `APOLLO_API_BASE_URL`. Both unset → disabled.
pub fn from_env() -> Self {
let base = std::env::var("APOLLO_CLIP_API_BASE_URL")
.ok()
.filter(|s| !s.trim().is_empty())
.or_else(|| {
std::env::var("APOLLO_API_BASE_URL")
.ok()
.filter(|s| !s.trim().is_empty())
});
Self::new(base)
}
pub fn is_enabled(&self) -> bool {
self.base_url.is_some()
}
/// Encode an image to a 768-d (ViT-L/14) or 512-d (ViT-B/32)
/// L2-normalized embedding. Used by the backfill drain.
pub async fn encode_image(
&self,
bytes: Vec<u8>,
meta: EncodeImageMeta,
) -> std::result::Result<EncodeResponse, ClipError> {
let Some(base) = self.base_url.as_deref() else {
return Err(ClipError::Disabled);
};
let url = format!("{}/api/internal/clip/encode_image", base);
let meta_json = serde_json::to_string(&meta)
.map_err(|e| ClipError::Permanent(anyhow::anyhow!("meta serialize: {e}")))?;
let form = reqwest::multipart::Form::new()
.text("meta", meta_json)
.part(
"file",
reqwest::multipart::Part::bytes(bytes)
.file_name(meta.rel_path.clone())
.mime_str("application/octet-stream")
.unwrap_or_else(|_| reqwest::multipart::Part::bytes(Vec::new())),
);
self.send_multipart(&url, form).await
}
/// Encode a natural-language query to an embedding. Used by the
/// search route to rank stored image embeddings by cosine sim.
pub async fn encode_text(&self, text: &str) -> std::result::Result<EncodeResponse, ClipError> {
let Some(base) = self.base_url.as_deref() else {
return Err(ClipError::Disabled);
};
let url = format!("{}/api/internal/clip/encode_text", base);
let body = serde_json::json!({ "text": text });
let resp = match self.client.post(&url).json(&body).send().await {
Ok(r) => r,
Err(e) if e.is_timeout() || e.is_connect() => {
return Err(ClipError::Transient(anyhow::anyhow!(
"clip client network: {e}"
)));
}
Err(e) => {
return Err(ClipError::Transient(anyhow::anyhow!(
"clip client request: {e}"
)));
}
};
let status = resp.status();
if status.is_success() {
let body: EncodeResponse = resp
.json()
.await
.map_err(|e| ClipError::Transient(anyhow::anyhow!("clip response decode: {e}")))?;
return Ok(body);
}
let body_text = resp.text().await.unwrap_or_default();
Err(classify_error_response(status.as_u16(), &body_text))
}
/// Engine reachability + device/model report. Used as a startup
/// sanity check from the probe binary and (later) the backlog drain.
#[allow(dead_code)] // consumed by probe + drain
pub async fn health(&self) -> Result<ClipHealth> {
let base = self.base_url.as_deref().context("clip client disabled")?;
let url = format!("{}/api/internal/clip/health", base);
let resp = self.client.get(&url).send().await?.error_for_status()?;
let body: ClipHealth = resp.json().await?;
Ok(body)
}
async fn send_multipart(
&self,
url: &str,
form: reqwest::multipart::Form,
) -> std::result::Result<EncodeResponse, ClipError> {
let resp = match self.client.post(url).multipart(form).send().await {
Ok(r) => r,
Err(e) if e.is_timeout() || e.is_connect() => {
return Err(ClipError::Transient(anyhow::anyhow!(
"clip client network: {e}"
)));
}
Err(e) => {
return Err(ClipError::Transient(anyhow::anyhow!(
"clip client request: {e}"
)));
}
};
let status = resp.status();
if status.is_success() {
let body: EncodeResponse = resp
.json()
.await
.map_err(|e| ClipError::Transient(anyhow::anyhow!("clip response decode: {e}")))?;
return Ok(body);
}
let body_text = resp.text().await.unwrap_or_default();
Err(classify_error_response(status.as_u16(), &body_text))
}
}
/// Pulled out as a pure function so the marker-row contract is unit-
/// testable without spinning up an HTTP server. Matches the shape used
/// by face_client::classify_error_response so future retry policies
/// can share code.
fn classify_error_response(status: u16, body_text: &str) -> ClipError {
let detail_code = serde_json::from_str::<serde_json::Value>(body_text)
.ok()
.and_then(|v| {
v.get("detail")
.and_then(|d| d.as_str().map(str::to_string))
.or_else(|| {
v.get("detail")
.and_then(|d| d.get("code"))
.and_then(|c| c.as_str())
.map(str::to_string)
})
})
.unwrap_or_default();
if status == 422 {
return ClipError::Permanent(anyhow::anyhow!(
"clip {} {}: {}",
status,
detail_code,
body_text
));
}
if status == 503 {
return ClipError::Transient(anyhow::anyhow!(
"clip {} {}: {}",
status,
detail_code,
body_text
));
}
// 408 / 413 / 429 are operator-fixable infra issues; defer.
if matches!(status, 408 | 413 | 429) {
return ClipError::Transient(anyhow::anyhow!(
"clip {} {}: {}",
status,
detail_code,
body_text
));
}
if (400..500).contains(&status) {
ClipError::Permanent(anyhow::anyhow!(
"clip {} {}: {}",
status,
detail_code,
body_text
))
} else {
ClipError::Transient(anyhow::anyhow!(
"clip {} {}: {}",
status,
detail_code,
body_text
))
}
}
#[cfg(test)]
mod tests {
use super::*;
fn is_permanent(e: &ClipError) -> bool {
matches!(e, ClipError::Permanent(_))
}
fn is_transient(e: &ClipError) -> bool {
matches!(e, ClipError::Transient(_))
}
#[test]
fn classify_422_decode_failed_is_permanent() {
assert!(is_permanent(&classify_error_response(
422,
r#"{"detail":"decode_failed: bad bytes"}"#
)));
}
#[test]
fn classify_422_empty_text_is_permanent() {
assert!(is_permanent(&classify_error_response(
422,
r#"{"detail":"empty_text"}"#
)));
}
#[test]
fn classify_503_cuda_oom_is_transient() {
assert!(is_transient(&classify_error_response(
503,
r#"{"detail":{"code":"cuda_oom","error":"out of memory"}}"#,
)));
}
#[test]
fn classify_5xx_is_transient_other_4xx_is_permanent() {
assert!(is_transient(&classify_error_response(500, "")));
assert!(is_permanent(&classify_error_response(404, "{}")));
}
#[test]
fn classify_infra_4xx_is_transient() {
assert!(is_transient(&classify_error_response(408, "")));
assert!(is_transient(&classify_error_response(413, "<html>")));
assert!(is_transient(&classify_error_response(429, "{}")));
}
#[test]
fn decode_embedding_size_mismatch_errors() {
// dim=4 says we expect 16 bytes (4 floats × 4 bytes). Encode 8.
use base64::Engine;
let resp = EncodeResponse {
model_version: "ViT-L/14".into(),
embedding_dim: 4,
duration_ms: 0,
embedding: base64::engine::general_purpose::STANDARD.encode([0u8; 8]),
};
assert!(resp.decode_embedding().is_err());
}
#[test]
fn decode_embedding_round_trip() {
use base64::Engine;
let bytes: Vec<u8> = (0..16).collect();
let resp = EncodeResponse {
model_version: "ViT-L/14".into(),
embedding_dim: 4,
duration_ms: 0,
embedding: base64::engine::general_purpose::STANDARD.encode(&bytes),
};
assert_eq!(resp.decode_embedding().unwrap(), bytes);
}
}
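The size check exercised by the two `decode_embedding` tests above (`embedding_dim` floats at 4 bytes each) can be sketched with the standard library alone. This is a minimal illustration, not the code under review: `unpack_f32_le` is a hypothetical helper, and it goes one step further than `decode_embedding` by turning the validated little-endian buffer into `f32` values instead of returning the raw bytes.

```rust
/// Hypothetical helper (not part of this diff): validate that `bytes` holds
/// exactly `dim` little-endian f32 values, then unpack them.
fn unpack_f32_le(bytes: &[u8], dim: usize) -> Result<Vec<f32>, String> {
    let expected = dim * 4; // 4 bytes per f32
    if bytes.len() != expected {
        return Err(format!(
            "embedding wrong size: got {} bytes, expected {}",
            bytes.len(),
            expected
        ));
    }
    // chunks_exact(4) is safe here because the length was checked above.
    Ok(bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

Checking the length before unpacking keeps a truncated or padded payload from being stored as a plausible-looking but wrong vector.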
+75 -60
@@ -6,12 +6,83 @@ use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use tokio::time::sleep;
use crate::ai::{OllamaClient, SmsApiClient, SmsMessage};
use crate::ai::{EMBEDDING_MODEL, OllamaClient, SmsApiClient, SmsMessage, user_display_name};
use crate::database::{DailySummaryDao, InsertDailySummary};
use crate::otel::global_tracer;
/// Strip boilerplate prefixes and common phrases from summaries before embedding.
/// This improves embedding diversity by removing structural similarity.
/// Maximum number of messages passed to the summarizer for a single day.
/// Tuned to avoid token overflow on typical chat models; shared between
/// the production job and the test binary so they can't drift.
pub const DAILY_SUMMARY_MESSAGE_LIMIT: usize = 300;
/// System prompt used when generating daily conversation summaries.
pub const DAILY_SUMMARY_SYSTEM_PROMPT: &str = "You are a conversation summarizer. Create clear, factual summaries with \
precise subject attribution AND extract distinctive keywords. Focus on \
specific, unique terms that differentiate this conversation from others.";
/// Build the prompt for a single day's conversation summary. Shared by the
/// production job and the test binary so prompt tweaks land in both places.
/// Returns `(prompt, system_prompt)`.
pub fn build_daily_summary_prompt(
contact: &str,
date: &NaiveDate,
messages: &[SmsMessage],
) -> (String, &'static str) {
let user_name = user_display_name();
let messages_text: String = messages
.iter()
.take(DAILY_SUMMARY_MESSAGE_LIMIT)
.map(|m| {
if m.is_sent {
format!("{}: {}", user_name, m.body)
} else {
format!("{}: {}", m.contact, m.body)
}
})
.collect::<Vec<_>>()
.join("\n");
let prompt = format!(
r#"Summarize this day's conversation between {user_name} and {contact}.
CRITICAL FORMAT RULES:
- Do NOT start with "Based on the conversation..." or "Here is a summary..." or similar preambles
- Do NOT repeat the date at the beginning
- Start DIRECTLY with the content - begin with a person's name or action
- Write in past tense, as if recording what happened
NARRATIVE (4-8 sentences):
- What specific topics, activities, or events were discussed?
- What places, people, or organizations were mentioned?
- What plans were made or decisions discussed?
- Clearly distinguish between what {user_name} did versus what {contact} did
KEYWORDS (comma-separated):
5-10 specific keywords that capture this conversation's unique content:
- Proper nouns (people, places, brands)
- Specific activities ("drum corps audition" not just "music")
- Distinctive terms that make this day unique
Date: {month_day_year} ({weekday})
Messages:
{messages_text}
YOUR RESPONSE (follow this format EXACTLY):
Summary: [Start directly with content, NO preamble]
Keywords: [specific, unique terms]"#,
user_name = user_name,
contact = contact,
month_day_year = date.format("%B %d, %Y"),
weekday = date.format("%A"),
messages_text = messages_text,
);
(prompt, DAILY_SUMMARY_SYSTEM_PROMPT)
}
pub fn strip_summary_boilerplate(summary: &str) -> String {
let mut text = summary.trim().to_string();
@@ -290,65 +361,10 @@ async fn generate_and_store_daily_summary(
span.set_attribute(KeyValue::new("contact", contact.to_string()));
span.set_attribute(KeyValue::new("message_count", messages.len() as i64));
// Format messages for LLM
let messages_text: String = messages
.iter()
.take(200) // Limit to 200 messages per day to avoid token overflow
.map(|m| {
if m.is_sent {
format!("Me: {}", m.body)
} else {
format!("{}: {}", m.contact, m.body)
}
})
.collect::<Vec<_>>()
.join("\n");
let weekday = date.format("%A");
let prompt = format!(
r#"Summarize this day's conversation between me and {}.
CRITICAL FORMAT RULES:
- Do NOT start with "Based on the conversation..." or "Here is a summary..." or similar preambles
- Do NOT repeat the date at the beginning
- Start DIRECTLY with the content - begin with a person's name or action
- Write in past tense, as if recording what happened
NARRATIVE (3-5 sentences):
- What specific topics, activities, or events were discussed?
- What places, people, or organizations were mentioned?
- What plans were made or decisions discussed?
- Clearly distinguish between what "I" did versus what {} did
KEYWORDS (comma-separated):
5-10 specific keywords that capture this conversation's unique content:
- Proper nouns (people, places, brands)
- Specific activities ("drum corps audition" not just "music")
- Distinctive terms that make this day unique
Date: {} ({})
Messages:
{}
YOUR RESPONSE (follow this format EXACTLY):
Summary: [Start directly with content, NO preamble]
Keywords: [specific, unique terms]"#,
contact,
contact,
date.format("%B %d, %Y"),
weekday,
messages_text
);
let (prompt, system_prompt) = build_daily_summary_prompt(contact, date, messages);
// Generate summary with LLM
let summary = ollama
.generate(
&prompt,
Some("You are a conversation summarizer. Create clear, factual summaries with precise subject attribution AND extract distinctive keywords. Focus on specific, unique terms that differentiate this conversation from others."),
)
.await?;
let summary = ollama.generate(&prompt, Some(system_prompt)).await?;
log::debug!(
"Generated summary for {}: {}",
@@ -381,8 +397,7 @@ Keywords: [specific, unique terms]"#,
message_count: messages.len() as i32,
embedding,
created_at: Utc::now().timestamp(),
// model_version: "nomic-embed-text:v1.5".to_string(),
model_version: "mxbai-embed-large:335m".to_string(),
model_version: EMBEDDING_MODEL.to_string(),
};
// Create context from current span for DB operation
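The message-formatting step that `build_daily_summary_prompt` now owns (cap the day's messages, attribute each line to a named speaker) can be sketched in isolation. This is a simplified illustration under stated assumptions: `Msg` is a hypothetical stand-in for `SmsMessage` with only the three fields the formatter reads, and the user name is passed in as a parameter instead of coming from `user_display_name()`.

```rust
/// Hypothetical stand-in for `SmsMessage`, reduced to the fields used here.
struct Msg {
    is_sent: bool,
    contact: String,
    body: String,
}

/// Format at most `limit` messages as "Speaker: body" lines.
/// Sent messages are attributed to `user_name`, received ones to the contact.
fn format_messages(user_name: &str, messages: &[Msg], limit: usize) -> String {
    messages
        .iter()
        .take(limit) // cap the prompt size; later messages are dropped
        .map(|m| {
            let speaker = if m.is_sent { user_name } else { m.contact.as_str() };
            format!("{}: {}", speaker, m.body)
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Keeping the cap and the attribution in one function is what lets the production job and the test binary share the same prompt text.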
+400
@@ -0,0 +1,400 @@
//! Thin async HTTP client for Apollo's `/api/internal/faces/*` endpoints.
//!
//! Apollo (the personal location-history viewer at the sibling repo) hosts the
//! insightface inference service. This client is the ImageApi side of the
//! contract — it shoves image bytes through `/detect` and returns boxes +
//! 512-d ArcFace embeddings, plus a single-embedding `/embed` for the manual
//! face-create flow.
//!
//! Mirrors `apollo_client.rs` shape: optional base URL (None = disabled, the
//! file watcher and manual-create handlers no-op), reqwest client with a
//! generous timeout because CPU inference on a backlog can take many seconds
//! per photo.
//!
//! Configured via `APOLLO_FACE_API_BASE_URL`, falling back to
//! `APOLLO_API_BASE_URL` when the dedicated var is unset (single-Apollo
//! deploys are the common case). Both unset → `is_enabled()` returns false.
//!
//! Wire format: multipart/form-data with `file=<bytes>` and `meta=<json>`.
//! `meta` carries `{content_hash, library_id, rel_path, orientation?,
//! model_version?}` — useful for Apollo-side logging and idempotency, ignored
//! by Apollo today but part of the stable wire contract so future versions
//! can act on it without a client change.
//!
//! Error mapping (reflected in [`FaceDetectError`]):
//! - 422 `decode_failed` → permanent: ImageApi marks `status='failed'` and
//! doesn't retry until manual rerun.
//! - 200 with `faces:[]` → `status='no_faces'` marker row.
//! - 503 `cuda_oom` / `engine_unavailable` → defer-and-retry: no marker
//! written.
//! - Any other 5xx / network error → defer.
use anyhow::{Context, Result};
use base64::Engine;
use reqwest::Client;
use serde::{Deserialize, Serialize};
use std::time::Duration;
#[derive(Debug, Clone, Serialize)]
pub struct DetectMeta {
pub content_hash: String,
pub library_id: i32,
pub rel_path: String,
/// EXIF orientation int (1..8). Apollo applies `exif_transpose` on the
/// bytes before inference, so this is informational only — supply when
/// the bytes were extracted from a RAW preview that lost the tag.
#[serde(skip_serializing_if = "Option::is_none")]
pub orientation: Option<i32>,
/// Echoed back in the response. ImageApi stores it in
/// `face_detections.model_version`.
#[serde(skip_serializing_if = "Option::is_none")]
pub model_version: Option<String>,
}
// Wire shape for the bbox sub-object Apollo returns. Read by Phase 3's
// file-watch hook; silence the dead-code lint until then.
#[allow(dead_code)]
#[derive(Debug, Clone, Deserialize)]
pub struct DetectedBbox {
pub x: f32,
pub y: f32,
pub w: f32,
pub h: f32,
}
#[allow(dead_code)] // bbox consumed by Phase 3 file-watch hook
#[derive(Debug, Clone, Deserialize)]
pub struct DetectedFace {
pub bbox: DetectedBbox,
pub confidence: f32,
/// base64 of 2048 bytes (512×f32 LE). ImageApi stores the raw bytes
/// verbatim as a BLOB — see `decode_embedding` for the unpack.
pub embedding: String,
}
impl DetectedFace {
/// Decode the wire-format embedding back into raw bytes for storage.
/// Returns the 2048-byte little-endian f32 buffer or an error if the
/// base64 is malformed or the wrong length.
pub fn decode_embedding(&self) -> Result<Vec<u8>> {
let bytes = base64::engine::general_purpose::STANDARD
.decode(self.embedding.as_bytes())
.context("face embedding base64 decode")?;
if bytes.len() != 2048 {
anyhow::bail!(
"face embedding wrong size: got {} bytes, expected 2048",
bytes.len()
);
}
Ok(bytes)
}
}
#[allow(dead_code)] // duration_ms logged by Phase 3 file-watch hook
#[derive(Debug, Clone, Deserialize)]
pub struct DetectResponse {
pub model_version: String,
pub duration_ms: i64,
pub faces: Vec<DetectedFace>,
}
#[derive(Debug, Clone, Deserialize)]
#[allow(dead_code)] // Reported by Apollo; useful for future health-driven backoff
pub struct FaceHealth {
pub loaded: bool,
pub providers: Vec<String>,
pub model_version: String,
pub det_size: i32,
#[serde(default)]
pub load_error: Option<String>,
}
/// Distinguishes permanent failures (don't retry) from transient ones
/// (defer and retry on next scan tick). The file-watch hook keys its
/// marker-row decision on this — a `Permanent` outcome writes
/// `status='failed'`, a `Transient` outcome writes nothing so the next
/// pass tries again.
#[derive(Debug)]
pub enum FaceDetectError {
/// Apollo refused the bytes for a reason that won't change on retry
/// (decode failure, zero-dim image). Mark `status='failed'`.
Permanent(anyhow::Error),
/// Apollo couldn't process this turn but might next time (CUDA OOM,
/// engine not loaded yet, network hiccup). Don't mark anything.
Transient(anyhow::Error),
/// Feature is disabled (no `APOLLO_FACE_API_BASE_URL`). Caller should
/// silently no-op — same shape as `apollo_client::is_enabled()` false.
Disabled,
}
impl std::fmt::Display for FaceDetectError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
FaceDetectError::Permanent(e) => write!(f, "permanent: {e}"),
FaceDetectError::Transient(e) => write!(f, "transient: {e}"),
FaceDetectError::Disabled => write!(f, "face client disabled"),
}
}
}
impl std::error::Error for FaceDetectError {}
#[derive(Clone)]
pub struct FaceClient {
client: Client,
/// `None` → disabled. Trim trailing slash at construction so url
/// building doesn't double up.
base_url: Option<String>,
}
impl FaceClient {
pub fn new(base_url: Option<String>) -> Self {
// Default 60 s timeout, overridable via `FACE_DETECT_TIMEOUT_SEC`:
// inference on a backlog can take many seconds per photo, especially
// the first call into a cold GPU. Apollo's bounded threadpool (1
// worker on CUDA) means concurrent calls queue server-side; 60 s is
// enough headroom for a few items in the queue without surfacing a
// false transient.
let timeout_secs = std::env::var("FACE_DETECT_TIMEOUT_SEC")
.ok()
.and_then(|s| s.parse::<u64>().ok())
.unwrap_or(60);
let client = Client::builder()
.timeout(Duration::from_secs(timeout_secs))
.build()
.expect("reqwest client build");
Self {
client,
base_url: base_url.map(|u| u.trim_end_matches('/').to_string()),
}
}
pub fn is_enabled(&self) -> bool {
self.base_url.is_some()
}
/// Detect every face in `bytes`. ImageApi calls this from the file-watch
/// hook (Phase 3) and from the manual rerun handler. Empty `faces[]` in
/// the response is the no-faces signal — caller writes a marker row.
#[allow(dead_code)] // Phase 3 file-watch hook + rerun handler
pub async fn detect(
&self,
bytes: Vec<u8>,
meta: DetectMeta,
) -> std::result::Result<DetectResponse, FaceDetectError> {
let Some(base) = self.base_url.as_deref() else {
return Err(FaceDetectError::Disabled);
};
let url = format!("{}/api/internal/faces/detect", base);
self.post_multipart(&url, bytes, &meta).await
}
/// Single-embedding endpoint for the manual face-create flow. Caller
/// crops the image to the user-drawn bbox and passes those bytes; Apollo
/// runs detection inside the crop and returns the highest-confidence
/// face's embedding. Apollo returns 422 `no_face_in_crop` when the
/// box missed, surfaced here as `Permanent`.
pub async fn embed(
&self,
bytes: Vec<u8>,
meta: DetectMeta,
) -> std::result::Result<DetectResponse, FaceDetectError> {
let Some(base) = self.base_url.as_deref() else {
return Err(FaceDetectError::Disabled);
};
let url = format!("{}/api/internal/faces/embed", base);
self.post_multipart(&url, bytes, &meta).await
}
/// Engine reachability + provider/model report. Used by ImageApi for a
/// startup sanity check; not on the hot path.
#[allow(dead_code)] // Phase 3 startup probe
pub async fn health(&self) -> Result<FaceHealth> {
let base = self.base_url.as_deref().context("face client disabled")?;
let url = format!("{}/api/internal/faces/health", base);
let resp = self.client.get(&url).send().await?.error_for_status()?;
let body: FaceHealth = resp.json().await?;
Ok(body)
}
async fn post_multipart(
&self,
url: &str,
bytes: Vec<u8>,
meta: &DetectMeta,
) -> std::result::Result<DetectResponse, FaceDetectError> {
let meta_json = serde_json::to_string(meta)
.map_err(|e| FaceDetectError::Permanent(anyhow::anyhow!("meta serialize: {e}")))?;
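// A bad MIME string is a programming error, not a reason to upload an
// empty file part; surface it instead of silently swapping the bytes.
let file_part = reqwest::multipart::Part::bytes(bytes)
.file_name(meta.rel_path.clone())
.mime_str("application/octet-stream")
.map_err(|e| FaceDetectError::Permanent(anyhow::anyhow!("file part mime: {e}")))?;
let form = reqwest::multipart::Form::new()
.text("meta", meta_json)
.part("file", file_part);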
let resp = match self.client.post(url).multipart(form).send().await {
Ok(r) => r,
Err(e) if e.is_timeout() || e.is_connect() => {
return Err(FaceDetectError::Transient(anyhow::anyhow!(
"face client network: {e}"
)));
}
Err(e) => {
return Err(FaceDetectError::Transient(anyhow::anyhow!(
"face client request: {e}"
)));
}
};
let status = resp.status();
if status.is_success() {
let body: DetectResponse = resp.json().await.map_err(|e| {
FaceDetectError::Transient(anyhow::anyhow!("face response decode: {e}"))
})?;
return Ok(body);
}
let body_text = resp.text().await.unwrap_or_default();
Err(classify_error_response(status.as_u16(), &body_text))
}
}
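The two small conventions in `FaceClient::new` (an environment override that falls back to a default when missing or unparseable, and a base URL stored without its trailing slash) can be sketched as pure functions. Both helper names are hypothetical and exist only for illustration; the real constructor reads `FACE_DETECT_TIMEOUT_SEC` directly.

```rust
/// Hypothetical helper: parse a timeout in seconds, falling back to
/// `default` when the value is absent or is not a valid u64.
fn timeout_secs(raw: Option<&str>, default: u64) -> u64 {
    raw.and_then(|s| s.parse::<u64>().ok()).unwrap_or(default)
}

/// Hypothetical helper: strip trailing slashes once at construction so
/// `format!("{}/api/...", base)` never produces a double slash.
/// `None` stays `None`, which is the "feature disabled" state.
fn normalize_base_url(base: Option<&str>) -> Option<String> {
    base.map(|u| u.trim_end_matches('/').to_string())
}
```

Treating a malformed override the same as a missing one means a typo in the environment degrades to the default timeout instead of failing startup.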
/// Map an Apollo HTTP error response to a FaceDetectError. Pulled out as a
/// pure function so the marker-row contract (422 → Permanent, 503 →
/// Transient) is unit-testable without spinning up an HTTP server.
fn classify_error_response(status: u16, body_text: &str) -> FaceDetectError {
// Apollo encodes its error class in the JSON body's `detail`. Try to
// parse it; fall back to status-only classification.
let detail_code = serde_json::from_str::<serde_json::Value>(body_text)
.ok()
.and_then(|v| {
// detail can be a string ("decode_failed") or an object
// ({"code": "cuda_oom", ...}) depending on the endpoint and
// Apollo's response shape — handle both.
v.get("detail")
.and_then(|d| d.as_str().map(str::to_string))
.or_else(|| {
v.get("detail")
.and_then(|d| d.get("code"))
.and_then(|c| c.as_str())
.map(str::to_string)
})
})
.unwrap_or_default();
if status == 422 {
return FaceDetectError::Permanent(anyhow::anyhow!(
"face detect 422 {}: {}",
detail_code,
body_text
));
}
if status == 503 {
return FaceDetectError::Transient(anyhow::anyhow!(
"face detect 503 {}: {}",
detail_code,
body_text
));
}
// Infra-level 4xx that an operator can fix without re-encoding the
// bytes: 408 (proxy timeout), 413 (request too large — reverse-proxy
// body cap), 429 (rate limit). Treating these as Permanent poisons
// every photo that hit the misconfig with `status='failed'` and
// requires a manual DELETE to recover. Defer instead so the next
// scan tick retries naturally once the proxy is fixed.
if matches!(status, 408 | 413 | 429) {
return FaceDetectError::Transient(anyhow::anyhow!(
"face detect {} {}: {}",
status,
detail_code,
body_text
));
}
// Any other 4xx: be conservative and treat as Permanent so we don't
// loop forever on a stable rejection. Any other 5xx: Transient —
// likely intermittent.
if (400..500).contains(&status) {
FaceDetectError::Permanent(anyhow::anyhow!(
"face detect {} {}: {}",
status,
detail_code,
body_text
))
} else {
FaceDetectError::Transient(anyhow::anyhow!(
"face detect {} {}: {}",
status,
detail_code,
body_text
))
}
}
#[cfg(test)]
mod tests {
use super::*;
fn is_permanent(e: &FaceDetectError) -> bool {
matches!(e, FaceDetectError::Permanent(_))
}
fn is_transient(e: &FaceDetectError) -> bool {
matches!(e, FaceDetectError::Transient(_))
}
#[test]
fn classify_422_decode_failed_is_permanent() {
// Permanent → ImageApi marks status='failed' and stops retrying.
let e = classify_error_response(422, r#"{"detail":"decode_failed: bad bytes"}"#);
assert!(is_permanent(&e), "422 decode_failed must be Permanent");
assert!(format!("{e}").contains("decode_failed"));
}
#[test]
fn classify_503_cuda_oom_is_transient() {
// Transient → ImageApi must NOT write a marker so the next scan
// retries. The detail.code is nested in an object rather than a
// bare string; the parser handles both.
let e = classify_error_response(
503,
r#"{"detail":{"code":"cuda_oom","error":"out of memory"}}"#,
);
assert!(is_transient(&e), "503 cuda_oom must be Transient");
assert!(format!("{e}").contains("cuda_oom"));
}
#[test]
fn classify_500_is_transient_other_4xx_is_permanent() {
// Conservative split: 5xx defers (intermittent), other 4xx
// is treated as a stable rejection so we don't loop forever.
assert!(is_transient(&classify_error_response(500, "")));
assert!(is_transient(&classify_error_response(502, "{}")));
assert!(is_permanent(&classify_error_response(400, "{}")));
assert!(is_permanent(&classify_error_response(404, "{}")));
}
#[test]
fn classify_infra_4xx_is_transient() {
// 408 / 413 / 429 are operator-fixable proxy/infra errors.
// Marking them Permanent poisons every affected photo with
// status='failed' and requires manual SQL to recover. The
// 413 path specifically bit us when nginx defaulted to a 1 MB
// body cap and rejected normal-size photos before they reached
// the backend.
assert!(is_transient(&classify_error_response(408, "")));
assert!(is_transient(&classify_error_response(
413,
"<html>nginx</html>"
)));
assert!(is_transient(&classify_error_response(429, "{}")));
}
#[test]
fn classify_handles_unparseable_body() {
// Apollo can return non-JSON on misroute / proxy errors; the
// classifier must still produce a useful variant.
let e = classify_error_response(503, "<html>nginx</html>");
assert!(is_transient(&e));
}
}
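The status-only part of the marker-row contract tested above can be condensed into a single `match`. This is a sketch, not the reviewed implementation: `Outcome` and `classify_status` are hypothetical names, and the sketch ignores the response body, whereas `classify_error_response` also extracts `detail` for the error message.

```rust
/// Hypothetical two-way outcome mirroring Permanent vs Transient.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// Stable rejection: write a `failed` marker and stop retrying.
    Permanent,
    /// Might succeed later: write nothing so the next pass retries.
    Transient,
}

/// Classify by HTTP status alone. Arm order matters: the specific
/// transient 4xx codes must be matched before the general 4xx range.
fn classify_status(status: u16) -> Outcome {
    match status {
        422 => Outcome::Permanent,
        503 => Outcome::Transient,
        408 | 413 | 429 => Outcome::Transient, // operator-fixable infra errors
        400..=499 => Outcome::Permanent,       // any other 4xx
        _ => Outcome::Transient,               // 5xx and anything unexpected
    }
}
```

Because the function is pure, the whole retry policy can be covered with table-style assertions and no HTTP server.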
+88
@@ -0,0 +1,88 @@
// GPU lease — in-process coordination for llama-swap model contention.
//
// llama-swap runs the heavyweight models (chat / vision / Chatterbox TTS) as
// a mutually-exclusive set on one GPU (matrix DSL `(q27 | … | tts) & e`): a
// request for a non-resident model is HELD by llama-swap until the resident
// model's in-flight requests drain, then the models swap. That hold counts
// against the *holder's* reqwest timeout — measured live: a queued TTS burned
// 77s of its budget behind a single LLM turn, and an LLM request behind a
// running synthesis waited the entire remaining synth. Uncoordinated
// cross-model traffic therefore times out instead of queueing.
//
// The lease moves that wait into this process, BEFORE the HTTP request is
// sent and before its timeout starts:
// - chat/vision requests (the LLM-side slots) share the READ lease;
// - TTS synthesis and voice-library ops (anything that spins Chatterbox up
// and evicts the LLM) take the WRITE lease;
// - embeddings take NO lease: the `embed` slot is in llama-swap's
// always-resident group (the `& e` term) and never participates in a swap,
// so leasing it would only stall searches behind a queued synthesis.
//
// tokio's RwLock is fair (FIFO, write-preferring): a queued TTS gets the GPU
// right after the current LLM request drains, and later LLM requests queue
// behind it — bounded waits in both directions, no starvation, no timeout
// budget burned while waiting.
//
// RULES: hold a lease for exactly one HTTP request (for streaming, the
// stream's lifetime) and NEVER acquire one while already holding one — once a
// writer is queued, new read acquisitions block, so nested acquisition can
// deadlock.
use std::sync::LazyLock;
use std::time::Instant;
use tokio::sync::{RwLock, RwLockReadGuard, RwLockWriteGuard};
static GPU_LEASE: LazyLock<RwLock<()>> = LazyLock::new(|| RwLock::new(()));
/// Waits longer than this are logged — they mean a cross-model swap was
/// avoided and quantify what the request *would* have burned of its timeout.
const SLOW_WAIT_LOG_SECS: f64 = 2.0;
/// Shared lease for LLM-side requests (chat / vision slots).
pub async fn llm_lease() -> RwLockReadGuard<'static, ()> {
let started = Instant::now();
let guard = GPU_LEASE.read().await;
log_slow_wait("llm", started);
guard
}
/// Exclusive lease for TTS-side requests (speech synthesis + voice-library
/// ops that spin up Chatterbox).
pub async fn tts_lease() -> RwLockWriteGuard<'static, ()> {
let started = Instant::now();
let guard = GPU_LEASE.write().await;
log_slow_wait("tts", started);
guard
}
fn log_slow_wait(kind: &str, started: Instant) {
let waited = started.elapsed().as_secs_f64();
if waited > SLOW_WAIT_LOG_SECS {
log::info!("GPU lease ({kind}): waited {waited:.1}s for the other model class to drain");
}
}
#[cfg(test)]
mod tests {
use super::*;
// One sequential test, not several: the lease is a single global, so
// parallel tests interleaving reads and writes on it can hit the very
// nested-acquisition deadlock the module comment warns about.
#[tokio::test]
async fn write_lease_excludes_readers_then_reads_share() {
let w = tts_lease().await;
// A reader must not acquire while the writer is held.
let pending = tokio::spawn(async { drop(llm_lease().await) });
tokio::task::yield_now().await;
assert!(!pending.is_finished());
drop(w);
pending.await.expect("reader acquires after writer drops");
// With no writer queued, read leases are shared.
let a = llm_lease().await;
let b = llm_lease().await;
drop(a);
drop(b);
}
}
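The read/write split of the lease can be demonstrated without an async runtime. The sketch below swaps in `std::sync::RwLock` for tokio's lock, so it shows only the exclusion rules (a held writer blocks readers, readers share) and says nothing about tokio's FIFO fairness or about waiting asynchronously. Both function names are hypothetical.

```rust
use std::sync::RwLock;

/// Returns true when a reader cannot acquire while a writer holds the
/// lease (the "TTS is running, LLM requests wait" case).
fn reader_blocked_while_writer_held(lease: &RwLock<()>) -> bool {
    let _writer = lease.write().unwrap();
    // try_read never blocks, so probing from the same thread is safe.
    let blocked = lease.try_read().is_err();
    blocked
}

/// Returns true when two readers can hold the lease at once (the
/// "several LLM requests share the GPU slot" case).
fn readers_share(lease: &RwLock<()>) -> bool {
    let _first = lease.read().unwrap();
    let shared = lease.try_read().is_ok();
    shared
}
```

The non-blocking `try_read` probe is what keeps this sketch free of the nested-acquisition deadlock the module comment warns about.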
+224
@@ -0,0 +1,224 @@
use anyhow::Result;
use async_trait::async_trait;
use futures::stream::BoxStream;
use serde::{Deserialize, Serialize};
/// Provider-agnostic surface for LLM backends (Ollama, OpenRouter, …).
///
/// Impls translate these canonical shapes at the wire boundary: tool-call
/// arguments stay as `serde_json::Value` in memory and are stringified only
/// when a provider requires it (OpenAI-compatible APIs do), and `images`
/// stays as base64 strings here and is rewritten into content-parts where
/// needed.
// First consumer lands in a later PR (OpenRouter impl + hybrid mode routing).
#[allow(dead_code)]
#[async_trait]
pub trait LlmClient: Send + Sync {
/// Single-shot text generation. Optional system prompt and optional
/// base64 images (ignored by providers without vision support).
async fn generate(
&self,
prompt: &str,
system: Option<&str>,
images: Option<Vec<String>>,
) -> Result<String>;
/// Multi-turn chat with tool definitions. Returns the assistant message
/// (which may contain tool_calls) plus optional prompt/eval token counts.
async fn chat_with_tools(
&self,
messages: Vec<ChatMessage>,
tools: Vec<Tool>,
) -> Result<(ChatMessage, Option<i32>, Option<i32>)>;
/// Streaming variant of `chat_with_tools`. The returned stream yields
/// `TextDelta` items as content is produced, then a single terminal
/// `Done` carrying the complete assembled message (with tool_calls, if
/// any) plus token usage counts. Implementations that can't stream may
/// fall back to calling `chat_with_tools` and emitting the full reply
/// as one `Done` event.
async fn chat_with_tools_stream(
&self,
messages: Vec<ChatMessage>,
tools: Vec<Tool>,
) -> Result<BoxStream<'static, Result<LlmStreamEvent>>>;
/// Batch embedding generation. Dimensionality is provider/model specific.
async fn generate_embeddings(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>;
/// One-shot vision description of an image. Used to convert images into
/// plain text for the hybrid-mode conversation flow.
async fn describe_image(&self, image_base64: &str) -> Result<String>;
/// Enumerate available models with their capabilities.
async fn list_models(&self) -> Result<Vec<ModelCapabilities>>;
/// Look up capabilities for a single model.
async fn model_capabilities(&self, model: &str) -> Result<ModelCapabilities>;
/// Primary model identifier this client was constructed with.
fn primary_model(&self) -> &str;
}
/// Events emitted by streaming `chat_with_tools_stream`. A stream is a
/// sequence of zero or more `TextDelta` events followed by exactly one
/// `Done`. Callers should treat `Done` as terminal — further items (if any
/// slip through due to upstream misbehavior) are safe to ignore.
#[derive(Debug, Clone)]
pub enum LlmStreamEvent {
/// Incremental content token(s) from the model. Concatenate in order to
/// reconstruct the assistant's final text.
TextDelta(String),
/// Terminal event with the full assembled message (content + any
/// tool_calls). `message.content` equals the concatenation of every
/// preceding `TextDelta.0`.
Done {
message: ChatMessage,
prompt_eval_count: Option<i32>,
eval_count: Option<i32>,
},
}
/// Tool definition sent to the model (OpenAI-compatible function schema).
#[derive(Serialize, Clone, Debug)]
pub struct Tool {
#[serde(rename = "type")]
pub tool_type: String, // always "function"
pub function: ToolFunction,
}
#[derive(Serialize, Clone, Debug)]
pub struct ToolFunction {
pub name: String,
pub description: String,
pub parameters: serde_json::Value,
}
impl Tool {
pub fn function(name: &str, description: &str, parameters: serde_json::Value) -> Self {
Self {
tool_type: "function".to_string(),
function: ToolFunction {
name: name.to_string(),
description: description.to_string(),
parameters,
},
}
}
}
/// A message in the chat conversation history.
#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct ChatMessage {
pub role: String, // "system" | "user" | "assistant" | "tool"
/// Empty string (not null) when tool_calls is present — Ollama quirk.
#[serde(default)]
pub content: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub tool_calls: Option<Vec<ToolCall>>,
/// Base64 images — only on user messages to vision-capable models.
#[serde(skip_serializing_if = "Option::is_none")]
pub images: Option<Vec<String>>,
}
impl ChatMessage {
pub fn system(content: impl Into<String>) -> Self {
Self {
role: "system".to_string(),
content: content.into(),
tool_calls: None,
images: None,
}
}
pub fn user(content: impl Into<String>) -> Self {
Self {
role: "user".to_string(),
content: content.into(),
tool_calls: None,
images: None,
}
}
pub fn tool_result(content: impl Into<String>) -> Self {
Self {
role: "tool".to_string(),
content: content.into(),
tool_calls: None,
images: None,
}
}
}
/// Tool call returned by the model in an assistant message.
#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct ToolCall {
pub function: ToolCallFunction,
#[serde(skip_serializing_if = "Option::is_none")]
pub id: Option<String>,
}
#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct ToolCallFunction {
pub name: String,
/// Canonical shape: native JSON. Providers that use JSON-encoded-string
/// arguments (OpenAI-compatible) translate at their wire boundary.
pub arguments: serde_json::Value,
}
#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct ModelCapabilities {
pub name: String,
pub has_vision: bool,
pub has_tool_calling: bool,
}
/// Strip a leading `<think>…</think>` reasoning block from model output.
///
/// Thinking models sometimes emit chain-of-thought inside think tags before
/// the real answer. Everything after the first `</think>` is the answer;
/// when no tag is present — or the text after it is empty — the trimmed
/// input is returned unchanged. Mirrors the behavior Ollama's
/// `extract_final_answer` has applied to single-shot generation; shared here
/// so the tool-calling final-content paths (agentic generation + chat) can
/// apply the identical cleanup before parsing / persisting.
pub fn strip_think_blocks(response: &str) -> String {
let response = response.trim();
if let Some(pos) = response.find("</think>") {
let answer = response[pos + "</think>".len()..].trim();
if !answer.is_empty() {
return answer.to_string();
}
}
response.to_string()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn strip_think_blocks_removes_leading_think_block() {
let raw = "<think>\nLet me reason about this.\n</think>\n\nTitle: A Day Out\n\nThe body.";
assert_eq!(strip_think_blocks(raw), "Title: A Day Out\n\nThe body.");
}
#[test]
fn strip_think_blocks_passes_through_plain_content() {
assert_eq!(strip_think_blocks(" just an answer "), "just an answer");
}
#[test]
fn strip_think_blocks_keeps_content_when_answer_after_tag_is_empty() {
// A think block with nothing after it: better to return the trimmed
// original than an empty string (matches Ollama's fallback).
let raw = "<think>only thoughts</think>";
assert_eq!(strip_think_blocks(raw), raw);
}
#[test]
fn strip_think_blocks_handles_unclosed_tag() {
let raw = "<think>thinking forever";
assert_eq!(strip_think_blocks(raw), raw);
}
}
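The stream contract documented on `LlmStreamEvent` (zero or more `TextDelta` items, exactly one terminal `Done`, anything after `Done` ignored) can be sketched from the consumer's side. The types here are simplified assumptions: `StreamEvent` is a hypothetical stand-in whose `Done` carries only the final content string, and the events arrive as a slice, not as an async stream.

```rust
/// Hypothetical, simplified stand-in for `LlmStreamEvent`.
#[derive(Debug, Clone, PartialEq)]
enum StreamEvent {
    TextDelta(String),
    Done { content: String },
}

/// Concatenate deltas up to the first `Done` and stop there.
/// Returns `(streamed_text, final_content)`, or `None` if the stream
/// ended without a terminal `Done`.
fn assemble(events: &[StreamEvent]) -> Option<(String, String)> {
    let mut streamed = String::new();
    for ev in events {
        match ev {
            StreamEvent::TextDelta(t) => streamed.push_str(t),
            // Done is terminal: later items are never inspected.
            StreamEvent::Done { content } => {
                return Some((streamed, content.clone()));
            }
        }
    }
    None
}
```

A well-behaved provider yields a pair whose two strings are equal; a consumer can compare them as a cheap sanity check on the stream.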
+88
@@ -0,0 +1,88 @@
//! Bundle of the local LLM pair (Ollama + optional llama-swap) with the
//! `LLM_BACKEND` dispatch baked in.
//!
//! Exists because passing the pair around as loose values invited the same
//! bug three times: import/backfill tooling embedded corpora via
//! `OllamaClient` directly while the query side dispatched through
//! `embed_one`, so flipping `LLM_BACKEND=llamacpp` silently split queries
//! and corpus into different vector spaces. Anything that writes or reads
//! embeddings should go through this type (or `embed_one`/`embed_many`),
//! never a concrete client.
//!
//! Deliberately knows nothing about chat policy — hybrid/OpenRouter routing
//! is request-scoped and stays in `ResolvedBackend`. This is only the
//! local stack: embeddings and offline single-shot generation.
// Constructed by binaries, not the server — dead code from main.rs's view.
#![allow(dead_code)]
use std::sync::Arc;
use anyhow::Result;
use super::llamacpp::LlamaCppClient;
use super::llm_client::LlmClient;
use super::ollama::{EMBEDDING_MODEL, OllamaClient};
#[derive(Clone)]
pub struct LocalLlm {
ollama: OllamaClient,
llamacpp: Option<Arc<LlamaCppClient>>,
}
impl LocalLlm {
pub fn new(ollama: OllamaClient, llamacpp: Option<Arc<LlamaCppClient>>) -> Self {
Self { ollama, llamacpp }
}
/// Construct from the canonical env wiring shared with `AppState`.
pub fn from_env() -> Self {
Self::new(
crate::state::build_ollama_from_env(),
crate::state::build_llamacpp_from_env(),
)
}
/// Embed a search query (applies `EMBED_QUERY_PREFIX`). Callers must
/// pick query vs document — retrieval models treat the two sides
/// differently and an unmarked embed invites prefix-mismatch bugs.
pub async fn embed_query(&self, text: &str) -> Result<Vec<f32>> {
super::embed_query(&self.ollama, self.llamacpp.as_deref(), text).await
}
/// Embed corpus text (applies `EMBED_DOCUMENT_PREFIX`).
pub async fn embed_document(&self, text: &str) -> Result<Vec<f32>> {
super::embed_document(&self.ollama, self.llamacpp.as_deref(), text).await
}
/// Single-shot local text generation via the `LLM_BACKEND`-selected
/// client (offline tooling; chat turns belong to `ResolvedBackend`).
pub async fn generate(&self, prompt: &str, system: Option<&str>) -> Result<String> {
if super::local_backend_is_llamacpp() {
if let Some(lc) = self.llamacpp.as_deref() {
return <LlamaCppClient as LlmClient>::generate(lc, prompt, system, None).await;
}
anyhow::bail!(
"LLM_BACKEND=llamacpp but LlamaCppClient is unconfigured — \
set LLAMA_SWAP_URL or switch to LLM_BACKEND=ollama"
);
}
self.ollama.generate(prompt, system).await
}
/// Label identifying which backend + model produces embeddings right
/// now. Store it alongside vectors (`model_version` columns) so a
/// backend flip is detectable in the data, not just in env history.
pub fn embedding_model_version(&self) -> String {
if super::local_backend_is_llamacpp() {
let slot = self
.llamacpp
.as_deref()
.map(|c| c.embedding_model.as_str())
.unwrap_or("embed");
format!("llama-swap:{}", slot)
} else {
EMBEDDING_MODEL.to_string()
}
}
}
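The `model_version` label only helps if something compares it before mixing vectors. A minimal sketch of that comparison, assuming a hypothetical `StoredEmbedding` row type and `stale_rows` helper (neither appears in this diff; the real schema may differ):

```rust
/// Hypothetical row shape: a vector plus the label of the backend/model
/// that produced it (what `embedding_model_version` would return).
#[derive(Debug, Clone, PartialEq)]
pub struct StoredEmbedding {
    pub model_version: String,
    pub vector: Vec<f32>,
}

/// Return the indices of rows whose label differs from the label the
/// current backend would write. Those rows live in a different vector
/// space and need re-embedding before similarity search is meaningful.
pub fn stale_rows(rows: &[StoredEmbedding], current_label: &str) -> Vec<usize> {
    rows.iter()
        .enumerate()
        .filter(|(_, row)| row.model_version != current_label)
        .map(|(i, _)| i)
        .collect()
}
```

A re-embed tool could run a check like this at startup and refuse to query until the stale set is empty.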
+196 -5
@@ -1,17 +1,208 @@
pub mod apollo_client;
pub mod backend;
pub mod clip_client;
pub mod daily_summary_job;
pub mod face_client;
pub mod gpu;
pub mod handlers;
pub mod insight_chat;
pub mod insight_generator;
pub mod llamacpp;
pub mod llm_client;
pub mod local_llm;
pub mod ollama;
pub mod openrouter;
pub mod pronunciation;
pub mod sms_client;
pub mod tts;
pub mod turn_registry;
// strip_summary_boilerplate is used by binaries (test_daily_summary), not the library
#[allow(unused_imports)]
pub use daily_summary_job::{generate_daily_summaries, strip_summary_boilerplate};
pub use daily_summary_job::{
DAILY_SUMMARY_MESSAGE_LIMIT, DAILY_SUMMARY_SYSTEM_PROMPT, build_daily_summary_prompt,
generate_daily_summaries, strip_summary_boilerplate,
};
pub use handlers::{
delete_insight_handler, export_training_data_handler, generate_agentic_insight_handler,
generate_insight_handler, get_all_insights_handler, get_available_models_handler,
get_insight_handler, rate_insight_handler,
cancel_generation_handler, cancel_turn_handler, chat_history_handler, chat_rewind_handler,
chat_stream_handler, chat_turn_handler, delete_insight_handler, export_training_data_handler,
generate_agentic_insight_handler, generate_insight_handler, generation_status_handler,
get_all_insights_handler, get_available_models_handler, get_insight_handler,
get_insight_history_handler, get_openrouter_models_handler, rate_insight_handler,
turn_async_handler, turn_replay_handler,
};
pub use insight_generator::InsightGenerator;
pub use ollama::{ModelCapabilities, OllamaClient};
pub use llamacpp::LlamaCppClient;
#[allow(unused_imports)]
pub use llm_client::{
ChatMessage, LlmClient, ModelCapabilities, Tool, ToolCall, ToolCallFunction, ToolFunction,
};
// LocalLlm is constructed by binaries (reembed_embeddings, importers), not the server
#[allow(unused_imports)]
pub use local_llm::LocalLlm;
pub use ollama::{EMBEDDING_MODEL, OllamaClient};
pub use sms_client::{SmsApiClient, SmsMessage};
pub use tts::{
cancel_speech_job_handler, create_speech_job_handler, create_voice_from_library_handler,
create_voice_upload_handler, delete_voice_handler, list_voices_handler,
speech_job_status_handler, tts_speech_handler,
};
/// Display name used for the user in message transcripts and first-person
/// prompt text. Reads the `USER_NAME` env var; defaults to `"Me"`. Models
/// often confuse `"Me:"` in a transcript with their own role — setting
/// `USER_NAME=Cameron` (or similar) in the environment eliminates that
/// ambiguity across daily summaries, insight generation, and chat.
pub fn user_display_name() -> String {
std::env::var("USER_NAME").unwrap_or_else(|_| "Me".to_string())
}
/// One switch for the "local" LLM stack: when `LLM_BACKEND=llamacpp` is
/// set, chat / vision describe / embeddings all route through llama-swap
/// instead of Ollama. Any other value (including unset, the default) is
/// Ollama. This is intentionally global — embeddings must be drawn from
/// a single source or similarity search across the index breaks (mixed
/// vector spaces, possibly mixed dims). The `backend=hybrid` per-request
/// override remains orthogonal: it always sends chat to OpenRouter, and
/// uses `LLM_BACKEND` for the describe-then-inline vision pass.
pub fn local_backend_is_llamacpp() -> bool {
matches!(
std::env::var("LLM_BACKEND")
.ok()
.as_deref()
.map(|s| s.trim().to_lowercase())
.as_deref(),
Some("llamacpp")
)
}
/// Expected embedding dimensionality, env-overridable via `EMBEDDING_DIM`
/// (default 768, nomic-embed-text). Every store/query dim check reads this —
/// swapping to a different-dim model (e.g. Qwen3-Embedding-0.6B at 1024) is
/// then a config flip plus a `reembed_embeddings` run, not a code change.
/// Cached for the process lifetime; a flip requires a restart anyway since
/// the corpus must be re-embedded with it.
pub fn embedding_dim() -> usize {
static DIM: std::sync::OnceLock<usize> = std::sync::OnceLock::new();
*DIM.get_or_init(|| {
std::env::var("EMBEDDING_DIM")
.ok()
.and_then(|v| v.parse().ok())
.unwrap_or(768)
})
}
/// Read an embedding prefix from the environment. `.env` values can't hold
/// real newlines, so a literal `\n` in the value is expanded — Qwen3-style
/// query instructions need one ("Instruct: ...\nQuery: ").
fn embed_prefix(key: &str) -> String {
std::env::var(key)
.map(|v| v.replace("\\n", "\n"))
.unwrap_or_default()
}
/// Embed a search query. Applies `EMBED_QUERY_PREFIX` (default empty) —
/// retrieval models distinguish query-side from document-side text:
/// nomic v1.5 wants `search_query: `, Qwen3-Embedding wants
/// `Instruct: <task>\nQuery: `. Must pair with the document prefix the
/// corpus was embedded with or similarity degrades.
pub async fn embed_query(
ollama: &OllamaClient,
llamacpp: Option<&LlamaCppClient>,
text: &str,
) -> anyhow::Result<Vec<f32>> {
let prefixed = format!("{}{}", embed_prefix("EMBED_QUERY_PREFIX"), text);
embed_one(ollama, llamacpp, &prefixed).await
}
/// Embed corpus text (the stored side of retrieval). Applies
/// `EMBED_DOCUMENT_PREFIX` (default empty; nomic v1.5 wants
/// `search_document: `, Qwen3-Embedding wants none).
pub async fn embed_document(
ollama: &OllamaClient,
llamacpp: Option<&LlamaCppClient>,
text: &str,
) -> anyhow::Result<Vec<f32>> {
let prefixed = format!("{}{}", embed_prefix("EMBED_DOCUMENT_PREFIX"), text);
embed_one(ollama, llamacpp, &prefixed).await
}
/// Embed a batch of strings via the configured local backend. Routes
/// through llama-swap when `LLM_BACKEND=llamacpp` (and a client is
/// configured), else Ollama. See [`local_backend_is_llamacpp`] for the
/// rationale on consistency.
pub async fn embed_many(
ollama: &OllamaClient,
llamacpp: Option<&LlamaCppClient>,
texts: &[&str],
) -> anyhow::Result<Vec<Vec<f32>>> {
if local_backend_is_llamacpp() {
if let Some(lc) = llamacpp {
return <LlamaCppClient as LlmClient>::generate_embeddings(lc, texts).await;
}
anyhow::bail!(
"LLM_BACKEND=llamacpp but LlamaCppClient is unconfigured — \
set LLAMA_SWAP_URL or switch to LLM_BACKEND=ollama"
);
}
ollama.generate_embeddings(texts).await
}
/// Embed one string via the configured local backend. Single-text
/// convenience over [`embed_many`].
pub async fn embed_one(
ollama: &OllamaClient,
llamacpp: Option<&LlamaCppClient>,
text: &str,
) -> anyhow::Result<Vec<f32>> {
let mut vecs = embed_many(ollama, llamacpp, &[text]).await?;
vecs.pop()
.ok_or_else(|| anyhow::anyhow!("embedding backend returned no embeddings"))
}
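The prefix handling above can be illustrated in isolation. This is a sketch only: `expand_prefix` and `with_prefix` are hypothetical names, and they take the raw value as an argument instead of reading the environment so the behavior is easy to check.

```rust
/// Expand literal `\n` sequences into real newlines, as `embed_prefix`
/// does for `.env` values. `None` (variable unset) yields an empty prefix.
pub fn expand_prefix(raw: Option<&str>) -> String {
    raw.map(|v| v.replace("\\n", "\n")).unwrap_or_default()
}

/// Build the text that would be sent to the embedding backend:
/// expanded prefix followed by the caller's text.
pub fn with_prefix(raw_prefix: Option<&str>, text: &str) -> String {
    format!("{}{}", expand_prefix(raw_prefix), text)
}
```

With a nomic-style value the prefix is prepended verbatim; with a Qwen3-style value the two-character `\n` in the `.env` file becomes a real line break between the instruction and the query.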
#[cfg(test)]
mod env_dispatch_tests {
use super::*;
/// Env vars are process-global, and the test harness runs in parallel —
/// without this lock the `LLM_BACKEND` tests race each other and flake.
static ENV_LOCK: std::sync::Mutex<()> = std::sync::Mutex::new(());
fn with_env<F: FnOnce()>(key: &str, val: Option<&str>, f: F) {
let _guard = ENV_LOCK.lock().unwrap_or_else(|p| p.into_inner());
let prev = std::env::var(key).ok();
match val {
Some(v) => unsafe { std::env::set_var(key, v) },
None => unsafe { std::env::remove_var(key) },
}
f();
match prev {
Some(v) => unsafe { std::env::set_var(key, v) },
None => unsafe { std::env::remove_var(key) },
}
}
#[test]
fn llm_backend_defaults_to_ollama() {
with_env("LLM_BACKEND", None, || {
assert!(!local_backend_is_llamacpp());
});
}
#[test]
fn llm_backend_llamacpp_case_insensitive() {
with_env("LLM_BACKEND", Some("LlamaCpp"), || {
assert!(local_backend_is_llamacpp());
});
with_env("LLM_BACKEND", Some(" llamacpp "), || {
assert!(local_backend_is_llamacpp());
});
}
#[test]
fn llm_backend_unknown_value_is_ollama() {
with_env("LLM_BACKEND", Some("vllm"), || {
assert!(!local_backend_is_llamacpp());
});
}
}
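One possible way to reduce the env locking in these tests is to separate the string comparison from the environment read. A sketch under that assumption; `is_llamacpp_value` and `backend_is_llamacpp_from_env` are hypothetical names, not functions in this change:

```rust
/// Pure core of the `LLM_BACKEND` check: trim, lowercase, compare.
/// Anything other than "llamacpp" (including unset) means Ollama.
pub fn is_llamacpp_value(raw: Option<&str>) -> bool {
    matches!(
        raw.map(|s| s.trim().to_lowercase()).as_deref(),
        Some("llamacpp")
    )
}

/// Thin wrapper that reads the process environment; only this part
/// depends on process-global state.
pub fn backend_is_llamacpp_from_env() -> bool {
    is_llamacpp_value(std::env::var("LLM_BACKEND").ok().as_deref())
}
```

The case and whitespace rules can then be tested against `is_llamacpp_value` without `unsafe` env mutation or a mutex, leaving at most one test for the wrapper.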
+559 -173
@@ -1,14 +1,43 @@
use anyhow::{Context, Result};
use async_trait::async_trait;
use chrono::NaiveDate;
use reqwest::Client;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};
use crate::ai::llm_client::{LlmClient, LlmStreamEvent};
use futures::stream::{BoxStream, StreamExt};
// Re-export shared types so existing `crate::ai::ollama::{...}` imports
// continue to resolve.
pub use crate::ai::llm_client::{ChatMessage, ModelCapabilities, Tool};
#[allow(unused_imports)]
pub use crate::ai::llm_client::{ToolCall, ToolCallFunction, ToolFunction};
// Cache duration: 15 minutes
const CACHE_DURATION_SECS: u64 = 15 * 60;
/// Default total request timeout for generation calls, in seconds.
/// Overridable via `OLLAMA_REQUEST_TIMEOUT_SECONDS` env var for slow
/// CPU-offloaded models where inference can take several minutes.
const DEFAULT_REQUEST_TIMEOUT_SECS: u64 = 120;
fn configured_request_timeout_secs() -> u64 {
std::env::var("OLLAMA_REQUEST_TIMEOUT_SECONDS")
.ok()
.and_then(|v| v.parse::<u64>().ok())
.filter(|&s| s > 0)
.unwrap_or(DEFAULT_REQUEST_TIMEOUT_SECS)
}
/// Embedding model used across the app. Callers that persist a
/// `model_version` alongside an embedding should read this constant so the
/// stored label always matches what `generate_embeddings` actually ran.
pub const EMBEDDING_MODEL: &str = "nomic-embed-text:v1.5";
// Cached entry with timestamp
#[derive(Clone)]
struct CachedEntry<T> {
@@ -50,6 +79,12 @@ pub struct OllamaClient {
top_p: Option<f32>,
top_k: Option<i32>,
min_p: Option<f32>,
/// Sticky preference shared across clones: when the fallback server
/// succeeded most recently, try it first on the next call. Avoids
/// re-probing the primary with a model it doesn't have loaded across
/// every iteration of the agent loop. `Arc<AtomicBool>` so cloning
/// `OllamaClient` shares the flag rather than resetting it.
prefer_fallback: Arc<AtomicBool>,
}
impl OllamaClient {
@@ -62,7 +97,7 @@ impl OllamaClient {
Self {
client: Client::builder()
.connect_timeout(Duration::from_secs(5)) // Quick connection timeout
.timeout(Duration::from_secs(120)) // Total request timeout for generation
.timeout(Duration::from_secs(configured_request_timeout_secs()))
.build()
.unwrap_or_else(|_| Client::new()),
primary_url,
@@ -74,9 +109,44 @@ impl OllamaClient {
top_p: None,
top_k: None,
min_p: None,
prefer_fallback: Arc::new(AtomicBool::new(false)),
}
}
/// Return the server attempt order as `(label, url, model)` tuples.
/// Respects the sticky `prefer_fallback` flag so the most recently
/// successful server is tried first.
fn attempt_order(&self) -> Vec<(&'static str, String, String)> {
let primary = (
"primary",
self.primary_url.clone(),
self.primary_model.clone(),
);
let fallback = self.fallback_url.as_ref().map(|url| {
let model = self
.fallback_model
.clone()
.unwrap_or_else(|| self.primary_model.clone());
("fallback", url.clone(), model)
});
let prefer_fallback = fallback.is_some() && self.prefer_fallback.load(Ordering::Relaxed);
let mut order = Vec::with_capacity(2);
if prefer_fallback {
if let Some(fb) = fallback.clone() {
order.push(fb);
}
order.push(primary);
} else {
order.push(primary);
if let Some(fb) = fallback {
order.push(fb);
}
}
order
}
pub fn set_num_ctx(&mut self, num_ctx: Option<i32>) {
self.num_ctx = num_ctx;
}
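The sticky-preference idea behind `attempt_order` can be shown without the HTTP client. A minimal sketch, assuming a hypothetical `Servers` stand-in that tracks only labels and the shared flag:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the client: whether a fallback exists, plus
/// a flag shared by every clone.
#[derive(Clone)]
pub struct Servers {
    has_fallback: bool,
    prefer_fallback: Arc<AtomicBool>,
}

impl Servers {
    pub fn new(has_fallback: bool) -> Self {
        Self {
            has_fallback,
            prefer_fallback: Arc::new(AtomicBool::new(false)),
        }
    }

    /// Most recently successful server first; primary only when no
    /// fallback is configured, regardless of the flag.
    pub fn attempt_order(&self) -> Vec<&'static str> {
        if !self.has_fallback {
            return vec!["primary"];
        }
        if self.prefer_fallback.load(Ordering::Relaxed) {
            vec!["fallback", "primary"]
        } else {
            vec!["primary", "fallback"]
        }
    }

    /// Record which server answered so the next call starts there.
    pub fn record_success(&self, label: &str) {
        self.prefer_fallback
            .store(label == "fallback", Ordering::Relaxed);
    }
}
```

Because the flag sits behind an `Arc`, a success recorded through one clone changes the order seen by all clones, which is the property the field comment describes.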
@@ -120,6 +190,7 @@ impl OllamaClient {
/// Replace the HTTP client with one using a custom request timeout.
/// Useful for slow models where the default (120s unless overridden via
/// `OLLAMA_REQUEST_TIMEOUT_SECONDS`) may be insufficient.
#[allow(dead_code)]
pub fn with_request_timeout(mut self, secs: u64) -> Self {
self.client = Client::builder()
.connect_timeout(Duration::from_secs(5))
@@ -174,6 +245,7 @@ impl OllamaClient {
}
/// Clear the model list cache for a specific URL or all URLs
#[allow(dead_code)]
pub fn clear_model_cache(url: Option<&str>) {
let mut cache = MODEL_LIST_CACHE.lock().unwrap();
if let Some(url) = url {
@@ -186,6 +258,7 @@ impl OllamaClient {
}
/// Clear the model capabilities cache for a specific URL or all URLs
#[allow(dead_code)]
pub fn clear_capabilities_cache(url: Option<&str>) {
let mut cache = MODEL_CAPABILITIES_CACHE.lock().unwrap();
if let Some(url) = url {
@@ -287,18 +360,7 @@ impl OllamaClient {
/// Extract final answer from thinking model output
/// Handles <think>...</think> tags and takes everything after
fn extract_final_answer(&self, response: &str) -> String {
let response = response.trim();
// Look for </think> tag and take everything after it
if let Some(pos) = response.find("</think>") {
let answer = response[pos + 8..].trim();
if !answer.is_empty() {
return answer.to_string();
}
}
// Fallback: return the whole response trimmed
response.to_string()
crate::ai::llm_client::strip_think_blocks(response)
}
async fn try_generate(
@@ -308,6 +370,7 @@ impl OllamaClient {
prompt: &str,
system: Option<&str>,
images: Option<Vec<String>>,
think: Option<bool>,
) -> Result<String> {
let request = OllamaRequest {
model: model.to_string(),
@@ -316,6 +379,7 @@ impl OllamaClient {
system: system.map(|s| s.to_string()),
options: self.build_options(),
images,
think,
};
let response = self
@@ -336,6 +400,12 @@ impl OllamaClient {
}
let result: OllamaResponse = response.json().await?;
log_chat_metrics(
result.prompt_eval_count,
result.prompt_eval_duration,
result.eval_count,
result.eval_duration,
);
Ok(result.response)
}
@@ -343,11 +413,28 @@ impl OllamaClient {
self.generate_with_images(prompt, system, None).await
}
#[allow(dead_code)]
pub async fn generate_no_think(&self, prompt: &str, system: Option<&str>) -> Result<String> {
self.generate_with_options(prompt, system, None, Some(false))
.await
}
pub async fn generate_with_images(
&self,
prompt: &str,
system: Option<&str>,
images: Option<Vec<String>>,
) -> Result<String> {
self.generate_with_options(prompt, system, images, None)
.await
}
async fn generate_with_options(
&self,
prompt: &str,
system: Option<&str>,
images: Option<Vec<String>>,
think: Option<bool>,
) -> Result<String> {
log::debug!("=== Ollama Request ===");
log::debug!("Primary model: {}", self.primary_model);
@@ -373,6 +460,7 @@ impl OllamaClient {
prompt,
system,
images.clone(),
think,
)
.await;
@@ -396,7 +484,14 @@ impl OllamaClient {
fallback_model
);
match self
.try_generate(fallback_url, fallback_model, prompt, system, images.clone())
.try_generate(
fallback_url,
fallback_model,
prompt,
system,
images.clone(),
think,
)
.await
{
Ok(response) => {
@@ -453,7 +548,16 @@ Capture the key moment or theme. Return ONLY the title, nothing else."#,
let title = self
.generate_with_images(&prompt, Some(system), None)
.await?;
Ok(title.trim().trim_matches('"').to_string())
// Models decorate despite "Return ONLY the title": quotes, bold
// markers, sometimes a "Title:" label.
use crate::ai::insight_generator::strip_title_markdown;
let cleaned = strip_title_markdown(title.trim());
let cleaned = cleaned
.strip_prefix("Title:")
.or_else(|| cleaned.strip_prefix("title:"))
.map(strip_title_markdown)
.unwrap_or(cleaned);
Ok(cleaned.to_string())
}
/// Generate a summary for a single photo based on its context
@@ -468,6 +572,7 @@ Capture the key moment or theme. Return ONLY the title, nothing else."#,
) -> Result<String> {
let location_str = location.unwrap_or("Unknown");
let sms_str = sms_summary.unwrap_or("No messages");
let user_name = crate::ai::user_display_name();
let prompt = if image_base64.is_some() {
if let Some(contact_name) = contact {
@@ -479,13 +584,14 @@ Location: {}
Person/Contact: {}
Messages: {}
Analyze the image and use specific details from both the visual content and the context above. The photo is from a folder for {}, so they are likely in or related to this photo. Mention people's names (especially {}), places, or activities if they appear in either the image or the context. Write in first person as Cameron with the tone of a journal entry. If limited information is available, keep it simple and factual based on what you see and know. If the location is unknown omit it"#,
Analyze the image and use specific details from both the visual content and the context above. The photo is from a folder for {}, so they are likely in or related to this photo. Mention people's names (especially {}), places, or activities if they appear in either the image or the context. Write in first person as {} with the tone of a journal entry. If limited information is available, keep it simple and factual based on what you see and know. If the location is unknown omit it"#,
date.format("%B %d, %Y"),
location_str,
contact_name,
sms_str,
contact_name,
contact_name
contact_name,
user_name
)
} else {
format!(
@@ -495,10 +601,11 @@ Date: {}
Location: {}
Messages: {}
Analyze the image and use specific details from both the visual content and the context above. Mention people's names, places, or activities if they appear in either the image or the context. Write in first person as Cameron with the tone of a journal entry. If limited information is available, keep it simple and factual based on what you see and know. If the location is unknown omit it"#,
Analyze the image and use specific details from both the visual content and the context above. Mention people's names, places, or activities if they appear in either the image or the context. Write in first person as {} with the tone of a journal entry. If limited information is available, keep it simple and factual based on what you see and know. If the location is unknown omit it"#,
date.format("%B %d, %Y"),
location_str,
sms_str
sms_str,
user_name
)
}
} else if let Some(contact_name) = contact {
@@ -510,13 +617,14 @@ Analyze the image and use specific details from both the visual content and the
Person/Contact: {}
Messages: {}
Use only the specific details provided above. The photo is from a folder for {}, so they are likely related to this moment. Mention people's names (especially {}), places, or activities if they appear in the context. Write in first person as Cameron with the tone of a journal entry. If limited information is available, keep it simple and factual. If the location is unknown omit it"#,
Use only the specific details provided above. The photo is from a folder for {}, so they are likely related to this moment. Mention people's names (especially {}), places, or activities if they appear in the context. Write in first person as {} with the tone of a journal entry. If limited information is available, keep it simple and factual. If the location is unknown omit it"#,
date.format("%B %d, %Y"),
location_str,
contact_name,
sms_str,
contact_name,
contact_name
contact_name,
user_name
)
} else {
format!(
@@ -526,10 +634,11 @@ Analyze the image and use specific details from both the visual content and the
Location: {}
Messages: {}
Use only the specific details provided above. Mention people's names, places, or activities if they appear in the context. Write in first person as Cameron with the tone of a journal entry. If limited information is available, keep it simple and factual. If the location is unknown omit it"#,
Use only the specific details provided above. Mention people's names, places, or activities if they appear in the context. Write in first person as {} with the tone of a journal entry. If limited information is available, keep it simple and factual. If the location is unknown omit it"#,
date.format("%B %d, %Y"),
location_str,
sms_str
sms_str,
user_name
)
};
@@ -558,68 +667,232 @@ Analyze the image and use specific details from both the visual content and the
/// Send a chat request with tool definitions to /api/chat.
/// Returns the assistant's response message (may contain tool_calls or final content).
/// Uses primary/fallback URL routing same as other generation methods.
/// Tries servers in preference order — most recently successful first —
/// so a fallback-only model doesn't re-404 against the primary on every
/// iteration of the agent loop.
pub async fn chat_with_tools(
&self,
messages: Vec<ChatMessage>,
tools: Vec<Tool>,
) -> Result<(ChatMessage, Option<i32>, Option<i32>)> {
// Try primary server first
log::info!(
"Attempting chat_with_tools with primary server: {} (model: {})",
self.primary_url,
self.primary_model
);
let primary_result = self
.try_chat_with_tools(&self.primary_url, messages.clone(), tools.clone())
.await;
match primary_result {
Ok(result) => {
log::info!("Successfully got chat_with_tools response from primary server");
Ok(result)
}
Err(e) => {
log::warn!("Primary server chat_with_tools failed: {}", e);
// Try fallback server if available
if let Some(fallback_url) = &self.fallback_url {
let fallback_model =
self.fallback_model.as_ref().unwrap_or(&self.primary_model);
let order = self.attempt_order();
let mut errors: Vec<String> = Vec::new();
for (label, url, model) in &order {
log::info!(
"Attempting chat_with_tools with {} server: {} (model: {})",
label,
url,
model
);
match self
.try_chat_with_tools(url, messages.clone(), tools.clone())
.await
{
Ok(result) => {
log::info!(
"Attempting chat_with_tools with fallback server: {} (model: {})",
fallback_url,
fallback_model
"Successfully got chat_with_tools response from {} server",
label
);
match self
.try_chat_with_tools(fallback_url, messages, tools)
.await
{
Ok(result) => {
log::info!(
"Successfully got chat_with_tools response from fallback server"
);
Ok(result)
}
Err(fallback_e) => {
log::error!(
"Fallback server chat_with_tools also failed: {}",
fallback_e
);
Err(anyhow::anyhow!(
"Both primary and fallback servers failed. Primary: {}, Fallback: {}",
e,
fallback_e
))
}
}
} else {
log::error!("No fallback server configured");
Err(e)
self.prefer_fallback
.store(*label == "fallback", Ordering::Relaxed);
return Ok(result);
}
Err(e) => {
log::warn!("{} server chat_with_tools failed: {}", label, e);
errors.push(format!("{}: {}", label, e));
}
}
}
if order.len() <= 1 {
log::error!("No fallback server configured; chat_with_tools exhausted");
} else {
log::error!(
"All {} servers failed for chat_with_tools ({})",
order.len(),
errors.join(" / ")
);
}
Err(anyhow::anyhow!(
"chat_with_tools failed on all servers: {}",
errors.join(" / ")
))
}
/// Streaming variant of `chat_with_tools`. Tries servers in the same
/// preference order and moves to the next one only if the initial
/// request fails; once the stream has begun emitting, mid-stream
/// errors propagate to the caller. Emits `TextDelta` events as content
/// tokens arrive and a single terminal `Done` event when the model marks
/// the turn complete (tool_calls, if any, live on the final message).
pub async fn chat_with_tools_stream(
&self,
messages: Vec<ChatMessage>,
tools: Vec<Tool>,
) -> Result<BoxStream<'static, Result<LlmStreamEvent>>> {
// Same preference logic as `chat_with_tools`. Only the initial
// connection is retried across servers — once the stream begins,
// mid-stream errors propagate to the caller.
let order = self.attempt_order();
let mut last_err: Option<anyhow::Error> = None;
for (label, url, _model) in &order {
match self
.try_chat_with_tools_stream(url, messages.clone(), tools.clone())
.await
{
Ok(s) => {
self.prefer_fallback
.store(*label == "fallback", Ordering::Relaxed);
return Ok(s);
}
Err(e) => {
log::warn!("Streaming chat on {} server failed: {}", label, e);
last_err = Some(e);
}
}
}
Err(last_err.unwrap_or_else(|| anyhow::anyhow!("No Ollama server configured")))
}
async fn try_chat_with_tools_stream(
&self,
base_url: &str,
messages: Vec<ChatMessage>,
tools: Vec<Tool>,
) -> Result<BoxStream<'static, Result<LlmStreamEvent>>> {
let url = format!("{}/api/chat", base_url);
let model = if base_url == self.primary_url {
&self.primary_model
} else {
self.fallback_model
.as_deref()
.unwrap_or(&self.primary_model)
};
let options = self.build_options();
let request_body = OllamaChatRequest {
model,
messages: &messages,
stream: true,
tools,
options,
};
let response = self
.client
.post(&url)
.json(&request_body)
.send()
.await
.with_context(|| format!("Failed to connect to Ollama at {}", url))?;
if !response.status().is_success() {
let status = response.status();
let body = response.text().await.unwrap_or_default();
anyhow::bail!(
"Ollama stream request failed with status {}: {}",
status,
body
);
}
// Ollama streams NDJSON: each line is a full `OllamaStreamChunk`.
// We buffer partial lines across chunks from the byte stream.
let byte_stream = response.bytes_stream();
let stream = async_stream::stream! {
let mut buf: Vec<u8> = Vec::new();
let mut accumulated = String::new();
let mut tool_calls: Option<Vec<crate::ai::llm_client::ToolCall>> = None;
let mut role = "assistant".to_string();
let mut prompt_eval_count: Option<i32> = None;
let mut eval_count: Option<i32> = None;
let mut prompt_eval_duration: Option<u64> = None;
let mut eval_duration: Option<u64> = None;
let mut done_seen = false;
let mut byte_stream = byte_stream;
while let Some(chunk) = byte_stream.next().await {
let chunk = match chunk {
Ok(b) => b,
Err(e) => {
yield Err(anyhow::anyhow!("stream read failed: {}", e));
return;
}
};
buf.extend_from_slice(&chunk);
// Drain complete lines; hold any trailing partial.
while let Some(nl) = buf.iter().position(|b| *b == b'\n') {
let line = buf.drain(..=nl).collect::<Vec<_>>();
let line_str = match std::str::from_utf8(&line) {
Ok(s) => s.trim(),
Err(_) => continue,
};
if line_str.is_empty() {
continue;
}
match serde_json::from_str::<OllamaStreamChunk>(line_str) {
Ok(chunk) => {
// Accumulate content delta.
if !chunk.message.content.is_empty() {
accumulated.push_str(&chunk.message.content);
yield Ok(LlmStreamEvent::TextDelta(chunk.message.content));
}
if !chunk.message.role.is_empty() {
role = chunk.message.role;
}
// Ollama ≥0.8 can stream tool_calls incrementally
// across chunks (older servers attach them all to
// one chunk) — append rather than overwrite so
// calls from earlier chunks survive.
if let Some(tcs) = chunk.message.tool_calls
&& !tcs.is_empty()
{
append_streamed_tool_calls(&mut tool_calls, tcs);
}
if chunk.done {
prompt_eval_count = chunk.prompt_eval_count;
eval_count = chunk.eval_count;
prompt_eval_duration = chunk.prompt_eval_duration;
eval_duration = chunk.eval_duration;
done_seen = true;
break;
}
}
Err(e) => {
log::warn!("malformed Ollama stream line: {} ({})", line_str, e);
}
}
}
if done_seen {
break;
}
}
// Emit the terminal Done event with the assembled message.
log_chat_metrics(
prompt_eval_count,
prompt_eval_duration,
eval_count,
eval_duration,
);
let message = ChatMessage {
role,
content: accumulated,
tool_calls,
images: None,
};
yield Ok(LlmStreamEvent::Done {
message,
prompt_eval_count,
eval_count,
});
};
Ok(Box::pin(stream))
}
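The partial-line buffering used for the NDJSON stream can be isolated from the HTTP and JSON layers. A sketch only, with a hypothetical `LineBuffer` type that is not part of this change:

```rust
/// Accumulates raw bytes and hands back complete newline-terminated
/// lines, holding any trailing partial line until more bytes arrive.
#[derive(Default)]
pub struct LineBuffer {
    buf: Vec<u8>,
}

impl LineBuffer {
    /// Feed one network chunk; returns the trimmed, non-empty lines it
    /// completed. Lines that are not valid UTF-8 are skipped, matching
    /// the `continue` in the streaming loop.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(chunk);
        let mut lines = Vec::new();
        // Drain through each newline; whatever follows the last newline
        // stays in `buf` for the next call.
        while let Some(nl) = self.buf.iter().position(|b| *b == b'\n') {
            let line: Vec<u8> = self.buf.drain(..=nl).collect();
            if let Ok(s) = std::str::from_utf8(&line) {
                let s = s.trim();
                if !s.is_empty() {
                    lines.push(s.to_string());
                }
            }
        }
        lines
    }
}
```

Each returned line would then go to `serde_json::from_str`; keeping the buffering separate makes the chunk-boundary cases testable without a live server.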
async fn try_chat_with_tools(
@@ -662,8 +935,12 @@ Analyze the image and use specific details from both the visual content and the
if !response.status().is_success() {
let status = response.status();
let body = response.text().await.unwrap_or_default();
log::error!(
"chat_with_tools request body that caused {}: {}",
// warn, not error — the outer `chat_with_tools` may recover via
// the fallback server. When both fail, the outer layer emits the
// actual error log.
log::warn!(
"chat_with_tools request to {} got {}: {}",
base_url,
status,
request_json
);
@@ -679,6 +956,17 @@ Analyze the image and use specific details from both the visual content and the
.await
.with_context(|| "Failed to parse Ollama chat response")?;
// Log performance counters returned by Ollama. Durations are
// reported in nanoseconds; we render ms + tokens/sec for skim-ability
// in the server log. Missing fields are left off the line rather
// than printed as `None`.
log_chat_metrics(
chat_response.prompt_eval_count,
chat_response.prompt_eval_duration,
chat_response.eval_count,
chat_response.eval_duration,
);
Ok((
chat_response.message,
chat_response.prompt_eval_count,
@@ -700,7 +988,7 @@ Analyze the image and use specific details from both the visual content and the
/// Returns one vector per input text; each is expected to match
/// `embedding_dim()` (768 by default)
/// This is much more efficient than calling generate_embedding multiple times
pub async fn generate_embeddings(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>> {
let embedding_model = "nomic-embed-text:v1.5";
let embedding_model = EMBEDDING_MODEL;
log::debug!("=== Ollama Batch Embedding Request ===");
log::debug!("Model: {}", embedding_model);
@@ -767,13 +1055,14 @@ Analyze the image and use specific details from both the visual content and the
}
};
// Validate embedding dimensions (should be 768 for nomic-embed-text:v1.5)
// Validate embedding dimensions (EMBEDDING_DIM; 768 for nomic-embed-text:v1.5)
for (i, embedding) in embeddings.iter().enumerate() {
if embedding.len() != 768 {
if embedding.len() != crate::ai::embedding_dim() {
log::warn!(
"Unexpected embedding dimensions for item {}: {} (expected 768)",
"Unexpected embedding dimensions for item {}: {} (expected {})",
i,
embedding.len()
embedding.len(),
crate::ai::embedding_dim()
);
}
}
@@ -815,6 +1104,54 @@ Analyze the image and use specific details from both the visual content and the
}
}
#[async_trait]
impl LlmClient for OllamaClient {
async fn generate(
&self,
prompt: &str,
system: Option<&str>,
images: Option<Vec<String>>,
) -> Result<String> {
self.generate_with_images(prompt, system, images).await
}
async fn chat_with_tools(
&self,
messages: Vec<ChatMessage>,
tools: Vec<Tool>,
) -> Result<(ChatMessage, Option<i32>, Option<i32>)> {
OllamaClient::chat_with_tools(self, messages, tools).await
}
async fn chat_with_tools_stream(
&self,
messages: Vec<ChatMessage>,
tools: Vec<Tool>,
) -> Result<BoxStream<'static, Result<LlmStreamEvent>>> {
OllamaClient::chat_with_tools_stream(self, messages, tools).await
}
async fn generate_embeddings(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>> {
OllamaClient::generate_embeddings(self, texts).await
}
async fn describe_image(&self, image_base64: &str) -> Result<String> {
self.generate_photo_description(image_base64).await
}
async fn list_models(&self) -> Result<Vec<ModelCapabilities>> {
Self::list_models_with_capabilities(&self.primary_url).await
}
async fn model_capabilities(&self, model: &str) -> Result<ModelCapabilities> {
Self::check_model_capabilities(&self.primary_url, model).await
}
fn primary_model(&self) -> &str {
&self.primary_model
}
}
#[derive(Serialize)]
struct OllamaRequest {
model: String,
@@ -826,6 +1163,12 @@ struct OllamaRequest {
options: Option<OllamaOptions>,
#[serde(skip_serializing_if = "Option::is_none")]
images: Option<Vec<String>>,
/// Ollama's top-level reasoning-mode toggle (Ollama 0.9+). `Some(false)`
/// asks the server to skip thinking on models that expose a toggle
/// (Qwen3, Ollama-integrated DeepSeek-R1 distills, GPT-OSS, etc).
/// Ignored by non-reasoning models. None = use the model's default.
#[serde(skip_serializing_if = "Option::is_none")]
think: Option<bool>,
}
#[derive(Serialize)]
@@ -842,90 +1185,6 @@ struct OllamaOptions {
min_p: Option<f32>,
}
/// Tool definition sent in /api/chat requests (OpenAI-compatible format)
#[derive(Serialize, Clone, Debug)]
pub struct Tool {
#[serde(rename = "type")]
pub tool_type: String, // always "function"
pub function: ToolFunction,
}
#[derive(Serialize, Clone, Debug)]
pub struct ToolFunction {
pub name: String,
pub description: String,
pub parameters: serde_json::Value,
}
impl Tool {
pub fn function(name: &str, description: &str, parameters: serde_json::Value) -> Self {
Self {
tool_type: "function".to_string(),
function: ToolFunction {
name: name.to_string(),
description: description.to_string(),
parameters,
},
}
}
}
/// A message in the chat conversation history
#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct ChatMessage {
pub role: String, // "system" | "user" | "assistant" | "tool"
/// Empty string (not null) when tool_calls is present — Ollama quirk
#[serde(default)]
pub content: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub tool_calls: Option<Vec<ToolCall>>,
/// Base64 images — only on user messages to vision-capable models
#[serde(skip_serializing_if = "Option::is_none")]
pub images: Option<Vec<String>>,
}
impl ChatMessage {
pub fn system(content: impl Into<String>) -> Self {
Self {
role: "system".to_string(),
content: content.into(),
tool_calls: None,
images: None,
}
}
pub fn user(content: impl Into<String>) -> Self {
Self {
role: "user".to_string(),
content: content.into(),
tool_calls: None,
images: None,
}
}
pub fn tool_result(content: impl Into<String>) -> Self {
Self {
role: "tool".to_string(),
content: content.into(),
tool_calls: None,
images: None,
}
}
}
/// Tool call returned by the model in an assistant message
#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct ToolCall {
pub function: ToolCallFunction,
#[serde(skip_serializing_if = "Option::is_none")]
pub id: Option<String>,
}
#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct ToolCallFunction {
pub name: String,
/// Native JSON object (NOT a JSON-encoded string like OpenAI)
pub arguments: serde_json::Value,
}
#[derive(Serialize)]
struct OllamaChatRequest<'a> {
model: &'a str,
@@ -947,13 +1206,102 @@ struct OllamaChatResponse {
done_reason: String,
#[serde(default)]
prompt_eval_count: Option<i32>,
/// Nanoseconds spent evaluating the prompt (context ingestion).
#[serde(default)]
prompt_eval_duration: Option<u64>,
#[serde(default)]
eval_count: Option<i32>,
/// Nanoseconds spent generating the response tokens.
#[serde(default)]
eval_duration: Option<u64>,
}
/// One chunk in the NDJSON stream from `/api/chat` with `stream: true`.
/// Early chunks carry content deltas in `message.content`; the final chunk
/// has `done: true`, optional `tool_calls`, and usage counters.
#[derive(Deserialize, Debug)]
struct OllamaStreamChunk {
#[serde(default)]
message: OllamaStreamMessage,
#[serde(default)]
done: bool,
#[serde(default)]
prompt_eval_count: Option<i32>,
#[serde(default)]
prompt_eval_duration: Option<u64>,
#[serde(default)]
eval_count: Option<i32>,
#[serde(default)]
eval_duration: Option<u64>,
}
#[derive(Deserialize, Debug, Default)]
struct OllamaStreamMessage {
#[serde(default)]
role: String,
#[serde(default)]
content: String,
#[serde(default)]
tool_calls: Option<Vec<crate::ai::llm_client::ToolCall>>,
}
#[derive(Deserialize)]
struct OllamaResponse {
response: String,
#[serde(default)]
prompt_eval_count: Option<i32>,
#[serde(default)]
prompt_eval_duration: Option<u64>,
#[serde(default)]
eval_count: Option<i32>,
#[serde(default)]
eval_duration: Option<u64>,
}
fn log_chat_metrics(
prompt_eval_count: Option<i32>,
prompt_eval_duration_ns: Option<u64>,
eval_count: Option<i32>,
eval_duration_ns: Option<u64>,
) {
// Compute tokens/sec when both count and duration are present.
fn tokens_per_sec(count: Option<i32>, duration_ns: Option<u64>) -> Option<f64> {
match (count, duration_ns) {
(Some(c), Some(d)) if c > 0 && d > 0 => Some((c as f64) * 1_000_000_000.0 / (d as f64)),
_ => None,
}
}
let prompt_ms = prompt_eval_duration_ns.map(|ns| ns as f64 / 1_000_000.0);
let eval_ms = eval_duration_ns.map(|ns| ns as f64 / 1_000_000.0);
let prompt_tps = tokens_per_sec(prompt_eval_count, prompt_eval_duration_ns);
let eval_tps = tokens_per_sec(eval_count, eval_duration_ns);
let mut parts: Vec<String> = Vec::new();
if let Some(c) = prompt_eval_count {
let mut s = format!("prompt={} tok", c);
if let Some(ms) = prompt_ms {
s.push_str(&format!(" ({:.0} ms", ms));
if let Some(tps) = prompt_tps {
s.push_str(&format!(", {:.1} tok/s", tps));
}
s.push(')');
}
parts.push(s);
}
if let Some(c) = eval_count {
let mut s = format!("gen={} tok", c);
if let Some(ms) = eval_ms {
s.push_str(&format!(" ({:.0} ms", ms));
if let Some(tps) = eval_tps {
s.push_str(&format!(", {:.1} tok/s", tps));
}
s.push(')');
}
parts.push(s);
}
if !parts.is_empty() {
log::info!("Ollama chat metrics — {}", parts.join(", "));
}
}
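The rate arithmetic in `log_chat_metrics` can be checked in isolation. The following is a minimal sketch, assuming only the nanosecond durations that Ollama reports; `format_segment` is a hypothetical helper written for illustration and is not part of this change.

```rust
/// Tokens per second from a token count and a duration in nanoseconds.
/// Returns None when either value is missing or not positive.
fn tokens_per_sec(count: Option<i32>, duration_ns: Option<u64>) -> Option<f64> {
    match (count, duration_ns) {
        (Some(c), Some(d)) if c > 0 && d > 0 => {
            Some(c as f64 * 1_000_000_000.0 / d as f64)
        }
        _ => None,
    }
}

/// Hypothetical helper: render one log segment, adding the timing
/// details only when a duration is available.
fn format_segment(label: &str, count: i32, duration_ns: Option<u64>) -> String {
    let mut s = format!("{}={} tok", label, count);
    if let Some(ns) = duration_ns {
        // Nanoseconds to milliseconds for the human-readable part.
        s.push_str(&format!(" ({:.0} ms", ns as f64 / 1_000_000.0));
        if let Some(tps) = tokens_per_sec(Some(count), Some(ns)) {
            s.push_str(&format!(", {:.1} tok/s", tps));
        }
        s.push(')');
    }
    s
}
```

For 512 generated tokens over 2 s, the rate is 512 * 1e9 / 2e9 = 256 tok/s. A missing duration leaves only the token count in the segment.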
#[derive(Deserialize)]
@@ -972,13 +1320,6 @@ struct OllamaShowResponse {
capabilities: Vec<String>,
}
#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct ModelCapabilities {
pub name: String,
pub has_vision: bool,
pub has_tool_calling: bool,
}
#[derive(Serialize)]
struct OllamaBatchEmbedRequest {
model: String,
@@ -990,9 +1331,20 @@ struct OllamaEmbedResponse {
embeddings: Vec<Vec<f32>>,
}
/// Accumulate tool calls streamed across NDJSON chunks. Ollama ≥0.8 may
/// emit each tool call on its own chunk; replacing the accumulator on every
/// chunk would keep only the last call, so extend instead.
fn append_streamed_tool_calls(
acc: &mut Option<Vec<crate::ai::llm_client::ToolCall>>,
new: Vec<crate::ai::llm_client::ToolCall>,
) {
acc.get_or_insert_with(Vec::new).extend(new);
}
#[cfg(test)]
mod tests {
use super::*;
use super::append_streamed_tool_calls;
use crate::ai::llm_client::{ToolCall, ToolCallFunction};
#[test]
fn generate_photo_description_prompt_is_concise() {
@@ -1003,4 +1355,38 @@ mod tests {
Focus on the people, location, and activity.";
assert!(prompt.len() < 200, "Prompt should be concise");
}
fn call(name: &str) -> ToolCall {
ToolCall {
id: None,
function: ToolCallFunction {
name: name.to_string(),
arguments: serde_json::json!({}),
},
}
}
#[test]
fn streamed_tool_calls_across_chunks_accumulate() {
// Two tool calls arriving in two separate stream chunks must BOTH
// survive assembly — the old `tool_calls = Some(tcs)` kept only the
// last chunk's calls.
let mut acc: Option<Vec<ToolCall>> = None;
append_streamed_tool_calls(&mut acc, vec![call("get_sms_messages")]);
append_streamed_tool_calls(&mut acc, vec![call("reverse_geocode")]);
let calls = acc.expect("tool calls accumulated");
assert_eq!(calls.len(), 2);
assert_eq!(calls[0].function.name, "get_sms_messages");
assert_eq!(calls[1].function.name, "reverse_geocode");
}
#[test]
fn streamed_tool_calls_single_chunk_batch_kept_intact() {
// Older Ollama servers attach all calls to one chunk — unchanged.
let mut acc: Option<Vec<ToolCall>> = None;
append_streamed_tool_calls(&mut acc, vec![call("a"), call("b")]);
let calls = acc.expect("tool calls accumulated");
assert_eq!(calls.len(), 2);
}
}
+998
@@ -0,0 +1,998 @@
// First consumer lands in a later PR (hybrid backend routing). Tests exercise
// the translation helpers directly.
#![allow(dead_code)]
use anyhow::{Context, Result, anyhow, bail};
use async_trait::async_trait;
use reqwest::Client;
use serde::Deserialize;
use serde_json::{Value, json};
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};
use crate::ai::llm_client::{
ChatMessage, LlmClient, LlmStreamEvent, ModelCapabilities, Tool, ToolCall, ToolCallFunction,
};
use futures::stream::{BoxStream, StreamExt};
const DEFAULT_BASE_URL: &str = "https://openrouter.ai/api/v1";
const DEFAULT_EMBEDDING_MODEL: &str = "openai/text-embedding-3-small";
const CACHE_DURATION_SECS: u64 = 15 * 60;
#[derive(Clone)]
struct CachedEntry<T> {
data: T,
cached_at: Instant,
}
impl<T> CachedEntry<T> {
fn new(data: T) -> Self {
Self {
data,
cached_at: Instant::now(),
}
}
fn is_expired(&self) -> bool {
self.cached_at.elapsed().as_secs() > CACHE_DURATION_SECS
}
}
lazy_static::lazy_static! {
static ref MODEL_CAPABILITIES_CACHE: Arc<Mutex<HashMap<String, CachedEntry<Vec<ModelCapabilities>>>>> =
Arc::new(Mutex::new(HashMap::new()));
}
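The expiry check above reads the clock directly, which makes it awkward to test. The sketch below shows the same time-to-live idea with the clock and the TTL passed in explicitly. `is_expired_at` is a hypothetical variant for illustration, not a method in this change.

```rust
use std::time::{Duration, Instant};

/// A value plus the instant it was cached.
struct CachedEntry<T> {
    data: T,
    cached_at: Instant,
}

impl<T> CachedEntry<T> {
    fn new(data: T) -> Self {
        Self {
            data,
            cached_at: Instant::now(),
        }
    }

    /// Hypothetical variant of `is_expired`: the caller supplies `now`
    /// and the TTL, so the boundary can be checked deterministically.
    fn is_expired_at(&self, now: Instant, ttl: Duration) -> bool {
        // saturating_duration_since returns zero if `now` is earlier
        // than `cached_at`, so a skewed clock never reports expiry.
        now.saturating_duration_since(self.cached_at) > ttl
    }
}
```

An entry is still fresh exactly at the TTL boundary and expires only after it. This is close to the `as_secs() > CACHE_DURATION_SECS` comparison in the diff, which additionally truncates to whole seconds.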
/// OpenAI-compatible client for OpenRouter (https://openrouter.ai).
///
/// Translates canonical `ChatMessage` / `Tool` shapes to OpenAI wire format:
/// - Tool-call `arguments` serialized as JSON-encoded strings (vs Ollama's
/// native JSON).
/// - Image content rewritten into content-parts array with `image_url` entries.
/// - `role=tool` messages attach a `tool_call_id` inferred from the preceding
/// assistant turn's tool call.
#[derive(Clone)]
pub struct OpenRouterClient {
client: Client,
pub api_key: String,
pub base_url: String,
pub primary_model: String,
pub embedding_model: String,
num_ctx: Option<i32>,
temperature: Option<f32>,
top_p: Option<f32>,
top_k: Option<i32>,
min_p: Option<f32>,
/// Optional `HTTP-Referer` header OpenRouter uses for attribution.
pub referer: Option<String>,
/// Optional `X-Title` header OpenRouter uses for attribution.
pub app_title: Option<String>,
}
impl OpenRouterClient {
pub fn new(api_key: String, base_url: Option<String>, primary_model: String) -> Self {
Self {
client: Client::builder()
.connect_timeout(Duration::from_secs(10))
.timeout(Duration::from_secs(180))
.build()
.unwrap_or_else(|_| Client::new()),
api_key,
base_url: base_url.unwrap_or_else(|| DEFAULT_BASE_URL.to_string()),
primary_model,
embedding_model: DEFAULT_EMBEDDING_MODEL.to_string(),
num_ctx: None,
temperature: None,
top_p: None,
top_k: None,
min_p: None,
referer: None,
app_title: None,
}
}
pub fn set_embedding_model(&mut self, model: String) {
self.embedding_model = model;
}
#[allow(dead_code)]
pub fn set_num_ctx(&mut self, num_ctx: Option<i32>) {
self.num_ctx = num_ctx;
}
#[allow(dead_code)]
pub fn set_sampling_params(
&mut self,
temperature: Option<f32>,
top_p: Option<f32>,
top_k: Option<i32>,
min_p: Option<f32>,
) {
self.temperature = temperature;
self.top_p = top_p;
self.top_k = top_k;
self.min_p = min_p;
}
pub fn set_attribution(&mut self, referer: Option<String>, app_title: Option<String>) {
self.referer = referer;
self.app_title = app_title;
}
fn authed(&self, builder: reqwest::RequestBuilder) -> reqwest::RequestBuilder {
let mut b = builder.bearer_auth(&self.api_key);
if let Some(r) = &self.referer {
b = b.header("HTTP-Referer", r);
}
if let Some(t) = &self.app_title {
b = b.header("X-Title", t);
}
b
}
/// Translate canonical messages to the OpenAI-compatible wire shape.
///
/// Walks in order so it can attach `tool_call_id` to `role=tool` messages
/// based on the most recent assistant turn's tool call.
fn messages_to_openai(messages: &[ChatMessage]) -> Vec<Value> {
let mut out = Vec::with_capacity(messages.len());
let mut last_tool_call_ids: Vec<String> = Vec::new();
let mut next_tool_result_idx: usize = 0;
for msg in messages {
let mut obj = serde_json::Map::new();
obj.insert("role".into(), Value::String(msg.role.clone()));
// Content: string OR content-parts array (when images present).
match &msg.images {
Some(images) if !images.is_empty() => {
let mut parts: Vec<Value> = Vec::new();
if !msg.content.is_empty() {
parts.push(json!({"type": "text", "text": msg.content}));
}
for img in images {
let url = image_to_data_url(img);
parts.push(json!({
"type": "image_url",
"image_url": { "url": url }
}));
}
obj.insert("content".into(), Value::Array(parts));
}
_ => {
obj.insert("content".into(), Value::String(msg.content.clone()));
}
}
// Assistant message with tool_calls: stringify arguments, remember
// the ids so the subsequent tool messages can reference them.
if let Some(tcs) = &msg.tool_calls
&& msg.role == "assistant"
{
let converted: Vec<Value> = tcs
.iter()
.enumerate()
.map(|(i, call)| {
let id = call.id.clone().unwrap_or_else(|| format!("call_{}", i));
let args_str = serde_json::to_string(&call.function.arguments)
.unwrap_or_else(|_| "{}".to_string());
json!({
"id": id,
"type": "function",
"function": {
"name": call.function.name,
"arguments": args_str,
}
})
})
.collect();
last_tool_call_ids = converted
.iter()
.filter_map(|v| v.get("id").and_then(|x| x.as_str()).map(String::from))
.collect();
next_tool_result_idx = 0;
obj.insert("tool_calls".into(), Value::Array(converted));
}
// Tool result messages: attach tool_call_id from the last assistant turn.
if msg.role == "tool" {
let id = last_tool_call_ids
.get(next_tool_result_idx)
.cloned()
.unwrap_or_else(|| "call_0".to_string());
obj.insert("tool_call_id".into(), Value::String(id));
next_tool_result_idx += 1;
}
out.push(Value::Object(obj));
}
out
}
/// Parse an OpenAI-compatible assistant message back into canonical shape.
fn openai_message_to_chat(msg: &Value) -> Result<ChatMessage> {
let obj = msg
.as_object()
.ok_or_else(|| anyhow!("response message is not an object"))?;
let role = obj
.get("role")
.and_then(|v| v.as_str())
.unwrap_or("assistant")
.to_string();
let content = obj
.get("content")
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string();
let tool_calls = if let Some(tcs) = obj.get("tool_calls").and_then(|v| v.as_array()) {
let mut parsed = Vec::with_capacity(tcs.len());
for tc in tcs {
let id = tc.get("id").and_then(|v| v.as_str()).map(String::from);
let function = tc
.get("function")
.ok_or_else(|| anyhow!("tool_call missing function field"))?;
let name = function
.get("name")
.and_then(|v| v.as_str())
.unwrap_or_default()
.to_string();
let args_value = match function.get("arguments") {
// OpenAI-compat: stringified JSON.
Some(Value::String(s)) => {
serde_json::from_str::<Value>(s).unwrap_or_else(|_| json!({}))
}
// Some providers emit arguments as an object directly — accept both.
Some(v @ Value::Object(_)) => v.clone(),
_ => json!({}),
};
parsed.push(ToolCall {
id,
function: ToolCallFunction {
name,
arguments: args_value,
},
});
}
Some(parsed)
} else {
None
};
Ok(ChatMessage {
role,
content,
tool_calls,
images: None,
})
}
fn build_options(&self) -> Vec<(&'static str, Value)> {
let mut v = Vec::new();
if let Some(t) = self.temperature {
v.push(("temperature", json!(t)));
}
if let Some(p) = self.top_p {
v.push(("top_p", json!(p)));
}
if let Some(k) = self.top_k {
v.push(("top_k", json!(k)));
}
if let Some(m) = self.min_p {
v.push(("min_p", json!(m)));
}
// `num_ctx` is deliberately not forwarded: OpenAI uses max_tokens for the
// generation bound, and num_ctx isn't directly transferable. Skip rather
// than silently mis-map.
let _ = self.num_ctx;
v
}
}
#[async_trait]
impl LlmClient for OpenRouterClient {
async fn generate(
&self,
prompt: &str,
system: Option<&str>,
images: Option<Vec<String>>,
) -> Result<String> {
let mut messages: Vec<ChatMessage> = Vec::new();
if let Some(sys) = system {
messages.push(ChatMessage::system(sys));
}
let mut user = ChatMessage::user(prompt);
user.images = images;
messages.push(user);
let (reply, _, _) = self.chat_with_tools(messages, Vec::new()).await?;
Ok(reply.content)
}
async fn chat_with_tools(
&self,
messages: Vec<ChatMessage>,
tools: Vec<Tool>,
) -> Result<(ChatMessage, Option<i32>, Option<i32>)> {
let url = format!("{}/chat/completions", self.base_url);
let mut body = serde_json::Map::new();
body.insert("model".into(), Value::String(self.primary_model.clone()));
body.insert(
"messages".into(),
Value::Array(Self::messages_to_openai(&messages)),
);
body.insert("stream".into(), Value::Bool(false));
if !tools.is_empty() {
body.insert(
"tools".into(),
serde_json::to_value(&tools).context("serializing tools")?,
);
}
for (k, v) in self.build_options() {
body.insert(k.into(), v);
}
log::info!(
"OpenRouter chat_with_tools: model={} messages={} tools={}",
self.primary_model,
messages.len(),
tools.len()
);
let resp = self
.authed(self.client.post(&url))
.json(&Value::Object(body))
.send()
.await
.with_context(|| format!("POST {} failed", url))?;
if !resp.status().is_success() {
let status = resp.status();
let body = resp.text().await.unwrap_or_default();
bail!("OpenRouter chat request failed: {} — {}", status, body);
}
let parsed: Value = resp.json().await.context("parsing chat response")?;
let choice = parsed
.get("choices")
.and_then(|v| v.as_array())
.and_then(|a| a.first())
.ok_or_else(|| {
anyhow!(
"response missing choices[0]: {}",
extract_openrouter_error_detail(&parsed)
)
})?;
let msg = choice.get("message").ok_or_else(|| {
anyhow!(
"choices[0] missing message: {}",
extract_openrouter_error_detail(&parsed)
)
})?;
let chat_msg = Self::openai_message_to_chat(msg)?;
let usage = parsed.get("usage");
let prompt_tokens = usage
.and_then(|u| u.get("prompt_tokens"))
.and_then(|v| v.as_i64())
.map(|n| n as i32);
let completion_tokens = usage
.and_then(|u| u.get("completion_tokens"))
.and_then(|v| v.as_i64())
.map(|n| n as i32);
Ok((chat_msg, prompt_tokens, completion_tokens))
}
async fn chat_with_tools_stream(
&self,
messages: Vec<ChatMessage>,
tools: Vec<Tool>,
) -> Result<BoxStream<'static, Result<LlmStreamEvent>>> {
let url = format!("{}/chat/completions", self.base_url);
let mut body = serde_json::Map::new();
body.insert("model".into(), Value::String(self.primary_model.clone()));
body.insert(
"messages".into(),
Value::Array(Self::messages_to_openai(&messages)),
);
body.insert("stream".into(), Value::Bool(true));
// Ask for usage data in the final chunk (OpenAI + OpenRouter
// both honor this options bag).
body.insert(
"stream_options".into(),
serde_json::json!({ "include_usage": true }),
);
if !tools.is_empty() {
body.insert(
"tools".into(),
serde_json::to_value(&tools).context("serializing tools")?,
);
}
for (k, v) in self.build_options() {
body.insert(k.into(), v);
}
let resp = self
.authed(self.client.post(&url))
.json(&Value::Object(body))
.send()
.await
.with_context(|| format!("POST {} failed", url))?;
if !resp.status().is_success() {
let status = resp.status();
let body = resp.text().await.unwrap_or_default();
bail!("OpenRouter stream request failed: {} — {}", status, body);
}
// OpenAI-compat SSE stream. Each event is `data: <json>\n\n`, with
// `data: [DONE]` signalling completion. Tool calls arrive as
// `delta.tool_calls[i]` chunks that must be concatenated by index.
let byte_stream = resp.bytes_stream();
let stream = async_stream::stream! {
let mut byte_stream = byte_stream;
let mut buf: Vec<u8> = Vec::new();
let mut accumulated_content = String::new();
// tool call state: index -> (id, name, args_string)
let mut tool_state: std::collections::BTreeMap<
usize,
(Option<String>, Option<String>, String),
> = std::collections::BTreeMap::new();
let mut role = "assistant".to_string();
let mut prompt_tokens: Option<i32> = None;
let mut completion_tokens: Option<i32> = None;
let mut done_seen = false;
while let Some(chunk) = byte_stream.next().await {
let chunk = match chunk {
Ok(b) => b,
Err(e) => {
yield Err(anyhow!("stream read failed: {}", e));
return;
}
};
buf.extend_from_slice(&chunk);
// SSE frames are delimited by a blank line. Walk the buffer
// for "\n\n" markers; anything before them is a complete
// frame (possibly multi-line).
while let Some(sep) = find_double_newline(&buf) {
let frame = buf.drain(..sep + 2).collect::<Vec<_>>();
let frame_str = match std::str::from_utf8(&frame) {
Ok(s) => s,
Err(_) => continue,
};
// A frame is one or more lines; the payload is on data:
// lines. Ignore comments and other fields.
for line in frame_str.lines() {
let line = line.trim_end_matches('\r');
let payload = match line.strip_prefix("data: ") {
Some(p) => p,
None => continue,
};
if payload == "[DONE]" {
done_seen = true;
break;
}
let v: Value = match serde_json::from_str(payload) {
Ok(v) => v,
Err(e) => {
log::warn!(
"malformed OpenRouter SSE frame: {} ({})",
payload,
e
);
continue;
}
};
// Usage can arrive in a dedicated final frame with
// empty choices.
if let Some(usage) = v.get("usage") {
prompt_tokens = usage
.get("prompt_tokens")
.and_then(|n| n.as_i64())
.map(|n| n as i32);
completion_tokens = usage
.get("completion_tokens")
.and_then(|n| n.as_i64())
.map(|n| n as i32);
}
let Some(choices) = v.get("choices").and_then(|c| c.as_array())
else {
continue;
};
let Some(choice) = choices.first() else { continue };
let delta = match choice.get("delta") {
Some(d) => d,
None => continue,
};
if let Some(r) = delta.get("role").and_then(|v| v.as_str()) {
role = r.to_string();
}
if let Some(content) =
delta.get("content").and_then(|v| v.as_str())
&& !content.is_empty()
{
accumulated_content.push_str(content);
yield Ok(LlmStreamEvent::TextDelta(content.to_string()));
}
if let Some(tcs) = delta.get("tool_calls").and_then(|v| v.as_array()) {
for tc_delta in tcs {
let idx = tc_delta
.get("index")
.and_then(|n| n.as_u64())
.unwrap_or(0) as usize;
let entry = tool_state
.entry(idx)
.or_insert((None, None, String::new()));
if let Some(id) =
tc_delta.get("id").and_then(|v| v.as_str())
{
entry.0 = Some(id.to_string());
}
if let Some(func) = tc_delta.get("function") {
if let Some(name) =
func.get("name").and_then(|v| v.as_str())
{
entry.1 = Some(name.to_string());
}
if let Some(args) =
func.get("arguments").and_then(|v| v.as_str())
{
entry.2.push_str(args);
}
}
}
}
}
if done_seen {
break;
}
}
if done_seen {
break;
}
}
// Finalize tool calls: parse accumulated argument strings.
let tool_calls: Option<Vec<ToolCall>> = if tool_state.is_empty() {
None
} else {
let mut v = Vec::with_capacity(tool_state.len());
for (_idx, (id, name, args)) in tool_state {
let arguments: Value = if args.trim().is_empty() {
Value::Object(Default::default())
} else {
serde_json::from_str(&args).unwrap_or_else(|_| {
Value::Object(Default::default())
})
};
v.push(ToolCall {
id,
function: ToolCallFunction {
name: name.unwrap_or_default(),
arguments,
},
});
}
Some(v)
};
let message = ChatMessage {
role,
content: accumulated_content,
tool_calls,
images: None,
};
yield Ok(LlmStreamEvent::Done {
message,
prompt_eval_count: prompt_tokens,
eval_count: completion_tokens,
});
};
Ok(Box::pin(stream))
}
async fn generate_embeddings(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>> {
let url = format!("{}/embeddings", self.base_url);
let body = json!({
"model": self.embedding_model,
"input": texts,
});
let resp = self
.authed(self.client.post(&url))
.json(&body)
.send()
.await
.with_context(|| format!("POST {} failed", url))?;
if !resp.status().is_success() {
let status = resp.status();
let body = resp.text().await.unwrap_or_default();
bail!("OpenRouter embedding request failed: {} — {}", status, body);
}
#[derive(Deserialize)]
struct EmbedResponse {
data: Vec<EmbedItem>,
}
#[derive(Deserialize)]
struct EmbedItem {
embedding: Vec<f32>,
}
let parsed: EmbedResponse = resp.json().await.context("parsing embed response")?;
Ok(parsed.data.into_iter().map(|i| i.embedding).collect())
}
async fn describe_image(&self, image_base64: &str) -> Result<String> {
let prompt = "Briefly describe what you see in this image in 1-2 sentences. \
Focus on the people, location, and activity.";
self.generate(
prompt,
Some("You are a scene description assistant. Be concise and factual."),
Some(vec![image_base64.to_string()]),
)
.await
}
async fn list_models(&self) -> Result<Vec<ModelCapabilities>> {
{
let cache = MODEL_CAPABILITIES_CACHE.lock().unwrap();
if let Some(entry) = cache.get(&self.base_url)
&& !entry.is_expired()
{
return Ok(entry.data.clone());
}
}
let url = format!("{}/models", self.base_url);
let resp = self
.authed(self.client.get(&url))
.send()
.await
.with_context(|| format!("GET {} failed", url))?;
if !resp.status().is_success() {
let status = resp.status();
let body = resp.text().await.unwrap_or_default();
bail!("OpenRouter list_models failed: {} — {}", status, body);
}
let parsed: Value = resp.json().await.context("parsing models response")?;
let data = parsed
.get("data")
.and_then(|v| v.as_array())
.ok_or_else(|| anyhow!("models response missing data[]"))?;
let caps: Vec<ModelCapabilities> = data.iter().map(parse_model_capabilities).collect();
{
let mut cache = MODEL_CAPABILITIES_CACHE.lock().unwrap();
cache.insert(self.base_url.clone(), CachedEntry::new(caps.clone()));
}
Ok(caps)
}
async fn model_capabilities(&self, model: &str) -> Result<ModelCapabilities> {
let all = self.list_models().await?;
all.into_iter()
.find(|m| m.name == model)
.ok_or_else(|| anyhow!("model '{}' not found on OpenRouter", model))
}
fn primary_model(&self) -> &str {
&self.primary_model
}
}
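The streaming path above concatenates tool-call fragments by their `index`. That step can be sketched without any JSON handling. `ToolCallDelta` below is a hypothetical simplified struct standing in for one `delta.tool_calls[i]` entry, and `accumulate` is an illustrative name. The tuple layout mirrors the `tool_state` map in `chat_with_tools_stream`.

```rust
use std::collections::BTreeMap;

/// One streamed fragment of a tool call (hypothetical simplified shape).
struct ToolCallDelta<'a> {
    index: usize,
    id: Option<&'a str>,
    name: Option<&'a str>,
    arguments: Option<&'a str>,
}

/// Assembled call: (id, name, concatenated argument JSON text).
type Assembled = (Option<String>, Option<String>, String);

/// Merge one fragment into the per-index state. The id and name
/// overwrite earlier values; argument text is appended, because the
/// JSON string arrives split across several frames.
fn accumulate(state: &mut BTreeMap<usize, Assembled>, delta: &ToolCallDelta<'_>) {
    let entry = state
        .entry(delta.index)
        .or_insert((None, None, String::new()));
    if let Some(id) = delta.id {
        entry.0 = Some(id.to_string());
    }
    if let Some(name) = delta.name {
        entry.1 = Some(name.to_string());
    }
    if let Some(args) = delta.arguments {
        entry.2.push_str(args);
    }
}
```

A `BTreeMap` keeps the calls ordered by index when the state is drained at the end of the stream, even if fragments for different calls interleave.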
/// Extract a diagnostic fragment from an OpenRouter response body that
/// doesn't match the expected `{choices: [...]}` shape. OpenRouter will
/// sometimes return 200 OK with `{"error": {"message": "...", "code": ...}}`
/// when the upstream provider (Anthropic/OpenAI/Google/etc) errored out
/// — rate limits, content moderation, model overload, provider timeout.
/// Surface the structured error if present; otherwise fall back to a
/// truncated raw-JSON view so the log line is actionable.
fn extract_openrouter_error_detail(parsed: &Value) -> String {
if let Some(err) = parsed.get("error") {
let message = err
.get("message")
.and_then(|v| v.as_str())
.unwrap_or("(no message)");
let code = err
.get("code")
.map(|v| match v {
Value::String(s) => s.clone(),
other => other.to_string(),
})
.unwrap_or_else(|| "?".to_string());
let short_message: String = message.chars().take(240).collect();
return format!("error code={} message=\"{}\"", code, short_message);
}
let raw = parsed.to_string();
raw.chars().take(300).collect()
}
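The `chars().take(n)` truncation used above matters because slicing a `&str` by byte offset panics when the offset falls inside a multi-byte character. A small sketch of the same technique follows; `truncate_chars` is a hypothetical name, not a function in this change.

```rust
/// Keep at most `max_chars` Unicode scalar values from `s`.
/// Unlike `&s[..n]`, this never splits a multi-byte character.
fn truncate_chars(s: &str, max_chars: usize) -> String {
    s.chars().take(max_chars).collect()
}
```

Provider error messages can contain non-ASCII text, so counting characters rather than bytes keeps the log path from panicking on an unlucky cut.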
/// Find the blank line that ends the first SSE frame in `buf`: `\n\n`, or
/// `\r\n\r\n` since some servers emit CRLF. Returns the index of the
/// second-to-last separator byte, so draining `..idx + 2` consumes the frame
/// and its entire separator.
fn find_double_newline(buf: &[u8]) -> Option<usize> {
for i in 0..buf.len().saturating_sub(1) {
if buf[i] == b'\n' && buf[i + 1] == b'\n' {
return Some(i);
}
// \r\n\r\n: the final \n is at i+3. Return i+2 so the drain call (which
// consumes ..sep+2) takes all four bytes; returning i+1 would leave a
// stray \n at the head of the buffer.
if i + 3 < buf.len()
&& buf[i] == b'\r'
&& buf[i + 1] == b'\n'
&& buf[i + 2] == b'\r'
&& buf[i + 3] == b'\n'
{
return Some(i + 2);
}
}
None
}
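SSE frames can be split across network chunks, which is why the stream loop buffers bytes and only parses up to a blank line. The sketch below shows that buffering in isolation, using the standard library only. `frame_end` and `push_chunk` are hypothetical names. Unlike `find_double_newline`, `frame_end` returns the number of bytes to drain directly.

```rust
/// Length in bytes of the first complete frame in `buf`, including its
/// blank-line separator (`\n\n` or `\r\n\r\n`), if one is present.
fn frame_end(buf: &[u8]) -> Option<usize> {
    for i in 0..buf.len() {
        if buf[i..].starts_with(b"\r\n\r\n") {
            return Some(i + 4);
        }
        if buf[i..].starts_with(b"\n\n") {
            return Some(i + 2);
        }
    }
    None
}

/// Append one network chunk and return the `data: ` payloads of every
/// frame that the chunk completes. Incomplete trailing bytes stay in
/// `buf` until a later chunk supplies the separator.
fn push_chunk(buf: &mut Vec<u8>, chunk: &[u8]) -> Vec<String> {
    buf.extend_from_slice(chunk);
    let mut payloads = Vec::new();
    while let Some(end) = frame_end(buf) {
        let frame: Vec<u8> = buf.drain(..end).collect();
        let text = String::from_utf8_lossy(&frame);
        // `lines` strips both `\n` and `\r\n` terminators.
        for line in text.lines() {
            if let Some(payload) = line.strip_prefix("data: ") {
                payloads.push(payload.to_string());
            }
        }
    }
    payloads
}
```

Feeding a JSON payload in two pieces yields nothing for the first piece and the whole payload once the separator arrives. A CRLF-terminated `[DONE]` frame is consumed without leaving bytes behind.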
/// Build a `data:` URL if the provided string is raw base64, otherwise pass it through.
fn image_to_data_url(img: &str) -> String {
if img.starts_with("data:") {
img.to_string()
} else {
format!("data:image/jpeg;base64,{}", img)
}
}
fn parse_model_capabilities(m: &Value) -> ModelCapabilities {
let name = m
.get("id")
.and_then(|v| v.as_str())
.unwrap_or_default()
.to_string();
let has_tool_calling = m
.get("supported_parameters")
.and_then(|v| v.as_array())
.map(|arr| arr.iter().any(|x| x.as_str() == Some("tools")))
.unwrap_or(false);
let has_vision = m
.get("architecture")
.and_then(|v| v.get("input_modalities"))
.and_then(|v| v.as_array())
.map(|arr| arr.iter().any(|x| x.as_str() == Some("image")))
.unwrap_or(false);
ModelCapabilities {
name,
has_vision,
has_tool_calling,
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn tool_call_arguments_stringified_on_send() {
let msg = ChatMessage {
role: "assistant".into(),
content: String::new(),
tool_calls: Some(vec![ToolCall {
id: Some("call_abc".into()),
function: ToolCallFunction {
name: "search_sms".into(),
arguments: json!({"query": "hello", "limit": 5}),
},
}]),
images: None,
};
let wire = OpenRouterClient::messages_to_openai(&[msg]);
let tcs = wire[0]
.get("tool_calls")
.and_then(|v| v.as_array())
.expect("tool_calls present");
let args = tcs[0]
.get("function")
.and_then(|f| f.get("arguments"))
.and_then(|a| a.as_str())
.expect("arguments stringified");
let parsed: Value = serde_json::from_str(args).unwrap();
assert_eq!(parsed["query"], "hello");
assert_eq!(parsed["limit"], 5);
}
#[test]
fn tool_call_arguments_parsed_on_receive() {
let response_msg = json!({
"role": "assistant",
"content": "",
"tool_calls": [{
"id": "call_xyz",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"city\":\"Boston\",\"units\":\"celsius\"}"
}
}]
});
let parsed = OpenRouterClient::openai_message_to_chat(&response_msg).unwrap();
let tcs = parsed.tool_calls.unwrap();
assert_eq!(tcs.len(), 1);
assert_eq!(tcs[0].function.name, "get_weather");
assert_eq!(tcs[0].function.arguments["city"], "Boston");
assert_eq!(tcs[0].function.arguments["units"], "celsius");
assert_eq!(tcs[0].id.as_deref(), Some("call_xyz"));
}
#[test]
fn tool_call_arguments_accept_native_json_on_receive() {
// Some providers return arguments as an object directly; accept both.
let response_msg = json!({
"role": "assistant",
"content": "",
"tool_calls": [{
"id": "call_1",
"type": "function",
"function": {
"name": "foo",
"arguments": {"nested": {"k": 1}}
}
}]
});
let parsed = OpenRouterClient::openai_message_to_chat(&response_msg).unwrap();
let tc = &parsed.tool_calls.unwrap()[0];
assert_eq!(tc.function.arguments["nested"]["k"], 1);
}
#[test]
fn images_become_content_parts() {
let mut msg = ChatMessage::user("What is in this photo?");
msg.images = Some(vec!["BASE64DATA".into()]);
let wire = OpenRouterClient::messages_to_openai(&[msg]);
let content = wire[0].get("content").and_then(|v| v.as_array()).unwrap();
assert_eq!(content.len(), 2);
assert_eq!(content[0]["type"], "text");
assert_eq!(content[0]["text"], "What is in this photo?");
assert_eq!(content[1]["type"], "image_url");
assert_eq!(
content[1]["image_url"]["url"],
"data:image/jpeg;base64,BASE64DATA"
);
}
#[test]
fn data_url_images_pass_through_unchanged() {
let mut msg = ChatMessage::user("");
msg.images = Some(vec!["data:image/png;base64,ABCDEF".into()]);
let wire = OpenRouterClient::messages_to_openai(&[msg]);
let content = wire[0].get("content").and_then(|v| v.as_array()).unwrap();
// No text part when content is empty.
assert_eq!(content.len(), 1);
assert_eq!(
content[0]["image_url"]["url"],
"data:image/png;base64,ABCDEF"
);
}
#[test]
fn text_only_message_stays_string() {
let msg = ChatMessage::user("hello");
let wire = OpenRouterClient::messages_to_openai(&[msg]);
assert_eq!(wire[0]["content"], "hello");
assert!(wire[0]["content"].as_str().is_some());
}
#[test]
fn tool_result_inherits_tool_call_id_from_prior_assistant() {
let assistant = ChatMessage {
role: "assistant".into(),
content: String::new(),
tool_calls: Some(vec![ToolCall {
id: Some("call_42".into()),
function: ToolCallFunction {
name: "lookup".into(),
arguments: json!({}),
},
}]),
images: None,
};
let tool_result = ChatMessage::tool_result("found it");
let wire = OpenRouterClient::messages_to_openai(&[assistant, tool_result]);
assert_eq!(wire[1]["role"], "tool");
assert_eq!(wire[1]["tool_call_id"], "call_42");
}
#[test]
fn multiple_tool_results_map_to_sequential_call_ids() {
let assistant = ChatMessage {
role: "assistant".into(),
content: String::new(),
tool_calls: Some(vec![
ToolCall {
id: Some("call_A".into()),
function: ToolCallFunction {
name: "a".into(),
arguments: json!({}),
},
},
ToolCall {
id: Some("call_B".into()),
function: ToolCallFunction {
name: "b".into(),
arguments: json!({}),
},
},
]),
images: None,
};
let r1 = ChatMessage::tool_result("a result");
let r2 = ChatMessage::tool_result("b result");
let wire = OpenRouterClient::messages_to_openai(&[assistant, r1, r2]);
assert_eq!(wire[1]["tool_call_id"], "call_A");
assert_eq!(wire[2]["tool_call_id"], "call_B");
}
#[test]
fn missing_tool_call_id_gets_synthetic_fallback() {
let assistant = ChatMessage {
role: "assistant".into(),
content: String::new(),
tool_calls: Some(vec![ToolCall {
id: None,
function: ToolCallFunction {
name: "noid".into(),
arguments: json!({}),
},
}]),
images: None,
};
let wire = OpenRouterClient::messages_to_openai(&[assistant]);
let tcs = wire[0]
.get("tool_calls")
.and_then(|v| v.as_array())
.unwrap();
assert_eq!(tcs[0]["id"], "call_0");
}
#[test]
fn parse_model_capabilities_extracts_tools_and_vision() {
let m = json!({
"id": "anthropic/claude-sonnet-4",
"supported_parameters": ["temperature", "top_p", "tools", "max_tokens"],
"architecture": {
"input_modalities": ["text", "image"]
}
});
let caps = parse_model_capabilities(&m);
assert_eq!(caps.name, "anthropic/claude-sonnet-4");
assert!(caps.has_tool_calling);
assert!(caps.has_vision);
}
#[test]
fn parse_model_capabilities_handles_missing_fields() {
let m = json!({
"id": "some/text-only-model"
});
let caps = parse_model_capabilities(&m);
assert_eq!(caps.name, "some/text-only-model");
assert!(!caps.has_tool_calling);
assert!(!caps.has_vision);
}
}
+282
@@ -0,0 +1,282 @@
// User-configurable pronunciation overrides for TTS. Chatterbox mispronounces
// place names ("Worcester"), initialisms ("WSL"), and clipped abbreviations
// ("blvd"), so we rewrite them to phonetic spellings before synthesis.
//
// The map lives in a JSON file on the server — a flat object of
// `"written form": "spoken form"` pairs, e.g.:
//
// {
// "Worcester": "Wuster",
// "WSL": "W S L",
// "blvd": "boulevard",
// "Dr.": "Doctor"
// }
//
// Path comes from `TTS_PRONUNCIATIONS_PATH` (default `tts_pronunciations.json`
// in the working directory). A missing file simply disables the feature. The
// file is re-read whenever its mtime changes, so edits apply to the next
// synthesis without a restart; a malformed edit keeps the last good map and
// logs the parse error instead of silently dropping all overrides.
//
// Matching rules:
// - Whole words only — `cat` never rewrites `category`. (Boundaries are only
// asserted next to word characters, so keys like `Dr.` still work.)
// - Smartcase: an all-lowercase key matches case-insensitively; a key with
// any uppercase matches exactly. That lets `worcester` catch every casing
// while `US` (the country) leaves the pronoun `us` alone.
// - Longer keys win over shorter ones (`New York Times` before `New York`).
use regex::Regex;
use std::collections::HashMap;
use std::path::Path;
use std::sync::{Arc, LazyLock, Mutex as StdMutex};
use std::time::SystemTime;
/// A compiled pronunciation map: one alternation regex over every key plus
/// the lookup tables the replacement closure resolves matches against.
#[derive(Default)]
struct CompiledMap {
/// `None` when the map is empty — apply() is then a no-op.
regex: Option<Regex>,
/// Case-sensitive entries, keyed verbatim.
exact: HashMap<String, String>,
/// Case-insensitive entries, keyed lowercased.
folded: HashMap<String, String>,
}
impl CompiledMap {
fn from_entries(entries: &HashMap<String, String>) -> Self {
let mut keys: Vec<&str> = entries
.keys()
.map(|k| k.as_str())
.filter(|k| !k.trim().is_empty())
.collect();
if keys.is_empty() {
return Self::default();
}
// Longest key first so overlapping entries prefer the more specific
// one (regex alternation is first-match-wins, not longest-match).
keys.sort_by(|a, b| b.len().cmp(&a.len()).then(a.cmp(b)));
let mut exact = HashMap::new();
let mut folded = HashMap::new();
let alternatives: Vec<String> = keys
.iter()
.map(|key| {
let escaped = regex::escape(key);
// Only assert a word boundary where the key edge is a word
// character — `\b` adjacent to punctuation (e.g. the dot in
// `Dr.`) would otherwise never match.
let lead = if key
.chars()
.next()
.is_some_and(|c| c.is_alphanumeric() || c == '_')
{
r"\b"
} else {
""
};
let trail = if key
.chars()
.last()
.is_some_and(|c| c.is_alphanumeric() || c == '_')
{
r"\b"
} else {
""
};
let case_sensitive = key.chars().any(|c| c.is_uppercase());
if case_sensitive {
exact.insert(key.to_string(), entries[*key].clone());
format!("{lead}{escaped}{trail}")
} else {
folded.insert(key.to_lowercase(), entries[*key].clone());
format!("{lead}(?i:{escaped}){trail}")
}
})
.collect();
// Escaped fixed strings can't produce an invalid pattern; if one ever
// does, treat the whole map as empty rather than panicking a handler.
let pattern = alternatives.join("|");
let regex = match Regex::new(&pattern) {
Ok(r) => Some(r),
Err(e) => {
log::error!("pronunciation map failed to compile: {e}");
None
}
};
Self {
regex,
exact,
folded,
}
}
fn apply(&self, text: &str) -> String {
let Some(re) = &self.regex else {
return text.to_string();
};
re.replace_all(text, |caps: &regex::Captures| {
let m = &caps[0];
self.exact
.get(m)
.or_else(|| self.folded.get(&m.to_lowercase()))
.cloned()
// Unreachable in practice — every alternative came from one
// of the two maps — but never drop the user's text.
.unwrap_or_else(|| m.to_string())
})
.into_owned()
}
}
struct CacheEntry {
mtime: Option<SystemTime>,
compiled: Arc<CompiledMap>,
}
static CACHE: LazyLock<StdMutex<Option<CacheEntry>>> = LazyLock::new(|| StdMutex::new(None));
fn config_path() -> String {
std::env::var("TTS_PRONUNCIATIONS_PATH")
.ok()
.map(|s| s.trim().to_string())
.filter(|s| !s.is_empty())
.unwrap_or_else(|| "tts_pronunciations.json".to_string())
}
/// Load the compiled map, re-reading the file only when its mtime changed
/// since the last call (or it appeared/disappeared). Synthesis is serialized
/// on a single GPU permit, so one `stat` call per synthesis is negligible.
fn current_map() -> Arc<CompiledMap> {
let path_s = config_path();
let path = Path::new(&path_s);
let mtime = std::fs::metadata(path).and_then(|m| m.modified()).ok();
let mut cache = CACHE.lock().unwrap();
if let Some(entry) = cache.as_ref()
&& entry.mtime == mtime
{
return entry.compiled.clone();
}
let compiled = match mtime {
None => Arc::new(CompiledMap::default()), // no file → no overrides
Some(_) => match std::fs::read_to_string(path)
.map_err(anyhow::Error::from)
.and_then(|s| Ok(serde_json::from_str::<HashMap<String, String>>(&s)?))
{
Ok(entries) => {
log::info!(
"loaded {} pronunciation override(s) from {path_s}",
entries.len()
);
Arc::new(CompiledMap::from_entries(&entries))
}
Err(e) => {
log::error!("failed to load pronunciation map {path_s}: {e}");
// Keep serving the previous map rather than regressing to
// none mid-edit; still record the new mtime so the error
// logs once per bad save, not once per synthesis.
cache
.as_ref()
.map(|c| c.compiled.clone())
.unwrap_or_default()
}
},
};
*cache = Some(CacheEntry {
mtime,
compiled: compiled.clone(),
});
compiled
}
/// Rewrite configured words/abbreviations to their phonetic spellings.
/// Call on cleaned (post-markdown-strip) text, right before synthesis.
pub fn apply_pronunciations(text: &str) -> String {
current_map().apply(text)
}
#[cfg(test)]
mod tests {
use super::*;
fn compile(pairs: &[(&str, &str)]) -> CompiledMap {
let entries = pairs
.iter()
.map(|(k, v)| (k.to_string(), v.to_string()))
.collect();
CompiledMap::from_entries(&entries)
}
#[test]
fn empty_map_is_a_noop() {
let m = compile(&[]);
assert_eq!(m.apply("nothing changes"), "nothing changes");
}
#[test]
fn replaces_whole_words_only() {
let m = compile(&[("cat", "kitty")]);
assert_eq!(m.apply("the cat sat"), "the kitty sat");
// No substring rewrites.
assert_eq!(m.apply("the category"), "the category");
assert_eq!(m.apply("concatenate"), "concatenate");
}
#[test]
fn lowercase_keys_match_any_casing() {
let m = compile(&[("worcester", "Wuster")]);
assert_eq!(m.apply("Worcester is nice"), "Wuster is nice");
assert_eq!(m.apply("in WORCESTER today"), "in Wuster today");
assert_eq!(m.apply("worcester sauce"), "Wuster sauce");
}
#[test]
fn uppercase_keys_match_case_sensitively() {
let m = compile(&[("US", "U S")]);
assert_eq!(m.apply("the US economy"), "the U S economy");
// The pronoun survives.
assert_eq!(m.apply("join us today"), "join us today");
}
#[test]
fn keys_with_punctuation_work() {
// `\b` is only asserted next to word characters, so the trailing dot
// doesn't break matching.
let m = compile(&[("Dr.", "Doctor"), ("blvd", "boulevard")]);
assert_eq!(
m.apply("Dr. Smith on Sunset blvd"),
"Doctor Smith on Sunset boulevard"
);
}
#[test]
fn longer_keys_win_over_shorter() {
let m = compile(&[("new york", "Noo York"), ("new york times", "the Times")]);
assert_eq!(m.apply("read the new york times"), "read the the Times");
assert_eq!(m.apply("visit new york soon"), "visit Noo York soon");
}
#[test]
fn multiple_occurrences_all_rewrite() {
let m = compile(&[("wsl", "W S L")]);
assert_eq!(m.apply("WSL and wsl and Wsl"), "W S L and W S L and W S L");
}
#[test]
fn replacement_text_is_verbatim() {
// Replacements aren't re-scanned — a value containing another key
// doesn't cascade.
let m = compile(&[("a1", "b2"), ("b2", "c3")]);
assert_eq!(m.apply("a1"), "b2");
}
#[test]
fn blank_keys_are_ignored() {
let m = compile(&[("", "x"), (" ", "y"), ("ok", "fine")]);
assert_eq!(m.apply("ok then"), "fine then");
}
}
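The key-ordering and smartcase rules described in the header comment of this file can be illustrated without the `regex` crate. The following is a minimal, std-only sketch, not the module's actual code: `order_keys`, `is_case_sensitive`, and `first_prefix_match` are hypothetical helper names, and `first_prefix_match` stands in for the first-match-wins regex alternation by checking prefixes only (it does not check word boundaries).

```rust
/// Order keys longest-first, breaking ties alphabetically. This uses the
/// same comparator as `CompiledMap::from_entries` in the diff above.
fn order_keys<'a>(mut keys: Vec<&'a str>) -> Vec<&'a str> {
    keys.sort_by(|a, b| b.len().cmp(&a.len()).then(a.cmp(b)));
    keys
}

/// Smartcase rule: a key containing any uppercase character is matched
/// exactly; an all-lowercase key is matched case-insensitively.
fn is_case_sensitive(key: &str) -> bool {
    key.chars().any(|c| c.is_uppercase())
}

/// Hypothetical stand-in for the regex alternation: return the first key,
/// in the given order, that `text` starts with, honoring smartcase.
/// Assumption: prefix matching only, no word-boundary handling.
fn first_prefix_match<'a>(text: &str, ordered_keys: &[&'a str]) -> Option<&'a str> {
    let folded_text = text.to_lowercase();
    ordered_keys.iter().copied().find(|key| {
        if is_case_sensitive(key) {
            text.starts_with(*key)
        } else {
            folded_text.starts_with(key.to_lowercase().as_str())
        }
    })
}
```

Because the scan stops at the first hit, putting `new york times` ahead of `new york` is what makes the more specific entry win; with the opposite order the shorter key would shadow the longer one.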
+163 -14
@@ -20,31 +20,36 @@ impl SmsApiClient {
}
}
/// Fetch messages for a specific contact within ±4 days of the given timestamp
/// Falls back to all contacts if no messages found for the specific contact
/// Messages are sorted by proximity to the center timestamp
/// Compute a `(start, end)` unix-second window spanning `2 * radius_days`
/// days, centered on `center_ts`. `radius_days < 1` is clamped to 1 to avoid
/// degenerate zero-width (or inverted) windows.
pub(crate) fn window_for_radius(center_ts: i64, radius_days: i64) -> (i64, i64) {
let r = radius_days.max(1);
let span = r * 86400;
(center_ts - span, center_ts + span)
}
/// Fetch messages for a specific contact within ±`radius_days` of the
/// given timestamp. Falls back to all contacts when no messages found
/// for the named contact. Sorted by proximity to the center timestamp.
pub async fn fetch_messages_for_contact(
&self,
contact: Option<&str>,
center_timestamp: i64,
radius_days: i64,
) -> Result<Vec<SmsMessage>> {
use chrono::Duration;
let effective_radius = radius_days.max(1);
let (start_ts, end_ts) = Self::window_for_radius(center_timestamp, radius_days);
// Calculate ±4 days range around the center timestamp
let center_dt = chrono::DateTime::from_timestamp(center_timestamp, 0)
.ok_or_else(|| anyhow::anyhow!("Invalid timestamp"))?;
let start_dt = center_dt - Duration::days(4);
let end_dt = center_dt + Duration::days(4);
let start_ts = start_dt.timestamp();
let end_ts = end_dt.timestamp();
// If contact specified, try fetching for that contact first
if let Some(contact_name) = contact {
log::info!(
"Fetching SMS for contact: {} (±4 days from {})",
"Fetching SMS for contact: {} (±{} days from {})",
contact_name,
effective_radius,
center_dt.format("%Y-%m-%d %H:%M:%S")
);
let messages = self
@@ -68,7 +73,8 @@ impl SmsApiClient {
// Fallback to all contacts
log::info!(
"Fetching all SMS messages (±4 days from {})",
"Fetching all SMS messages (±{} days from {})",
effective_radius,
center_dt.format("%Y-%m-%d %H:%M:%S")
);
self.fetch_messages(start_ts, end_ts, None, Some(center_timestamp))
@@ -250,6 +256,70 @@ impl SmsApiClient {
.collect())
}
/// Search message bodies via the Django side's FTS5 / semantic / hybrid
/// endpoint. `params.mode` selects the ranking strategy:
/// - "fts5" keyword-only, supports phrase / prefix / boolean / NEAR
/// - "semantic" embedding similarity
/// - "hybrid" both merged via reciprocal rank fusion (recommended)
///
/// `contact_id`, `contact`, `date_from` / `date_to` (unix seconds), `is_mms`,
/// `has_media`, and `offset` are all applied server-side by SMS-API, so the
/// filtered and paginated result set is exact rather than a client-side
/// over-fetch.
pub async fn search_messages(
&self,
query: &str,
params: &SmsSearchParams<'_>,
) -> Result<Vec<SmsSearchHit>> {
let mut url = format!(
"{}/api/messages/search/?q={}&mode={}&limit={}",
self.base_url,
urlencoding::encode(query),
urlencoding::encode(params.mode),
params.limit,
);
if let Some(cid) = params.contact_id {
url.push_str(&format!("&contact_id={}", cid));
}
if let Some(ref c) = params.contact {
url.push_str(&format!("&contact={}", urlencoding::encode(c)));
}
if let Some(off) = params.offset {
url.push_str(&format!("&offset={}", off));
}
if let Some(from) = params.date_from {
url.push_str(&format!("&date_from={}", from));
}
if let Some(to) = params.date_to {
url.push_str(&format!("&date_to={}", to));
}
if let Some(is_mms) = params.is_mms {
url.push_str(&format!("&is_mms={}", is_mms));
}
if let Some(has_media) = params.has_media {
url.push_str(&format!("&has_media={}", has_media));
}
let mut request = self.client.get(&url);
if let Some(token) = &self.token {
request = request.header("Authorization", format!("Bearer {}", token));
}
let response = request.send().await?;
if !response.status().is_success() {
let status = response.status();
let body = response.text().await.unwrap_or_default();
return Err(anyhow::anyhow!(
"SMS search request failed: {} - {}",
status,
body
));
}
let data: SmsSearchResponse = response.json().await?;
Ok(data.results)
}
pub async fn summarize_context(
&self,
messages: &[SmsMessage],
@@ -260,12 +330,13 @@ impl SmsApiClient {
}
// Create prompt for Ollama with sender/receiver distinction
let user_name = crate::ai::user_display_name();
let messages_text: String = messages
.iter()
.take(60) // Limit to avoid token overflow
.map(|m| {
if m.is_sent {
format!("Me: {}", m.body)
format!("{}: {}", user_name, m.body)
} else {
format!("{}: {}", m.contact, m.body)
}
@@ -314,3 +385,81 @@ struct SmsApiMessage {
#[serde(rename = "type")]
type_: i32,
}
#[derive(Debug, Clone, Deserialize)]
pub struct SmsSearchHit {
#[allow(dead_code)]
pub message_id: i64,
pub contact_name: String,
#[allow(dead_code)]
pub contact_address: String,
pub body: String,
pub date: i64,
/// Message direction code: 1 = received, 2 = sent.
#[serde(rename = "type")]
pub type_: i32,
/// Present for semantic / hybrid modes; absent for fts5.
#[serde(default)]
pub similarity_score: Option<f32>,
/// SMS-API-generated excerpt around the match, wrapped in `<mark>` tags.
/// For MMS messages that only matched via attachment text / filename
/// (empty `body`), the snippet is the only meaningful preview.
#[serde(default)]
pub snippet: Option<String>,
}
/// Optional filter / paging knobs for [`SmsApiClient::search_messages`].
/// All fields except `mode` and `limit` map 1:1 to the same-named SMS-API
/// query params (added in the 2026-05 search-enhancements release).
#[derive(Debug, Clone)]
pub struct SmsSearchParams<'a> {
pub mode: &'a str,
pub limit: usize,
pub contact_id: Option<i64>,
/// Contact name (case-insensitive). Resolved to a numeric ID by the
/// SMS-API server when `contact_id` is not set.
pub contact: Option<String>,
/// Unix-seconds inclusive lower bound on `date`.
pub date_from: Option<i64>,
/// Unix-seconds inclusive upper bound on `date`.
pub date_to: Option<i64>,
/// `Some(true)` = MMS only, `Some(false)` = SMS only, `None` = both.
pub is_mms: Option<bool>,
/// `Some(true)` = only messages with image/video/audio attachments.
pub has_media: Option<bool>,
pub offset: Option<usize>,
}
#[derive(Deserialize)]
struct SmsSearchResponse {
results: Vec<SmsSearchHit>,
#[allow(dead_code)]
#[serde(default)]
search_method: String,
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn window_for_radius_produces_2n_day_span() {
let center: i64 = 1_700_000_000;
let (start, end) = SmsApiClient::window_for_radius(center, 7);
assert_eq!(end - start, 14 * 86400);
assert_eq!(start + 7 * 86400, center);
assert_eq!(end - 7 * 86400, center);
}
#[test]
fn window_for_radius_clamps_zero_to_one() {
let (start, end) = SmsApiClient::window_for_radius(100_000, 0);
assert_eq!(end - start, 2 * 86400);
}
#[test]
fn window_for_radius_clamps_negative_to_one() {
let (start, end) = SmsApiClient::window_for_radius(100_000, -7);
assert_eq!(end - start, 2 * 86400);
}
}
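As a worked example of the window arithmetic tested above, here is a minimal standalone sketch. `window_for_radius` is copied as a free function rather than an associated function of `SmsApiClient`; `SECS_PER_DAY` and `in_window` are hypothetical additions for illustration, and treating both bounds as inclusive is an assumption, not something this diff confirms for the message-fetch endpoint.

```rust
/// Seconds in one day (hypothetical constant; the diff inlines 86400).
const SECS_PER_DAY: i64 = 86_400;

/// Return `(start, end)` unix seconds, `radius_days` either side of
/// `center_ts`. A radius below 1 is clamped to 1.
fn window_for_radius(center_ts: i64, radius_days: i64) -> (i64, i64) {
    let span = radius_days.max(1) * SECS_PER_DAY;
    (center_ts - span, center_ts + span)
}

/// Hypothetical helper: is `ts` inside the window?
/// Assumption: both bounds are inclusive.
fn in_window(ts: i64, window: (i64, i64)) -> bool {
    (window.0..=window.1).contains(&ts)
}
```

With a center of 1_700_000_000 and a radius of 7 days, the span is 7 × 86_400 = 604_800 seconds, giving the window (1_699_395_200, 1_700_604_800).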
+1278
File diff suppressed because it is too large
+748
@@ -0,0 +1,748 @@
use crate::ai::insight_chat::ChatStreamEvent;
use std::collections::HashMap;
use std::sync::Arc;
use std::sync::Mutex as StdMutex;
use std::sync::atomic::{AtomicU32, Ordering};
use std::time::Instant;
use tokio::sync::{Mutex, Notify};
use tokio::task::AbortHandle;
/// Maximum number of events buffered per turn. Agentic turns typically
/// produce ~120 events, so 500 leaves roughly 4× headroom. Once the cap is
/// reached, the oldest event is evicted from the front on each push.
const MAX_BUFFERED_EVENTS: usize = 500;
/// Turn status codes used by `TurnEntry::status`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TurnStatus {
Running = 0,
Done = 1,
Error = 2,
Cancelled = 3,
}
impl From<u32> for TurnStatus {
fn from(v: u32) -> Self {
match v {
0 => TurnStatus::Running,
1 => TurnStatus::Done,
2 => TurnStatus::Error,
3 => TurnStatus::Cancelled,
_ => TurnStatus::Running,
}
}
}
impl TurnStatus {
pub fn as_str(&self) -> &'static str {
match self {
TurnStatus::Running => "running",
TurnStatus::Done => "done",
TurnStatus::Error => "error",
TurnStatus::Cancelled => "cancelled",
}
}
}
/// Shared metadata about a turn, read by the SSE replay handler to emit
/// the initial `turn_info` event and to decide whether to wait for new
/// events or close immediately.
#[derive(Debug, Clone)]
pub struct TurnInfo {
pub turn_id: String,
pub file_path: String,
pub library_id: i32,
pub status: TurnStatus,
pub total_events_pushed: u32,
pub buffered_count: u32,
}
/// Result of reading events at or after an absolute `skip_before` index.
#[derive(Debug)]
pub enum ReplayOutcome {
/// New events are available. `next_skip` is the absolute index to pass
/// on the next read (i.e. one past the last event returned).
Events {
events: Vec<ChatStreamEvent>,
next_skip: u32,
},
/// The reader is caught up to the live edge — no events past `skip_before`
/// yet. `next_skip` is the current high-water mark.
CaughtUp { next_skip: u32 },
/// `skip_before` points below the buffer's base index: the requested
/// events were evicted. Maps to HTTP 410 Gone.
Gone,
}
/// Per-turn state shared between the agentic loop (writer) and all SSE
/// replay connections (readers).
pub struct TurnEntry {
pub turn_id: String,
pub file_path: String,
pub library_id: i32,
/// Shared event buffer — multiple SSE connections can read independently.
/// Each connection tracks its own `skip_before` offset.
events: Mutex<Vec<ChatStreamEvent>>,
/// Monotonic counter: total events pushed (may exceed events.len()
/// due to eviction). Used for skip_before indexing.
total_events_pushed: AtomicU32,
/// Absolute index of the oldest event still in the buffer. Advances on
/// eviction so that `skip_before` stays absolute across connections.
base_index: AtomicU32,
pub status: AtomicU32,
/// Abort handle for the spawned agentic task, set once after spawn.
/// Behind a std `Mutex` because the entry is shared via `Arc` and the
/// handle is installed after the entry is already in the registry.
abort_handle: StdMutex<Option<AbortHandle>>,
pub created_at: Instant,
notify: Arc<Notify>,
}
impl TurnEntry {
pub fn new(turn_id: String, file_path: String, library_id: i32) -> Self {
Self {
turn_id,
file_path,
library_id,
events: Mutex::new(Vec::new()),
total_events_pushed: AtomicU32::new(0),
base_index: AtomicU32::new(0),
status: AtomicU32::new(TurnStatus::Running as u32),
abort_handle: StdMutex::new(None),
created_at: Instant::now(),
notify: Arc::new(Notify::new()),
}
}
/// Install the abort handle for the spawned agentic task. Called once,
/// right after the task is spawned.
pub fn set_abort_handle(&self, handle: AbortHandle) {
*self.abort_handle.lock().expect("abort_handle poisoned") = Some(handle);
}
/// Abort the spawned agentic task, if a handle was installed. Returns
/// `true` if a task was aborted.
pub fn abort(&self) -> bool {
if let Some(handle) = self
.abort_handle
.lock()
.expect("abort_handle poisoned")
.take()
{
handle.abort();
true
} else {
false
}
}
/// Push an event into the buffer. Evicts oldest events if the buffer
/// exceeds `MAX_BUFFERED_EVENTS`. Notifies all waiting SSE connections.
pub async fn push_event(&self, event: ChatStreamEvent) {
{
let mut events = self.events.lock().await;
// Evict oldest events if we've hit the cap.
if events.len() >= MAX_BUFFERED_EVENTS {
// Drop the oldest event to make room and advance the base
// index so skip_before stays absolute across connections.
events.remove(0);
self.base_index.fetch_add(1, Ordering::Relaxed);
}
events.push(event);
// Increment while holding the buffer lock so the counter stays in
// lock-step with the buffer even if multiple writers ever exist.
self.total_events_pushed.fetch_add(1, Ordering::Relaxed);
}
self.notify.notify_waiters();
}
/// Get a snapshot of turn metadata for the `turn_info` SSE event.
pub async fn info(&self) -> TurnInfo {
let events = self.events.lock().await;
let buffered = events.len() as u32;
let total = self.total_events_pushed.load(Ordering::Relaxed);
drop(events);
TurnInfo {
turn_id: self.turn_id.clone(),
file_path: self.file_path.clone(),
library_id: self.library_id,
status: self.status.load(Ordering::Relaxed).into(),
total_events_pushed: total,
buffered_count: buffered,
}
}
/// Set the terminal status and notify all waiters.
pub fn set_terminal_status(&self, status: TurnStatus) {
self.status.store(status as u32, Ordering::Relaxed);
self.notify.notify_waiters();
}
/// Read buffered events at or after absolute index `skip_before` without
/// waiting. Distinguishes "evicted" (Gone) from "caught up" (no new
/// events yet) — the previous boolean/`Option` API conflated the two.
pub async fn replay_from(&self, skip_before: u32) -> ReplayOutcome {
let events = self.events.lock().await;
let base = self.base_index.load(Ordering::Relaxed);
// The buffer holds absolute indices [base, base + len). A request
// below `base` asked for events that have been evicted.
if skip_before < base {
return ReplayOutcome::Gone;
}
let offset = (skip_before - base) as usize;
let next_skip = base + events.len() as u32;
if offset >= events.len() {
// Caught up to (or past) the live edge — nothing new yet.
return ReplayOutcome::CaughtUp { next_skip };
}
ReplayOutcome::Events {
events: events[offset..].to_vec(),
next_skip,
}
}
/// Wait for the next batch of events past `skip_before`, the turn to
/// finish, or eviction. Returns:
/// - `Events` when new events are available (drained before any terminal
/// signal so the final `Done`/`Error` is never dropped),
/// - `CaughtUp` only when the turn has reached a terminal status and the
/// reader is fully drained (the caller should close the stream),
/// - `Gone` when `skip_before` points into evicted territory.
pub async fn next_batch(&self, skip_before: u32) -> ReplayOutcome {
loop {
// Register interest BEFORE inspecting state so a push/terminal that
// races between our read and our await can't be lost (Notify's
// `notify_waiters` does not store a permit).
let notified = self.notify.notified();
tokio::pin!(notified);
notified.as_mut().enable();
match self.replay_from(skip_before).await {
ReplayOutcome::CaughtUp { next_skip } => {
// No new events. If the turn is finished, every event
// (including the terminal one) was already returned by an
// earlier call, so signal the caller to close.
if !self.is_running() {
return ReplayOutcome::CaughtUp { next_skip };
}
// Still running — wait for the next push or terminal.
}
other => return other, // Events or Gone
}
notified.await;
}
}
/// Check if this turn is still running.
pub fn is_running(&self) -> bool {
self.status.load(Ordering::Relaxed) == TurnStatus::Running as u32
}
}
/// In-memory registry of all active chat turns. Injected into `AppState`
/// and shared across all handlers.
pub struct TurnRegistry {
entries: Mutex<HashMap<String, Arc<TurnEntry>>>,
timeout_secs: u64,
}
impl TurnRegistry {
pub fn new(timeout_secs: u64) -> Self {
Self {
entries: Mutex::new(HashMap::new()),
timeout_secs,
}
}
/// Returns the cleanup timeout in seconds.
pub fn timeout_secs(&self) -> u64 {
self.timeout_secs
}
/// Insert a new turn entry. Returns the turn_id.
pub async fn insert(&self, entry: Arc<TurnEntry>) -> String {
let turn_id = entry.turn_id.clone();
let mut entries = self.entries.lock().await;
entries.insert(turn_id.clone(), entry);
turn_id
}
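/// Look up a turn by id. Returns `None` if no entry with that id is
/// registered (for example, after `cleanup_stale` removed it). This
/// method does not itself check the timeout.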
pub async fn get(&self, turn_id: &str) -> Option<Arc<TurnEntry>> {
let entries = self.entries.lock().await;
entries.get(turn_id).cloned()
}
/// Clean up stale entries older than the timeout. Returns the count of
/// entries removed.
pub async fn cleanup_stale(&self) -> usize {
let mut entries = self.entries.lock().await;
let stale: Vec<String> = entries
.iter()
.filter(|(_, entry)| entry.created_at.elapsed().as_secs() > self.timeout_secs)
.map(|(id, _)| id.clone())
.collect();
for id in &stale {
entries.remove(id);
}
if !stale.is_empty() {
log::info!(
"TurnRegistry: cleaned up {} stale entries (timeout={}s)",
stale.len(),
self.timeout_secs
);
}
stale.len()
}
}
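The absolute-index bookkeeping behind `replay_from` (Gone vs. CaughtUp vs. Events) can be shown in a small synchronous sketch. This is an illustration under stated assumptions, not the code in this diff: `Buffer` and `Replay` are hypothetical names, events are plain `String`s, there is no locking or `Notify`, and a `VecDeque` is swapped in for the `Vec` so front eviction is O(1).

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified counterpart of `ReplayOutcome`.
#[derive(Debug, PartialEq)]
enum Replay {
    Events { events: Vec<String>, next_skip: u32 },
    CaughtUp { next_skip: u32 },
    Gone,
}

/// Capped buffer holding absolute indices `[base, base + len)`.
/// Assumption: `cap` is at least 1.
struct Buffer {
    cap: usize,
    base: u32,
    events: VecDeque<String>,
}

impl Buffer {
    fn new(cap: usize) -> Self {
        Self { cap, base: 0, events: VecDeque::new() }
    }

    /// Push an event, evicting the oldest one once the cap is reached and
    /// advancing `base` so reader offsets stay absolute.
    fn push(&mut self, ev: &str) {
        if self.events.len() >= self.cap {
            self.events.pop_front();
            self.base += 1;
        }
        self.events.push_back(ev.to_string());
    }

    /// Read everything at or after absolute index `skip_before`.
    fn replay_from(&self, skip_before: u32) -> Replay {
        if skip_before < self.base {
            return Replay::Gone; // requested events were evicted
        }
        let offset = (skip_before - self.base) as usize;
        let next_skip = self.base + self.events.len() as u32;
        if offset >= self.events.len() {
            return Replay::CaughtUp { next_skip }; // at the live edge
        }
        Replay::Events {
            events: self.events.iter().skip(offset).cloned().collect(),
            next_skip,
        }
    }
}
```

A reader that stores `next_skip` and passes it back on the next call never sees an event twice, and a reader that fell behind the eviction point gets `Gone` instead of a silently truncated stream.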
#[cfg(test)]
mod tests {
use super::*;
use crate::ai::insight_chat::ChatStreamEvent;
use std::time::Duration;
/// Unwrap the events from a `ReplayOutcome::Events`, panicking otherwise.
fn events_of(outcome: ReplayOutcome) -> Vec<ChatStreamEvent> {
match outcome {
ReplayOutcome::Events { events, .. } => events,
other => panic!("expected Events, got {other:?}"),
}
}
// ── TurnStatus ──────────────────────────────────────────────────
#[test]
fn turn_status_from_u32_valid_values() {
assert_eq!(TurnStatus::from(0), TurnStatus::Running);
assert_eq!(TurnStatus::from(1), TurnStatus::Done);
assert_eq!(TurnStatus::from(2), TurnStatus::Error);
assert_eq!(TurnStatus::from(3), TurnStatus::Cancelled);
}
#[test]
fn turn_status_from_u32_unknown_defaults_to_running() {
assert_eq!(TurnStatus::from(4), TurnStatus::Running);
assert_eq!(TurnStatus::from(u32::MAX), TurnStatus::Running);
}
#[test]
fn turn_status_as_str() {
assert_eq!(TurnStatus::Running.as_str(), "running");
assert_eq!(TurnStatus::Done.as_str(), "done");
assert_eq!(TurnStatus::Error.as_str(), "error");
assert_eq!(TurnStatus::Cancelled.as_str(), "cancelled");
}
// ── TurnEntry ───────────────────────────────────────────────────
#[tokio::test]
async fn turn_entry_push_and_replay() {
let entry = Arc::new(TurnEntry::new(
"t1".to_string(),
"/photo.jpg".to_string(),
1,
));
entry
.push_event(ChatStreamEvent::TextDelta("hello".to_string()))
.await;
entry
.push_event(ChatStreamEvent::TextDelta(" world".to_string()))
.await;
let events = events_of(entry.replay_from(0).await);
assert_eq!(events.len(), 2);
}
#[tokio::test]
async fn turn_entry_replay_with_skip() {
let entry = Arc::new(TurnEntry::new(
"t1".to_string(),
"/photo.jpg".to_string(),
1,
));
for i in 0..5 {
entry
.push_event(ChatStreamEvent::TextDelta(format!("e{i}")))
.await;
}
// skip_before=0 → all 5 events
let all = events_of(entry.replay_from(0).await);
assert_eq!(all.len(), 5);
// skip_before=2 → events 2,3,4 (3 events)
let skipped = events_of(entry.replay_from(2).await);
assert_eq!(skipped.len(), 3);
// skip_before=5 → caught up to the live edge (not Gone).
assert!(matches!(
entry.replay_from(5).await,
ReplayOutcome::CaughtUp { next_skip: 5 }
));
}
#[tokio::test]
async fn turn_entry_replay_empty_by_default() {
let entry = Arc::new(TurnEntry::new(
"t1".to_string(),
"/photo.jpg".to_string(),
1,
));
// Empty buffer with skip_before=0 → caught up (nothing to replay yet).
assert!(matches!(
entry.replay_from(0).await,
ReplayOutcome::CaughtUp { next_skip: 0 }
));
}
#[tokio::test]
async fn turn_entry_is_running_initially() {
let entry = TurnEntry::new("t1".to_string(), "/photo.jpg".to_string(), 1);
assert!(entry.is_running());
}
#[tokio::test]
async fn turn_entry_set_terminal_status() {
let entry = Arc::new(TurnEntry::new(
"t1".to_string(),
"/photo.jpg".to_string(),
1,
));
assert!(entry.is_running());
entry.set_terminal_status(TurnStatus::Done);
assert!(!entry.is_running());
}
#[tokio::test]
async fn turn_entry_info() {
let entry = Arc::new(TurnEntry::new(
"t1".to_string(),
"/photo.jpg".to_string(),
42,
));
entry
.push_event(ChatStreamEvent::TextDelta("x".to_string()))
.await;
entry.set_terminal_status(TurnStatus::Done);
let info = entry.info().await;
assert_eq!(info.turn_id, "t1");
assert_eq!(info.file_path, "/photo.jpg");
assert_eq!(info.library_id, 42);
assert_eq!(info.status, TurnStatus::Done);
assert_eq!(info.total_events_pushed, 1);
assert_eq!(info.buffered_count, 1);
}
#[tokio::test]
async fn turn_entry_eviction_caps_buffer() {
let entry = Arc::new(TurnEntry::new(
"t1".to_string(),
"/photo.jpg".to_string(),
1,
));
// Push MAX_BUFFERED_EVENTS + 10 events.
for i in 0..(MAX_BUFFERED_EVENTS + 10) {
entry
.push_event(ChatStreamEvent::TextDelta(format!("e{i}")))
.await;
}
// Asking from absolute 0 after eviction is Gone (0-9 were dropped).
assert!(matches!(entry.replay_from(0).await, ReplayOutcome::Gone));
// Reading from the new base (10) returns the full capped buffer.
let events = events_of(entry.replay_from(10).await);
assert_eq!(events.len(), MAX_BUFFERED_EVENTS);
// First event should be at index 10 (0-9 were evicted).
if let ChatStreamEvent::TextDelta(s) = &events[0] {
assert_eq!(s, "e10");
} else {
panic!("expected TextDelta");
}
// Last event should be at index MAX_BUFFERED_EVENTS + 9.
if let ChatStreamEvent::TextDelta(s) = &events[events.len() - 1] {
assert_eq!(s, &format!("e{}", MAX_BUFFERED_EVENTS + 9));
} else {
panic!("expected TextDelta");
}
}
#[tokio::test]
async fn turn_entry_replay_evicted_index_is_gone() {
let entry = Arc::new(TurnEntry::new(
"t1".to_string(),
"/photo.jpg".to_string(),
1,
));
// Push one past the cap so exactly one event (index 0) is evicted.
for i in 0..=MAX_BUFFERED_EVENTS {
entry
.push_event(ChatStreamEvent::TextDelta(format!("e{i}")))
.await;
}
// Base is now 1; asking from absolute 0 is evicted territory → Gone.
assert!(matches!(entry.replay_from(0).await, ReplayOutcome::Gone));
// skip_before = MAX_BUFFERED_EVENTS → last event only (index valid).
let last = events_of(entry.replay_from(MAX_BUFFERED_EVENTS as u32).await);
assert_eq!(last.len(), 1);
// skip_before = MAX_BUFFERED_EVENTS + 1 → caught up to the live edge.
assert!(matches!(
entry.replay_from((MAX_BUFFERED_EVENTS + 1) as u32).await,
ReplayOutcome::CaughtUp { .. }
));
}
// ── TurnRegistry ────────────────────────────────────────────────
#[tokio::test]
async fn turn_registry_insert_and_get() {
let registry = TurnRegistry::new(300);
let entry = Arc::new(TurnEntry::new(
"t1".to_string(),
"/photo.jpg".to_string(),
1,
));
let id = registry.insert(entry).await;
assert_eq!(id, "t1");
let retrieved = registry.get("t1").await;
assert!(retrieved.is_some());
assert_eq!(retrieved.unwrap().turn_id, "t1");
}
#[tokio::test]
async fn turn_registry_get_nonexistent_returns_none() {
let registry = TurnRegistry::new(300);
assert!(registry.get("nonexistent").await.is_none());
}
#[tokio::test]
async fn turn_registry_cleanup_stale_removes_old_entries() {
let registry = TurnRegistry::new(0);
let mut entry = TurnEntry::new("t1".to_string(), "/photo.jpg".to_string(), 1);
entry.created_at = Instant::now() - Duration::from_secs(1);
registry.insert(Arc::new(entry)).await;
let cleaned = registry.cleanup_stale().await;
assert_eq!(cleaned, 1);
assert!(registry.get("t1").await.is_none());
}
#[tokio::test]
async fn turn_registry_cleanup_stale_preserves_recent() {
let registry = TurnRegistry::new(3600); // 1 hour
let entry = Arc::new(TurnEntry::new(
"t1".to_string(),
"/photo.jpg".to_string(),
1,
));
registry.insert(entry).await;
let cleaned = registry.cleanup_stale().await;
assert_eq!(cleaned, 0);
assert!(registry.get("t1").await.is_some());
}
#[tokio::test]
async fn turn_registry_cleanup_stale_multiple() {
let registry = TurnRegistry::new(0);
for i in 0..5 {
let mut entry = TurnEntry::new(format!("t{i}"), "/photo.jpg".to_string(), 1);
entry.created_at = Instant::now() - Duration::from_secs(1);
registry.insert(Arc::new(entry)).await;
}
let cleaned = registry.cleanup_stale().await;
assert_eq!(cleaned, 5);
}
#[tokio::test]
async fn turn_registry_timeout_secs() {
let registry = TurnRegistry::new(600);
assert_eq!(registry.timeout_secs(), 600);
}
// ── next_batch / live replay ────────────────────────────────────
/// Drain a turn the way the SSE replay handler does: pull batches via
/// `next_batch` until the turn is finished and fully drained.
async fn drain_to_end(entry: Arc<TurnEntry>) -> Vec<ChatStreamEvent> {
let mut out = Vec::new();
let mut skip = 0u32;
while let ReplayOutcome::Events { events, next_skip } = entry.next_batch(skip).await {
out.extend(events);
skip = next_skip;
}
out
}
fn is_terminal(ev: &ChatStreamEvent) -> bool {
matches!(ev, ChatStreamEvent::Done { .. } | ChatStreamEvent::Error(_))
}
/// The core guarantee behind the replay rewrite: a reader waiting on
/// `next_batch` always receives the terminal event, even though the
/// writer flips status to terminal immediately after pushing it.
#[tokio::test]
async fn next_batch_always_delivers_terminal_event() {
for _ in 0..50 {
let entry = Arc::new(TurnEntry::new("t".into(), "/p.jpg".into(), 1));
let writer = entry.clone();
let w = tokio::spawn(async move {
writer
.push_event(ChatStreamEvent::IterationStart { n: 1, max: 6 })
.await;
writer
.push_event(ChatStreamEvent::TextDelta("hi".into()))
.await;
// Push terminal then flip status with no await between — the
// race that previously dropped the Done on the reader side.
writer
.push_event(ChatStreamEvent::Done {
tool_calls_made: 0,
iterations_used: 1,
truncated: false,
prompt_tokens: None,
eval_tokens: None,
num_ctx: None,
amended_insight_id: None,
backend_used: "local".into(),
model_used: "m".into(),
cancelled: false,
})
.await;
writer.set_terminal_status(TurnStatus::Done);
});
let events = drain_to_end(entry).await;
w.await.unwrap();
assert!(
events.last().is_some_and(is_terminal),
"terminal event missing; got {} events",
events.len()
);
assert_eq!(events.len(), 3, "expected IterationStart, TextDelta, Done");
}
}
/// A reader that connects before any event is pushed blocks in
/// `next_batch` and then receives events as the writer produces them.
#[tokio::test]
async fn next_batch_waits_for_late_events() {
let entry = Arc::new(TurnEntry::new("t".into(), "/p.jpg".into(), 1));
let writer = entry.clone();
tokio::spawn(async move {
tokio::task::yield_now().await;
writer
.push_event(ChatStreamEvent::TextDelta("late".into()))
.await;
writer.set_terminal_status(TurnStatus::Done);
});
// First call blocks until the writer pushes, rather than returning
// CaughtUp on the empty buffer of a running turn.
match entry.next_batch(0).await {
ReplayOutcome::Events { events, next_skip } => {
assert_eq!(events.len(), 1);
assert_eq!(next_skip, 1);
}
other => panic!("expected Events, got {other:?}"),
}
}
#[tokio::test]
async fn next_batch_closes_on_terminal_when_caught_up() {
let entry = Arc::new(TurnEntry::new("t".into(), "/p.jpg".into(), 1));
entry
.push_event(ChatStreamEvent::TextDelta("x".into()))
.await;
entry.set_terminal_status(TurnStatus::Done);
// Caught up (skip past the one buffered event) on a finished turn →
// CaughtUp so the handler closes the stream rather than hanging.
assert!(matches!(
entry.next_batch(1).await,
ReplayOutcome::CaughtUp { .. }
));
}
#[tokio::test]
async fn next_batch_reports_gone_for_evicted_index() {
let entry = Arc::new(TurnEntry::new("t".into(), "/p.jpg".into(), 1));
for i in 0..=MAX_BUFFERED_EVENTS {
entry
.push_event(ChatStreamEvent::TextDelta(format!("e{i}")))
.await;
}
// Index 0 was evicted (base advanced to 1).
assert!(matches!(entry.next_batch(0).await, ReplayOutcome::Gone));
}
// ── abort handle (#1 cancellation) ──────────────────────────────
#[tokio::test]
async fn abort_handle_aborts_task_once() {
let entry = Arc::new(TurnEntry::new("t".into(), "/p.jpg".into(), 1));
// No handle installed yet → abort is a no-op.
assert!(!entry.abort());
let handle = tokio::spawn(async {
// Long-lived task that only ends via abort.
futures::future::pending::<()>().await;
});
entry.set_abort_handle(handle.abort_handle());
assert!(entry.abort(), "first abort should fire");
assert!(!entry.abort(), "handle is taken; second abort is a no-op");
// The aborted task resolves to a cancellation JoinError.
let join = handle.await;
assert!(join.unwrap_err().is_cancelled());
}
#[tokio::test]
async fn base_index_tracks_eviction() {
let entry = Arc::new(TurnEntry::new("t".into(), "/p.jpg".into(), 1));
for i in 0..(MAX_BUFFERED_EVENTS + 5) {
entry
.push_event(ChatStreamEvent::TextDelta(format!("e{i}")))
.await;
}
let info = entry.info().await;
// 5 events evicted; total keeps climbing, buffer stays capped.
assert_eq!(info.total_events_pushed, (MAX_BUFFERED_EVENTS + 5) as u32);
assert_eq!(info.buffered_count, MAX_BUFFERED_EVENTS as u32);
// First live index is 5: reading from there yields the full buffer.
let from_base = events_of(entry.replay_from(5).await);
assert_eq!(from_base.len(), MAX_BUFFERED_EVENTS);
}
}
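The eviction tests above rely on the buffer being addressed by absolute index: a reader that asks for an evicted index gets `Gone`, one that asks past the live edge gets `CaughtUp`. The following is a minimal synchronous sketch of that addressing scheme, not the actual `TurnEntry` implementation. `EventLog`, `Replay`, and their fields are hypothetical names chosen for illustration, and the real type is async and also tracks turn status.

```rust
use std::collections::VecDeque;

/// Outcome of a replay request (hypothetical stand-in for `ReplayOutcome`).
#[derive(Debug, PartialEq)]
enum Replay<T> {
    /// Events from the requested index onward, plus the index to resume from.
    Events { events: Vec<T>, next_skip: u32 },
    /// The reader has already seen everything buffered so far.
    CaughtUp,
    /// The requested index was evicted and can no longer be served.
    Gone,
}

/// Bounded event log addressed by absolute (never-reset) index.
struct EventLog<T> {
    buf: VecDeque<T>,
    /// Absolute index of `buf[0]`; grows by one per eviction.
    base: u32,
    cap: usize,
}

impl<T: Clone> EventLog<T> {
    fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be positive");
        Self { buf: VecDeque::with_capacity(cap), base: 0, cap }
    }

    /// Append an event, evicting the oldest one when the buffer is full.
    fn push(&mut self, ev: T) {
        if self.buf.len() == self.cap {
            self.buf.pop_front();
            self.base += 1;
        }
        self.buf.push_back(ev);
    }

    /// Serve everything from absolute index `skip` onward.
    fn replay_from(&self, skip: u32) -> Replay<T> {
        if skip < self.base {
            return Replay::Gone; // the reader fell behind the eviction point
        }
        let offset = (skip - self.base) as usize;
        if offset >= self.buf.len() {
            return Replay::CaughtUp; // nothing new since `skip`
        }
        let events: Vec<T> = self.buf.iter().skip(offset).cloned().collect();
        Replay::Events { events, next_skip: self.base + self.buf.len() as u32 }
    }
}
```

With a capacity of 3 and four pushes, index 0 is evicted, index 3 yields only the last event, and index 4 is the live edge, which is the same shape the tests assert against `MAX_BUFFERED_EVENTS`.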
+796
@@ -0,0 +1,796 @@
//! Per-tick drains the watcher runs alongside ingest.
//!
//! These passes were previously inlined in `main.rs`; they exist because
//! a quick scan only walks recently-modified files, so any backlog of
//! rows missing a `content_hash` / `date_taken` / face detection
//! wouldn't otherwise drain except during the once-an-hour full scan.
//! Each function is bounded per call by a `*_PER_TICK` env-var cap.
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::{Arc, Mutex};
use log::{debug, info, warn};
use crate::content_hash;
use crate::database::ExifDao;
use crate::date_resolver;
use crate::face_watch;
use crate::faces;
use crate::file_types;
use crate::libraries;
use crate::tags;
/// Compute and persist content_hash for image_exif rows where it's NULL.
///
/// Bounded per call by `FACE_HASH_BACKFILL_MAX_PER_TICK` (default 2000)
/// so a watcher tick on a large legacy library doesn't block for hours
/// blake3-ing every photo at once. Subsequent scans pick up the rest.
/// For 50k+ libraries the dedicated `cargo run --bin backfill_hashes`
/// is still faster (it doesn't fight a watcher loop for the DAO mutex).
///
/// Drains unhashed image_exif rows by querying them directly, independent
/// of the filesystem walk. Quick scans only walk recently-modified files,
/// so a backlog of pre-existing unhashed rows never enters
/// `process_new_files`'s candidate set — left alone, it would only drain
/// on full scans (default once an hour). Calling this every tick keeps
/// the face-detection backlog moving regardless.
///
/// Returns the number of rows successfully backfilled this pass.
pub fn backfill_unhashed_backlog(
context: &opentelemetry::Context,
library: &libraries::Library,
exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
) -> usize {
let cap: i64 = dotenv::var("FACE_HASH_BACKFILL_MAX_PER_TICK")
.ok()
.and_then(|s| s.parse().ok())
.filter(|n: &i64| *n > 0)
.unwrap_or(2000);
// Fetch up to cap+1 rows so we can tell "more remain" without a
// separate count query. The query spans all libraries, since
// get_rows_missing_hash has no per-library filter today, but we only
// ever update rows whose library_id matches the caller's library.
// Other libraries' rows are skipped here and picked up on their own
// library's tick. Negligible cost given the cap.
let rows: Vec<(i32, String)> = {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
dao.get_rows_missing_hash(context, cap + 1)
.unwrap_or_default()
};
if rows.is_empty() {
return 0;
}
let more_than_cap = rows.len() as i64 > cap;
let base_path = std::path::Path::new(&library.root_path);
let mut backfilled = 0usize;
let mut errors = 0usize;
let mut skipped_other_lib = 0usize;
for (lib_id, rel_path) in rows.iter().take(cap as usize) {
if *lib_id != library.id {
skipped_other_lib += 1;
continue;
}
let abs = base_path.join(rel_path);
if !abs.exists() {
// File walked away — the watcher's reconciliation pass will
// remove the orphan exif row eventually.
continue;
}
match content_hash::compute(&abs) {
Ok(id) => {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
if let Err(e) = dao.backfill_content_hash(
context,
library.id,
rel_path,
&id.content_hash,
id.size_bytes,
) {
warn!(
"face_watch: backfill_content_hash failed for {}: {:?}",
rel_path, e
);
errors += 1;
} else {
backfilled += 1;
}
}
Err(e) => {
debug!(
"face_watch: hash compute failed for {} ({:?})",
abs.display(),
e
);
errors += 1;
}
}
}
if backfilled > 0 || errors > 0 || more_than_cap {
info!(
"face_watch: backfill pass for library '{}': hashed {} ({} error(s), {} skipped for other libraries; cap={}, more_remain={})",
library.name, backfilled, errors, skipped_other_lib, cap, more_than_cap
);
}
backfilled
}
/// Drain image_exif rows whose `date_taken` was never resolved or was
/// resolved by the weakest fallback (`fs_time`). Runs the canonical-date
/// waterfall — exiftool batch (one subprocess for the whole tick's
/// rows) → filename regex → earliest_fs_time — and persists each
/// resolution with its source tag. Capped per tick by
/// `DATE_BACKFILL_MAX_PER_TICK` (default 500) so a 14k-row library
/// drains over a few quick-scan ticks without blocking the watcher.
///
/// kamadak-exif is intentionally skipped here: the row already has a
/// NULL date_taken because the ingest path's kamadak-exif call returned
/// nothing, and re-running it would just produce the same answer.
/// exiftool is the meaningful new attempt — it handles videos and
/// MakerNote-hosted dates kamadak can't reach.
pub fn backfill_missing_date_taken(
context: &opentelemetry::Context,
library: &libraries::Library,
exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
) -> usize {
let cap: i64 = dotenv::var("DATE_BACKFILL_MAX_PER_TICK")
.ok()
.and_then(|s| s.parse().ok())
.filter(|n: &i64| *n > 0)
.unwrap_or(500);
let rows: Vec<(i32, String)> = {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
dao.get_rows_needing_date_backfill(context, library.id, cap + 1)
.unwrap_or_default()
};
if rows.is_empty() {
return 0;
}
let more_than_cap = rows.len() as i64 > cap;
let base_path = std::path::Path::new(&library.root_path);
// Build absolute paths and drop rows whose files no longer exist —
// the missing-file scan in library_maintenance retires deleted rows
// separately. Without this filter, NULL-date rows for missing files
// would loop through the drain forever (no source can resolve them).
let mut existing: Vec<(String, PathBuf)> = Vec::with_capacity(rows.len());
for (_, rel_path) in rows.iter().take(cap as usize) {
let abs = base_path.join(rel_path);
if abs.exists() {
existing.push((rel_path.clone(), abs));
}
}
if existing.is_empty() {
return 0;
}
// One exiftool subprocess for the whole batch; the resolver falls
// through to filename / fs_time per file when exiftool can't supply
// a date (or isn't installed at all).
let paths: Vec<PathBuf> = existing.iter().map(|(_, p)| p.clone()).collect();
let resolved = date_resolver::resolve_dates_batch(&paths, &HashMap::new());
let mut backfilled = 0usize;
let mut unresolved = 0usize;
let mut by_source: HashMap<&'static str, usize> = HashMap::new();
{
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
for (rel_path, abs) in &existing {
let Some(rd) = resolved.get(abs).copied() else {
unresolved += 1;
continue;
};
match dao.backfill_date_taken(
context,
library.id,
rel_path,
rd.timestamp,
rd.source.as_str(),
) {
Ok(()) => {
backfilled += 1;
*by_source.entry(rd.source.as_str()).or_insert(0) += 1;
}
Err(e) => {
warn!(
"date_backfill: update failed for lib {} {}: {:?}",
library.id, rel_path, e
);
}
}
}
}
if backfilled > 0 || unresolved > 0 || more_than_cap {
info!(
"date_backfill: library '{}': resolved {} ({:?}), {} unresolved, cap={}, more_remain={}",
library.name, backfilled, by_source, unresolved, cap, more_than_cap
);
}
backfilled
}
/// Per-tick CLIP encoding drain. Mirrors `process_face_backlog`: pull
/// up to `CLIP_BACKLOG_MAX_PER_TICK` candidates with a known
/// `content_hash` but no `clip_embedding`, hand them to
/// `clip_watch::run_clip_encoding_pass` for parallel fan-out, and let
/// that module write the result back via `backfill_clip_embedding`.
///
/// Idempotent: a row stays in the candidate set until its embedding
/// lands, so a transient failure (Apollo unreachable, CUDA OOM) just
/// defers to the next tick. Permanent failures (un-decodable bytes)
/// currently retry every tick; a future branch may add a status
/// column like the one face_detections has.
pub fn process_clip_backlog(
context: &opentelemetry::Context,
library: &libraries::Library,
clip_client: &crate::ai::clip_client::ClipClient,
exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
excluded_dirs: &[String],
) {
if !clip_client.is_enabled() {
return;
}
let cap: i64 = dotenv::var("CLIP_BACKLOG_MAX_PER_TICK")
.ok()
.and_then(|s| s.parse().ok())
.filter(|n: &i64| *n > 0)
.unwrap_or(32);
let rows: Vec<(String, String)> = {
let mut dao = exif_dao.lock().expect("exif dao");
match dao.list_clip_unencoded_candidates(context, library.id, cap) {
Ok(r) => r,
Err(e) => {
warn!(
"clip_watch: list_clip_unencoded_candidates failed for library '{}': {:?}",
library.name, e
);
return;
}
}
};
if rows.is_empty() {
return;
}
info!(
"clip_watch: backlog drain — encoding {} candidate(s) for library '{}' (cap={})",
rows.len(),
library.name,
cap
);
let candidates: Vec<crate::clip_watch::ClipCandidate> = rows
.into_iter()
.map(
|(rel_path, content_hash)| crate::clip_watch::ClipCandidate {
rel_path,
content_hash,
},
)
.collect();
crate::clip_watch::run_clip_encoding_pass(
library,
excluded_dirs,
clip_client,
Arc::clone(exif_dao),
candidates,
);
}
/// Per-tick face-detection drain. Pulls a capped batch of
/// hashed-but-unscanned image_exif rows directly via the FaceDao
/// anti-join and hands them to the existing detection pass. Runs on
/// every tick (not just full scans) so the backlog moves at quick-scan
/// cadence.
pub fn process_face_backlog(
context: &opentelemetry::Context,
library: &libraries::Library,
face_client: &crate::ai::face_client::FaceClient,
face_dao: &Arc<Mutex<Box<dyn faces::FaceDao>>>,
tag_dao: &Arc<Mutex<Box<dyn tags::TagDao>>>,
excluded_dirs: &[String],
) {
let cap: i64 = dotenv::var("FACE_BACKLOG_MAX_PER_TICK")
.ok()
.and_then(|s| s.parse().ok())
.filter(|n: &i64| *n > 0)
.unwrap_or(64);
let rows: Vec<(String, String)> = {
let mut dao = face_dao.lock().expect("face dao");
match dao.list_unscanned_candidates(context, library.id, cap) {
Ok(r) => r,
Err(e) => {
warn!(
"face_watch: list_unscanned_candidates failed for library '{}': {:?}",
library.name, e
);
return;
}
}
};
if rows.is_empty() {
return;
}
info!(
"face_watch: backlog drain — running detection on {} candidate(s) for library '{}' (cap={})",
rows.len(),
library.name,
cap
);
let candidates: Vec<face_watch::FaceCandidate> = rows
.into_iter()
.map(|(rel_path, content_hash)| face_watch::FaceCandidate {
rel_path,
content_hash,
})
.collect();
face_watch::run_face_detection_pass(
library,
excluded_dirs,
face_client,
Arc::clone(face_dao),
Arc::clone(tag_dao),
candidates,
);
}
/// Compute content_hash for any image rows the walker just touched
/// whose stored EXIF row is still hash-less. Called from
/// `process_new_files` so freshly-ingested files don't have to wait for
/// the next standalone `backfill_unhashed_backlog` tick before face
/// detection can key on their bytes.
///
/// Cap is on **successes only**. An earlier version counted errors too,
/// so a pocket of chronically-unhashable files at the front of the
/// table (vanished mid-scan, permission denied, etc.) burned the budget
/// every tick and the rest of the backlog never advanced.
pub fn backfill_missing_content_hashes(
context: &opentelemetry::Context,
files: &[(PathBuf, String)],
library: &libraries::Library,
exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
) {
let image_paths: Vec<String> = files
.iter()
.filter(|(p, _)| !file_types::is_video_file(p))
.map(|(_, rel)| rel.clone())
.collect();
if image_paths.is_empty() {
return;
}
let exif_records = {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
dao.get_exif_batch(context, Some(library.id), &image_paths)
.unwrap_or_default()
};
// Cheap lookup back from rel_path → absolute file_path so
// content_hash::compute can read the bytes.
let path_by_rel: HashMap<String, &PathBuf> =
files.iter().map(|(p, rel)| (rel.clone(), p)).collect();
let cap: usize = dotenv::var("FACE_HASH_BACKFILL_MAX_PER_TICK")
.ok()
.and_then(|s| s.parse().ok())
.filter(|n: &usize| *n > 0)
.unwrap_or(2000);
// Count the unhashed backlog up front so we can surface "still needs
// backfill: N" in the log — without it, a face-scan that's stuck at
// 44% looks stalled when really it's chipping through hashes.
let unhashed_total = exif_records
.iter()
.filter(|r| r.content_hash.is_none())
.count();
let mut backfilled = 0usize;
let mut errors = 0usize;
for record in &exif_records {
if backfilled >= cap {
break;
}
if record.content_hash.is_some() {
continue;
}
let Some(file_path) = path_by_rel.get(&record.file_path) else {
// Walked file went missing between the directory scan and now;
// next tick will retry naturally.
continue;
};
match content_hash::compute(file_path) {
Ok(id) => {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
if let Err(e) = dao.backfill_content_hash(
context,
library.id,
&record.file_path,
&id.content_hash,
id.size_bytes,
) {
warn!(
"face_watch: backfill_content_hash failed for {}: {:?}",
record.file_path, e
);
errors += 1;
} else {
backfilled += 1;
}
}
Err(e) => {
debug!(
"face_watch: hash compute failed for {} ({:?})",
file_path.display(),
e
);
errors += 1;
}
}
}
// Always log when there's an unhashed backlog so an operator
// looking at "scan stuck at 44%" can see backfill is running and
// how much remains. Quiet only when there's nothing to do.
if unhashed_total > 0 || backfilled > 0 || errors > 0 {
let remaining = unhashed_total.saturating_sub(backfilled);
info!(
"face_watch: backfilled {}/{} content_hash for library '{}' ({} error(s); {} still need backfill; cap={})",
backfilled, unhashed_total, library.name, errors, remaining, cap
);
}
}
/// Build the face-detection candidate list for a scan tick.
///
/// Returns `(rel_path, content_hash)` for every image file that has a
/// content_hash recorded in image_exif but no row in face_detections
/// yet. Re-querying image_exif here picks up rows the EXIF write loop
/// just inserted alongside any pre-existing rows the watcher walked
/// over — covers both new uploads and the initial backlog scan.
pub fn build_face_candidates(
context: &opentelemetry::Context,
library: &libraries::Library,
files: &[(PathBuf, String)],
exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
face_dao: &Arc<Mutex<Box<dyn faces::FaceDao>>>,
) -> Vec<face_watch::FaceCandidate> {
// Restrict to image files; videos aren't face-scanned in v1 (kamadak
// doesn't even register them in image_exif).
let image_paths: Vec<String> = files
.iter()
.filter(|(p, _)| !file_types::is_video_file(p))
.map(|(_, rel)| rel.clone())
.collect();
if image_paths.is_empty() {
return Vec::new();
}
let exif_records = {
let mut dao = exif_dao.lock().expect("Unable to lock ExifDao");
dao.get_exif_batch(context, Some(library.id), &image_paths)
.unwrap_or_default()
};
// rel_path → content_hash (only rows with a hash; without one we have
// nothing to key face data against).
let mut hash_by_path: HashMap<String, String> = HashMap::with_capacity(exif_records.len());
for record in exif_records {
if let Some(h) = record.content_hash {
hash_by_path.insert(record.file_path, h);
}
}
let mut candidates = Vec::new();
let mut dao = face_dao.lock().expect("face dao");
for rel_path in image_paths {
let Some(hash) = hash_by_path.get(&rel_path) else {
continue;
};
match dao.already_scanned(context, hash) {
Ok(true) => continue,
Ok(false) => candidates.push(face_watch::FaceCandidate {
rel_path,
content_hash: hash.clone(),
}),
Err(e) => {
warn!("face_watch: already_scanned errored for {}: {:?}", hash, e);
}
}
}
candidates
}
#[cfg(test)]
mod tests {
use super::*;
use std::fs;
use std::sync::{Arc, Mutex};
use diesel::prelude::*;
use tempfile::TempDir;
use crate::database::models::{InsertImageExif, InsertLibrary};
use crate::database::test::in_memory_db_connection;
use crate::database::{ExifDao, SqliteExifDao, schema};
use crate::faces::{FaceDao, SqliteFaceDao};
use crate::libraries::Library;
fn ctx() -> opentelemetry::Context {
opentelemetry::Context::new()
}
/// Everything `setup` hands back to a test: tempdir, library, shared
/// connection, and the two DAOs. Aliased to keep clippy's
/// type-complexity lint satisfied.
type SetupFixture = (
TempDir,
Library,
Arc<Mutex<diesel::SqliteConnection>>,
Arc<Mutex<Box<dyn ExifDao>>>,
Arc<Mutex<Box<dyn FaceDao>>>,
);
/// Build a tempdir-backed library + DAOs sharing a single in-memory
/// SQLite connection (so cross-table joins like
/// `list_unscanned_candidates` see consistent state).
fn setup() -> SetupFixture {
let tmp = TempDir::new().expect("tempdir");
let mut conn = in_memory_db_connection();
// Migration seeds library id=1 with a placeholder root; rewrite it
// to point at the tempdir so `<root>/<rel_path>` resolves to real
// files this test creates.
diesel::update(schema::libraries::table.filter(schema::libraries::id.eq(1)))
.set(schema::libraries::root_path.eq(tmp.path().to_string_lossy().to_string()))
.execute(&mut conn)
.expect("rewrite library 1 root");
// Add a second library so cross-library skip cases have somewhere
// to put their rows.
diesel::insert_into(schema::libraries::table)
.values(InsertLibrary {
name: "other",
root_path: "/tmp/other-test-lib",
created_at: 0,
enabled: true,
excluded_dirs: None,
})
.execute(&mut conn)
.expect("seed second library");
let library = Library {
id: 1,
name: "main".to_string(),
root_path: tmp.path().to_string_lossy().to_string(),
enabled: true,
excluded_dirs: Vec::new(),
};
let shared = Arc::new(Mutex::new(conn));
let exif_dao: Arc<Mutex<Box<dyn ExifDao>>> = Arc::new(Mutex::new(Box::new(
SqliteExifDao::from_shared(Arc::clone(&shared)),
)));
let face_dao: Arc<Mutex<Box<dyn FaceDao>>> = Arc::new(Mutex::new(Box::new(
SqliteFaceDao::from_connection(Arc::clone(&shared)),
)));
(tmp, library, shared, exif_dao, face_dao)
}
fn insert_exif(
exif_dao: &Arc<Mutex<Box<dyn ExifDao>>>,
lib_id: i32,
rel: &str,
content_hash: Option<&str>,
) {
let mut dao = exif_dao.lock().unwrap();
dao.store_exif(
&ctx(),
InsertImageExif {
library_id: lib_id,
file_path: rel.to_string(),
camera_make: None,
camera_model: None,
lens_model: None,
width: None,
height: None,
orientation: None,
gps_latitude: None,
gps_longitude: None,
gps_altitude: None,
focal_length: None,
aperture: None,
shutter_speed: None,
iso: None,
date_taken: None,
created_time: 0,
last_modified: 0,
content_hash: content_hash.map(|s| s.to_string()),
size_bytes: None,
phash_64: None,
dhash_64: None,
date_taken_source: None,
},
)
.expect("insert");
}
fn write_image(root: &std::path::Path, rel: &str, bytes: &[u8]) {
let abs = root.join(rel);
if let Some(parent) = abs.parent() {
fs::create_dir_all(parent).expect("mkdir");
}
fs::write(abs, bytes).expect("write file");
}
#[test]
fn backfill_unhashed_backlog_hashes_missing_rows_in_this_library() {
let (tmp, library, _conn, exif_dao, _face_dao) = setup();
write_image(tmp.path(), "a.jpg", b"alpha-bytes");
write_image(tmp.path(), "b.jpg", b"bravo-bytes");
insert_exif(&exif_dao, 1, "a.jpg", None);
insert_exif(&exif_dao, 1, "b.jpg", None);
let backfilled = backfill_unhashed_backlog(&ctx(), &library, &exif_dao);
assert_eq!(backfilled, 2);
let mut dao = exif_dao.lock().unwrap();
let rows = dao
.get_exif_batch(&ctx(), Some(1), &["a.jpg".to_string(), "b.jpg".to_string()])
.unwrap();
assert_eq!(rows.len(), 2);
for r in rows {
assert!(
r.content_hash.is_some(),
"row {} should have a hash",
r.file_path
);
}
}
#[test]
fn backfill_unhashed_backlog_skips_other_libraries_and_missing_files() {
let (tmp, library, _conn, exif_dao, _face_dao) = setup();
write_image(tmp.path(), "exists.jpg", b"hello");
// Row for this library whose file is missing on disk:
insert_exif(&exif_dao, 1, "ghost.jpg", None);
insert_exif(&exif_dao, 1, "exists.jpg", None);
// Row in the other library — must be skipped (different lib_id).
insert_exif(&exif_dao, 2, "other.jpg", None);
let backfilled = backfill_unhashed_backlog(&ctx(), &library, &exif_dao);
assert_eq!(backfilled, 1, "only the existing in-library file hashes");
let mut dao = exif_dao.lock().unwrap();
let other = dao
.get_exif_batch(&ctx(), Some(2), &["other.jpg".to_string()])
.unwrap();
assert_eq!(other.len(), 1);
assert!(
other[0].content_hash.is_none(),
"other-library row must remain unhashed"
);
let ghost = dao
.get_exif_batch(&ctx(), Some(1), &["ghost.jpg".to_string()])
.unwrap();
assert_eq!(ghost.len(), 1);
assert!(
ghost[0].content_hash.is_none(),
"missing-on-disk row stays unhashed (reconciliation removes it later)"
);
}
#[test]
fn backfill_unhashed_backlog_respects_per_tick_cap() {
// Env-var-driven cap; the function reads it on every call, so we
// can set it just for this test and unset it before returning.
// No serial guard: tests in the same binary share the process
// environment, but each backfill call re-reads the variable, and we
// only care that the cap shape (success count <= cap, more_remain
// logged) holds.
unsafe {
std::env::set_var("FACE_HASH_BACKFILL_MAX_PER_TICK", "2");
}
let (tmp, library, _conn, exif_dao, _face_dao) = setup();
for i in 0..5 {
let rel = format!("img_{}.jpg", i);
write_image(tmp.path(), &rel, format!("bytes-{}", i).as_bytes());
insert_exif(&exif_dao, 1, &rel, None);
}
let backfilled = backfill_unhashed_backlog(&ctx(), &library, &exif_dao);
assert_eq!(backfilled, 2, "cap=2 must bound the per-tick successes");
unsafe {
std::env::remove_var("FACE_HASH_BACKFILL_MAX_PER_TICK");
}
}
#[test]
fn backfill_missing_content_hashes_skips_videos_and_hashed_rows() {
let (tmp, library, _conn, exif_dao, _face_dao) = setup();
// Two image rows (one already hashed, one not), one video.
write_image(tmp.path(), "fresh.jpg", b"fresh-pixels");
write_image(tmp.path(), "already.jpg", b"already-pixels");
write_image(tmp.path(), "clip.mp4", b"video-bytes");
insert_exif(&exif_dao, 1, "fresh.jpg", None);
insert_exif(&exif_dao, 1, "already.jpg", Some("pre-existing-hash"));
insert_exif(&exif_dao, 1, "clip.mp4", None);
let files: Vec<(PathBuf, String)> = vec![
(tmp.path().join("fresh.jpg"), "fresh.jpg".to_string()),
(tmp.path().join("already.jpg"), "already.jpg".to_string()),
(tmp.path().join("clip.mp4"), "clip.mp4".to_string()),
];
backfill_missing_content_hashes(&ctx(), &files, &library, &exif_dao);
let mut dao = exif_dao.lock().unwrap();
let rows = dao
.get_exif_batch(
&ctx(),
Some(1),
&[
"fresh.jpg".to_string(),
"already.jpg".to_string(),
"clip.mp4".to_string(),
],
)
.unwrap();
let by_path: HashMap<String, Option<String>> = rows
.into_iter()
.map(|r| (r.file_path, r.content_hash))
.collect();
assert!(
by_path["fresh.jpg"].is_some(),
"fresh image must get a hash"
);
assert_eq!(
by_path["already.jpg"].as_deref(),
Some("pre-existing-hash"),
"already-hashed image left untouched"
);
assert!(
by_path["clip.mp4"].is_none(),
"video skipped (not face-scanned, no hash needed via this path)"
);
}
#[test]
fn build_face_candidates_filters_videos_unhashed_and_already_scanned() {
let (tmp, library, _conn, exif_dao, face_dao) = setup();
// Seed image_exif with: hashed unscanned, hashed scanned, unhashed,
// and a video. Files don't need to exist on disk — the function
// doesn't read them, only the DB rows.
insert_exif(&exif_dao, 1, "fresh.jpg", Some("hash-fresh"));
insert_exif(&exif_dao, 1, "scanned.jpg", Some("hash-scanned"));
insert_exif(&exif_dao, 1, "unhashed.jpg", None);
insert_exif(&exif_dao, 1, "clip.mp4", Some("hash-video"));
// Mark `scanned.jpg`'s hash as already detected.
{
let mut dao = face_dao.lock().unwrap();
dao.mark_status(&ctx(), 1, "hash-scanned", "scanned.jpg", "no_faces", "test")
.expect("mark scanned");
}
let files: Vec<(PathBuf, String)> = vec![
(tmp.path().join("fresh.jpg"), "fresh.jpg".to_string()),
(tmp.path().join("scanned.jpg"), "scanned.jpg".to_string()),
(tmp.path().join("unhashed.jpg"), "unhashed.jpg".to_string()),
(tmp.path().join("clip.mp4"), "clip.mp4".to_string()),
];
let candidates = build_face_candidates(&ctx(), &library, &files, &exif_dao, &face_dao);
assert_eq!(
candidates.len(),
1,
"exactly fresh.jpg should be a candidate"
);
assert_eq!(candidates[0].rel_path, "fresh.jpg");
assert_eq!(candidates[0].content_hash, "hash-fresh");
}
}
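The drains in this file share three small patterns: a positive per-tick cap read from an env var with a default, a `cap + 1` fetch that reveals whether more rows remain without a count query, and (in `backfill_missing_content_hashes`) a cap that counts successes only. The sketch below isolates them under simplifying assumptions. `parse_cap`, `split_at_cap`, and `drain_with_success_cap` are hypothetical helpers that do not exist in the codebase, and the raw value is passed in rather than read through `dotenv::var`.

```rust
/// Parse a positive per-tick cap, falling back to `default` when the
/// value is missing, malformed, or not greater than zero.
fn parse_cap(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|s| s.parse::<usize>().ok())
        .filter(|n| *n > 0)
        .unwrap_or(default)
}

/// Given rows fetched with a limit of `cap + 1`, keep at most `cap` of
/// them and report whether the source still holds more.
fn split_at_cap<T>(mut rows: Vec<T>, cap: usize) -> (Vec<T>, bool) {
    let more_remain = rows.len() > cap;
    rows.truncate(cap);
    (rows, more_remain)
}

/// Run `work` over `items` until `cap` of them have succeeded.
/// Failures do not consume the budget, so a run of bad items at the
/// front cannot starve the rest. Returns `(successes, errors)`.
fn drain_with_success_cap<T, E>(
    items: impl IntoIterator<Item = T>,
    cap: usize,
    mut work: impl FnMut(T) -> Result<(), E>,
) -> (usize, usize) {
    let mut successes = 0usize;
    let mut errors = 0usize;
    for item in items {
        if successes >= cap {
            break;
        }
        match work(item) {
            Ok(()) => successes += 1,
            Err(_) => errors += 1,
        }
    }
    (successes, errors)
}
```

Note the trade-off: a success-only cap bounds the writes per tick but not the attempts, so a tick whose inputs all fail still visits every item it was handed.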
+186
@@ -0,0 +1,186 @@
//! Backfill `image_exif.content_hash` + `size_bytes` for rows that were
//! ingested before hash computation was wired into the watcher.
//!
//! The watcher computes hashes for new files as they're ingested, so this
//! binary is a one-shot tool for the historical backlog. Safe to re-run;
//! only rows with NULL content_hash are processed.
use std::path::Path;
use std::sync::{Arc, Mutex};
use std::time::Instant;
use clap::Parser;
use log::{error, warn};
use rayon::prelude::*;
use image_api::bin_progress;
use image_api::content_hash;
use image_api::database::{ExifDao, SqliteExifDao, connect};
use image_api::libraries::{self, Library};
#[derive(Parser, Debug)]
#[command(name = "backfill_hashes")]
#[command(about = "Compute content_hash for image_exif rows missing one")]
struct Args {
/// Max rows to hash per batch. The process loops until no rows remain.
#[arg(long, default_value_t = 500)]
batch_size: i64,
/// Rayon parallelism override. 0 uses the default thread pool size.
#[arg(long, default_value_t = 0)]
parallelism: usize,
/// Dry-run: log what would be hashed without writing to the DB.
#[arg(long)]
dry_run: bool,
}
fn main() -> anyhow::Result<()> {
// Load .env first so a RUST_LOG set there is visible to env_logger.
dotenv::dotenv().ok();
env_logger::init();
let args = Args::parse();
if args.parallelism > 0 {
rayon::ThreadPoolBuilder::new()
.num_threads(args.parallelism)
.build_global()
.expect("Unable to configure rayon thread pool");
}
// Resolve libraries (patch placeholder if still unset) so we can map
// library_id back to a root_path on disk.
let base_path = dotenv::var("BASE_PATH").ok();
let mut seed_conn = connect();
if let Some(base) = base_path.as_deref() {
libraries::seed_or_patch_from_env(&mut seed_conn, base);
}
let libs = libraries::load_all(&mut seed_conn);
drop(seed_conn);
if libs.is_empty() {
anyhow::bail!("No libraries configured; cannot backfill hashes");
}
let libs_by_id: std::collections::HashMap<i32, Library> =
libs.into_iter().map(|lib| (lib.id, lib)).collect();
println!(
"Configured libraries: {}",
libs_by_id
.values()
.map(|l| format!("{} -> {}", l.name, l.root_path))
.collect::<Vec<_>>()
.join(", ")
);
let dao: Arc<Mutex<Box<dyn ExifDao>>> = Arc::new(Mutex::new(Box::new(SqliteExifDao::new())));
let ctx = opentelemetry::Context::new();
let mut total_hashed = 0u64;
let mut total_missing = 0u64;
let mut total_errors = 0u64;
let start = Instant::now();
let pb = bin_progress::spinner("hashing");
loop {
let rows = {
let mut guard = dao.lock().expect("Unable to lock ExifDao");
guard
.get_rows_missing_hash(&ctx, args.batch_size)
.map_err(|e| anyhow::anyhow!("DB error: {:?}", e))?
};
if rows.is_empty() {
break;
}
let batch_size = rows.len();
pb.set_message(format!(
"batch of {} (hashed={} missing={} errors={})",
batch_size, total_hashed, total_missing, total_errors
));
// Compute hashes in parallel (I/O-bound; rayon helps on local disks,
// throttled by network on SMB mounts — use --parallelism to tune).
let results: Vec<(i32, String, Option<content_hash::FileIdentity>)> = rows
.into_par_iter()
.map(|(library_id, rel_path)| {
let abs = libs_by_id
.get(&library_id)
.map(|lib| Path::new(&lib.root_path).join(&rel_path));
match abs {
Some(abs_path) if abs_path.exists() => match content_hash::compute(&abs_path) {
Ok(id) => (library_id, rel_path, Some(id)),
Err(e) => {
error!("hash error for {}: {:?}", abs_path.display(), e);
(library_id, rel_path, None)
}
},
Some(_) => (library_id, rel_path, None), // file missing on disk
None => {
warn!("Row refers to unknown library_id {}", library_id);
(library_id, rel_path, None)
}
}
})
.collect();
// Persist sequentially — SQLite writes serialize anyway.
if !args.dry_run {
let mut guard = dao.lock().expect("Unable to lock ExifDao");
for (library_id, rel_path, ident) in &results {
match ident {
Some(id) => {
match guard.backfill_content_hash(
&ctx,
*library_id,
rel_path,
&id.content_hash,
id.size_bytes,
) {
Ok(_) => {
total_hashed += 1;
pb.inc(1);
}
Err(e) => {
pb.println(format!("persist error for {}: {:?}", rel_path, e));
total_errors += 1;
}
}
}
None => {
total_missing += 1;
}
}
}
} else {
for (_, rel_path, ident) in &results {
match ident {
Some(id) => {
pb.println(format!(
"[dry-run] {} -> {} ({} bytes)",
rel_path, id.content_hash, id.size_bytes
));
total_hashed += 1;
pb.inc(1);
}
None => {
total_missing += 1;
}
}
}
pb.println(format!(
"[dry-run] processed one batch of {}. Stopping — a real run would continue \
until no NULL content_hash rows remain.",
results.len()
));
break;
}
}
pb.finish_and_clear();
println!(
"Done. hashed={}, skipped (missing on disk)={}, errors={}, elapsed={:.1}s",
total_hashed,
total_missing,
total_errors,
start.elapsed().as_secs_f64()
);
Ok(())
}
+243
@@ -0,0 +1,243 @@
//! Backfill `image_exif.phash_64` + `dhash_64` for image rows that
//! were ingested before perceptual hashing was wired into the watcher.
//!
//! The watcher computes perceptual hashes for new images as they're
//! ingested, so this binary is a one-shot for the historical backlog.
//! Idempotent — only rows with a non-null content_hash and a null
//! phash are processed, so re-runs are safe and pick up where they
//! left off (e.g. after a crash or interrupt).
//!
//! Image-only by design: `get_rows_missing_perceptual_hash` filters by
//! file extension at the DB layer so videos and other non-decodable
//! media are skipped without round-tripping `image_hasher`. Files that
//! can't be opened (missing on disk, permission errors) are quietly
//! left as null and counted as "missing"; on next run, if the file is
//! restored, the row will surface again.
use std::path::Path;
use std::sync::{Arc, Mutex};
use std::time::Instant;
use clap::Parser;
use log::{error, warn};
use rayon::prelude::*;
use image_api::bin_progress;
use image_api::database::{ExifDao, SqliteExifDao, connect};
use image_api::libraries::{self, Library};
use image_api::perceptual_hash;
#[derive(Parser, Debug)]
#[command(name = "backfill_perceptual_hash")]
#[command(about = "Compute pHash + dHash for image_exif rows missing one")]
struct Args {
/// Max rows to hash per batch. The process loops until no rows remain.
#[arg(long, default_value_t = 256)]
batch_size: i64,
/// Rayon parallelism override. 0 uses the default thread pool size.
#[arg(long, default_value_t = 0)]
parallelism: usize,
/// Dry-run: log what would be hashed without writing to the DB.
#[arg(long)]
dry_run: bool,
}
fn main() -> anyhow::Result<()> {
env_logger::init();
dotenv::dotenv().ok();
let args = Args::parse();
if args.parallelism > 0 {
rayon::ThreadPoolBuilder::new()
.num_threads(args.parallelism)
.build_global()
.expect("Unable to configure rayon thread pool");
}
let base_path = dotenv::var("BASE_PATH").ok();
let mut seed_conn = connect();
if let Some(base) = base_path.as_deref() {
libraries::seed_or_patch_from_env(&mut seed_conn, base);
}
let libs = libraries::load_all(&mut seed_conn);
drop(seed_conn);
if libs.is_empty() {
anyhow::bail!("No libraries configured; cannot backfill perceptual hashes");
}
let libs_by_id: std::collections::HashMap<i32, Library> =
libs.into_iter().map(|lib| (lib.id, lib)).collect();
println!(
"Configured libraries: {}",
libs_by_id
.values()
.map(|l| format!("{} -> {}", l.name, l.root_path))
.collect::<Vec<_>>()
.join(", ")
);
let dao: Arc<Mutex<Box<dyn ExifDao>>> = Arc::new(Mutex::new(Box::new(SqliteExifDao::new())));
let ctx = opentelemetry::Context::new();
let mut total_hashed = 0u64;
let mut total_missing = 0u64;
let mut total_decode_failures = 0u64;
let mut total_errors = 0u64;
let start = Instant::now();
let pb = bin_progress::spinner("perceptual-hashing");
loop {
let rows = {
let mut guard = dao.lock().expect("Unable to lock ExifDao");
guard
.get_rows_missing_perceptual_hash(&ctx, args.batch_size)
.map_err(|e| anyhow::anyhow!("DB error: {:?}", e))?
};
if rows.is_empty() {
break;
}
let batch_size = rows.len();
pb.set_message(format!(
"batch of {} (hashed={} decode_fail={} missing={} errors={})",
batch_size, total_hashed, total_decode_failures, total_missing, total_errors
));
// Compute perceptual hashes in parallel. Decoding and hashing are
// CPU-bound, and rayon's default thread pool matches the host's
// logical-core count, which is the right ceiling for image_hasher's
// DCT pass.
let results: Vec<(i32, String, FilePerceptualResult)> = rows
.into_par_iter()
.map(|(library_id, rel_path)| {
let abs = libs_by_id
.get(&library_id)
.map(|lib| Path::new(&lib.root_path).join(&rel_path));
match abs {
Some(abs_path) if abs_path.exists() => {
match perceptual_hash::compute(&abs_path) {
Some(id) => (library_id, rel_path, FilePerceptualResult::Ok(id)),
None => (library_id, rel_path, FilePerceptualResult::DecodeFailed),
}
}
Some(_) => (library_id, rel_path, FilePerceptualResult::MissingOnDisk),
None => {
warn!("Row refers to unknown library_id {}", library_id);
(library_id, rel_path, FilePerceptualResult::MissingOnDisk)
}
}
})
.collect();
// Persist sequentially — SQLite writes serialize anyway.
if !args.dry_run {
let mut guard = dao.lock().expect("Unable to lock ExifDao");
for (library_id, rel_path, result) in &results {
match result {
FilePerceptualResult::Ok(id) => {
match guard.backfill_perceptual_hash(
&ctx,
*library_id,
rel_path,
Some(id.phash_64),
Some(id.dhash_64),
) {
Ok(_) => {
total_hashed += 1;
pb.inc(1);
}
Err(e) => {
pb.println(format!("persist error for {}: {:?}", rel_path, e));
total_errors += 1;
}
}
}
FilePerceptualResult::DecodeFailed => {
// Persist phash_64=0/dhash_64=0 as a "tried,
// unhashable" sentinel so this row leaves the
// `phash_64 IS NULL` candidate set and the
// backfill doesn't infinite-loop on a queue of
// undecodable formats (HEIC, RAW, CMYK JPEGs,
// truncated bytes). The all-zero hash is
// explicitly excluded from clustering by
// is_informative_hash in duplicates.rs, so it
// won't pollute group output — it just becomes
// invisible to the duplicate finder.
log::debug!(
"perceptual decode failed for {} (lib {}); marking unhashable",
rel_path,
library_id
);
match guard.backfill_perceptual_hash(
&ctx,
*library_id,
rel_path,
Some(0),
Some(0),
) {
Ok(_) => {
total_decode_failures += 1;
}
Err(e) => {
pb.println(format!(
"persist error (decode-fail sentinel) for {}: {:?}",
rel_path, e
));
total_errors += 1;
}
}
}
FilePerceptualResult::MissingOnDisk => {
total_missing += 1;
}
}
}
} else {
for (_, rel_path, result) in &results {
match result {
FilePerceptualResult::Ok(id) => {
pb.println(format!(
"[dry-run] {} -> phash={:016x} dhash={:016x}",
rel_path, id.phash_64, id.dhash_64
));
total_hashed += 1;
pb.inc(1);
}
FilePerceptualResult::DecodeFailed => {
total_decode_failures += 1;
}
FilePerceptualResult::MissingOnDisk => {
total_missing += 1;
}
}
}
pb.println(format!(
"[dry-run] processed one batch of {}. Stopping — a real run would continue \
until no NULL phash_64 image rows remain.",
results.len()
));
break;
}
}
pb.finish_and_clear();
println!(
"Done. hashed={}, decode_failed={}, skipped (missing on disk)={}, errors={}, elapsed={:.1}s",
total_hashed,
total_decode_failures,
total_missing,
total_errors,
start.elapsed().as_secs_f64()
);
if total_errors > 0 {
error!("Backfill completed with {} persist errors", total_errors);
}
Ok(())
}
enum FilePerceptualResult {
Ok(perceptual_hash::PerceptualIdentity),
DecodeFailed,
MissingOnDisk,
}
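The decode-failure branch above stores an all-zero pHash/dHash pair as a "tried, unhashable" sentinel and relies on `is_informative_hash` in duplicates.rs to keep that value out of clustering. That function's body is not part of this diff, so what follows is only a minimal sketch of how such a filter might behave. It assumes unsigned 64-bit hashes and a Hamming-distance comparison; `hamming_distance`, `near_duplicates`, and the threshold parameter are hypothetical names used for illustration.

```rust
/// Sketch only: the real `is_informative_hash` lives in duplicates.rs and is
/// not shown in this diff. Here the all-zero sentinel is the only value rejected.
fn is_informative_hash(hash: u64) -> bool {
    hash != 0
}

/// Number of differing bits between two 64-bit perceptual hashes.
fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Hypothetical helper: returns index pairs whose hashes differ by at most
/// `max_distance` bits, skipping rows that carry the sentinel value.
fn near_duplicates(hashes: &[u64], max_distance: u32) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for i in 0..hashes.len() {
        if !is_informative_hash(hashes[i]) {
            continue;
        }
        for j in (i + 1)..hashes.len() {
            if is_informative_hash(hashes[j])
                && hamming_distance(hashes[i], hashes[j]) <= max_distance
            {
                pairs.push((i, j));
            }
        }
    }
    pairs
}
```

Without the filter, any two sentinel rows would sit at Hamming distance 0 and be reported as duplicates of each other, which is the pollution the comment in the decode-failure branch is guarding against.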
+36 -43
@@ -1,11 +1,11 @@
use anyhow::{Context, Result};
use chrono::Utc;
use clap::Parser;
use image_api::ai::ollama::OllamaClient;
use image_api::ai::LocalLlm;
use image_api::bin_progress;
use image_api::database::calendar_dao::{InsertCalendarEvent, SqliteCalendarEventDao};
use image_api::parsers::ical_parser::parse_ics_file;
use log::{error, info};
use std::sync::{Arc, Mutex};
// Import the trait to use its methods
use image_api::database::CalendarEventDao;
@@ -44,29 +44,19 @@ async fn main() -> Result<()> {
let context = opentelemetry::Context::current();
let ollama = if args.generate_embeddings {
let primary_url = dotenv::var("OLLAMA_PRIMARY_URL")
.or_else(|_| dotenv::var("OLLAMA_URL"))
.unwrap_or_else(|_| "http://localhost:11434".to_string());
let fallback_url = dotenv::var("OLLAMA_FALLBACK_URL").ok();
let primary_model = dotenv::var("OLLAMA_PRIMARY_MODEL")
.or_else(|_| dotenv::var("OLLAMA_MODEL"))
.unwrap_or_else(|_| "nomic-embed-text:v1.5".to_string());
let fallback_model = dotenv::var("OLLAMA_FALLBACK_MODEL").ok();
Some(OllamaClient::new(
primary_url,
fallback_url,
primary_model,
fallback_model,
))
// LocalLlm dispatches per LLM_BACKEND, so embeddings written here land
// in the same vector space the query side searches.
let llm = if args.generate_embeddings {
Some(LocalLlm::from_env())
} else {
None
};
let inserted_count = Arc::new(Mutex::new(0));
let skipped_count = Arc::new(Mutex::new(0));
let error_count = Arc::new(Mutex::new(0));
let mut inserted_count = 0usize;
let mut skipped_count = 0usize;
let mut error_count = 0usize;
let pb = bin_progress::determinate(events.len() as u64, "importing");
// Process events in batches
// Can't use rayon with async, so process sequentially
@@ -82,12 +72,13 @@ async fn main() -> Result<()> {
)
&& exists
{
*skipped_count.lock().unwrap() += 1;
skipped_count += 1;
pb.inc(1);
continue;
}
// Generate embedding if requested (blocking call)
let embedding = if let Some(ref ollama_client) = ollama {
let embedding = if let Some(ref llm) = llm {
let text = format!(
"{} {} {}",
event.summary,
@@ -97,14 +88,11 @@ async fn main() -> Result<()> {
match tokio::task::block_in_place(|| {
tokio::runtime::Handle::current()
.block_on(async { ollama_client.generate_embedding(&text).await })
.block_on(async { llm.embed_document(&text).await })
}) {
Ok(emb) => Some(emb),
Err(e) => {
error!(
"Failed to generate embedding for event '{}': {}",
event.summary, e
);
pb.println(format!("embedding failed for '{}': {}", event.summary, e));
None
}
}
@@ -133,28 +121,26 @@ async fn main() -> Result<()> {
};
match dao_instance.store_event(&context, insert_event) {
Ok(_) => {
*inserted_count.lock().unwrap() += 1;
if *inserted_count.lock().unwrap() % 100 == 0 {
info!("Imported {} events...", *inserted_count.lock().unwrap());
}
}
Ok(_) => inserted_count += 1,
Err(e) => {
error!("Failed to store event '{}': {:?}", event.summary, e);
*error_count.lock().unwrap() += 1;
pb.println(format!("store failed for '{}': {:?}", event.summary, e));
error_count += 1;
}
}
pb.set_message(format!(
"inserted={} skipped={} errors={}",
inserted_count, skipped_count, error_count
));
pb.inc(1);
}
let final_inserted = *inserted_count.lock().unwrap();
let final_skipped = *skipped_count.lock().unwrap();
let final_errors = *error_count.lock().unwrap();
pb.finish_and_clear();
info!("\n=== Import Summary ===");
info!("=== Import Summary ===");
info!("Total events found: {}", events.len());
info!("Successfully inserted: {}", final_inserted);
info!("Skipped (already exist): {}", final_skipped);
info!("Errors: {}", final_errors);
info!("Successfully inserted: {}", inserted_count);
info!("Skipped (already exist): {}", skipped_count);
info!("Errors: {}", error_count);
if args.generate_embeddings {
info!("Embeddings were generated for semantic search");
@@ -162,5 +148,12 @@ async fn main() -> Result<()> {
info!("No embeddings generated (use --generate-embeddings to enable semantic search)");
}
if error_count > 0 {
error!(
"Completed with {} errors — review log output above",
error_count
);
}
Ok(())
}
+28 -20
@@ -1,6 +1,7 @@
use anyhow::{Context, Result};
use chrono::Utc;
use clap::Parser;
use image_api::bin_progress;
use image_api::database::location_dao::{InsertLocationRecord, SqliteLocationHistoryDao};
use image_api::parsers::location_json_parser::parse_location_json;
use log::{error, info};
@@ -38,23 +39,20 @@ async fn main() -> Result<()> {
let context = opentelemetry::Context::current();
let mut inserted_count = 0;
let mut skipped_count = 0;
let mut error_count = 0;
let mut inserted_count = 0usize;
let mut skipped_count = 0usize;
let mut error_count = 0usize;
let mut dao_instance = SqliteLocationHistoryDao::new();
let created_at = Utc::now().timestamp();
// Process in batches using batch insert for massive speedup
for (batch_idx, chunk) in locations.chunks(args.batch_size).enumerate() {
info!(
"Processing batch {} ({} records)...",
batch_idx + 1,
chunk.len()
);
let pb = bin_progress::determinate(locations.len() as u64, "importing");
// Process in batches using batch insert for massive speedup
for chunk in locations.chunks(args.batch_size) {
// Convert to InsertLocationRecord
let mut batch_inserts = Vec::with_capacity(chunk.len());
let mut chunk_skipped = 0usize;
for location in chunk {
// Skip existing check if requested (makes import much slower)
@@ -68,6 +66,7 @@ async fn main() -> Result<()> {
&& exists
{
skipped_count += 1;
chunk_skipped += 1;
continue;
}
@@ -89,26 +88,35 @@ async fn main() -> Result<()> {
// Batch insert entire chunk in single transaction
if !batch_inserts.is_empty() {
match dao_instance.store_locations_batch(&context, batch_inserts) {
Ok(count) => {
inserted_count += count;
info!(
"Imported {} locations (total: {})...",
count, inserted_count
);
}
Ok(count) => inserted_count += count,
Err(e) => {
error!("Failed to store batch: {:?}", e);
error_count += chunk.len();
pb.println(format!("batch insert failed: {:?}", e));
error_count += chunk.len() - chunk_skipped;
}
}
}
pb.set_message(format!(
"inserted={} skipped={} errors={}",
inserted_count, skipped_count, error_count
));
pb.inc(chunk.len() as u64);
}
info!("\n=== Import Summary ===");
pb.finish_and_clear();
info!("=== Import Summary ===");
info!("Total locations found: {}", locations.len());
info!("Successfully inserted: {}", inserted_count);
info!("Skipped (already exist): {}", skipped_count);
info!("Errors: {}", error_count);
if error_count > 0 {
error!(
"Completed with {} errors — review log output above",
error_count
);
}
Ok(())
}
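The location import now charges a failed batch insert with `chunk.len() - chunk_skipped` errors instead of `chunk.len()`, so records already counted as skipped are not counted a second time as errors. A minimal sketch of that per-chunk accounting, assuming skips are the only reason a record is left out of the batch (part of the loop body is elided in the hunks above); `Counters` and `account_chunk` are hypothetical names, not part of the binary:

```rust
/// Hypothetical running totals mirroring inserted/skipped/error counters.
#[derive(Debug, Default, PartialEq)]
struct Counters {
    inserted: usize,
    skipped: usize,
    errors: usize,
}

/// Hypothetical helper: fold one chunk's outcome into the totals.
/// `insert_result` stands in for the batch insert's return value
/// (rows written on success, an error message on failure).
fn account_chunk(
    totals: &mut Counters,
    chunk_len: usize,
    chunk_skipped: usize,
    insert_result: Result<usize, String>,
) {
    totals.skipped += chunk_skipped;
    match insert_result {
        Ok(count) => totals.inserted += count,
        // Only the records that were actually attempted count as errors.
        Err(_) => totals.errors += chunk_len - chunk_skipped,
    }
}
```

For a chunk of 100 records with 30 skipped and a failed insert, this yields 30 skipped and 70 errors, which sums to the chunk size. Charging the full `chunk.len()` would have reported 130 outcomes for 100 records.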
+36 -37
@@ -1,10 +1,11 @@
use anyhow::{Context, Result};
use chrono::Utc;
use clap::Parser;
use image_api::ai::ollama::OllamaClient;
use image_api::ai::LocalLlm;
use image_api::bin_progress;
use image_api::database::search_dao::{InsertSearchRecord, SqliteSearchHistoryDao};
use image_api::parsers::search_html_parser::parse_search_html;
use log::{error, info, warn};
use log::{error, info};
// Import the trait to use its methods
use image_api::database::SearchHistoryDao;
@@ -37,46 +38,36 @@ async fn main() -> Result<()> {
info!("Found {} search records", searches.len());
let primary_url = dotenv::var("OLLAMA_PRIMARY_URL")
.or_else(|_| dotenv::var("OLLAMA_URL"))
.unwrap_or_else(|_| "http://localhost:11434".to_string());
let fallback_url = dotenv::var("OLLAMA_FALLBACK_URL").ok();
let primary_model = dotenv::var("OLLAMA_PRIMARY_MODEL")
.or_else(|_| dotenv::var("OLLAMA_MODEL"))
.unwrap_or_else(|_| "nomic-embed-text:v1.5".to_string());
let fallback_model = dotenv::var("OLLAMA_FALLBACK_MODEL").ok();
let ollama = OllamaClient::new(primary_url, fallback_url, primary_model, fallback_model);
// LocalLlm dispatches per LLM_BACKEND, so embeddings written here land
// in the same vector space the query side searches.
let llm = LocalLlm::from_env();
let context = opentelemetry::Context::current();
let mut inserted_count = 0;
let mut skipped_count = 0;
let mut error_count = 0;
let mut inserted_count = 0usize;
let mut skipped_count = 0usize;
let mut error_count = 0usize;
let mut dao_instance = SqliteSearchHistoryDao::new();
let created_at = Utc::now().timestamp();
let pb = bin_progress::determinate(searches.len() as u64, "importing");
let total_batches = searches.len().div_ceil(args.batch_size);
// Process searches in batches (embeddings are REQUIRED for searches)
for (batch_idx, chunk) in searches.chunks(args.batch_size).enumerate() {
info!(
"Processing batch {} ({} searches)...",
batch_idx + 1,
chunk.len()
);
// Generate embeddings for this batch
let queries: Vec<String> = chunk.iter().map(|s| s.query.clone()).collect();
let pb_for_warn = pb.clone();
let embeddings_result = tokio::task::spawn({
let ollama_client = ollama.clone();
let llm = llm.clone();
async move {
// Generate embeddings for the batch, one query at a time
let mut embeddings = Vec::new();
for query in &queries {
match ollama_client.generate_embedding(query).await {
match llm.embed_document(query).await {
Ok(emb) => embeddings.push(Some(emb)),
Err(e) => {
warn!("Failed to generate embedding for query '{}': {}", query, e);
pb_for_warn.println(format!("embedding failed for '{}': {}", query, e));
embeddings.push(None);
}
}
@@ -112,10 +103,7 @@ async fn main() -> Result<()> {
source_file: Some(args.path.clone()),
});
} else {
error!(
"Skipping search '{}' due to missing embedding",
search.query
);
pb.println(format!("skipping '{}' — missing embedding", search.query));
error_count += 1;
}
}
@@ -123,30 +111,41 @@ async fn main() -> Result<()> {
// Batch insert entire chunk in single transaction
if !batch_inserts.is_empty() {
match dao_instance.store_searches_batch(&context, batch_inserts) {
Ok(count) => {
inserted_count += count;
info!("Imported {} searches (total: {})...", count, inserted_count);
}
Ok(count) => inserted_count += count,
Err(e) => {
error!("Failed to store batch: {:?}", e);
pb.println(format!("batch insert failed: {:?}", e));
error_count += chunk.len();
}
}
}
pb.set_message(format!(
"inserted={} skipped={} errors={}",
inserted_count, skipped_count, error_count
));
pb.inc(chunk.len() as u64);
// Rate limiting between batches
if batch_idx < searches.len() / args.batch_size {
info!("Waiting 500ms before next batch...");
if batch_idx + 1 < total_batches {
tokio::time::sleep(tokio::time::Duration::from_millis(500)).await;
}
}
info!("\n=== Import Summary ===");
pb.finish_and_clear();
info!("=== Import Summary ===");
info!("Total searches found: {}", searches.len());
info!("Successfully inserted: {}", inserted_count);
info!("Skipped (already exist): {}", skipped_count);
info!("Errors: {}", error_count);
info!("All imported searches have embeddings for semantic search");
if error_count > 0 {
error!(
"Completed with {} errors — review log output above",
error_count
);
}
Ok(())
}
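The rate-limiting guard in the search import changed from `batch_idx < searches.len() / args.batch_size` to `batch_idx + 1 < total_batches`, where `total_batches` comes from `div_ceil`. The sketch below counts how many 500 ms pauses each condition would take, to show where they differ. The function names are hypothetical, and `batch_size` is assumed to be non-zero (as `chunks` requires).

```rust
/// Pauses taken under the old guard: `batch_idx < len / batch_size`.
fn pauses_old(len: usize, batch_size: usize) -> usize {
    let total_batches = len.div_ceil(batch_size);
    (0..total_batches)
        .filter(|&batch_idx| batch_idx < len / batch_size)
        .count()
}

/// Pauses taken under the new guard: `batch_idx + 1 < total_batches`.
fn pauses_new(len: usize, batch_size: usize) -> usize {
    let total_batches = len.div_ceil(batch_size);
    (0..total_batches)
        .filter(|&batch_idx| batch_idx + 1 < total_batches)
        .count()
}
```

The two agree when the record count is not a multiple of the batch size. When it divides evenly, for example 10 records in batches of 5, the old guard also pauses after the final batch, while the new one pauses only between batches.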
-195
@@ -1,195 +0,0 @@
use std::path::PathBuf;
use std::sync::{Arc, Mutex};
use chrono::Utc;
use clap::Parser;
use rayon::prelude::*;
use walkdir::WalkDir;
use image_api::database::models::InsertImageExif;
use image_api::database::{ExifDao, SqliteExifDao};
use image_api::exif;
#[derive(Parser, Debug)]
#[command(name = "migrate_exif")]
#[command(about = "Extract and store EXIF data from images", long_about = None)]
struct Args {
#[arg(long, help = "Skip files that already have EXIF data in database")]
skip_existing: bool,
}
fn main() -> anyhow::Result<()> {
env_logger::init();
dotenv::dotenv()?;
let args = Args::parse();
let base_path = dotenv::var("BASE_PATH")?;
let base = PathBuf::from(&base_path);
println!("EXIF Migration Tool");
println!("===================");
println!("Base path: {}", base.display());
if args.skip_existing {
println!("Mode: Skip existing (incremental)");
} else {
println!("Mode: Upsert (insert new, update existing)");
}
println!();
// Collect all image files that support EXIF
println!("Scanning for images...");
let image_files: Vec<PathBuf> = WalkDir::new(&base)
.into_iter()
.filter_map(|e| e.ok())
.filter(|e| e.file_type().is_file())
.filter(|e| exif::supports_exif(e.path()))
.map(|e| e.path().to_path_buf())
.collect();
println!("Found {} images to process", image_files.len());
if image_files.is_empty() {
println!("No EXIF-supporting images found. Exiting.");
return Ok(());
}
println!();
println!("Extracting EXIF data...");
// Create a thread-safe DAO
let dao = Arc::new(Mutex::new(SqliteExifDao::new()));
// Process in parallel using rayon
let results: Vec<_> = image_files
.par_iter()
.map(|path| {
// Create context for this processing iteration
let context = opentelemetry::Context::new();
let relative_path = match path.strip_prefix(&base) {
Ok(p) => p.to_str().unwrap().to_string(),
Err(_) => {
eprintln!(
"Error: Could not create relative path for {}",
path.display()
);
return Err(anyhow::anyhow!("Path error"));
}
};
// Check if EXIF data already exists
let existing = if let Ok(mut dao_lock) = dao.lock() {
dao_lock.get_exif(&context, &relative_path).ok().flatten()
} else {
eprintln!("{} - Failed to acquire database lock", relative_path);
return Err(anyhow::anyhow!("Lock error"));
};
// Skip if exists and skip_existing flag is set
if args.skip_existing && existing.is_some() {
return Ok(("skip".to_string(), relative_path));
}
match exif::extract_exif_from_path(path) {
Ok(exif_data) => {
let timestamp = Utc::now().timestamp();
let insert_exif = InsertImageExif {
file_path: relative_path.clone(),
camera_make: exif_data.camera_make,
camera_model: exif_data.camera_model,
lens_model: exif_data.lens_model,
width: exif_data.width,
height: exif_data.height,
orientation: exif_data.orientation,
gps_latitude: exif_data.gps_latitude.map(|v| v as f32),
gps_longitude: exif_data.gps_longitude.map(|v| v as f32),
gps_altitude: exif_data.gps_altitude.map(|v| v as f32),
focal_length: exif_data.focal_length.map(|v| v as f32),
aperture: exif_data.aperture.map(|v| v as f32),
shutter_speed: exif_data.shutter_speed,
iso: exif_data.iso,
date_taken: exif_data.date_taken,
created_time: existing
.as_ref()
.map(|e| e.created_time)
.unwrap_or(timestamp),
last_modified: timestamp,
};
// Store or update in database
if let Ok(mut dao_lock) = dao.lock() {
let result = if existing.is_some() {
// Update existing record
dao_lock
.update_exif(&context, insert_exif)
.map(|_| "update")
} else {
// Insert new record
dao_lock.store_exif(&context, insert_exif).map(|_| "insert")
};
match result {
Ok(action) => {
if action == "update" {
println!("{} (updated)", relative_path);
} else {
println!("{} (inserted)", relative_path);
}
Ok((action.to_string(), relative_path))
}
Err(e) => {
eprintln!("{} - Database error: {:?}", relative_path, e);
Err(anyhow::anyhow!("Database error"))
}
}
} else {
eprintln!("{} - Failed to acquire database lock", relative_path);
Err(anyhow::anyhow!("Lock error"))
}
}
Err(e) => {
eprintln!("{} - No EXIF data: {:?}", relative_path, e);
Err(e)
}
}
})
.collect();
// Count results
let mut success_count = 0;
let mut inserted_count = 0;
let mut updated_count = 0;
let mut skipped_count = 0;
for (action, _) in results.iter().flatten() {
success_count += 1;
match action.as_str() {
"insert" => inserted_count += 1,
"update" => updated_count += 1,
"skip" => skipped_count += 1,
_ => {}
}
}
let error_count = results.len() - success_count - skipped_count;
println!();
println!("===================");
println!("Migration complete!");
println!("Total images processed: {}", image_files.len());
if inserted_count > 0 {
println!(" New EXIF records inserted: {}", inserted_count);
}
if updated_count > 0 {
println!(" Existing records updated: {}", updated_count);
}
if skipped_count > 0 {
println!(" Skipped (already exists): {}", skipped_count);
}
if error_count > 0 {
println!(" Errors (no EXIF data or failures): {}", error_count);
}
Ok(())
}
+138 -38
@@ -1,16 +1,22 @@
use std::path::PathBuf;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex};
use clap::Parser;
use log::warn;
use walkdir::WalkDir;
use image_api::ai::apollo_client::ApolloClient;
use image_api::ai::{InsightGenerator, OllamaClient, SmsApiClient};
use image_api::bin_progress;
use image_api::database::{
CalendarEventDao, DailySummaryDao, ExifDao, InsightDao, KnowledgeDao, LocationHistoryDao,
SearchHistoryDao, SqliteCalendarEventDao, SqliteDailySummaryDao, SqliteExifDao,
SqliteInsightDao, SqliteKnowledgeDao, SqliteLocationHistoryDao, SqliteSearchHistoryDao,
connect,
};
use image_api::faces::{FaceDao, SqliteFaceDao};
use image_api::file_types::{IMAGE_EXTENSIONS, VIDEO_EXTENSIONS};
use image_api::libraries::{self, Library};
use image_api::tags::{SqliteTagDao, TagDao};
#[derive(Parser, Debug)]
@@ -19,7 +25,13 @@ use image_api::tags::{SqliteTagDao, TagDao};
about = "Batch populate the knowledge base by running the agentic insight loop over a folder"
)]
struct Args {
/// Directory to scan. Defaults to BASE_PATH from .env
/// Restrict to a single library by numeric id or name. Defaults to all
/// configured libraries.
#[arg(long)]
library: Option<String>,
/// Optional subdirectory to scan instead of full library roots. Must be
/// an absolute path under one of the selected libraries.
#[arg(long)]
path: Option<String>,
@@ -67,10 +79,57 @@ async fn main() -> anyhow::Result<()> {
let args = Args::parse();
let base_path = dotenv::var("BASE_PATH")?;
let scan_path = args.path.as_deref().unwrap_or(&base_path).to_string();
// Load libraries from the DB. Patch the placeholder row from BASE_PATH
// first when present so a fresh install still gets a valid root.
let env_base_path = dotenv::var("BASE_PATH").ok();
let mut seed_conn = connect();
if let Some(base) = env_base_path.as_deref() {
libraries::seed_or_patch_from_env(&mut seed_conn, base);
}
let all_libs = libraries::load_all(&mut seed_conn);
drop(seed_conn);
if all_libs.is_empty() {
anyhow::bail!("No libraries configured");
}
// Ollama config from env with CLI overrides
// Resolve --library to a concrete subset.
let selected_libs: Vec<Library> = match args.library.as_deref() {
None => all_libs.clone(),
Some(raw) => {
let raw = raw.trim();
let matched = if let Ok(id) = raw.parse::<i32>() {
all_libs.iter().find(|l| l.id == id).cloned()
} else {
all_libs.iter().find(|l| l.name == raw).cloned()
};
match matched {
Some(lib) => vec![lib],
None => anyhow::bail!("Unknown library: {}", raw),
}
}
};
// Resolve --path to (target_library, walk_root). When provided, the path
// must live under exactly one of the selected libraries.
let scan_targets: Vec<(Library, PathBuf)> = match args.path.as_deref() {
None => selected_libs
.iter()
.map(|lib| (lib.clone(), PathBuf::from(&lib.root_path)))
.collect(),
Some(raw) => {
let abs = PathBuf::from(raw);
let matched = selected_libs
.iter()
.find(|lib| abs.starts_with(&lib.root_path))
.cloned();
match matched {
Some(lib) => vec![(lib, abs)],
None => anyhow::bail!("--path {} is not under any selected library root", raw),
}
}
};
// Ollama config from env with CLI overrides.
let primary_url = std::env::var("OLLAMA_PRIMARY_URL")
.or_else(|_| std::env::var("OLLAMA_URL"))
.unwrap_or_else(|_| "http://localhost:11434".to_string());
@@ -106,8 +165,8 @@ async fn main() -> anyhow::Result<()> {
std::env::var("SMS_API_URL").unwrap_or_else(|_| "http://localhost:8000".to_string());
let sms_api_token = std::env::var("SMS_API_TOKEN").ok();
let sms_client = SmsApiClient::new(sms_api_url, sms_api_token);
let apollo_client = ApolloClient::new(std::env::var("APOLLO_API_BASE_URL").ok());
// Wire up all DAOs
let insight_dao: Arc<Mutex<Box<dyn InsightDao>>> =
Arc::new(Mutex::new(Box::new(SqliteInsightDao::new())));
let exif_dao: Arc<Mutex<Box<dyn ExifDao>>> =
@@ -124,10 +183,21 @@ async fn main() -> anyhow::Result<()> {
Arc::new(Mutex::new(Box::new(SqliteTagDao::default())));
let knowledge_dao: Arc<Mutex<Box<dyn KnowledgeDao>>> =
Arc::new(Mutex::new(Box::new(SqliteKnowledgeDao::new())));
let face_dao: Arc<Mutex<Box<dyn FaceDao>>> =
Arc::new(Mutex::new(Box::new(SqliteFaceDao::new())));
let persona_dao: Arc<Mutex<Box<dyn image_api::database::PersonaDao>>> = Arc::new(Mutex::new(
Box::new(image_api::database::SqlitePersonaDao::new()),
));
// Pass the full library set so `resolve_full_path` probes every root,
// even when --library restricts the walk. A rel_path shared across
// libraries will resolve against the first existing match.
let generator = InsightGenerator::new(
ollama,
None,
None,
sms_client,
apollo_client,
insight_dao.clone(),
exif_dao,
daily_summary_dao,
@@ -135,13 +205,18 @@ async fn main() -> anyhow::Result<()> {
location_dao,
search_dao,
tag_dao,
face_dao,
knowledge_dao,
base_path.clone(),
persona_dao,
all_libs.clone(),
);
println!("Knowledge Base Population");
println!("=========================");
println!("Scan path: {}", scan_path);
for (lib, root) in &scan_targets {
println!("Library: {} (id={})", lib.name, lib.id);
println!("Scan root: {}", root.display());
}
println!("Model: {}", primary_model);
println!("Max iterations: {}", args.max_iterations);
println!("Timeout: {}s", args.timeout_secs);
@@ -170,30 +245,56 @@ async fn main() -> anyhow::Result<()> {
);
println!();
// Collect all image and video files
let all_extensions: Vec<&str> = IMAGE_EXTENSIONS
.iter()
.chain(VIDEO_EXTENSIONS.iter())
.copied()
.collect();
println!("Scanning {}...", scan_path);
let files: Vec<PathBuf> = WalkDir::new(&scan_path)
.into_iter()
.filter_map(|e| e.ok())
.filter(|e| e.file_type().is_file())
.filter(|e| {
e.path()
// Collect (library, abs_path, rel_path) for every media file across all
// scan targets so the progress counter spans the full job.
let mut files: Vec<(Library, PathBuf, String)> = Vec::new();
for (lib, walk_root) in &scan_targets {
let lib_root = Path::new(&lib.root_path);
let scan_pb = bin_progress::spinner(format!("scanning {}", walk_root.display()));
let count_before = files.len();
for entry in WalkDir::new(walk_root).into_iter().filter_map(|e| e.ok()) {
if !entry.file_type().is_file() {
continue;
}
let abs_path = entry.path().to_path_buf();
let ext_ok = abs_path
.extension()
.and_then(|ext| ext.to_str())
.map(|ext| all_extensions.contains(&ext.to_lowercase().as_str()))
.unwrap_or(false)
})
.map(|e| e.path().to_path_buf())
.collect();
.unwrap_or(false);
if !ext_ok {
continue;
}
let rel = match abs_path.strip_prefix(lib_root) {
Ok(p) => p.to_string_lossy().replace('\\', "/"),
Err(_) => {
warn!(
"{} is not under library root {}; skipping",
abs_path.display(),
lib_root.display()
);
continue;
}
};
files.push((lib.clone(), abs_path, rel));
scan_pb.inc(1);
}
let added = files.len() - count_before;
scan_pb.finish_with_message(format!(
"scanned {} ({} media files)",
walk_root.display(),
added
));
}
let total = files.len();
println!("Found {} files\n", total);
println!("\nTotal files to consider: {}\n", total);
if total == 0 {
println!("Nothing to process.");
@@ -205,35 +306,29 @@ async fn main() -> anyhow::Result<()> {
let mut skipped = 0usize;
let mut errors = 0usize;
for (i, path) in files.iter().enumerate() {
let relative = match path.strip_prefix(&base_path) {
Ok(p) => p.to_string_lossy().replace('\\', "/"),
Err(_) => path.to_string_lossy().replace('\\', "/"),
};
let pb = bin_progress::determinate(total as u64, "");
let prefix = format!("[{}/{}]", i + 1, total);
for (lib, _abs_path, relative) in files.iter() {
pb.set_message(format!("{}: {}", lib.name, relative));
// Check for existing insight unless --reprocess
if !args.reprocess {
let has_insight = insight_dao
.lock()
.unwrap()
.get_insight(&cx, &relative)
.get_insight(&cx, relative)
.unwrap_or(None)
.is_some();
if has_insight {
println!("{} skip {}", prefix, relative);
skipped += 1;
pb.inc(1);
continue;
}
}
println!("{} start {}", prefix, relative);
match generator
.generate_agentic_insight_for_photo(
&relative,
relative,
args.model.clone(),
None,
args.num_ctx,
@@ -242,20 +337,25 @@ async fn main() -> anyhow::Result<()> {
args.top_k,
args.min_p,
args.max_iterations,
None,
Vec::new(),
Vec::new(),
1, // operator user_id — populate_knowledge is single-user offline tool
"default".to_string(),
)
.await
{
Ok(_) => {
println!("{} done {}", prefix, relative);
processed += 1;
}
Ok(_) => processed += 1,
Err(e) => {
eprintln!("{} error {} - {:?}", prefix, relative, e);
pb.println(format!("error {}: {} - {:?}", lib.name, relative, e));
errors += 1;
}
}
pb.inc(1);
}
pb.finish_and_clear();
println!();
println!("=========================");
println!("Complete");
+273
@@ -0,0 +1,273 @@
//! Probe binary for CLIP semantic search.
//!
//! No DB writes. Walks a library's `image_exif` rows, encodes a sample
//! via Apollo's `/encode_image`, encodes the user's --query via
//! `/encode_text`, and prints the top-K most similar photos by cosine
//! similarity so the operator can eyeball quality before committing to
//! the persistence phase (column populated by backlog drain, search
//! endpoint, UI).
//!
//! Usage:
//! cargo run --release --bin probe_clip_search -- \
//! --library 1 --limit 200 --query "a beach at sunset" --top 10
//!
//! Env: standard ImageApi `.env`. Requires either
//! `APOLLO_CLIP_API_BASE_URL` or `APOLLO_API_BASE_URL` to be set.
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex};
use std::time::Instant;
use clap::Parser;
use log::{info, warn};
use image_api::ai::clip_client::{ClipClient, ClipError, EncodeImageMeta};
use image_api::database::{ExifDao, SqliteExifDao, connect};
use image_api::exif;
use image_api::file_types;
use image_api::libraries::{self, Library};
#[derive(Parser, Debug)]
#[command(name = "probe_clip_search")]
#[command(about = "Top-K CLIP semantic search over a sample of image_exif rows")]
struct Args {
/// Library id to sample from.
#[arg(long)]
library: i32,
/// Max files to encode. CPU inference is slow (~1-3 s per photo at
/// ViT-L/14); start small and grow once GPU is sorted.
#[arg(long, default_value_t = 50)]
limit: usize,
/// Natural-language query. Must be non-empty; `main` rejects a blank
/// query before any request is sent to Apollo.
#[arg(long)]
query: String,
/// How many top results to print.
#[arg(long, default_value_t = 10)]
top: usize,
/// Offset into the library's rel_path listing.
#[arg(long, default_value_t = 0)]
offset: i64,
/// How many DB rows to scan before giving up on hitting the limit.
#[arg(long, default_value_t = 5000)]
max_scan: i64,
}
/// Same as `face_watch::read_image_bytes_for_detect` (which is pub(crate)).
/// Inlined for the throwaway probe.
fn read_image_bytes(path: &Path) -> std::io::Result<Vec<u8>> {
if file_types::needs_ffmpeg_thumbnail(path)
&& let Some(preview) = exif::extract_embedded_jpeg_preview(path)
{
return Ok(preview);
}
std::fs::read(path)
}
/// Decode a base64'd LE float32 vector to a `Vec<f32>`.
fn decode_f32_vec(b64: &str) -> anyhow::Result<Vec<f32>> {
use base64::Engine;
let bytes = base64::engine::general_purpose::STANDARD.decode(b64.as_bytes())?;
if bytes.len() % 4 != 0 {
anyhow::bail!("embedding byte length {} not divisible by 4", bytes.len());
}
let mut out = Vec::with_capacity(bytes.len() / 4);
for chunk in bytes.chunks_exact(4) {
out.push(f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
}
Ok(out)
}
/// Plain dot product. Apollo L2-normalizes both sides, so this is cosine sim.
fn dot(a: &[f32], b: &[f32]) -> f32 {
a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}
#[tokio::main]
async fn main() -> anyhow::Result<()> {
env_logger::init();
dotenv::dotenv().ok();
let args = Args::parse();
if args.query.trim().is_empty() {
anyhow::bail!("--query must not be empty");
}
let client = ClipClient::from_env();
if !client.is_enabled() {
anyhow::bail!(
"ClipClient disabled: set APOLLO_CLIP_API_BASE_URL or APOLLO_API_BASE_URL in .env"
);
}
match client.health().await {
Ok(h) => info!(
"clip engine: loaded={} device={} model={} dim={}",
h.loaded, h.device, h.model_version, h.embedding_dim
),
Err(e) => warn!("health probe failed (continuing): {e}"),
}
let mut seed_conn = connect();
if let Some(base) = dotenv::var("BASE_PATH").ok().as_deref() {
libraries::seed_or_patch_from_env(&mut seed_conn, base);
}
let libs = libraries::load_all(&mut seed_conn);
drop(seed_conn);
let lib: Library = libs
.into_iter()
.find(|l| l.id == args.library)
.ok_or_else(|| anyhow::anyhow!("library id {} not found", args.library))?;
info!(
"probing library #{} ({}) at {}",
lib.id, lib.name, lib.root_path
);
let dao: Arc<Mutex<Box<dyn ExifDao>>> = Arc::new(Mutex::new(Box::new(SqliteExifDao::new())));
let ctx = opentelemetry::Context::new();
// Encode the query up-front so an unreachable text endpoint or a
// rejected query fails fast, before the long image-encode loop starts.
let query_resp = client
.encode_text(&args.query)
.await
.map_err(|e| anyhow::anyhow!("encode_text: {e}"))?;
let query_vec = decode_f32_vec(&query_resp.embedding)?;
info!(
"query encoded ({}d, {}ms): {:?}",
query_resp.embedding_dim, query_resp.duration_ms, args.query
);
// Page through (id, rel_path), filter to images on disk, encode up
// to `limit`. Each encoded photo gets scored against the query; all
// scores are kept and sorted once at the end.
const PAGE: i64 = 500;
let mut offset = args.offset;
let mut scanned: i64 = 0;
let mut encoded = 0usize;
let mut perm_fail = 0usize;
let mut transient_fail = 0usize;
let root = PathBuf::from(&lib.root_path);
let started = Instant::now();
// (similarity, rel_path) — we keep all scored results and sort at
// the end. With limit≤few-hundred this is trivial.
let mut scores: Vec<(f32, String)> = Vec::with_capacity(args.limit);
'outer: loop {
if scanned >= args.max_scan {
warn!(
"scan cap ({}) reached before hitting limit ({}); bump --max-scan to scan deeper",
args.max_scan, args.limit
);
break;
}
let rows = {
let mut guard = dao.lock().expect("dao lock");
guard
.list_rel_paths_for_library_page(&ctx, lib.id, PAGE, offset)
.map_err(|e| anyhow::anyhow!("list rel_paths: {:?}", e))?
};
if rows.is_empty() {
info!("no more rows after offset {}", offset);
break;
}
offset += rows.len() as i64;
scanned += rows.len() as i64;
for (_id, rel_path) in rows {
if encoded >= args.limit {
break 'outer;
}
let abs = root.join(&rel_path);
if !file_types::is_image_file(&abs) || !abs.exists() {
continue;
}
let bytes = match read_image_bytes(&abs) {
Ok(b) => b,
Err(e) => {
warn!("read {rel_path}: {e}");
continue;
}
};
let meta = EncodeImageMeta {
content_hash: String::new(),
library_id: lib.id,
rel_path: rel_path.clone(),
};
let call_start = Instant::now();
match client.encode_image(bytes, meta).await {
Ok(resp) => {
encoded += 1;
let vec = match decode_f32_vec(&resp.embedding) {
Ok(v) => v,
Err(e) => {
warn!("decode {rel_path}: {e}");
continue;
}
};
if vec.len() != query_vec.len() {
warn!(
"dim mismatch for {rel_path}: image={} query={}",
vec.len(),
query_vec.len()
);
continue;
}
let sim = dot(&vec, &query_vec);
scores.push((sim, rel_path.clone()));
if encoded.is_multiple_of(10) {
info!(
"progress: {} encoded, {:.1}s elapsed",
encoded,
started.elapsed().as_secs_f32()
);
}
let _ = call_start;
}
Err(ClipError::Permanent(e)) => {
perm_fail += 1;
warn!("permanent encode failure for {rel_path}: {e}");
}
Err(ClipError::Transient(e)) => {
transient_fail += 1;
warn!("transient encode failure for {rel_path}: {e}");
}
Err(ClipError::Disabled) => {
anyhow::bail!("clip client became disabled mid-run; impossible");
}
}
}
}
scores.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal));
let elapsed = started.elapsed();
println!();
println!(
"── top {} for query: {:?} ──",
args.top.min(scores.len()),
args.query
);
for (i, (sim, path)) in scores.iter().take(args.top).enumerate() {
println!("[{:>2}] sim={:.3} {}", i + 1, sim, path);
}
println!();
println!("── summary ─────────────────────────────────────");
println!("query : {:?}", args.query);
println!("scanned rows : {scanned}");
println!("encoded photos : {encoded}");
println!("permanent failures : {perm_fail}");
println!("transient failures : {transient_fail}");
println!("elapsed : {:.1}s", elapsed.as_secs_f32());
if encoded > 0 {
println!(
"throughput : {:.2} photos/s ({:.0}ms/photo avg)",
encoded as f32 / elapsed.as_secs_f32().max(0.001),
elapsed.as_millis() as f32 / encoded as f32
);
}
Ok(())
}
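The ranking step of the probe can be tried without Apollo or the database. What follows is a minimal sketch, assuming the embeddings are already decoded into `Vec<f32>`; `l2_normalize`, `top_k`, and the sample paths are illustrative names, not part of the probe binary. It shows why a plain dot product is enough once both sides have unit length.

```rust
use std::cmp::Ordering;

/// Scale a vector to unit length. A zero vector is returned unchanged.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

/// Plain dot product; equals cosine similarity for unit-length inputs.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Score every (path, embedding) pair against `query` and return the
/// best `k`, highest similarity first. Entries whose dimension differs
/// from the query are skipped.
fn top_k<'a>(query: &[f32], items: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<(f32, &'a str)> {
    let mut scores: Vec<(f32, &str)> = items
        .iter()
        .filter(|(_, v)| v.len() == query.len())
        .map(|(path, v)| (dot(query, v), *path))
        .collect();
    // Descending by similarity; NaN comparisons fall back to Equal.
    scores.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(Ordering::Equal));
    scores.truncate(k);
    scores
}
```

With a query along the first axis, `top_k(&query, &items, 2)` ranks the vector pointing the same way first and the diagonal one second. Sorting the full list is fine at probe scale; a bounded heap would only pay off for much larger samples.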
+465
@@ -0,0 +1,465 @@
//! Re-embed stored corpora through `LocalLlm`, i.e. the same
//! `LLM_BACKEND` dispatch the query side uses. The original import /
//! backfill tools always embedded via Ollama, so a deploy running
//! `LLM_BACKEND=llamacpp` queries vector spaces the corpora may not live
//! in. Four tables share the problem and are all covered here:
//!
//! - `daily_conversation_summaries` — re-embeds
//! `strip_summary_boilerplate(summary)` (what the original job fed the
//! embedder); also rewrites `model_version`.
//! - `calendar_events` — re-embeds "summary description location" exactly
//! as `import_calendar` does; rows without an embedding are skipped (the
//! import only embeds under `--generate-embeddings`).
//! - `search_history` — re-embeds the raw query text.
//! - `entities` (knowledge graph) — re-embeds "name description" exactly as
//! `tool_store_entity` does; embedding-less rows are skipped (embedding
//! is best-effort at store time).
//!
//! Source text is untouched — only vectors are rewritten. The old↔new
//! cosine report doubles as a diagnostic: ~1.0 means both backends already
//! shared a space (re-embedding was a no-op); low values confirm the
//! mismatch this tool exists to fix.
use anyhow::{Context, Result};
use clap::Parser;
use diesel::prelude::*;
use diesel::sql_query;
use diesel::sqlite::SqliteConnection;
use image_api::ai::{LocalLlm, strip_summary_boilerplate};
use image_api::bin_progress;
use std::env;
#[derive(Parser, Debug)]
#[command(author, version, about = "Re-embed stored corpora via the configured LLM_BACKEND", long_about = None)]
struct Args {
/// Comma-separated tables to process: summaries, calendar, search, entities
#[arg(long, default_value = "summaries,calendar,search,entities")]
tables: String,
/// Only process the first N rows per table (smoke test)
#[arg(long)]
limit: Option<usize>,
/// Compute embeddings and report old↔new similarity without writing
#[arg(long, default_value_t = false)]
dry_run: bool,
}
#[derive(QueryableByName)]
struct SummaryRow {
#[diesel(sql_type = diesel::sql_types::Integer)]
id: i32,
#[diesel(sql_type = diesel::sql_types::Text)]
summary: String,
#[diesel(sql_type = diesel::sql_types::Binary)]
embedding: Vec<u8>,
#[diesel(sql_type = diesel::sql_types::Text)]
model_version: String,
}
#[derive(QueryableByName)]
struct CalendarRow {
#[diesel(sql_type = diesel::sql_types::Integer)]
id: i32,
#[diesel(sql_type = diesel::sql_types::Text)]
summary: String,
#[diesel(sql_type = diesel::sql_types::Nullable<diesel::sql_types::Text>)]
description: Option<String>,
#[diesel(sql_type = diesel::sql_types::Nullable<diesel::sql_types::Text>)]
location: Option<String>,
#[diesel(sql_type = diesel::sql_types::Binary)]
embedding: Vec<u8>,
}
#[derive(QueryableByName)]
struct SearchRow {
#[diesel(sql_type = diesel::sql_types::BigInt)]
id: i64,
#[diesel(sql_type = diesel::sql_types::Text)]
query: String,
#[diesel(sql_type = diesel::sql_types::Binary)]
embedding: Vec<u8>,
}
#[derive(QueryableByName)]
struct EntityRow {
#[diesel(sql_type = diesel::sql_types::Integer)]
id: i32,
#[diesel(sql_type = diesel::sql_types::Text)]
name: String,
#[diesel(sql_type = diesel::sql_types::Text)]
description: String,
#[diesel(sql_type = diesel::sql_types::Binary)]
embedding: Vec<u8>,
}
/// One unit of re-embed work, normalized across tables.
struct WorkItem {
/// Row key, as i64 so both i32 ids and rowids fit.
id: i64,
/// Text fed to the embedder — must match what the original writer used.
text: String,
/// Existing vector bytes, for the old↔new similarity report.
old_embedding: Vec<u8>,
}
fn deserialize_vector(bytes: &[u8]) -> Option<Vec<f32>> {
if !bytes.len().is_multiple_of(4) {
return None;
}
Some(
bytes
.chunks_exact(4)
.map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
.collect(),
)
}
fn serialize_vector(vec: &[f32]) -> Vec<u8> {
vec.iter().flat_map(|f| f.to_le_bytes()).collect()
}
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
if a.len() != b.len() {
return 0.0;
}
let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
let mag_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
let mag_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
if mag_a == 0.0 || mag_b == 0.0 {
return 0.0;
}
dot / (mag_a * mag_b)
}
/// Embed `text`, halving it on "input too large" errors until it fits the
/// server's physical batch (`--ubatch-size`). Mirrors the silent truncation
/// Ollama applied when these corpora were first embedded — llama-server
/// returns a 500 instead — except here it's surfaced via the returned flag.
/// Returns `(embedding, truncated)`.
async fn embed_with_truncation(llm: &LocalLlm, text: &str) -> Result<(Vec<f32>, bool)> {
let mut text = text.to_string();
let mut truncated = false;
loop {
match llm.embed_document(&text).await {
Ok(emb) => return Ok((emb, truncated)),
Err(e)
if e.to_string().contains("too large to process") && text.chars().count() > 64 =>
{
let keep = text.chars().count() / 2;
text = text.chars().take(keep).collect();
truncated = true;
}
Err(e) => return Err(e),
}
}
}
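The halving loop above can be exercised in isolation. This is a minimal sketch, assuming a synchronous closure in place of `LocalLlm::embed_document`; `embed_with_halving`, `toy_embed`, the 100-character limit, and the error strings are hypothetical stand-ins chosen for the demonstration.

```rust
/// Call `embed` on `text`, halving the text (by characters, so a
/// multi-byte code point is never split) each time the embedder reports
/// an oversized input. Gives up once the text is `min_chars` or shorter.
/// Returns the embedding and whether any truncation happened.
fn embed_with_halving<F>(
    mut embed: F,
    text: &str,
    min_chars: usize,
) -> Result<(Vec<f32>, bool), String>
where
    F: FnMut(&str) -> Result<Vec<f32>, String>,
{
    let mut text = text.to_string();
    let mut truncated = false;
    loop {
        match embed(&text) {
            Ok(v) => return Ok((v, truncated)),
            Err(e) if e.contains("too large") && text.chars().count() > min_chars => {
                let keep = text.chars().count() / 2;
                text = text.chars().take(keep).collect();
                truncated = true;
            }
            // Any other error, or an input already at the floor, is fatal.
            Err(e) => return Err(e),
        }
    }
}

/// Toy embedder (an assumption for this example): rejects inputs longer
/// than 100 characters and otherwise returns the length as a 1-d vector.
fn toy_embed(text: &str) -> Result<Vec<f32>, String> {
    let n = text.chars().count();
    if n > 100 {
        Err("input is too large to process".to_string())
    } else {
        Ok(vec![n as f32])
    }
}
```

A 450-character input is cut to 225, then 112, then 56 characters before `toy_embed` accepts it, and the returned flag is `true`. Returning the flag rather than logging inside the helper leaves the caller free to count and report truncations.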
/// Re-embed `items`, writing each new vector via `update`. Returns the
/// old↔new cosines for the similarity report.
async fn reembed_table(
conn: &mut SqliteConnection,
llm: &LocalLlm,
label: &str,
items: Vec<WorkItem>,
dry_run: bool,
update: impl Fn(&mut SqliteConnection, i64, Vec<u8>) -> Result<()>,
) -> Result<Vec<f32>> {
println!("\n[{}] re-embedding {} rows...", label, items.len());
let pb = bin_progress::determinate(items.len() as u64, format!("re-embedding {}", label));
let mut sims: Vec<f32> = Vec::with_capacity(items.len());
let mut updated = 0usize;
let mut failed = 0usize;
let mut truncated_count = 0usize;
for item in &items {
let new_emb = match embed_with_truncation(llm, &item.text).await {
Ok((e, truncated)) => {
if truncated {
truncated_count += 1;
pb.println(format!(
"⚠ {} id={}: input exceeded the embed server's batch size, \
truncated before embedding",
label, item.id
));
}
e
}
Err(e) => {
pb.inc(1);
failed += 1;
eprintln!("{} id={}: {}", label, item.id, e);
continue;
}
};
// The whole pipeline (DAO checks, stored corpora) assumes
// EMBEDDING_DIM dims. A mismatch means the active embed slot is not
// serving the configured model — stop rather than corrupt the table.
anyhow::ensure!(
new_emb.len() == image_api::ai::embedding_dim(),
"backend returned {}-dim embedding (expected {}) — '{}' does not \
match the configured EMBEDDING_DIM",
new_emb.len(),
image_api::ai::embedding_dim(),
llm.embedding_model_version()
);
if let Some(old_emb) = deserialize_vector(&item.old_embedding) {
sims.push(cosine_similarity(&old_emb, &new_emb));
}
if !dry_run {
update(conn, item.id, serialize_vector(&new_emb))
.with_context(|| format!("updating {} id={}", label, item.id))?;
}
updated += 1;
pb.inc(1);
}
pb.finish_and_clear();
println!(
"[{}] {} re-embedded ({} truncated), {} failed",
label, updated, truncated_count, failed
);
Ok(sims)
}
fn report_similarity(label: &str, mut sims: Vec<f32>) {
if sims.is_empty() {
println!("[{}] no old↔new pairs to compare", label);
return;
}
sims.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
let mean: f32 = sims.iter().sum::<f32>() / sims.len() as f32;
let median = sims[sims.len() / 2];
println!(
"[{}] old↔new cosine over identical text: min={:.3} median={:.3} mean={:.3} max={:.3}",
label,
sims.first().unwrap(),
median,
mean,
sims.last().unwrap()
);
if median > 0.98 {
println!(
"[{}] → old and new backends agree (~same vector space); poor search \
results are coming from something else (prefixes, thresholds, corpus).",
label
);
} else if median > 0.9 {
println!(
"[{}] → same model family but measurably different vectors \
(quantization / runtime drift); re-embedding was worthwhile.",
label
);
} else {
println!(
"[{}] → vector-space mismatch confirmed — queries were searching a \
different space than the corpus. This re-embed should fix it.",
label
);
}
}
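The statistics and thresholds behind this report can be sketched as pure functions, which makes them easy to check by hand. The names `summarize`, `classify`, and `Verdict` are illustrative and do not appear in the tool; the 0.98 and 0.9 cut-offs are the ones used in `report_similarity`.

```rust
use std::cmp::Ordering;

#[derive(Debug, PartialEq)]
enum Verdict {
    /// Median above 0.98: both backends already share a vector space.
    SameSpace,
    /// Median above 0.9: same model family, measurably different vectors.
    Drift,
    /// Anything lower: queries and corpus lived in different spaces.
    Mismatch,
}

/// Returns (min, median, mean, max), or None for an empty slice.
/// For an even count the median is the upper of the two middle values.
fn summarize(sims: &[f32]) -> Option<(f32, f32, f32, f32)> {
    if sims.is_empty() {
        return None;
    }
    let mut sorted = sims.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap_or(Ordering::Equal));
    let mean = sorted.iter().sum::<f32>() / sorted.len() as f32;
    let median = sorted[sorted.len() / 2];
    Some((sorted[0], median, mean, sorted[sorted.len() - 1]))
}

/// Map a median old-vs-new cosine onto the three report outcomes.
fn classify(median: f32) -> Verdict {
    if median > 0.98 {
        Verdict::SameSpace
    } else if median > 0.9 {
        Verdict::Drift
    } else {
        Verdict::Mismatch
    }
}
```

Using the median rather than the mean keeps a handful of truncated or outlier rows from flipping the verdict.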
#[tokio::main]
async fn main() -> Result<()> {
dotenv::dotenv().ok();
env_logger::init();
let args = Args::parse();
let tables: Vec<&str> = args.tables.split(',').map(|t| t.trim()).collect();
for t in &tables {
anyhow::ensure!(
matches!(*t, "summaries" | "calendar" | "search" | "entities"),
"unknown table '{}' — expected summaries, calendar, search, entities",
t
);
}
let database_url = env::var("DATABASE_URL").unwrap_or_else(|_| "auth.db".to_string());
println!("Database: {}", database_url);
let mut conn = SqliteConnection::establish(&database_url)
.with_context(|| format!("connecting to {}", database_url))?;
let llm = LocalLlm::from_env();
let model_version = llm.embedding_model_version();
println!("Embedding via '{}'", model_version);
if args.dry_run {
println!("DRY RUN — no rows will be written");
}
if tables.contains(&"summaries") {
let mut rows: Vec<SummaryRow> = sql_query(
"SELECT id, summary, embedding, model_version
FROM daily_conversation_summaries ORDER BY date",
)
.load(&mut conn)
.context("loading daily summaries")?;
if let Some(limit) = args.limit {
rows.truncate(limit);
}
if let Some(first) = rows.first() {
println!(
"\n[summaries] previous model_version '{}' → '{}'",
first.model_version, model_version
);
}
let items = rows
.into_iter()
.map(|r| WorkItem {
id: r.id as i64,
text: strip_summary_boilerplate(&r.summary),
old_embedding: r.embedding,
})
.collect();
let mv = model_version.clone();
let sims = reembed_table(
&mut conn,
&llm,
"summaries",
items,
args.dry_run,
move |conn, id, emb| {
sql_query(
"UPDATE daily_conversation_summaries
SET embedding = ?1, model_version = ?2 WHERE id = ?3",
)
.bind::<diesel::sql_types::Binary, _>(emb)
.bind::<diesel::sql_types::Text, _>(&mv)
.bind::<diesel::sql_types::Integer, _>(id as i32)
.execute(conn)?;
Ok(())
},
)
.await?;
report_similarity("summaries", sims);
}
if tables.contains(&"calendar") {
let mut rows: Vec<CalendarRow> = sql_query(
"SELECT id, summary, description, location, embedding
FROM calendar_events WHERE embedding IS NOT NULL ORDER BY id",
)
.load(&mut conn)
.context("loading calendar events")?;
if let Some(limit) = args.limit {
rows.truncate(limit);
}
let items = rows
.into_iter()
.map(|r| WorkItem {
id: r.id as i64,
// Same text construction as import_calendar.
text: format!(
"{} {} {}",
r.summary,
r.description.as_deref().unwrap_or(""),
r.location.as_deref().unwrap_or("")
),
old_embedding: r.embedding,
})
.collect();
let sims = reembed_table(
&mut conn,
&llm,
"calendar",
items,
args.dry_run,
|conn, id, emb| {
sql_query("UPDATE calendar_events SET embedding = ?1 WHERE id = ?2")
.bind::<diesel::sql_types::Binary, _>(emb)
.bind::<diesel::sql_types::Integer, _>(id as i32)
.execute(conn)?;
Ok(())
},
)
.await?;
report_similarity("calendar", sims);
}
if tables.contains(&"search") {
let mut rows: Vec<SearchRow> = sql_query(
"SELECT rowid AS id, query, embedding
FROM search_history ORDER BY rowid",
)
.load(&mut conn)
.context("loading search history")?;
if let Some(limit) = args.limit {
rows.truncate(limit);
}
let items = rows
.into_iter()
.map(|r| WorkItem {
id: r.id,
text: r.query,
old_embedding: r.embedding,
})
.collect();
let sims = reembed_table(
&mut conn,
&llm,
"search",
items,
args.dry_run,
|conn, id, emb| {
sql_query("UPDATE search_history SET embedding = ?1 WHERE rowid = ?2")
.bind::<diesel::sql_types::Binary, _>(emb)
.bind::<diesel::sql_types::BigInt, _>(id)
.execute(conn)?;
Ok(())
},
)
.await?;
report_similarity("search", sims);
}
if tables.contains(&"entities") {
let mut rows: Vec<EntityRow> = sql_query(
"SELECT id, name, description, embedding
FROM entities WHERE embedding IS NOT NULL ORDER BY id",
)
.load(&mut conn)
.context("loading knowledge entities")?;
if let Some(limit) = args.limit {
rows.truncate(limit);
}
let items = rows
.into_iter()
.map(|r| WorkItem {
id: r.id as i64,
// Same text construction as tool_store_entity.
text: format!("{} {}", r.name, r.description),
old_embedding: r.embedding,
})
.collect();
let sims = reembed_table(
&mut conn,
&llm,
"entities",
items,
args.dry_run,
|conn, id, emb| {
sql_query("UPDATE entities SET embedding = ?1 WHERE id = ?2")
.bind::<diesel::sql_types::Binary, _>(emb)
.bind::<diesel::sql_types::Integer, _>(id as i32)
.execute(conn)?;
Ok(())
},
)
.await?;
report_similarity("entities", sims);
}
println!(
"\n{}",
if args.dry_run {
"Dry run complete"
} else {
"Done"
}
);
Ok(())
}
+50 -60
@@ -1,7 +1,10 @@
use anyhow::Result;
use chrono::NaiveDate;
use clap::Parser;
use image_api::ai::{OllamaClient, SmsApiClient, strip_summary_boilerplate};
use image_api::ai::{
EMBEDDING_MODEL, OllamaClient, SmsApiClient, build_daily_summary_prompt,
strip_summary_boilerplate, user_display_name,
};
use image_api::database::{DailySummaryDao, InsertDailySummary, SqliteDailySummaryDao};
use std::env;
use std::sync::{Arc, Mutex};
@@ -25,6 +28,26 @@ struct Args {
#[arg(short, long)]
model: Option<String>,
/// Context window size passed as Ollama `num_ctx`. Omit for server default.
#[arg(long)]
num_ctx: Option<i32>,
/// Sampling temperature. Omit for server default.
#[arg(long)]
temperature: Option<f32>,
/// Top-p (nucleus) sampling. Omit for server default.
#[arg(long)]
top_p: Option<f32>,
/// Top-k sampling. Omit for server default.
#[arg(long)]
top_k: Option<i32>,
/// Min-p sampling. Omit for server default.
#[arg(long)]
min_p: Option<f32>,
/// Test mode: Generate but don't save to database (shows output only)
#[arg(short = 't', long, default_value_t = false)]
test_mode: bool,
@@ -86,12 +109,28 @@ async fn main() -> Result<()> {
.unwrap_or_else(|_| "nemotron-3-nano:30b".to_string())
});
let ollama = OllamaClient::new(
let mut ollama = OllamaClient::new(
ollama_primary_url,
ollama_fallback_url.clone(),
model_to_use.clone(),
Some(model_to_use), // Use same model for fallback
);
if let Some(ctx) = args.num_ctx {
ollama.set_num_ctx(Some(ctx));
}
if args.temperature.is_some()
|| args.top_p.is_some()
|| args.top_k.is_some()
|| args.min_p.is_some()
{
ollama.set_sampling_params(args.temperature, args.top_p, args.top_k, args.min_p);
}
// Surface what's actually configured so comparison runs are auditable.
println!(
"num_ctx={:?} temperature={:?} top_p={:?} top_k={:?} min_p={:?}",
args.num_ctx, args.temperature, args.top_p, args.top_k, args.min_p
);
let sms_api_url =
env::var("SMS_API_URL").unwrap_or_else(|_| "http://localhost:8000".to_string());
@@ -160,9 +199,14 @@ async fn main() -> Result<()> {
println!("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━");
if args.verbose {
let user_name = user_display_name();
println!("\nMessage preview:");
for (i, msg) in messages.iter().take(3).enumerate() {
let sender = if msg.is_sent { "Me" } else { &msg.contact };
let sender: &str = if msg.is_sent {
&user_name
} else {
&msg.contact
};
let preview = msg.body.chars().take(60).collect::<String>();
println!(" {}. {}: {}...", i + 1, sender, preview);
}
@@ -172,64 +216,11 @@ async fn main() -> Result<()> {
println!();
}
// Format messages for LLM
let messages_text: String = messages
.iter()
.take(200)
.map(|m| {
if m.is_sent {
format!("Me: {}", m.body)
} else {
format!("{}: {}", m.contact, m.body)
}
})
.collect::<Vec<_>>()
.join("\n");
let prompt = format!(
r#"Summarize this day's conversation between me and {}.
CRITICAL FORMAT RULES:
- Do NOT start with "Based on the conversation..." or "Here is a summary..." or similar preambles
- Do NOT repeat the date at the beginning
- Start DIRECTLY with the content - begin with a person's name or action
- Write in past tense, as if recording what happened
NARRATIVE (3-5 sentences):
- What specific topics, activities, or events were discussed?
- What places, people, or organizations were mentioned?
- What plans were made or decisions discussed?
- Clearly distinguish between what "I" did versus what {} did
KEYWORDS (comma-separated):
5-10 specific keywords that capture this conversation's unique content:
- Proper nouns (people, places, brands)
- Specific activities ("drum corps audition" not just "music")
- Distinctive terms that make this day unique
Date: {} ({})
Messages:
{}
YOUR RESPONSE (follow this format EXACTLY):
Summary: [Start directly with content, NO preamble]
Keywords: [specific, unique terms]"#,
args.contact,
args.contact,
date.format("%B %d, %Y"),
weekday,
messages_text
);
let (prompt, system_prompt) = build_daily_summary_prompt(&args.contact, date, messages);
println!("Generating summary...");
let summary = ollama
.generate(
&prompt,
Some("You are a conversation summarizer. Create clear, factual summaries with precise subject attribution AND extract distinctive keywords. Focus on specific, unique terms that differentiate this conversation from others."),
)
.await?;
let summary = ollama.generate(&prompt, Some(system_prompt)).await?;
println!("\n📝 GENERATED SUMMARY:");
println!("─────────────────────────────────────────");
@@ -256,8 +247,7 @@ Keywords: [specific, unique terms]"#,
message_count: messages.len() as i32,
embedding,
created_at: chrono::Utc::now().timestamp(),
// model_version: "nomic-embed-text:v1.5".to_string(),
model_version: "mxbai-embed-large:335m".to_string(),
model_version: EMBEDDING_MODEL.to_string(),
};
let mut dao = summary_dao.lock().expect("Unable to lock DailySummaryDao");
+34
@@ -0,0 +1,34 @@
//! Shared progress-bar styling for the utility binaries. Centralised so every
//! `cargo run --bin ...` tool gets the same look and feel.
use indicatif::{ProgressBar, ProgressStyle};
const DETERMINATE_TEMPLATE: &str = "{spinner:.green} [{elapsed_precise}] [{wide_bar:.cyan/blue}] \
{human_pos}/{human_len} ({percent}%) {per_sec} eta {eta} {msg}";
const SPINNER_TEMPLATE: &str = "{spinner:.green} [{elapsed_precise}] {human_pos} {per_sec} {msg}";
/// Determinate progress bar used when the total work is known up front.
pub fn determinate(total: u64, message: impl Into<String>) -> ProgressBar {
let pb = ProgressBar::new(total);
pb.set_style(
ProgressStyle::with_template(DETERMINATE_TEMPLATE)
.expect("hard-coded template parses")
.progress_chars("=> "),
);
pb.set_message(message.into());
pb
}
/// Spinner used for open-ended work (e.g. paginated DB scans that loop until
/// empty). Throughput is shown via `{per_sec}`; tick at a steady cadence so
/// it animates even when work is bursty.
pub fn spinner(message: impl Into<String>) -> ProgressBar {
let pb = ProgressBar::new_spinner();
pb.set_style(
ProgressStyle::with_template(SPINNER_TEMPLATE).expect("hard-coded template parses"),
);
pb.set_message(message.into());
pb.enable_steady_tick(std::time::Duration::from_millis(120));
pb
}
+17 -7
@@ -1,8 +1,9 @@
use crate::bin_progress;
use crate::cleanup::database_updater::DatabaseUpdater;
use crate::cleanup::types::{CleanupConfig, CleanupStats};
use crate::file_types::IMAGE_EXTENSIONS;
use anyhow::Result;
use log::{error, warn};
use log::error;
use std::path::PathBuf;
// All supported image extensions to try
@@ -25,15 +26,17 @@ pub fn resolve_missing_files(
stats.files_checked = all_paths.len();
println!("Checking file existence...");
let mut missing_count = 0;
let mut resolved_count = 0;
let pb = bin_progress::determinate(stats.files_checked as u64, "checking");
for path_str in all_paths {
let full_path = config.base_path.join(&path_str);
// Check if file exists
if full_path.exists() {
pb.inc(1);
continue;
}
@@ -43,16 +46,16 @@ pub fn resolve_missing_files(
// Try to find the file with different extensions
match find_file_with_alternative_extension(&config.base_path, &path_str) {
Some(new_path_str) => {
println!(
"✓ {} → found as {} {}",
pb.println(format!(
"✓ {} → found as {}{}",
path_str,
new_path_str,
if config.dry_run {
"(dry-run, not updated)"
" (dry-run, not updated)"
} else {
""
}
);
));
if !config.dry_run {
// Update database
@@ -71,11 +74,18 @@ pub fn resolve_missing_files(
}
}
None => {
warn!("{} → not found with any extension", path_str);
pb.println(format!("{} not found with any extension", path_str));
}
}
pb.set_message(format!(
"missing={} resolved={}",
missing_count, resolved_count
));
pb.inc(1);
}
pb.finish_and_clear();
println!("\nResults:");
println!("- Files checked: {}", stats.files_checked);
println!("- Missing files: {}", missing_count);
+32 -14
@@ -1,7 +1,9 @@
use crate::bin_progress;
use crate::cleanup::database_updater::DatabaseUpdater;
use crate::cleanup::file_type_detector::{detect_file_type, should_rename};
use crate::cleanup::types::{CleanupConfig, CleanupStats};
use anyhow::Result;
use indicatif::ProgressBar;
use log::{error, warn};
use std::fs;
use std::path::{Path, PathBuf};
@@ -32,16 +34,20 @@ pub fn validate_file_types(
println!("Files found: {}\n", files.len());
stats.files_checked = files.len();
println!("Detecting file types...");
let mut mismatches_found = 0;
let mut files_renamed = 0;
let mut user_skipped = 0;
let pb = bin_progress::determinate(files.len() as u64, "detecting");
for file_path in files {
// Get current extension
let current_ext = match file_path.extension() {
Some(ext) => ext.to_str().unwrap_or(""),
None => continue, // Skip files without extensions
None => {
pb.inc(1);
continue;
}
};
// Detect actual file type
@@ -57,14 +63,15 @@ pub fn validate_file_types(
Ok(rel) => rel.to_str().unwrap_or(""),
Err(_) => {
error!("Failed to get relative path for {:?}", file_path);
pb.inc(1);
continue;
}
};
println!("\nFile type mismatch:");
println!(" Path: {}", relative_path);
println!(" Current: .{}", current_ext);
println!(" Actual: .{}", detected_ext);
pb.println(format!(
"mismatch: {} .{} → .{}",
relative_path, current_ext, detected_ext
));
// Calculate new path
let new_file_path = file_path.with_extension(&detected_ext);
@@ -72,6 +79,7 @@ pub fn validate_file_types(
Ok(rel) => rel.to_str().unwrap_or(""),
Err(_) => {
error!("Failed to get new relative path for {:?}", new_file_path);
pb.inc(1);
continue;
}
};
@@ -83,22 +91,26 @@ pub fn validate_file_types(
"Destination exists for {}: {}",
relative_path, new_relative_path
));
pb.inc(1);
continue;
}
// Determine if we should proceed
let should_proceed = if config.dry_run {
println!(" (dry-run mode - would rename to {})", new_relative_path);
pb.println(format!(
" (dry-run — would rename to {})",
new_relative_path
));
false
} else if skip_all {
println!(" Skipped (skip all)");
user_skipped += 1;
false
} else if auto_fix_all {
true
} else {
// Interactive prompt
match prompt_for_rename(new_relative_path) {
// Interactive prompt — suspend the bar so the prompt is visible.
let decision = pb.suspend(|| prompt_for_rename(new_relative_path, &pb));
match decision {
RenameDecision::Yes => true,
RenameDecision::No => {
user_skipped += 1;
@@ -120,8 +132,6 @@ pub fn validate_file_types(
// Rename the file
match fs::rename(&file_path, &new_file_path) {
Ok(_) => {
println!("✓ Renamed file");
// Update database
match db_updater.update_file_path(relative_path, new_relative_path)
{
@@ -160,8 +170,15 @@ pub fn validate_file_types(
warn!("Failed to detect type for {:?}: {:?}", file_path, e);
}
}
pb.set_message(format!(
"mismatches={} renamed={} skipped={}",
mismatches_found, files_renamed, user_skipped
));
pb.inc(1);
}
pb.finish_and_clear();
println!("\nResults:");
println!("- Files scanned: {}", stats.files_checked);
println!("- Mismatches found: {}", mismatches_found);
@@ -195,8 +212,9 @@ enum RenameDecision {
SkipAll,
}
/// Prompt the user for rename decision
fn prompt_for_rename(new_path: &str) -> RenameDecision {
/// Prompt the user for rename decision. Caller must `pb.suspend` so the
/// progress bar isn't redrawing over the prompt.
fn prompt_for_rename(new_path: &str, _pb: &ProgressBar) -> RenameDecision {
println!("\nRename to {}?", new_path);
println!(" [y] Yes");
println!(" [n] No (default)");
+352
@@ -0,0 +1,352 @@
//! `/photos/search?q=<text>` — CLIP semantic photo search.
//!
//! The route lives outside `files.rs` to keep that 1500+ line module
//! focused on EXIF / tag listing. The flow is:
//!
//! 1. Parse query params (`q`, `limit`, `threshold`, optional `library`).
//! 2. Call Apollo's `/api/internal/clip/encode_text` to get the query
//! vector (L2-normalized 768-d f32 for ViT-L/14).
//! 3. Load every `(content_hash, clip_embedding)` for the scope from
//! `image_exif` via `ExifDao::list_clip_index`. ~28-43 MB for a 14k
//! library at ViT-L/14; loaded fresh per request — fast enough for
//! v1, optimize via an AppState cache later if needed.
//! 4. Dot product (= cosine since both sides are L2-normalized), filter
//! above `threshold`, top-K by score.
//! 5. Resolve each surviving hash back to a `(library_id, rel_path)` so
//! the frontend can render the photo / hand off to the carousel.
//!
//! Response shape is intentionally minimal — paths + score — so the
//! frontend can reuse existing PhotoGrid rendering by joining against
//! `/api/photos/match` (or calling `/image/metadata` lazily). Don't
//! bake camera/EXIF metadata into this route; it would force a fan-out
//! per result and balloon the response.
use crate::AppState;
use crate::ai::clip_client::ClipError;
use crate::database::ExifDao;
use actix_web::{HttpResponse, Result as ActixResult, web};
use base64::Engine;
use serde::{Deserialize, Serialize};
use std::sync::Mutex;
#[derive(Debug, Deserialize)]
pub struct SearchQuery {
/// Natural-language query. Required; empty triggers 400.
pub q: String,
/// Max results to return in this page. Capped to 200 server-side.
/// Defaults to 20. Pair with `offset` for pagination.
#[serde(default = "default_limit")]
pub limit: usize,
/// Zero-based offset into the sorted-and-filtered result set. The
/// scoring loop still runs over the full embedding matrix on every
/// page (cheap at personal-library scale — sub-100ms — and avoids
/// stateful pagination cursors). Defaults to 0.
#[serde(default)]
pub offset: usize,
/// Cosine-similarity floor below which results are dropped.
/// 0.20 is the rough "this is plausibly relevant" line for OpenAI
/// CLIP; tunable per call when sweeping. Defaults to 0.20.
#[serde(default = "default_threshold")]
pub threshold: f32,
/// Optional single-library scope. Legacy param — new clients pass
/// `library_ids` instead so multi-select scopes (Apollo's HUD library
/// chips, FileViewer-React's library picker) actually filter. Kept
/// for back-compat; `library_ids` wins when both are supplied.
pub library: Option<i32>,
/// Optional multi-library scope, comma-separated id list
/// (`?library_ids=1,3`). Empty / omitted = every enabled library
/// (the historical default). Apollo and FileViewer-React both send
/// this when 2+ libraries are selected; the single-library case
/// works through either param interchangeably.
pub library_ids: Option<String>,
/// Optional model-version filter. Defaults to the live engine's
/// version (queried lazily). Forces a strict join so mid-flight
/// model swaps can't mix geometries in a single response.
#[serde(default)]
pub model_version: Option<String>,
}
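Review note: the `library` / `library_ids` precedence documented on `SearchQuery` could be pulled out of the handler into a pure function, which would make it unit-testable. The sketch below is one possible shape, not what the PR does. `parse_library_ids` and `resolve_scope` are hypothetical names, and the `String` error stands in for the 400 response the handler would build.

```rust
/// Parse a comma-separated id list such as `"1, 3,3"` into unique ids,
/// preserving first-seen order. Empty pieces are skipped; a non-numeric
/// piece produces an error message the caller can turn into a 400.
fn parse_library_ids(raw: &str) -> Result<Vec<i32>, String> {
    let mut out: Vec<i32> = Vec::new();
    for piece in raw.split(',') {
        let trimmed = piece.trim();
        if trimmed.is_empty() {
            continue;
        }
        let id: i32 = trimmed
            .parse()
            .map_err(|_| format!("invalid library_ids entry: {trimmed:?}"))?;
        if !out.contains(&id) {
            out.push(id);
        }
    }
    Ok(out)
}

/// Hypothetical helper: `library_ids` wins over the legacy `library`
/// whenever it is present, even if it parses to an empty list.
/// An empty result means "every enabled library".
fn resolve_scope(library_ids: Option<&str>, library: Option<i32>) -> Result<Vec<i32>, String> {
    match (library_ids, library) {
        (Some(raw), _) => parse_library_ids(raw),
        (None, Some(id)) => Ok(vec![id]),
        (None, None) => Ok(Vec::new()),
    }
}
```

One consequence worth noting: `?library_ids=&library=9` resolves to the empty scope (all libraries) rather than `[9]`, because the multi-id parameter takes precedence as soon as it is supplied.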
fn default_limit() -> usize {
20
}
fn default_threshold() -> f32 {
0.20
}
#[derive(Debug, Serialize)]
pub struct SearchHit {
pub library_id: i32,
pub rel_path: String,
pub content_hash: String,
/// Cosine similarity in [-1, 1]. In practice OpenAI CLIP returns
/// 0.10-0.40 for the typical photo library.
pub score: f32,
}
#[derive(Debug, Serialize)]
pub struct SearchResponse {
pub query: String,
pub model_version: String,
pub threshold: f32,
/// Total embeddings scored (= every photo in scope with a stored
/// embedding). Same value across pages of the same query.
pub considered: usize,
/// Count of results above threshold, before pagination. Lets the
/// client decide whether a "Load more" button is meaningful and
/// stop fetching when `offset + results.len() >= total_matching`.
pub total_matching: usize,
pub offset: usize,
pub results: Vec<SearchHit>,
}
#[derive(Debug, Serialize)]
struct SearchError {
error: String,
}
/// Decode a stored `clip_embedding` BLOB back into a `Vec<f32>`. Returns
/// `None` on malformed bytes — those rows get skipped rather than
/// failing the whole query.
fn decode_embedding(bytes: &[u8]) -> Option<Vec<f32>> {
if bytes.is_empty() || !bytes.len().is_multiple_of(4) {
return None;
}
let mut out = Vec::with_capacity(bytes.len() / 4);
for chunk in bytes.chunks_exact(4) {
out.push(f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
}
Some(out)
}
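Review note: a round-trip test would pin down the BLOB layout that `decode_embedding` assumes (consecutive little-endian `f32` words). The sketch below is standalone. `encode_embedding` is a hypothetical writer-side counterpart, since the code that produces the stored bytes is not shown in this diff, and `% 4` is used in place of `is_multiple_of` only so the snippet builds on older toolchains.

```rust
/// Hypothetical writer side: serialize each f32 as 4 little-endian bytes.
fn encode_embedding(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Reader side, mirroring the function above: `None` for an empty
/// buffer or a length that is not a whole number of 4-byte words.
fn decode_embedding(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.is_empty() || bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Because the conversion is a plain byte reinterpretation, the round trip is exact for every finite value; no tolerance is needed when comparing.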
#[inline]
fn dot(a: &[f32], b: &[f32]) -> f32 {
a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}
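Review note: the module doc relies on "dot product = cosine since both sides are L2-normalized". A small standalone check of that identity is sketched below. `l2_normalize` and `cosine` are illustrative helpers that do not appear in the PR.

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

/// Illustrative helper: scale a vector to unit length. A zero vector
/// is returned unchanged to avoid dividing by zero.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = dot(v, v).sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

/// Illustrative helper: textbook cosine similarity on raw vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}
```

Note that `zip` stops at the shorter input, so `dot` silently ignores trailing components when the lengths differ. That is why the scoring loop's `emb.len() != query_vec.len()` guard matters.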
pub async fn search_photos(
state: web::Data<AppState>,
exif_dao: web::Data<Mutex<Box<dyn ExifDao>>>,
query: web::Query<SearchQuery>,
) -> ActixResult<HttpResponse> {
let q_text = query.q.trim().to_string();
if q_text.is_empty() {
return Ok(HttpResponse::BadRequest().json(SearchError {
error: "query parameter `q` is required".into(),
}));
}
if !state.clip_client.is_enabled() {
return Ok(HttpResponse::ServiceUnavailable().json(SearchError {
error: "CLIP search is disabled (no Apollo CLIP endpoint configured)".into(),
}));
}
let limit = query.limit.clamp(1, 200);
let offset = query.offset;
let threshold = query.threshold.clamp(-1.0, 1.0);
// 1. Encode the query text. Fast — Apollo's text encoder is ~50ms
// on CPU. Bail with a clear error message if Apollo's down so the
// user sees "service unavailable" rather than empty results.
let query_resp = match state.clip_client.encode_text(&q_text).await {
Ok(r) => r,
Err(ClipError::Permanent(e)) => {
return Ok(HttpResponse::BadRequest().json(SearchError {
error: format!("query rejected: {e}"),
}));
}
Err(ClipError::Transient(e)) => {
return Ok(HttpResponse::BadGateway().json(SearchError {
error: format!("CLIP service unavailable: {e}"),
}));
}
Err(ClipError::Disabled) => {
return Ok(HttpResponse::ServiceUnavailable().json(SearchError {
error: "CLIP service disabled".into(),
}));
}
};
// decode_embedding works on raw bytes; the wire format is b64.
let query_bytes = base64::engine::general_purpose::STANDARD
.decode(query_resp.embedding.as_bytes())
.unwrap_or_default();
let query_vec = match decode_embedding(&query_bytes) {
Some(v) => v,
None => {
return Ok(HttpResponse::BadGateway().json(SearchError {
error: "CLIP service returned a malformed query embedding".into(),
}));
}
};
// 2. Decide which library scope to search. `library_ids` (multi)
// wins over the legacy `library` (single) when both are present;
// either / both empty falls back to "every enabled library".
let library_ids: Vec<i32> = if let Some(raw) = query.library_ids.as_deref() {
let mut out: Vec<i32> = Vec::new();
for piece in raw.split(',') {
let trimmed = piece.trim();
if trimmed.is_empty() {
continue;
}
match trimmed.parse::<i32>() {
Ok(id) => {
if !out.contains(&id) {
out.push(id);
}
}
Err(_) => {
return Ok(HttpResponse::BadRequest().json(SearchError {
error: format!("invalid library_ids entry: {trimmed:?}"),
}));
}
}
}
out
} else if let Some(id) = query.library {
vec![id]
} else {
Vec::new()
};
// 3. Pull the (hash, embedding) matrix. Lock contention here is
// bounded: one big SELECT while holding the `Mutex<Box<dyn ExifDao>>`,
// and then we release before scoring. If this becomes a hotspot
// we'll cache the decoded matrix in AppState with TTL.
let ctx = opentelemetry::Context::current();
let rows: Vec<(String, Vec<u8>)> = {
let mut dao = exif_dao.lock().expect("exif dao");
match dao.list_clip_index(
&ctx,
&library_ids,
query
.model_version
.as_deref()
.or(Some(&query_resp.model_version)),
) {
Ok(r) => r,
Err(e) => {
log::warn!("clip_search: list_clip_index failed: {:?}", e);
return Ok(HttpResponse::InternalServerError().json(SearchError {
error: "failed to load search index".into(),
}));
}
}
};
let considered = rows.len();
if considered == 0 {
return Ok(HttpResponse::Ok().json(SearchResponse {
query: q_text,
model_version: query_resp.model_version,
threshold,
considered,
total_matching: 0,
offset,
results: Vec::new(),
}));
}
// 4. Score. Cap the loop's transient allocation; we keep all scores
// and sort at the end. With ~14k entries the sort is microseconds.
let mut scored: Vec<(f32, String)> = Vec::with_capacity(considered);
for (hash, blob) in rows {
let Some(emb) = decode_embedding(&blob) else {
continue;
};
if emb.len() != query_vec.len() {
continue;
}
let sim = dot(&emb, &query_vec);
if sim < threshold {
continue;
}
scored.push((sim, hash));
}
scored.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal));
let total_matching = scored.len();
// Pagination — slice the sorted list at `[offset, offset+limit)`.
// Offsets past the end produce empty pages rather than an error so
// the client can stop fetching naturally on "load more" past the end.
let scored: Vec<(f32, String)> = if offset >= total_matching {
Vec::new()
} else {
let end = (offset + limit).min(total_matching);
scored[offset..end].to_vec()
};
if scored.is_empty() {
return Ok(HttpResponse::Ok().json(SearchResponse {
query: q_text,
model_version: query_resp.model_version,
threshold,
considered,
total_matching,
offset,
results: Vec::new(),
}));
}
// 5. Resolve each surviving hash back to a `(library_id, rel_path)`.
// `get_rel_paths_for_hashes` returns every rel_path; we pick the first
// one for the result. Apollo / the UI can fetch alternatives via
// /image/metadata when needed.
let hashes: Vec<String> = scored.iter().map(|(_, h)| h.clone()).collect();
let path_map = {
let mut dao = exif_dao.lock().expect("exif dao");
match dao.get_rel_paths_for_hashes(&ctx, &hashes) {
Ok(m) => m,
Err(e) => {
log::warn!("clip_search: get_rel_paths_for_hashes failed: {:?}", e);
return Ok(HttpResponse::InternalServerError().json(SearchError {
error: "failed to resolve photo paths".into(),
}));
}
}
};
// We need (library_id, rel_path) — get_rel_paths_for_hashes only
// returns rel_paths. Cross-reference via find_by_content_hash to
// pick the library too. One call per hash on the returned page; cheap at
// the default page size of 20 (at most 200 calls at the `limit` cap).
let mut results = Vec::with_capacity(scored.len());
{
let mut dao = exif_dao.lock().expect("exif dao");
for (score, hash) in scored {
let row = match dao.find_by_content_hash(&ctx, &hash) {
Ok(Some(r)) => r,
Ok(None) => continue,
Err(e) => {
log::warn!(
"clip_search: find_by_content_hash failed for {}: {:?}",
hash,
e
);
continue;
}
};
// Prefer get_rel_paths_for_hashes's first entry if it
// exists (it shares semantics with `image_exif`'s natural
// order), falling back to the ImageExif row.
let rel_path = path_map
.get(&hash)
.and_then(|paths| paths.first().cloned())
.unwrap_or(row.file_path);
results.push(SearchHit {
library_id: row.library_id,
rel_path,
content_hash: hash,
score,
});
}
}
Ok(HttpResponse::Ok().json(SearchResponse {
query: q_text,
model_version: query_resp.model_version,
threshold,
considered,
total_matching,
offset,
results,
}))
}
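Review note: steps 4 and the pagination slice in `search_photos` could be expressed as one pure function, which would allow testing the ordering and the past-the-end behaviour without a database. The sketch below is an assumption about how that might look. `rank_page` is a hypothetical name, and it uses `skip`/`take` instead of slicing plus `to_vec()`, so it is not a line-for-line copy of the handler.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: keep scores at or above `threshold`, sort them
/// in descending order, and return `(total_matching, page)` where the
/// page covers `[offset, offset + limit)` of the sorted list.
fn rank_page(
    scores: Vec<(f32, String)>,
    threshold: f32,
    offset: usize,
    limit: usize,
) -> (usize, Vec<(f32, String)>) {
    let mut kept: Vec<(f32, String)> = scores
        .into_iter()
        .filter(|(s, _)| *s >= threshold)
        .collect();
    // Descending by score; incomparable pairs are treated as equal.
    kept.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(Ordering::Equal));
    let total_matching = kept.len();
    // `skip` past the end simply yields an empty page.
    let page = kept.into_iter().skip(offset).take(limit).collect();
    (total_matching, page)
}
```

A client can keep requesting pages until `offset + page.len() >= total_matching`, matching the contract described on `SearchResponse`.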
+246
@@ -0,0 +1,246 @@
//! CLIP-encoding pass for the file watcher.
//!
//! `process_clip_backlog` in `backfill.rs` calls [`run_clip_encoding_pass`]
//! with the page of candidates returned by
//! `ExifDao::list_clip_unencoded_candidates`. We walk those, fan out K
//! parallel encode calls to Apollo, and persist the resulting embeddings
//! into `image_exif.clip_embedding` / `clip_model_version`.
//!
//! Unlike the face pipeline, CLIP has no marker rows — a permanent
//! failure (un-decodable bytes) leaves the row's `clip_embedding` NULL
//! and the drain will retry on the next tick. For personal-library
//! scale this is fine; the per-tick cap bounds the wasted work, and
//! `file_types::is_image_file` filters out videos / non-media client-
//! side so most permanent failures are decoded-but-corrupt files (rare).
//!
//! The watcher thread isn't in any pre-existing async context, so we
//! build a short-lived tokio runtime per pass and `block_on` the join
//! of K encode futures. Concurrency knob: `CLIP_ENCODE_CONCURRENCY`
//! (default 4 — lower than faces because Apollo's CLIP path doesn't
//! release the GIL between preprocess and forward as cleanly).
use crate::ai::clip_client::{ClipClient, ClipError, EncodeImageMeta};
use crate::database::ExifDao;
use crate::exif;
use crate::file_types;
use crate::libraries::Library;
use crate::memories::PathExcluder;
use log::{debug, info, warn};
use std::path::Path;
use std::sync::{Arc, Mutex};
use tokio::sync::Semaphore;
/// One file the watcher would like to CLIP-encode. Built from the DAO
/// `list_clip_unencoded_candidates` result — needs the `content_hash`
/// for traceability in Apollo's log lines, even though the embedding
/// itself is keyed on `(library_id, rel_path)` for the back-write.
#[derive(Debug, Clone)]
pub struct ClipCandidate {
pub rel_path: String,
pub content_hash: String,
}
/// Synchronous entry point. Returns once every candidate has been
/// processed (or definitively skipped). No-op when the client is
/// disabled so the caller can call unconditionally.
pub fn run_clip_encoding_pass(
library: &Library,
excluded_dirs: &[String],
clip_client: &ClipClient,
exif_dao: Arc<Mutex<Box<dyn ExifDao>>>,
candidates: Vec<ClipCandidate>,
) {
if !clip_client.is_enabled() {
return;
}
if candidates.is_empty() {
return;
}
let base = Path::new(&library.root_path);
let filtered = filter_excluded(base, excluded_dirs, candidates, Some(&library.name));
if filtered.is_empty() {
return;
}
let concurrency: usize = std::env::var("CLIP_ENCODE_CONCURRENCY")
.ok()
.and_then(|s| s.parse().ok())
.filter(|n: &usize| *n > 0)
.unwrap_or(4);
info!(
"clip_watch: encoding {} candidate(s) for library '{}' (concurrency {})",
filtered.len(),
library.name,
concurrency
);
let rt = match tokio::runtime::Builder::new_multi_thread()
.worker_threads(2)
.enable_all()
.build()
{
Ok(rt) => rt,
Err(e) => {
warn!("clip_watch: failed to build tokio runtime: {e}");
return;
}
};
let library_id = library.id;
let library_root = library.root_path.clone();
rt.block_on(async move {
let sem = Arc::new(Semaphore::new(concurrency));
let mut handles = Vec::with_capacity(filtered.len());
for cand in filtered {
let permit_sem = sem.clone();
let clip_client = clip_client.clone();
let exif_dao = exif_dao.clone();
let library_root = library_root.clone();
handles.push(tokio::spawn(async move {
let _permit = permit_sem.acquire().await.expect("clip semaphore");
process_one(library_id, &library_root, cand, &clip_client, exif_dao).await;
}));
}
for h in handles {
let _ = h.await;
}
});
}
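Review note: the `CLIP_ENCODE_CONCURRENCY` handling in `run_clip_encoding_pass` reads the environment inline, which makes the fallback rules awkward to test. One option, sketched below under the assumption that the default stays at 4, is a pure helper taking the raw value. `parse_concurrency` and `DEFAULT_CONCURRENCY` are hypothetical names.

```rust
/// Assumed default, taken from the module doc ("default 4").
const DEFAULT_CONCURRENCY: usize = 4;

/// Hypothetical helper: interpret the raw env value. Missing,
/// non-numeric, negative, or zero values fall back to the default.
/// The input is not trimmed, matching the inline code's behaviour.
fn parse_concurrency(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.parse::<usize>().ok())
        .filter(|n| *n > 0)
        .unwrap_or(DEFAULT_CONCURRENCY)
}
```

The call site would then read `parse_concurrency(std::env::var("CLIP_ENCODE_CONCURRENCY").ok().as_deref())`, with the same observable behaviour as the inline chain.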
async fn process_one(
library_id: i32,
library_root: &str,
cand: ClipCandidate,
clip_client: &ClipClient,
exif_dao: Arc<Mutex<Box<dyn ExifDao>>>,
) {
let abs = Path::new(library_root).join(&cand.rel_path);
let bytes = match read_image_bytes_for_encode(&abs) {
Ok(b) => b,
Err(e) => {
// Same rationale as face_watch: don't mark — the file may
// have been moved/renamed mid-scan; let the next pass retry.
warn!(
"clip_watch: read failed for {} (lib {}): {}",
cand.rel_path, library_id, e
);
return;
}
};
let meta = EncodeImageMeta {
content_hash: cand.content_hash.clone(),
library_id,
rel_path: cand.rel_path.clone(),
};
let ctx = opentelemetry::Context::current();
match clip_client.encode_image(bytes, meta).await {
Ok(resp) => {
let emb_bytes = match resp.decode_embedding() {
Ok(b) => b,
Err(e) => {
warn!("clip_watch: bad embedding for {}: {:?}", cand.rel_path, e);
return;
}
};
let mut dao = exif_dao.lock().expect("exif dao");
if let Err(e) = dao.backfill_clip_embedding(
&ctx,
library_id,
&cand.rel_path,
&emb_bytes,
&resp.model_version,
) {
warn!(
"clip_watch: backfill_clip_embedding failed for {}: {:?}",
cand.rel_path, e
);
return;
}
debug!(
"clip_watch: {} → dim={} ({}ms, {})",
cand.rel_path, resp.embedding_dim, resp.duration_ms, resp.model_version
);
}
Err(ClipError::Permanent(e)) => {
// No marker — the row sits with NULL embedding and the drain
// retries next pass. For personal-library scale the cost of
// re-attempting permanently-broken files is bounded by the
// per-tick cap. If this becomes a recurring noise source,
// add a `clip_status` column with `failed` semantics like
// face_detections has.
warn!(
"clip_watch: permanent failure on {} (will retry next pass): {}",
cand.rel_path, e
);
}
Err(ClipError::Transient(e)) => {
debug!(
"clip_watch: transient on {}: {} (will retry next pass)",
cand.rel_path, e
);
}
Err(ClipError::Disabled) => {
// Defensive — the entry-point already checked is_enabled().
}
}
}
/// Drop candidates whose paths land in an excluded dir or whose
/// extension isn't an image. Mirrors `face_watch::filter_excluded` so
/// the two backlogs stay shape-consistent. Library name is passed
/// purely for the log line that surfaces an exclusion hit.
pub fn filter_excluded(
base: &Path,
excluded_dirs: &[String],
candidates: Vec<ClipCandidate>,
library_name: Option<&str>,
) -> Vec<ClipCandidate> {
let excluder = if excluded_dirs.is_empty() {
None
} else {
Some(PathExcluder::new(base, excluded_dirs))
};
candidates
.into_iter()
.filter(|c| {
let abs = base.join(&c.rel_path);
if !file_types::is_image_file(&abs) {
debug!(
"clip_watch: skipping non-image '{}' (lib {})",
c.rel_path,
library_name.unwrap_or("<unknown>")
);
return false;
}
if let Some(ex) = excluder.as_ref()
&& ex.is_excluded(&abs)
{
debug!(
"clip_watch: skipping excluded '{}' (lib {})",
c.rel_path,
library_name.unwrap_or("<unknown>")
);
return false;
}
true
})
.collect()
}
/// Read image bytes for CLIP encoding. Same logic as
/// `face_watch::read_image_bytes_for_detect` — RAW / HEIC files don't
/// decode in Apollo's PIL pipeline, so we pull the embedded JPEG
/// preview the thumbnail pipeline already extracts. Plain JPEG / PNG /
/// WebP go through a direct read.
pub fn read_image_bytes_for_encode(path: &Path) -> std::io::Result<Vec<u8>> {
if file_types::needs_ffmpeg_thumbnail(path)
&& let Some(preview) = exif::extract_embedded_jpeg_preview(path)
{
return Ok(preview);
}
std::fs::read(path)
}

Some files were not shown because too many files have changed in this diff