Commit Graph

651 Commits

Author SHA1 Message Date
Cameron Cordes f707353807 feat: nightly agentic pre-generation of memory reels
Implement end-to-end nightly pre-generation of memory reels with agentic
scripting that grounds narration in calendar, location, messages, and RAG.

Sections A-E from the plan:

A. Extract produce_reel pipeline core from run_reel_job with
   ScripterMode::Fast/Agentic and progress callbacks.

B. Agentic scripter: factor run_readonly_tool_loop from the insight
   generator, build read-only tool gate, prompt builder with GPS, and
   generate_script_agentic with fallback to fast path.

C. Precomputed reels ledger (SQLite table + DAO), GET /reels/precomputed
   handler with validity gate, GET /reels/by-key/{key}/video streaming,
   and normalize_library_key helper.

D. Nightly scheduler: spawn_pregen_scheduler with configurable hour,
   run_pregen_batch (day/week/month spans), pregen_one with dedup and
   disk-check, secs_until_next_run_hour time math.

E. user_ai_prefs passive mirror table + DAO for param capture in
   create_reel_handler and replay in the scheduler.

Also fixes resolve_library_param signature to take &[Library] and adds
resolve_library_param_state wrapper for AppState callers.

New files: migrations/2026-06-13-000000_add_precomputed_reels/,
  migrations/2026-06-13-000010_add_user_ai_prefs/,
  src/database/precomputed_reel_dao.rs,
  src/database/user_ai_prefs_dao.rs
2026-06-13 14:29:34 -04:00
Cameron Cordes b30c8c16d0 Reels: clips play through the beat instead of freezing early
A clip beat capped playback at CLIP_SECONDS and filled the rest of the
narration with a tpad freeze-frame, so a clip stopped dead on its last
frame for a second or two before the transition — a glitchy pause that
stills don't have. Extract clip_beat_plan: the clip now plays for as
much of its beat as the source footage covers, and we freeze only when
the source is genuinely shorter than the narration. Bump RENDER_VERSION.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 11:00:01 -04:00
Cameron Cordes f5581edf5e Reels: ease burst fade 0.08s → 0.12s
0.08s read as too abrupt; 0.12s keeps the burst clearly snappier than the
0.35s held-shot fade without jarring. Bumps RENDER_VERSION.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:07:41 -04:00
Cameron Cordes 65793a2dda Reels: mixed-media (video clip beats) + faster burst fade
Videos in a span now appear as clip beats: the first few seconds of the
video (capped at CLIP_SECONDS=5, and to the source length) filled to the
portrait canvas like photos, with its live audio ducked under the
narration (amix at 0.35). If the narration outlasts the clip, the last
frame is held (tpad); clips with no audio track just play under narration.

Selection splits the beat budget between photo beats and clip beats —
clips get up to half (≥1 when present), photos the rest — then merges
both back into chronological order. SegmentMedia gains a Clip variant;
beats carry `media` (photos or one clip) and the cache key tags P/C so a
path used as a still vs a clip differ.

Also drops the burst fade from 0.15s to 0.08s so a quick burst reads
clearly differently from a held shot. Bumps RENDER_VERSION.

The clip filtergraph (fill + duck-mix + last-frame hold) is unit-tested
but, like the rest of the ffmpeg path, wants a real render check on the
GPU host.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:02:51 -04:00
Cameron Cordes 299e32b014 Bump version to 1.4.0
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:45:24 -04:00
Cameron Cordes 6e90f24307 Reels: burst beats + duration budget for week/month, plus step logging
Restructures a reel around beats — one narration line over one or more
photos — instead of one line per photo. A single-photo beat is a held
shot; a multi-photo beat is a quick burst that flashes through several
moments of an event while the line is read. So a week/month reel can show
everything it spans without a narrated (and timed) segment per photo.

Selection (selector.rs):
- Duration budget: cap the number of narrated beats to ~REEL_TARGET_SECONDS
  (default 90, env-tunable) so week/month reels don't run minutes long.
- Event clustering by time gap; when there are more events than the beat
  budget, adjacent events merge so the whole span stays covered. Each beat
  bursts up to MAX_BURST_PHOTOS (an even spread), so a 40-shot dinner
  contributes a handful of quick frames, not forty narrated seconds.

Render (render.rs): a beat renders its photos as a concat of per-photo
fills (blurred-bg portrait, fps-before-fade) under one muxed narration;
burst photos get a snappier fade. beat_durations splits the narration
across the photos, stretching only if a long burst would flash too fast.

Adds high-level info logs across the steps (request → script → per-beat
narrate/render → join → done with elapsed) for visibility. Bumps
RENDER_VERSION to re-render cached reels.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:43:18 -04:00
Cameron Cordes 740fc4d841 Reels: fix steppy fade (fps before fade) and ease the expression bump
The fade looked steppy/low-frame-rate because the filtergraph normalized
fps AFTER the fade filters: the brightness ramp was sampled at the looped
still's coarse input cadence, then duplicated up to 30fps. Move fps ahead
of the fades, pin the still's input framerate (-framerate), and force CFR
output (-r) so the dip ramps across a full 30 frames and plays steadily.

Ease narration expressiveness from 0.7 to 0.6 (still tunable via
REEL_TTS_EXAGGERATION). Bump RENDER_VERSION so existing reels re-render.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:20:52 -04:00
Cameron Cordes 7715a7a905 Reels: portrait canvas with blurred fill, fade transitions, warmer TTS
Fixes the "image is tiny" problem: a 1920x1080 landscape reel letterboxes
to a ~25%-height band on a portrait phone. Switch to a portrait 1080x1920
canvas and fill it per photo with a blurred, zoomed copy of the image
behind the sharp fitted photo — so the frame is always full regardless of
the photo's orientation, with no black bars and no cropping of the subject.

Add a quick 0.35s fade in/out baked into each segment so concatenated
photos dip smoothly instead of hard-cutting (fade-out lands in the
narration's silent tail, so speech isn't clipped). Drop the unused
Ken Burns branch — motion can return deliberately later.

Warm up the narration a touch: thread Chatterbox's `exaggeration` through
synthesize_serialized and default reels to 0.7 (tunable via
REEL_TTS_EXAGGERATION). Bump RENDER_VERSION so existing landscape reels
re-render.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:10:26 -04:00
Cameron Cordes 42453d5786 Fix reel concat: force -f mp4 for the .tmp output path
The concat stage wrote to <key>.mp4.tmp (for an atomic publish-rename),
but ffmpeg infers the muxer from the output extension and can't map
.tmp to a format — "Unable to choose an output format". Force the mp4
muxer explicitly so the temp extension is irrelevant. Segment render,
NVENC, TTS, and scripting were already working end-to-end; this was the
only failure, at the final join.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 22:56:48 -04:00
Cameron Cordes e3f731b3b2 Add memory-reel backend: on-demand narrated photo slideshow
New POST /reels + GET /reels/{id} (+ /video) build an MP4 slideshow of a
memory span (day/week/month), narrated by the LLM in a cloned voice.

Pipeline (src/reels/): a selector resolves which photos + reel metadata,
the scripter writes one narration line per photo via a single LLM call
(reusing each photo's cached insight as context — no fresh vision calls,
so reel generation stays off the GPU's vision slot), each line is
synthesized to speech, and the renderer assembles stills + narration via
ffmpeg. Jobs run in the background (mirroring the TTS speech-job
registry) since a reel takes minutes; the finished MP4 is cached on disk
keyed by the selection so a repeat request is instant.

The segment model is media-typed (Photo today) so a video-clip segment
(phase 2) and a nightly pre-render (phase 3) slot in without reworking
the pipeline. Ken Burns motion is implemented but defaulted off pending a
visual check on the GPU box.

Supporting changes:
- memories: extract gather_memory_items() so the reel selector reuses the
  exact window/exclusion/tz/sort logic behind /memories.
- ai::tts: add synthesize_serialized() so reel narration honors the same
  single-GPU permit + write lease as user TTS requests.
- video::ffmpeg: make get_duration_seconds() pub for narration timing.
- AppState: reels_path (REELS_DIRECTORY, defaults beside preview clips).

Pure logic (cache key, script parsing, ffmpeg arg/filter construction,
even sampling, segment timing) is unit-tested (26 tests). The runtime
path (ffmpeg render, TTS, LLM) needs a real run on the GPU host to verify
end-to-end — not exercisable in CI.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 22:31:08 -04:00
cameron 98274c3301 Merge pull request 'Feature/tts voice management' (#105) from feature/tts-voice-management into master
Reviewed-on: #105
2026-06-13 02:01:37 +00:00
Cameron Cordes 1017fe73af Include start offset in voice-name window tag
Clones that don't start at 0:00 are tagged with where the reference
window begins (grandma-at1m32s-30s), so voices cloned from different
sections of the same source are distinguishable in the voice list.
Zero-start names keep the existing -30s form.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 16:21:41 -04:00
Cameron Cordes 1dec34540d Add start/duration window selection for voice-clone reference clips
Both voice creation endpoints (upload + from-library) now accept optional
start_seconds/duration_seconds, threaded to ffmpeg as -ss/-t, so the
reference window can target clean speech anywhere in a long recording
instead of always the first N seconds. Duration is clamped to the
LLAMA_SWAP_TTS_REF_SECONDS cap and the voice-name tag reflects the
actual window length.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 16:09:03 -04:00
Cameron Cordes 2e0f78aa1b Add user-configurable TTS pronunciation overrides
A JSON map (TTS_PRONUNCIATIONS_PATH, default tts_pronunciations.json)
rewrites mispronounced words — place names, initialisms, dotted
abbreviations — to phonetic spellings before synthesis, applied after
markdown cleanup in both /tts/speech paths. Whole-word smartcase
matching (lowercase keys match any casing, uppercase keys exact),
longest key wins, hot-reloaded on mtime change with last-good fallback
on parse errors. See tts_pronunciations.example.json.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 23:06:18 -04:00
Cameron Cordes 3fa4fa8501 Strip markdown decoration from model-emitted insight titles
Models wrap the title line despite the prompt — "**Title: A Day in the
Woods**", "## Title: ...", bold around just the label — which made
parse_title_body's bare "Title:" prefix match fall through to the
fallbacks and leak asterisks into the stored title.

strip_title_markdown trims bold/italic markers, heading hashes,
backticks, and quotes from both ends; applied to the label line, the
extracted title, both fallback paths, and generate_photo_title (which
previously stripped only quotes).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 22:18:43 -04:00
Cameron Cordes efd05db523 Make the embedding model swappable via env for A/B testing
Trialing Qwen3-Embedding-0.6B (1024-dim, instruct-prefixed queries)
against nomic required code changes at every hardcoded seam; now it's a
config flip plus a reembed_embeddings run.

- EMBEDDING_DIM env (default 768) replaces every hardcoded dim check:
  daily summary / calendar / search / location DAOs, Ollama batch
  validation, reembed_embeddings
- entities gains the dim guard it never had — a wrong-dim vector
  silently kills dedup/recall (cosine over mismatched lengths is 0),
  so store None and warn instead
- embed_query / embed_document split with EMBED_QUERY_PREFIX /
  EMBED_DOCUMENT_PREFIX (literal \n expanded): retrieval models treat
  the two sides differently — nomic wants search_query:/search_document:,
  Qwen3 wants Instruct:...\nQuery: on queries only. All query-side
  call sites and all corpus writers now declare their side.
- document the contract in CLAUDE.md: change the model or any of these
  vars → re-run reembed_embeddings or search is garbage

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 21:40:40 -04:00
Cameron Cordes b1493f5aca Wait out TTS GPU hold before the insight job timeout starts
The GPU lease keeps per-request reqwest budgets from burning behind a
cross-model swap, but the job-level INSIGHT_GENERATION_TIMEOUT_SECS
wall-clock started at spawn — an insight queued behind a running TTS
synthesis parked its first chat call on the lease and timed out
("timeout after 180s") before chatterbox even finished loading.

Acquire-and-drop an LLM read lease before starting the job clock in
both insight handlers: the wait for the GPU happens before the
timeout begins, mirroring the per-request lease semantics. Dropped
immediately — holding it across the generation would deadlock the
chat calls' own lease acquisitions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:15:38 -04:00
Cameron Cordes a022a3d15d Fix RAG vector-space mismatch and search_rag retrieval quality
Queries embedded via llama-swap were searching corpora embedded via
Ollama (measured: spaces diverged). Introduce LocalLlm — the local
Ollama + llama-swap pair with LLM_BACKEND dispatch baked in — and route
all embedding writers through it; anything embedding via a concrete
client reintroduces the bug.

- search_rag: embed the model's query verbatim (no metadata boilerplate),
  make date optional — no time-decay when omitted, so "when did X
  happen?" queries rank purely by similarity across all time
- reembed_embeddings bin: re-embed summaries / calendar / search /
  knowledge entities via the active backend, with old-new cosine report
  per table and truncate-and-retry for inputs over the embed server's
  physical batch size
- import_calendar, import_search_history: embed through LocalLlm
- search_messages / get_sms_messages: render sender → recipient so sent
  messages are attributable to a conversation
- insight job failures: store the one-line anyhow context chain ({:#})
  instead of the Debug dump the client was shown verbatim
- serialize env_dispatch tests behind a lock (parallel-runner flake)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:06:52 -04:00
Cameron Cordes 0accc4ef2f Add GPU lease coordinating LLM and TTS requests through llama-swap
llama-swap runs chat/vision/Chatterbox as a mutually-exclusive set on
one GPU and HOLDS a request for a non-resident model until the resident
model drains, then swaps. That hold burned the holder's reqwest timeout
(measured: a queued TTS lost 77s behind one LLM turn; an LLM request
behind a synthesis waited the entire remaining synth), so concurrent
insight + read-aloud timed out instead of queueing.

ai::gpu adds a fair RwLock lease acquired before each request is sent,
so cross-model waits happen before the HTTP timeout starts: chat/vision
share the read lease, TTS synthesis and voice-library ops (which spin
Chatterbox up) take the write lease, and embeddings take none (the
embed slot is in llama-swap's always-resident group). Speech jobs now
flip queued->running only after acquiring the GPU, letting the client
anchor its poll deadline to that transition.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 18:20:06 -04:00
Cameron Cordes 03699f7413 Add TTS voice deletion, async speech jobs, voice-list cache, ref-seconds name tags
- DELETE /tts/voices/{name}: remove a cloned voice via the llama-swap
  passthrough (upstream chatterbox-tts-api exposes DELETE /voices/{name}).
- POST/GET/DELETE /tts/speech/jobs: durable job flow for long syntheses —
  dispatch returns 202 + job id, the synth queues on the GPU permit instead
  of fast-failing 429, and clients poll for the result (kept ~10 min).
- GET /tts/voices now serves an in-memory cache so listing voices doesn't
  make llama-swap spin up the TTS model (evicting the resident LLM);
  invalidated on create/delete, ?refresh=1 forces an upstream re-query.
- Created voice names are tagged with LLAMA_SWAP_TTS_REF_SECONDS (e.g.
  grandma-30s) so the library shows which ref length produced each clone.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 17:36:15 -04:00
cameron c78e751743 Merge pull request 'Feature/insight history' (#104) from feature/insight-history into master
Reviewed-on: #104
2026-06-10 19:01:14 +00:00
Cameron Cordes 31904fef80 Raise chat truncation default num_ctx to 32k, env-overridable
The history-truncation budget assumed an 8192-token context whenever a
chat request omitted num_ctx, while the llama-swap chat slots serve
20k-131k. Replayed transcripts past ~6k tokens were silently gutted
every turn — losing conversation history and destroying llama.cpp
KV-cache prefix reuse (full SWA re-prefill per turn).

Default is now 32768 (real conversations top out around 16k), with
AGENTIC_CHAT_DEFAULT_NUM_CTX to override per deploy, floored at
headroom + 1024. Explicit per-request num_ctx still wins.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 19:14:02 -04:00
Cameron Cordes 13f3635db2 Fix clippy lints in backfill and libraries tests
Keep `cargo clippy --tests` clean alongside the agentic-loop changes:
alias backfill's five-element setup() tuple as SetupFixture
(type_complexity) and build the single-library health map via
std::slice::from_ref instead of cloning (unnecessary clone-to-slice).
No behavior change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:29:44 -04:00
Cameron Cordes b711252c23 Resolve persona prompts server-side; drop synthetic prompt in chat_turn
A request carrying persona_id but no system_prompt used to fall back to
the neutral default voice. Both agentic generation
(generate_agentic_insight_handler) and chat bootstrap now resolve the
persona's stored prompt from the persona store, with precedence:
explicit non-blank client system_prompt > persona store lookup >
existing default ("default" persona id behaves the same — used if the
store has a row, neutral default otherwise). Resolution happens at the
handler / bootstrap entry where the DAO is reachable; internals are
unchanged. resolve_bootstrap_system_prompt takes the resolved persona
prompt as a second argument, with precedence tests.

Also in insight_chat:

- Sync chat_turn no longer persists the synthetic "Please write your
  final answer now without calling any more tools." user message pushed
  on iteration exhaustion — extracted both streaming variants'
  synthetic_idx pattern into push/remove_synthetic_final_prompt (the
  remove is a defensive no-op on index drift) and applied it to all
  three loops; round-trip test included.
- Strip leaked <think> blocks from the final content persisted as the
  reply in chat_turn and both streaming AgenticLoopOutcomes (mid-stream
  TextDeltas are untouched; the raw transcript keeps the block).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:29:35 -04:00
Cameron Cordes 091982bdfc Add recall_facts_for_entity tool; fix generation gates and tool output
Agentic-loop fixes in the generator:

- New recall_facts_for_entity tool (always-on, like recall_entities):
  fetches facts for one entity by id so the model can follow up on
  entities surfaced by recall_entities that aren't photo-linked
  (recall_facts_for_photo only covers linked entities). Mirrors that
  tool's persona scoping (PersonaFilter::Single) and the persona's
  reviewed_only_facts filter exactly, and renders in the same
  "Entity: ... / - predicate object" style. Wired through execute_tool
  and the trajectory summarizer.
- Generation now resolves gates persona-aware:
  current_gate_opts_for_persona(images_inline, Some((user_id,
  persona_id))) instead of the None-defaulting wrapper, so a persona's
  allow_agent_corrections opens propose_correction during generation the
  same way chat turns already did. The now-unused current_gate_opts
  wrapper is removed.
- Strip leaked <think> blocks from the final assistant content before
  parse_title_body / store_insight (raw training transcript keeps them).
- Honest truncation labels: get_sms_messages and get_location_history
  said "Found N ..." while listing only the first K; found_header now
  emits "Found N ... (showing first K):" when truncated, and the
  summarizer still parses the count.
- Clamp days_radius in get_calendar_events and get_location_history to
  1..=30, matching get_sms_messages.
- persona_system_prompt helper (persona store lookup, blank-prompt ->
  None) for server-side persona resolution; callers land in the next
  commit.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:29:20 -04:00
Cameron Cordes 592dfcb42c Accumulate streamed tool calls across chunks in Ollama streaming
Ollama >=0.8 can stream tool_calls incrementally across NDJSON chunks;
chat_with_tools_stream did `tool_calls = Some(tcs)` per chunk, so only
the last chunk's calls survived assembly and earlier calls were silently
dropped. Append into the accumulator instead.

- ollama: append_streamed_tool_calls helper + tests covering two calls
  arriving in separate chunks and the single-chunk batch case.
- llamacpp: the SSE delta assembly was already correct (per-index
  BTreeMap, same-index argument fragments concatenate, distinct indexes
  accumulate); extracted it into apply_tool_call_deltas /
  finalize_tool_calls and added tests pinning that behavior.
- llm_client: new shared strip_think_blocks (moved from ollama's private
  extract_final_answer, which now delegates) so the tool-calling final
  content paths can reuse it; unit tests for tagged/plain/unclosed/empty
  cases.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:29:06 -04:00
Cameron Cordes 8e4f91561b Add per-file insight history endpoint and rate-by-id
Expose GET /insights/history?path=... returning every generated version
of a photo's insight (current plus superseded), newest-first, backing the
mobile per-file insight history view.

- New get_insight_history_handler; reuses the existing get_insight_history
  DAO method (removed its dead_code allow).
- impl From<PhotoInsight> for PhotoInsightResponse, collapsing the mapping
  that was duplicated across the single-get and all-insights handlers.
- rate_insight_by_id DAO method + optional insight_id on RateInsightRequest
  so previously generated versions can be approved/rejected (the path-based
  rate only touches the current row).
- DAO tests for history ordering/scoping and id-targeted rating.
- cargo fmt normalized a multi-line assert in insight_chat.rs tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 18:28:22 -04:00
cameron 750a8de6b1 Merge pull request 'Feature/tts integration' (#103) from feature/tts-integration into master
Reviewed-on: #103
2026-06-07 21:35:49 +00:00
Cameron Cordes 412da2ce8e Collapse blank lines to a single break in TTS text cleaning
Chatterbox inserts a long pause — sometimes ~20s of silence — for each
blank line it sees, and insight text is markdown full of paragraph
breaks. clean_for_tts previously preserved paragraph structure
(\n{3,} -> \n\n), so every paragraph boundary still reached the model
as a double newline. Now any run of 2+ newlines, including
whitespace-only blank lines, collapses to a single newline so the
worst pause a break can cause is a normal line-break pause.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 09:12:43 -04:00
Cameron Cordes dec6f21af9 Bump version to 1.3.0
TTS feature release: /tts/speech + voice library endpoints (Chatterbox via
llama-swap), input cleaning, tuning knobs, WAV-normalized voice cloning,
OTel spans, dedicated synth timeout, and single-flight serialization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 14:07:10 -04:00
Cameron Cordes cab867da60 Serialize /tts/speech with a single permit; 429 when busy
The Chatterbox wrapper has no internal lock or cancellation, so concurrent
synth requests contend on the single GPU and abandoned (timed-out) jobs
cascade into stacked slowness. Gate synthesis behind a one-permit semaphore
and fast-fail concurrent requests with 429 instead of queueing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 14:02:56 -04:00
Cameron Cordes d8dd260c6b Give TTS synthesis its own (longer) request timeout
Long insights are chunked + synthesized server-side and can run past the shared
180s chat/embedding client timeout, causing spurious timeouts. /tts/speech now
uses a per-request timeout from LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS
(default 600), overriding the client default without affecting chat/embeddings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 10:25:06 -04:00
Cameron Cordes 9978b28b52 Document TTS endpoints + env in CLAUDE.md
Sync CLAUDE.md with the Chatterbox TTS feature: the /tts/* endpoints and the
LLAMA_SWAP_TTS_MODEL / _VOICE / _REF_SECONDS env vars (only need LLAMA_SWAP_URL).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 23:15:39 -04:00
Cameron Cordes ccacfe1113 Instrument TTS handlers with OTel spans (codebase standard)
Each /tts handler now opens an http.tts.* span via extract_context_from_request
+ global_tracer().start_with_context, sets Status::Ok / Status::error on every
outcome, and records useful attributes (model, format, voice_name, byte counts)
— matching the insight handlers. Prometheus request metrics were already
covered by the app-wide actix-web-prom middleware.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 23:10:43 -04:00
Cameron Cordes 62d517dcda Normalize voice-clone reference audio to WAV via ffmpeg
Chatterbox validates the reference clip by file extension and rejects formats
like .aac/.opus. Always transcode the reference (upload bytes and library
files alike) to mono 24 kHz WAV with ffmpeg before forwarding, so any source
format is accepted and the from-library audio/video paths are unified.

The reference length cap is now configurable via LLAMA_SWAP_TTS_REF_SECONDS
(default 30) — Chatterbox is zero-shot, so a clean ~10-20s clip is the sweet
spot. Drops the now-unused mime guesser.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 22:50:08 -04:00
Cameron Cordes 35c5ecb427 Document TTS endpoints and env in README + .env.example
Adds the /tts/speech and /tts/voices* endpoints plus LLAMA_SWAP_TTS_MODEL /
LLAMA_SWAP_TTS_VOICE (TTS only needs LLAMA_SWAP_URL, not LLM_BACKEND=llamacpp).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 22:34:34 -04:00
Cameron Cordes 51be5df214 Clean insight text for TTS and pass through Chatterbox tuning knobs
/tts/speech now normalizes input before synthesis: unwraps markdown
links/images to visible text, drops heading/list/blockquote/emphasis
markers and URLs, strips emoji (which non-turbo Chatterbox mispronounces
or skips), and collapses whitespace. Centralized in clean_for_tts so the
app, WebUI, and curl all get clean audio. Bracketed tags are deliberately
preserved for a future Turbo (paralinguistic) switch.

Adds optional exaggeration / cfg_weight / temperature to the request,
clamped to Chatterbox's documented ranges and forwarded on the speech
body. Unit tests cover markdown/emoji/URL stripping and tag preservation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 22:15:05 -04:00
Cameron Cordes 69268d03fe Add TTS endpoints backed by Chatterbox via llama-swap
LlamaCppClient gains text_to_speech (OpenAI /audio/speech), list_voices and
create_voice (voice library at the swap-root /upstream/<model>/voices
passthrough), plus a tts_model slot configured via LLAMA_SWAP_TTS_MODEL
(default "chatterbox").

New Claims-gated routes:
- POST /tts/speech        -> { audio_base64, format } for data: URI playback
- GET  /tts/voices        -> voice library passthrough
- POST /tts/voices/upload -> clone a voice from an uploaded clip (multipart)
- POST /tts/voices/from-library -> clone from a library file (ffmpeg-extracts
  audio from video; audio forwarded as-is)

Security: voice_name sanitized to [A-Za-z0-9_-] (it becomes an upstream
filename), 25 MB upload cap, library refs restricted to real audio/video,
path confined via is_valid_full_path. Adds is_audio_file + unit tests for the
sanitizer, mime guesser, and swap-root derivation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 22:04:42 -04:00
cameron 015dc976e3 Merge pull request 'feature/insight-jobs' (#102) from feature/insight-jobs into master
Reviewed-on: #102
2026-06-02 23:41:36 +00:00
Cameron Cordes b9b6e51af1 Stop ffprobe walking every frame in video stream probe
probe_video_stream_meta requested a bare `side_data_list` section in
-show_entries. On modern ffprobe that's the *frame* side-data section,
so ffprobe enumerated every frame to collect it — reading the entire
mdat. For non-faststart phone clips on the SMB mount this turned a
metadata probe into a full-file read: /video/generate took 10-32s per
open (0% CPU, time proportional to file size).

Switch to `stream_side_data_list`, which reads the Display Matrix
rotation from the stream header (moov) without touching frames. Codec,
frame rate, and rotation are unchanged; the existing rotation parser
already reads streams[0].side_data_list[].rotation. Fixes both the
open-path probe and the transcode actor's probe. Cold opens now return
near-instantly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 13:19:47 -04:00
Cameron Cordes 16ae82ba70 Normalize video rel_path lookup to forward slashes on Windows
generate_video built the rel_path for its image_exif lookup by stripping
the library root from the absolute path, leaving backslashes on Windows
(Melissa\clip.mp4). file_scan stores rel_paths forward-slash and
get_exif_batch matches exactly with no normalization, so the lookup
missed and the handler re-hashed the entire video file on every request.

Extract rel_path_for_lookup and normalize separators with replace('\\',
'/'). Adds tests for Windows/Unix separators, file-at-root, leading
separator stripping, and the no-match fallback.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 12:51:44 -04:00
Cameron Cordes a542ea411b Exclude inlined image bytes from chat context budget
The truncation budget estimated message size by serializing the full
ChatMessage array, including the base64 image persisted in the first
user message. A 1024px JPEG is hundreds of KB of base64 characters —
8-19x the entire ~24KB text budget at the default num_ctx — and the
image lives in the protected prefix that's never dropped. The budget
check was therefore essentially always over, dropping all tool history
and firing the "trimmed context" banner on every turn for vision
backends that inline images.

estimate_bytes now strips image payloads before counting and charges a
flat IMAGE_TOKENS_EACH per image instead, so the budget reflects real
text token pressure. Adds a regression test covering a short
conversation with one large image.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 11:51:57 -04:00
Cameron Cordes 962f7bf05c Add reconnectable async chat-turn flow with in-memory TurnRegistry
Replace the one-shot SSE chat stream with an async dispatch + reconnectable
replay flow so the mobile client survives backgrounding, network blips, and
OS-killed sockets without losing an in-flight agentic turn.

- TurnRegistry/TurnEntry: in-memory per-turn event buffer (cap 500, front
  eviction) shared by the agentic loop (writer) and SSE replay readers.
  ReplayOutcome + replay_from/next_batch distinguish Events/CaughtUp/Gone;
  next_batch registers the Notify before reading state (no lost wakeup) and
  drains every buffered event before signaling terminal, so the final
  Done/Error is never dropped and the stream closes cleanly.
- Endpoints: POST /insights/chat/turn (202 + turn_id), GET
  /insights/chat/turn/{id} (SSE replay, ?skip_before= resume, per-event seq,
  410 on eviction), DELETE /insights/chat/turn/{id} (real task abort +
  cooperative is_running() check at each loop boundary).
- Cancellation actually stops the task (AbortHandle stored on the entry) and
  emits a Done{cancelled:true}; callers skip persistence on cancel.
- Background sweeper drops stale turns; interval clamped to <=300s.
- OpenTelemetry spans: ai.chat.turn.execute/replay/cancel.
- Legacy POST /insights/chat/stream path preserved unchanged.

Tests: registry coverage for terminal delivery (race guard), waiting, Gone,
abort, eviction; handler integration tests for 404/410, skip_before, seq
stamping, completed replay, and cancel.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 19:50:25 -04:00
Cameron Cordes 0c1c1c6792 fix: split token count columns into separate migration
A previous commit added prompt_eval_count and eval_count to the
existing 2026-05-27-000002_add_insight_generation_params migration,
but Diesel won't re-run an already-applied migration. Environments
that applied the original version of 000002 never got these two
columns, causing "no such column: photo_insights.prompt_eval_count"
on every insight read.

- Revert 000002 up.sql to its original 7-column form
- Add 000003_add_insight_token_counts for the two missing columns

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 22:34:44 -04:00
Cameron Cordes cdd981fe64 fix: inline DB error source into DbError struct
The previous fix logged the underlying error in a separate log line,
but the error that propagated up still showed just "DbError { kind:
InsertError }" at the call site. Now the source message is captured
on the struct itself, so Debug/Display output at any call site shows
the actual Diesel error inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 22:30:19 -04:00
Cameron Cordes dad0220587 fix: stop swallowing DB errors across the entire DAO layer
Every map_err(|_| DbError::new(...)) and map_err(|_| anyhow!("..."))
in the database layer was discarding the actual Diesel/SQLite error,
making failures impossible to diagnose from logs.

- Add DbError::log() that logs the source error before converting
- Replace all ~130 swallowed outer map_err closures with DbError::log
- Replace all ~47 swallowed inner anyhow closures to include the
  source error in the message

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:56:48 -04:00
Cameron Cordes 39ad83f55b fix: surface actual Diesel error in store_insight instead of generic InsertError
The previous map_err closures discarded the Diesel error, making
failures like missing columns impossible to diagnose from logs.
Now the underlying error is logged before converting to DbError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:53:54 -04:00
Cameron Cordes 9654d256f4 fix: persist token counts and fix agentic insight_id mapping
- Add prompt_eval_count and eval_count columns to photo_insights so
  token usage from llama-swap/Ollama is stored and returned by the API
- Fix agentic generator return: was (prompt_eval_count, eval_count),
  handler destructured first element as insight_id — now returns
  (insight_id, prompt_eval_count, eval_count)
- Wire prompt_eval_count/eval_count from DB into PhotoInsightResponse
  instead of hardcoded None

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:47:57 -04:00
Cameron Cordes 449ce1fda1 chore: resolve all clippy warnings and formatting
- Replace impl ToString with impl Display for InsightJobStatus and
  InsightGenerationType
- Rename from_str → parse to avoid confusion with std::str::FromStr
- Collapse nested if statements (handlers, insight_chat, insight_generator,
  image handlers)
- Use is_multiple_of() instead of manual modulo checks
- Suppress deprecated diesel::dsl::count_distinct (no drop-in replacement
  available in current Diesel version)
- Scope MutexGuard in synthesize_merge to drop before await
- Allow dead_code on generate_no_think, enumerate_indexable_files,
  total_deleted (intended for future use)
- Allow type_complexity on Diesel query result tuples

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:13:48 -04:00
Cameron Cordes a410683edf fix: fail fast when LLM_BACKEND=llamacpp but LlamaCppClient is unconfigured
Previously embed_one() silently fell back to Ollama embeddings,
which would load nomic-embed-text into VRAM alongside llama-swap —
wasting memory on an unintended model. Now returns an error with
an actionable message instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:02:42 -04:00