A request carrying persona_id but no system_prompt used to fall back to
the neutral default voice. Both agentic generation
(generate_agentic_insight_handler) and chat bootstrap now resolve the
persona's stored prompt from the persona store, with precedence:
explicit non-blank client system_prompt > persona store lookup >
existing default ("default" persona id behaves the same — used if the
store has a row, neutral default otherwise). Resolution happens at the
handler / bootstrap entry where the DAO is reachable; internals are
unchanged. resolve_bootstrap_system_prompt now takes the resolved persona
prompt as a second argument; precedence tests are included.
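A minimal sketch of the precedence described above, for illustration only.
The function name, signature, and the plain-string default are assumptions;
the real resolution goes through the persona store DAO.

```rust
/// True when the prompt has at least one non-whitespace character.
fn non_blank(s: &&str) -> bool {
    !s.trim().is_empty()
}

/// Hypothetical resolver: explicit non-blank client prompt wins, then the
/// persona's stored prompt, then the neutral default.
fn resolve_system_prompt(
    client_prompt: Option<&str>,
    persona_prompt: Option<&str>,
    default_prompt: &str,
) -> String {
    client_prompt
        .filter(non_blank)
        .or(persona_prompt.filter(non_blank))
        .unwrap_or(default_prompt)
        .to_string()
}
```

A whitespace-only client prompt is treated as absent, so the persona prompt
still applies in that case.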
Also in insight_chat:
- Sync chat_turn no longer persists the synthetic "Please write your
final answer now without calling any more tools." user message pushed
on iteration exhaustion — extracted both streaming variants'
synthetic_idx pattern into push/remove_synthetic_final_prompt (the
remove is a defensive no-op on index drift) and applied it to all
three loops; round-trip test included.
- Strip leaked <think> blocks from the final content persisted as the
reply in chat_turn and both streaming AgenticLoopOutcomes (mid-stream
TextDeltas are untouched; the raw transcript keeps the block).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Agentic-loop fixes in the generator:
- New recall_facts_for_entity tool (always-on, like recall_entities):
fetches facts for one entity by id so the model can follow up on
entities surfaced by recall_entities that aren't photo-linked
(recall_facts_for_photo only covers linked entities). Mirrors that
tool's persona scoping (PersonaFilter::Single) and the persona's
reviewed_only_facts filter exactly, and renders in the same
"Entity: ... / - predicate object" style. Wired through execute_tool
and the trajectory summarizer.
- Generation now resolves tool gates per persona:
current_gate_opts_for_persona(images_inline, Some((user_id,
persona_id))) instead of the None-defaulting wrapper, so a persona's
allow_agent_corrections opens propose_correction during generation the
same way chat turns already did. The now-unused current_gate_opts
wrapper is removed.
- Strip leaked <think> blocks from the final assistant content before
parse_title_body / store_insight (raw training transcript keeps them).
- Honest truncation labels: get_sms_messages and get_location_history
said "Found N ..." while listing only the first K; found_header now
emits "Found N ... (showing first K):" when truncated, and the
summarizer still parses the count.
- Clamp days_radius in get_calendar_events and get_location_history to
1..=30, matching get_sms_messages.
- persona_system_prompt helper (persona store lookup, blank-prompt ->
None) for server-side persona resolution; callers land in the next
commit.
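The truncation label and the radius clamp can be sketched as below. The
found_header signature, the noun argument, and the untruncated "Found N ...:"
form are assumptions made for illustration.

```rust
/// Hypothetical header builder: states the total and, when only the first
/// `shown` items are listed, says so explicitly.
fn found_header(total: usize, shown: usize, noun: &str) -> String {
    if shown < total {
        format!("Found {total} {noun} (showing first {shown}):")
    } else {
        format!("Found {total} {noun}:")
    }
}

/// Clamp a caller-supplied radius into the 1..=30 day window.
fn clamp_days_radius(days: i64) -> i64 {
    days.clamp(1, 30)
}
```

Both header forms start with "Found N", which is what lets a summarizer keep
parsing the count.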
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Ollama >=0.8 can stream tool_calls incrementally across NDJSON chunks;
chat_with_tools_stream did `tool_calls = Some(tcs)` per chunk, so only
the last chunk's calls survived assembly and earlier calls were silently
dropped. Append into the accumulator instead.
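The difference between overwriting and appending can be sketched as follows.
The ToolCall struct and the helper's signature are assumptions; only the
helper name comes from this commit.

```rust
/// Hypothetical, simplified tool call record.
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String,
}

/// Append one chunk's tool calls to the accumulator instead of replacing it,
/// so calls from earlier NDJSON chunks survive assembly.
fn append_streamed_tool_calls(acc: &mut Option<Vec<ToolCall>>, chunk: Vec<ToolCall>) {
    if chunk.is_empty() {
        return;
    }
    acc.get_or_insert_with(Vec::new).extend(chunk);
}
```

With `acc = Some(chunk)` per chunk, two calls arriving in separate chunks
would leave only the second; with the append, both remain in arrival order.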
- ollama: append_streamed_tool_calls helper + tests covering two calls
arriving in separate chunks and the single-chunk batch case.
- llamacpp: the SSE delta assembly was already correct (per-index
BTreeMap, same-index argument fragments concatenate, distinct indexes
accumulate); extracted it into apply_tool_call_deltas /
finalize_tool_calls and added tests pinning that behavior.
- llm_client: new shared strip_think_blocks (moved from ollama's private
extract_final_answer, which now delegates) so the tool-calling final
content paths can reuse it; unit tests for tagged/plain/unclosed/empty
cases.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Expose GET /insights/history?path=... returning every generated version
of a photo's insight (current plus superseded), newest-first, backing the
mobile per-file insight history view.
- New get_insight_history_handler; reuses the existing get_insight_history
DAO method (removed its dead_code allow).
- impl From<PhotoInsight> for PhotoInsightResponse, collapsing the mapping
that was duplicated across the single-get and all-insights handlers.
- rate_insight_by_id DAO method + optional insight_id on RateInsightRequest
so previously generated versions can be approved/rejected (the path-based
rate only touches the current row).
- DAO tests for history ordering/scoping and id-targeted rating.
- cargo fmt normalized a multi-line assert in insight_chat.rs tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Chatterbox inserts a long pause — sometimes ~20s of silence — for each
blank line it sees, and insight text is markdown full of paragraph
breaks. clean_for_tts previously preserved paragraph structure
(\n{3,} -> \n\n), so every paragraph boundary still reached the model
as a double newline. Now any run of 2+ newlines, including
whitespace-only blank lines, collapses to a single newline so the
worst pause a break can cause is a normal line-break pause.
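One way to get this collapse without a regex, shown as a sketch only (the
function name is hypothetical and the real clean_for_tts may differ):

```rust
/// Drop every empty or whitespace-only line, so any run of two or more
/// newlines becomes a single newline.
fn collapse_blank_lines(text: &str) -> String {
    text.split('\n')
        .filter(|line| !line.trim().is_empty())
        .collect::<Vec<_>>()
        .join("\n")
}
```

Note that this sketch also drops leading and trailing blank lines, which is
harmless for text headed to speech synthesis.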
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Chatterbox wrapper has no internal lock or cancellation, so concurrent
synth requests contend on the single GPU and abandoned (timed-out) jobs
cascade into stacked slowness. Gate synthesis behind a one-permit semaphore
and fast-fail concurrent requests with 429 instead of queueing.
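The fast-fail behavior can be illustrated with the standard library alone.
In this sketch std::sync::Mutex::try_lock stands in for the one-permit
semaphore's try-acquire; the SynthGate type, the synchronous closure, and the
500 on poisoning are assumptions, and the real handler is async.

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical single-permit gate around synthesis.
struct SynthGate {
    permit: Mutex<()>,
}

impl SynthGate {
    fn new() -> Self {
        Self { permit: Mutex::new(()) }
    }

    /// Run `job` if the permit is free; otherwise fail immediately with an
    /// HTTP-style status code instead of queueing behind the running job.
    fn try_run<T>(&self, job: impl FnOnce() -> T) -> Result<T, u16> {
        match self.permit.try_lock() {
            // The guard stays alive for the duration of the job.
            Ok(_guard) => Ok(job()),
            Err(TryLockError::WouldBlock) => Err(429),
            Err(TryLockError::Poisoned(_)) => Err(500),
        }
    }
}
```

Rejecting rather than queueing means a timed-out client never leaves a job
waiting behind the one currently holding the GPU.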
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Long insights are chunked + synthesized server-side and can run past the shared
180s chat/embedding client timeout, causing spurious timeouts. /tts/speech now
uses a per-request timeout from LLAMA_SWAP_TTS_REQUEST_TIMEOUT_SECONDS
(default 600), overriding the client default without affecting chat/embeddings.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Each /tts handler now opens an http.tts.* span via extract_context_from_request
+ global_tracer().start_with_context, sets Status::Ok / Status::error on every
outcome, and records useful attributes (model, format, voice_name, byte counts)
— matching the insight handlers. Prometheus request metrics were already
covered by the app-wide actix-web-prom middleware.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Chatterbox validates the reference clip by file extension and rejects formats
like .aac/.opus. Always transcode the reference (upload bytes and library
files alike) to mono 24 kHz WAV with ffmpeg before forwarding, so any source
format is accepted and the from-library audio/video paths are unified.
The reference length cap is now configurable via LLAMA_SWAP_TTS_REF_SECONDS
(default 30) — Chatterbox is zero-shot, so a clean ~10-20s clip is the sweet
spot. Drops the now-unused mime guesser.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/tts/speech now normalizes input before synthesis: unwraps markdown
links/images to visible text, drops heading/list/blockquote/emphasis
markers and URLs, strips emoji (which non-turbo Chatterbox mispronounces
or skips), and collapses whitespace. Centralized in clean_for_tts so the
app, WebUI, and curl all get clean audio. Bracketed tags are deliberately
preserved for a future Turbo (paralinguistic) switch.
Adds optional exaggeration / cfg_weight / temperature to the request,
clamped to Chatterbox's documented ranges and forwarded on the speech
body. Unit tests cover markdown/emoji/URL stripping and tag preservation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LlamaCppClient gains text_to_speech (OpenAI /audio/speech), list_voices and
create_voice (voice library at the swap-root /upstream/<model>/voices
passthrough), plus a tts_model slot configured via LLAMA_SWAP_TTS_MODEL
(default "chatterbox").
New Claims-gated routes:
- POST /tts/speech -> { audio_base64, format } for data: URI playback
- GET /tts/voices -> voice library passthrough
- POST /tts/voices/upload -> clone a voice from an uploaded clip (multipart)
- POST /tts/voices/from-library -> clone from a library file (ffmpeg-extracts
audio from video; audio forwarded as-is)
Security: voice_name sanitized to [A-Za-z0-9_-] (it becomes an upstream
filename), 25 MB upload cap, library refs restricted to real audio/video,
path confined via is_valid_full_path. Adds is_audio_file + unit tests for the
sanitizer, mime guesser, and swap-root derivation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The truncation budget estimated message size by serializing the full
ChatMessage array, including the base64 image persisted in the first
user message. A 1024px JPEG is hundreds of KB of base64 characters —
8-19x the entire ~24KB text budget at the default num_ctx — and the
image lives in the protected prefix that's never dropped. The budget
check was therefore essentially always over, dropping all tool history
and firing the "trimmed context" banner on every turn for vision
backends that inline images.
estimate_bytes now strips image payloads before counting and charges a
flat IMAGE_TOKENS_EACH per image instead, so the budget reflects real
text token pressure. Adds a regression test covering a short
conversation with one large image.
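A rough sketch of the new accounting. The Msg struct, the constant values,
and the bytes-per-token conversion are assumptions for illustration; only
estimate_bytes and IMAGE_TOKENS_EACH are names from this commit.

```rust
/// Assumed flat per-image charge, in tokens (value is illustrative).
const IMAGE_TOKENS_EACH: usize = 1_000;
/// Assumed rough conversion from tokens to bytes of text.
const BYTES_PER_TOKEN: usize = 4;

/// Hypothetical, simplified chat message: text plus base64 image payloads.
struct Msg {
    content: String,
    images: Vec<String>,
}

/// Count text bytes only; each image costs a flat amount regardless of how
/// large its base64 payload is.
fn estimate_bytes(messages: &[Msg]) -> usize {
    messages
        .iter()
        .map(|m| m.content.len() + m.images.len() * IMAGE_TOKENS_EACH * BYTES_PER_TOKEN)
        .sum()
}
```

Under these assumed constants, a short message with one image of several
hundred KB stays well inside a ~24KB budget instead of blowing through it.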
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the one-shot SSE chat stream with an async dispatch + reconnectable
replay flow so the mobile client survives backgrounding, network blips, and
OS-killed sockets without losing an in-flight agentic turn.
- TurnRegistry/TurnEntry: in-memory per-turn event buffer (cap 500, front
eviction) shared by the agentic loop (writer) and SSE replay readers.
ReplayOutcome + replay_from/next_batch distinguish Events/CaughtUp/Gone;
next_batch registers the Notify before reading state (no lost wakeup) and
drains every buffered event before signaling terminal, so the final
Done/Error is never dropped and the stream closes cleanly.
- Endpoints: POST /insights/chat/turn (202 + turn_id), GET
/insights/chat/turn/{id} (SSE replay, ?skip_before= resume, per-event seq,
410 on eviction), DELETE /insights/chat/turn/{id} (real task abort +
cooperative is_running() check at each loop boundary).
- Cancellation actually stops the task (AbortHandle stored on the entry) and
emits a Done{cancelled:true}; callers skip persistence on cancel.
- Background sweeper drops stale turns; interval clamped to <=300s.
- OpenTelemetry spans: ai.chat.turn.execute/replay/cancel.
- Legacy POST /insights/chat/stream path preserved unchanged.
Tests: registry coverage for terminal delivery (race guard), waiting, Gone,
abort, eviction; handler integration tests for 404/410, skip_before, seq
stamping, completed replay, and cancel.
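The buffer and replay outcomes can be sketched synchronously as below. This
is a simplification under stated assumptions: TurnBuffer and its fields are
hypothetical, the Notify-based waiting and terminal handling are omitted, and
mapping front-evicted events to Gone is this sketch's interpretation.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum ReplayOutcome {
    Events(Vec<(u64, String)>),
    CaughtUp,
    Gone,
}

/// Hypothetical per-turn buffer of (seq, payload) events.
struct TurnBuffer {
    cap: usize,
    next_seq: u64,
    events: VecDeque<(u64, String)>,
}

impl TurnBuffer {
    fn new(cap: usize) -> Self {
        Self { cap, next_seq: 0, events: VecDeque::new() }
    }

    /// Append an event, evicting from the front once the cap is reached.
    fn push(&mut self, payload: &str) {
        if self.events.len() >= self.cap {
            self.events.pop_front();
        }
        self.events.push_back((self.next_seq, payload.to_string()));
        self.next_seq += 1;
    }

    /// Return events with seq >= skip_before, or say why there are none.
    fn replay_from(&self, skip_before: u64) -> ReplayOutcome {
        let oldest = self.events.front().map_or(self.next_seq, |(seq, _)| *seq);
        if skip_before < oldest {
            // Some requested events were already evicted.
            return ReplayOutcome::Gone;
        }
        let batch: Vec<(u64, String)> = self
            .events
            .iter()
            .filter(|(seq, _)| *seq >= skip_before)
            .cloned()
            .collect();
        if batch.is_empty() {
            ReplayOutcome::CaughtUp
        } else {
            ReplayOutcome::Events(batch)
        }
    }
}
```

A reconnecting client would pass the first seq it has not yet seen as
skip_before and resume from there.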
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add prompt_eval_count and eval_count columns to photo_insights so
token usage from llama-swap/Ollama is stored and returned by the API
- Fix agentic generator return: was (prompt_eval_count, eval_count),
handler destructured first element as insight_id — now returns
(insight_id, prompt_eval_count, eval_count)
- Wire prompt_eval_count/eval_count from DB into PhotoInsightResponse
instead of hardcoded None
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Replace impl ToString with impl Display for InsightJobStatus and
InsightGenerationType
- Rename from_str → parse to avoid confusion with std::str::FromStr
- Collapse nested if statements (handlers, insight_chat, insight_generator,
image handlers)
- Use is_multiple_of() instead of manual modulo checks
- Suppress deprecated diesel::dsl::count_distinct (no drop-in replacement
available in current Diesel version)
- Scope MutexGuard in synthesize_merge to drop before await
- Allow dead_code on generate_no_think, enumerate_indexable_files,
total_deleted (intended for future use)
- Allow type_complexity on Diesel query result tuples
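For reference, the Display and parse shape looks roughly like this. The
variant set and string forms are assumptions; implementing Display provides
to_string() through the standard blanket ToString impl.

```rust
use std::fmt;

/// Hypothetical variant set for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum InsightJobStatus {
    Running,
    Completed,
    Failed,
    Cancelled,
}

impl InsightJobStatus {
    fn as_str(self) -> &'static str {
        match self {
            Self::Running => "running",
            Self::Completed => "completed",
            Self::Failed => "failed",
            Self::Cancelled => "cancelled",
        }
    }

    /// Named `parse` rather than `from_str` so it is not mistaken for the
    /// std::str::FromStr trait method.
    fn parse(s: &str) -> Option<Self> {
        match s {
            "running" => Some(Self::Running),
            "completed" => Some(Self::Completed),
            "failed" => Some(Self::Failed),
            "cancelled" => Some(Self::Cancelled),
            _ => None,
        }
    }
}

impl fmt::Display for InsightJobStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}
```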
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously embed_one() silently fell back to Ollama embeddings,
which would load nomic-embed-text into VRAM alongside llama-swap —
wasting memory on an unintended model. Now returns an error with
an actionable message instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Fix query param mismatch: rename GenerationStatusQuery.file_path to
path so the client's app-resume buildQuery({ path: ... }) resolves
correctly instead of always getting 400
- Remove dead _lib_id bindings from both generate handlers
- Return 202 Accepted instead of 200 from generate endpoints
- Restore OpenTelemetry span instrumentation on generate handlers
- Remove stale UNIQUE constraint from initial migration (incompatible
with plain-INSERT DAO)
- Add tests for status guard: complete_job/fail_job are no-ops when
job is already cancelled, and cancel_job by id
- Persist generation params (num_ctx, temperature, top_p, top_k, min_p,
system_prompt, persona_id) on the photo_insights table for auditing
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add insight_generation_jobs table migration and DAO
- Implement job lifecycle: create_or_get_active, complete, fail, cancel
- Refactor POST /insights/generate and /agentic to async spawn with timeout
- Add GET /insights/generation/status endpoint with job_id and file_path lookup
- Use String for enum fields in Diesel models to avoid private Bound type
- Add from_str() helpers on InsightJobStatus and InsightGenerationType
- Fix update_training_messages to return Result<usize, DbError>
- 7/7 DAO unit tests passing
When backend=hybrid with LLM_BACKEND=llamacpp, the user-selected model
(an OpenRouter id like "google/gemini-3-flash-preview") was being applied
to the local LlamaCppClient's primary_model and vision_model. This caused
describe_image to send the OpenRouter model name to llama-swap, which
returned 400 because it has no such slot.
Guard the local-client model override with !is_hybrid so it only applies
in local-only mode (where the user is selecting a different local model).
Bump to v1.2.0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- sms search tool: accept contact name, trim/validate, skip when
contact_id is set, pass to API client
- sms_client: new contact field in SmsSearchParams, URL-encode on wire
- Tool description clarifies contact_id takes precedence when both given
- Add parse_title_body helper for LLM response parsing
- llamacpp backend improvements
Cover the properties that prevent mid-turn model swaps in llama-swap
exclusive mode: vision_model defaults to primary, cloned local client
mirrors the user-selected model, embeddings stay on their own slot.
Also test the content:null serialization for tool-calling messages.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the user selects a model from the picker, the local client's
primary_model and vision_model now match the chat model. Prevents
llama-swap exclusive mode from swapping models when describe_photo
or rerank fires during an agentic turn.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace 5 copies of the ~80-line backend resolution pattern with a
single InsightGenerator::resolve_backend() builder that returns a
ResolvedBackend (chat + local clients, BackendKind enum, images_inline
flag). Tool dispatch now takes &ResolvedBackend instead of
&OllamaClient + model + backend strings.
Remove duplicated ollama/openrouter/llamacpp fields from
InsightChatService — InsightGenerator owns them and resolve_backend
uses them. Delete build_chat_clients (replaced by resolve_backend).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
llamacpp models now receive images via OpenAI content-parts instead of
the describe-then-inline strategy (hybrid mode unchanged). Fixes
assistant messages with tool_calls emitting content: null instead of ""
to satisfy strict Jinja template role-alternation checks. Adds debug
logging of message role sequences on llamacpp requests.
Introduces BackendKind enum, SamplingOverrides, and ResolvedBackend in
a new backend.rs module. InsightGenerator::resolve_backend centralises
client construction + vision capability detection — next step wires the
existing inline dispatch through it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reverts the per-request backend="llamacpp" value. Chat/vision/embedding
backend is now a deploy-time decision (LLM_BACKEND=ollama|llamacpp),
applied globally across chat, vision describe, and embeddings — so
embedding vectors stay in one space across the index.
- Per-request backend whitelist back to "local"|"hybrid". A request
arriving with backend="llamacpp" is rejected.
- LLM_BACKEND=llamacpp swaps the entire local stack to llama-swap:
chat hits the chat slot, describe hits the vision slot, embeddings
hit the embed slot. Hybrid mode still routes chat to OpenRouter
but uses LLM_BACKEND for the describe pass.
- Drops env vars HYBRID_VISION_BACKEND, LLAMA_SWAP_VISION_MODELS,
EMBEDDING_BACKEND (the last never shipped). Drops the
LlamaCppClient.vision_models allowlist — capability inference now
reports has_vision only for the configured vision_model slot.
- Drops the /insights/llamacpp/models handler. /insights/models is
the single endpoint; returns Ollama servers under LLM_BACKEND=ollama
and llama-swap slots (from LLAMA_SWAP_ALLOWED_MODELS) under
LLM_BACKEND=llamacpp. Same envelope shape either way.
- New ai::embed_one helper routes embeddings through llama-swap when
LLM_BACKEND=llamacpp (else Ollama). Wires it into the four
insight_generator embedding sites.
- Cross-replay matrix simplifies to pre-llamacpp shape (local↔local,
hybrid↔hybrid, hybrid→local allowed; local→hybrid rejected).
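The simplified matrix reduces to a single rejected pair. In this sketch the
Backend enum, the signature, the error text, and reading each arrow as
"stored backend → requested backend" are all assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Local,
    Hybrid,
}

/// Hypothetical check: every pairing is allowed except replaying a
/// local conversation on the hybrid backend.
fn validate_cross_replay(stored: Backend, requested: Backend) -> Result<(), String> {
    match (stored, requested) {
        (Backend::Local, Backend::Hybrid) => {
            Err("local -> hybrid replay is not allowed".to_string())
        }
        _ => Ok(()),
    }
}
```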
Wires a new LlamaCppClient (OpenAI-compatible /v1 wire format) alongside
OllamaClient and OpenRouterClient. Per-slot routing for chat/vision/embed
via env (LLAMA_SWAP_URL + *_MODEL vars); capability inference uses an
env allowlist since /v1/models doesn't report modality.
InsightGenerator + InsightChatService gain three-way dispatch on
chat_backend = "local" | "hybrid" | "llamacpp". Hybrid and llamacpp
share the describe-then-inline path (text-only chat after a separate
vision describe). HYBRID_VISION_BACKEND=llamacpp lets hybrid route its
describe pass through llama-swap's vision slot while chat still goes
to OpenRouter.
Cross-replay matrix added (validate_cross_replay): local<->llamacpp
and hybrid<->llamacpp allowed; local->hybrid and llamacpp->hybrid
rejected. New /insights/llamacpp/models handler mirrors the OpenRouter
shape.
Runs cargo fmt + clippy over the new files only; pre-existing files are
left untouched even though fmt has drift on them. clamp(1,200)
replaces a manual min/max chain that clippy flagged. The test AppState
constructor needed ClipClient::new(None) so the lib-test target
compiles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Probe-phase scaffolding for CLIP semantic search. Adds the column
that will hold per-photo embeddings, the HTTP client to Apollo's
inference service, and a throwaway probe binary so we can eyeball
search-result quality on the live library before building the
persistence layer (backlog drain, /photos/search endpoint, UI).
- migrations/2026-05-14-000000_add_clip_embedding/ — adds
image_exif.clip_embedding (BLOB) and clip_model_version (TEXT),
plus a partial index on (clip_embedding IS NULL AND content_hash
IS NOT NULL) for the future backfill drain.
- src/database/models.rs — extends ImageExif struct to match.
- src/ai/clip_client.rs — encode_image / encode_text / health,
same Permanent/Transient/Disabled taxonomy as face_client.
- src/bin/probe_clip_search.rs — --query <q> --library N --limit M
--top K. Encodes a sample and prints top-K cosine similarities.
No DB writes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch four `&s[..N]` / `&s[..s.len().min(N)]` sites to
`chars().take(N).collect::<String>()` so truncation lands on character
boundaries instead of mid-codepoint. The agentic summary preview log
was panicking when generated content hit an em-dash at byte 200; the
few-shot passage cap, brief_json_args debug formatter, and a test
assertion message had the same latent bug.
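A small illustration of the failure mode and the fix (the helper name is
hypothetical). Slicing a &str at a byte offset that falls inside a multi-byte
character panics; taking characters cannot.

```rust
/// Truncate to at most `max_chars` characters, never splitting a codepoint.
fn truncate_chars(s: &str, max_chars: usize) -> String {
    s.chars().take(max_chars).collect::<String>()
}
```

For example, in "ab\u{2014}cd" the em-dash occupies bytes 2..5, so `&s[..3]`
would panic, while truncate_chars(s, 3) returns "ab" plus the dash. The
result is capped in characters, so it may exceed N bytes.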
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure mechanical cleanup of accumulated drift in files outside the
HLS-content-hash branch's main change set. No behavior change.
- `cargo fmt` on every previously-misformatted file
(`ai/insight_generator.rs`, `database/knowledge_dao.rs`,
`faces.rs`, `knowledge.rs`, `libraries.rs`).
- `cargo clippy --fix`:
- `needless_borrow`: `&library` → `library` in `handlers/image.rs`
(two sites in the photo-listing path).
- Manual clippy pass for warnings clippy emits but can't auto-apply:
- `field_reassign_with_default` in `database/reconcile.rs::run` —
consolidated into a struct-literal initializer.
- `needless_range_loop` in `database/knowledge_dao.rs::union_perceptual_tags`
— inner `for b in (a+1)..indices.len() { let ib = indices[b]; ... }`
becomes `for &ib in &indices[a + 1..] { ... }`.
- Doc-list indentation: continuation lines under nested bullets in
`database/mod.rs::get_memories_in_window` and
`database/knowledge_dao.rs::build_entity_graph` realigned to the
list-item content column.
Deliberately not touched (each deserves its own focused commit, with
testing, rather than getting bundled into a sweep):
- 4× `deprecated count_distinct` in `faces.rs` — diesel API migration
to `AggregateExpressionMethods::aggregate_distinct` may shift result
types; needs verification against the existing stats queries.
- `await_holding_lock` in `knowledge.rs:807` — `std::sync::Mutex` held
across `ollama.generate(...).await`. Genuine concurrency bug; fix
requires understanding the surrounding flow before just dropping
the guard.
- 2× `type_complexity` in `database/mod.rs` — cosmetic, would need a
`type` alias and corresponding callers updated.
- Dead `total_deleted` on `library_maintenance::GcStats` and
`file_scan::enumerate_indexable_files` — both are public surface
retained for future use; deletion is a separate decision.
All 707 tests still pass. Release build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled changes to fight the speech-act-predicate problem
(facts like (Cameron, expressed, "I'm tempted to...")):
1. System prompt grows an explicit predicate-quality rule. The
agent is told to use relationship-shaped verbs (lives_in,
works_at, attended, is_friend_of, interested_in), and is
given an explicit DON'T list (expressed, said, mentioned,
stated, quoted, noted, discussed, thought, wondered). Plus a
concrete Bad / Good example contrasting the noise pattern
with the structured paraphrase the agent should be writing.
Stops the bleed for new insights.
2. Cleanup tools for the legacy noise that's already in the
table:
- get_predicate_stats(persona, limit) returns
[(predicate, count)] sorted desc — feeds the curation UI's
PREDICATES tab.
- bulk_reject_facts_by_predicate(persona, predicate, audit)
flips every ACTIVE fact under that predicate to 'rejected'
in one transaction, stamping last_modified_* so the action
is attributable + reversible per-fact through the entity
detail panel. REVIEWED facts under the same predicate are
left alone — the curator may have hand-approved an
exception ("interested_in" might be largely noise but a
reviewed entry is intentional).
New HTTP endpoints:
GET /knowledge/predicate-stats?limit=
POST /knowledge/predicates/{predicate}/bulk-reject
Persona-scoped via the existing X-Persona-Id header.
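An in-memory sketch of the two cleanup operations. The Fact struct, the
function names, and counting facts of every status in the stats are
assumptions; the real versions are persona-scoped SQL in the DAO and also
stamp the audit columns.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified fact row.
#[derive(Debug, Clone, PartialEq)]
struct Fact {
    predicate: String,
    status: String,
}

/// (predicate, count) pairs sorted by count descending, ties by name.
fn predicate_stats(facts: &[Fact], limit: usize) -> Vec<(String, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for f in facts {
        *counts.entry(f.predicate.as_str()).or_insert(0) += 1;
    }
    let mut rows: Vec<(String, usize)> =
        counts.into_iter().map(|(p, c)| (p.to_string(), c)).collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    rows.truncate(limit);
    rows
}

/// Flip only ACTIVE facts under `predicate` to 'rejected'; reviewed facts
/// are left alone. Returns how many rows changed.
fn bulk_reject(facts: &mut [Fact], predicate: &str) -> usize {
    let mut changed = 0;
    for f in facts.iter_mut() {
        if f.predicate == predicate && f.status == "active" {
            f.status = "rejected".to_string();
            changed += 1;
        }
    }
    changed
}
```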
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No logic changes - line reflow, brace placement, and method-chain splits
across handlers / personas / state / faces / knowledge / insights_dao /
knowledge_dao / populate_knowledge. Picked up incidentally while running
fmt for the sms-search work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Refactor search_messages_with_contact -> search_messages(query, &SmsSearchParams)
exposing date_from / date_to / offset / is_mms / has_media; drop the over-fetch
+ client-side date post-filter that could silently drop in-window hits past
position 100.
- Surface SMS-API's <mark>-wrapped snippet for MMS messages that only matched
via message_parts_fts (attachment text / filename) - pre-snippet, those
rendered as a blank body preview to the LLM.
- Expose is_mms / has_media on the search_messages tool schema; expand the
FTS5 syntax docs with worked examples for phrase / prefix / boolean / NEAR
/ grouping so the model picks the right operator.
- Unit tests for format_search_hits (body fallback, snippet preferred, MMS
attachment-only regression, empty-snippet fallback) and strip_mark_tags.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundles three coupled changes so agent-side mutations stay
auditable and reversible:
1. Audit columns on entity_facts —
`last_modified_by_model` / `last_modified_by_backend` /
`last_modified_at`. Stamped on every mutation path
(update_fact, supersede_fact, manual PATCH, manual supersede,
the new revert). NULL on rows never touched since creation.
Partial index on `last_modified_at WHERE NOT NULL` keeps the
"show me recent edits" feed fast without bloating from legacy
rows.
2. Per-persona gate `personas.allow_agent_corrections` (BOOLEAN,
default 0). Defense in depth at two layers:
- build_tool_definitions: when off, `update_fact` and
`supersede_fact` aren't in the catalog at all, so even a
hallucinated tool call by the model fails fast.
- tool_update_fact / tool_supersede_fact: re-checks the persona
flag at call time and returns an explicit "corrections
disabled" error if it's somehow off (e.g. flag flipped mid-
loop).
ToolGateOpts grows the flag; current_gate_opts splits into
`current_gate_opts` (no persona context, defaults closed) +
`current_gate_opts_for_persona` for chat callers that have a
persona id. Both call sites in insight_chat are updated.
3. Revert action — new DAO method `revert_supersession` +
`POST /knowledge/facts/{id}/restore`. Flips status back to
'active', clears `superseded_by`, clears `valid_until` (we
don't track whether it was hand-set vs auto-stamped, so the
safe reset is to drop it — user can re-bound after). Stamps
`last_modified_*` so the revert itself is attributable.
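The two gate layers from item 2 can be sketched as follows. The struct is
reduced to the one flag, and the tool list, return types, and messages are
illustrative assumptions.

```rust
/// Hypothetical reduction of ToolGateOpts to the single relevant flag.
struct ToolGateOpts {
    allow_agent_corrections: bool,
}

/// Layer 1: correction tools are absent from the catalog when the gate is off.
fn build_tool_definitions(opts: &ToolGateOpts) -> Vec<&'static str> {
    let mut tools = vec!["recall_entities", "store_fact"];
    if opts.allow_agent_corrections {
        tools.push("update_fact");
        tools.push("supersede_fact");
    }
    tools
}

/// Layer 2: the tool itself re-checks the flag at call time.
fn tool_update_fact(opts: &ToolGateOpts) -> Result<&'static str, &'static str> {
    if !opts.allow_agent_corrections {
        return Err("corrections disabled");
    }
    Ok("fact updated")
}
```

The second check covers the case where the flag changes after the catalog
was built for the current loop.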
Manual paths (PATCH / supersede via HTTP, plus restore) stamp the
audit columns with `("manual", "manual")`. Agent paths stamp the
loop-time chat model and backend (mirroring the existing
created_by_* convention).
FactDetail in the HTTP response now carries the audit triple
alongside the existing provenance. Apollo wires the new field set
in the matching commit.
PersonaView / UpdatePersonaRequest grow `allowAgentCorrections`;
the PersonaPatch + InsertPersona + bulk_import paths thread it.
317 lib tests pass, including unchanged update_fact / supersede
DAO tests (now passing audit=None — None means "no provenance
context to attribute", legacy semantics).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled changes to the agent's recall surface:
1. Default scope expanded. recall_facts_for_photo and recall_entities
used to filter to status='active' only — which silently dropped
'reviewed' (human-verified) facts. Now they surface active +
reviewed by default. Reviewed is strictly more trusted than
active and shouldn't have been hidden. Rejected and superseded
stay filtered.
2. New persona toggle `reviewed_only_facts` (BOOLEAN, default false,
migration 2026-05-10-000400). When set, the agent's recall on
that persona returns ONLY facts with status='reviewed' — strict
mode for tasks where hallucinated agent claims are particularly
costly. Wired:
- schema.rs / Persona / InsertPersona / PersonaPatch grow the
field.
- PersonaView returns it as `reviewedOnlyFacts` (camelCase wire).
- PUT /personas/{id} accepts it (mobile editor surfaces it).
- InsightGenerator now carries a PersonaDao reference so
recall_facts_for_photo can read the active persona's flag at
start; one extra read per recall, cheap.
Composes with include_all_memories: that operates on the persona
*scope* axis (single vs hive), reviewed_only_facts on the *status*
axis. They're orthogonal.
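The status axis boils down to a small predicate. This is a sketch with a
hypothetical function name; the real filter lives in the recall queries.

```rust
/// Whether a fact with this status is surfaced by recall.
/// Default: active + reviewed. Strict mode: reviewed only.
/// Rejected and superseded facts are never surfaced.
fn recall_visible(status: &str, reviewed_only_facts: bool) -> bool {
    match status {
        "reviewed" => true,
        "active" => !reviewed_only_facts,
        _ => false,
    }
}
```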
Legacy persona rows pick up the default false on migration; no
behavior change unless explicitly toggled. The 4 existing persona
construction sites (one production, two tests, one InsertPersona in
knowledge_dao tests) all default the field. populate_knowledge bin
+ state.rs constructors also wire the new persona_dao arg.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two nullable TEXT columns to entity_facts —
`created_by_model` (LLM identifier) and `created_by_backend`
("local" / "hybrid" / "manual" / NULL) — so the curator can audit
which configurations produce good fact-keeping and which produce
noise.
photo_insights already carries model_version + backend, and
entity_facts.source_insight_id links to it, but:
- source_insight_id is set post-loop, so chat-continuation and
regenerated-insight facts lose the link.
- JOINing per read is more friction than embedding provenance on
the row itself.
- Manual facts (POST /knowledge/facts) have no insight at all and
need their own "manual" provenance marker.
Threading: execute_tool grows `model` + `backend` params, passed
from the three call sites (agentic insight loop, chat single-turn,
chat stream) using the loop-time `chat_backend.primary_model()` +
`effective_backend` already in scope. tool_store_fact stamps the
new fact accordingly; manual create_fact stamps backend="manual".
Legacy rows leave both NULL — pre-tracking data can't be back-
filled reliably from training_messages without burning compute.
Indexes are partial (WHERE NOT NULL) so legacy rows don't bloat
them, and "show me all facts from model X" stays fast.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two Phase-2 followups in one commit since they're coupled at the
write path:
* Agent populates valid_from from the source photo's date_taken
when calling store_fact. Loose semantics — date_taken is *evidence
at that date*, not strictly when the fact started being true — but
gives the curator a calendar anchor and pairs with supersession to
close intervals cleanly. valid_until stays NULL (a single photo
can't tell us when something stopped). Honours the existing
upsert_fact dedup (corroborated facts keep their first-recorded
valid_from).
* Supersession: new column entity_facts.superseded_by INTEGER
(migration 2026-05-10-000200), new status value 'superseded',
new DAO method supersede_fact, new HTTP endpoint
POST /knowledge/facts/{id}/supersede.
Marking an old fact as replaced by a new one is atomic: it flips
status to 'superseded', sets superseded_by, and stamps
valid_until from the new fact's valid_from (when not already
set). delete_fact clears dangling supersession pointers in the
same transaction so the column never points at a missing row —
no FK because SQLite can't ALTER ADD with REFERENCES, but the
DAO maintains the invariant.
Pairs with conflict detection from the previous slice: once the
old fact's valid_until is closed, its interval no longer overlaps
the new fact's, so they stop flagging — the supersede action
resolves the conflict.
Two tests pin the contract: supersede stamps valid_until from
new.valid_from while respecting an existing valid_until, and
deleting the superseding fact clears the dangling pointer while leaving
the old fact's 'superseded' status in place for history.
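The contract those tests pin can be modeled in memory as below. The Fact
struct and the free functions are hypothetical; the real DAO does this
inside a SQLite transaction.

```rust
/// Hypothetical, simplified entity_facts row.
#[derive(Debug, Clone, PartialEq)]
struct Fact {
    id: i64,
    status: String,
    valid_from: Option<i64>,
    valid_until: Option<i64>,
    superseded_by: Option<i64>,
}

/// Mark `old` as replaced by `new`, closing its interval if still open.
fn supersede_fact(old: &mut Fact, new: &Fact) {
    old.status = "superseded".to_string();
    old.superseded_by = Some(new.id);
    if old.valid_until.is_none() {
        old.valid_until = new.valid_from;
    }
}

/// Delete a fact and clear any pointers to it; status is left as-is.
fn delete_fact(facts: &mut Vec<Fact>, id: i64) {
    facts.retain(|f| f.id != id);
    for f in facts.iter_mut() {
        if f.superseded_by == Some(id) {
            f.superseded_by = None;
        }
    }
}
```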
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds bitemporal support to entity_facts. Existing `created_at` is
transaction time (when we recorded the fact); the new
`valid_from` / `valid_until` BIGINT columns are valid time (when the
fact is/was true in the real world). NULL on either side = unbounded
on that side, both NULL = "always-true / unknown" — matches the
default state of every legacy row, no backfill needed.
The split matters for time-bounded predicates like
is_in_relationship_with / lives_in / works_at: recording the fact
once doesn't mean the relationship is still ongoing. Same predicate
across different windows ("lives_in NYC 2018-2020", "lives_in SF
2020-present") is no longer a conflict — the interval-aware check
in get_entity only flags pairs whose windows overlap. Facts with no
valid-time data still flag against everything (worst case for legacy
rows — user adds dates to suppress).
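The interval-aware check can be sketched as below. This is an
illustration under assumptions, not the get_entity code: the function
name is hypothetical, and windows are treated as half-open
[valid_from, valid_until), which is inferred from the NYC/SF example
(a window ending in 2020 does not clash with one starting in 2020).

```rust
/// A valid-time window: (valid_from, valid_until). None = unbounded
/// on that side.
pub type Window = (Option<i64>, Option<i64>);

/// True when the two windows share at least one instant.
pub fn windows_overlap(a: Window, b: Window) -> bool {
    // a must start before b ends...
    let a_starts_before_b_ends = match (a.0, b.1) {
        (Some(from), Some(until)) => from < until,
        _ => true, // an unbounded side never rules out overlap
    };
    // ...and b must start before a ends.
    let b_starts_before_a_ends = match (b.0, a.1) {
        (Some(from), Some(until)) => from < until,
        _ => true,
    };
    a_starts_before_b_ends && b_starts_before_a_ends
}
```

With this rule a legacy row (NULL, NULL) overlaps every other window,
which matches the worst-case behaviour noted above.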
API surface:
- POST /knowledge/facts accepts optional valid_from / valid_until.
- PATCH /knowledge/facts/{id} accepts both with tri-state semantics:
field omitted = leave alone, JSON null = clear to NULL, number =
set. Implemented via a small serde helper around Option<Option>.
- GET /knowledge/entities/{id} surfaces both fields per fact and
uses them in conflict detection.
Agent path (insight_generator) writes NULL/NULL for now — deriving
valid_from from the source photo's date_taken is slated for a
follow-up agent tool alongside Phase 2's supersession.
A test pins the set + clear semantics via update_fact: setting both
bounds, leaving them alone on a subsequent patch, then clearing
valid_until back to NULL.
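The tri-state PATCH semantics above reduce to a small rule once the
body is parsed. The sketch below shows only that rule; the serde
helper that produces the Option<Option<_>> is not reproduced, and
apply_patch is a hypothetical name.

```rust
/// Outer None       = field omitted in the PATCH body (leave alone).
/// Some(None)       = JSON null (clear the column to NULL).
/// Some(Some(v))    = number (set the column to v).
pub fn apply_patch(
    current: Option<i64>,
    patch: Option<Option<i64>>,
) -> Option<i64> {
    match patch {
        None => current,
        Some(new_value) => new_value,
    }
}
```

Applied in the order the test uses: set to 200, then omit the field
(value stays 200), then send null (value becomes NULL).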
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 of the knowledge curation work. Three small server-side changes
to support an Apollo-side curation surface and reduce the agent's near-
duplicate output rate going forward:
- upsert_entity grows an embedding-cosine fallback after the exact name
match misses. New entities whose embedding sits above
ENTITY_DEDUP_COSINE_THRESHOLD (default 0.92) against any same-type
active entity collapse onto the existing row. Eliminates the Sarah /
Sara / Sarah J. trio the FTS5 prefix check was missing.
- POST /knowledge/facts added, symmetric with the existing PATCH/DELETE,
so the curation UI can create facts directly. Persona-scoped via X-Persona-Id;
validates subject (and optional object) entity existence; reuses
KnowledgeDao::upsert_fact so corroboration semantics match the agent
path.
- One sentence in build_system_content telling the agent to call
recall_entities before store_entity when a name resembles something
already known. Cheap; complements the DAO-layer guard.
Includes upsert_entity_collapses_near_duplicate_by_embedding test
covering both the collapse-on-near-match path and the don't-collapse-on-
unrelated-embedding path.
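A rough sketch of the embedding-cosine fallback, assuming the caller
has already narrowed `existing` to same-type active entities. The
function names and the choice to return the best-scoring match are
assumptions for illustration; only the constant name and its 0.92
default come from this commit.

```rust
pub const ENTITY_DEDUP_COSINE_THRESHOLD: f32 = 0.92;

/// Cosine similarity; returns 0.0 for mismatched, empty, or
/// zero-length vectors so they never count as a match.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Id of the existing entity the candidate should collapse onto,
/// or None when nothing sits above the threshold.
pub fn find_near_duplicate(
    candidate: &[f32],
    existing: &[(i64, Vec<f32>)],
    threshold: f32,
) -> Option<i64> {
    let mut best: Option<(i64, f32)> = None;
    for (id, embedding) in existing {
        let score = cosine(candidate, embedding);
        if score > threshold && best.map_or(true, |(_, s)| score > s) {
            best = Some((*id, score));
        }
    }
    best.map(|(id, _)| id)
}
```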
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a photo exists in more than one library and the user
regenerates its insight from library A's chat, the regenerate
streams cleanly, store_insight flips library A's old row to
is_current=false, and inserts a new is_current=true row tagged
(library A, rel_path). On the next history fetch the user sees
their old transcript — the regenerate appears to vanish.
The cause: get_insight(file_path) filters on rel_path + is_current
only, so library B's untouched is_current=true row for the same
rel_path satisfies the query and gets returned by SQLite's .first()
ahead of A's new row. Because get_insight is also what
chat_turn_stream uses to decide bootstrap vs. continuation, the
next chat turn after the shadow hit also routes against the
wrong insight, so update_training_messages corrupts library B's
transcript with library A's chat.
Fix: add get_current_insight_for_library(library_id, file_path)
filtered on (library_id, rel_path, is_current=true) and route the
chat surface (load_history, chat_turn{,_stream}, rewind_history)
through it. load_history falls back to the cross-library
get_insight when the scoped lookup misses — preserves the
"scalar data merges across libraries" intent for the case where
the active library has no insight but another does. The path-only
get_insight stays for callers that don't have library context
(populate_knowledge, the photo-grid metadata fetch).
chat_history_handler stops dropping the parsed library on the
floor and threads it through. Single-library deploys see no
behaviour change.
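The shadowing and the fix can be illustrated over an in-memory slice.
This is a sketch, not the Diesel/SQLite query: the Insight struct and
load_history_lookup are hypothetical, and slice order stands in for
whichever row SQLite's .first() happens to return.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Insight {
    pub id: i64,
    pub library_id: i32,
    pub rel_path: String,
    pub is_current: bool,
}

/// Path-only lookup: any library's current row can win.
pub fn get_insight<'a>(rows: &'a [Insight], rel_path: &str) -> Option<&'a Insight> {
    rows.iter().find(|r| r.rel_path == rel_path && r.is_current)
}

/// Library-scoped lookup used by the chat surface.
pub fn get_current_insight_for_library<'a>(
    rows: &'a [Insight],
    library_id: i32,
    rel_path: &str,
) -> Option<&'a Insight> {
    rows.iter()
        .find(|r| r.library_id == library_id && r.rel_path == rel_path && r.is_current)
}

/// load_history behaviour: scoped first, cross-library fallback.
pub fn load_history_lookup<'a>(
    rows: &'a [Insight],
    library_id: i32,
    rel_path: &str,
) -> Option<&'a Insight> {
    get_current_insight_for_library(rows, library_id, rel_path)
        .or_else(|| get_insight(rows, rel_path))
}
```

With library B's current row stored ahead of library A's regenerated
row, the path-only lookup returns B's row while the scoped lookup for
library A returns the new one.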
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the LLM calls search_messages with { date, limit } and no
query, it's making the predictable mistake of conflating the two
"messages"-shaped tools. The previous behaviour returned an error
that pointed it at get_sms_messages: correct, but it burned a turn
on the misroute. Long photo-chat threads where the user asks
"what was happening that weekend?" hit this on small models
roughly half the time.
Now the date-string-without-query case transparently dispatches
to get_sms_messages with the same args (date / limit / days_radius
/ contact name all pass through unchanged) and prepends a short
"(Note: routed to get_sms_messages — prefer it directly next time)"
to the result. The model sees real data on its first try while
still learning the right tool for next time. Cases that don't have
a get_sms_messages equivalent (numeric contact_id, or start_ts /
end_ts windows) keep the original error so the model knows to
either supply a query or restructure its call.
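The routing decision can be sketched as a pure function. The struct,
enum, and function names here are hypothetical, and treating a blank
query the same as a missing one is an assumption; limit, days_radius,
and contact name are left out because they pass through unchanged and
do not affect the decision.

```rust
#[derive(Debug, Default, Clone)]
pub struct SearchArgs {
    pub query: Option<String>,
    pub date: Option<String>,
    pub contact_id: Option<i64>,
    pub start_ts: Option<i64>,
    pub end_ts: Option<i64>,
}

#[derive(Debug, PartialEq)]
pub enum Dispatch {
    /// Normal search_messages path.
    Search,
    /// Transparently hand off to get_sms_messages.
    RouteToSmsMessages,
    /// Keep the original "supply a query" error.
    Error,
}

pub fn dispatch(args: &SearchArgs) -> Dispatch {
    let has_query = args
        .query
        .as_deref()
        .map_or(false, |q| !q.trim().is_empty());
    if has_query {
        return Dispatch::Search;
    }
    // These arguments have no get_sms_messages equivalent.
    let no_sms_equivalent =
        args.contact_id.is_some() || args.start_ts.is_some() || args.end_ts.is_some();
    if args.date.is_some() && !no_sms_equivalent {
        Dispatch::RouteToSmsMessages
    } else {
        Dispatch::Error
    }
}
```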
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two persona-infrastructure correctness fixes that go together because
the second one (FK with CASCADE) requires the first (preventing the
persona row from being mutated out from under its facts).
1. update_persona handler refuses name/systemPrompt edits to built-ins
(409). includeAllMemories stays editable — that's a per-user
preference, not the persona's identity. Mirrors the existing
delete_persona guard. The DAO is intentionally permissive so the
guard sits at the HTTP layer; persona_dao test pins that contract.
2. Migration 2026-05-10 adds user_id to entity_facts and a composite
FK (user_id, persona_id) -> personas(user_id, persona_id) ON DELETE
CASCADE. This closes two issues at once (persona orphans and
multi-user fact leakage); the last two bullets cover plumbing:
- Persona orphans: deleting a custom persona used to leave its
facts dangling forever, readable only via PersonaFilter::All.
CASCADE now wipes them with the persona row.
- Multi-user fact leakage: PersonaFilter::Single("default") used
to surface every user's default-scoped facts. PersonaFilter is
now { user_id, persona_id } and all read paths
(get_facts_for_entity, list_facts, get_recent_activity) filter
on user_id first. upsert_fact's dedup key extends to user_id so
identical claims under shared persona names from different
users no longer corroborate-bump each other's confidence.
- user_id threads from Claims.sub.parse::<i32>().unwrap_or(1) at
the chat / insight handlers through ChatTurnRequest, the
streaming agentic loop, execute_tool, and into the leaf tools
(tool_store_fact, tool_recall_facts_for_photo). The ".unwrap_or(1)"
accommodates Apollo's service token whose sub is non-numeric on
legacy mints.
- Backfill picks the smallest user_id matching each legacy fact's
persona_id so the FK holds for already-stored rows.
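A small sketch of the two ideas in the bullets above: the
unwrap_or(1) fallback for a non-numeric sub, and reads that filter on
user_id before persona_id. FactRow and both function names are
hypothetical; the real reads go through PersonaFilter and SQL.

```rust
/// Mirrors Claims.sub.parse::<i32>().unwrap_or(1): a non-numeric
/// sub (e.g. a legacy service token) falls back to user 1.
pub fn user_id_from_sub(sub: &str) -> i32 {
    sub.parse::<i32>().unwrap_or(1)
}

#[derive(Debug, Clone, PartialEq)]
pub struct FactRow {
    pub user_id: i32,
    pub persona_id: String,
    pub claim: String,
}

/// Single-persona read: user_id first, then persona_id, so two
/// users' "default" personas never see each other's facts.
pub fn facts_for<'a>(
    rows: &'a [FactRow],
    user_id: i32,
    persona_id: &str,
) -> Vec<&'a FactRow> {
    rows.iter()
        .filter(|r| r.user_id == user_id && r.persona_id == persona_id)
        .collect()
}
```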
Five new knowledge_dao tests with FK-on connection: persona scoping
isolation, All-variant union per-user, dedup not crossing users,
CASCADE delete, FK rejection of unknown personas. Plus
dao_update_does_not_block_built_ins documenting where the
HTTP-layer guard lives.
Apollo coordinates separately — the matching changes there add the
/api/personas proxy and start sending persona_id on photo-chat turns.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move personas off the mobile client into ImageApi as first-class
records, and scope entity_facts by persona so each one builds its own
voice over a shared entity graph. The new include_all_memories flag
lets a persona opt back into the full hive-mind pool for human
browsing of /knowledge/*; agentic generation always stays in-voice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bootstrap system message gave the model a file path and (in
hybrid mode) a visual description, but no temporal anchor. Models
defaulted to today's date when calling get_sms_messages — Nov 2014
photos were getting "2024-03-11" passed as `date`, missing every
historical message and leading the model to confidently misreport
context.
This commit folds two more EXIF-sourced facts into the
--- PHOTO CONTEXT --- block:
Date taken: <YYYY-MM-DD or "unknown">
GPS: <lat, lon to 4dp> (omitted when no GPS)
Resolution waterfall for date_taken matches the documented canonical
date pipeline at the EXIF / filename steps, but intentionally stops
short of the fs-time fallback `generate_agentic_insight_for_photo`
uses — for chat we'd rather show "unknown" than mislead the model
with an inode mtime. GPS is taken straight from EXIF when both
lat/lon are populated; absent GPS suppresses the line entirely so
the model doesn't hallucinate coordinates.
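The two new lines can be rendered roughly as follows. Function names
are hypothetical and the dates are assumed to arrive already
formatted as YYYY-MM-DD; only the line shapes ("Date taken: ...",
"GPS: ... to 4dp", GPS omitted when absent) come from this commit.

```rust
/// Chat-side waterfall: EXIF wins, filename is the fallback, and
/// there is deliberately no fs-time step after that.
pub fn resolve_chat_date(
    exif_date: Option<String>,
    filename_date: Option<String>,
) -> Option<String> {
    exif_date.or(filename_date)
}

/// Lines added to the PHOTO CONTEXT block.
pub fn photo_context_lines(
    date_taken: Option<&str>,
    gps: Option<(f64, f64)>,
) -> Vec<String> {
    let mut lines = vec![format!(
        "Date taken: {}",
        date_taken.unwrap_or("unknown")
    )];
    // No GPS means no line at all, rather than a placeholder.
    if let Some((lat, lon)) = gps {
        lines.push(format!("GPS: {:.4}, {:.4}", lat, lon));
    }
    lines
}
```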
InsightGenerator gains a `fetch_exif(file_path)` accessor (crate-
visible) so the chat service doesn't need its own ExifDao plumbing.
build_bootstrap_system_message picks up two new params (date,
gps); existing tests updated and 5 new tests cover:
- date present / absent / waterfall (EXIF wins, filename fallback,
None when neither source has it)
- GPS present / absent
- ordering (path → date → visual)
Total insight_chat unit tests: 33 (up from 27).
After refresh, the rendered transcript was showing two unwanted
artifacts in the initial user bubble:
Photo file path: pics/DSC_5171.jpg
please tell me about this photo and what was going on around it
Please write your final answer now without calling any more tools.
Two distinct bugs:
1. Bootstrap was prepending `Photo file path: <path>` (and, in
hybrid mode, the visual description block) into the user-turn
content. The model needed it to call file_path-keyed tools, but
the user could see it in their own bubble on replay.
2. The no-tools fallback ("Please write your final answer now…")
was a synthetic user message we never stripped from history,
so it persisted into training_messages, rendered as a second
user bubble, AND wiped the prior tool-call accumulator inside
load_history (user-turn handler clears pending_tools), which
is why the tool invocations disappeared from the assistant
bubble after refresh.
Fixes:
- New `build_bootstrap_system_message` helper composes the persona
with a `--- PHOTO CONTEXT ---` block (path + optional visual
description). Lives in the system message, not the user turn.
The user's bubble shows only what they typed.
- Streaming agentic loop's no-tools fallback now records its
insertion index and removes the synthetic user prompt from
`messages` after the model responds. Final assistant content
stays — it reads coherently on replay without the synthetic
prompt above it. Applies to both bootstrap and continuation.
3 new tests cover the system-message composer (path-only, with
visual block, persona-trim). Total insight_chat unit tests: 27.
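The record-index-then-remove pattern for the synthetic prompt might
look like the sketch below. The Message struct and the helper
signatures are assumptions; the check that the slot still holds the
synthetic prompt is a defensive addition so a shifted index removes
nothing.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

pub const FINAL_PROMPT: &str =
    "Please write your final answer now without calling any more tools.";

/// Push the synthetic user prompt and return where it landed.
pub fn push_synthetic_final_prompt(messages: &mut Vec<Message>) -> usize {
    messages.push(Message {
        role: "user".into(),
        content: FINAL_PROMPT.into(),
    });
    messages.len() - 1
}

/// Remove the synthetic prompt after the model has answered.
/// Does nothing if `idx` no longer points at that prompt.
pub fn remove_synthetic_final_prompt(messages: &mut Vec<Message>, idx: usize) {
    let still_there = messages
        .get(idx)
        .map_or(false, |m| m.role == "user" && m.content == FINAL_PROMPT);
    if still_there {
        messages.remove(idx);
    }
}
```

The final assistant message is pushed after the synthetic prompt, so
removing by the recorded index leaves the answer directly after the
last real turn.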
Bug: bootstrap user_content was just the user's typed message (plus
the hybrid visual description). Tools that take a file_path arg —
recall_facts_for_photo, get_file_tags, get_faces_in_photo — had no
way to learn the canonical path. Small models would invent
placeholders like "input_file_0.png" or call the tool with a name
guessed from a hidden multimodal input handle, neither of which
matched any real photo.
Fix: prepend a single-line "Photo file path: <normalized>\n\n" block
to user_content. Same shape generate_agentic_insight_for_photo
already uses for non-chat callers — kept the bootstrap minimal
(no date / GPS / tags pre-stuffing; the agentic loop can fetch
those via tools when needed).
Hybrid still injects the visual description block between the path
block and the user message; local mode just gets path + user text.
resolve_bootstrap_system_prompt and resolve_bootstrap_backend run on
every bootstrap turn — they pick the persisted system prompt and the
chosen backend label. They were inline conditionals before; pulling
them out makes the rules testable without spinning up the full
streaming stack.
9 new tests cover:
- system prompt fallback to BOOTSTRAP_DEFAULT_SYSTEM_PROMPT for None,
empty string, whitespace-only
- supplied non-empty prompts pass through verbatim, with interior
newlines / spacing preserved (Apollo personas use multi-line tool
listings)
- backend defaults to "local" for None / empty
- "local" / "hybrid" accepted case-insensitively with edge-trim
- unknown labels return a descriptive error
Total insight_chat tests: 24 (up from 15). No behaviour change.
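The rules those tests pin can be sketched as two pure functions. The
function and constant names match the ones above, but the bodies are
an illustration: the default prompt text is a placeholder, the error
wording is invented, and treating a whitespace-only backend label as
"local" is an assumption.

```rust
// Placeholder text; the real default prompt is not reproduced here.
pub const BOOTSTRAP_DEFAULT_SYSTEM_PROMPT: &str = "<default system prompt>";

/// None, empty, and whitespace-only fall back to the default;
/// anything else passes through verbatim.
pub fn resolve_bootstrap_system_prompt(supplied: Option<&str>) -> String {
    match supplied {
        Some(s) if !s.trim().is_empty() => s.to_string(),
        _ => BOOTSTRAP_DEFAULT_SYSTEM_PROMPT.to_string(),
    }
}

/// "local" / "hybrid", case-insensitive with edge-trim; None or
/// empty defaults to "local"; anything else is an error.
pub fn resolve_bootstrap_backend(label: Option<&str>) -> Result<String, String> {
    let normalized = label.unwrap_or("").trim().to_ascii_lowercase();
    match normalized.as_str() {
        "" | "local" => Ok("local".to_string()),
        "hybrid" => Ok("hybrid".to_string()),
        other => Err(format!(
            "unknown backend '{}': expected \"local\" or \"hybrid\"",
            other
        )),
    }
}
```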
Tap-Discuss-on-no-insight previously failed silently: ImageApi's
/insights/chat/stream required an existing agentic insight, errored
when missing, and emitted the failure as `event: error` — which the
frontend SSE consumer ignored (it listens for `error_message`).
This commit closes both gaps with a server-side state machine:
- /insights/chat/stream now branches on insight presence. Missing
insight (or `regenerate: true` in the body) → bootstrap path:
builds [System(req.system_prompt), User(req.user_message + image)],
runs the agentic loop, generates a title, persists a new row via
store_insight (which auto-flips priors). Existing insight →
continuation path (unchanged behaviour).
- New `regenerate: bool` request field forces bootstrap even when an
insight exists. Takes precedence over `amend`.
- `done` SSE payload field-name alignment with Apollo's frontend
convention: prompt_eval_count → prompt_tokens, eval_count →
eval_tokens, num_ctx echo added.
- `amended_insight_id` semantics broaden — now populated whenever the
turn produced a new row (bootstrap, regenerate, or amend). Existing
amend clients keep working unchanged; new clients get the new row's
id for free.
- `event: error` → `event: error_message` so frontend errors stop
silently dropping.
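The branching described in the list above can be summarised as a tiny
decision function. ChatPath, choose_path, and produces_new_row are
hypothetical names used only to show the precedence (regenerate over
amend) and when amended_insight_id would be populated.

```rust
#[derive(Debug, PartialEq)]
pub enum ChatPath {
    /// Build a fresh transcript and persist a new insight row.
    Bootstrap,
    /// Continue the existing insight; `amend` may still write a row.
    Continuation { amend: bool },
}

pub fn choose_path(insight_exists: bool, regenerate: bool, amend: bool) -> ChatPath {
    if !insight_exists || regenerate {
        // regenerate wins even when amend is also set.
        ChatPath::Bootstrap
    } else {
        ChatPath::Continuation { amend }
    }
}

/// amended_insight_id is populated exactly when this is true.
pub fn produces_new_row(path: &ChatPath) -> bool {
    matches!(
        path,
        ChatPath::Bootstrap | ChatPath::Continuation { amend: true }
    )
}
```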
Refactor: extracted run_streaming_agentic_loop, build_chat_clients,
and generate_title as shared helpers between bootstrap and
continuation. Continuation path's outer logic moves to
run_continuation_streaming with no behaviour change.
Mobile-ready: any client (Apollo backend, mobile, future) sends one
request to /insights/chat/stream and gets the right path. Apollo's
proxy stays a dumb pipe.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop redundant `use anyhow::Context` inside has_any_faces (already
imported at the module level).
- Drop dead `.unwrap_or("?")` on bound faces — the vec is filtered to
is_some() so the fallback can never fire.
- Reorder the face_dao constructor param + initializer to match the
struct declaration (between tag_dao and knowledge_dao). Update both
state.rs call sites and populate_knowledge.rs to match.
- Hold face_dao lock once across the library-resolver loop instead of
reacquiring per iteration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>