ai: restructure agentic user message — facts up top + forcing gate

Small models (~8B) were producing generic responses regardless of persona, and bailing out of the agentic loop on iteration 1. Two underlying causes:

1. Photo facts (date, location, contact, tags, visual) were buried between the "Please analyze this photo" preamble and the "Use the available tools" outro. Small models skim and miss them, which is why outputs weren't anchoring to the actual photo.
2. The user message ended with "write a detailed insight", so small models took the path of least resistance and just wrote, ignoring the soft "aim to use 5 tools" floor in the system prompt.

Restructured the user message:

- Leads with a "## This photo" bulleted block so the metadata is visible top-down. File path, date+source, contact, location+GPS, tags, and (in hybrid) the visual description are all bullets the model can't skim past.
- Replaces the prose body with a numbered "## What to do" recipe: (1) recall_facts_for_photo + recall_entities, (2) ≥3 of the time-window tools, (3) write only after tool results, referencing specific facts. "Generic narration is not acceptable" is explicit.
- Ends with a hard forcing line: "YOUR FIRST RESPONSE MUST BE A TOOL CALL. Do not output any final answer text until you have called at least 5 tools." Replaces the soft "aim to" floor with a directive small models actually follow.

Tradeoff: big models also follow the recipe literally and may call 5 tools when 3 would do. Optimizing for the small-model floor first; soften once that's working.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
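The layout described above could be assembled roughly as follows. This is a minimal sketch and not the code from `insight_generator.rs`, whose diff is not shown on this page: `PhotoFacts`, its fields, and `build_user_message` are hypothetical names, and the recipe wording is paraphrased from the commit message.

```rust
/// Hypothetical container for per-photo metadata. Field names are
/// assumptions for illustration, not the project's real types.
struct PhotoFacts {
    file_path: String,
    date_line: Option<String>,     // date plus its source
    contact: Option<String>,
    location_line: Option<String>, // resolved place plus GPS
    tags: Vec<String>,
    visual: Option<String>,        // only populated in hybrid mode
}

/// Append one "- Label: value" bullet when the value is present.
fn push_bullet(out: &mut String, label: &str, value: Option<&str>) {
    if let Some(v) = value {
        out.push_str(&format!("- {}: {}\n", label, v));
    }
}

/// Build the user message: facts first, numbered recipe second,
/// forcing line last.
fn build_user_message(facts: &PhotoFacts) -> String {
    let mut out = String::from("## This photo\n\n");
    push_bullet(&mut out, "File", Some(facts.file_path.as_str()));
    push_bullet(&mut out, "Date", facts.date_line.as_deref());
    push_bullet(&mut out, "Contact", facts.contact.as_deref());
    push_bullet(&mut out, "Location", facts.location_line.as_deref());
    if !facts.tags.is_empty() {
        let joined = facts.tags.join(", ");
        push_bullet(&mut out, "Tags", Some(joined.as_str()));
    }
    push_bullet(&mut out, "Visual", facts.visual.as_deref());

    out.push_str("\n## What to do\n\n");
    out.push_str("1. Call recall_facts_for_photo and recall_entities.\n");
    out.push_str("2. Call at least 3 of the time-window tools.\n");
    out.push_str(
        "3. Write only after tool results, referencing specific facts. \
         Generic narration is not acceptable.\n\n",
    );
    out.push_str(
        "YOUR FIRST RESPONSE MUST BE A TOOL CALL. Do not output any final \
         answer text until you have called at least 5 tools.",
    );
    out
}
```

Absent fields are skipped rather than rendered as empty bullets, so the model only sees facts it can actually anchor to. The forcing line is placed last, matching the position described in the commit message.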
backfill_date_taken: surface the actual diesel error in warnings
2026-05-07 10:59:39 -04:00 · 2026-05-07 10:41:09 -04:00 · 2026-05-07 10:36:05 -04:00 · 2026-05-06 22:37:32 -04:00
7 changed files with 654 additions and 156 deletions
--- a/src/ai/handlers.rs
+++ b/src/ai/handlers.rs
@@ -48,6 +48,11 @@ pub struct GeneratePhotoInsightRequest {
    /// falls back to `DEFAULT_FEWSHOT_INSIGHT_IDS`.
    #[serde(default)]
    pub fewshot_insight_ids: Option<Vec<i32>>,
+    /// When true, drop `store_entity` / `store_fact` from the tool palette
+    /// for this run. Use for one-off explorations (caption-style prompts,
+    /// experimentation) that shouldn't pollute the persistent knowledge KB.
+    #[serde(default)]
+    pub disable_writes: bool,
 }

 #[derive(Debug, Deserialize)]
@@ -390,6 +395,7 @@ pub async fn generate_agentic_insight_handler(
            request.backend.clone(),
            fewshot_examples,
            fewshot_ids,
+            request.disable_writes,
        )
        .await;

@@ -642,6 +648,10 @@ pub struct ChatTurnHttpRequest {
    pub max_iterations: Option<usize>,
    #[serde(default)]
    pub amend: bool,
+    /// Drop store_entity / store_fact from the tool palette for this turn —
+    /// useful for hypothetical/exploration chats that shouldn't pollute the KB.
+    #[serde(default)]
+    pub disable_writes: bool,
 }

 #[derive(Debug, Serialize)]
@@ -696,6 +706,7 @@ pub async fn chat_turn_handler(
        min_p: request.min_p,
        max_iterations: request.max_iterations,
        amend: request.amend,
+        disable_writes: request.disable_writes,
    };

    match app_state.insight_chat.chat_turn(chat_req).await {
@@ -910,6 +921,7 @@ pub async fn chat_stream_handler(
        min_p: request.min_p,
        max_iterations: request.max_iterations,
        amend: request.amend,
+        disable_writes: request.disable_writes,
    };

    let service = app_state.insight_chat.clone();
--- a/src/ai/insight_chat.rs
+++ b/src/ai/insight_chat.rs
@@ -48,6 +48,10 @@ pub struct ChatTurnRequest {
    /// When true, write a new insight row (regenerating title) instead of
    /// updating training_messages on the existing row.
    pub amend: bool,
+    /// When true, drop `store_entity` / `store_fact` from the tool palette
+    /// for this turn. Use to explore alternate phrasings or run
+    /// hypothetical chats without polluting the persistent KB.
+    pub disable_writes: bool,
 }

 #[derive(Debug)]
@@ -362,6 +366,7 @@ impl InsightChatService {
        let tools = InsightGenerator::build_tool_definitions(
            offer_describe_tool,
            self.generator.apollo_enabled(),
+            req.disable_writes,
        );

        // Image base64 only needed when describe_photo is on the menu. Load
@@ -397,6 +402,9 @@ impl InsightChatService {
        //    tighter and dispatching tools through the shared executor.
        let loop_span = tracer.start_with_context("ai.chat.loop", &insight_cx);
        let loop_cx = insight_cx.with_span(loop_span);
+        // Memoize describe_photo for this turn so repeated calls don't
+        // produce conflicting visual descriptions in the assistant transcript.
+        let describe_cache: tokio::sync::Mutex<Option<String>> = tokio::sync::Mutex::new(None);
        let mut tool_calls_made = 0usize;
        let mut iterations_used = 0usize;
        let mut last_prompt_eval_count: Option<i32> = None;
@@ -445,6 +453,7 @@ impl InsightChatService {
                            &image_base64,
                            &normalized,
                            &loop_cx,
+                            Some(&describe_cache),
                        )
                        .await;
                    messages.push(ChatMessage::tool_result(result));
@@ -793,6 +802,7 @@ impl InsightChatService {
        let tools = InsightGenerator::build_tool_definitions(
            offer_describe_tool,
            self.generator.apollo_enabled(),
+            req.disable_writes,
        );

        let image_base64: Option<String> = if offer_describe_tool {
@@ -814,6 +824,9 @@ impl InsightChatService {

        let original_system_content = annotate_system_with_budget(&mut messages, max_iterations);

+        // Per-turn describe_photo memo, same intent as the non-streaming
+        // path: avoid replaying conflicting visual descriptions in transcript.
+        let describe_cache: tokio::sync::Mutex<Option<String>> = tokio::sync::Mutex::new(None);
        let mut tool_calls_made = 0usize;
        let mut iterations_used = 0usize;
        let mut last_prompt_eval_count: Option<i32> = None;
@@ -889,6 +902,7 @@ impl InsightChatService {
                            &image_base64,
                            &normalized,
                            &cx,
+                            Some(&describe_cache),
                        )
                        .await;
                    let (result_preview, result_truncated) = truncate_tool_result(&result);
@@ -1134,8 +1148,12 @@ fn annotate_system_with_budget(
        return None;
    }
    let original = first.content.clone();
+    // Formatted as its own section so small models don't skim past it the
+    // way they tend to with parenthetical asides at the bottom of a long prompt.
+    // Phrasing matches the base prompt: budget = capacity, not a constraint
+    // to conserve. Small models otherwise tend to stop early.
    first.content = format!(
-        "{}\n\n(Budget for this chat turn: up to {} tool-calling iterations. Produce your final reply before the budget is exhausted.)",
+        "{}\n\n## Budget for this chat turn\n\nYou have up to {} iterations available. Use as many as the question warrants for context-gathering, and reserve the last one for your reply.",
        first.content, max_iterations
    );
    Some(original)
--- a/src/ai/insight_generator.rs
+++ b/src/ai/insight_generator.rs
--- a/src/ai/sms_client.rs
+++ b/src/ai/sms_client.rs
@@ -20,22 +20,24 @@ impl SmsApiClient {
        }
    }

-    /// Fetch messages for a specific contact within ±4 days of the given timestamp
-    /// Falls back to all contacts if no messages found for the specific contact
-    /// Messages are sorted by proximity to the center timestamp
+    /// Fetch messages for a specific contact within ±`days_radius` days of
+    /// the given timestamp (defaults to ±4 days when `None`). Falls back to
+    /// all contacts if no messages are found for the specified contact.
+    /// Messages are sorted by proximity to the center timestamp.
    pub async fn fetch_messages_for_contact(
        &self,
        contact: Option<&str>,
        center_timestamp: i64,
+        days_radius: Option<i64>,
    ) -> Result<Vec<SmsMessage>> {
        use chrono::Duration;

-        // Calculate ±4 days range around the center timestamp
+        let radius = days_radius.unwrap_or(4).clamp(1, 30);
        let center_dt = chrono::DateTime::from_timestamp(center_timestamp, 0)
            .ok_or_else(|| anyhow::anyhow!("Invalid timestamp"))?;

-        let start_dt = center_dt - Duration::days(4);
-        let end_dt = center_dt + Duration::days(4);
+        let start_dt = center_dt - Duration::days(radius);
+        let end_dt = center_dt + Duration::days(radius);

        let start_ts = start_dt.timestamp();
        let end_ts = end_dt.timestamp();
@@ -43,8 +45,9 @@ impl SmsApiClient {
        // If contact specified, try fetching for that contact first
        if let Some(contact_name) = contact {
            log::info!(
-                "Fetching SMS for contact: {} (±4 days from {})",
+                "Fetching SMS for contact: {} (±{} days from {})",
                contact_name,
+                radius,
                center_dt.format("%Y-%m-%d %H:%M:%S")
            );
            let messages = self
@@ -68,7 +71,8 @@ impl SmsApiClient {

        // Fallback to all contacts
        log::info!(
-            "Fetching all SMS messages (±4 days from {})",
+            "Fetching all SMS messages (±{} days from {})",
+            radius,
            center_dt.format("%Y-%m-%d %H:%M:%S")
        );
        self.fetch_messages(start_ts, end_ts, None, Some(center_timestamp))
--- a/src/bin/populate_knowledge.rs
+++ b/src/bin/populate_knowledge.rs
@@ -331,6 +331,7 @@ async fn main() -> anyhow::Result<()> {
                None,
                Vec::new(),
                Vec::new(),
+                false, // disable_writes — keep KB writes on for the population job
            )
            .await
        {
--- a/src/files.rs
+++ b/src/files.rs
@@ -1718,7 +1718,12 @@ mod tests {
            // Mock — files.rs tests don't exercise the date-override endpoints.
            // Returning a synthetic row keeps the trait satisfied without
            // depending on private DbError constructors.
-            Ok(mock_exif_row(library_id, rel_path, Some(date_taken), Some("manual".to_string())))
+            Ok(mock_exif_row(
+                library_id,
+                rel_path,
+                Some(date_taken),
+                Some("manual".to_string()),
+            ))
        }

        fn clear_manual_date_taken(
--- a/src/main.rs
+++ b/src/main.rs
@@ -995,10 +995,8 @@ async fn upload_image(
                            }
                        };
                        let perceptual = perceptual_hash::compute(&uploaded_path);
-                        let resolved_date = date_resolver::resolve_date_taken(
-                            &uploaded_path,
-                            exif_data.date_taken,
-                        );
+                        let resolved_date =
+                            date_resolver::resolve_date_taken(&uploaded_path, exif_data.date_taken);
                        let insert_exif = InsertImageExif {
                            library_id: target_library.id,
                            file_path: relative_path.clone(),
@@ -1022,8 +1020,7 @@ async fn upload_image(
                            size_bytes,
                            phash_64: perceptual.map(|h| h.phash_64),
                            dhash_64: perceptual.map(|h| h.dhash_64),
-                            date_taken_source: resolved_date
-                                .map(|r| r.source.as_str().to_string()),
+                            date_taken_source: resolved_date.map(|r| r.source.as_str().to_string()),
                        };

                        if let Ok(mut dao) = exif_dao.lock() {
Author	SHA1	Message	Date
Cameron Cordes	2ff06413c6	ai: restructure agentic user message — facts up top + forcing gate	2026-05-07 10:59:39 -04:00
Cameron Cordes	66ea8490ab	backfill_date_taken: surface the actual diesel error in warnings The DAO swallowed every diesel::update failure as a flat `anyhow!("Update error")`, then trace_db_call further reduced it to `DbError { kind: UpdateError }`. Operators saw "update failed for lib 2 Snapchat/foo.mp4: DbError { kind: UpdateError }" with no clue why (constraint violation? type mismatch? row vanished mid-flight? DB locked?). Three changes: - Preserve the diesel error in the anyhow chain along with the input params (lib, rel_path, date_taken, source) so the cause is visible. - Log the chain at warn-level inside the DAO before the trace wrapper collapses it to DbErrorKind::UpdateError, so the warning at the call site finally has something diagnosable next to it. - Treat zero-row updates as a debug-level "row likely retired by the missing-file scan" rather than a hard failure; that case is benign and shouldn't poison the drain's error tally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 10:41:09 -04:00
Cameron Cordes	10ba706b39	ai: reframe iteration budget as capacity, not constraint Small models (~8B) were bailing out of the agentic loop after one or two tool calls under the previous "hard budget … stop when nearly exhausted" phrasing. They read that as a conservation directive and the "trivial photos may need fewer" clause gave them an easy out. Flipped both the agentic and chat-turn prompts to frame the budget as capacity to spend, with a soft floor (≥5 tool calls before writing) and an explicit reserve clause for the final reply. Big models will still deviate when warranted; small models follow the floor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 10:36:05 -04:00
Cameron Cordes	9071d05932	ai: insight tools audit — bug fixes, new tools, prompt structure Bug fixes: - get_sms_messages.days_radius is now actually honored (was hardcoded to ±4d in SmsApiClient::fetch_messages_for_contact). - describe_photo memoized for the lifetime of one agentic loop / one chat turn — re-running mid-loop produced conflicting visual descriptions in the transcript. Agentic user message: - Pre-resolve location via Apollo + Nominatim and emit one Location: line instead of bare GPS, mirroring the non-agentic flow. - Date now formats with weekday + canonical-date source so the model can hedge on fs_time-derived dates. - Hybrid mode visual block tells the model not to call describe_photo (the tool is already gated off in hybrid). System prompt structure: - custom_system_prompt now appends under an explicit "User overrides (these take precedence)" heading instead of prepending — so a custom voice/POV/format prompt actually beats the built-in defaults. - Numbered rules collapsed into bulleted "Tool-use guidance"; merged the contradictory "multiple tools BEFORE" / "after 5 calls" rules. - Chat budget annotation surfaces as its own ## heading. New tools: - recall_facts_for_entity(entity_id\|name) — facts for one entity without needing a photo path. Fills the "tell me about Sarah" chat case where recall_facts_for_photo doesn't apply. - find_photos_with_entity(entity_id\|name) — "when did I last see X / show me photos from the Tahoe trip" via entity_photo_links. - get_exif(file_path) — full EXIF row for any photo, for technical ("what camera was this on?") questions. Tools removed: - get_file_tags duplicated the inline Tags: line on the user message; exposing both gave models an excuse to "confirm" what they had. Tool descriptions tightened: - search_rag now correctly says "per-day, per-contact summaries" and explains the date is for time-decay weighting. - recall_entities warns about empty-filter dumps. 
- store_entity / store_fact document dedup return + snake_case predicate vocabulary. - reverse_geocode defers to the pre-resolved location and to get_personal_place_at for personal places. - get_current_datetime narrowed to time-since-photo use. Calendar / location: - get_calendar_events accepts query and embeds it for hybrid time + semantic ranking (was always passing None for the embedding). - get_location_history exposes limit; description tells the model there's no semantic ranking on this surface. New disable_writes flag: - POST /insights/generate/agentic and the chat endpoints accept disable_writes: bool. When true, drops store_entity / store_fact from the tool palette and rewrites the system prompt's knowledge- write line. Lets users explore alternate prompts (caption-style, third-person, haiku) without polluting the persistent KB. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 22:37:32 -04:00
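
The `disable_writes` behaviour described in the last commit amounts to filtering the two write tools out of the palette before it is offered to the model. A minimal sketch of that idea follows, assuming tools are identified by name only; `is_write_tool` and `tool_palette` are hypothetical helpers, and the real `InsightGenerator::build_tool_definitions` (not shown in this diff) takes additional flags and presumably returns full tool definitions rather than names.

```rust
/// The two tools that persist to the knowledge base, per the commit message.
fn is_write_tool(name: &str) -> bool {
    matches!(name, "store_entity" | "store_fact")
}

/// Hypothetical palette filter: returns the tool names to offer the model,
/// dropping the write tools when `disable_writes` is true.
fn tool_palette<'a>(all_tools: &[&'a str], disable_writes: bool) -> Vec<&'a str> {
    all_tools
        .iter()
        .copied()
        .filter(|name| !(disable_writes && is_write_tool(name)))
        .collect()
}
```

Removing the tools from the palette, rather than only instructing the model not to call them, means a run with `disable_writes: true` has no path to the persistent KB even if the model ignores the prompt.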