knowledge: predicate-quality nudge + bulk-reject endpoint

Two coupled changes to fight the speech-act-predicate problem (facts
like (Cameron, expressed, "I'm tempted to...")):

1. System prompt grows an explicit predicate-quality rule. The agent is
   told to use relationship-shaped verbs (lives_in, works_at, attended,
   is_friend_of, interested_in), and is given an explicit DON'T list
   (expressed, said, mentioned, stated, quoted, noted, discussed,
   thought, wondered). Plus a concrete Bad / Good example contrasting
   the noise pattern with the structured paraphrase the agent should be
   writing. Stops the bleed for new insights.

2. Cleanup tools for the legacy noise that's already in the table:
   - get_predicate_stats(persona, limit) returns [(predicate, count)]
     sorted desc; feeds the curation UI's PREDICATES tab.
   - bulk_reject_facts_by_predicate(persona, predicate, audit) flips
     every ACTIVE fact under that predicate to 'rejected' in one
     transaction, stamping last_modified_* so the action is
     attributable + reversible per-fact through the entity detail
     panel. REVIEWED facts under the same predicate are left alone,
     since the curator may have hand-approved an exception
     ("interested_in" might be largely noise but a reviewed entry is
     intentional).

New HTTP endpoints:
  GET  /knowledge/predicate-stats?limit=
  POST /knowledge/predicates/{predicate}/bulk-reject

Persona-scoped via the existing X-Persona-Id header.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
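
For reviewers, here is a minimal in-memory sketch of the intended semantics of the two cleanup tools. It is not the DAO code from this commit. The Fact struct, the FactStatus enum, the single last_modified_by field (standing in for last_modified_*) and the slice-based signatures are all hypothetical; the real functions run against the database inside a transaction. The sketch also assumes that predicate stats count facts of every status and that ties are broken alphabetically, neither of which the message specifies.

```rust
use std::collections::HashMap;

/// Hypothetical status enum; the commit message only names ACTIVE,
/// REVIEWED and 'rejected'.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FactStatus {
    Active,
    Reviewed,
    Rejected,
}

/// Hypothetical in-memory row; the real table schema is not shown here.
#[derive(Debug, Clone)]
struct Fact {
    persona: String,
    predicate: String,
    status: FactStatus,
    /// Stand-in for the last_modified_* audit columns.
    last_modified_by: Option<String>,
}

/// Count facts per predicate for one persona, sorted by count descending.
/// Assumption: all statuses are counted; ties are broken alphabetically
/// so the order is deterministic.
fn get_predicate_stats(facts: &[Fact], persona: &str, limit: usize) -> Vec<(String, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for f in facts.iter().filter(|f| f.persona == persona) {
        *counts.entry(f.predicate.as_str()).or_insert(0) += 1;
    }
    let mut stats: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(p, c)| (p.to_string(), c))
        .collect();
    stats.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    stats.truncate(limit);
    stats
}

/// Flip every ACTIVE fact under `predicate` for `persona` to Rejected and
/// stamp the audit field. REVIEWED (and already rejected) facts are left
/// untouched. Returns the number of facts changed.
fn bulk_reject_facts_by_predicate(
    facts: &mut [Fact],
    persona: &str,
    predicate: &str,
    audit: &str,
) -> usize {
    let mut changed = 0;
    for f in facts.iter_mut() {
        if f.persona == persona && f.predicate == predicate && f.status == FactStatus::Active {
            f.status = FactStatus::Rejected;
            f.last_modified_by = Some(audit.to_string());
            changed += 1;
        }
    }
    changed
}
```

Because only ACTIVE rows match the filter, re-running the same bulk reject changes nothing further, and a hand-reviewed exception survives any number of cleanups.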
2026-05-11 21:50:26 -04:00
parent fb078b4906
commit e67e00ef8a
3 changed files with 225 additions and 1 deletions
--- a/src/ai/insight_generator.rs
+++ b/src/ai/insight_generator.rs
@@ -3525,6 +3525,7 @@ Return ONLY the summary, nothing else."#,
             - Use recall_facts_for_photo + recall_entities to load any prior knowledge about subjects in the photo.\n\
             - When you identify people / places / events / things, use store_entity + store_fact to grow the persistent memory.\n\
             - Before store_entity, call recall_entities to check whether a similar name already exists; reuse the existing entity_id rather than creating a near-duplicate (e.g. \"Sara\" vs \"Sarah J.\"). The DAO will collapse obvious cosine matches, but choosing the existing id keeps facts and photo links consolidated.\n\
+             - Predicates should be relationship-shaped verbs that encode a queryable claim — `lives_in`, `works_at`, `attended`, `is_friend_of`, `is_parent_of`, `interested_in`, `married_to`, `owns`. DO NOT use vague speech-act predicates like `expressed`, `said`, `mentioned`, `stated`, `quoted`, `noted`, `discussed`, `thought`, `wondered`. DO NOT store quotations or sentence fragments as `object_value` — paraphrase into a structured claim. Bad: `(Cameron, expressed, \"I'm tempted to get a part-time job there\")`. Good: `(Cameron, considered_employment_at, <Place>)` or `(Cameron, interested_in, \"part-time work\")`.\n\
             - A tool returning no results is informative; continue with the others.",
        );