Files
ImageApi/migrations/2026-05-10-000500_entity_facts_audit_and_agent_gate/up.sql
Cameron Cordes fd4dd89bbb knowledge: agent self-correction with audit + per-persona gate + revert
Bundles three coupled changes so agent-side mutations stay
auditable and reversible:

1. Audit columns on entity_facts —
   `last_modified_by_model` / `last_modified_by_backend` /
   `last_modified_at`. Stamped on every mutation path
   (update_fact, supersede_fact, manual PATCH, manual supersede,
   the new revert). NULL on rows never touched since creation.
   Partial index on `last_modified_at WHERE NOT NULL` keeps the
   "show me recent edits" feed fast without bloating from legacy
   rows.

2. Per-persona gate `personas.allow_agent_corrections` (BOOLEAN,
   default 0). Defense in depth at two layers:
   - build_tool_definitions: when off, `update_fact` and
     `supersede_fact` aren't in the catalog at all, so even a
     hallucinated tool call by the model fails fast.
   - tool_update_fact / tool_supersede_fact: re-checks the persona
     flag at call time and returns an explicit "corrections
     disabled" error if it's somehow off (e.g. flag flipped mid-
     loop).
   ToolGateOpts grows the flag; current_gate_opts splits into
   `current_gate_opts` (no persona context, defaults closed) +
   `current_gate_opts_for_persona` for chat callers that have a
   persona id. Both call sites in insight_chat are updated.

3. Revert action — new DAO method `revert_supersession` +
   `POST /knowledge/facts/{id}/restore`. Flips status back to
   'active', clears `superseded_by`, clears `valid_until` (we
   don't track whether it was hand-set vs auto-stamped, so the
   safe reset is to drop it — user can re-bound after). Stamps
   `last_modified_*` so the revert itself is attributable.

Manual paths (PATCH / supersede via HTTP, plus restore) stamp the
audit columns with `("manual", "manual")`. Agent paths stamp the
loop-time chat model and backend (mirroring the existing
created_by_* convention).

FactDetail in the HTTP response now carries the audit triple
alongside the existing provenance. Apollo wires the new field set
in the matching commit.

PersonaView / UpdatePersonaRequest grow `allowAgentCorrections`;
the PersonaPatch + InsertPersona + bulk_import paths thread it.

317 lib tests pass, including unchanged update_fact / supersede
DAO tests (now passing audit=None — None means "no provenance
context to attribute", legacy semantics).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:56:56 -04:00

31 lines
1.5 KiB
SQL

-- Three coupled changes for agent self-correction safety:
--
-- 1. `entity_facts.last_modified_by_*` + `last_modified_at` track who
-- most recently mutated each fact. `created_by_*` from migration
-- 2026-05-10-000300 records who first wrote the row; this records
-- who last *changed* it. Separate columns so the create vs update
-- audit is independently grep-able ("show me every fact gpt-5
-- altered last week" stays a single index scan).
--
-- 2. `personas.allow_agent_corrections` is the gate for the new
-- agent-side `update_fact` / `supersede_fact` tools. Default OFF —
-- a fresh persona's agent can create but can't alter or replace.
-- Operator opts in per-persona after the model has earned trust,
-- typically via the strict-mode flow (curate, then ratchet up
-- agent autonomy as confidence rises). Parallel in shape to
-- `reviewed_only_facts` from 2026-05-10-000400; they compose.
--
-- 3. Index on `last_modified_at` (partial, NOT NULL) for the
-- audit-feed reads in the curation UI ("show recent agent edits
-- sorted newest first").
ALTER TABLE entity_facts ADD COLUMN last_modified_by_model TEXT;
ALTER TABLE entity_facts ADD COLUMN last_modified_by_backend TEXT;
ALTER TABLE entity_facts ADD COLUMN last_modified_at BIGINT;
CREATE INDEX idx_entity_facts_last_modified_at
ON entity_facts(last_modified_at)
WHERE last_modified_at IS NOT NULL;
ALTER TABLE personas ADD COLUMN allow_agent_corrections BOOLEAN NOT NULL DEFAULT 0;