Adds two nullable TEXT columns to entity_facts —
`created_by_model` (LLM identifier) and `created_by_backend`
("local" / "hybrid" / "manual" / NULL) — so the curator can audit
which configurations produce good fact-keeping and which produce
noise.

photo_insights already carries model_version + backend, and
entity_facts.source_insight_id links to it, but:
- source_insight_id is set post-loop, so chat-continuation and
  regenerated-insight facts lose the link.
- JOINing per read is more friction than embedding provenance on
  the row itself.
- Manual facts (POST /knowledge/facts) have no insight at all and
  need their own "manual" provenance marker.

Threading: execute_tool grows `model` + `backend` params, passed
from the three call sites (agentic insight loop, chat single-turn,
chat stream) using the loop-time `chat_backend.primary_model()` +
`effective_backend` already in scope. tool_store_fact stamps the
new fact accordingly; manual create_fact stamps backend="manual".
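
A minimal Python sketch of that threading, not the project's actual code. The function names come from this message, but the signatures, the `entity_id` / `fact` columns, and the in-memory SQLite table are assumptions made for illustration. It also assumes manual facts leave created_by_model NULL, which this message does not state.

```python
import sqlite3
from typing import Any, Dict, Optional

# Hypothetical, trimmed-down table; the real entity_facts has more columns.
SCHEMA = """
CREATE TABLE entity_facts (
    id                 INTEGER PRIMARY KEY,
    entity_id          INTEGER NOT NULL,
    fact               TEXT NOT NULL,
    created_by_model   TEXT,
    created_by_backend TEXT
)
"""


def tool_store_fact(conn: sqlite3.Connection, entity_id: int, fact: str, *,
                    model: Optional[str], backend: Optional[str]) -> int:
    """Insert a fact and stamp it with the provenance handed down by the caller."""
    cur = conn.execute(
        "INSERT INTO entity_facts "
        "(entity_id, fact, created_by_model, created_by_backend) "
        "VALUES (?, ?, ?, ?)",
        (entity_id, fact, model, backend),
    )
    return cur.lastrowid


def execute_tool(conn: sqlite3.Connection, name: str, args: Dict[str, Any], *,
                 model: Optional[str], backend: Optional[str]) -> int:
    """Dispatch a tool call, forwarding the loop-time model and backend."""
    if name == "store_fact":
        return tool_store_fact(conn, args["entity_id"], args["fact"],
                               model=model, backend=backend)
    raise ValueError(f"unknown tool: {name}")


def create_fact(conn: sqlite3.Connection, entity_id: int, fact: str) -> int:
    """Manual path: no LLM involved, so only the backend marker is set.

    Assumption: the model column stays NULL for manual facts.
    """
    return tool_store_fact(conn, entity_id, fact, model=None, backend="manual")


# Small demo: one LLM-written fact and one manual fact.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
execute_tool(conn, "store_fact", {"entity_id": 1, "fact": "likes hiking"},
             model="qwen2.5:7b", backend="local")
create_fact(conn, 1, "birthday is in May")
rows = conn.execute(
    "SELECT created_by_model, created_by_backend FROM entity_facts ORDER BY id"
).fetchall()
print(rows)
```

In this sketch the provenance arguments are keyword-only, so a call site that forgets to pass them raises a TypeError at once instead of silently writing NULLs.
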
Legacy rows leave both NULL: pre-tracking data can't be back-filled
reliably from training_messages without burning compute.

Indexes are partial (WHERE column IS NOT NULL) so legacy rows don't
bloat them, and "show me all facts from model X" stays fast.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
31 lines
1.5 KiB
SQL
-- Track which model + backend generated each fact so the curator
-- can audit which configurations produce trustworthy knowledge.
--
-- photo_insights already carries `model_version` + `backend`, and
-- entity_facts.source_insight_id links to it, but:
--   1. source_insight_id is only set after an insight is stored
--      (post-loop), so chat-continuation facts and facts whose insight
--      was regenerated lose the link.
--   2. JOINing for every read is more friction than just embedding the
--      provenance on the fact row itself.
--   3. Manual facts (POST /knowledge/facts) have no insight at all and
--      need to record "manual" as their provenance.
--
-- Two nullable TEXT columns are enough for the audit use case: model
-- (e.g. "qwen2.5:7b", "anthropic/claude-sonnet-4") and backend
-- ("local", "hybrid", "manual"). Pre-existing rows leave both NULL:
-- legacy facts predate this tracking and can't be back-filled
-- reliably from training_messages without burning compute.

ALTER TABLE entity_facts ADD COLUMN created_by_model TEXT;
ALTER TABLE entity_facts ADD COLUMN created_by_backend TEXT;

-- Indexes are cheap and useful for "show me all facts from model X"
-- audit queries; they are partial so the legacy NULL rows don't bloat them.
CREATE INDEX idx_entity_facts_created_by_model
    ON entity_facts(created_by_model)
    WHERE created_by_model IS NOT NULL;
CREATE INDEX idx_entity_facts_created_by_backend
    ON entity_facts(created_by_backend)
    WHERE created_by_backend IS NOT NULL;
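
To see the migration and its partial indexes at work, the following sketch applies the same statements to an in-memory SQLite database and runs two audit queries. SQLite itself is an assumption (the migration does not name its engine), as are the pre-migration table shape and the sample rows. `build_db`, `facts_from_model`, `facts_per_configuration`, and `query_plan` are hypothetical helpers, not part of the project.

```python
import sqlite3

# The statements from the migration above, unchanged.
MIGRATION = """
ALTER TABLE entity_facts ADD COLUMN created_by_model TEXT;
ALTER TABLE entity_facts ADD COLUMN created_by_backend TEXT;
CREATE INDEX idx_entity_facts_created_by_model
    ON entity_facts(created_by_model)
    WHERE created_by_model IS NOT NULL;
CREATE INDEX idx_entity_facts_created_by_backend
    ON entity_facts(created_by_backend)
    WHERE created_by_backend IS NOT NULL;
"""

FACTS_FROM_MODEL_SQL = (
    "SELECT id, fact FROM entity_facts WHERE created_by_model = ? ORDER BY id"
)


def build_db() -> sqlite3.Connection:
    """Create a toy pre-migration table, migrate it, then add stamped rows."""
    conn = sqlite3.connect(":memory:")
    # Assumed pre-migration shape; the real table has more columns.
    conn.execute(
        "CREATE TABLE entity_facts (id INTEGER PRIMARY KEY, fact TEXT NOT NULL)"
    )
    # A legacy row written before provenance tracking existed.
    conn.execute("INSERT INTO entity_facts (fact) VALUES ('legacy fact')")
    conn.executescript(MIGRATION)
    # Illustrative sample data; the model/backend pairings are made up.
    conn.executemany(
        "INSERT INTO entity_facts (fact, created_by_model, created_by_backend) "
        "VALUES (?, ?, ?)",
        [
            ("fact A", "qwen2.5:7b", "local"),
            ("fact B", "qwen2.5:7b", "local"),
            ("fact C", "anthropic/claude-sonnet-4", "hybrid"),
            ("fact D", None, "manual"),
        ],
    )
    return conn


def facts_from_model(conn: sqlite3.Connection, model: str) -> list:
    """The 'show me all facts from model X' audit query."""
    return conn.execute(FACTS_FROM_MODEL_SQL, (model,)).fetchall()


def facts_per_configuration(conn: sqlite3.Connection) -> list:
    """Count stamped facts per (model, backend) pair, skipping legacy rows."""
    return conn.execute(
        "SELECT created_by_model, created_by_backend, COUNT(*) "
        "FROM entity_facts "
        "WHERE created_by_backend IS NOT NULL "
        "GROUP BY created_by_model, created_by_backend "
        "ORDER BY created_by_backend, created_by_model"
    ).fetchall()


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)


conn = build_db()
print(facts_from_model(conn, "qwen2.5:7b"))
print(facts_per_configuration(conn))
print(query_plan(conn, FACTS_FROM_MODEL_SQL, ("qwen2.5:7b",)))
```

On SQLite, the plan for the equality lookup should name idx_entity_facts_created_by_model, because an equality comparison on the column implies it is not NULL and so satisfies the index's WHERE clause. Other engines may plan the same query differently.
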