multi-library: hash-keyed tagged_photo + photo_insights with reconciliation
Branch B of the multi-library data-model rollout. tagged_photo and
photo_insights now follow the bytes (content_hash), not the path,
matching the policy pinned in CLAUDE.md "Multi-library data model".
Branch A's availability probe and EXIF scoping land first; this
branch builds on top.
Migration (2026-05-01-000000_hash_keyed_derived_data)
Adds nullable content_hash columns to tagged_photo and photo_insights,
with partial indexes on the non-null subset to keep the index small
during the transitional window. The migration backfills from
image_exif:
* tagged_photo joins on rel_path alone (no library_id available);
* photo_insights joins on (library_id, rel_path), unambiguous.
Rows whose image_exif hash isn't known yet stay null and the runtime
reconciliation pass populates them as the hash backlog drains.
Insert-time population
TagDao::tag_file looks up image_exif.content_hash by rel_path before
inserting; the hash is written into the new column.
InsightDao::store_insight does the same scoped to (library_id,
rel_path). Caller-supplied hash on InsertPhotoInsight wins; otherwise
the DAO does the lookup. Both paths fall back to None if the hash
isn't known yet — reconciliation backfills.
Reconciliation (database/reconcile.rs)
Three idempotent passes the watcher runs once per tick after the
per-library backfill loop:
1. tagged_photo NULL hashes → populate from image_exif by rel_path.
2. photo_insights NULL hashes → populate by (library_id, rel_path).
3. photo_insights scalar merge — when multiple is_current rows
share a content_hash, keep the earliest generated_at as
current; demote the rest. Demoted rows keep their data so
/insights/history is unaffected; only the "current" pointer
narrows to one per hash.
No filesystem dependency, so reconcile doesn't need the availability
gate; runs every tick. Logs once when something changed, debug
otherwise.
Tags are set-valued under the policy (union on read, already
DISTINCT in queries), so there is no analogous tag-collapse pass —
duplicate (tag_id, content_hash) rows across libraries are
harmless.
Read paths are unchanged in this branch — lookup_tags_batch's
existing rel_path-via-hash-sibling expansion still produces the
correct merge. A follow-up can simplify reads to use the new column
directly for performance.
Tests: 217 pass (212 pre-existing + 5 new in reconcile covering
NULL-fill, hash-not-yet-known no-op, library scoping on insights,
earliest-wins collapse, idempotency).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -1255,7 +1255,9 @@ impl InsightGenerator {
|
||||
.span()
|
||||
.set_attribute(KeyValue::new("summary_length", summary.len() as i64));
|
||||
|
||||
// 11. Store in database
|
||||
// 11. Store in database. content_hash is None here — store_insight
|
||||
// looks it up from image_exif before persisting; reconciliation
|
||||
// backfills if the hash isn't known yet.
|
||||
let insight = InsertPhotoInsight {
|
||||
library_id: crate::libraries::PRIMARY_LIBRARY_ID,
|
||||
file_path: file_path.to_string(),
|
||||
@@ -1267,6 +1269,7 @@ impl InsightGenerator {
|
||||
training_messages: None,
|
||||
backend: "local".to_string(),
|
||||
fewshot_source_ids: None,
|
||||
content_hash: None,
|
||||
};
|
||||
|
||||
let mut dao = self.insight_dao.lock().expect("Unable to lock InsightDao");
|
||||
@@ -3530,6 +3533,7 @@ Return ONLY the summary, nothing else."#,
|
||||
training_messages,
|
||||
backend: backend_label.clone(),
|
||||
fewshot_source_ids: fewshot_source_ids_json,
|
||||
content_hash: None,
|
||||
};
|
||||
|
||||
let stored = {
|
||||
|
||||
Reference in New Issue
Block a user