ImageApi

Author	SHA1	Message	Date
Cameron Cordes	32195ed89e	clip-search: backlog drain + /photos/search endpoint Wires the persistence layer for CLIP semantic search. The watcher's per-tick drain encodes any image_exif row with a known content_hash but no clip_embedding via Apollo (cap CLIP_BACKLOG_MAX_PER_TICK, default 32). On a query, /photos/search encodes the text via Apollo and reranks every stored embedding in-memory. ExifDao additions: - list_clip_unencoded_candidates — partial-index scan for drain - backfill_clip_embedding — touches only the two new columns - list_clip_index — dedup'd (hash, embedding) pull for search clip_watch::run_clip_encoding_pass is the parallel fan-out — tokio runtime per pass with CLIP_ENCODE_CONCURRENCY (default 4). No marker rows for permanent failures yet; per-tick cap bounds the retry cost. /photos/search params: q, limit, threshold (default 0.20), library, model_version. Response is intentionally minimal (path + score) so the frontend joins against existing photo-metadata routes lazily. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 16:10:52 -04:00
Cameron Cordes	8d9e76cf15	clip-search: migration + client + probe binary Probe-phase scaffolding for CLIP semantic search. Adds the column that will hold per-photo embeddings, the HTTP client to Apollo's inference service, and a throwaway probe binary so we can eyeball search-result quality on the live library before building the persistence layer (backlog drain, /photos/search endpoint, UI). - migrations/2026-05-14-000000_add_clip_embedding/ — adds image_exif.clip_embedding (BLOB) and clip_model_version (TEXT), plus a partial index on (clip_embedding IS NULL AND content_hash IS NOT NULL) for the future backfill drain. - src/database/models.rs — extends ImageExif struct to match. - src/ai/clip_client.rs — encode_image / encode_text / health, same Permanent/Transient/Disabled taxonomy as face_client. - src/bin/probe_clip_search.rs — --query <q> --library N --limit M --top K. Encodes a sample and prints top-K cosine similarities. No DB writes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 16:10:52 -04:00
Cameron Cordes	8503ef7884	chore: cargo fmt + clippy --fix sweep across the crate Pure mechanical cleanup of accumulated drift in files outside the HLS-content-hash branch's main change set. No behavior change. - `cargo fmt` on every previously-misformatted file (`ai/insight_generator.rs`, `database/knowledge_dao.rs`, `faces.rs`, `knowledge.rs`, `libraries.rs`). - `cargo clippy --fix`: - `needless_borrow`: `&library` → `library` in `handlers/image.rs` (two sites in the photo-listing path). - Manual clippy pass for warnings clippy emits but can't auto-apply: - `field_reassign_with_default` in `database/reconcile.rs::run` — consolidated into a struct-literal initializer. - `needless_range_loop` in `database/knowledge_dao.rs::union_perceptual_tags` — inner `for b in (a+1)..indices.len() { let ib = indices[b]; ... }` becomes `for &ib in &indices[a + 1..] { ... }`. - Doc-list indentation: continuation lines under nested bullets in `database/mod.rs::get_memories_in_window` and `database/knowledge_dao.rs::build_entity_graph` realigned to the list-item content column. Deliberately not touched (each deserves its own focused commit, with testing, rather than getting bundled into a sweep): - 4× `deprecated count_distinct` in `faces.rs` — diesel API migration to `AggregateExpressionMethods::aggregate_distinct` may shift result types; needs verification against the existing stats queries. - `await_holding_lock` in `knowledge.rs:807` — `std::sync::Mutex` held across `ollama.generate(...).await`. Genuine concurrency bug; fix requires understanding the surrounding flow before just dropping the guard. - 2× `type_complexity` in `database/mod.rs` — cosmetic, would need a `type` alias and corresponding callers updated. - Dead `total_deleted` on `library_maintenance::GcStats` and `file_scan::enumerate_indexable_files` — both are public surface retained for future use; deletion is a separate decision. All 707 tests still pass. Release build clean. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 16:25:05 -04:00
Cameron Cordes	7cd1ea3cf8	hls: per-library readiness gauges + GET /hls/stats endpoint The hash-keyed pipeline transcodes lazily, so a freshly mounted (or freshly upgraded) library is "mostly pending" for the first hour while the watcher works through the backlog. The operator wants a live read on remaining work so they can tune `HLS_CONCURRENCY` and know when to stop waiting. Adds: - `src/hls_stats.rs` — pure compute path (`stats_from_rows`) and an Arc<Mutex<dyn ExifDao>> wrapper (`compute_and_publish`). Per library: `total`, `with_playlist`, `pending`, `unsupported`, `hashless_videos`. Dedup is by content_hash so duplicate-bytes-at-N-paths counts once (same domain rule as `faces::stats`). `hashless_videos` is a separate counter so the operator can see the "hash backfill, then transcode" pipeline depth instead of having NULL-hash rows just hide. - Prometheus gauges labeled by library name: `imageserver_hls_videos_total`, `..._with_playlist`, `..._pending`, `..._unsupported`. Updated by the watcher at the end of every full-scan tick and on every `/hls/stats` hit, so whichever surface the operator is watching stays fresh. Registered in `main` alongside the existing image/video gauges. - `GET /hls/stats` — Claims-protected JSON snapshot of the same data plus a top-level cross-library aggregate. Runs on a blocking pool so it doesn't pin the actix worker; per-call cost is one `list_paths_and_hashes_for_library` SQL query per library plus a `stat()` per distinct video hash. Bounded — never invoked from middleware, only from the explicit endpoint and the full-scan tick. The watcher's end-of-tick `info!` summary line mirrors the endpoint output for operators tailing the log. - New `ExifDao::list_paths_and_hashes_for_library` method: `SELECT rel_path, content_hash FROM image_exif WHERE library_id = ?`. Single round-trip; callers filter to video extensions client-side because the schema doesn't carry media-type. Mock impl in `files.rs` returns an empty vec. 
Tests in `hls_stats::tests` exercise stats_from_rows directly (videos-only filter, hash dedup, playlist vs sentinel decision, NULL-hash hashless counting) plus a publish_gauges round-trip that reads the gauge value back. Full suite (347 lib + 360 bin = 707) passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 15:58:46 -04:00
Cameron Cordes	b8e17e05b7	hls: rewrite orphan cleanup for hash-keyed layout The cleanup walk previously looked for `$VIDEO_PATH/<basename>.m3u8` and matched each file's stem against a recursive walk of every library. With the hash-keyed layout now in place, every playlist's file_stem is the literal string "playlist" — the old logic would treat every hash-keyed playlist as orphaned on its next run and wipe them all in one tick (default cleanup interval is 24h, so this is a 24-hour bomb on top of the prior commit). New approach: orphan-ness is decided in the database, not on the filesystem. The cleanup loop: - Snapshots every distinct non-NULL `image_exif.content_hash` into a HashSet (new `ExifDao::list_distinct_content_hashes` method — `SELECT DISTINCT content_hash WHERE content_hash IS NOT NULL`). - Walks `$VIDEO_PATH` two levels deep: top-level entries are filtered to 2-char lowercase hex shard dirs, each shard's children to 64-char hex hash dirs. Anything else (legacy `.m3u8` at root from the pre-content-hash era, operator-stashed dirs, partial writes) is left alone. - Hash dirs whose hash isn't in the alive set are `remove_dir_all`'d. Shard dirs that emptied as a result are reaped on the same pass via `remove_dir` (no-op if non-empty). - The library-stale safety gate is preserved: a stale library skips the cycle even though the orphan decision is DB-only, because the upstream missing-file scan that retires `image_exif` rows itself pauses for stale libraries. Belt-and-suspenders — keeping a hash dir for one extra 24h cycle is cheaper than wiping one whose source was briefly unreachable. The gate now also filters disabled libraries out of the stale set (they're intentionally absent from the health map). - The legacy `excluded_dirs` parameter is preserved on the function signature but unused (the walk no longer crosses library trees); flagged with a leading underscore. Callers in `main.rs` stay unchanged. 
`MockExifDao` in `files.rs` grows the new method (returns empty); unit tests for the new `is_hash_shard` / `is_full_hash` validators guard against an operator's stashed directory under VIDEO_PATH ever matching the orphan-rm path. Both pass. A follow-up commit handles the one-shot startup migration that retires the legacy basename-keyed `.m3u8` / `.ts` files at `$VIDEO_PATH` root. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 15:41:04 -04:00
Cameron Cordes	2d56047497	Drop fs_time from date-backfill eligibility The drain queried `date_taken IS NULL OR date_taken_source = 'fs_time'` ORDER BY id ASC LIMIT 500 every watcher tick. The resolver is deterministic on file bytes + filename + fs metadata, so any row that landed on fs_time once landed there again on every retry — the drain spun on the same lowest-id rows in perpetuity, never advancing to rows 501+ while still logging more_remain=true. Side effect: 500 auto-commit UPDATEs per tick sustained the SQLite write lock long enough that other writers on separate DAO connections hit the 5s busy_timeout. Manifested as intermittent 500s on PATCH /image/faces/{id} that succeeded on retry. Narrow the partial index and query predicate to `date_taken IS NULL`. If exiftool installs or a new filename regex lands, an operator can re-resolve fs_time rows out-of-band rather than re-introducing the steady-state churn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:37:36 -04:00
Cameron Cordes	bec9857426	Split main.rs: extract backfill drains and thumbnails into modules main.rs drops from 3542 → ~2930 lines by moving: - src/backfill.rs (new): backfill_unhashed_backlog, backfill_missing_date_taken, backfill_missing_content_hashes, build_face_candidates, process_face_backlog. Now unit-tested for the first time — 5 tests covering cap behavior, library-id filtering, missing-on-disk skip, and the video/unhashed/scanned filters on face-candidate selection. - src/thumbnails.rs (new): unsupported_thumbnail_sentinel, generate_image_thumbnail, create_thumbnails, update_media_counts, is_image, is_video, plus the IMAGE_GAUGE / VIDEO_GAUGE Prometheus metrics. Replaces the no-op stubs that used to live in lib.rs. 4 new unit tests for the sentinel path math and the walker-counts-images-vs-videos smoke path. Supporting: - SqliteExifDao::from_shared (test-only) so an SqliteExifDao and SqliteFaceDao can share one in-memory connection — required to test build_face_candidates against the real join. - files.rs / video/{mod,actors}.rs import from crate::thumbnails::* instead of the now-removed stubs in lib.rs. cargo test --bin image-api: 325 passing (was 314). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 12:22:02 -04:00
Cameron Cordes	e67e00ef8a	knowledge: predicate-quality nudge + bulk-reject endpoint Two coupled changes to fight the speech-act-predicate problem (facts like (Cameron, expressed, "I'm tempted to...")): 1. System prompt grows an explicit predicate-quality rule. The agent is told to use relationship-shaped verbs (lives_in, works_at, attended, is_friend_of, interested_in), and is given an explicit DON'T list (expressed, said, mentioned, stated, quoted, noted, discussed, thought, wondered). Plus a concrete Bad / Good example contrasting the noise pattern with the structured paraphrase the agent should be writing. Stops the bleed for new insights. 2. Cleanup tools for the legacy noise that's already in the table: - get_predicate_stats(persona, limit) returns [(predicate, count)] sorted desc — feeds the curation UI's PREDICATES tab. - bulk_reject_facts_by_predicate(persona, predicate, audit) flips every ACTIVE fact under that predicate to 'rejected' in one transaction, stamping last_modified_* so the action is attributable + reversible per-fact through the entity detail panel. REVIEWED facts under the same predicate are left alone — the curator may have hand-approved an exception ("interested_in" might be largely noise but a reviewed entry is intentional). New HTTP endpoints: GET /knowledge/predicate-stats?limit= POST /knowledge/predicates/{predicate}/bulk-reject Persona-scoped via the existing X-Persona-Id header. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 21:50:26 -04:00
Cameron Cordes	d123cde333	knowledge: entity-graph endpoint for force-directed view New GET /knowledge/graph?type=&limit= returns the data the curation UI's graph tab needs: - nodes = entities with at least one in-scope fact (rejected / superseded excluded). Carries fact_count for visual sizing. Top-N by count desc; default cap 200 (clamped 1..1000). - edges = relational facts (object_entity_id set) grouped by (subject, object, predicate) so 3 "is_friend_of" facts between the same pair collapse into one edge with count=3. Two raw SQL queries: an INNER JOIN onto a persona-scoped fact-count subquery for nodes (skips 0-fact entities entirely so the sim doesn't waste time on disconnected islands), then a follow-up GROUP BY over the persona-scoped fact set restricted to the node id set via IN clauses (ids are i32 so inlining is safe). Pairs with the Apollo-side GraphPanel that runs d3-force over the returned payload and renders SVG with click-to-open. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 21:26:02 -04:00
Cameron Cordes	6dca0c027d	fmt: cargo fmt sweep No logic changes - line reflow, brace placement, and method-chain splits across handlers / personas / state / faces / knowledge / insights_dao / knowledge_dao / populate_knowledge. Picked up incidentally while running fmt for the sms-search work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:21:00 -04:00
Cameron Cordes	6620fa48d7	knowledge: consolidation proposals endpoint Finds near-duplicate entities the upsert-time cosine guard didn't catch — typically legacy data from before that guard landed, or pairs whose embeddings sit between 0.85 (default proposal floor) and 0.92 (auto-collapse threshold). Pure read-side feature; the actual merging still goes through the existing /knowledge/entities/merge action. New DAO method `find_consolidation_proposals(threshold, max_groups)`: - Loads every non-rejected entity with an embedding. - Partitions by entity_type so a person can't cluster with a place. - Pairwise cosine, edges above threshold feed a union-find for transitive grouping (Sara → Sarah → Sarah J. all land in one cluster). - Tracks min/max cosine per component so the UI can show "how tight" each cluster is before clicking in. - Returns groups of >= 2 sorted by size desc then max cosine desc; trimmed to `max_groups`. New endpoint `GET /knowledge/consolidation-proposals?threshold=&limit=` accepts the threshold (clamped 0.5–0.99 to prevent the "every entity in one mega-cluster" case) and returns groups with per-entity persona fact-count breakdowns baked in — saves the UI a separate query per cluster member. ConsolidationGroup is exported through database/mod.rs so the handler can use it without depending on knowledge_dao internals. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:43:11 -04:00
Cameron Cordes	89d0a6527c	knowledge: per-entity persona breakdown for list + detail Entities are global; facts are persona-scoped. Under the active persona an entity can read as "0 facts" while having plenty under other personas the user owns — the curation UI had no way to surface that gap. Adds a batched DAO method `get_persona_breakdowns_for_entities` that returns {entity_id → [(persona_id, count)]} in one query (group by subject + persona, user-scoped, status != rejected), and wires it into both /knowledge/entities list rows and GET /knowledge/entities/{id}. EntitySummary grows an optional `persona_breakdown` field (skipped on serialization when None — keeps PATCH responses unchanged). EntityDetailResponse carries the breakdown as a non-optional Vec since the detail endpoint always populates it. One extra query per list page (50 entities → 50 subject ids batched in one IN clause); single-entity GET adds one round trip. Indexed by (subject_entity_id, persona_id) implicitly via the existing user-persona indexes on entity_facts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:29:20 -04:00
Cameron Cordes	fd4dd89bbb	knowledge: agent self-correction with audit + per-persona gate + revert Bundles three coupled changes so agent-side mutations stay auditable and reversible: 1. Audit columns on entity_facts — `last_modified_by_model` / `last_modified_by_backend` / `last_modified_at`. Stamped on every mutation path (update_fact, supersede_fact, manual PATCH, manual supersede, the new revert). NULL on rows never touched since creation. Partial index on `last_modified_at WHERE NOT NULL` keeps the "show me recent edits" feed fast without bloating from legacy rows. 2. Per-persona gate `personas.allow_agent_corrections` (BOOLEAN, default 0). Defense in depth at two layers: - build_tool_definitions: when off, `update_fact` and `supersede_fact` aren't in the catalog at all, so even a hallucinated tool call by the model fails fast. - tool_update_fact / tool_supersede_fact: re-checks the persona flag at call time and returns an explicit "corrections disabled" error if it's somehow off (e.g. flag flipped mid-loop). ToolGateOpts grows the flag; current_gate_opts splits into `current_gate_opts` (no persona context, defaults closed) + `current_gate_opts_for_persona` for chat callers that have a persona id. Both call sites in insight_chat are updated. 3. Revert action — new DAO method `revert_supersession` + `POST /knowledge/facts/{id}/restore`. Flips status back to 'active', clears `superseded_by`, clears `valid_until` (we don't track whether it was hand-set vs auto-stamped, so the safe reset is to drop it — user can re-bound after). Stamps `last_modified_` so the revert itself is attributable. Manual paths (PATCH / supersede via HTTP, plus restore) stamp the audit columns with `("manual", "manual")`. Agent paths stamp the loop-time chat model and backend (mirroring the existing created_by_ convention). FactDetail in the HTTP response now carries the audit triple alongside the existing provenance. Apollo wires the new field set in the matching commit. 
PersonaView / UpdatePersonaRequest grow `allowAgentCorrections`; the PersonaPatch + InsertPersona + bulk_import paths thread it. 317 lib tests pass, including unchanged update_fact / supersede DAO tests (now passing audit=None — None means "no provenance context to attribute", legacy semantics). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:56:56 -04:00
Cameron Cordes	86c331571d	knowledge: per-persona reviewed-only mode + agent reads include reviewed Two coupled changes to the agent's recall surface: 1. Default scope expanded. recall_facts_for_photo and recall_entities used to filter to status='active' only — which silently dropped 'reviewed' (human-verified) facts. Now they surface active + reviewed by default. Reviewed is strictly more trusted than active and shouldn't have been hidden. Rejected and superseded stay filtered. 2. New persona toggle `reviewed_only_facts` (BOOLEAN, default false, migration 2026-05-10-000400). When set, the agent's recall on that persona returns ONLY facts with status='reviewed' — strict mode for tasks where hallucinated agent claims are particularly costly. Wired: - schema.rs / Persona / InsertPersona / PersonaPatch grow the field. - PersonaView returns it as `reviewedOnlyFacts` (camelCase wire). - PUT /personas/{id} accepts it (mobile editor surfaces it). - InsightGenerator now carries a PersonaDao reference so recall_facts_for_photo can read the active persona's flag at start; one extra read per recall, cheap. Composes with include_all_memories: that operates on the persona scope axis (single vs hive), reviewed_only_facts on the status axis. They're orthogonal. Legacy persona rows pick up the default false on migration; no behavior change unless explicitly toggled. The 4 existing persona construction sites (one production, two tests, one InsertPersona in knowledge_dao tests) all default the field. populate_knowledge bin + state.rs constructors also wire the new persona_dao arg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:21:39 -04:00
Cameron Cordes	f53338923d	knowledge: stamp model + backend on facts for audit Adds two nullable TEXT columns to entity_facts — `created_by_model` (LLM identifier) and `created_by_backend` ("local" / "hybrid" / "manual" / NULL) — so the curator can audit which configurations produce good fact-keeping and which produce noise. photo_insights already carries model_version + backend, and entity_facts.source_insight_id links to it, but: - source_insight_id is set post-loop, so chat-continuation and regenerated-insight facts lose the link. - JOINing per read is more friction than embedding provenance on the row itself. - Manual facts (POST /knowledge/facts) have no insight at all and need their own "manual" provenance marker. Threading: execute_tool grows `model` + `backend` params, passed from the three call sites (agentic insight loop, chat single-turn, chat stream) using the loop-time `chat_backend.primary_model()` + `effective_backend` already in scope. tool_store_fact stamps the new fact accordingly; manual create_fact stamps backend="manual". Legacy rows leave both NULL — pre-tracking data can't be back-filled reliably from training_messages without burning compute. Indexes are partial (WHERE NOT NULL) so legacy rows don't bloat them, and "show me all facts from model X" stays fast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:05:14 -04:00
Cameron Cordes	85f3716379	knowledge: fact supersession + photo-date valid_from Two Phase-2 followups in one commit since they're coupled at the write path: * Agent populates valid_from from the source photo's date_taken when calling store_fact. Loose semantics — date_taken is evidence at that date, not strictly when the fact started being true — but gives the curator a calendar anchor and pairs with supersession to close intervals cleanly. valid_until stays NULL (a single photo can't tell us when something stopped). Honours the existing upsert_fact dedup (corroborated facts keep their first-recorded valid_from). * Supersession: new column entity_facts.superseded_by INTEGER (migration 2026-05-10-000200), new status value 'superseded', new DAO method supersede_fact, new HTTP endpoint POST /knowledge/facts/{id}/supersede. Marking an old fact as replaced by a new one atomically: flips status to 'superseded', sets superseded_by, and stamps valid_until from the new fact's valid_from (when not already set). delete_fact clears dangling supersession pointers in the same transaction so the column never points at a missing row — no FK because SQLite can't ALTER ADD with REFERENCES, but the DAO maintains the invariant. Pairs with conflict detection from the previous slice: once the old fact's valid_until is closed, its interval no longer overlaps the new fact's, so they stop flagging — the supersede action resolves the conflict. Two tests pin the contract: supersede stamps valid_until from new.valid_from while respecting an existing valid_until, and deleting the supersedeR clears the dangling pointer while leaving the old fact's 'superseded' status in place for history. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:47:06 -04:00
Cameron Cordes	01f5ad7527	knowledge: valid-time on facts + interval-aware conflict detection Adds bitemporal support to entity_facts. Existing `created_at` is transaction time (when we recorded the fact); the new `valid_from` / `valid_until` BIGINT columns are valid time (when the fact is/was true in the real world). NULL on either side = unbounded on that side, both NULL = "always-true / unknown" — matches the default state of every legacy row, no backfill needed. The split matters for time-bounded predicates like is_in_relationship_with / lives_in / works_at: recording the fact once doesn't mean the relationship is still ongoing. Same predicate across different windows ("lives_in NYC 2018-2020", "lives_in SF 2020-present") is no longer a conflict — the interval-aware check in get_entity only flags pairs whose windows overlap. Facts with no valid-time data still flag against everything (worst case for legacy rows — user adds dates to suppress). API surface: - POST /knowledge/facts accepts optional valid_from / valid_until. - PATCH /knowledge/facts/{id} accepts both with tri-state semantics: field omitted = leave alone, JSON null = clear to NULL, number = set. Implemented via a small serde helper around Option<Option>. - GET /knowledge/entities/{id} surfaces both fields per fact and uses them in conflict detection. Agent path (insight_generator) writes NULL/NULL for now — deriving valid_from from the source photo's date_taken is slated for a follow-up agent tool alongside Phase 2's supersession. Test pins set + clear semantics via update_fact: setting both bounds, leaving them alone on a subsequent patch, then clearing valid_until back to NULL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:25:55 -04:00
Cameron Cordes	0b8478a5e4	knowledge: list sort + persona-scoped fact_count per entity Two related additions to /knowledge/entities: - New EntitySort enum (UpdatedDesc default, NameAsc, FactCountDesc) surfaced via `?sort=updated|name|count`. NameAsc clusters near-duplicate names so dupes stand out at a glance; FactCountDesc surfaces heavily-used entities and demotes 0-fact noise to the bottom. - New `list_entities_with_fact_counts` DAO method that returns each entity alongside a persona-scoped count of its non-rejected facts (subject side). Persona scope follows X-Persona-Id via the existing resolve_persona_filter chain — Single filters on (user_id, persona_id), All unions across the user's personas. Implemented as one raw SQL query with a LEFT JOIN to a fact-count subquery and ORDER BY tied to the chosen sort, so count-sort needs no second round trip. The agent's existing list_entities call site is unchanged — it doesn't need persona-scoped counts and the trait method stays cheap. EntitySummary grows an Option<i64> fact_count (skip_serializing_if none) so PATCH responses stay shaped as before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 16:04:13 -04:00
Cameron Cordes	0e2b18224f	knowledge: pre-delete relational facts so entity delete succeeds DELETE /knowledge/entities/{id} was 500ing on any entity that was the object of a relational fact. entity_facts.object_entity_id has ON DELETE SET NULL, but the table also has CHECK (object_entity_id IS NOT NULL OR object_value IS NOT NULL) — purely relational facts (subject + predicate + object_entity_id, no object_value, like "Alice is_friend_of Bob") would have both NULL after SET NULL fired, the CHECK would abort, and the whole DELETE would fail with a CHECK violation. The user just saw QueryError because the DAO swallowed the diesel error string. Wrap delete_entity in a transaction that first deletes facts where the entity is the object AND object_value is null, then deletes the entity. Surviving siblings (typed facts about the entity as subject) are CASCADE'd by the FK as before. Also start surfacing the actual diesel error in a warn log before collapsing to DbErrorKind so future similar issues don't masquerade as the opaque QueryError. A schema-level fix (changing object FK to ON DELETE CASCADE via a table-rebuild migration) is the cleaner long-term resolution and is slated for Phase 2; the DAO-side pre-delete is sufficient and less invasive in the meantime. Test pins the contract: a relational fact pointing at the deleted entity is removed, an unrelated typed fact about an unrelated entity survives, and the entity itself is deleted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 15:44:38 -04:00
Cameron Cordes	d7aee4f228	knowledge: cosine dedup, fact create endpoint, recall nudge Phase 1 of the knowledge curation work. Three small server-side changes to support an Apollo-side curation surface and reduce the agent's near-duplicate output rate going forward: - upsert_entity grows an embedding-cosine fallback after the exact name match misses. New entities whose embedding sits above ENTITY_DEDUP_COSINE_THRESHOLD (default 0.92) against any same-type active entity collapse onto the existing row. Eliminates the Sarah / Sara / Sarah J. trio the FTS5 prefix check was missing. - POST /knowledge/facts symmetric with the existing PATCH/DELETE so the curation UI can create facts directly. Persona-scoped via X-Persona-Id; validates subject (and optional object) entity existence; reuses KnowledgeDao::upsert_fact so corroboration semantics match the agent path. - One sentence in build_system_content telling the agent to call recall_entities before store_entity when a name resembles something already known. Cheap; complements the DAO-layer guard. Includes upsert_entity_collapses_near_duplicate_by_embedding test covering both the collapse-on-near-match path and the don't-collapse-on-unrelated-embedding path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 15:16:05 -04:00
Cameron Cordes	08a5f46be1	chat: scope insight lookup by library_id to fix regen-shadow bug When a photo exists in more than one library and the user regenerates its insight from library A's chat, the regenerate streams cleanly, store_insight flips library A's old row to is_current=false, and inserts a new is_current=true row tagged (library A, rel_path). On the next history fetch the user sees their old transcript — the regenerate appears to vanish. The cause: get_insight(file_path) filters on rel_path + is_current only, so library B's untouched is_current=true row for the same rel_path satisfies the query and gets returned by SQLite's .first() ahead of A's new row. Because get_insight is also what chat_turn_stream uses to decide bootstrap vs. continuation, the next chat turn after the shadow hit also routes against the wrong insight, so update_training_messages corrupts library B's transcript with library A's chat. Fix: add get_current_insight_for_library(library_id, file_path) filtered on (library_id, rel_path, is_current=true) and route the chat surface (load_history, chat_turn{,_stream}, rewind_history) through it. load_history falls back to the cross-library get_insight when the scoped lookup misses — preserves the "scalar data merges across libraries" intent for the case where the active library has no insight but another does. The path-only get_insight stays for callers that don't have library context (populate_knowledge, the photo-grid metadata fetch). chat_history_handler stops dropping the parsed library on the floor and threads it through. Single-library deploys see no behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 14:03:41 -04:00
Cameron Cordes	fbd769e475	personas: composite FK + built-in update guard Two persona-infrastructure correctness fixes that go together because the second one (FK with CASCADE) requires the first (preventing the persona row from being mutated out from under its facts). 1. update_persona handler refuses name/systemPrompt edits to built-ins (409). includeAllMemories stays editable — that's a per-user preference, not the persona's identity. Mirrors the existing delete_persona guard. The DAO is intentionally permissive so the guard sits at the HTTP layer; persona_dao test pins that contract. 2. Migration 2026-05-10 adds user_id to entity_facts and a composite FK (user_id, persona_id) -> personas(user_id, persona_id) ON DELETE CASCADE. This closes two issues at once: - Persona orphans: deleting a custom persona used to leave its facts dangling forever, readable only via PersonaFilter::All. CASCADE now wipes them with the persona row. - Multi-user fact leakage: PersonaFilter::Single("default") used to surface every user's default-scoped facts. PersonaFilter is now { user_id, persona_id } and all read paths (get_facts_for_entity, list_facts, get_recent_activity) filter on user_id first. upsert_fact's dedup key extends to user_id so identical claims under shared persona names from different users no longer corroborate-bump each other's confidence. - user_id threads from Claims.sub.parse::<i32>().unwrap_or(1) at the chat / insight handlers through ChatTurnRequest, the streaming agentic loop, execute_tool, and into the leaf tools (tool_store_fact, tool_recall_facts_for_photo). The ".unwrap_or(1)" accommodates Apollo's service token whose sub is non-numeric on legacy mints. - Backfill picks the smallest user_id matching each legacy fact's persona_id so the FK holds for already-stored rows. Five new knowledge_dao tests with FK-on connection: persona scoping isolation, All-variant union per-user, dedup not crossing users, CASCADE delete, FK rejection of unknown personas. 
Plus dao_update_does_not_block_built_ins documenting where the HTTP-layer guard lives. Apollo coordinates separately — the matching changes there add the /api/personas proxy and start sending persona_id on photo-chat turns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 13:30:35 -04:00
cameron	25233904aa	Merge pull request 'personas: elevate to server with per-persona fact scoping' (#88) from feature/persona-knowledge-segmentation into master Reviewed-on: #88	2026-05-10 03:44:26 +00:00
Cameron Cordes	9871c685b4	date-override: cargo fmt Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 21:23:11 -04:00
Cameron Cordes	108bbeb029	date-override: union semantics across libraries + slash forms The date-override path used to look up `image_exif` strictly by `(library_id, rel_path)` with only the forward-slash form, while `/image/metadata`'s `get_exif` falls back across libraries and tries both slash forms. A photo whose row sat under a different library_id than its filesystem-resolved one — or whose rel_path was stored with backslashes — rendered fine in the modal but 404'd on save. `set_manual_date_taken` / `clear_manual_date_taken` now share a `locate_image_exif_row` helper that mirrors `get_exif`'s union semantics (scoped lookup first, library-agnostic fallback by rel_path in both slash forms), then update by primary key so the write hits exactly the row read. Inner anyhow errors are logged with `(library_id, rel_path)` so the next failure mode is debuggable. Handler-side: `resolve_library_param` errors no longer silently fall back to the primary library (which would have masked the original bug with a different "row not found"); a malformed library param now returns 400. New `DbErrorKind::NotFound` lets the handler distinguish genuine misses (404) from real DB failures (500). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 21:21:25 -04:00
Cameron Cordes	3e2f36a748	personas: elevate to server with per-persona fact scoping Move personas off the mobile client into ImageApi as first-class records, and scope entity_facts by persona so each one builds its own voice over a shared entity graph. The new include_all_memories flag lets a persona opt back into the full hive-mind pool for human browsing of /knowledge/*; agentic generation always stays in-voice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 17:59:20 -04:00
Cameron Cordes	b42acbb3f3	fmt: cargo fmt sweep across drifted files No behavior change — purely whitespace/line-break cleanup that had accumulated since the last format run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 16:42:41 -04:00
Cameron Cordes	e539c083c9	insight-chat: code-review polish on the tool-gating PR - search_messages now delegates to search_messages_with_contact(.., None) so the two methods share a single HTTP path. Drops the dead-code warning and the ~30-line duplication. - DailySummaryDao gains has_any_summaries (LIMIT 1 existence probe) used by current_gate_opts; the SELECT COUNT(*) get_total_summary_count added in the prior commit is removed (it had no other caller). - current_gate_opts doc comment corrected to describe what the probes actually do. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 15:07:57 -04:00
Cameron Cordes	f50d32667b	insight-chat: ToolGateOpts + per-tool description rewrites Tools whose backing tables are empty (calendar, location_history, daily_summaries) drop out of the catalog so the LLM doesn't waste iteration budget calling them only to receive "no results found". Vision and apollo gates already existed; this generalizes the pattern. search_messages gains start_ts/end_ts/contact_id filters (date filter is a client-side post-filter; SMS-API only accepts contact_id natively on the search endpoint). Descriptions follow a consistent convention: one sentence (what + when), param semantics, examples for tools with non-obvious param choices. No more all-caps headers, no more identity-prescriptive language inside descriptions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 14:56:58 -04:00
Cameron Cordes	7e1c4ab318	backfill_date_taken: surface the actual diesel error in warnings The DAO swallowed every diesel::update failure as a flat `anyhow!("Update error")`, then trace_db_call further reduced it to `DbError { kind: UpdateError }`. Operators saw "update failed for lib 2 Snapchat/foo.mp4: DbError { kind: UpdateError }" with no clue why (constraint violation? type mismatch? row vanished mid-flight? DB locked?). Three changes: - Preserve the diesel error in the anyhow chain along with the input params (lib, rel_path, date_taken, source) so the cause is visible. - Log the chain at warn-level inside the DAO before the trace wrapper collapses it to DbErrorKind::UpdateError, so the warning at the call site finally has something diagnosable next to it. - Treat zero-row updates as a debug-level "row likely retired by the missing-file scan" rather than a hard failure — that case is benign and shouldn't poison the drain's error tally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 11:07:17 -04:00
Cameron Cordes	832b50d587	image_exif: manual date_taken override (set/clear endpoints) Add `POST /image/exif/date` and `POST /image/exif/date/clear` so an operator can correct a row whose canonical-date waterfall landed on the wrong value (camera clock reset, fs_time fallback for a copied-from-backup file, etc). New `original_date_taken` / `original_date_taken_source` columns snapshot the prior value on first override so revert is lossless. The waterfall source set is now `'exif' | 'exiftool' | 'filename' | 'fs_time' | 'manual'`. The existing `idx_image_exif_date_backfill` partial index already filters to `date_taken IS NULL OR date_taken_source = 'fs_time'`, so manual rows are naturally excluded from the per-tick drain — no index change needed. `ExifMetadata` now exposes `date_taken_source` + originals so a UI can render "manually set; was X via filename". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 19:26:43 -04:00
Cameron Cordes	7f12890f4b	memories: single-SQL rewrite + 20-year lookback Replaces the EXIF-loop + WalkDir-fallback pipeline that powered `/memories` with a single per-library SQL query (`get_memories_in_window`) that uses `strftime('%m-%d' | '%W' | '%m', date_taken, 'unixepoch', tz_offset)` for calendar matching in the client's timezone, plus a `years_back` lower bound and a no-future-dates upper bound. Returns only the matching rows; the handler applies per-library `PathExcluder` post-query and sorts. Drops: - `collect_exif_memories` — replaced by the single SQL query. - `collect_filesystem_memories` — the canonical-date pipeline now populates `date_taken` for every row at ingest, so the WalkDir fallback that scanned 14k+ files each request is no longer needed. - `get_memory_date_with_priority` and friends — request-time waterfall superseded by `date_resolver` running at ingest. The associated three priority-tests are dropped; their replacement lives in `date_resolver::tests`. On a ~14k-file library this drops `/memories` from 10–15 s (dominated by `fs::metadata` per row) to single-digit ms. Bumps `DEFAULT_YEARS_BACK` from 15 → 20 to surface deeper archives on matching anniversaries. Note vs. ISO weeks: the original Rust used `chrono::iso_week().week()` for week-span matching. SQLite's `%W` is Monday-anchored but uses week 0 for days before the first Monday, so it can disagree with ISO at year boundaries by ±1. Acceptable for nostalgia browsing. Adds 3 new DAO tests covering month-span filter, library scoping, and the unknown-span-token guard. Also adds a CLAUDE.md section describing the canonical-date pipeline end-to-end and the new `DATE_BACKFILL_MAX_PER_TICK` env var. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 16:04:09 -04:00
Cameron Cordes	54e0635a98	date_backfill: per-tick drain for unresolved date_taken rows Adds two ExifDao methods (`get_rows_needing_date_backfill` / `backfill_date_taken`) and a `backfill_missing_date_taken` watcher pass that runs on every tick alongside `backfill_unhashed_backlog`. The drain queries the partial index for rows where `date_taken IS NULL` or `date_taken_source = 'fs_time'`, batches up to `DATE_BACKFILL_MAX_PER_TICK` paths (default 500), and feeds them through `date_resolver::resolve_dates_batch` — a single exiftool subprocess covers the whole tick. Rows that newly resolve to `exiftool` / `filename` / `fs_time` get persisted via `backfill_date_taken` (touches only `date_taken` + `date_taken_source` so EXIF / hash / perceptual columns survive). `filename`-sourced rows are intentionally not re-resolved — the regex is authoritative when it matches and re-running exiftool wouldn't change the answer. Files that have disappeared from disk are skipped so a ghost row doesn't loop through the drain forever; the missing-file scan in `library_maintenance` retires those separately. Comes with two DAO unit tests (eligibility filter + column-isolation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 16:03:03 -04:00
Cameron Cordes	84326501a9	image_exif: add date_taken_source column New nullable TEXT column tracks which step of the canonical-date waterfall (kamadak-exif → exiftool → filename → fs_time) populated `date_taken`. Lets a later per-tick drain re-resolve weak sources (`fs_time`) once stronger ones become available, and gives the UI/debug surface a way to answer "why does this photo show up under this date?". Adds the column at all `InsertImageExif` construction sites with `None` placeholders (the resolver wiring lands in a follow-up commit), and extends the `update_exif` SET tuple so the column survives the GPS-write re-read path. Partial index `idx_image_exif_date_backfill` is created for the upcoming drain query. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 15:57:49 -04:00
Cameron Cordes	67cf0c7f73	duplicates: folder-pair view of exact dups Bucket exact-dup rows by (library_id, dirname) pair on each side, then filter by coverage = shared / min(folder_a_total, folder_b_total) and an absolute floor on shared count. Surfaces "this folder is mostly contained in that folder" matches that the per-file EXACT view buries under one row each — e.g. an old phone-backup tree shadowing the organized library, or a topic-grouped folder duplicating a date-grouped one within the same library. New endpoint: GET /duplicates/folder-pairs?library=&include_resolved=&min_coverage=&min_shared=. Cached 5 min keyed on (library, include_resolved); the user-tunable thresholds filter the cached unfiltered pair list so slider drags don't re-bucket. Shares the resolve / unresolve flow with the existing tabs — the frontend fans out N parallel /resolve calls, one per shared content_hash. Folder names carry no signal (BMW lives under Night Photos, not BMW_backup), so bucketing is purely on (library_id, dirname) co-occurrence in exact-dup groups. Within-folder dups (same hash twice in the same folder) are skipped — those belong to the EXACT tab. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 12:43:29 -04:00
Cameron Cordes	57b7bad086	duplicates: library-aware visibility — only hide a demoted row when its survivor is reachable Soft-marked rows used to disappear from /photos globally, including from a library-scoped view that didn't contain the survivor at all. A user browsing lib A who'd promoted a file from lib B as the survivor would silently lose visibility on their own copy in lib A, even though lib B's file isn't reachable from lib A's view. Library-scoped queries now keep a demoted row visible when its survivor lives in a library outside the current scope. Implemented as a NOT EXISTS subquery against the same image_exif table aliased as `survivor`. The unscoped (all-libraries) view is unchanged — every survivor is reachable, so demoted rows stay hidden as before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 18:24:07 -04:00
Cameron Cordes	7584cd8792	duplicates: perceptual hash + soft-mark resolution + upload 409 Adds pHash + dHash columns alongside the existing blake3 content_hash so near-duplicates (re-encoded, resized, format-converted copies) become queryable. /duplicates/{exact,perceptual} return groups; /duplicates/{resolve,unresolve} flip a duplicate_of_hash soft-mark on losing rows and union perceptual-only tag sets onto the survivor. The default /photos listing filters duplicate_of_hash IS NULL so demoted siblings stop cluttering the grid; include_duplicates=true opts back in for Apollo's review modal. Upload now hashes bytes pre-write and returns 409 with the canonical sibling when a file's bytes already exist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 17:36:01 -04:00
Cameron Cordes	fb4df4b195	style: cargo fmt sweep Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 19:01:00 -04:00
Cameron Cordes	814066551e	multi-library: per-library excluded_dirs Adds a nullable comma-separated TEXT column to the libraries table. Effective excludes for a walk = (env-var globals) ∪ (library.excluded_dirs). Empty / NULL = no library-specific extras; the global env var still applies. Migration (2026-05-01-110000_libraries_excluded_dirs) ALTER TABLE libraries ADD COLUMN excluded_dirs TEXT. NULL on every existing row — no behavior change on upgrade. Library struct + helpers (libraries.rs) - Library gains excluded_dirs: Vec<String>, parsed from the column by parse_excluded_dirs_column (drops empties / whitespace, matches the env-var parser). - Library::effective_excluded_dirs(globals) returns the union. - From<LibraryRow> hydrates the field on AppState construction so /libraries surfaces it. Watcher / walkers / memories Every per-library walker now consults the effective set: - process_new_files (file-watch ingest, RAW/EXIF/face) - process_face_backlog (filter_excluded inherits) - create_thumbnails (startup + new-file branch) - update_media_counts (Prometheus gauge) - cleanup_orphaned_playlists (per-library source-existence check) - memories endpoint (PathExcluder) Effective set is computed once per per-library iteration in the watcher tick and threaded through; called functions retain their flat &[String] signature (no per-library awareness needed inside the walker primitives). Use case: mount a parent directory while a sibling library covers a child subtree, and exclude the child subtree from the parent so the libraries don't double-walk / double-write image_exif. With hash-keyed derived data (Branches B/C), the duplication-avoidance is the only cost prevented — face / tag / insight sharing was already correct via content_hash. Tests: 228 pass (226 from previous + 2 new in libraries::tests: parse_excluded_dirs_column edge cases, effective_excluded_dirs_unions_global_and_per_library). 
CLAUDE.md gains a "Per-library excludes" subsection of the multi-library data model. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 19:54:17 +00:00
Cameron Cordes	3598bb2cfe	multi-library: operator kill switch via libraries.enabled A small follow-up to Branches A/B/C. Adds a nullable-default-1 boolean column to the `libraries` table that controls whether the watcher considers the library at all. Useful for staging a new mount before committing to ingest, and as a maintenance kill switch when a library needs to be quiet without being unmounted. Migration (2026-05-01-100000_libraries_enabled_flag) ALTER TABLE libraries ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT 1. Existing rows stay enabled — no behavior change on upgrade. Watcher gate (main.rs) At the top of the per-library loop, if !lib.enabled { continue; } — runs BEFORE the availability probe. Disabled libraries don't enter the health map, don't get probed, don't get ingest, don't get any maintenance pass. The initial sweep before the loop's first sleep also skips disabled libraries. Orphan-GC consensus (library_maintenance.rs) all_libraries_online filters disabled libraries out of the consensus check — they're treated as out-of-scope, not as blockers. Otherwise flipping enabled=false would permanently halt orphan GC for the rest of the system, which is the opposite of the intended kill-switch semantics. Cross-library duplicates: safe by construction. Hash-keyed derived data (face_detections, tagged_photo with hash, photo_insights with hash) is anchored by ANY image_exif row carrying the hash. Disabling a library does NOT delete its image_exif rows, so a hash referenced by a disabled library's row stays anchored — derived data survives. collect_orphan_hashes deliberately doesn't filter image_exif by library.enabled for exactly this reason. No HTTP endpoint. Library mutation is rare-enough infra work that a SQL toggle is fine, and a public mutation endpoint without a role / permission story would be poorly-prioritized exposure for a single-user tool. Documented in CLAUDE.md. 
Tests: 226 pass (225 from Branch C + 1 new all_libraries_online_treats_disabled_as_out_of_scope, which proves that even an explicit Stale entry on a disabled library doesn't block the consensus). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 19:10:24 +00:00
Cameron Cordes	263e27e108	multi-library: handoff + orphan GC with two-tick consensus Branch C of the multi-library data-model rollout. Implements the operational maintenance pipeline pinned in CLAUDE.md → "Multi-library data model" / "Library availability and safety". Branches A and B land first; this branch builds on top. New module: src/library_maintenance.rs Three idempotent passes the watcher runs every tick after the per-library ingest loop: 1. Missing-file scan (per online library) For each Online library, load a paginated page of image_exif rows (IMAGE_EXIF_MISSING_SCAN_PAGE_SIZE, default 500), stat() each one, and delete rows whose source file is NotFound. Permission/IO errors are skipped, never deleted. Capped at IMAGE_EXIF_MISSING_DELETE_CAP_PER_TICK (default 200) per library per tick — so a pathological mount that returns NotFound for everything can't wipe the table in one cycle. Cursor advances across ticks, wraps on partial-page returns, and naturally cycles through the entire library over many minutes. Skipped wholesale for Stale libraries via the existing probe gate. 2. Back-ref refresh (DB-only) For face_detections / tagged_photo / photo_insights: any hash-keyed row whose (library_id, rel_path) no longer matches an image_exif row, but whose content_hash does, is repointed at a surviving image_exif location. Pure SQL with EXISTS guards so rows whose hash is fully orphaned are left alone (the orphan GC handles those). Idempotent; no availability gate needed. This is what makes a recent → archive move invisible to readers: when pass 1 retires the lib-A row, pass 2 pivots tags / faces / insights to lib-B's surviving path before any client notices. 3. Orphan GC (destructive) Hash-keyed derived rows whose content_hash has no image_exif referent are GC-eligible. Two-tick consensus: a hash must be observed orphaned on two consecutive ticks AND every library must be Online for both. 
A single Stale tick within the window cancels all pending deletes (they remain marked but won't be promoted) — they're re-evaluated next tick. The pending set lives in OrphanGcState (in-memory); a watcher restart resets it, which can only delay a delete, never cause one. Hashes that re-appear in image_exif between ticks are "revived" from the pending set (handles transient share unmount / remount). Two new ExifDao methods: - list_rel_paths_for_library_page(library_id, limit, offset) for the paginated missing-file scan. - (count_for_library landed in Branch A.) Watcher wiring (main.rs) Per-library: missing-file scan inside the existing per-library loop, after process_new_files, gated by the same probe check that already protects ingest. After the loop: reconcile (Branch B), back-ref refresh, then run_orphan_gc. The maintenance connection is opened once per tick (image_api::database::connect), used by all three DB-only passes, and dropped at end of tick. CLAUDE.md gains a "Maintenance pipeline" subsection that describes the three passes and their interaction with the existing availability-and-safety policy. Tests: 225 pass (217 from Branch B + 8 new in library_maintenance covering back-ref refresh including the fully-orphaned no-op case, two-tick GC consensus, Stale-tick consensus reset, image_exif re-appearance revival, multi-table delete, and the all_libraries_online helper). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:27:53 +00:00
Cameron Cordes	48cac8c285	multi-library: hash-keyed tagged_photo + photo_insights with reconciliation Branch B of the multi-library data-model rollout. tagged_photo and photo_insights now follow the bytes (content_hash), not the path, matching the policy pinned in CLAUDE.md "Multi-library data model". Branch A's availability probe and EXIF scoping land first; this branch builds on top. Migration (2026-05-01-000000_hash_keyed_derived_data) Adds nullable content_hash columns to tagged_photo and photo_insights, with partial indexes on the non-null subset to keep the index small during the transitional window. The migration backfills from image_exif: * tagged_photo joins on rel_path alone (no library_id available); * photo_insights joins on (library_id, rel_path), unambiguous. Rows whose image_exif hash isn't known yet stay null and the runtime reconciliation pass populates them as the hash backlog drains. Insert-time population TagDao::tag_file looks up image_exif.content_hash by rel_path before inserting; the hash is written into the new column. InsightDao::store_insight does the same scoped to (library_id, rel_path). Caller-supplied hash on InsertPhotoInsight wins; otherwise the DAO does the lookup. Both paths fall back to None if the hash isn't known yet — reconciliation backfills. Reconciliation (database/reconcile.rs) Three idempotent passes the watcher runs once per tick after the per-library backfill loop: 1. tagged_photo NULL hashes → populate from image_exif by rel_path. 2. photo_insights NULL hashes → populate by (library_id, rel_path). 3. photo_insights scalar merge — when multiple is_current rows share a content_hash, keep the earliest generated_at as current; demote the rest. Demoted rows keep their data so /insights/history is unaffected; only the "current" pointer narrows to one per hash. No filesystem dependency, so reconcile doesn't need the availability gate; runs every tick. Logs once when something changed, debug otherwise. 
Tags are set-valued under the policy (union on read, already DISTINCT in queries), so there is no analogous tag-collapse pass — duplicate (tag_id, content_hash) rows across libraries are harmless. Read paths are unchanged in this branch — lookup_tags_batch's existing rel_path-via-hash-sibling expansion still produces the correct merge. A follow-up can simplify reads to use the new column directly for performance. Tests: 217 pass (212 pre-existing + 5 new in reconcile covering NULL-fill, hash-not-yet-known no-op, library scoping on insights, earliest-wins collapse, idempotency). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 14:52:16 +00:00
Cameron Cordes	eea1bf3181	multi-library: availability probe + scoped EXIF queries + collision fixes Branch A of the multi-library data-model rollout. Three threads of correctness/safety work that ship together because the new mount needs all three before it can land: 1. Library availability probe (libraries.rs, state.rs, main.rs) New LibraryHealth (Online | Stale { reason, since }) and a shared LibraryHealthMap on AppState. Probe checks root_path exists + is_dir + readable + non-empty (relative to a "had_data" signal so fresh mounts aren't downgraded). The watcher tick begins with a refresh_health() per library; stale libraries skip ingest, the hash backfill, and face-detection backlog drains for that tick. The orphaned-playlist cleanup also gates on every library being online — a missing source on a stale library is indistinguishable from a transient unmount, and the cleanup is destructive. /libraries now returns each library with its current health state. Logs only on Online↔Stale transitions so a long outage doesn't spam. New ExifDao::count_for_library is the "had_data" signal. 2. EXIF queries scoped by library_id (database/mod.rs, files.rs, main.rs, tags.rs) query_by_exif gains an Option<i32> library filter; /photos and /photos/exif now pass it. Without this, an EXIF-filtered request scoped to ?library=N returned cross-library results because the handler resolved the library but didn't push it through to SQL. get_exif_batch gains the same option. The watcher's per-library ingest, face-candidate build, and content-hash backfill all scope to their library; the union-mode /photos date-sort path and the library-agnostic tag fan-out (lookup_tags_batch, by design) keep using None. 3. Derivative-path collision fixes (content_hash.rs, main.rs) New content_hash::library_scoped_legacy_path helper: <derivative_dir>/<library_id>/<rel_path>. 
Thumbnail generation (startup walk + watcher needs-thumb check) and serving now use it; serving falls back to the bare-legacy mirrored path so pre-multi-library deployments keep working without regeneration. Without this, lib2 with the same rel_path as lib1 would have its thumbnail request short-circuit to lib1's image. Orphaned-playlist cleanup walks every library when checking for the source video (was: BASE_PATH only). Without this, mounting a 2nd library and waiting 24h would delete every playlist whose source lived only in the 2nd library. The HLS playlist write path collision (filename-only basename, not rel_path) is left as a known issue with a TODO at the call site — the actor-pipeline rewrite belongs in Branch B/C. Tests: 212 pass (cargo test --lib). New tests cover the probe states (online / missing root / non-dir / empty-with-prior-data), refresh_health transitions, query_by_exif scoping, get_exif_batch keying on (library_id, rel_path), library_scoped_legacy_path, and count_for_library. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 14:12:49 +00:00
Cameron	44d677528e	tags: add edit + delete endpoints, enable FK enforcement PUT /image/tags/{id} renames a tag globally; DELETE /image/tags/{id} removes a tag and every photo's reference. Rename returns 200/404/409 (case-insensitive name conflict) / 400 (empty name); delete returns 204/404. New migration adds a UNIQUE COLLATE NOCASE index on tags.name with a pre-flight pass that collapses existing case-insensitive duplicates onto the lowest id. The connection setup now sets PRAGMA foreign_keys = ON. The schema already declares ON DELETE CASCADE / SET NULL on several tables — those clauses were documentation-only because SQLite has FK enforcement off per-connection by default. Audited every diesel::delete site; each touches either no inbound FKs or has a matching policy. delete_tag relies on the tagged_photo cascade instead of doing manual cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 20:26:35 -04:00
Cameron Cordes	db9dc63e5e	sqlite: enable WAL + busy_timeout in connect(); 408/413/429 transient The DB connection helper now sets `journal_mode=WAL`, `busy_timeout=5000`, and `synchronous=NORMAL` on every connection. 13+ DAOs each open their own connection through this helper and share one SQLite file — without WAL, a writer's exclusive lock blocks readers and `load_persons` racing the face-watch write storm errored instantly with "database is locked". GPU face inference made this visible by speeding detect ~10× and flooding the writer side. WAL persists in the file once set so the debug binaries that bypass connect() inherit it automatically. Also widen face_client.rs's classifier: 408 / 413 / 429 are now Transient instead of Permanent. These are operator-fixable proxy/infra errors; marking them Permanent poisons every affected photo with status='failed' and requires manual SQL to recover. Specifically, Apollo's nginx defaulted to a 1 MB body cap and silently rejected normal-size photos before they reached the backend — the deferred-and-retry contract is the right behavior for that class of fault. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 18:13:15 +00:00
Cameron Cordes	6a6a4a6a46	tags: batch lookup expands content-hash siblings cross-library The first cut matched by rel_path only — fine for single-library deploys but wrong for multi-library setups where the same content lives under different rel_paths (e.g. a backup mount holding copies of the primary library). A tag applied under library A would silently not appear in the library-B grid badge even though the carousel's per-path /image/tags would resolve it correctly via siblings. The batch handler now does the expansion server-side in three queries regardless of input size: 1. image_exif batch lookup → query path → content_hash 2. image_exif JOIN by content_hash → all sibling rel_paths sharing each hash (paths are deduped across libraries) 3. tagged_photo + tags JOIN over the union of (query + sibling) rel_paths Tags are then aggregated back to query paths via a sibling→originals reverse map, deduped by tag id. Files without a content_hash (just indexed, hash compute pending, etc.) skip step 2 and only get tags from their own rel_path — same fallback the per-path handler uses. Adds ExifDao::get_rel_paths_for_hashes (batch counterpart of get_rel_paths_by_hash) chunked at 500 to stay under SQLite's SQLITE_LIMIT_VARIABLE_NUMBER. Five queries for a 4k-photo grid is still ~800x cheaper than per-path HTTP fan-out.	2026-04-30 00:36:44 +00:00
Cameron Cordes	7303fb8aa3	faces: ignore/junk bucket — DB schema + lazy-create endpoint A single global "Ignored" person row, marked is_ignored=true, that the frontend lazily creates on first use to hold strangers, false detections, and faces the user doesn't want bound to a real person. Schema (new migration 2026-04-29-000200_add_is_ignored): - persons.is_ignored BOOLEAN NOT NULL DEFAULT 0 - Partial index on (is_ignored) WHERE is_ignored = 1; small WHERE set means a tiny index that only ever services the bucket lookup. Why a real persons row instead of a separate table or status enum: - face_detections.person_id stays a clean foreign key — no special code paths for "ignored faces" anywhere else in the schema. - The cluster-suggester already filters by `person_id IS NULL`, so bound-to-ignored faces are naturally excluded from re-clustering without any change. - merge / rename / delete all work on it with the existing routes (the management UI just hides it from default views). DAO additions / changes: - get_or_create_ignored_person (idempotent; race-safe via the UNIQUE COLLATE NOCASE on persons.name + retry-on-409 fallback). - list_persons gains an include_ignored parameter; default false so the management screen hides the bucket unless asked. - find_persons_by_names_ci filters is_ignored=0 in SQL so the auto-bind path can NEVER target the bucket — even if the user happens to tag photos as "Ignored", the heuristic look-up skips it. Bucket assignment is always an explicit operator action. - update_person accepts is_ignored: Option<bool> so a person can be moved into / out of the bucket without a delete + recreate. Routes: - POST /persons/ignore-bucket — returns the bucket, creating it on first call. Frontend uses this lazily right before binding. - GET /persons gains ?include_ignored=true; default behavior unchanged. - PATCH /persons/{id} now accepts is_ignored. 
Tests: ignore_bucket_idempotent_and_filters_auto_bind covers the contract: bucket is idempotent across calls, find_persons_by_names_ci skips it (even on exact name match), default list_persons hides it, include_ignored=true surfaces it. All other tests updated to pass the new is_ignored: false / Option<bool> fields explicitly. cargo test --lib: 181/0; fmt + clippy clean for new code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 22:48:16 +00:00
Cameron Cordes	860169032b	faces: phase 2 — schema + manual face/person CRUD Land the persistence model and HTTP surface for local face recognition. Inference still lives in Apollo (Phase 1); this side adds the data home plus every endpoint Apollo's UI and FileViewer-React will consume. Schema (new migration 2026-04-29-000000_add_faces): - persons: visual identities. Optional entity_id bridges to the existing knowledge-graph entities table; auto-bridging is left to the management UI (we don't muddy LLM provenance from face rows). UNIQUE(name COLLATE NOCASE) so 'alice' / 'Alice' fold to one row. - face_detections: keyed on content_hash (cross-library dedup), with status='detected' carrying bbox + 512-d embedding BLOB, and 'no_faces' / 'failed' marker rows that tell Phase 3's file watcher not to re-scan. Marker invariant enforced via CHECK; partial UNIQUE on content_hash WHERE status='no_faces' guards against double-marks. Schema regenerated with `diesel print-schema` against a clean migration run; joinables added for face_detections → libraries / persons and persons → entities. face_client.rs (sibling of apollo_client.rs): - reqwest multipart, 60 s timeout (CPU inference on a backlog can be slow; bounded threadpool on Apollo serializes calls anyway). - FaceDetectError::{Permanent, Transient, Disabled} — Phase 3 keys its marker-row decision on this. 422 → mark failed, 5xx → defer. - APOLLO_FACE_API_BASE_URL falls back to APOLLO_API_BASE_URL when unset; both unset = is_enabled() false, callers no-op. faces.rs (DAO + handlers): - SqliteFaceDao implements the full FaceDao trait; person face counts go through sql_query because diesel's BoxedSelectStatement + group_by trips trait-resolver recursion. - merge_persons re-points face rows in a transaction, copies notes when target's are empty, deletes src. 
- manual POST /image/faces resolves content_hash through image_exif, crops the user-drawn bbox with 10% padding (detector wants context around ears/jaw), POSTs the crop to face_client.embed for a real ArcFace vector, then inserts source='manual'. - Cluster-suggest (Phase 6) gets its data from GET /faces/embeddings — base64-encoded paged BLOBs so Apollo's DBSCAN can stream them without ImageApi pre-aggregating. Endpoints registered alongside add_*_services in main.rs: GET /faces/stats?library= GET /faces/embeddings?library=&unassigned=&limit=&offset= GET /image/faces?path=&library= POST /image/faces (manual create via embed) PATCH /image/faces/{id} DELETE /image/faces/{id} GET /persons?library= POST /persons GET /persons/{id} PATCH /persons/{id} DELETE /persons/{id}?cascade=set_null|delete (set_null default) POST /persons/{id}/merge GET /persons/{id}/faces?library= The file-watch hook (Phase 3) and the rerun-on-one-photo handler (Phase 6) live behind the FaceDao methods marked dead_code today — they're called only when those phases land. Same shape for the trait methods that aren't reached by Phase 2 routes. Tests: 3 DAO unit tests cover person CRUD + case-insensitive uniqueness, marker-row idempotency (mark_status is a no-op when any row exists), and merge re-pointing faces. Cargo.toml: reqwest gains the `multipart` feature. cargo build / cargo test --lib / cargo fmt / cargo clippy --all-targets all clean for the new code; the two pre-existing test_path_excluder failures and the pre-existing sort_by clippy warnings are unrelated and present on master. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:03:42 +00:00
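The error taxonomy in 860169032b ("422 → mark failed, 5xx → defer", with `Disabled` meaning callers no-op) maps naturally onto two small functions. The sketch below is an assumption about shape only: the real `FaceDetectError` variants may carry payloads, `MarkerDecision` and both function names are hypothetical, and status codes the commit message does not mention are deliberately left unclassified.

```rust
/// Payload-free stand-in for the three-way taxonomy named in the commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FaceDetectError {
    Permanent,
    Transient,
    Disabled,
}

/// Hypothetical: what a watcher might do with a photo after an error.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MarkerDecision {
    /// Write a 'failed' marker row so the photo is not re-scanned.
    MarkFailed,
    /// Leave no marker; try again on a later pass.
    Defer,
    /// Face detection is not configured; do nothing.
    Skip,
}

/// Classifies only the statuses the commit message spells out.
/// Anything else returns None rather than guessing.
pub fn classify_status(status: u16) -> Option<FaceDetectError> {
    match status {
        422 => Some(FaceDetectError::Permanent),
        500..=599 => Some(FaceDetectError::Transient),
        _ => None,
    }
}

/// Maps an error class to the marker-row decision.
pub fn marker_decision(err: FaceDetectError) -> MarkerDecision {
    match err {
        FaceDetectError::Permanent => MarkerDecision::MarkFailed,
        FaceDetectError::Transient => MarkerDecision::Defer,
        FaceDetectError::Disabled => MarkerDecision::Skip,
    }
}
```

Keeping classification separate from the decision means the marker-row policy can change later without touching the HTTP client.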
Cameron	f0ae9f95dc	feat(ai): few-shot exemplars + sticky Ollama preference - Few-shot injection on /insights/generate/agentic: compresses prior training_messages into trajectory blocks (tool calls + result summaries) and injects into the system prompt. Hardcoded default ids with optional request override. - New fewshot_source_ids column on photo_insights (+ migration) to track which exemplars influenced a given row, for downstream training-set filtering. Chat amend rows stamp None with a lineage note. - Ollama client now remembers which server (primary/fallback) most recently succeeded and tries it first on the next call, via a shared Arc<AtomicBool>. Avoids re-404ing the primary on every agent iteration when the chosen model only lives on the fallback. - Demote noisy logs: daily_summary "Summary match" lines to debug; inner chat_with_tools non-2xx body log from error to warn (outer layer owns the terminal-error signal). - Drift-guard tests for summarize_tool_result covering the success / empty / error / unknown shape for every tool. - Tidy: three pre-existing clippy warnings cleaned up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 13:54:06 -04:00
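The sticky server preference in f0ae9f95dc (remember which server last succeeded via a shared `Arc<AtomicBool>` and try it first next time) could look roughly like the following. `StickyPreference`, `Server`, and the closure-based `call` are hypothetical names for illustration; the actual Ollama client is async and HTTP-based, which this synchronous sketch leaves out.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Server {
    Primary,
    Fallback,
}

/// Cloning shares the flag, so every clone sees the latest preference.
#[derive(Clone, Default)]
pub struct StickyPreference {
    prefer_fallback: Arc<AtomicBool>,
}

impl StickyPreference {
    /// Order in which the servers should be tried right now.
    pub fn order(&self) -> [Server; 2] {
        if self.prefer_fallback.load(Ordering::Relaxed) {
            [Server::Fallback, Server::Primary]
        } else {
            [Server::Primary, Server::Fallback]
        }
    }

    fn record_success(&self, server: Server) {
        self.prefer_fallback
            .store(server == Server::Fallback, Ordering::Relaxed);
    }

    /// Tries the preferred server, then the other one. Whichever
    /// succeeds becomes the preferred server for the next call.
    /// If both fail, the second error is returned.
    pub fn call<T, E>(
        &self,
        mut attempt: impl FnMut(Server) -> Result<T, E>,
    ) -> Result<T, E> {
        let [first, second] = self.order();
        match attempt(first) {
            Ok(value) => {
                self.record_success(first);
                Ok(value)
            }
            Err(_) => {
                let value = attempt(second)?;
                self.record_success(second);
                Ok(value)
            }
        }
    }
}
```

After one call in which only the fallback succeeds, `order()` on any clone returns the fallback first, so the next agent iteration skips the failing request to the primary.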
Cameron	0b9528f61e	feat(ai): chat continuation for photo insights (server v1) Adds POST /insights/chat and GET /insights/chat/history. Replays the stored agentic conversation through the same backend the insight was generated with (or a per-turn override), runs a short tool-calling loop, and persists the extended history in append or amend mode. Backend switching: same-backend or hybrid->local replay verbatim; local->hybrid is rejected in v1 (would require on-the-fly vision description rewrite). Per-(library, file) async mutex serialises concurrent turns. Soft context budget drops oldest tool_call+result pairs when the serialized history exceeds num_ctx - 2048 tokens. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 13:00:27 -04:00
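The soft context budget in 0b9528f61e (drop the oldest tool_call+result pairs once the serialized history exceeds `num_ctx - 2048` tokens) might be sketched as below. The `Message` enum, the pairing by `id`, and the four-characters-per-token estimate are all assumptions made for this example; the commit message does not say how tokens are counted or how pairs are matched.

```rust
/// Hypothetical chat history entry; tool calls and results share an id.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    System(String),
    User(String),
    Assistant(String),
    ToolCall { id: u32, body: String },
    ToolResult { id: u32, body: String },
}

impl Message {
    /// Crude estimate (assumption): about four bytes per token.
    fn approx_tokens(&self) -> usize {
        let text = match self {
            Message::System(t) | Message::User(t) | Message::Assistant(t) => t,
            Message::ToolCall { body, .. } | Message::ToolResult { body, .. } => body,
        };
        text.len() / 4 + 1
    }
}

fn total_tokens(history: &[Message]) -> usize {
    history.iter().map(Message::approx_tokens).sum()
}

/// Drops the oldest tool_call+result pair repeatedly until the history
/// fits in num_ctx - 2048 tokens or no tool pairs remain. System, user,
/// and assistant messages are never dropped, so the budget is soft.
pub fn trim_to_budget(history: &mut Vec<Message>, num_ctx: usize) {
    let budget = num_ctx.saturating_sub(2048);
    while total_tokens(history) > budget {
        // Find the id of the oldest remaining tool call.
        let oldest = history.iter().find_map(|m| match m {
            Message::ToolCall { id, .. } => Some(*id),
            _ => None,
        });
        let Some(target) = oldest else { break };
        // Remove both the call and its matching result.
        history.retain(|m| {
            !matches!(
                m,
                Message::ToolCall { id, .. } | Message::ToolResult { id, .. }
                    if *id == target
            )
        });
    }
}
```

Dropping the call and its result together keeps the history well-formed for backends that reject a tool result with no preceding call.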

104 Commits