Commit Graph

3 Commits

Author SHA1 Message Date
Cameron Cordes
db9dc63e5e sqlite: enable WAL + busy_timeout in connect(); 408/413/429 transient
The DB connection helper now sets `journal_mode=WAL`, `busy_timeout=5000`,
and `synchronous=NORMAL` on every connection. 13+ DAOs each open their
own connection through this helper and share one SQLite file — without
WAL, a writer's exclusive lock blocks readers and `load_persons` racing
the face-watch write storm errored instantly with "database is locked".
GPU face inference made this visible by speeding detect ~10× and
flooding the writer side. WAL persists in the file once set so the
debug binaries that bypass connect() inherit it automatically.

Also widen face_client.rs's classifier: 408 / 413 / 429 are now Transient
instead of Permanent. These are operator-fixable proxy/infra errors;
marking them Permanent poisons every affected photo with status='failed'
and requires manual SQL to recover. Specifically, Apollo's nginx
defaulted to a 1 MB body cap and silently rejected normal-size photos
before they reached the backend — the deferred-and-retry contract is
the right behavior for that class of fault.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:13:15 +00:00
Cameron Cordes
f77e44b34d faces: fix PathExcluder false-positive + cover face_client/crop in tests
PathExcluder was iterating every component of the absolute path,
including the system prefix. Two of the existing memories tests had
been failing on master because tempdir() lives under /tmp on Linux
and a pattern like "tmp" then matched the system /tmp component
rather than anything the user actually asked to exclude. Phase 3's
file-watch hook will use the same code to skip @eaDir / .thumbnails
under each library's BASE_PATH, so the bug would hide every photo
on a host whose BASE_PATH passes through a directory named the same
as a user pattern.

Fix: store base in PathExcluder and strip it before scanning
components. A path that lives outside base falls through to the
no-match branch (defensive — nothing legit hits that today).

Also extracted the face_client error classification into a pure
classify_error_response(status, body) so the marker-row contract
with Apollo (422 → Permanent / 'failed', 5xx → Transient / defer)
is unit-testable without spinning up an HTTP server.

New tests:
  memories::tests::test_path_excluder_*           — 2 previously
    failing tests now pass.
  ai::face_client::tests::classify_*              — 4 cases:
    422 decode_failed → Permanent, 503 cuda_oom → Transient
    (handles both string and {code:..} detail shapes), 5xx →
    Transient + other 4xx → Permanent, unparseable HTML body still
    classifies on status.
  faces::tests::crop_*                            — 3 cases:
    invalid bbox rejected, valid bbox round-trips through JPEG
    decode, corner crop with 10% padding clamps inside source.

cargo test --lib: 165 passed / 0 failed (was 156 / 2 failed).
cargo fmt and clippy on new code clean. The remaining
sort_by clippy warnings in pre-existing files (memories.rs,
files.rs, exif.rs) are unrelated and present on master.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:09:44 +00:00
Cameron Cordes
860169032b faces: phase 2 — schema + manual face/person CRUD
Land the persistence model and HTTP surface for local face recognition.
Inference still lives in Apollo (Phase 1); this side adds the data home
plus every endpoint Apollo's UI and FileViewer-React will consume.

Schema (new migration 2026-04-29-000000_add_faces):
  - persons: visual identities. Optional entity_id bridges to the
    existing knowledge-graph entities table; auto-bridging is left to
    the management UI (we don't muddy LLM provenance from face rows).
    UNIQUE(name COLLATE NOCASE) so 'alice' / 'Alice' fold to one row.
  - face_detections: keyed on content_hash (cross-library dedup), with
    status='detected' carrying bbox + 512-d embedding BLOB, and
    'no_faces' / 'failed' marker rows that tell Phase 3's file watcher
    not to re-scan. Marker invariant enforced via CHECK; partial UNIQUE
    on content_hash WHERE status='no_faces' guards against double-marks.

Schema regenerated with `diesel print-schema` against a clean migration
run; joinables added for face_detections → libraries / persons and
persons → entities.

face_client.rs (sibling of apollo_client.rs):
  - reqwest multipart, 60 s timeout (CPU inference on a backlog can be
    slow; bounded threadpool on Apollo serializes calls anyway).
  - FaceDetectError::{Permanent, Transient, Disabled} — Phase 3 keys
    its marker-row decision on this. 422 → mark failed, 5xx → defer.
  - APOLLO_FACE_API_BASE_URL falls back to APOLLO_API_BASE_URL when
    unset; both unset = is_enabled() false, callers no-op.

faces.rs (DAO + handlers):
  - SqliteFaceDao implements the full FaceDao trait; person face counts
    go through sql_query because diesel's BoxedSelectStatement +
    group_by trips trait-resolver recursion.
  - merge_persons re-points face rows in a transaction, copies notes
    when target's are empty, deletes src.
  - manual POST /image/faces resolves content_hash through image_exif,
    crops the user-drawn bbox with 10% padding (detector wants context
    around ears/jaw), POSTs the crop to face_client.embed for a real
    ArcFace vector, then inserts source='manual'.
  - Cluster-suggest (Phase 6) gets its data from
    GET /faces/embeddings — base64-encoded paged BLOBs so Apollo's
    DBSCAN can stream them without ImageApi pre-aggregating.

Endpoints registered alongside add_*_services in main.rs:
  GET    /faces/stats?library=
  GET    /faces/embeddings?library=&unassigned=&limit=&offset=
  GET    /image/faces?path=&library=
  POST   /image/faces                        (manual create via embed)
  PATCH  /image/faces/{id}
  DELETE /image/faces/{id}
  GET    /persons?library=
  POST   /persons
  GET    /persons/{id}
  PATCH  /persons/{id}
  DELETE /persons/{id}?cascade=set_null|delete   (set_null default)
  POST   /persons/{id}/merge
  GET    /persons/{id}/faces?library=

The file-watch hook (Phase 3) and the rerun-on-one-photo handler
(Phase 6) live behind the FaceDao methods marked dead_code today —
they're called only when those phases land. Same shape for the trait
methods that aren't reached by Phase 2 routes.

Tests: 3 DAO unit tests cover person CRUD + case-insensitive uniqueness,
marker-row idempotency (mark_status is a no-op when any row exists),
and merge re-pointing faces.

Cargo.toml: reqwest gains the `multipart` feature.

cargo build / cargo test --lib / cargo fmt / cargo clippy --all-targets
all clean for the new code; the two pre-existing test_path_excluder
failures and the pre-existing sort_by clippy warnings are unrelated and
present on master.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:03:42 +00:00