Commit Graph

68 Commits

Author SHA1 Message Date
Cameron Cordes
ecd49fd053 otel: revert HTTP transport, keep gRPC
The HTTP/protobuf exporter never sent any traffic in prod (tcpdump
on port 4318 showed nothing) despite the receiver path being correct
and the bridge wiring being intact (logs reached journalctl via the
stdout exporter). Likely the BatchLogProcessor + reqwest-client combo
isn't getting the right runtime context, but debugging that on a live
deployment isn't worth holding up the rest of the speedups.

Restoring grpc-tonic transport so prod observability comes back. The
remaining build-time wins on this branch (mold linker, system sqlite3,
profile.dev tweaks, lockfile-only dep refresh) deliver most of the
original savings without touching telemetry. Operator: revert
OTLP_OTLS_ENDPOINT in prod from port 4318 back to 4317.

HTTP transport remains a viable follow-up — needs to be debugged
against a local SigNoz instance with internal SDK error visibility
enabled, on its own branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 18:33:37 -04:00
Cameron Cordes
f73db58771 build: speed up debug compile loop
- Drop libsqlite3-sys 'bundled' on Linux/macOS so the SQLite C source
  isn't recompiled every clean build; Windows keeps 'bundled' via a
  cfg(windows) target override.
- Switch opentelemetry-otlp from grpc-tonic to http-proto + reqwest-client.
  Removes the tonic + h2 + hyper-h2 stack from the build graph; reqwest
  was already a dependency. Updates otel.rs to call .with_http().
- Add [profile.dev] debug = "line-tables-only" to shrink linker work
  while keeping panics/backtraces useful.
- Add .cargo/config.toml selecting mold via gcc on x86_64-linux-gnu.
  Requires `apt install mold`. Other platforms use the default linker.
- cargo update: lockfile-only refresh of all minor/patch bumps within
  existing version constraints.

Cold debug build: ~1m 37s; touch-one-file rebuild: ~5s on Linux.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 17:36:42 -04:00
Cameron Cordes
7584cd8792 duplicates: perceptual hash + soft-mark resolution + upload 409
Adds pHash + dHash columns alongside the existing blake3 content_hash so
near-duplicates (re-encoded, resized, format-converted copies) become
queryable. /duplicates/{exact,perceptual} return groups; /duplicates/
{resolve,unresolve} flip a duplicate_of_hash soft-mark on losing rows
and union perceptual-only tag sets onto the survivor. The default
/photos listing filters duplicate_of_hash IS NULL so demoted siblings
stop cluttering the grid; include_duplicates=true opts back in for
Apollo's review modal. Upload now hashes bytes pre-write and returns
409 with the canonical sibling when a file's bytes already exist.
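
The message does not say how the perceptual hashes are computed or compared. As a rough illustration only, here is a conventional dHash over an already-downscaled 9x8 grayscale grid, with a Hamming-distance comparison. The function names and the distance threshold are hypothetical and not taken from the repository.

```rust
/// Conventional 64-bit difference hash (dHash) over a 9x8 grayscale grid
/// (8 rows of 9 pixels). Each bit records whether a pixel is brighter than
/// its right-hand neighbour. Downscaling to 9x8 is assumed to have happened
/// before this call.
pub fn dhash_9x8(gray: &[[u8; 9]; 8]) -> u64 {
    let mut hash = 0u64;
    for row in gray.iter() {
        for x in 0..8 {
            hash = (hash << 1) | u64::from(row[x] > row[x + 1]);
        }
    }
    hash
}

/// Number of differing bits between two hashes.
pub fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Hypothetical grouping rule: two images are near-duplicates when their
/// hashes differ in at most `max_distance` bits.
pub fn is_near_duplicate(a: u64, b: u64, max_distance: u32) -> bool {
    hamming(a, b) <= max_distance
}
```

A small distance survives re-encoding and resizing far better than an exact blake3 match, which is presumably why the two kinds of hash sit side by side.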

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:36:01 -04:00
Cameron Cordes
41f93d70d1 faces: tighten bootstrap candidate filter, bump to 1.1.0
Filter <3-char tags and emoji/symbol-bearing tags out of the bootstrap
candidate list before grouping. Manual testing surfaced these as noise
the operator never ticks; they pushed real candidates lower in the
list and made the UI harder to scan. This is a hard filter (drop from
candidates entirely), not a heuristic flag — looks_like_person still
governs the default-checked decision for the rows that *do* survive.

is_plausible_name_token rules:
  - >= 3 chars after trimming (rejects "AB", "OK", whitespace-only)
  - Each char is alphabetic (any script — covers Renée, José, 田中太郎),
    whitespace, name-punctuation (' - . _ U+2019), or ASCII digit
  - Anything else (emoji, symbols, math, arrows, control codes) drops
    the whole tag

Digits stay allowed at this layer; looks_like_person handles "Trip 2018"
on the heuristic side. Lets a "Sarah2" alias still appear so the
operator can spot and confirm it manually, just unticked by default.
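
The rules above can be sketched as a single predicate. This is a re-implementation from the message text, not the code in faces.rs; in particular, counting length in Unicode scalar values (rather than bytes) is an assumption.

```rust
/// Sketch of the hard filter described above. Assumption: length is
/// measured in chars (Unicode scalar values) after trimming.
pub fn is_plausible_name_token(tag: &str) -> bool {
    let trimmed = tag.trim();
    if trimmed.chars().count() < 3 {
        return false; // rejects "AB", "OK", whitespace-only
    }
    trimmed.chars().all(|c| {
        c.is_alphabetic()          // any script
            || c.is_whitespace()
            || c.is_ascii_digit()  // digits are allowed at this layer
            || matches!(c, '\'' | '-' | '.' | '_' | '\u{2019}')
    })
}
```

Any single character outside the accept-list (emoji, symbols, arrows, control codes) makes `all` return false, which drops the whole tag rather than flagging it.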

Cargo version bump 1.0.0 → 1.1.0 marks the face-recog feature surface
landing — Phase 2's schema + endpoints, Phase 3's file-watch hook, and
Phase 4's bootstrap + auto-bind are all behind APOLLO_FACE_API_BASE_URL,
so legacy 1.0 deploys without that env see no behavior change.

Tests: 1 new (faces::tests::is_plausible_name_token_filters_short_and_emoji)
covers the accept-list (Latin/accented/Asian scripts, hyphenated and
apostrophe names) and the reject-list (length floor, emoji classes,
symbols, leading/trailing whitespace handling).

cargo test --lib: 180 / 0; fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 19:05:04 +00:00
Cameron Cordes
860169032b faces: phase 2 — schema + manual face/person CRUD
Land the persistence model and HTTP surface for local face recognition.
Inference still lives in Apollo (Phase 1); this side adds the data home
plus every endpoint Apollo's UI and FileViewer-React will consume.

Schema (new migration 2026-04-29-000000_add_faces):
  - persons: visual identities. Optional entity_id bridges to the
    existing knowledge-graph entities table; auto-bridging is left to
    the management UI (we don't muddy LLM provenance from face rows).
    UNIQUE(name COLLATE NOCASE) so 'alice' / 'Alice' fold to one row.
  - face_detections: keyed on content_hash (cross-library dedup), with
    status='detected' carrying bbox + 512-d embedding BLOB, and
    'no_faces' / 'failed' marker rows that tell Phase 3's file watcher
    not to re-scan. Marker invariant enforced via CHECK; partial UNIQUE
    on content_hash WHERE status='no_faces' guards against double-marks.

Schema regenerated with `diesel print-schema` against a clean migration
run; joinables added for face_detections → libraries / persons and
persons → entities.

face_client.rs (sibling of apollo_client.rs):
  - reqwest multipart, 60 s timeout (CPU inference on a backlog can be
    slow; bounded threadpool on Apollo serializes calls anyway).
  - FaceDetectError::{Permanent, Transient, Disabled} — Phase 3 keys
    its marker-row decision on this. 422 → mark failed, 5xx → defer.
  - APOLLO_FACE_API_BASE_URL falls back to APOLLO_API_BASE_URL when
    unset; both unset = is_enabled() false, callers no-op.
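
A minimal sketch of the error classification and the base-URL fallback described above. Only the 422 and 5xx mappings come from the message; treating every other non-2xx status as transient, the `MarkerDecision` type, and both function names are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FaceDetectError {
    Permanent,
    Transient,
    Disabled,
}

/// Hypothetical view of what Phase 3 might do with each error.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MarkerDecision {
    MarkFailed, // write a 'failed' marker row, never re-scan
    Defer,      // leave unmarked so a later scan retries
    NoOp,       // face detection is not configured
}

/// 422 -> Permanent and 5xx -> Transient follow the message.
/// Assumption: any other non-2xx status is also treated as Transient.
pub fn classify_status(status: u16) -> Option<FaceDetectError> {
    match status {
        200..=299 => None,
        422 => Some(FaceDetectError::Permanent),
        _ => Some(FaceDetectError::Transient),
    }
}

pub fn marker_decision(err: FaceDetectError) -> MarkerDecision {
    match err {
        FaceDetectError::Permanent => MarkerDecision::MarkFailed,
        FaceDetectError::Transient => MarkerDecision::Defer,
        FaceDetectError::Disabled => MarkerDecision::NoOp,
    }
}

/// The face-specific URL wins; otherwise fall back to the general Apollo
/// URL. `None` means both are unset, i.e. the client is disabled.
pub fn resolve_base_url(face_url: Option<&str>, apollo_url: Option<&str>) -> Option<String> {
    face_url.or(apollo_url).map(str::to_owned)
}
```

In a real client the two arguments would come from `std::env::var("APOLLO_FACE_API_BASE_URL").ok()` and `std::env::var("APOLLO_API_BASE_URL").ok()`; they are passed in here so the logic stays testable.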

faces.rs (DAO + handlers):
  - SqliteFaceDao implements the full FaceDao trait; person face counts
    go through sql_query because diesel's BoxedSelectStatement +
    group_by trips trait-resolver recursion.
  - merge_persons re-points face rows in a transaction, copies notes
    when target's are empty, deletes src.
  - manual POST /image/faces resolves content_hash through image_exif,
    crops the user-drawn bbox with 10% padding (detector wants context
    around ears/jaw), POSTs the crop to face_client.embed for a real
    ArcFace vector, then inserts source='manual'.
  - Cluster-suggest (Phase 6) gets its data from
    GET /faces/embeddings — base64-encoded paged BLOBs so Apollo's
    DBSCAN can stream them without ImageApi pre-aggregating.
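
The 10% padding step for manual face boxes could look roughly like this. It assumes the padding is 10% of the box's own width and height on each side and that the result is clamped to the image bounds; the message does not spell out either detail, and `Rect` and `pad_bbox` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
    pub x: u32,
    pub y: u32,
    pub w: u32,
    pub h: u32,
}

/// Grow `bbox` by `pad_frac` of its width/height on every side, then clamp
/// to an image of `img_w` x `img_h` pixels. Assumed semantics, see lead-in.
pub fn pad_bbox(bbox: Rect, img_w: u32, img_h: u32, pad_frac: f32) -> Rect {
    let pad_x = (bbox.w as f32 * pad_frac).round() as u32;
    let pad_y = (bbox.h as f32 * pad_frac).round() as u32;
    let x0 = bbox.x.saturating_sub(pad_x);
    let y0 = bbox.y.saturating_sub(pad_y);
    let x1 = bbox.x.saturating_add(bbox.w).saturating_add(pad_x).min(img_w);
    let y1 = bbox.y.saturating_add(bbox.h).saturating_add(pad_y).min(img_h);
    Rect {
        x: x0,
        y: y0,
        w: x1.saturating_sub(x0),
        h: y1.saturating_sub(y0),
    }
}
```

For example, a 200x100 box at (100, 100) in a 1000x1000 image becomes a 240x120 box at (80, 90) with `pad_frac = 0.10`.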

Endpoints registered alongside add_*_services in main.rs:
  GET    /faces/stats?library=
  GET    /faces/embeddings?library=&unassigned=&limit=&offset=
  GET    /image/faces?path=&library=
  POST   /image/faces                        (manual create via embed)
  PATCH  /image/faces/{id}
  DELETE /image/faces/{id}
  GET    /persons?library=
  POST   /persons
  GET    /persons/{id}
  PATCH  /persons/{id}
  DELETE /persons/{id}?cascade=set_null|delete   (set_null default)
  POST   /persons/{id}/merge
  GET    /persons/{id}/faces?library=

The file-watch hook (Phase 3) and the rerun-on-one-photo handler
(Phase 6) live behind the FaceDao methods marked dead_code today —
they're called only when those phases land. Same shape for the trait
methods that aren't reached by Phase 2 routes.

Tests: 3 DAO unit tests cover person CRUD + case-insensitive uniqueness,
marker-row idempotency (mark_status is a no-op when any row exists),
and merge re-pointing faces.

Cargo.toml: reqwest gains the `multipart` feature.

cargo build / cargo test --lib / cargo fmt / cargo clippy --all-targets
all clean for the new code; the two pre-existing test_path_excluder
failures and the pre-existing sort_by clippy warnings are unrelated and
present on master.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 18:03:42 +00:00
Cameron
b9d5578653 feat(bins): multi-library populate_knowledge + progress UX
populate_knowledge now loads real libraries from the DB instead of
fabricating a single library_id=1 row from BASE_PATH. Adds --library
<id|name> to restrict the walk and validates --path against the selected
library roots. The full library set is still passed to InsightGenerator so
resolve_full_path can probe every root when an insight resolves to a
different library than the one being walked.
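
A sketch of how a `--library <id|name>` selector and the `--path` check might be resolved. The `Library` struct, both function names, and the "try id first, then name" order are assumptions, not the binary's actual code.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Library {
    pub id: i32,
    pub name: String,
    pub root: PathBuf,
}

/// Resolve a selector that may be a numeric id or a library name.
/// Assumption: a numeric selector is tried as an id first, then as a name.
pub fn select_library<'a>(libs: &'a [Library], selector: &str) -> Option<&'a Library> {
    let by_id = selector
        .parse::<i32>()
        .ok()
        .and_then(|id| libs.iter().find(|l| l.id == id));
    by_id.or_else(|| libs.iter().find(|l| l.name == selector))
}

/// Validate that `path` lies under the selected library's root.
/// `Path::starts_with` compares whole components, so "/photos/main2"
/// is not considered inside "/photos/main".
pub fn path_in_library(lib: &Library, path: &Path) -> bool {
    path.starts_with(&lib.root)
}
```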

Adds indicatif progress bars across the long-running utility binaries via
a shared src/bin_progress.rs helper (determinate bar + open-ended spinner
with consistent styling). Per-batch info! noise is replaced by the bar's
throughput/ETA; warnings and errors route through pb.println so they
scroll above the bar instead of fighting with it.

  populate_knowledge   spinner during scan, determinate bar over all libs
  backfill_hashes      spinner with running hashed/missing/errors counts
  import_calendar      determinate bar; embedding/store failures inline
  import_location_*    determinate bar advancing by chunk size
  import_search_*      determinate bar; pb cloned into the spawn task
  cleanup_files P1     determinate bar over DB paths
  cleanup_files P2     determinate bar; pb.suspend() around y/n/a/s prompt

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:55:33 -04:00
Cameron
13b9d54861 fix(scan): quiet startup scans & thumbnail RAW/HEIC
Three recurring issues on every full scan:

1. Video playlist scans re-enqueued every file only to reject it as
   AlreadyExists. Pre-filter in ScanDirectoryMessage and QueueVideosMessage
   so we skip videos whose .m3u8 already exists, and demote the leaked
   AlreadyExists log to debug.

2. image crate was built with only jpeg/png features, so webp/tiff/avif
   files logged "format not supported" every scan. Enable those features.

3. RAW (ARW/NEF/CR2/...) and HEIC thumbnails weren't generated, so the
   scan kept retrying them. Try the file's embedded JPEG preview via
   kamadak-exif first (fast, pure-Rust, works on Sony ARW where ffmpeg's
   TIFF decoder fails). Fall back to ffmpeg for HEIC/HEIF and RAWs with
   no preview. Anything still undecodable gets a <thumb>.unsupported
   sentinel so future scans skip it.
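
The fallback order in item 3 can be sketched with the two decoders passed in as closures, so the example stays independent of kamadak-exif and ffmpeg. All names here are hypothetical, the sentinel path is assumed to be the thumbnail path with `.unsupported` appended, and the sketch only computes that path rather than writing the file.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq, Eq)]
pub enum ThumbOutcome {
    EmbeddedPreview(Vec<u8>),
    Ffmpeg(Vec<u8>),
    /// Path of the sentinel the caller should create.
    Unsupported(PathBuf),
}

/// Assumed naming: `<thumb>.unsupported` is the thumbnail path with the
/// literal suffix appended.
pub fn unsupported_sentinel(thumb: &Path) -> PathBuf {
    let mut name = thumb.as_os_str().to_os_string();
    name.push(".unsupported");
    PathBuf::from(name)
}

/// Try the embedded JPEG preview first, then ffmpeg, then give up and
/// report the sentinel path. `FnOnce` keeps the ffmpeg call lazy: it never
/// runs when the embedded preview succeeds.
pub fn thumbnail_with_fallback(
    thumb: &Path,
    embedded_preview: impl FnOnce() -> Option<Vec<u8>>,
    ffmpeg: impl FnOnce() -> Option<Vec<u8>>,
) -> ThumbOutcome {
    if let Some(jpeg) = embedded_preview() {
        return ThumbOutcome::EmbeddedPreview(jpeg);
    }
    if let Some(jpeg) = ffmpeg() {
        return ThumbOutcome::Ffmpeg(jpeg);
    }
    ThumbOutcome::Unsupported(unsupported_sentinel(thumb))
}
```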

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:47:13 -04:00
Cameron
079cd4c5b9 feat(ai): streaming chat endpoint with live tool events
Add LlmClient::chat_with_tools_stream and SSE endpoint
POST /insights/chat/stream that emits text deltas, tool_call /
tool_result pairs, truncated notice, and a terminal done frame as the
agentic loop runs.

- Ollama: parses NDJSON from /api/chat stream, accumulates content
  deltas, emits Done with tool_calls from the final chunk.
- OpenRouter: parses OpenAI-compatible SSE, reassembles tool_call
  argument deltas by index, asks for stream_options.include_usage.
- InsightChatService spawns the loop on a tokio task, feeds events
  through an mpsc channel, persists training_messages at the end.
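
Reassembling tool_call argument deltas by index, as the OpenRouter bullet describes, might look like the sketch below. OpenAI-compatible streams send a call's name once and its JSON arguments as string fragments tagged with an `index`. The type and method names are hypothetical and the real wire types in llm_client.rs may differ.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ToolCall {
    pub name: String,
    pub arguments: String, // JSON text, complete only after the stream ends
}

/// One streamed fragment of a tool call (simplified, hypothetical shape).
#[derive(Debug, Clone)]
pub struct ToolCallDelta {
    pub index: usize,
    pub name: Option<String>,
    pub arguments: Option<String>,
}

#[derive(Default)]
pub struct ToolCallAssembler {
    calls: BTreeMap<usize, ToolCall>,
}

impl ToolCallAssembler {
    /// Merge a fragment into the call at its index, creating it if needed.
    pub fn push(&mut self, delta: ToolCallDelta) {
        let call = self.calls.entry(delta.index).or_default();
        if let Some(name) = delta.name {
            call.name = name;
        }
        if let Some(fragment) = delta.arguments {
            call.arguments.push_str(&fragment);
        }
    }

    /// Finished calls in index order (BTreeMap iterates by key).
    pub fn finish(self) -> Vec<ToolCall> {
        self.calls.into_values().collect()
    }
}
```

Keying on the index rather than arrival order matters because fragments for parallel tool calls can interleave within one stream.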

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:57:41 -04:00
Cameron
0073409b3d refactor: introduce LlmClient trait (no-op)
Preparation for a second LLM backend (OpenRouter) and hybrid vision-local /
chat-remote mode. Shared wire types (ChatMessage, Tool, ToolCall, etc.) move
into a new src/ai/llm_client.rs and are re-exported from ollama.rs so
existing imports keep working. OllamaClient now implements LlmClient.

No behavior change; callers still hold the concrete OllamaClient. Caller
migration to Arc<dyn LlmClient> is deferred to the PR that wires hybrid
backend routing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 22:11:05 -04:00
Cameron
39c212b0e6 Bump to 1.0.0 for multi-library support 2026-04-21 01:55:07 +00:00
Cameron
0aaea91cc2 feat: add content_hash backfill + register every media file
Adds blake3 content hashing as the basis for derivative dedup
(thumbnails, HLS) across libraries. Computed inline by the watcher on
ingest and by a new `backfill_hashes` binary for historical rows.

Key changes:
- `content_hash` and `size_bytes` are now populated on new image_exif
  rows; a new ExifDao surface (`get_rows_missing_hash`,
  `backfill_content_hash`, `find_by_content_hash`) supports backfill and
  future hash-keyed lookups.
- The watcher now registers every image/video in image_exif, not just
  files with parseable EXIF. EXIF becomes optional enrichment; videos
  and other non-EXIF files still get a hashed row. This also makes
  DB-indexed sort/filter cover the full library.
- `/image` thumbnail serve dual-looks up hash-keyed path first, then
  falls back to the legacy mirrored layout.
- Upload flow accepts `?library=` query param + hashes uploaded files.
- Store_exif logs the underlying Diesel error on insert failure so
  constraint violations surface instead of hiding behind a generic
  InsertError.
- New migration normalizes rel_path separators to forward slash across
  all tables, deduplicating any rows that collide after normalization.
  Fixes spurious UNIQUE violations from mixed backslash/forward-slash
  paths on Windows ingest.
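
The separator normalization in the last bullet is done by a migration, presumably in SQL. The same idea in Rust, purely as an illustration with hypothetical function names, looks like this:

```rust
use std::collections::BTreeSet;

/// Convert Windows-style separators to forward slashes.
pub fn normalize_rel_path(p: &str) -> String {
    p.replace('\\', "/")
}

/// Normalize every path and keep only the first row of each group that
/// collides after normalization, preserving input order.
pub fn normalize_and_dedup<'a>(paths: impl IntoIterator<Item = &'a str>) -> Vec<String> {
    let mut seen = BTreeSet::new();
    let mut out = Vec::new();
    for p in paths {
        let normalized = normalize_rel_path(p);
        if seen.insert(normalized.clone()) {
            out.push(normalized);
        }
    }
    out
}
```

Two rows such as `2018\trip\a.jpg` and `2018/trip/a.jpg` collapse to one, which is the collision the migration has to resolve before a UNIQUE constraint on rel_path can hold.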

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:55:07 +00:00
Cameron
a6cc64ece0 Bump to version 0.5.2 2026-01-26 20:05:42 -05:00
Cameron Cordes
85e6567674 Bump to 0.5.1 2026-01-18 19:17:53 -05:00
Cameron
b2cc617bc2 Pass image as additional Insight context 2026-01-10 11:30:01 -05:00
Cameron
d86b2c3746 Add Google Takeout data import infrastructure
Implements Phase 1 & 2 of Google Takeout RAG integration:
- Database migrations for calendar_events, location_history, search_history
- DAO implementations with hybrid time + semantic search
- Parsers for .ics, JSON, and HTML Google Takeout formats
- Import utilities with batch insert optimization

Features:
- CalendarEventDao: Hybrid time-range + semantic search for events
- LocationHistoryDao: GPS proximity with Haversine distance calculation
- SearchHistoryDao: Semantic-first search (queries are embedding-rich)
- Batch inserts for performance (1M+ records in minutes vs hours)
- OpenTelemetry tracing for all database operations
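
For reference, the Haversine distance mentioned for LocationHistoryDao can be computed as below. This is the standard formula with a mean Earth radius of 6371 km; the function names and the radius filter are illustrative and not the DAO's actual code.

```rust
const EARTH_RADIUS_KM: f64 = 6371.0;

/// Great-circle distance in kilometres between two points given in
/// decimal degrees, using the Haversine formula.
pub fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    // Clamp guards against rounding pushing sqrt(a) slightly above 1.
    2.0 * EARTH_RADIUS_KM * a.sqrt().min(1.0).asin()
}

/// Hypothetical proximity filter: is `point` within `radius_km` of `center`?
pub fn within_radius_km(center: (f64, f64), point: (f64, f64), radius_km: f64) -> bool {
    haversine_km(center.0, center.1, point.0, point.1) <= radius_km
}
```

One degree of longitude along the equator comes out at roughly 111.2 km with this radius, which is a convenient sanity check.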

Import utilities:
- import_calendar: Parse .ics with optional embedding generation
- import_location_history: High-volume GPS data with batch inserts
- import_search_history: Always generates embeddings for semantic search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-05 14:50:49 -05:00
Cameron
11e725c443 Enhanced Insights with daily summary embeddings
Bump to 0.5.0. Added daily summary generation job
2026-01-05 09:13:16 -05:00
Cameron
43b7c2b8ec Remove dialoguer dependency 2026-01-03 20:32:00 -05:00
Cameron
1171f19845 Create Insight Generation Feature
Added integration with Messages API and Ollama
2026-01-03 10:30:37 -05:00
Cameron
2d915518e2 Bump to 0.4.1 2025-12-29 19:51:21 -05:00
Cameron
2c52cffd65 Implement critical security improvements for authentication
This commit addresses several security vulnerabilities in the authentication
and authorization system:

1. JWT Encoding Panic Fix (Critical)
   - Replace .unwrap() with proper error handling in JWT token generation
   - Prevents server crashes from encoding failures
   - Returns HTTP 500 with error logging instead of panicking

2. Rate Limiting for Login Endpoint (Critical)
   - Add actix-governor dependency (v0.5)
   - Configure rate limiter: 2 requests/sec with burst of 5
   - Protects against brute-force authentication attacks

3. Strengthen Password Requirements
   - Minimum length increased from 6 to 12 characters
   - Require uppercase, lowercase, numeric, and special characters
   - Add comprehensive validation with clear error messages

4. Fix Token Parsing Vulnerability
   - Replace unsafe split().last().unwrap_or() pattern
   - Use strip_prefix() for proper Bearer token validation
   - Return InvalidToken error for malformed Authorization headers

5. Improve Authentication Logging
   - Sanitize error messages to avoid leaking user existence
   - Change from "User not found or incorrect password" to "Failed login attempt"

All changes tested and verified with existing test suite (65/65 tests passing).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-26 23:53:54 -05:00
Cameron
f0d482af12 Optimize release build times with thin LTO
Switch from fat LTO to thin LTO for faster release builds while maintaining similar performance characteristics.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-24 10:21:47 -05:00
Cameron
47d3ad7222 Add polling-based file watching
Remove notify and update otel creates
2025-12-22 22:54:19 -05:00
Cameron
aaf9cc64be Add Cleanup binary for fixing broken DB/file relations 2025-12-18 16:02:15 -05:00
Cameron
52e1ced2a2 Improved image caching and CORS handling 2025-12-17 22:36:03 -05:00
Cameron
445b82b21a Bump to 0.4.0 2025-12-17 22:17:54 -05:00
Cameron
4082f1fdb8 Add Exif storing and update to Metadata endpoint 2025-12-17 16:55:48 -05:00
Cameron
f02a858368 Bump to 0.3.1 and format/clippy 2025-12-01 13:04:55 -05:00
Cameron
273b877e16 Update to Rust 2024 edition
Formatted code.
2025-09-01 13:36:27 -04:00
Cameron
544256f658 Bump to 0.3.0 2025-08-15 23:22:05 -04:00
Cameron
8114204485 Update Otel 2025-08-15 23:18:53 -04:00
Cameron
85093ff0c7 Add parsing date from filename for memories 2025-08-12 20:55:22 -04:00
Cameron
e5afdd909b Serve video gifs when requested 2025-07-02 15:48:49 -04:00
Cameron
24d2123fc2 Fix recursive-any tag counting
This is bad security-wise, so it'll need another pass.
2025-05-18 19:57:16 -04:00
Cameron
d6451ee782 Add Simple OpenTelemetry setup 2025-05-06 20:15:03 -04:00
Cameron
04a7cb417f Bump app version to 0.2.0 2024-12-05 20:30:45 -05:00
Cameron
18ba5796b0 Update to Rust 2021
Fix tests
2024-12-05 20:27:01 -05:00
Cameron
0419aa2323 Scan and generate Video HLS playlists on startup
Refactored and improved video path state. Bumped versions of some dependencies.
2024-12-05 20:19:03 -05:00
Cameron
6986540295 Add sorting shuffle, and name asc/desc 2024-11-23 19:13:25 -05:00
Cameron
287a61ae3f Update dependencies, improve startup logging 2024-11-23 12:14:12 -05:00
Cameron Cordes
17012fc447 Merge branch 'master' into feature/include-tag-counts 2024-01-17 22:47:46 -05:00
Cameron Cordes
5bbc775d3a Update to Watcher 6
Improve upload performance by relying on the file watcher instead of
synchronously creating thumbnails before responding to the client.
2024-01-17 22:25:18 -05:00
Cameron Cordes
7e11448ada Update dependencies 2023-12-02 14:23:51 -05:00
Cameron Cordes
68bfcbf85f Update and Migrate Diesel to 2.0
Almost have tag support working; still figuring out how to get photo
tags.
2023-03-18 14:43:41 -04:00
Cameron Cordes
c8cae28c9f Merge branch 'master' into feature/tagging 2022-03-17 21:53:17 -04:00
Cameron Cordes
69fe307516 Update to Actix 4
Some checks failed
Core Repos/ImageApi/pipeline/pr-master There was a failure building this commit
2022-03-01 20:38:41 -05:00
Cameron Cordes
2d6db6d059 Update dependencies 2021-10-11 21:52:06 -04:00
Cameron Cordes
2c50b4ae2f Add anyhow, Improve Auth token code
Moved test helper code to its own module.
2021-10-07 20:32:36 -04:00
Cameron Cordes
0e972509aa Update dependencies
All checks were successful
Core Repos/ImageApi/pipeline/pr-master This commit looks good
2021-07-08 16:53:50 -04:00
Cameron Cordes
a79179c5c3 Add Image and Video total gauges 2021-04-30 23:53:10 -04:00
Cameron Cordes
6abc99d9b6 Add PrometheusMetrics 2021-04-05 20:14:34 -04:00