Commit Graph

523 Commits

Author SHA1 Message Date
Cameron Cordes
a079065ae9 faces: add person_id filter to /faces/embeddings; remove tag-bootstrap
Pairs with the Apollo FACES-tab change. The new
POST /api/persons/{id}/similar-unassigned route on Apollo needs to
fetch one person's embeddings cheaply to compute the centroid;
adding a person_id query param to /faces/embeddings keeps that to a
single round-trip instead of paging the whole detected set
client-side. When both person_id and unassigned=true are supplied,
person_id wins (the explicit filter is the more specific intent).
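A minimal sketch of the precedence rule above. The `EmbeddingsQuery` and `EmbeddingFilter` names are hypothetical; the real handler's types are not shown in this log.

```rust
/// Hypothetical query params for `/faces/embeddings` (names are assumptions).
#[derive(Debug, Default)]
pub struct EmbeddingsQuery {
    pub person_id: Option<i64>,
    pub unassigned: bool,
}

/// Which subset of detected faces the handler should return.
#[derive(Debug, PartialEq, Eq)]
pub enum EmbeddingFilter {
    Person(i64),
    Unassigned,
    All,
}

impl EmbeddingsQuery {
    /// An explicit person_id is the more specific intent, so it wins even
    /// when `unassigned=true` is also supplied.
    pub fn resolve(&self) -> EmbeddingFilter {
        match (self.person_id, self.unassigned) {
            (Some(id), _) => EmbeddingFilter::Person(id),
            (None, true) => EmbeddingFilter::Unassigned,
            (None, false) => EmbeddingFilter::All,
        }
    }
}
```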

Tag-bootstrap removal: bootstrap_candidates_handler,
bootstrap_persons_handler, /persons/bootstrap and
/tags/people-bootstrap-candidates route registrations, and the
heuristic helpers (is_plausible_name_token, looks_like_person) plus
their tests. Only Apollo called these; the migration is complete.
The persons.created_from_tag column stays; it's informational on
existing rows, and removing it would be a destructive migration for
no benefit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 11:30:37 -04:00
25233904aa Merge pull request 'personas: elevate to server with per-persona fact scoping' (#88) from feature/persona-knowledge-segmentation into master
Reviewed-on: #88
2026-05-10 03:44:26 +00:00
8c377324a1 Merge pull request 'video: handle unknown/short durations in thumb + preview gen' (#87) from fix/video-thumb-preview-edge-cases into master
Reviewed-on: #87
2026-05-10 03:12:58 +00:00
Cameron Cordes
5476ed8ac4 video: handle unknown/short durations in thumb + preview gen
`get_duration_seconds` now returns `Option<f64>` and falls back from
`format=duration` to `stream=duration`. Empty stdout no longer
parse-panics with "cannot parse float from empty string", which was
poisoning the preview-clip row with status=failed and re-queueing every
full scan (notably for GoPro LRV files). `generate_preview_clip` handles
the unknown-duration case by transcoding the whole file (capped at 10s).

`generate_video_thumbnail` seeks to ~50% of the probed duration instead
of a hardcoded `-ss 3`, with a first-frame fallback when the probe
returns nothing. Fixes the loop where short Snapchat clips (<3s) got
"missing thumbnail" logged on every scan because ffmpeg exited 0
without writing a frame, and never wrote the .unsupported sentinel
either.
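A sketch of the thumbnail seek described above, assuming the probed duration arrives as `Option<f64>`. The function name and argument layout are hypothetical; only `-ss`, `-i` and `-vframes` are real ffmpeg options.

```rust
/// Hypothetical helper: ffmpeg input args for a video thumbnail.
/// Seeks to ~50% of a usable duration; otherwise omits `-ss` so ffmpeg
/// decodes the first frame.
pub fn thumbnail_input_args(input: &str, duration: Option<f64>) -> Vec<String> {
    let mut args = Vec::new();
    if let Some(d) = duration.filter(|d| d.is_finite() && *d > 0.0) {
        // 50% of the duration stays inside the clip even when it is under 3s.
        args.push("-ss".to_string());
        args.push(format!("{:.3}", d * 0.5));
    }
    args.push("-i".to_string());
    args.push(input.to_string());
    args.push("-vframes".to_string());
    args.push("1".to_string());
    args
}
```

With a 2-second clip this seeks to `1.000`; with no probe result the args reduce to a plain first-frame grab.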

Adds unit tests for `parse_ffprobe_duration` covering the empty-output,
N/A, multi-line, non-positive, and non-finite cases.
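A sketch of `parse_ffprobe_duration` covering the listed cases, plus the format-then-stream fallback. The signature is an assumption, as is the input shape (one value per line, as printed by `ffprobe -show_entries ... -of default=noprint_wrappers=1:nokey=1`). `duration_with_fallback` is hypothetical.

```rust
/// Returns the first positive, finite duration in ffprobe's stdout.
/// Empty output, "N/A" and blank lines fail to parse and are skipped,
/// so nothing panics.
pub fn parse_ffprobe_duration(stdout: &str) -> Option<f64> {
    stdout
        .lines()
        .map(str::trim)
        .filter_map(|line| line.parse::<f64>().ok())
        // Rejects 0, negatives, inf and NaN.
        .find(|d| d.is_finite() && *d > 0.0)
}

/// Hypothetical: try `format=duration` output first, then `stream=duration`.
pub fn duration_with_fallback(format_out: &str, stream_out: &str) -> Option<f64> {
    parse_ffprobe_duration(format_out).or_else(|| parse_ffprobe_duration(stream_out))
}
```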

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 23:08:16 -04:00
7350f1916a Merge pull request 'fix/manual-date-update' (#86) from fix/manual-date-update into master
Reviewed-on: #86
2026-05-10 02:53:20 +00:00
Cameron Cordes
9871c685b4 date-override: cargo fmt
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 21:23:11 -04:00
Cameron Cordes
108bbeb029 date-override: union semantics across libraries + slash forms
The date-override path used to look up `image_exif` strictly by
`(library_id, rel_path)` with only the forward-slash form, while
`/image/metadata`'s `get_exif` falls back across libraries and tries
both slash forms. A photo whose row sat under a different library_id
than its filesystem-resolved one — or whose rel_path was stored with
backslashes — rendered fine in the modal but 404'd on save.

`set_manual_date_taken` / `clear_manual_date_taken` now share a
`locate_image_exif_row` helper that mirrors `get_exif`'s union
semantics (scoped lookup first, library-agnostic fallback by rel_path
in both slash forms), then update by primary key so the write hits
exactly the row read. Inner anyhow errors are logged with
`(library_id, rel_path)` so the next failure mode is debuggable.
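An in-memory sketch of the union lookup order. The real helper queries SQLite through diesel. `ExifRow` here is a hypothetical stand-in holding only the fields the lookup needs.

```rust
#[derive(Debug, Clone)]
pub struct ExifRow {
    pub id: i64,
    pub library_id: i64,
    pub rel_path: String,
}

/// Scoped (library_id, rel_path) match first, then a library-agnostic match
/// by rel_path. Both steps accept the forward- and backslash forms.
pub fn locate_image_exif_row<'a>(
    rows: &'a [ExifRow],
    library_id: i64,
    rel_path: &str,
) -> Option<&'a ExifRow> {
    let forward = rel_path.replace('\\', "/");
    let backward = rel_path.replace('/', "\\");
    let matches_path = |r: &&ExifRow| r.rel_path == forward || r.rel_path == backward;
    rows.iter()
        .filter(matches_path)
        .find(|r| r.library_id == library_id)
        .or_else(|| rows.iter().find(matches_path))
}
```

The caller then updates by `row.id`, so the write hits exactly the row that was read.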

Handler-side: `resolve_library_param` errors no longer silently fall
back to the primary library (which would have masked the original bug
with a different "row not found"); a malformed library param now
returns 400. New `DbErrorKind::NotFound` lets the handler distinguish
genuine misses (404) from real DB failures (500).
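A sketch of the handler-side status mapping. The log names `DbErrorKind::NotFound` and `UpdateError`. The `QueryError` variant and the `override_status` shape are assumptions for illustration.

```rust
/// `NotFound` and `UpdateError` appear in the log; `QueryError` is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DbErrorKind {
    NotFound,
    UpdateError,
    QueryError,
}

/// Hypothetical handler core: `library` is the parsed library param and
/// `update` performs the date write for that library.
pub fn override_status(
    library: Result<i64, String>,
    update: impl FnOnce(i64) -> Result<(), DbErrorKind>,
) -> u16 {
    let lib = match library {
        Ok(id) => id,
        // Malformed param: 400, with no silent fallback to the primary library.
        Err(_) => return 400,
    };
    match update(lib) {
        Ok(()) => 200,
        Err(DbErrorKind::NotFound) => 404,
        Err(_) => 500,
    }
}
```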

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 21:21:25 -04:00
Cameron Cordes
3e2f36a748 personas: elevate to server with per-persona fact scoping
Move personas off the mobile client into ImageApi as first-class
records, and scope entity_facts by persona so each one builds its own
voice over a shared entity graph. The new include_all_memories flag
lets a persona opt back into the full hive-mind pool for human
browsing of /knowledge/*; agentic generation always stays in-voice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 17:59:20 -04:00
55a986c249 Merge pull request 'feature/streaming-insights' (#85) from feature/streaming-insights into master
Reviewed-on: #85
2026-05-09 20:57:16 +00:00
c52a646be2 Merge pull request 'memories: restore early-era Snapchat unix-epoch filenames' (#84) from feature/snapchat-early-era-dates into master
Reviewed-on: #84
2026-05-08 20:23:35 +00:00
Cameron Cordes
d32a7d7c3a memories: restore early-era Snapchat unix-epoch filenames
The recent blanket "snapchat-" prefix denylist (43f8f83) rejected ALL
Snapchat-prefixed filenames from timestamp parsing, which fixed the
sequential-ID false positives but also broke real unix-second
filenames from Snapchat's early era. `Snapchat-1383929602.jpg`
(2013-11-08 16:53:22 UTC) now falls through to fs_time — and on files
with broken filesystem metadata, fs_time pins to 1970.

Replace the blanket prefix denial with a tighter discriminator:
  - exactly 10 captured digits AND timestamp >= 2011-09-23 (Snapchat
    launch) → real unix epoch, accept
  - any other length under this prefix → sequential ID, reject

This keeps the existing rejections intact:
  Snapchat-1021849065.mp4          → 10 digits, 2002 < launch → reject
  Snapchat-1751031586660373917.jpg → 19 digits truncates to 16 → reject
And restores the regression case:
  Snapchat-1383929602.jpg          → 10 digits, 2013 ≥ launch → accept
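A sketch of the discriminator. It assumes `digits` is the run already captured after the `Snapchat-` prefix. The function name is hypothetical. The constant is 2011-09-23T00:00:00Z in unix seconds.

```rust
/// 2011-09-23T00:00:00Z (the launch cutoff cited above) as unix seconds.
pub const SNAPCHAT_LAUNCH_TS: i64 = 1_316_736_000;

/// Accept only exactly-10-digit runs on or after the launch date.
/// Any other length under this prefix is treated as a sequential ID.
pub fn snapchat_digits_are_epoch(digits: &str) -> bool {
    if digits.len() != 10 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return false;
    }
    digits.parse::<i64>().map_or(false, |ts| ts >= SNAPCHAT_LAUNCH_TS)
}
```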

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:22:57 -04:00
Cameron Cordes
3699e059a2 insight-chat: include Date taken + GPS in bootstrap photo context
The bootstrap system message gave the model a file path and (in
hybrid mode) a visual description, but no temporal anchor. Models
defaulted to today's date when calling get_sms_messages — Nov 2014
photos were getting "2024-03-11" passed as `date`, missing every
historical message and leading the model to confidently misreport
context.

This commit folds two more EXIF-sourced facts into the
--- PHOTO CONTEXT --- block:

  Date taken: <YYYY-MM-DD or "unknown">
  GPS: <lat, lon to 4dp>           (omitted when no GPS)

Resolution waterfall for date_taken matches the documented canonical
date pipeline at the EXIF / filename steps, but intentionally stops
short of the fs-time fallback `generate_agentic_insight_for_photo`
uses — for chat we'd rather show "unknown" than mislead the model
with an inode mtime. GPS is taken straight from EXIF when both
lat/lon are populated; absent GPS suppresses the line entirely so
the model doesn't hallucinate coordinates.
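A sketch of the two new lines and the shortened waterfall. Assumptions: dates arrive pre-formatted as `YYYY-MM-DD`, GPS is passed only when both lat and lon exist, and the path line uses the same "Photo file path:" label as the older user-turn block. `photo_context_block` is a hypothetical name.

```rust
/// Hypothetical composer for the PHOTO CONTEXT block (path, then date, then GPS).
pub fn photo_context_block(
    path: &str,
    exif_date: Option<&str>,
    filename_date: Option<&str>,
    gps: Option<(f64, f64)>,
) -> String {
    let mut out = String::from("--- PHOTO CONTEXT ---\n");
    out.push_str(&format!("Photo file path: {path}\n"));
    // EXIF wins, then filename. There is deliberately no fs_time step.
    let date = exif_date.or(filename_date).unwrap_or("unknown");
    out.push_str(&format!("Date taken: {date}\n"));
    // No GPS means no line at all, so the model has nothing to riff on.
    if let Some((lat, lon)) = gps {
        out.push_str(&format!("GPS: {lat:.4}, {lon:.4}\n"));
    }
    out
}
```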

InsightGenerator gains a `fetch_exif(file_path)` accessor (crate-
visible) so the chat service doesn't need its own ExifDao plumbing.

build_bootstrap_system_message picks up two new params (date,
gps); existing tests updated and 5 new tests cover:
- date present / absent / waterfall (EXIF wins, filename fallback,
  None when neither source has it)
- GPS present / absent
- ordering (path → date → visual)

Total insight_chat unit tests: 33 (up from 27).
2026-05-08 11:14:39 -04:00
Cameron Cordes
a0ec1a5080 insight-chat: photo context belongs in system msg, not user turn
After refresh, the rendered transcript was showing two unwanted
artifacts in the initial user bubble:

  Photo file path: pics/DSC_5171.jpg
  please tell me about this photo and what was going on around it

  Please write your final answer now without calling any more tools.

Two distinct bugs:

1. Bootstrap was prepending `Photo file path: <path>` (and, in
   hybrid mode, the visual description block) into the user-turn
   content. The model needed it to call file_path-keyed tools, but
   the user could see it in their own bubble on replay.

2. The no-tools fallback ("Please write your final answer now…")
   was a synthetic user message we never stripped from history,
   so it persisted into training_messages, rendered as a second
   user bubble, AND wiped the prior tool-call accumulator inside
   load_history (user-turn handler clears pending_tools), which
   is why the tool invocations disappeared from the assistant
   bubble after refresh.

Fixes:

- New `build_bootstrap_system_message` helper composes the persona
  with a `--- PHOTO CONTEXT ---` block (path + optional visual
  description). Lives in the system message, not the user turn.
  The user's bubble shows only what they typed.
- Streaming agentic loop's no-tools fallback now records its
  insertion index and removes the synthetic user prompt from
  `messages` after the model responds. Final assistant content
  stays — it reads coherently on replay without the synthetic
  prompt above it. Applies to both bootstrap and continuation.
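A sketch of the index-tracked removal. `Msg` and the `call_model` closure are hypothetical stand-ins for the real message type and backend call.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Msg {
    System(String),
    User(String),
    Assistant(String),
}

/// Push the synthetic prompt, remember where it went, let the model answer,
/// then remove the prompt so it never reaches persisted history.
pub fn finalize_without_tools(messages: &mut Vec<Msg>, call_model: impl FnOnce(&[Msg]) -> String) {
    let idx = messages.len();
    messages.push(Msg::User(
        "Please write your final answer now without calling any more tools.".to_string(),
    ));
    let answer = call_model(messages.as_slice());
    messages.remove(idx);
    // The final assistant content stays and reads coherently on replay.
    messages.push(Msg::Assistant(answer));
}
```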

3 new tests cover the system-message composer (path-only, with
visual block, persona-trim). Total insight_chat unit tests: 27.
2026-05-08 11:07:03 -04:00
Cameron Cordes
24ecf2abd4 insight-chat: prepend Photo file path: <path> to bootstrap user turn
Bug: bootstrap user_content was just the user's typed message (plus
the hybrid visual description). Tools that take a file_path arg —
recall_facts_for_photo, get_file_tags, get_faces_in_photo — had no
way to learn the canonical path. Small models would invent
placeholders like "input_file_0.png" or call the tool with a name
guessed from a hidden multimodal input handle, neither of which
matched any real photo.

Fix: prepend a single-line "Photo file path: <normalized>\n\n" block
to user_content. Same shape generate_agentic_insight_for_photo
already uses for non-chat callers — kept the bootstrap minimal
(no date / GPS / tags pre-stuffing; the agentic loop can fetch
those via tools when needed).

Hybrid still injects the visual description block between the path
block and the user message; local mode just gets path + user text.
2026-05-08 10:59:35 -04:00
Cameron Cordes
a29ff406a1 insight-chat: extract bootstrap resolution helpers + unit-test them
resolve_bootstrap_system_prompt and resolve_bootstrap_backend run on
every bootstrap turn — they pick the persisted system prompt and the
chosen backend label. They were inline conditionals before; pulling
them out makes the rules testable without spinning up the full
streaming stack.

9 new tests cover:
- system prompt fallback to BOOTSTRAP_DEFAULT_SYSTEM_PROMPT for None,
  empty string, whitespace-only
- supplied non-empty prompts pass through verbatim, with interior
  newlines / spacing preserved (Apollo personas use multi-line tool
  listings)
- backend defaults to "local" for None / empty
- "local" / "hybrid" accepted case-insensitively with edge-trim
- unknown labels return a descriptive error
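A sketch of the two resolvers matching the rules listed above. The default prompt text is a placeholder, since the real constant's content isn't in the log. Returning a `Backend` enum instead of a label string is also an assumption.

```rust
/// Placeholder: the real BOOTSTRAP_DEFAULT_SYSTEM_PROMPT text is not shown here.
pub const BOOTSTRAP_DEFAULT_SYSTEM_PROMPT: &str = "You are a helpful assistant.";

/// None, empty or whitespace-only input falls back to the default.
/// Anything else passes through verbatim, with interior newlines preserved.
pub fn resolve_bootstrap_system_prompt(supplied: Option<&str>) -> &str {
    match supplied {
        Some(p) if !p.trim().is_empty() => p,
        _ => BOOTSTRAP_DEFAULT_SYSTEM_PROMPT,
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Backend {
    Local,
    Hybrid,
}

/// Case-insensitive with edge-trim. None or empty input defaults to local.
pub fn resolve_bootstrap_backend(label: Option<&str>) -> Result<Backend, String> {
    let label = label.map(str::trim).unwrap_or("");
    match label.to_ascii_lowercase().as_str() {
        "" | "local" => Ok(Backend::Local),
        "hybrid" => Ok(Backend::Hybrid),
        other => Err(format!("unknown backend {other:?}; expected \"local\" or \"hybrid\"")),
    }
}
```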

Total insight_chat tests: 24 (up from 15). No behaviour change.
2026-05-08 10:56:22 -04:00
Cameron Cordes
928efe49f9 insight-chat: bootstrap insight on first Discuss message + regenerate flag
Tap-Discuss-on-no-insight previously failed silently: ImageApi's
/insights/chat/stream required an existing agentic insight, errored
when missing, and emitted the failure as `event: error` — which the
frontend SSE consumer ignored (it listens for `error_message`).

This commit closes both gaps with a server-side state machine:

- /insights/chat/stream now branches on insight presence. Missing
  insight (or `regenerate: true` in the body) → bootstrap path:
  builds [System(req.system_prompt), User(req.user_message + image)],
  runs the agentic loop, generates a title, persists a new row via
  store_insight (which auto-flips priors). Existing insight →
  continuation path (unchanged behaviour).
- New `regenerate: bool` request field forces bootstrap even when an
  insight exists. Takes precedence over `amend`.
- `done` SSE payload field-name alignment with Apollo's frontend
  convention: prompt_eval_count → prompt_tokens, eval_count →
  eval_tokens, num_ctx echo added.
- `amended_insight_id` semantics broaden — now populated whenever the
  turn produced a new row (bootstrap, regenerate, or amend). Existing
  amend clients keep working unchanged; new clients get the new row's
  id for free.
- `event: error` → `event: error_message` so frontend errors stop
  silently dropping.
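The branching rules above reduce to a small decision function. A sketch with hypothetical names:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ChatPath {
    Bootstrap,
    Continuation { amend: bool },
}

/// A missing insight or `regenerate: true` forces bootstrap.
/// `regenerate` takes precedence over `amend`.
pub fn choose_chat_path(has_insight: bool, regenerate: bool, amend: bool) -> ChatPath {
    if !has_insight || regenerate {
        ChatPath::Bootstrap
    } else {
        ChatPath::Continuation { amend }
    }
}

/// `amended_insight_id` is populated whenever the turn produced a new row.
pub fn produces_new_row(path: &ChatPath) -> bool {
    matches!(path, ChatPath::Bootstrap | ChatPath::Continuation { amend: true })
}
```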

Refactor: extracted run_streaming_agentic_loop, build_chat_clients,
and generate_title as shared helpers between bootstrap and
continuation. Continuation path's outer logic moves to
run_continuation_streaming with no behaviour change.

Mobile-ready: any client (Apollo backend, mobile, future) sends one
request to /insights/chat/stream and gets the right path. Apollo's
proxy stays a dumb pipe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:41:50 -04:00
bdafd39546 Merge pull request 'feature/insight-chat-improvements' (#83) from feature/insight-chat-improvements into master
Reviewed-on: #83
2026-05-07 22:19:12 +00:00
Cameron Cordes
8bd1a85070 insight-chat: cargo fmt sweep on the get_faces_in_photo additions
Single-line dao lock + reordered faces import. No logic changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:53:31 -04:00
Cameron Cordes
6f0c15d0c5 insight-chat: code-review polish on get_faces_in_photo
- Drop redundant `use anyhow::Context` inside has_any_faces (already
  imported at the module level).
- Drop dead `.unwrap_or("?")` on bound faces — the vec is filtered to
  is_some() so the fallback can never fire.
- Reorder the face_dao constructor param + initializer to match the
  struct declaration (between tag_dao and knowledge_dao). Update both
  state.rs call sites and populate_knowledge.rs to match.
- Hold face_dao lock once across the library-resolver loop instead of
  reacquiring per iteration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:48:22 -04:00
Cameron Cordes
b64a5bec28 insight-chat: add get_faces_in_photo agentic tool
The LLM had no path to see face_detections data — get_file_tags
returns user-applied tags, but a face that's been detected and bound
to a person via the embedding-cluster auto-bind path doesn't always
have a matching tag. The new tool joins face_detections with persons
by content_hash and returns bound names + bboxes, plus unidentified
faces (so smaller models can count people in the photo without
inferring from a visual description).

Gated on face_detections being non-empty via the same has_any_*
pattern as daily_summaries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:43:16 -04:00
Cameron Cordes
388eb22cd2 Remove full plan file, just keep spec
2026-05-07 17:29:04 -04:00
Cameron Cordes
eef41d4172 thumbnails: align video ffmpeg args with the image path so non-yuvj420p sources work
The bare 'ffmpeg -ss 3 -i in -vframes 1 -f image2 out' command failed on
sources whose decoded pix_fmt isn't yuvj420p (e.g. older Samsung phone
videos in yuv420p). With no -vf filter chain, the decoded frame goes
straight to the mjpeg encoder, which rejects it with 'Non full-range
YUV is non-standard' and exits non-zero.

generate_image_thumbnail_ffmpeg already handles the same class of
source for HEIC/RAW by adding -vf scale=200:-1 -c:v mjpeg — the filter
chain lets ffmpeg auto-insert the pix_fmt converter the encoder needs.
Adopt the same args here. Side benefit: video thumbnails are now 200px
wide on disk, matching image thumbnails (previously full-resolution).
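A sketch of the aligned command built with `std::process::Command`. The helper name is hypothetical. The args are the ones quoted in this commit: the original `-ss 3` seek plus the image path's `-vf scale=200:-1 -c:v mjpeg` tail.

```rust
use std::process::Command;

/// Hypothetical builder for the video thumbnail ffmpeg invocation.
/// The `-vf` filter chain lets ffmpeg auto-insert the pix_fmt conversion
/// the mjpeg encoder needs.
pub fn video_thumbnail_command(input: &str, output: &str) -> Command {
    let mut cmd = Command::new("ffmpeg");
    cmd.args(["-ss", "3", "-i", input, "-vframes", "1"])
        .args(["-vf", "scale=200:-1", "-c:v", "mjpeg", "-f", "image2", output]);
    cmd
}
```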

Pre-existing .unsupported sentinels for videos that hit this failure
will need to be deleted manually to retry — they're under
$THUMBNAILS/<lib_id>/.../*.unsupported.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:20:05 -04:00
Cameron Cordes
b42acbb3f3 fmt: cargo fmt sweep across drifted files
No behavior change — purely whitespace/line-break cleanup that had
accumulated since the last format run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:42:41 -04:00
Cameron Cordes
2a273a3ed9 thumbnails: stop video failures from re-logging every watcher tick
generate_video_thumbnail used .output().expect(...), which only catches
spawn failure — non-zero ffmpeg exits were silently discarded. With no
thumbnail and no .unsupported sentinel left behind, the watcher
re-detected the file as missing every quick-scan tick and re-logged
"New file detected (missing thumbnail)" forever.

Mirror the image branch: return io::Result, check status.success(),
and write the sentinel from create_thumbnails on failure.
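A sketch of the checked-exit pattern. Both helper names are hypothetical, as is the assumption that the sentinel is the thumbnail path with its extension swapped to `.unsupported`.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// `output()?` only fails on spawn errors, so the exit status must be
/// checked separately.
pub fn run_checked(cmd: &mut Command) -> io::Result<()> {
    let output = cmd.output()?;
    if output.status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("exited with {}: {}", output.status, String::from_utf8_lossy(&output.stderr).trim()),
        ))
    }
}

/// On failure, write a `.unsupported` sentinel (naming assumed) so the
/// watcher stops re-detecting the file as missing a thumbnail.
pub fn create_or_mark(cmd: &mut Command, thumb_path: &Path) -> io::Result<bool> {
    match run_checked(cmd) {
        Ok(()) => Ok(true),
        Err(_) => {
            fs::write(thumb_path.with_extension("unsupported"), b"")?;
            Ok(false)
        }
    }
}
```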

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:41:24 -04:00
Cameron Cordes
a8433c2e01 insight-chat: document the new system_prompt field in CLAUDE.md
Add system_prompt to the /insights/chat body schema with a one-paragraph
note on the append-vs-amend semantics so future readers find the
contract alongside the rest of the chat-continuation docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:26:32 -04:00
Cameron Cordes
1cdc0f6eb9 insight-chat: drop the dead SmsApiClient::search_messages wrapper
The post-PR-4 delegation kept it as a convenience for callers that
don't filter by contact, but nothing actually uses it. Delete to clear
the dead_code warning. search_messages_with_contact remains as the
single entry point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:10:31 -04:00
Cameron Cordes
e539c083c9 insight-chat: code-review polish on the tool-gating PR
- search_messages now delegates to search_messages_with_contact(.., None)
  so the two methods share a single HTTP path. Drops the dead-code
  warning and the ~30-line duplication.
- DailySummaryDao gains has_any_summaries (LIMIT 1 existence probe)
  used by current_gate_opts; the SELECT COUNT(*) get_total_summary_count
  added in the prior commit is removed (it had no other caller).
- current_gate_opts doc comment corrected to describe what the probes
  actually do.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:07:57 -04:00
Cameron Cordes
f50d32667b insight-chat: ToolGateOpts + per-tool description rewrites
Tools whose backing tables are empty (calendar, location_history,
daily_summaries) drop out of the catalog so the LLM doesn't waste
iteration budget calling them only to receive "no results found".
Vision and apollo gates already existed; this generalizes the pattern.
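A sketch of data-presence gating. The field names and tool names are hypothetical. Each flag would presumably be fed by a cheap existence probe on the backing table.

```rust
/// Hypothetical gate flags, one per optional data-backed tool.
#[derive(Debug, Clone, Copy, Default)]
pub struct ToolGateOpts {
    pub has_calendar: bool,
    pub has_location_history: bool,
    pub has_daily_summaries: bool,
}

/// Drop tools whose backing tables are empty. Ungated tools always stay.
pub fn gated_tools<'a>(all: &[&'a str], gate: ToolGateOpts) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|name| match *name {
            "get_calendar_events" => gate.has_calendar,
            "get_location_history" => gate.has_location_history,
            "get_daily_summaries" => gate.has_daily_summaries,
            _ => true,
        })
        .collect()
}
```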

search_messages gains start_ts/end_ts/contact_id filters (date filter
is a client-side post-filter; SMS-API only accepts contact_id natively
on the search endpoint).
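A sketch of the client-side date post-filter. `SmsMessage` is a hypothetical stand-in. Inclusive bounds are an assumption, since the commit doesn't say.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct SmsMessage {
    pub ts: i64,
    pub body: String,
}

/// contact_id goes to the SMS-API. start_ts/end_ts are applied here,
/// after the response arrives (bounds assumed inclusive).
pub fn post_filter_by_date(msgs: Vec<SmsMessage>, start_ts: Option<i64>, end_ts: Option<i64>) -> Vec<SmsMessage> {
    msgs.into_iter()
        .filter(|m| start_ts.map_or(true, |s| m.ts >= s) && end_ts.map_or(true, |e| m.ts <= e))
        .collect()
}
```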

Descriptions follow a consistent convention: one sentence (what +
when), param semantics, examples for tools with non-obvious param
choices. No more all-caps headers, no more identity-prescriptive
language inside descriptions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:56:58 -04:00
Cameron Cordes
b02da0d0cc insight-chat: code-review polish on the days_radius fix
- Bind effective_radius once in fetch_messages_for_contact so the log
  output and window math share a single source of truth for the clamp.
- Clamp tool-supplied days_radius to [1, 30] at the tool boundary so a
  runaway LLM value can't produce a thousand-day window.
- Split the negative-input test into a real negative-input case
  alongside the zero-input case.
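A sketch of the clamp, combined with the default of 4 from commit 659e7bd973. Function names are hypothetical.

```rust
pub const DEFAULT_DAYS_RADIUS: i64 = 4;

/// Clamp at the tool boundary so a runaway LLM value can't produce a huge window.
pub fn effective_days_radius(requested: Option<i64>) -> i64 {
    requested.unwrap_or(DEFAULT_DAYS_RADIUS).clamp(1, 30)
}

/// A +/- window around the photo timestamp in unix seconds, derived from the
/// same clamped value that would be logged.
pub fn message_window(photo_ts: i64, requested: Option<i64>) -> (i64, i64) {
    let radius = effective_days_radius(requested) * 86_400;
    (photo_ts - radius, photo_ts + radius)
}
```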

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:47:46 -04:00
Cameron Cordes
659e7bd973 insight-chat: get_sms_messages tool now honors days_radius
The agentic tool definition advertised a days_radius parameter but
sms_client::fetch_messages_for_contact was hardcoded to ±4 days,
silently ignoring whatever value the LLM chose. Plumb the parameter
through; default 4 retained at the tool level for back-compat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:42:42 -04:00
Cameron Cordes
428f24b0f8 insight-chat: code-review polish on the chat system_prompt override
- Trim the override input once via Option::map(str::trim).filter(...).
- Use matches!() in restore_system_prompt_override's Prepended arm so
  it reads consistently with the Replaced arm.
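The trim-once idiom from the first bullet, as a standalone sketch (the function name is hypothetical):

```rust
/// A blank or whitespace-only override counts as "no override".
pub fn normalize_override(raw: Option<&str>) -> Option<&str> {
    raw.map(str::trim).filter(|s| !s.is_empty())
}
```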

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:40:04 -04:00
Cameron Cordes
faa289882f insight-chat: per-turn system_prompt override on chat continuation
Append mode: applied ephemerally — original system message restored
before persistence so re-opens see the baked persona. Amend mode:
override stays in place and becomes the new insight row's system
message. Pattern mirrors annotate_system_with_budget.

Adds system_prompt field on both ChatTurnHttpRequest and ChatTurnRequest;
plumbs through chat_turn and chat_turn_stream identically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:34:08 -04:00
Cameron Cordes
177187f6a2 insight-chat: code-review polish on the system-prompt split
- Use Option::map instead of a manual match on Option (silences clippy::manual_map).
- Drop redundant `max_iterations = max_iterations` from the format! call.
- Use captured identifiers consistently in the user_content format!.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:27:59 -04:00
Cameron Cordes
8ae4099d46 insight-chat: split generation system prompt into identity + procedural blocks
The framework no longer asserts "you are a personal photo memory
assistant" alongside a user-supplied custom_system_prompt — the
persona is the authoritative identity. The procedural block (tool-use
guidance, iteration budget) stays identity-free.

The user message also stops asking for "a detailed insight with a
title and summary" since the title is regenerated post-hoc anyway and
the wording was constraining voice for no data-model benefit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:20:45 -04:00
Cameron Cordes
204428b0c0 insight-chat: implementation plan for the spec
Five sequenced PRs:
  1. Split generation system prompt + neutralize user message
  2. system_prompt field on chat request (ephemeral / amend-persisted)
  3. fetch_messages_for_contact honors days_radius
  4. ToolGateOpts + per-tool description rewrites + search_messages
     gains start_ts/end_ts/contact_id
  5. FileViewer-React: persona system_prompt on every turn + style note

Each PR is independently mergeable; tests are written inline, TDD-style, per task.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:15:09 -04:00
Cameron Cordes
fbece0ba9a insight-chat: design for tool catalog, system prompt, and SMS fixes
Lays out the cycle: split generation system prompt into identity vs
procedural blocks so personas drive voice/shape, add per-turn
system_prompt override on chat (ephemeral in append mode, persisted
on amend), gate optional tools on data presence, and fix the
days_radius bug in get_sms_messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:04:07 -04:00
22e157411c Merge pull request 'date_resolver: drop -fast2 so MP4 moov-at-end files resolve' (#82) from fix/exiftool-mp4-moov-trailer into master
Reviewed-on: #82
2026-05-07 16:42:08 +00:00
Cameron Cordes
c128596470 date_resolver: drop -fast2 so MP4 moov-at-end files resolve
For QuickTime/MP4 files whose `moov` atom sits at the end of the
file (non-faststart — common for Snapchat exports and any MP4
muxed without `-movflags +faststart`), `-fast2` causes exiftool
to skip the trailer and return no `CreateDate` /
`MediaCreateDate`, dropping the resolver to the `fs_time`
fallback for files that actually have a real capture date.

Reported cases:
  Snapchat-477624257.mp4
    fs_time: 2026-05-04 (today, file was just modified)
    real:    QuickTime CreateDate 2018-09-02
  action_compound_cc92e65b709d1deb895b4c2a9484fc6a.mp4
    fs_time: 2026-05-04
    real:    MediaCreateDate 2018-03-01

The waterfall pre-filters to files kamadak-exif couldn't read, so
the JPEG fast-path is already covered without `-fast2`. Paying
full-scan cost on the residual is the right trade. The per-tick
drain re-resolves `source = 'fs_time'` rows, so existing rows
recover automatically on the next watcher tick after deploy — no
SQL migration needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:40:50 -04:00
ac8d17fb22 Merge pull request 'memories: deny Snapchat-prefixed filenames from timestamp parsing' (#81) from feature/filename-date-snapchat-denylist into master
Reviewed-on: #81
2026-05-07 16:20:06 +00:00
Cameron Cordes
43f8f83d80 memories: deny Snapchat-prefixed filenames from timestamp parsing
Snapchat assigns sequential IDs that happen to overlap real epoch
values, so the 10-16 digit timestamp regex matched and produced
2002-era dates for files actually saved in 2016/2021. The digits
themselves are indistinguishable from a unix timestamp, so we
dispatch on the source-app prefix instead. Case-insensitive,
extensible for future apps that exhibit the same pattern.

Reported cases:
  Snapchat-1021849065.mp4          → 2002-05-19 (actual 2021)
  Snapchat-1751031586660373917.jpg → 2002-09-09 (actual 2016)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:17:40 -04:00
e55f6a5961 Merge pull request 'memories: reject implausible filename-derived timestamps' (#80) from feature/filename-date-plausibility into master
Reviewed-on: #80
2026-05-07 16:02:50 +00:00
Cameron Cordes
feaae9b6d3 memories: reject implausible filename-derived timestamps
Filenames like `000227580005.jpg` (film-scan ID) and
`IMG_21323906751390.jpeg` were matched by the 10-16 digit timestamp
regex and resolved to 1970 / 2037, then written into
`image_exif.date_taken` with `source = 'filename'`. EXIF-less
photos showed up under those bogus dates everywhere date_taken is
read.

Two new guards in `extract_date_from_filename`:
- leading zero → reject (real epoch values don't have one at any
  sane resolution).
- resolved year outside [1995, now+1y] → reject.
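A sketch of the two guards. Assumption: a 10-16 digit run is scaled to seconds by keeping its first 10 digits. This reproduces the reported 1970 / 2037 results for the two filenames. The year window is approximated here with timestamps. Names are hypothetical.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// 1995-01-01T00:00:00Z as unix seconds.
const MIN_TS: i64 = 788_918_400;
const ONE_YEAR: i64 = 365 * 86_400;

pub fn plausible_filename_epoch(digits: &str, now_ts: i64) -> bool {
    if !(10..=16).contains(&digits.len()) || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return false;
    }
    if digits.starts_with('0') {
        return false; // real epoch values never have a leading zero
    }
    // ms/us resolutions drop their tail; seconds are the first 10 digits.
    let secs: i64 = digits[..10].parse().unwrap_or(0);
    (MIN_TS..=now_ts + ONE_YEAR).contains(&secs)
}

pub fn current_unix_ts() -> i64 {
    SystemTime::now().duration_since(UNIX_EPOCH).map(|d| d.as_secs() as i64).unwrap_or(0)
}
```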

Both let the date_resolver waterfall fall through to fs_time,
which is a much better proxy for content age than a fake epoch
date. Regression tests cover the two reported filenames.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:02:07 -04:00
95e21c8128 Merge pull request 'feature/manual-date-override' (#79) from feature/manual-date-override into master
Reviewed-on: #79
2026-05-07 15:10:37 +00:00
Cameron Cordes
7e1c4ab318 backfill_date_taken: surface the actual diesel error in warnings
The DAO swallowed every diesel::update failure as a flat
`anyhow!("Update error")`, then trace_db_call further reduced it to
`DbError { kind: UpdateError }`. Operators saw "update failed for lib
2 Snapchat/foo.mp4: DbError { kind: UpdateError }" with no clue why
(constraint violation? type mismatch? row vanished mid-flight? DB
locked?).

Three changes:
- Preserve the diesel error in the anyhow chain along with the input
  params (lib, rel_path, date_taken, source) so the cause is visible.
- Log the chain at warn-level inside the DAO before the trace wrapper
  collapses it to DbErrorKind::UpdateError, so the warning at the
  call site finally has something diagnosable next to it.
- Treat zero-row updates as a debug-level "row likely retired by the
  missing-file scan" rather than a hard failure — that case is benign
  and shouldn't poison the drain's error tally.
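A sketch of the outcome classification, using `String` in place of the real anyhow chain so it stays dependency-free. `usize` mirrors the row count a diesel `execute` returns. Names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum UpdateOutcome {
    Updated,
    /// Zero rows: likely retired by the missing-file scan, which is benign.
    RowRetired,
}

pub fn classify_update(result: Result<usize, String>, lib: i64, rel_path: &str) -> Result<UpdateOutcome, String> {
    match result {
        Ok(0) => Ok(UpdateOutcome::RowRetired),
        Ok(_) => Ok(UpdateOutcome::Updated),
        // Keep the cause plus the inputs instead of a flat "Update error".
        Err(cause) => Err(format!("update failed for lib {lib} {rel_path}: {cause}")),
    }
}
```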

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 11:07:17 -04:00
Cameron Cordes
65af7d999e memories: parse filename dates as UTC, not server local
`extract_date_from_filename` was calling `Local::from_local_datetime`
on the parsed YYYY-MM-DD-HH-MM-SS components, then `.timestamp()` was
shifting the result by the SERVER's TZ offset to produce real UTC
seconds. That made filename-sourced timestamps disagree with EXIF-
sourced timestamps by hours: kamadak-exif's `DateTimeOriginal` is a
naive string parsed AS-IF-UTC (the project's load-bearing
"naive local reinterpreted as UTC" convention), and Apollo's photo
matcher re-anchors that naive value through the BROWSER's TZ when
matching to the track. Anything stamped in server-local instead got
double-shifted on its way through the matcher and through any
`formatNaive*` display path on the client.

Visible symptom in the Apollo DETAILS modal: a photo's CURRENT date
read correctly (1:25 AM via exif) while FROM FILENAME read 4 hours
ahead (5:25 AM in EDT) for the same `IMG_20160710_012515.jpg`.

Switch to `Utc::from_utc_datetime` so `.timestamp()` returns the
wall-clock-as-UTC unix seconds — same convention as the EXIF path.
The /memories endpoint, the canonical-date waterfall (which feeds
`image_exif.date_taken` for filename-only files), and Apollo's
DETAILS modal `filename_date` field all now line up.
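The "wall clock as UTC" convention can be shown without any timezone library. This sketch uses Howard Hinnant's `days_from_civil` algorithm rather than chrono's `Utc::from_utc_datetime`. It produces the same seconds for `IMG_20160710_012515.jpg` regardless of the server's TZ.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (days_from_civil).
pub fn days_from_civil(y: i64, m: u32, d: u32) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;
    let m = m as i64;
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + d as i64 - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Naive wall-clock components reinterpreted as UTC: no server offset applied.
pub fn naive_as_utc_seconds(y: i64, mo: u32, d: u32, h: u32, mi: u32, s: u32) -> i64 {
    days_from_civil(y, mo, d) * 86_400 + h as i64 * 3_600 + mi as i64 * 60 + s as i64
}
```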

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 20:43:18 -04:00
Cameron Cordes
16d6586b7d exif: GET /image/exif/full — exiftool dump for the DETAILS modal
The curated `image_exif` columns are a small slice of what exiftool
can read (camera/lens/GPS/capture/dates). Apollo's DETAILS modal wants
to surface everything — white balance, metering, MakerNotes, IPTC,
ICC profile, Composite tags, the lot — for an operator inspecting a
photo's provenance.

`read_full_exif_via_exiftool(path)` shells out to `exiftool -j -G -n`:
JSON output, group-prefixed keys (`EXIF:Make`, `MakerNotes:LensInfo`),
numeric values (callers can reformat). Spawned via web::block to keep
it off the actix worker — RAW with rich MakerNotes can take a few
seconds.

The endpoint is on-demand only; the indexer / file watcher does NOT
call it. Falls back to 503 with a clear message when exiftool isn't
on PATH so Apollo can render an "install exiftool" hint. Multi-library
union resolution mirrors set_image_gps / get_file_metadata.
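A sketch of the command and the 503 mapping. The helper names are hypothetical. The actix `web::block` offload is omitted to keep it dependency-free. `-j`, `-G` and `-n` are the real exiftool flags cited above. On Unix, spawning a missing binary yields `io::ErrorKind::NotFound`.

```rust
use std::io;
use std::process::Command;

/// `-j` JSON output, `-G` group-prefixed keys, `-n` numeric values.
pub fn exiftool_full_command(path: &str) -> Command {
    let mut cmd = Command::new("exiftool");
    cmd.args(["-j", "-G", "-n", path]);
    cmd
}

/// A missing binary maps to 503 so the client can show an "install exiftool"
/// hint. Any other spawn failure maps to 500.
pub fn status_for_spawn_error(err: &io::Error) -> u16 {
    if err.kind() == io::ErrorKind::NotFound { 503 } else { 500 }
}
```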

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 19:42:41 -04:00
Cameron Cordes
832b50d587 image_exif: manual date_taken override (set/clear endpoints)
Add `POST /image/exif/date` and `POST /image/exif/date/clear` so an
operator can correct a row whose canonical-date waterfall landed on the
wrong value (camera clock reset, fs_time fallback for a copied-from-
backup file, etc). New `original_date_taken` / `original_date_taken_source`
columns snapshot the prior value on first override so revert is lossless.
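A sketch of snapshot-on-first-override as an in-memory state transition. `DateState` is a hypothetical stand-in for the row's date columns. Clearing the originals on revert is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct DateState {
    pub date_taken: Option<i64>,
    pub source: Option<String>,
    pub original_date_taken: Option<i64>,
    pub original_source: Option<String>,
}

impl DateState {
    /// Snapshot only when the row isn't already manual, so repeated
    /// overrides never clobber the pre-override value.
    pub fn set_manual(&mut self, ts: i64) {
        if self.source.as_deref() != Some("manual") {
            self.original_date_taken = self.date_taken;
            self.original_source = self.source.clone();
        }
        self.date_taken = Some(ts);
        self.source = Some("manual".to_string());
    }

    /// Restore the snapshot (lossless revert).
    pub fn clear_manual(&mut self) {
        if self.source.as_deref() == Some("manual") {
            self.date_taken = self.original_date_taken.take();
            self.source = self.original_source.take();
        }
    }
}
```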

The waterfall source set is now `'exif' | 'exiftool' | 'filename' | 'fs_time' | 'manual'`.
The existing `idx_image_exif_date_backfill` partial index already filters
to `date_taken IS NULL OR date_taken_source = 'fs_time'`, so manual rows
are naturally excluded from the per-tick drain — no index change needed.

`ExifMetadata` now exposes `date_taken_source` + originals so a UI can
render "manually set; was X via filename".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 19:26:43 -04:00
2acc525e73 Merge pull request 'otel: revert HTTP transport, keep gRPC' (#78) from fix/otlp-revert-to-grpc into master
Reviewed-on: #78
2026-05-06 22:36:09 +00:00
Cameron Cordes
ecd49fd053 otel: revert HTTP transport, keep gRPC
The HTTP/protobuf exporter never sent any traffic in prod (tcpdump
on port 4318 showed nothing) despite the receiver path being correct
and the bridge wiring being intact (logs reached journalctl via the
stdout exporter). Likely the BatchLogProcessor + reqwest-client combo
isn't getting the right runtime context, but debugging that on a live
deployment isn't worth holding up the rest of the speedups.

Restoring grpc-tonic transport so prod observability comes back. The
remaining build-time wins on this branch (mold linker, system sqlite3,
profile.dev tweaks, lockfile-only dep refresh) deliver most of the
original savings without touching telemetry. Operator: revert
OTLP_OTLS_ENDPOINT in prod from port 4318 back to 4317.

HTTP transport remains a viable follow-up — needs to be debugged
against a local SigNoz instance with internal SDK error visibility
enabled, on its own branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 18:33:37 -04:00
c7bd2226cc Merge pull request 'build: speed up debug compile loop' (#77) from feature/build-time-speedups into master
Reviewed-on: #77
2026-05-06 21:41:19 +00:00