memories: single-SQL rewrite + 20-year lookback
Replaces the EXIF-loop + WalkDir-fallback pipeline that powered
`/memories` with a single per-library SQL query
(`get_memories_in_window`) that uses `strftime('%m-%d' | '%W' | '%m',
date_taken, 'unixepoch', tz_offset)` for calendar matching in the
client's timezone, plus a `years_back` lower bound and a
no-future-dates upper bound. Returns only the matching rows; the
handler applies per-library `PathExcluder` post-query and sorts.
Drops:
- `collect_exif_memories` — replaced by the single SQL query.
- `collect_filesystem_memories` — the canonical-date pipeline now
  populates `date_taken` for every row at ingest, so the WalkDir
  fallback that scanned 14k+ files each request is no longer needed.
- `get_memory_date_with_priority` and friends — request-time waterfall
  superseded by `date_resolver` running at ingest. The associated
  three priority-tests are dropped; their replacement lives in
  `date_resolver::tests`.
On a ~14k-file library this drops `/memories` from 10–15 s
(dominated by `fs::metadata` per row) to single-digit ms.
Bumps `DEFAULT_YEARS_BACK` from 15 → 20 to surface deeper archives
on matching anniversaries.
Note vs. ISO weeks: the original Rust used `chrono::iso_week().week()`
for week-span matching. SQLite's `%W` is Monday-anchored but uses week
0 for days before the first Monday, so it can disagree with ISO at
year boundaries by ±1. Acceptable for nostalgia browsing.
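
To make the week-numbering note concrete, here is a minimal Rust sketch that computes both numbers for a calendar date using only the standard library. The helper names (`sqlite_week`, `iso_week`, and friends) are illustrative and do not come from the codebase; `sqlite_week` follows the documented `%W` rule (week 1 starts on the first Monday), and `iso_week` follows the ISO 8601 rule.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's days_from_civil algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (m + 9) % 12; // March = 0
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Weekday with Monday = 0 ... Sunday = 6 (1970-01-01 was a Thursday).
fn weekday_mon0(y: i64, m: i64, d: i64) -> i64 {
    (days_from_civil(y, m, d) + 3).rem_euclid(7)
}

/// Zero-based day of the year.
fn yday0(y: i64, m: i64, d: i64) -> i64 {
    days_from_civil(y, m, d) - days_from_civil(y, 1, 1)
}

/// Week number as SQLite's `%W` defines it: days before the first
/// Monday of the year fall in week 0.
fn sqlite_week(y: i64, m: i64, d: i64) -> i64 {
    (yday0(y, m, d) + 7 - weekday_mon0(y, m, d)) / 7
}

/// An ISO year has 53 weeks when 1 January is a Thursday, or when it
/// is a Wednesday in a leap year.
fn weeks_in_iso_year(y: i64) -> i64 {
    let jan1 = weekday_mon0(y, 1, 1);
    let leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    if jan1 == 3 || (leap && jan1 == 2) { 53 } else { 52 }
}

/// ISO 8601 week number (1..=53).
fn iso_week(y: i64, m: i64, d: i64) -> i64 {
    let week = (yday0(y, m, d) - weekday_mon0(y, m, d) + 10) / 7;
    if week < 1 {
        weeks_in_iso_year(y - 1) // belongs to the last week of the previous year
    } else if week == 53 && weeks_in_iso_year(y) == 52 {
        1 // belongs to week 1 of the next year
    } else {
        week
    }
}
```

With these helpers, 2021-01-01 gives `%W` = 0 against ISO week 53, 2024-12-30 gives 53 against 1, and 2025-06-15 gives 23 against 24. In years whose 1 January falls on a Tuesday, Wednesday, or Thursday, the one-week offset therefore persists past the year boundary.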
Adds 3 new DAO tests covering month-span filter, library scoping, and
the unknown-span-token guard. Also adds a CLAUDE.md section describing
the canonical-date pipeline end-to-end and the new
`DATE_BACKFILL_MAX_PER_TICK` env var.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
48
CLAUDE.md
@@ -364,6 +364,53 @@ Runs in background thread with two-tier strategy:
- Batch queries EXIF DB to detect new files
- Configurable via `WATCH_QUICK_INTERVAL_SECONDS` and `WATCH_FULL_INTERVAL_SECONDS`

**Canonical date_taken pipeline (`src/date_resolver.rs`).** Every row's
`image_exif.date_taken` is populated at ingest by a four-step waterfall;
which step won is recorded in `image_exif.date_taken_source` so the
per-tick drain can re-resolve weak entries when better tools become
available, and so the UI/debug surface can answer "why did this photo
land on this date?". Order:

1. **`exif`** — kamadak-exif `DateTime` / `DateTimeOriginal`. Fast,
   in-process, image-only.
2. **`exiftool`** — shell-out fallback for tags kamadak can't reach:
   QuickTime/MP4 (`MediaCreateDate`, `TrackCreateDate`, `CreateDate`),
   Apple's `ContentCreateDate`, MakerNote sub-IFDs. Required for
   videos to land a real date. Single-file at ingest; the per-tick
   drain feeds the whole batch through one `exiftool -@ -` subprocess.
   Degrades silently when `exiftool` isn't on PATH (resolver caches the
   "available" check via `OnceLock`).
3. **`filename`** — `extract_date_from_filename` in `memories.rs`
   matches screenshot, chat-export, and timestamp-named patterns.
4. **`fs_time`** — `earliest_fs_time(metadata)` (earlier of created /
   modified). Last resort.

Notable behavior change vs. the pre-2026-05 request-time logic:
**EXIF beats filename when both are present.** A photo named
`Screenshot_2014-06-01.png` whose EXIF `DateTime` is 2021 now appears
under 2021, not 2014 — on the theory that EXIF is more reliable than
import-named filenames. The reverse case (no EXIF, filename has a
date) is unchanged.

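The waterfall order and the EXIF-over-filename rule can be sketched as a first-match chain. This is a minimal illustration, not the code in `src/date_resolver.rs`: the `Candidates` struct, the `resolve` function, and the use of Unix-second timestamps are assumptions, and the real resolver presumably runs the later steps only when the earlier ones find nothing rather than receiving precomputed candidates.

```rust
/// Which waterfall step produced the date. The string forms mirror the
/// values described for `image_exif.date_taken_source`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DateSource {
    Exif,
    Exiftool,
    Filename,
    FsTime,
}

impl DateSource {
    fn as_str(self) -> &'static str {
        match self {
            DateSource::Exif => "exif",
            DateSource::Exiftool => "exiftool",
            DateSource::Filename => "filename",
            DateSource::FsTime => "fs_time",
        }
    }
}

/// Hypothetical per-file candidates, as Unix timestamps in seconds.
#[derive(Debug, Default, Clone, Copy)]
struct Candidates {
    exif: Option<i64>,
    exiftool: Option<i64>,
    filename: Option<i64>,
    fs_time: Option<i64>,
}

/// Returns the first available candidate in waterfall order together
/// with the step that supplied it.
fn resolve(c: &Candidates) -> Option<(i64, DateSource)> {
    c.exif
        .map(|t| (t, DateSource::Exif))
        .or_else(|| c.exiftool.map(|t| (t, DateSource::Exiftool)))
        .or_else(|| c.filename.map(|t| (t, DateSource::Filename)))
        .or_else(|| c.fs_time.map(|t| (t, DateSource::FsTime)))
}
```

For the `Screenshot_2014-06-01.png` case, a file with both `exif: Some(1_622_505_600)` (2021-06-01 UTC) and `filename: Some(1_401_580_800)` (2014-06-01 UTC) resolves to the EXIF value with source `exif`; with `exif` and `exiftool` both `None`, the filename date wins.
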
The `backfill_missing_date_taken` drain (`src/main.rs`) runs every
watcher tick alongside `backfill_unhashed_backlog`. It loads up to
`DATE_BACKFILL_MAX_PER_TICK` rows (default 500) where
`date_taken IS NULL OR date_taken_source = 'fs_time'` (backed by the
`idx_image_exif_date_backfill` partial index), runs the waterfall
batch via `resolve_dates_batch`, and writes results via the
`backfill_date_taken` DAO method (touches only `date_taken` +
`date_taken_source` so EXIF / hash / perceptual columns are
preserved). `filename`-sourced rows are intentionally not re-resolved
— the regex is authoritative when it matches, and re-running exiftool
won't change the answer.

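The selection rule for the drain can be sketched in memory as follows. This only illustrates the predicate and the per-tick cap; the real drain selects rows in SQL through the partial index. The `Row` struct and the function names are hypothetical, and falling back to 500 when the variable is unset or unparsable is an assumption about how the default is applied.

```rust
/// Default cap, matching the documented default for
/// `DATE_BACKFILL_MAX_PER_TICK`.
const DEFAULT_MAX_PER_TICK: usize = 500;

/// Hypothetical in-memory view of the columns the drain cares about.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i64,
    date_taken: Option<i64>,
    date_taken_source: Option<&'static str>,
}

/// Parses the raw env value; unset or invalid input falls back to the
/// default (assumed behavior).
fn max_per_tick(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_MAX_PER_TICK)
}

/// Mirrors `date_taken IS NULL OR date_taken_source = 'fs_time'`.
fn needs_backfill(row: &Row) -> bool {
    row.date_taken.is_none() || row.date_taken_source == Some("fs_time")
}

/// Picks at most `cap` row ids that still need a better date.
fn select_batch(rows: &[Row], cap: usize) -> Vec<i64> {
    rows.iter()
        .filter(|r| needs_backfill(r))
        .take(cap)
        .map(|r| r.id)
        .collect()
}
```

A caller would feed it the environment value, for example `max_per_tick(std::env::var("DATE_BACKFILL_MAX_PER_TICK").ok().as_deref())`. Rows sourced from `exif`, `exiftool`, or `filename` never match the predicate, which is what keeps `filename` rows from being re-resolved.
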
`/memories` is a single SQL query against this column
(`get_memories_in_window` in `src/database/mod.rs`), using
`strftime('%m-%d' | '%W' | '%m', date_taken, 'unixepoch', tz)` for
calendar matching with the client's timezone offset. The pre-rewrite
version stat'd every row and walked the entire library tree — at
~14k photos this took 10–15 s; the rewrite is single-digit ms.

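As a rough sketch of how such a query could be assembled, the following maps a span token to its `strftime` format and refuses unknown tokens before any SQL is built. The span names (`day`, `week`, `month`), the `file_path` and `library_id` columns, the parameter order, and the `"+N minutes"` timezone modifier are all assumptions made for illustration; only `image_exif`, `date_taken`, the three formats, and `'unixepoch'` come from the description above.

```rust
/// Maps a span token to the strftime format used for calendar matching.
/// Unknown tokens yield `None` so they never reach the SQL text.
fn span_format(span: &str) -> Option<&'static str> {
    match span {
        "day" => Some("%m-%d"),
        "week" => Some("%W"),
        "month" => Some("%m"),
        _ => None,
    }
}

/// Renders a client UTC offset as an SQLite date modifier such as
/// "-300 minutes" or "+330 minutes".
fn tz_modifier(offset_minutes: i32) -> String {
    format!("{:+} minutes", offset_minutes)
}

/// Builds the query text. Assumed parameters: ?1 = library id,
/// ?2 = lower bound (now minus `years_back`), ?3 = now (no future
/// dates), ?4 = timezone modifier from `tz_modifier`.
fn memories_sql(span: &str) -> Option<String> {
    let fmt = span_format(span)?;
    Some(format!(
        "SELECT file_path, date_taken FROM image_exif \
         WHERE library_id = ?1 \
         AND date_taken BETWEEN ?2 AND ?3 \
         AND strftime('{0}', date_taken, 'unixepoch', ?4) \
         = strftime('{0}', ?3, 'unixepoch', ?4)",
        fmt
    ))
}
```

Only the format string, chosen from a fixed whitelist, is interpolated into the SQL; every caller-supplied value stays a bound parameter, so an unrecognized span results in `None` rather than a malformed query.
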
**EXIF Extraction:**
- Uses `kamadak-exif` crate
- Supports: JPEG, TIFF, RAW (NEF, CR2, CR3), HEIF/HEIC, PNG, WebP
@@ -534,6 +581,7 @@ Optional:
```bash
WATCH_QUICK_INTERVAL_SECONDS=60 # Quick scan interval
WATCH_FULL_INTERVAL_SECONDS=3600 # Full scan interval
DATE_BACKFILL_MAX_PER_TICK=500 # Cap on canonical-date drain per watcher tick
OTLP_OTLS_ENDPOINT=http://... # OpenTelemetry collector (release builds)

# AI Insights Configuration
