Drop fs_time from date-backfill eligibility
The drain queried `date_taken IS NULL OR date_taken_source = 'fs_time'`
with `ORDER BY id ASC LIMIT 500` on every watcher tick. The resolver is
deterministic on file bytes + filename + fs metadata, so any row that
landed on fs_time once landed there again on every retry. The drain
therefore spun on the same lowest-id rows in perpetuity, never advancing
to rows 501+ while still logging `more_remain=true`.
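The livelock can be reproduced with a small in-memory model. This is a sketch under stated assumptions, not the project's code: `Row`, `Source`, `resolve`, `drain_tick`, and `make_rows` are hypothetical names, the SQL query is mimicked by scanning rows in id order, and a `has_exif` flag stands in for whatever the real waterfall reads from file bytes, filename, and fs metadata. Rows that fall through to `fs_time` stay eligible under the old predicate, so they keep reclaiming the lowest slots of every batch.

```rust
// Hypothetical in-memory model of the date-backfill drain. Names are
// illustrative only; they are not the project's actual types or functions.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Source {
    Exif,
    FsTime,
}

#[derive(Clone, Debug)]
struct Row {
    id: u64,
    has_exif: bool, // stand-in for "file bytes + filename + fs metadata"
    date_taken: Option<i64>,
    source: Option<Source>,
}

/// Deterministic resolver: the same row always resolves to the same source.
fn resolve(row: &Row) -> (i64, Source) {
    if row.has_exif {
        (1_000 + row.id as i64, Source::Exif)
    } else {
        (2_000 + row.id as i64, Source::FsTime)
    }
}

/// Old predicate: `date_taken IS NULL OR date_taken_source = 'fs_time'`.
fn old_eligible(r: &Row) -> bool {
    r.date_taken.is_none() || r.source == Some(Source::FsTime)
}

/// New predicate: `date_taken IS NULL`.
fn new_eligible(r: &Row) -> bool {
    r.date_taken.is_none()
}

/// One watcher tick. `rows` is kept sorted by id, so taking the first `limit`
/// eligible rows mimics `ORDER BY id ASC LIMIT limit`. Resolves and writes back
/// each picked row, then returns the ids touched this tick.
fn drain_tick(rows: &mut [Row], eligible: fn(&Row) -> bool, limit: usize) -> Vec<u64> {
    let picked: Vec<usize> = (0..rows.len())
        .filter(|&i| eligible(&rows[i]))
        .take(limit)
        .collect();
    for &i in &picked {
        let (date, source) = resolve(&rows[i]);
        rows[i].date_taken = Some(date);
        rows[i].source = Some(source);
    }
    picked.iter().map(|&i| rows[i].id).collect()
}

/// Ten rows with ids 1..=10; even ids have EXIF, odd ids fall through to fs_time.
fn make_rows() -> Vec<Row> {
    (1..=10)
        .map(|id| Row { id, has_exif: id % 2 == 0, date_taken: None, source: None })
        .collect()
}

fn main() {
    let variants: [(&str, fn(&Row) -> bool); 2] = [("old", old_eligible), ("new", new_eligible)];
    for (label, eligible) in variants {
        let mut rows = make_rows();
        for tick in 1..=5 {
            let ids = drain_tick(&mut rows, eligible, 3);
            println!("{label} tick {tick}: {ids:?}");
        }
    }
}
```

With a batch size of 3, the old predicate settles on ids 1, 3, 5 from the third tick onward and never touches ids 6 to 10, while the narrowed predicate drains all ten rows in four ticks and then finds nothing left to do.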
Side effect: 500 auto-commit UPDATEs per tick kept the SQLite write
lock busy long enough that writers on other DAO connections hit the
5s busy_timeout. This surfaced as intermittent HTTP 500s on
PATCH /image/faces/{id} that succeeded on retry.
Narrow the partial index and the query predicate to `date_taken IS NULL`.
If exiftool gets installed or a new filename regex lands, an operator can
re-resolve fs_time rows out-of-band instead of re-introducing the
steady-state churn.
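For the out-of-band pass, one option (a hypothetical approach, not something this commit ships) is a one-shot walk that pages with an id cursor (`id > cursor`) instead of re-running the same predicate from the lowest id. Each `fs_time` row is then visited at most once, and the pass terminates even when most rows land on `fs_time` again. The function name `rerun_fs_time_rows`, the `Row` shape, and the pluggable `resolve` closure are assumptions for illustration, and the model assumes rows are sorted by id.

```rust
// Hypothetical one-shot, out-of-band re-resolve of fs_time rows. Names and
// shapes are illustrative only, not the project's API.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Source {
    Exif,
    FsTime,
}

#[derive(Debug)]
struct Row {
    id: u64,
    date_taken: i64,
    source: Source,
}

/// Re-resolves every `fs_time` row once, `page` rows at a time. The page query
/// is roughly (table and column list elided):
///   SELECT ... WHERE date_taken_source = 'fs_time' AND id > ?cursor
///   ORDER BY id ASC LIMIT ?page
/// Assumes `rows` is sorted by id and ids start at 1.
/// Returns (rows visited, rows moved off fs_time).
fn rerun_fs_time_rows(
    rows: &mut [Row],
    page: usize,
    resolve: impl Fn(&Row) -> (i64, Source),
) -> (usize, usize) {
    let mut cursor = 0u64;
    let (mut visited, mut upgraded) = (0, 0);
    loop {
        let batch: Vec<usize> = (0..rows.len())
            .filter(|&i| rows[i].source == Source::FsTime && rows[i].id > cursor)
            .take(page)
            .collect();
        let Some(&last) = batch.last() else { break };
        for &i in &batch {
            let (date, source) = resolve(&rows[i]);
            if source != Source::FsTime {
                upgraded += 1;
            }
            rows[i].date_taken = date;
            rows[i].source = source;
            visited += 1;
        }
        // Advance past this page whatever the outcome, so rows that land on
        // fs_time again are never re-selected.
        cursor = rows[last].id;
    }
    (visited, upgraded)
}

fn main() {
    // Seven rows currently on fs_time; pretend a newly available tool can now
    // date the files whose ids are divisible by 3.
    let mut rows: Vec<Row> = (1..=7)
        .map(|id| Row { id, date_taken: 0, source: Source::FsTime })
        .collect();
    let (visited, upgraded) = rerun_fs_time_rows(&mut rows, 2, |r| {
        if r.id % 3 == 0 { (100, Source::Exif) } else { (r.date_taken, Source::FsTime) }
    });
    println!("visited {visited}, upgraded {upgraded}");
}
```

Here seven `fs_time` rows are re-resolved two per page; the pretend stronger resolver upgrades ids 3 and 6, and the loop stops once the cursor has passed id 7.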
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md | 23
@@ -398,14 +398,21 @@ date) is unchanged.
 The `backfill_missing_date_taken` drain (`src/backfill.rs`) runs every
 watcher tick alongside `backfill_unhashed_backlog` (also `src/backfill.rs`). It loads up to
 `DATE_BACKFILL_MAX_PER_TICK` rows (default 500) where
-`date_taken IS NULL OR date_taken_source = 'fs_time'` (backed by the
-`idx_image_exif_date_backfill` partial index), runs the waterfall
-batch via `resolve_dates_batch`, and writes results via the
-`backfill_date_taken` DAO method (touches only `date_taken` +
-`date_taken_source` so EXIF / hash / perceptual columns are
-preserved). `filename`-sourced rows are intentionally not re-resolved
-— the regex is authoritative when it matches, and re-running exiftool
-won't change the answer.
+`date_taken IS NULL` (backed by the `idx_image_exif_date_backfill`
+partial index), runs the waterfall batch via `resolve_dates_batch`,
+and writes results via the `backfill_date_taken` DAO method (touches
+only `date_taken` + `date_taken_source` so EXIF / hash / perceptual
+columns are preserved). Resolved rows — including the ones the
+waterfall could only resolve via `fs_time` — are not re-eligible:
+the resolver is deterministic on file bytes + filename + fs metadata,
+so re-running on the same inputs lands on the same source every time.
+An earlier version included `date_taken_source = 'fs_time'` in the
+eligibility predicate, but with `ORDER BY id ASC LIMIT 500` it spun on
+the same lowest-id rows in perpetuity and held the SQLite write lock
+long enough to starve face-PATCH writers (5s busy_timeout → 500). If
+a stronger tool comes online (exiftool install, new filename regex),
+re-resolve out-of-band rather than re-introducing the steady-state
+eligibility.
 
 `/memories` is a single SQL query against this column
 (`get_memories_in_window` in `src/database/mod.rs`), using