faces: backfill no longer stalls on chronic-error files at the front

The content-hash backfill was capped at 500 records/tick AND counted
errors against that cap. So a pocket of files that errored every time
(vanished mid-scan, permission denied, unreadable) at the head of the
exif_records iteration order burned the entire budget every tick and
the rest of the backlog never advanced — surfacing as a face-scan
stuck at e.g. 44% with no progress. Without a content_hash, those
photos never become face-detection candidates, so it looks like
detection is broken when really it's the prerequisite hash that
isn't filling.
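
The failure mode is easy to reproduce in miniature. The sketch below is
a hypothetical stand-in for the budget loop (not the real backfill
code): records are Ok for hashable files and Err for chronic failures,
and the only difference between the two modes is whether errors count
against the per-tick cap.

```rust
// Miniature of the per-tick budget loop. `errors_burn_budget = true`
// models the old behavior (errors count against the cap); `false`
// models the fix (cap on successes only).
fn tick(records: &[Result<(), ()>], cap: usize, errors_burn_budget: bool) -> usize {
    let (mut done, mut errors) = (0usize, 0usize);
    for r in records {
        let spent = if errors_burn_budget { done + errors } else { done };
        if spent >= cap {
            break;
        }
        match r {
            Ok(()) => done += 1,
            Err(()) => errors += 1,
        }
    }
    done
}

fn main() {
    // 500 chronic-error files sitting in front of 1000 good ones, cap = 500.
    let mut records: Vec<Result<(), ()>> = vec![Err(()); 500];
    records.extend(vec![Ok(()); 1000]);
    // Old accounting: the error pocket burns the whole budget, zero progress.
    assert_eq!(tick(&records, 500, true), 0);
    // New accounting: the loop walks past the errors to the working files.
    assert_eq!(tick(&records, 500, false), 500);
    println!("ok");
}
```

With the old accounting the same 500 error records are revisited every
tick forever; with the new one each tick still does at most `cap` hashes
of real work.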

Two fixes:

  - Cap on successes only. Errors still get counted and logged but
    don't burn the per-tick budget; the loop keeps moving past them
    to the working files behind. Errors are bounded by the unhashed
    backlog size (each record walked at most once per tick), so this
    can't run away.

  - Always log the unhashed backlog count when non-zero. Previously
    "stuck at 44%" looked silent from the outside; now every tick
    surfaces "backfilled N/M; K still need backfill" so an operator
    can tell backfill is making progress (or isn't).
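
The logging change can be sketched as a small pure function. This is a
stand-alone miniature mirroring the names in the diff, not the real
function: it stays quiet only when backlog, successes, and errors are
all zero, and clamps "still need backfill" at zero via `saturating_sub`.

```rust
// Returns the log line for one tick, or None when there is nothing to say.
fn log_line(unhashed_total: usize, backfilled: usize, errors: usize) -> Option<String> {
    if unhashed_total == 0 && backfilled == 0 && errors == 0 {
        return None; // fully hashed library: no noise
    }
    // Backlog counted at the start of the tick, minus this tick's successes.
    let remaining = unhashed_total.saturating_sub(backfilled);
    Some(format!(
        "backfilled {}/{}; {} error(s); {} still need backfill",
        backfilled, unhashed_total, errors, remaining
    ))
}

fn main() {
    // Mid-backlog tick: progress and remaining work are both visible.
    assert_eq!(
        log_line(10_000, 2_000, 3).as_deref(),
        Some("backfilled 2000/10000; 3 error(s); 8000 still need backfill")
    );
    // Nothing to do: stays silent.
    assert_eq!(log_line(0, 0, 0), None);
    println!("ok");
}
```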

Also bumps the default cap from 500 to 2000. Hashing is cheap (blake3
+ one DB UPDATE), and 500 was conservative for a personal-scale
library where 10k+ unhashed files is a normal first-run state.
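
The cap override itself is the parse chain visible in the diff. A
self-contained sketch, assuming an environment variable whose name here
is hypothetical: any unset, unparsable, or zero value falls back to the
new 2000 default.

```rust
use std::env;

// Hypothetical variable name; the diff only shows the parse chain.
const CAP_VAR: &str = "FACE_WATCH_HASH_BACKFILL_CAP";

fn hash_backfill_cap() -> usize {
    env::var(CAP_VAR)
        .ok()
        .and_then(|s| s.parse().ok())
        .filter(|n: &usize| *n > 0) // zero would stall the backfill entirely
        .unwrap_or(2000)
}

fn main() {
    env::remove_var(CAP_VAR);
    assert_eq!(hash_backfill_cap(), 2000); // unset: new default
    env::set_var(CAP_VAR, "0");
    assert_eq!(hash_backfill_cap(), 2000); // zero rejected by the filter
    env::set_var(CAP_VAR, "500");
    assert_eq!(hash_backfill_cap(), 500); // explicit override still works
    println!("ok");
}
```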
Cameron Cordes
2026-04-30 00:03:26 +00:00
parent 891a9982ef
commit 16abacf4c5

@@ -2317,12 +2317,26 @@ fn backfill_missing_content_hashes(
         .ok()
         .and_then(|s| s.parse().ok())
         .filter(|n: &usize| *n > 0)
-        .unwrap_or(500);
+        .unwrap_or(2000);
+    // Count the unhashed backlog up front so we can surface "still needs
+    // backfill: N" in the log — without it, a face-scan that's stuck at
+    // 44% looks stalled when really it's chipping through hashes.
+    let unhashed_total = exif_records
+        .iter()
+        .filter(|r| r.content_hash.is_none())
+        .count();
     let mut backfilled = 0usize;
     let mut errors = 0usize;
     for record in &exif_records {
-        if backfilled + errors >= cap {
+        // Cap on successes only — earlier this counted errors too, so a
+        // pocket of chronically-unhashable files at the front of the
+        // table (vanished mid-scan, permission denied, etc.) burned the
+        // budget every tick and the rest of the backlog never advanced.
+        // Errors are still bounded by `unhashed_total` (the loop walks
+        // each unhashed record at most once per tick).
+        if backfilled >= cap {
             break;
         }
         if record.content_hash.is_some() {
@@ -2362,10 +2376,14 @@ fn backfill_missing_content_hashes(
             }
         }
     }
-    if backfilled > 0 || errors > 0 {
+    // Always log when there's an unhashed backlog so an operator
+    // looking at "scan stuck at 44%" can see backfill is running and
+    // how much remains. Quiet only when there's nothing to do.
+    if unhashed_total > 0 || backfilled > 0 || errors > 0 {
+        let remaining = unhashed_total.saturating_sub(backfilled);
         info!(
-            "face_watch: backfilled content_hash for {} file(s) in library '{}' ({} error(s); cap={})",
+            "face_watch: backfilled {}/{} content_hash for library '{}' ({} error(s); {} still need backfill; cap={})",
-            backfilled, library.name, errors, cap
+            backfilled, unhashed_total, library.name, errors, remaining, cap
         );
     }
 }