indexer: prune EXCLUDED_DIRS at WalkDir time, extract enumerate_indexable_files #63
Synology drops @eaDir/.../SYNOFILE_THUMB_*.jpg files alongside every
photo. The face-detect pipeline already filters those out via
face_watch::filter_excluded, but the filter runs after the indexer
has already inserted rows into image_exif. Result: phantom rows whose
content_hash never matches a face_detections row, so the anti-join in
list_unscanned_candidates returns them every tick. They're filtered
out at runtime, no marker is written, and the cycle repeats forever:
log spam, wrong stats denominator, and on a real Synology library the
phantom rows balloon into the hundreds of thousands.
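To make the loop concrete, here is a minimal in-memory sketch of the failure mode. It is not the project's code: ExifRow, unscanned_candidates, is_excluded and run_tick are hypothetical stand-ins for the image_exif table, the anti-join in list_unscanned_candidates, and the runtime filter, with a HashSet playing the role of face_detections.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one `image_exif` row.
#[derive(Debug, Clone, PartialEq)]
pub struct ExifRow {
    pub rel_path: String,
    pub content_hash: String,
}

/// Anti-join: indexed rows whose hash has no entry in the detections set.
pub fn unscanned_candidates<'a>(
    exif: &'a [ExifRow],
    detected: &HashSet<String>,
) -> Vec<&'a ExifRow> {
    exif.iter()
        .filter(|row| !detected.contains(&row.content_hash))
        .collect()
}

/// Runtime exclusion check (assumed shape): true when any path component
/// equals one of the excluded directory names.
pub fn is_excluded(rel_path: &str, excluded_dirs: &[&str]) -> bool {
    rel_path.split('/').any(|part| excluded_dirs.contains(&part))
}

/// One scan tick. Non-excluded candidates get a marker; excluded ones are
/// skipped without a marker, so the anti-join returns them again next tick.
/// Returns how many candidates were skipped.
pub fn run_tick(
    exif: &[ExifRow],
    detected: &mut HashSet<String>,
    excluded_dirs: &[&str],
) -> usize {
    let candidates: Vec<ExifRow> = unscanned_candidates(exif, detected)
        .into_iter()
        .cloned()
        .collect();
    let mut skipped = 0;
    for row in candidates {
        if is_excluded(&row.rel_path, excluded_dirs) {
            skipped += 1; // filtered at runtime, nothing recorded
        } else {
            detected.insert(row.content_hash);
        }
    }
    skipped
}
```

With one real photo and one @eaDir thumbnail indexed, every call to run_tick skips the thumbnail again, because nothing ever removes it from the candidate set. Keeping the row out of the index in the first place is what breaks the cycle.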
Move the exclusion to the WalkDir pass, where filter_entry can prune
whole subtrees instead of walking and discarding leaves. Extract the
pre-existing 30-line walker chain in main.rs::process_new_files into
file_scan::enumerate_indexable_files so it's testable in isolation.
Six tests cover the bug (eadir prune), nested patterns, absolute-under-base
syntax, non-media filtering, modified_since semantics, and forward-slash
rel_path normalization.
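The pruning idea can be sketched without the walkdir crate. The example below swaps filter_entry for a plain std::fs stack-based walk so that it stays self-contained; the effect is the same in that an excluded directory is never descended into. enumerate_pruned, is_excluded_dir and is_media are hypothetical names, not the signature of enumerate_indexable_files, and the sketch matches directory names only: it does not handle absolute-under-base patterns or modified_since.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Assumed exclusion rule: the directory's own name is in `excluded_dirs`.
fn is_excluded_dir(path: &Path, excluded_dirs: &[&str]) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| excluded_dirs.contains(&name))
}

/// Case-insensitive extension check against an assumed media extension list.
fn is_media(path: &Path, media_exts: &[&str]) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| media_exts.iter().any(|m| m.eq_ignore_ascii_case(ext)))
}

/// Walk `base`, skipping excluded directories before descending into them,
/// and return media files as forward-slash paths relative to `base`.
pub fn enumerate_pruned(
    base: &Path,
    excluded_dirs: &[&str],
    media_exts: &[&str],
) -> io::Result<Vec<String>> {
    let mut out = Vec::new();
    let mut stack: Vec<PathBuf> = vec![base.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                // Prune here: the whole subtree is skipped, not just its leaves.
                if !is_excluded_dir(&path, excluded_dirs) {
                    stack.push(path);
                }
            } else if file_type.is_file() && is_media(&path, media_exts) {
                if let Ok(rel) = path.strip_prefix(base) {
                    // Join components with '/' regardless of platform separator.
                    let parts: Vec<String> = rel
                        .components()
                        .map(|c| c.as_os_str().to_string_lossy().into_owned())
                        .collect();
                    out.push(parts.join("/"));
                }
            }
        }
    }
    out.sort();
    Ok(out)
}
```

Checking the exclusion at the directory entry rather than on each file is the point of the change: on a Synology share every photo directory carries an @eaDir sibling tree, so pruning saves the walk as well as the inserts.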
Out of scope (other WalkDir callers in main.rs that don't yet apply
EXCLUDED_DIRS — thumbnail gen at 1309, media scan at 1377, video
playlist scan at 1685, and two nested walks at 1709 / 1743): separate
audit PR.
Operator note: existing phantom rows still need a one-shot cleanup:
    DELETE FROM face_detections WHERE content_hash IN (
        SELECT content_hash FROM image_exif
        WHERE rel_path LIKE '%/@eaDir/%' OR rel_path LIKE '@eaDir/%'
    );
    DELETE FROM image_exif WHERE rel_path LIKE '%/@eaDir/%' OR rel_path LIKE '@eaDir/%';
Run before attaching a fresh Synology-sourced library.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a48744c7ad to f50655fb21