indexer: prune EXCLUDED_DIRS at WalkDir time, extract enumerate_indexable_files

Synology drops `@eaDir/.../SYNOFILE_THUMB_*.jpg` files alongside every
photo. The face-detect pipeline already filters those out via
`face_watch::filter_excluded`, but the filter runs *after* the indexer
has already inserted rows into `image_exif`. Result: phantom rows whose
content_hash never matches a `face_detections` row, so the anti-join in
`list_unscanned_candidates` returns them every tick. They're filtered
out at runtime, no marker is written, and the cycle repeats forever —
log spam, wrong stats denominator, and on a real Synology library the
phantom rows balloon into the hundreds of thousands.
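
The loop described above can be reproduced with a small in-memory model. This is a sketch only: `ExifRow`, `list_unscanned`, `is_excluded`, and `tick` are hypothetical stand-ins for the `image_exif` table, the `list_unscanned_candidates` anti-join, `face_watch::filter_excluded`, and one watcher tick. None of them reflect the real signatures.

```rust
use std::collections::HashSet;

/// One indexed row: (rel_path, content_hash). Hypothetical stand-in for `image_exif`.
type ExifRow = (&'static str, &'static str);

/// Anti-join: every indexed row whose hash has no entry in `detections`.
fn list_unscanned(exif: &[ExifRow], detections: &HashSet<&'static str>) -> Vec<ExifRow> {
    exif.iter()
        .copied()
        .filter(|(_, hash)| !detections.contains(hash))
        .collect()
}

/// Assumed runtime filter: anything under an `@eaDir` path component is excluded.
fn is_excluded(rel_path: &str) -> bool {
    rel_path.split('/').any(|component| component == "@eaDir")
}

/// One tick: mark real candidates as scanned, skip excluded ones.
/// Returns how many candidates were skipped without a marker.
fn tick(exif: &[ExifRow], detections: &mut HashSet<&'static str>) -> usize {
    let mut skipped = 0;
    for (rel_path, hash) in list_unscanned(exif, detections) {
        if is_excluded(rel_path) {
            // No marker is written, so the anti-join returns this row again next tick.
            skipped += 1;
        } else {
            detections.insert(hash);
        }
    }
    skipped
}
```

Calling `tick` repeatedly on a table that contains an `@eaDir` row keeps returning the same non-zero skip count, which is the repeating cycle this commit removes by never indexing those rows.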

Move the exclusion to the WalkDir pass, where `filter_entry` can prune
whole subtrees instead of walking and discarding leaves. Extract the
pre-existing 30-line walker chain in main.rs::process_new_files into
`file_scan::enumerate_indexable_files` so it's testable in isolation.
Six tests cover the bug (`@eaDir` prune), nested patterns, absolute-under-base
syntax, non-media filtering, modified_since semantics, and forward-slash
rel_path normalization.
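
A rough, std-only illustration of pruning at walk time, as opposed to walking everything and discarding leaves afterwards. The real helper relies on WalkDir's `filter_entry`; this sketch swaps in a hand-rolled `std::fs::read_dir` recursion so it has no dependencies. `is_media`, `walk`, and `enumerate_media` are hypothetical names. Exclusion here is by bare directory name only (no nested or absolute-under-base patterns), `modified_since` is omitted, and I/O errors are propagated rather than skipped.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Assumed media check by extension; the real `is_image` / `is_video` are not shown here.
fn is_media(path: &Path) -> bool {
    matches!(
        path.extension()
            .and_then(|ext| ext.to_str())
            .map(|ext| ext.to_ascii_lowercase())
            .as_deref(),
        Some("jpg" | "jpeg" | "png" | "mp4" | "mov")
    )
}

/// Collect media files under `dir`, never descending into a directory
/// whose name appears in `excluded`.
fn walk(
    base: &Path,
    dir: &Path,
    excluded: &[&str],
    out: &mut Vec<(PathBuf, String)>,
) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            let name = entry.file_name();
            if excluded.iter().any(|ex| name.to_str() == Some(*ex)) {
                continue; // prune: the whole subtree is skipped, not just its leaves
            }
            walk(base, &path, excluded, out)?;
        } else if file_type.is_file() && is_media(&path) {
            // Forward-slash rel_path regardless of OS, as in the original walker.
            if let Some(rel) = path.strip_prefix(base).ok().and_then(|p| p.to_str()) {
                out.push((path.clone(), rel.replace('\\', "/")));
            }
        }
    }
    Ok(())
}

/// Entry point of the sketch: returns (absolute path, rel_path) pairs, sorted.
fn enumerate_media(base: &Path, excluded: &[&str]) -> io::Result<Vec<(PathBuf, String)>> {
    let mut out = Vec::new();
    walk(base, base, excluded, &mut out)?;
    out.sort();
    Ok(out)
}
```

Because the `continue` happens on the directory entry itself, nothing below an excluded directory is ever read, which is the same property `filter_entry` gives the WalkDir-based helper.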

Out of scope: the other WalkDir callers in main.rs that don't yet apply
EXCLUDED_DIRS (thumbnail gen at 1309, media scan at 1377, video
playlist scan at 1685, and two nested walks at 1709 / 1743). Those go
in a separate audit PR.

Operator note: existing phantom rows still need a one-shot cleanup:

    DELETE FROM face_detections WHERE content_hash IN (
        SELECT content_hash FROM image_exif WHERE rel_path LIKE '%/@eaDir/%'
    );
    DELETE FROM image_exif WHERE rel_path LIKE '%/@eaDir/%' OR rel_path LIKE '@eaDir/%';

Run before attaching a fresh Synology-sourced library.
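
To preview which `rel_path` values the second `DELETE` would hit, the two `LIKE` patterns can be approximated in code. `matches_eadir_cleanup` is a hypothetical helper, not part of this commit, and it compares case-sensitively; some databases (SQLite by default, for ASCII) treat `LIKE` as case-insensitive, so treat this as an approximation.

```rust
/// Approximates `rel_path LIKE '%/@eaDir/%' OR rel_path LIKE '@eaDir/%'`.
/// Case-sensitive, unlike LIKE in some databases.
fn matches_eadir_cleanup(rel_path: &str) -> bool {
    // '%/@eaDir/%' matches an @eaDir component anywhere below the root;
    // '@eaDir/%' matches an @eaDir directory directly at the library root.
    rel_path.contains("/@eaDir/") || rel_path.starts_with("@eaDir/")
}
```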
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
36
src/main.rs
@@ -1974,37 +1974,11 @@ fn process_new_files(
     let thumbnail_directory = Path::new(&thumbs);
     let base_path = Path::new(&library.root_path);
 
-    // Collect all image and video files, optionally filtered by modification time
-    let files: Vec<(PathBuf, String)> = WalkDir::new(base_path)
-        .into_iter()
-        .filter_map(|entry| entry.ok())
-        .filter(|entry| entry.file_type().is_file())
-        .filter(|entry| {
-            // Filter by modification time if specified
-            if let Some(since) = modified_since {
-                if let Ok(metadata) = entry.metadata()
-                    && let Ok(modified) = metadata.modified()
-                {
-                    return modified >= since;
-                }
-                // If we can't get metadata, include the file to be safe
-                return true;
-            }
-            true
-        })
-        .filter(|entry| is_image(entry) || is_video(entry))
-        .filter_map(|entry| {
-            let file_path = entry.path().to_path_buf();
-            // Canonical rel_path is forward-slash regardless of OS so DB
-            // comparisons against the batch EXIF lookup line up.
-            let relative_path = file_path
-                .strip_prefix(base_path)
-                .ok()?
-                .to_str()?
-                .replace('\\', "/");
-            Some((file_path, relative_path))
-        })
-        .collect();
+    // Walk, prune EXCLUDED_DIRS subtrees, and apply image/video + modified_since
+    // filters. See `file_scan` for why exclusion has to happen at WalkDir
+    // time (filter_entry) rather than at face-detect time.
+    let files: Vec<(PathBuf, String)> =
+        image_api::file_scan::enumerate_indexable_files(base_path, excluded_dirs, modified_since);
 
     if files.is_empty() {
         debug!("No files to process");