file_types: filter macOS AppleDouble + .DS_Store from media predicates #99
Symptom: Apollo's logs showed bursts of 422 decode_failed from
ImageApi's CLIP backfill, e.g. for ._DSC_2182-S.jpg. macOS writes
._<name> AppleDouble sidecars when copying to non-HFS volumes
(SMB, FAT, exFAT), and they carry the original file's extension
even though their bytes are extended-attribute metadata, not the
image. ImageApi's walker matched them via the extension predicate,
sent them through the ingest pipeline, and accumulated failed rows
in face_detections + clip_embedding while pinning Apollo's eviction
timer with the 422 burst.
Fix: predicate-level guard in is_image_file / is_video_file (and
by inheritance is_media_file). Every walker that already gates on
these (face_watch, backfill, clip_watch, watcher, files,
probe_clip_search) inherits the skip without per-callsite edits.
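
A minimal sketch of what such a predicate-level guard could look like, assuming the predicates take a path and check the extension. The helper names is_fs_metadata and has_extension and the extension lists are hypothetical, not taken from the actual change:

```rust
use std::path::Path;

/// True for macOS filesystem metadata: AppleDouble sidecars (`._*`) and the
/// exact `.DS_Store` basename. Other dotfiles are deliberately left alone.
/// Names that are not valid UTF-8 are treated as "not metadata" here.
fn is_fs_metadata(path: &Path) -> bool {
    match path.file_name().and_then(|n| n.to_str()) {
        Some(name) => name.starts_with("._") || name == ".DS_Store",
        None => false,
    }
}

/// Hypothetical extension check (case-insensitive); the real extension
/// lists are not shown in this description.
fn has_extension(path: &Path, allowed: &[&str]) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .map(|e| allowed.iter().any(|a| a.eq_ignore_ascii_case(e)))
        .unwrap_or(false)
}

fn is_image_file(path: &Path) -> bool {
    // The guard runs first, so a sidecar never reaches the extension check.
    !is_fs_metadata(path) && has_extension(path, &["jpg", "jpeg", "png"])
}

fn is_video_file(path: &Path) -> bool {
    !is_fs_metadata(path) && has_extension(path, &["mp4", "mov"])
}

/// Inherits the skip because it only composes the two predicates above.
fn is_media_file(path: &Path) -> bool {
    is_image_file(path) || is_video_file(path)
}
```

Under these assumptions, /photos/._DSC_2182-S.jpg is rejected while /albums/.cover.jpg still counts as an image, and callers that gate on is_media_file need no changes.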
Narrow scope on purpose: the ._* prefix plus the exact .DS_Store
basename, rather than blanket dotfile filtering, because a user
could plausibly name a cover image .cover.jpg.

Existing rows are not cleaned by this change. To purge what
already accumulated (one-shot, run from your DB shell after
deploying):
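-- '_' is a single-character wildcard in LIKE, so it is escaped with '!'.
-- This matches only ._* sidecars and .DS_Store, not dotfiles like .cover.jpg.
DELETE FROM image_exif
WHERE file_path LIKE '%/.!_%' ESCAPE '!' OR file_path LIKE '%/.DS_Store';
DELETE FROM face_detections
WHERE rel_path LIKE '%/.!_%' ESCAPE '!' OR rel_path LIKE '%/.DS_Store';
DELETE FROM tagged_photo
WHERE file_path LIKE '%/.!_%' ESCAPE '!' OR file_path LIKE '%/.DS_Store';
DELETE FROM favorites
WHERE path LIKE '%/.!_%' ESCAPE '!' OR path LIKE '%/.DS_Store';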
The maintenance pipeline's missing-file scan would NOT catch these
on its own — the files exist on disk (they're real macOS metadata,
just not images), so stat() returns Ok and the row sticks.
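
To illustrate why an existence-based scan keeps these rows, here is a small sketch. The function row_is_stale is a hypothetical stand-in for the maintenance pipeline's check, which is not shown in this description; it assumes a row is dropped only when stat on its path fails:

```rust
use std::fs;
use std::path::Path;

/// Hypothetical stand-in for the missing-file check: a row counts as stale
/// only when stat fails. An AppleDouble sidecar exists on disk, so it passes.
fn row_is_stale(path: &Path) -> bool {
    fs::metadata(path).is_err()
}
```

A sidecar such as ._DSC_2182-S.jpg is a real file, so row_is_stale returns false for it and the row survives the scan. That is why the one-shot DELETE statements are needed.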