RAW preview: exiftool fallback for MakerNote / SubIFD previews

kamadak-exif's In::PRIMARY / In::THUMBNAIL only address IFD0 and IFD1.
On modern Nikon NEFs the full-res review JPEG lives in the MakerNote's
PreviewIFD (and many Canon CR2s / DNGs put theirs in a SubIFD chain) —
both unreachable through the existing reader, so the previous patch
still produced no preview for those files and the pipeline fell through
to ffmpeg, which writes black frames when it can't decode the RAW.

Add a slow-path layer in extract_embedded_jpeg_preview that shells out
to exiftool for PreviewImage / JpgFromRaw / OtherImage (one process per
tag). All candidates from both layers are pooled and the largest valid
JPEG wins. exiftool not on PATH degrades to fast-path-only behavior
rather than breaking — the fallback is a strict superset.

Documented the new optional dependency in README.md and CLAUDE.md with
install commands for apt / brew / winget / choco.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cameron Cordes
2026-04-28 17:13:36 +00:00
parent 00b3c80141
commit 6521a328bf
3 changed files with 119 additions and 19 deletions


@@ -111,6 +111,15 @@ All database access goes through trait-based DAOs (e.g., `ExifDao`, `SqliteExifD
2. Creates 200x200 thumbnails in THUMBNAILS directory (mirrors source structure)
3. Videos: extracts frame at 3-second mark via ffmpeg
4. Images: uses `image` crate for JPEG/PNG processing
5. RAW formats (NEF/CR2/ARW/DNG/etc.): the `image` crate can't decode RAW
pixel data, so the pipeline pulls an embedded JPEG preview instead. Fast
path is `exif::read_jpeg_at_ifd` against IFD0 (PRIMARY) and IFD1
(THUMBNAIL) — covers most older bodies and DNGs. Slow-path fallback shells
out to **`exiftool`** for `PreviewImage` / `JpgFromRaw` / `OtherImage`,
which reaches MakerNote / SubIFD-hosted previews kamadak-exif can't see
(e.g. Nikon's `PreviewIFD`, where modern Nikon bodies store the full-res
review JPEG). All candidates are pooled and the largest valid JPEG wins.
See `src/exif.rs::extract_embedded_jpeg_preview`.
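
The pooling rule in step 5 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src/exif.rs`: the helper names `looks_like_jpeg` and `largest_valid_jpeg` are hypothetical, and "valid" is reduced here to "starts with the JPEG SOI marker `FF D8`".

```rust
/// JPEG streams begin with the two-byte SOI (start of image) marker.
const JPEG_SOI: [u8; 2] = [0xFF, 0xD8];

/// Hypothetical helper: true when `bytes` begins with the SOI marker.
fn looks_like_jpeg(bytes: &[u8]) -> bool {
    bytes.starts_with(&JPEG_SOI)
}

/// Hypothetical helper: pool the candidates from every layer, drop the
/// missing or non-JPEG ones, and keep the largest by byte length.
fn largest_valid_jpeg<I>(candidates: I) -> Option<Vec<u8>>
where
    I: IntoIterator<Item = Option<Vec<u8>>>,
{
    candidates
        .into_iter()
        .flatten() // discard layers that produced nothing
        .filter(|bytes| looks_like_jpeg(bytes))
        .max_by_key(|bytes| bytes.len())
}
```

One design consequence worth knowing: `Iterator::max_by_key` returns the last of several equally large elements, so when two layers return previews of identical length the later candidate in the list is the one kept.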
**File Watching:**
Runs in background thread with two-tier strategy:
@@ -364,6 +373,8 @@ Configurable env:
## Dependencies of Note
### Rust crates
- **actix-web**: HTTP framework
- **diesel**: ORM for SQLite
- **jsonwebtoken**: JWT implementation
@@ -374,3 +385,18 @@ Configurable env:
- **opentelemetry**: Distributed tracing
- **bcrypt**: Password hashing
- **infer**: Magic number file type detection
### External binaries (must be on `PATH`)
- **`ffmpeg`** — video thumbnail extraction (`StreamActor`, HLS pipeline) and
the HEIF/HEIC/NEF/ARW thumbnail fallback in `generate_image_thumbnail_ffmpeg`.
Required for any deploy that holds video or HEIF files.
- **`exiftool`** — optional but strongly recommended for RAW-heavy libraries.
The thumbnail pipeline shells out to it as the slow-path fallback for
embedded preview extraction (Nikon MakerNote `PreviewIFD`, Canon SubIFDs,
etc. — anything kamadak-exif's IFD0/IFD1 readers can't reach). Without
exiftool installed, RAWs whose preview lives outside IFD0/IFD1 will fall
through to ffmpeg, which often produces black thumbnails. Install via
package manager: `apt install libimage-exiftool-perl`,
`brew install exiftool`, `winget install OliverBetz.ExifTool`, or
`choco install exiftool`.
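
Because `exiftool` is optional, a deploy may want to check for it up front. A minimal sketch, assuming a hypothetical `binary_available` helper that is not part of this repository; it relies only on `exiftool -ver`, which prints the installed version:

```rust
use std::process::{Command, Stdio};

/// Hypothetical helper: true if `program` can be spawned from PATH and
/// exits successfully when given `probe_arg`. All output is discarded.
fn binary_available(program: &str, probe_arg: &str) -> bool {
    Command::new(program)
        .arg(probe_arg)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        // A spawn error (for example "not found") counts as unavailable.
        .unwrap_or(false)
}

/// Hypothetical convenience wrapper for the optional dependency.
fn exiftool_available() -> bool {
    binary_available("exiftool", "-ver")
}
```

Calling `exiftool_available()` once at startup and logging the result would make a fast-path-only deploy visible, instead of leaving it to be inferred from black thumbnails.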


@@ -28,14 +28,31 @@ Builds used in development: the `gyan.dev` full build on Windows, and distro `ff
packages on Linux work fine. If HEIC thumbnails silently fail, check
`ffmpeg -formats | grep heif` to confirm HEIF support.
-### RAW photo thumbnails (no extra dependency)
+### RAW photo thumbnails
RAW formats (ARW, NEF, CR2, CR3, DNG, RAF, ORF, RW2, PEF, SRW, TIFF) are thumbnailed
-by reading the embedded JPEG preview from the TIFF IFD1 using `kamadak-exif`. No
-external RAW decoder (libraw / dcraw) is required. Files without an embedded preview
-fall back to ffmpeg (works for most NEF files), and anything that still can't be
-decoded is marked with a `<thumb>.unsupported` sentinel in the thumbnail directory
-so we don't retry it every scan. Delete those sentinels to force retries after a
-tooling upgrade.
+by reading an embedded JPEG preview out of the TIFF container; no external RAW
+decoder (libraw / dcraw) is involved. The pipeline tries two layers in order and
+keeps the largest valid JPEG:
+
+1. **Fast path (no extra dependency)**: `kamadak-exif` reads
+   `JPEGInterchangeFormat` from IFD0 / IFD1 directly. Covers older bodies and
+   most DNGs.
+2. **`exiftool` fallback (recommended for RAW-heavy libraries)**: shells out
+   to extract `PreviewImage` / `JpgFromRaw` / `OtherImage`, which reaches
+   MakerNote and SubIFD-hosted previews kamadak-exif can't see (e.g. Nikon's
+   `PreviewIFD`, where modern Nikon bodies stash the full-res review JPEG).
+   If `exiftool` isn't on `PATH` this layer is skipped silently and only the
+   fast-path result is used.
+Install `exiftool` via your package manager:
+- macOS: `brew install exiftool`
+- Linux (Debian/Ubuntu): `apt install libimage-exiftool-perl`
+- Windows: `winget install OliverBetz.ExifTool` or `choco install exiftool`
+Files where neither layer produces a valid preview fall back to ffmpeg. Anything
+that still can't be decoded is marked with a `<thumb>.unsupported` sentinel in
+the thumbnail directory so we don't retry it every scan. Delete those sentinels
+(and any cached black thumbnails) to force retries after a tooling upgrade.
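
Clearing the sentinels after a tooling upgrade could be scripted along these lines. This is a sketch under assumptions: `clear_unsupported_sentinels` is a hypothetical helper, not something the project ships, and it assumes every sentinel's file name ends in `.unsupported`. It does not touch cached black thumbnails, which have no marker to detect them by.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: recursively delete every `*.unsupported` file
/// under `dir` and return how many were removed.
fn clear_unsupported_sentinels(dir: &Path) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // `file_type` does not follow symlinks, so link cycles are not walked.
        if entry.file_type()?.is_dir() {
            removed += clear_unsupported_sentinels(&path)?;
        } else if path.extension().map_or(false, |ext| ext == "unsupported") {
            fs::remove_file(&path)?;
            removed += 1;
        }
    }
    Ok(removed)
}
```

Pointing it at the thumbnail directory removes the markers in every mirrored subfolder, so the next scan retries those files.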
## Environment
There are a handful of required environment variables to have the API run.


@@ -1,6 +1,7 @@
use std::fs::File;
use std::io::{BufReader, Read, Seek, SeekFrom};
use std::path::Path;
use std::process::Command;
use anyhow::{Result, anyhow};
use exif::{In, Reader, Tag, Value};
@@ -70,6 +71,55 @@ fn read_jpeg_at_ifd(exif: &exif::Exif, path: &Path, ifd: In) -> Option<Vec<u8>>
    Some(buf)
}
/// Tags exiftool exposes for embedded JPEG previews, in priority order. The
/// largest valid JPEG returned by any of them wins. Different camera makers
/// stash their largest preview under different names: Nikon's full-res
/// preview lives under `PreviewImage` in the MakerNote `PreviewIFD`, Canon /
/// Sony often expose theirs as `JpgFromRaw`, and `OtherImage` is a catch-all
/// some sub-IFD chains use.
const EXIFTOOL_PREVIEW_TAGS: &[&str] = &["PreviewImage", "JpgFromRaw", "OtherImage"];
/// Shell out to `exiftool -b -<tag>` for one tag. Returns the response bytes
/// only if exiftool succeeded AND the bytes start with the JPEG SOI marker
/// (some MakerNote tags hold TIFF-wrapped previews or other non-JPEG payloads
/// we can't load).
fn extract_exiftool_tag(path: &Path, tag: &str) -> Option<Vec<u8>> {
    let output = Command::new("exiftool")
        .arg("-b")
        .arg(format!("-{}", tag))
        .arg(path)
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    let bytes = output.stdout;
    if bytes.len() < 2 || bytes[0] != 0xFF || bytes[1] != 0xD8 {
        return None;
    }
    Some(bytes)
}
/// Try each EXIFTOOL_PREVIEW_TAGS in turn and return the largest valid JPEG.
/// If `exiftool` isn't on PATH, every spawn fails, each tag yields `None`,
/// and this returns `None`; callers fall back to whatever the IFD0/IFD1
/// fast path found.
fn extract_preview_via_exiftool(path: &Path) -> Option<Vec<u8>> {
    let mut best: Option<Vec<u8>> = None;
    for &tag in EXIFTOOL_PREVIEW_TAGS {
        let Some(bytes) = extract_exiftool_tag(path, tag) else {
            continue;
        };
        match &best {
            None => best = Some(bytes),
            Some(b) if b.len() < bytes.len() => best = Some(bytes),
            _ => {}
        }
    }
    best
}
/// Returns the bytes of the embedded JPEG preview in a TIFF-based RAW or
/// TIFF file. Used to thumbnail formats whose RAW pixel data can't be decoded
/// by our normal tools (e.g. Sony ARW), and to serve a usable full-size
@@ -77,12 +127,20 @@ fn read_jpeg_at_ifd(exif: &exif::Exif, path: &Path, ifd: In) -> Option<Vec<u8>>
/// `None` if no preview is present, the file isn't a TIFF container, or the
/// data doesn't look like a valid JPEG.
///
-/// Both IFD0 (PRIMARY) and IFD1 (THUMBNAIL) are checked, preferring the
-/// larger valid JPEG. Conventions vary by camera: most modern Nikon NEFs
-/// expose the larger reduced-resolution preview (~12 MP) via IFD0 and a
-/// small chip via IFD1; some bodies leave one or the other empty or zero-
-/// length, and an earlier THUMBNAIL-only implementation produced black
-/// thumbnails for any NEF whose IFD1 thumbnail was missing or corrupted.
+/// Strategy:
+/// 1. Fast path: read `JPEGInterchangeFormat` from IFD0 (PRIMARY) and IFD1
+///    (THUMBNAIL) directly via kamadak-exif. No subprocess, no external
+///    dependency.
+/// 2. Slow path: shell out to `exiftool -b -<tag>` for each of
+///    `PreviewImage` / `JpgFromRaw` / `OtherImage`. kamadak-exif can't
+///    reach SubIFDs or MakerNote sub-IFDs, but most modern Nikon bodies
+///    stash their large preview JPEG in the Nikon MakerNote's PreviewIFD;
+///    Canon / Sony often use `JpgFromRaw` in a SubIFD chain. Skipped
+///    gracefully if exiftool isn't on PATH.
+///
+/// All candidates are pooled and the largest valid JPEG wins, so a deploy
+/// without exiftool degrades to "fast-path only" behavior rather than
+/// breaking outright.
pub fn extract_embedded_jpeg_preview(path: &Path) -> Option<Vec<u8>> {
    if !is_tiff_raw(path) {
        return None;
@@ -94,13 +152,12 @@ pub fn extract_embedded_jpeg_preview(path: &Path) -> Option<Vec<u8>> {
    let primary = read_jpeg_at_ifd(&exif, path, In::PRIMARY);
    let thumbnail = read_jpeg_at_ifd(&exif, path, In::THUMBNAIL);
+    let exiftool = extract_preview_via_exiftool(path);
-    match (primary, thumbnail) {
-        (Some(p), Some(t)) => Some(if p.len() >= t.len() { p } else { t }),
-        (Some(p), None) => Some(p),
-        (None, Some(t)) => Some(t),
-        (None, None) => None,
-    }
+    [primary, thumbnail, exiftool]
+        .into_iter()
+        .flatten()
+        .max_by_key(|v| v.len())
}
pub fn supports_exif(path: &Path) -> bool {