feat/raw-thumb-embedded-preview #59

Merged
cameron merged 2 commits from feat/raw-thumb-embedded-preview into master 2026-04-28 17:21:29 +00:00
4 changed files with 166 additions and 26 deletions

View File

@@ -111,6 +111,15 @@ All database access goes through trait-based DAOs (e.g., `ExifDao`, `SqliteExifD
2. Creates 200x200 thumbnails in THUMBNAILS directory (mirrors source structure) 2. Creates 200x200 thumbnails in THUMBNAILS directory (mirrors source structure)
3. Videos: extracts frame at 3-second mark via ffmpeg 3. Videos: extracts frame at 3-second mark via ffmpeg
4. Images: uses `image` crate for JPEG/PNG processing 4. Images: uses `image` crate for JPEG/PNG processing
5. RAW formats (NEF/CR2/ARW/DNG/etc.): the `image` crate can't decode RAW
pixel data, so the pipeline pulls an embedded JPEG preview instead. Fast
path is `exif::read_jpeg_at_ifd` against IFD0 (PRIMARY) and IFD1
(THUMBNAIL) — covers most older bodies and DNGs. Slow-path fallback shells
out to **`exiftool`** for `PreviewImage` / `JpgFromRaw` / `OtherImage`,
which reaches MakerNote / SubIFD-hosted previews kamadak-exif can't see
(e.g. Nikon's `PreviewIFD`, where modern Nikon bodies store the full-res
review JPEG). All candidates are pooled and the largest valid JPEG wins.
See `src/exif.rs::extract_embedded_jpeg_preview`.
**File Watching:** **File Watching:**
Runs in background thread with two-tier strategy: Runs in background thread with two-tier strategy:
@@ -364,6 +373,8 @@ Configurable env:
## Dependencies of Note ## Dependencies of Note
### Rust crates
- **actix-web**: HTTP framework - **actix-web**: HTTP framework
- **diesel**: ORM for SQLite - **diesel**: ORM for SQLite
- **jsonwebtoken**: JWT implementation - **jsonwebtoken**: JWT implementation
@@ -374,3 +385,18 @@ Configurable env:
- **opentelemetry**: Distributed tracing - **opentelemetry**: Distributed tracing
- **bcrypt**: Password hashing - **bcrypt**: Password hashing
- **infer**: Magic number file type detection - **infer**: Magic number file type detection
### External binaries (must be on `PATH`)
- **`ffmpeg`** — video thumbnail extraction (`StreamActor`, HLS pipeline) and
the HEIF/HEIC/NEF/ARW thumbnail fallback in `generate_image_thumbnail_ffmpeg`.
Required for any deploy that holds video or HEIF files.
- **`exiftool`** — optional but strongly recommended for RAW-heavy libraries.
The thumbnail pipeline shells out to it as the slow-path fallback for
embedded preview extraction (Nikon MakerNote `PreviewIFD`, Canon SubIFDs,
etc. — anything kamadak-exif's IFD0/IFD1 readers can't reach). Without
exiftool installed, RAWs whose preview lives outside IFD0/IFD1 will fall
through to ffmpeg, which often produces black thumbnails. Install via
package manager: `apt install libimage-exiftool-perl`,
`brew install exiftool`, `winget install OliverBetz.ExifTool`, or
`choco install exiftool`.

View File

@@ -28,14 +28,31 @@ Builds used in development: the `gyan.dev` full build on Windows, and distro `ff
packages on Linux work fine. If HEIC thumbnails silently fail, check packages on Linux work fine. If HEIC thumbnails silently fail, check
`ffmpeg -formats | grep heif` to confirm HEIF support. `ffmpeg -formats | grep heif` to confirm HEIF support.
### RAW photo thumbnails (no extra dependency) ### RAW photo thumbnails
RAW formats (ARW, NEF, CR2, CR3, DNG, RAF, ORF, RW2, PEF, SRW, TIFF) are thumbnailed RAW formats (ARW, NEF, CR2, CR3, DNG, RAF, ORF, RW2, PEF, SRW, TIFF) are thumbnailed
by reading the embedded JPEG preview from the TIFF IFD1 using `kamadak-exif`. No by reading an embedded JPEG preview out of the TIFF container — no external RAW
external RAW decoder (libraw / dcraw) is required. Files without an embedded preview decoder (libraw / dcraw) is involved. The pipeline tries two layers in order and
fall back to ffmpeg (works for most NEF files), and anything that still can't be keeps the largest valid JPEG:
decoded is marked with a `<thumb>.unsupported` sentinel in the thumbnail directory
so we don't retry it every scan. Delete those sentinels to force retries after a 1. **Fast path (no extra dependency)**`kamadak-exif` reads
tooling upgrade. `JPEGInterchangeFormat` from IFD0 / IFD1 directly. Covers older bodies and
most DNGs.
2. **`exiftool` fallback (recommended for RAW-heavy libraries)** — shells out
to extract `PreviewImage` / `JpgFromRaw` / `OtherImage`, which reaches
MakerNote and SubIFD-hosted previews kamadak-exif can't see (e.g. Nikon's
`PreviewIFD`, where modern Nikon bodies stash the full-res review JPEG).
If `exiftool` isn't on `PATH` this layer is skipped silently and only the
fast-path result is used.
Install `exiftool` via your package manager:
- macOS: `brew install exiftool`
- Linux (Debian/Ubuntu): `apt install libimage-exiftool-perl`
- Windows: `winget install OliverBetz.ExifTool` or `choco install exiftool`
Files where neither layer produces a valid preview fall back to ffmpeg. Anything
that still can't be decoded is marked with a `<thumb>.unsupported` sentinel in
the thumbnail directory so we don't retry it every scan. Delete those sentinels
(and any cached black thumbnails) to force retries after a tooling upgrade.
## Environment ## Environment
There are a handful of required environment variables to have the API run. There are a handful of required environment variables to have the API run.

View File

@@ -1,6 +1,7 @@
use std::fs::File; use std::fs::File;
use std::io::{BufReader, Read, Seek, SeekFrom}; use std::io::{BufReader, Read, Seek, SeekFrom};
use std::path::Path; use std::path::Path;
use std::process::Command;
use anyhow::{Result, anyhow}; use anyhow::{Result, anyhow};
use exif::{In, Reader, Tag, Value}; use exif::{In, Reader, Tag, Value};
@@ -28,7 +29,7 @@ pub struct ExifData {
/// TIFF-based RAW formats where `JPEGInterchangeFormat` offsets are /// TIFF-based RAW formats where `JPEGInterchangeFormat` offsets are
/// absolute file offsets (the file itself is a TIFF container). /// absolute file offsets (the file itself is a TIFF container).
fn is_tiff_raw(path: &Path) -> bool { pub fn is_tiff_raw(path: &Path) -> bool {
matches!( matches!(
path.extension() path.extension()
.and_then(|e| e.to_str()) .and_then(|e| e.to_str())
@@ -40,26 +41,18 @@ fn is_tiff_raw(path: &Path) -> bool {
) )
} }
/// Returns the bytes of the embedded JPEG thumbnail in a TIFF-based RAW or /// Read the JPEG bytes pointed to by `JPEGInterchangeFormat` /
/// TIFF file. Used to thumbnail formats whose RAW pixel data can't be decoded /// `JPEGInterchangeFormatLength` in a single IFD. Returns `None` on any
/// by our normal tools (e.g. Sony ARW). Returns `None` if no preview is /// failure: tags missing, length zero, file read failure, or bytes that
/// present, the file isn't a TIFF container, or the data doesn't look like /// don't start with the JPEG SOI marker (some MakerNote pointers reference
/// a valid JPEG. /// TIFF-wrapped previews or other non-JPEG payloads we can't load).
pub fn extract_embedded_jpeg_preview(path: &Path) -> Option<Vec<u8>> { fn read_jpeg_at_ifd(exif: &exif::Exif, path: &Path, ifd: In) -> Option<Vec<u8>> {
if !is_tiff_raw(path) {
return None;
}
let file = File::open(path).ok()?;
let mut bufreader = BufReader::new(file);
let exif = Reader::new().read_from_container(&mut bufreader).ok()?;
let offset = exif let offset = exif
.get_field(Tag::JPEGInterchangeFormat, In::THUMBNAIL)? .get_field(Tag::JPEGInterchangeFormat, ifd)?
.value .value
.get_uint(0)?; .get_uint(0)?;
let length = exif let length = exif
.get_field(Tag::JPEGInterchangeFormatLength, In::THUMBNAIL)? .get_field(Tag::JPEGInterchangeFormatLength, ifd)?
.value .value
.get_uint(0)?; .get_uint(0)?;
if length == 0 { if length == 0 {
@@ -71,8 +64,6 @@ pub fn extract_embedded_jpeg_preview(path: &Path) -> Option<Vec<u8>> {
let mut buf = vec![0u8; length as usize]; let mut buf = vec![0u8; length as usize];
file.read_exact(&mut buf).ok()?; file.read_exact(&mut buf).ok()?;
// JPEG SOI marker sanity check — MakerNote offsets sometimes point at
// TIFF-wrapped previews or other non-JPEG data.
if buf.len() < 2 || buf[0] != 0xFF || buf[1] != 0xD8 { if buf.len() < 2 || buf[0] != 0xFF || buf[1] != 0xD8 {
return None; return None;
} }
@@ -80,6 +71,95 @@ pub fn extract_embedded_jpeg_preview(path: &Path) -> Option<Vec<u8>> {
Some(buf) Some(buf)
} }
/// Tags exiftool exposes for embedded JPEG previews, in priority order. The
/// largest valid JPEG returned by any of them wins. Different camera makers
/// stash their largest preview under different names: Nikon's full-res
/// preview lives under `PreviewImage` in the MakerNote `PreviewIFD`, Canon /
/// Sony often expose theirs as `JpgFromRaw`, and `OtherImage` is a catch-all
/// some sub-IFD chains use.
const EXIFTOOL_PREVIEW_TAGS: &[&str] = &["PreviewImage", "JpgFromRaw", "OtherImage"];
/// Shell out to `exiftool -b -<tag>` for one tag. Returns the response bytes
/// only if exiftool succeeded AND the bytes start with the JPEG SOI marker
/// (some MakerNote tags hold TIFF-wrapped previews or other non-JPEG payloads
/// we can't load).
fn extract_exiftool_tag(path: &Path, tag: &str) -> Option<Vec<u8>> {
let output = Command::new("exiftool")
.arg("-b")
.arg(format!("-{}", tag))
.arg(path)
.output()
.ok()?;
if !output.status.success() {
return None;
}
let bytes = output.stdout;
if bytes.len() < 2 || bytes[0] != 0xFF || bytes[1] != 0xD8 {
return None;
}
Some(bytes)
}
/// Try each EXIFTOOL_PREVIEW_TAGS in turn and return the largest valid JPEG.
/// If `exiftool` isn't on PATH the very first spawn returns `None` and we
/// silently bail — callers fall back to whatever the IFD0/IFD1 fast path
/// found.
fn extract_preview_via_exiftool(path: &Path) -> Option<Vec<u8>> {
let mut best: Option<Vec<u8>> = None;
for &tag in EXIFTOOL_PREVIEW_TAGS {
let Some(bytes) = extract_exiftool_tag(path, tag) else {
continue;
};
match &best {
None => best = Some(bytes),
Some(b) if b.len() < bytes.len() => best = Some(bytes),
_ => {}
}
}
best
}
/// Returns the bytes of the embedded JPEG preview in a TIFF-based RAW or
/// TIFF file. Used to thumbnail formats whose RAW pixel data can't be decoded
/// by our normal tools (e.g. Sony ARW), and to serve a usable full-size
/// image for clients that can't decode the RAW container directly. Returns
/// `None` if no preview is present, the file isn't a TIFF container, or the
/// data doesn't look like a valid JPEG.
///
/// Strategy:
/// 1. Fast path: read `JPEGInterchangeFormat` from IFD0 (PRIMARY) and IFD1
/// (THUMBNAIL) directly via kamadak-exif. No subprocess, no external
/// dependency.
/// 2. Slow path: shell out to `exiftool -b -<tag>` for each of
/// `PreviewImage` / `JpgFromRaw` / `OtherImage`. kamadak-exif can't
/// reach SubIFDs or MakerNote sub-IFDs, but most modern Nikon bodies
/// stash their large preview JPEG in the Nikon MakerNote's PreviewIFD;
/// Canon / Sony often use `JpgFromRaw` in a SubIFD chain. Skipped
/// gracefully if exiftool isn't on PATH.
///
/// All candidates are pooled and the largest valid JPEG wins, so a deploy
/// without exiftool degrades to "fast-path only" behavior rather than
/// breaking outright.
pub fn extract_embedded_jpeg_preview(path: &Path) -> Option<Vec<u8>> {
if !is_tiff_raw(path) {
return None;
}
let file = File::open(path).ok()?;
let mut bufreader = BufReader::new(file);
let exif = Reader::new().read_from_container(&mut bufreader).ok()?;
let primary = read_jpeg_at_ifd(&exif, path, In::PRIMARY);
let thumbnail = read_jpeg_at_ifd(&exif, path, In::THUMBNAIL);
let exiftool = extract_preview_via_exiftool(path);
[primary, thumbnail, exiftool]
.into_iter()
.flatten()
.max_by_key(|v| v.len())
}
pub fn supports_exif(path: &Path) -> bool { pub fn supports_exif(path: &Path) -> bool {
if let Some(ext) = path.extension() { if let Some(ext) = path.extension() {
let ext_lower = ext.to_string_lossy().to_lowercase(); let ext_lower = ext.to_string_lossy().to_lowercase();

View File

@@ -215,6 +215,23 @@ async fn get_image(
} }
} }
// Full-size requests for RAW formats (NEF/CR2/ARW/etc.) can't just
// NamedFile-stream the original bytes — browsers won't decode the
// RAW container, so a `<img src=...>` lands as a broken image. Serve
// the embedded JPEG preview instead (typically the camera's in-body
// review JPEG, ~12 MP). Falls through to NamedFile if no preview is
// available, which preserves the historical behavior for callers
// that genuinely want the original bytes.
if image_size == PhotoSize::Full && exif::is_tiff_raw(&path) {
if let Some(preview) = exif::extract_embedded_jpeg_preview(&path) {
span.set_status(Status::Ok);
return HttpResponse::Ok()
.content_type("image/jpeg")
.insert_header(("Cache-Control", "public, max-age=3600"))
.body(preview);
}
}
if let Ok(file) = NamedFile::open(&path) { if let Ok(file) = NamedFile::open(&path) {
span.set_status(Status::Ok); span.set_status(Status::Ok);
// Enable ETag and set cache headers for full images (1 hour cache) // Enable ETag and set cache headers for full images (1 hour cache)