ImageApi

Author	SHA1	Message	Date
Cameron Cordes	6e90f24307	Reels: burst beats + duration budget for week/month, plus step logging Restructures a reel around beats — one narration line over one or more photos — instead of one line per photo. A single-photo beat is a held shot; a multi-photo beat is a quick burst that flashes through several moments of an event while the line is read. So a week/month reel can show everything it spans without a narrated (and timed) segment per photo. Selection (selector.rs): - Duration budget: cap the number of narrated beats to ~REEL_TARGET_SECONDS (default 90, env-tunable) so week/month reels don't run minutes long. - Event clustering by time gap; when there are more events than the beat budget, adjacent events merge so the whole span stays covered. Each beat bursts up to MAX_BURST_PHOTOS (an even spread), so a 40-shot dinner contributes a handful of quick frames, not forty narrated seconds. Render (render.rs): a beat renders its photos as a concat of per-photo fills (blurred-bg portrait, fps-before-fade) under one muxed narration; burst photos get a snappier fade. beat_durations splits the narration across the photos, stretching only if a long burst would flash too fast. Adds high-level info logs across the steps (request → script → per-beat narrate/render → join → done with elapsed) for visibility. Bumps RENDER_VERSION to re-render cached reels. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:43:18 -04:00
Cameron Cordes	740fc4d841	Reels: fix steppy fade (fps before fade) and ease the expression bump The fade looked steppy/low-frame-rate because the filtergraph normalized fps AFTER the fade filters: the brightness ramp was sampled at the looped still's coarse input cadence, then duplicated up to 30fps. Move fps ahead of the fades, pin the still's input framerate (-framerate), and force CFR output (-r) so the dip ramps across a full 30 frames and plays steadily. Ease narration expressiveness from 0.7 to 0.6 (still tunable via REEL_TTS_EXAGGERATION). Bump RENDER_VERSION so existing reels re-render. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:20:52 -04:00
Cameron Cordes	7715a7a905	Reels: portrait canvas with blurred fill, fade transitions, warmer TTS Fixes the "image is tiny" problem: a 1920x1080 landscape reel letterboxes to a ~25%-height band on a portrait phone. Switch to a portrait 1080x1920 canvas and fill it per photo with a blurred, zoomed copy of the image behind the sharp fitted photo — so the frame is always full regardless of the photo's orientation, with no black bars and no cropping of the subject. Add a quick 0.35s fade in/out baked into each segment so concatenated photos dip smoothly instead of hard-cutting (fade-out lands in the narration's silent tail, so speech isn't clipped). Drop the unused Ken Burns branch — motion can return deliberately later. Warm up the narration a touch: thread Chatterbox's `exaggeration` through synthesize_serialized and default reels to 0.7 (tunable via REEL_TTS_EXAGGERATION). Bump RENDER_VERSION so existing landscape reels re-render. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:10:26 -04:00
Cameron Cordes	42453d5786	Fix reel concat: force -f mp4 for the .tmp output path The concat stage wrote to <key>.mp4.tmp (for an atomic publish-rename), but ffmpeg infers the muxer from the output extension and can't map .tmp to a format — "Unable to choose an output format". Force the mp4 muxer explicitly so the temp extension is irrelevant. Segment render, NVENC, TTS, and scripting were already working end-to-end; this was the only failure, at the final join. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 22:56:48 -04:00
Cameron Cordes	e3f731b3b2	Add memory-reel backend: on-demand narrated photo slideshow New POST /reels + GET /reels/{id} (+ /video) build an MP4 slideshow of a memory span (day/week/month), narrated by the LLM in a cloned voice. Pipeline (src/reels/): a selector resolves which photos + reel metadata, the scripter writes one narration line per photo via a single LLM call (reusing each photo's cached insight as context — no fresh vision calls, so reel generation stays off the GPU's vision slot), each line is synthesized to speech, and the renderer assembles stills + narration via ffmpeg. Jobs run in the background (mirroring the TTS speech-job registry) since a reel takes minutes; the finished MP4 is cached on disk keyed by the selection so a repeat request is instant. The segment model is media-typed (Photo today) so a video-clip segment (phase 2) and a nightly pre-render (phase 3) slot in without reworking the pipeline. Ken Burns motion is implemented but defaulted off pending a visual check on the GPU box. Supporting changes: - memories: extract gather_memory_items() so the reel selector reuses the exact window/exclusion/tz/sort logic behind /memories. - ai::tts: add synthesize_serialized() so reel narration honors the same single-GPU permit + write lease as user TTS requests. - video::ffmpeg: make get_duration_seconds() pub for narration timing. - AppState: reels_path (REELS_DIRECTORY, defaults beside preview clips). Pure logic (cache key, script parsing, ffmpeg arg/filter construction, even sampling, segment timing) is unit-tested (26 tests). The runtime path (ffmpeg render, TTS, LLM) needs a real run on the GPU host to verify end-to-end — not exercisable in CI. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 22:31:08 -04:00
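
The selection steps described for selector.rs (cluster by time gap, merge adjacent events down to the beat budget, burst an even spread of photos per beat) can be sketched roughly as below. This is a minimal illustration, not the code in the repository. Every function name, the assumed per-beat length, and the `MAX_BURST_PHOTOS` value are assumptions made for the example; only `REEL_TARGET_SECONDS` and its default of 90 come from the log.

```rust
// Hypothetical sketch of the selection logic; all names and values except
// REEL_TARGET_SECONDS = 90 are assumptions made for illustration.
const REEL_TARGET_SECONDS: f64 = 90.0;
const ASSUMED_SECONDS_PER_BEAT: f64 = 6.0; // assumption: average narrated beat length
const MAX_BURST_PHOTOS: usize = 5; // assumption: the log does not state the value

/// Number of narrated beats that fit the target duration (at least one).
fn beat_budget(target_seconds: f64, seconds_per_beat: f64) -> usize {
    ((target_seconds / seconds_per_beat).floor() as usize).max(1)
}

/// Split sorted timestamps into events wherever the gap exceeds `gap_secs`.
fn cluster_by_gap(sorted_ts: &[i64], gap_secs: i64) -> Vec<Vec<i64>> {
    let mut events: Vec<Vec<i64>> = Vec::new();
    for &t in sorted_ts {
        let start_new = match events.last() {
            Some(ev) => t - ev[ev.len() - 1] > gap_secs,
            None => true,
        };
        if start_new {
            events.push(vec![t]);
        } else {
            events.last_mut().unwrap().push(t);
        }
    }
    events
}

/// Merge adjacent events into at most `budget` beats, preserving order,
/// so the whole span stays covered.
fn merge_to_budget<T>(events: Vec<Vec<T>>, budget: usize) -> Vec<Vec<T>> {
    let n = events.len();
    if budget == 0 || n <= budget {
        return events;
    }
    let mut beats: Vec<Vec<T>> = (0..budget).map(|_| Vec::new()).collect();
    for (i, ev) in events.into_iter().enumerate() {
        // Contiguous bucketing: event i goes to beat floor(i * budget / n).
        beats[i * budget / n].extend(ev);
    }
    beats
}

/// Indices of up to `max` items spread evenly across `0..len`,
/// keeping the first and last item when `max >= 2`.
fn even_spread(len: usize, max: usize) -> Vec<usize> {
    if len == 0 || max == 0 {
        return Vec::new();
    }
    if len <= max {
        return (0..len).collect();
    }
    if max == 1 {
        return vec![len / 2];
    }
    (0..max).map(|i| i * (len - 1) / (max - 1)).collect()
}

/// Budget under the assumed defaults above.
fn default_budget() -> usize {
    beat_budget(REEL_TARGET_SECONDS, ASSUMED_SECONDS_PER_BEAT)
}

/// Burst indices for one beat holding `photo_count` photos.
fn burst_indices(photo_count: usize) -> Vec<usize> {
    even_spread(photo_count, MAX_BURST_PHOTOS)
}
```

Under these assumed values, a 40-shot event contributes five frames to its beat rather than forty narrated segments, which is the behavior the newest commit describes.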

5 Commits
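
The render-side fixes in this log (fps ahead of the fades, a blurred fill behind the fitted photo, a forced mp4 muxer for the .tmp output, narration split across a burst) might look something like the following when building ffmpeg arguments. It is a sketch under stated assumptions: the function names, the `split`/`boxblur`/`overlay` filter choice, the use of the concat demuxer, and the minimum per-photo time are all hypothetical. Only the ordering (fps before fade), the 0.35s fade, the 1080x1920 canvas, 30fps, and `-f mp4` come from the commit messages.

```rust
// Hypothetical helpers; the real render.rs may be structured quite differently.

/// Split one narration across the photos of a beat, stretching only when
/// an even split would show each photo for less than `min_per_photo` seconds.
fn beat_durations(narration_secs: f64, photos: usize, min_per_photo: f64) -> Vec<f64> {
    if photos == 0 {
        return Vec::new();
    }
    let per_photo = (narration_secs / photos as f64).max(min_per_photo);
    vec![per_photo; photos]
}

/// Filtergraph for one photo: a blurred, canvas-filling copy behind the
/// sharp fitted photo, then fps, then the fades. Placing `fps` before
/// `fade` lets the brightness ramp be sampled at the output frame rate.
fn photo_filter(width: u32, height: u32, fps: u32, duration: f64, fade: f64) -> String {
    let fade_out_start = (duration - fade).max(0.0);
    format!(
        "[0:v]split=2[bg][fg];\
         [bg]scale={w}:{h}:force_original_aspect_ratio=increase,crop={w}:{h},boxblur=20[bgb];\
         [fg]scale={w}:{h}:force_original_aspect_ratio=decrease[fgs];\
         [bgb][fgs]overlay=(W-w)/2:(H-h)/2,fps={fps},\
         fade=t=in:st=0:d={fade},fade=t=out:st={fade_out:.3}:d={fade}[v]",
        w = width,
        h = height,
        fps = fps,
        fade = fade,
        fade_out = fade_out_start
    )
}

/// Arguments for the final join. The output path ends in `.tmp`, so the
/// muxer is named explicitly with `-f mp4` instead of being inferred from
/// the extension. Using the concat demuxer here is an assumption.
fn concat_args(list_path: &str, tmp_out: &str) -> Vec<String> {
    [
        "-y", "-f", "concat", "-safe", "0", "-i", list_path,
        "-c", "copy", "-f", "mp4", tmp_out,
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}
```

A caller would pass the `photo_filter` string to `-filter_complex` for each still and hand the `concat_args` result to the ffmpeg process for the final join. Keeping these as pure string builders matches the log's note that ffmpeg arg and filter construction is unit-tested separately from the runtime path.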