Reels: burst beats + duration budget for week/month, plus step logging
Restructures a reel around beats (one narration line over one or more photos) instead of one line per photo. A single-photo beat is a held shot; a multi-photo beat is a quick burst that flashes through several moments of an event while the line is read. A week/month reel can therefore show everything it spans without a narrated (and timed) segment per photo.

Selection (selector.rs):
- Duration budget: cap the number of narrated beats so the reel runs roughly REEL_TARGET_SECONDS (default 90, env-tunable), so week/month reels don't run minutes long.
- Event clustering by time gap; when there are more events than the beat budget, adjacent events merge so the whole span stays covered.

Each beat bursts up to MAX_BURST_PHOTOS (an even spread), so a 40-shot dinner contributes a handful of quick frames, not forty narrated seconds.

Render (render.rs): a beat renders its photos as a concat of per-photo fills (blurred-bg portrait, fps-before-fade) under one muxed narration; burst photos get a snappier fade. beat_durations splits the narration across the photos, stretching only if a long burst would flash too fast.

Adds high-level info logs across the steps (request → script → per-beat narrate/render → join → done with elapsed) for visibility.

Bumps RENDER_VERSION to re-render cached reels.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
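The selection and render rules described in the message can be sketched as below. This is a minimal illustration, not the code in selector.rs or render.rs (neither file appears in this diff): `beat_budget`, `even_spread`, the per-beat average passed to `beat_budget`, and the `min_per_photo` floor are all assumptions made for the example, and this `beat_durations` only borrows the name from the message.

```rust
/// Hypothetical: how many narrated beats fit in the target runtime,
/// assuming each beat averages `secs_per_beat` seconds of narration.
fn beat_budget(target_secs: f32, secs_per_beat: f32) -> usize {
    // Always allow at least one beat, even for tiny targets.
    ((target_secs / secs_per_beat).floor() as usize).max(1)
}

/// Hypothetical: pick at most `max` indices spread evenly over `0..n`,
/// so a long event contributes a few representative frames.
fn even_spread(n: usize, max: usize) -> Vec<usize> {
    if n == 0 || max == 0 {
        return Vec::new();
    }
    if n <= max {
        // Few enough photos: show them all.
        return (0..n).collect();
    }
    if max == 1 {
        return vec![0];
    }
    // Keep the first and last photo and space the rest between them.
    (0..max).map(|k| k * (n - 1) / (max - 1)).collect()
}

/// Sketch of the idea behind `beat_durations`: split the narration time
/// evenly across the photos, stretching the beat only when a photo would
/// otherwise be on screen for less than `min_per_photo` seconds.
fn beat_durations(narration_secs: f32, photos: usize, min_per_photo: f32) -> Vec<f32> {
    if photos == 0 {
        return Vec::new();
    }
    let per_photo = (narration_secs / photos as f32).max(min_per_photo);
    vec![per_photo; photos]
}
```

Under these assumptions a 40-shot dinner capped at five burst photos keeps indices 0, 9, 19, 29 and 39, and an eight-photo burst under a two-second line is stretched to four seconds when the floor is half a second per photo.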
+61
-34
@@ -1,10 +1,11 @@
 //! Narration scripting for memory reels.
 //!
-//! One LLM call turns the planned segments (each carrying its date and, where
+//! One LLM call turns the planned beats (each carrying its date and, where
 //! available, its cached insight) into a short first-person narration line per
-//! photo plus a title for the reel. We reuse the cached insight summary as the
-//! richest per-photo signal rather than re-running vision at reel time — that
-//! keeps reel generation off the GPU's vision slot entirely.
+//! beat plus a title for the reel. A beat may show several photos in a quick
+//! burst, so a line narrates the *moment*, not a single frame. We reuse the
+//! cached insight summary as the richest signal rather than re-running vision
+//! at reel time — that keeps reel generation off the GPU's vision slot.
 //!
 //! The prompt builder and response parser are pure so the contract is
 //! unit-testable; `generate_script` wires them to the LLM client.
@@ -12,11 +13,11 @@
 use anyhow::{Context, Result};
 use std::sync::Arc;
 
-use super::{PlannedSegment, ReelMeta};
+use super::{PlannedBeat, ReelMeta};
 use crate::ai::llamacpp::LlamaCppClient;
 use crate::ai::llm_client::LlmClient;
 
-/// The narration for a whole reel: a title and one line per segment, in order.
+/// The narration for a whole reel: a title and one line per beat, in order.
 #[derive(Debug, Clone, PartialEq)]
 pub struct ReelScript {
     pub title: String,
@@ -26,33 +27,38 @@ pub struct ReelScript {
 const SYSTEM_PROMPT: &str = "You are narrating a personal memory reel — a short \
     slideshow of someone's own photos set to a spoken voiceover. Write warm, \
     specific, first-person narration as if the person is gently looking back on \
-    their own memories. Be concrete and grounded in the details given; never \
-    invent names, places, or events that aren't supported. Keep each line to one \
-    or two short sentences that can be read aloud in a few seconds. Avoid generic \
-    filler like \"what a wonderful day\" — if you have little to go on, simply \
-    describe the moment plainly.";
+    their own memories. Each line plays over one moment, which may be a quick burst \
+    of several photos, so narrate the moment as a whole rather than a single frame. \
+    Be concrete and grounded in the details given; never invent names, places, or \
+    events that aren't supported. Keep each line to one or two short sentences that \
+    can be read aloud in a few seconds. Avoid generic filler like \"what a \
+    wonderful day\" — if you have little to go on, simply describe the moment \
+    plainly.";
 
 /// Build the (system, user) prompt pair for the scripter. The user message
-/// describes each segment in order and asks for strict JSON back.
-pub fn build_script_messages(meta: &ReelMeta, planned: &[PlannedSegment]) -> (String, String) {
+/// describes each beat in order and asks for strict JSON back.
+pub fn build_script_messages(meta: &ReelMeta, beats: &[PlannedBeat]) -> (String, String) {
     let mut user = String::new();
     user.push_str(&format!(
-        "These are {} photos surfaced as memories {}.\n\n",
-        planned.len(),
+        "This reel has {} moments surfaced as memories {}.\n\n",
+        beats.len(),
         meta.span_phrase()
     ));
     if !meta.years.is_empty() {
         let years: Vec<String> = meta.years.iter().map(|y| y.to_string()).collect();
         user.push_str(&format!("They span the years: {}.\n\n", years.join(", ")));
     }
-    user.push_str("Photos, in the order they will appear:\n");
-    for (i, seg) in planned.iter().enumerate() {
+    user.push_str("Moments, in the order they will appear:\n");
+    for (i, beat) in beats.iter().enumerate() {
         user.push_str(&format!("\n[{}]", i + 1));
-        if let Some(date) = seg.date_label() {
+        if let Some(date) = beat.date_label() {
             user.push_str(&format!(" {date}"));
         }
+        if beat.photos.len() > 1 {
+            user.push_str(&format!(" (a burst of {} photos)", beat.photos.len()));
+        }
         user.push('\n');
-        match (&seg.insight_title, &seg.insight_summary) {
+        match (&beat.insight_title, &beat.insight_summary) {
             (Some(t), Some(s)) if !s.trim().is_empty() => {
                 user.push_str(&format!(" Known context: {t} — {s}\n"));
             }
@@ -65,10 +71,10 @@ pub fn build_script_messages(meta: &ReelMeta, planned: &[PlannedSegment]) -> (St
     }
     user.push_str(&format!(
         "\nReturn ONLY a JSON object, no prose or code fences, shaped exactly:\n\
-         {{\"title\": \"<short reel title>\", \"segments\": [\"<line for photo 1>\", \
-         \"<line for photo 2>\", ... ]}}\n\
-         The \"segments\" array MUST have exactly {} items, one per photo in order.",
-        planned.len()
+         {{\"title\": \"<short reel title>\", \"segments\": [\"<line for moment 1>\", \
+         \"<line for moment 2>\", ... ]}}\n\
+         The \"segments\" array MUST have exactly {} items, one per moment in order.",
+        beats.len()
     ));
     (SYSTEM_PROMPT.to_string(), user)
 }
@@ -174,20 +180,20 @@ fn clean_text(s: &str) -> String {
     trimmed.split_whitespace().collect::<Vec<_>>().join(" ")
 }
 
-/// Generate the reel script via the LLM. Text-only (no images) — the per-photo
+/// Generate the reel script via the LLM. Text-only (no images) — the per-beat
 /// context comes from cached insights. The call takes the GPU read lease
 /// internally (see `LlamaCppClient::generate`).
 pub async fn generate_script(
     client: &Arc<LlamaCppClient>,
     meta: &ReelMeta,
-    planned: &[PlannedSegment],
+    beats: &[PlannedBeat],
 ) -> Result<ReelScript> {
-    let (system, user) = build_script_messages(meta, planned);
+    let (system, user) = build_script_messages(meta, beats);
     let raw = client
         .generate(&user, Some(&system), None)
         .await
         .context("LLM script generation failed")?;
-    Ok(parse_script_response(&raw, planned.len()))
+    Ok(parse_script_response(&raw, beats.len()))
 }
 
 #[cfg(test)]
@@ -202,13 +208,13 @@ mod tests {
         }
     }
 
-    fn planned(n: usize) -> Vec<PlannedSegment> {
+    fn planned(n: usize) -> Vec<PlannedBeat> {
         (0..n)
-            .map(|i| PlannedSegment {
-                media: super::super::SegmentMedia::Photo {
+            .map(|i| PlannedBeat {
+                photos: vec![super::super::SegmentMedia::Photo {
                     rel_path: format!("p{i}.jpg"),
                     library_id: 1,
-                },
+                }],
                 date: Some(1_560_000_000 + i as i64 * 86_400),
                 insight_title: None,
                 insight_summary: None,
@@ -217,16 +223,37 @@ mod tests {
     }
 
     #[test]
-    fn prompt_states_exact_segment_count_and_span() {
+    fn prompt_states_exact_moment_count_and_span() {
         let (sys, user) = build_script_messages(&meta(), &planned(3));
         assert!(sys.contains("memory reel"));
-        assert!(user.contains("3 photos"));
+        assert!(user.contains("3 moments"));
         assert!(user.contains("on this day"));
         assert!(user.contains("exactly 3 items"));
-        // Each photo gets an indexed entry.
+        // Each moment gets an indexed entry.
         assert!(user.contains("[1]") && user.contains("[2]") && user.contains("[3]"));
     }
 
+    #[test]
+    fn prompt_notes_burst_photo_count() {
+        let mut p = planned(1);
+        p[0].photos = vec![
+            super::super::SegmentMedia::Photo {
+                rel_path: "a.jpg".into(),
+                library_id: 1,
+            },
+            super::super::SegmentMedia::Photo {
+                rel_path: "b.jpg".into(),
+                library_id: 1,
+            },
+            super::super::SegmentMedia::Photo {
+                rel_path: "c.jpg".into(),
+                library_id: 1,
+            },
+        ];
+        let (_sys, user) = build_script_messages(&meta(), &p);
+        assert!(user.contains("a burst of 3 photos"));
+    }
+
     #[test]
     fn prompt_includes_insight_context_when_present() {
         let mut p = planned(1);
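
The prompt asks for exactly one line per moment, and `generate_script` passes `beats.len()` to `parse_script_response` and wraps the result in `Ok` with no fallible step, so the parser presumably reconciles the line count itself. Its body is not part of this diff; the sketch below only illustrates one way such a contract could be enforced, assuming extra lines are dropped and missing ones are padded with empty strings. `fit_lines` is a hypothetical helper, not a function from this codebase.

```rust
/// Hypothetical helper: force `lines` to exactly `expected` entries so that
/// every beat has a narration slot, whatever the model returned.
fn fit_lines(mut lines: Vec<String>, expected: usize) -> Vec<String> {
    // Drop any lines beyond the number of beats.
    lines.truncate(expected);
    // Pad with empty lines when the model returned too few (assumed policy).
    lines.resize(expected, String::new());
    lines
}
```

Whether the real parser pads with empty strings, reuses a fallback line, or does something else cannot be told from the hunks shown here.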