Add reconnectable async chat-turn flow with in-memory TurnRegistry

Add an async dispatch + reconnectable replay flow alongside the one-shot SSE
chat stream so the mobile client survives backgrounding, network blips, and
OS-killed sockets without losing an in-flight agentic turn.

- TurnRegistry/TurnEntry: in-memory per-turn event buffer (cap 500, front
  eviction) shared by the agentic loop (writer) and SSE replay readers.
  ReplayOutcome + replay_from/next_batch distinguish Events/CaughtUp/Gone;
  next_batch registers the Notify before reading state (no lost wakeup) and
  drains every buffered event before signaling terminal, so the final
  Done/Error is never dropped and the stream closes cleanly.
- Endpoints: POST /insights/chat/turn (202 + turn_id), GET
  /insights/chat/turn/{id} (SSE replay, ?skip_before= resume, per-event seq,
  410 on eviction), DELETE /insights/chat/turn/{id} (real task abort +
  cooperative is_running() check at each loop boundary).
- Cancellation actually stops the task (AbortHandle stored on the entry) and
  emits a Done{cancelled:true}; callers skip persistence on cancel.
- Background sweeper drops stale turns; interval clamped to <=300s.
- OpenTelemetry spans: ai.chat.turn.execute/replay/cancel.
- Legacy POST /insights/chat/stream path preserved unchanged.
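
For readers without the full diff, the buffer and replay semantics in the
first bullet can be pictured with a minimal std-only sketch. It is not the
code in this commit: `TurnBuffer`, its fields, the `String` payload, and the
rule that a request for an already-evicted seq yields Gone are all
assumptions made for illustration. The real TurnEntry also uses async
notification, which is left out here.

```rust
use std::collections::VecDeque;

/// Outcome of asking for every event at or after a given sequence number.
#[derive(Debug, PartialEq)]
pub enum ReplayOutcome {
    /// Buffered events whose seq is >= the requested one.
    Events(Vec<(u64, String)>),
    /// Nothing new yet; a reader would wait for the writer.
    CaughtUp,
    /// The requested seq was evicted; a handler would answer 410.
    Gone,
}

/// Hypothetical stand-in for the per-turn event buffer.
pub struct TurnBuffer {
    cap: usize,
    next_seq: u64,
    events: VecDeque<(u64, String)>,
}

impl TurnBuffer {
    pub fn new(cap: usize) -> Self {
        // A cap of zero would make the buffer useless, so treat it as one.
        let cap = cap.max(1);
        Self { cap, next_seq: 0, events: VecDeque::with_capacity(cap) }
    }

    /// Append an event, stamping it with the next seq and evicting the
    /// oldest entry when the buffer is full (front eviction).
    pub fn push(&mut self, payload: &str) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        if self.events.len() >= self.cap {
            self.events.pop_front();
        }
        self.events.push_back((seq, payload.to_string()));
        seq
    }

    /// Return events with seq >= `from`, or say why there are none.
    pub fn replay_from(&self, from: u64) -> ReplayOutcome {
        if from >= self.next_seq {
            return ReplayOutcome::CaughtUp;
        }
        // Oldest seq still held; anything below it has been evicted.
        let oldest = self.events.front().map(|(s, _)| *s).unwrap_or(self.next_seq);
        if from < oldest {
            return ReplayOutcome::Gone;
        }
        ReplayOutcome::Events(
            self.events.iter().filter(|(s, _)| *s >= from).cloned().collect(),
        )
    }
}
```

With a cap of 3 and five pushes, the buffer holds seqs 2 to 4: asking from
seq 1 gives Gone, from seq 3 gives the last two events, and from seq 5 gives
CaughtUp. How `?skip_before=` maps onto `from` in the real handler is not
shown in this excerpt.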

Tests: registry coverage for terminal delivery (race guard), waiting, Gone,
abort, eviction; handler integration tests for 404/410, skip_before, seq
stamping, completed replay, and cancel.
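
The cooperative half of cancellation (the is_running() check at each loop
boundary, followed by Done{cancelled:true} and skipped persistence) might
look roughly like the sketch below. `TurnFlag`, `TurnEvent`, `run_turn`, and
`should_persist` are hypothetical names, not identifiers from this commit,
and the hard abort via AbortHandle is deliberately not modeled.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical event type; the real stream carries richer payloads.
#[derive(Debug, PartialEq)]
pub enum TurnEvent {
    Step(u32),
    Done { cancelled: bool },
}

/// Hypothetical stand-in for the running flag kept on a turn entry.
pub struct TurnFlag {
    running: AtomicBool,
}

impl TurnFlag {
    pub fn new() -> Arc<Self> {
        Arc::new(Self { running: AtomicBool::new(true) })
    }

    pub fn is_running(&self) -> bool {
        self.running.load(Ordering::Acquire)
    }

    /// What a DELETE handler would call before aborting the task.
    pub fn cancel(&self) {
        self.running.store(false, Ordering::Release);
    }
}

/// Run up to `max_steps` iterations, checking the flag at each loop
/// boundary so a cancel takes effect before the next step starts.
pub fn run_turn(
    flag: &TurnFlag,
    max_steps: u32,
    mut step: impl FnMut(u32),
) -> Vec<TurnEvent> {
    let mut events = Vec::new();
    for i in 0..max_steps {
        if !flag.is_running() {
            events.push(TurnEvent::Done { cancelled: true });
            return events;
        }
        step(i);
        events.push(TurnEvent::Step(i));
    }
    events.push(TurnEvent::Done { cancelled: false });
    events
}

/// Callers persist only turns that finished without being cancelled.
pub fn should_persist(events: &[TurnEvent]) -> bool {
    matches!(events.last(), Some(TurnEvent::Done { cancelled: false }))
}
```

A step that is already in progress when the flag flips still completes in
this sketch; only the next boundary observes the cancel. That gap is
presumably why the commit also stores an AbortHandle on the entry.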

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cameron Cordes
2026-05-29 19:50:25 -04:00
parent 0c1c1c6792
commit 962f7bf05c
8 changed files with 1946 additions and 17 deletions
@@ -197,6 +197,28 @@ fn main() -> std::io::Result<()> {
app_state.library_health.clone(),
);
// Periodically clean up stale turn entries from the in-memory
// registry. Each sweep drops entries older than the configured
// timeout; the sweep interval is derived from that timeout below.
{
    let registry = app_state.turn_registry.clone();
    let timeout_secs = registry.timeout_secs();
    tokio::spawn(async move {
        // Sweep at least once every 5 minutes, and at least once per
        // timeout period. Sweeping only once per long timeout would let
        // entries linger up to ~2× that timeout before being reclaimed.
        // The lower bound of 1s avoids a busy loop on a zero timeout.
        let interval_secs = timeout_secs.clamp(1, 300);
        let interval = tokio::time::Duration::from_secs(interval_secs);
        loop {
            tokio::time::sleep(interval).await;
            let cleaned = registry.cleanup_stale().await;
            if cleaned > 0 {
                log::info!("TurnRegistry: cleaned up {cleaned} stale entries");
            }
        }
    });
}
// Spawn background job to generate daily conversation summaries
{
use crate::ai::generate_daily_summaries;
@@ -335,6 +357,9 @@ fn main() -> std::io::Result<()> {
.service(ai::chat_stream_handler)
.service(ai::chat_history_handler)
.service(ai::chat_rewind_handler)
.service(ai::turn_async_handler)
.service(ai::turn_replay_handler)
.service(ai::cancel_turn_handler)
.service(ai::rate_insight_handler)
.service(ai::export_training_data_handler)
.service(libraries::list_libraries)