feat(ai): streaming chat endpoint with live tool events

Add LlmClient::chat_with_tools_stream and SSE endpoint POST /insights/chat/stream that emits text deltas, tool_call / tool_result pairs, truncated notice, and a terminal done frame as the agentic loop runs.

- Ollama: parses NDJSON from /api/chat stream, accumulates content deltas, emits Done with tool_calls from the final chunk.
- OpenRouter: parses OpenAI-compatible SSE, reassembles tool_call argument deltas by index, asks for stream_options.include_usage.
- InsightChatService spawns the loop on a tokio task, feeds events through an mpsc channel, persists training_messages at the end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
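
Note on the OpenRouter path: reassembling tool_call argument deltas by index could look roughly like the sketch below. It is not the code in this commit. ToolCallDelta, ToolCall, and ToolCallAssembler are hypothetical names, and the sketch assumes JSON decoding of each SSE chunk has already happened upstream, so only the standard library is used.

```rust
use std::collections::BTreeMap;

/// Hypothetical, already-decoded fragment of one streamed tool call.
/// In OpenAI-compatible streams the id and name usually arrive only on the
/// first fragment for a given index; later fragments carry argument text.
#[derive(Debug, Clone)]
pub struct ToolCallDelta {
    pub index: usize,
    pub id: Option<String>,
    pub name: Option<String>,
    pub arguments: String,
}

/// Hypothetical fully reassembled tool call.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: String,
}

/// Accumulates fragments keyed by index. A BTreeMap keeps the calls in
/// index order even when fragments for different calls interleave.
#[derive(Default)]
pub struct ToolCallAssembler {
    calls: BTreeMap<usize, ToolCall>,
}

impl ToolCallAssembler {
    pub fn new() -> Self {
        Self::default()
    }

    /// Merge one fragment into the call that shares its index.
    pub fn push(&mut self, delta: ToolCallDelta) {
        let call = self.calls.entry(delta.index).or_default();
        if let Some(id) = delta.id {
            call.id = id;
        }
        if let Some(name) = delta.name {
            call.name = name;
        }
        // Argument JSON arrives as raw string pieces; concatenate in arrival order.
        call.arguments.push_str(&delta.arguments);
    }

    /// Consume the assembler and return the completed calls in index order.
    pub fn finish(self) -> Vec<ToolCall> {
        self.calls.into_values().collect()
    }
}
```

In a sketch like this, push would be called once per streamed fragment and finish once the stream reports completion; the concatenated arguments string is only parsed as JSON after that point, since individual fragments are not valid JSON on their own.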
2026-04-21 16:57:41 -04:00
parent c2bd3c08e1
commit 079cd4c5b9
9 changed files with 1071 additions and 9 deletions
--- a/src/ai/mod.rs
+++ b/src/ai/mod.rs
@@ -11,10 +11,10 @@ pub mod sms_client;
 #[allow(unused_imports)]
 pub use daily_summary_job::{generate_daily_summaries, strip_summary_boilerplate};
 pub use handlers::{
-    chat_history_handler, chat_rewind_handler, chat_turn_handler, delete_insight_handler,
-    export_training_data_handler, generate_agentic_insight_handler, generate_insight_handler,
-    get_all_insights_handler, get_available_models_handler, get_insight_handler,
-    get_openrouter_models_handler, rate_insight_handler,
+    chat_history_handler, chat_rewind_handler, chat_stream_handler, chat_turn_handler,
+    delete_insight_handler, export_training_data_handler, generate_agentic_insight_handler,
+    generate_insight_handler, get_all_insights_handler, get_available_models_handler,
+    get_insight_handler, get_openrouter_models_handler, rate_insight_handler,
 };
 pub use insight_generator::InsightGenerator;
 #[allow(unused_imports)]