feat(ai): chat rewind + ollama metrics logging

Rewind: POST /insights/chat/rewind truncates training_messages at a given rendered index, dropping the target message plus any preceding tool-call scaffolding. The initial user prompt is protected.

Metrics: log prompt_eval_count/duration and eval_count/duration from every Ollama chat response, rendered as tokens + ms + tok/s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
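Ollama reports `prompt_eval_duration` and `eval_duration` in nanoseconds, so rendering "tokens + ms + tok/s" reduces to two unit conversions per phase. The std-only Rust sketch here assumes the four fields have already been deserialized from the final chat response; `OllamaTimings`, `tokens_per_second`, `render_phase`, and `format_metrics` are hypothetical names, and the log layout is illustrative rather than the format this commit actually emits.

```rust
/// Hypothetical struct holding the four timing fields Ollama returns on a
/// final chat response. Ollama reports durations in nanoseconds.
#[derive(Debug, Clone, Copy)]
struct OllamaTimings {
    prompt_eval_count: u64,
    prompt_eval_duration_ns: u64,
    eval_count: u64,
    eval_duration_ns: u64,
}

/// Tokens per second for `count` tokens processed in `duration_ns` nanoseconds.
/// Returns 0.0 for a zero duration instead of dividing by zero.
fn tokens_per_second(count: u64, duration_ns: u64) -> f64 {
    if duration_ns == 0 {
        return 0.0;
    }
    count as f64 / (duration_ns as f64 / 1e9)
}

/// Render one phase (prompt evaluation or generation) as
/// "<tokens> tok, <ms> ms, <rate> tok/s".
fn render_phase(count: u64, duration_ns: u64) -> String {
    let ms = duration_ns as f64 / 1e6; // nanoseconds -> milliseconds
    format!(
        "{} tok, {:.0} ms, {:.1} tok/s",
        count,
        ms,
        tokens_per_second(count, duration_ns)
    )
}

/// Hypothetical log line combining the prompt and generation phases.
fn format_metrics(t: &OllamaTimings) -> String {
    format!(
        "prompt: {} | eval: {}",
        render_phase(t.prompt_eval_count, t.prompt_eval_duration_ns),
        render_phase(t.eval_count, t.eval_duration_ns)
    )
}

fn main() {
    // Illustrative values, not measurements from this project.
    let t = OllamaTimings {
        prompt_eval_count: 26,
        prompt_eval_duration_ns: 130_079_000,
        eval_count: 259,
        eval_duration_ns: 4_232_710_000,
    };
    println!("{}", format_metrics(&t));
}
```

Reporting the prompt and generation phases separately matters because prompt evaluation usually runs at a much higher token rate than generation, so a single blended tok/s figure would hide where time is spent.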
2026-04-21 15:16:32 -04:00
parent 0b9528f61e
commit 65ab10e9a8
5 changed files with 270 additions and 4 deletions
--- a/src/main.rs
+++ b/src/main.rs
@@ -1358,6 +1358,7 @@ fn main() -> std::io::Result<()> {
                .service(ai::get_openrouter_models_handler)
                .service(ai::chat_turn_handler)
                .service(ai::chat_history_handler)
+                .service(ai::chat_rewind_handler)
                .service(ai::rate_insight_handler)
                .service(ai::export_training_data_handler)
                .service(libraries::list_libraries)
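The commit message describes the rewind rule only in prose, and the body of the `chat_rewind_handler` registered in the diff is not shown. What follows is a minimal std-only Rust sketch of one way that truncation could work, assuming tool-call requests and tool results are stored in the message list but hidden from the rendered chat. `Role`, `Message`, `is_rendered`, `is_scaffolding`, `rewind`, and `RewindError` are hypothetical names, not the project's actual types, and the actix-web request handling is omitted.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: Role,
    content: String,
    has_tool_calls: bool,
}

impl Message {
    /// Assumed rule: the UI renders user messages and assistant text replies,
    /// but not tool-call requests or tool results.
    fn is_rendered(&self) -> bool {
        match self.role {
            Role::User => true,
            Role::Assistant => !self.has_tool_calls,
            _ => false,
        }
    }

    /// Tool-call scaffolding: assistant tool-call requests and tool results.
    fn is_scaffolding(&self) -> bool {
        self.role == Role::Tool || (self.role == Role::Assistant && self.has_tool_calls)
    }
}

#[derive(Debug, PartialEq)]
enum RewindError {
    OutOfRange,
    ProtectedPrompt,
}

/// Truncate `messages` at the `rendered_index`-th rendered message, also
/// dropping tool-call scaffolding directly before it. Returns how many
/// messages were removed.
fn rewind(messages: &mut Vec<Message>, rendered_index: usize) -> Result<usize, RewindError> {
    // Map the rendered (UI) index to a position in the raw message list.
    let target = messages
        .iter()
        .enumerate()
        .filter(|(_, m)| m.is_rendered())
        .nth(rendered_index)
        .map(|(i, _)| i)
        .ok_or(RewindError::OutOfRange)?;

    // Walk back over any tool-call scaffolding that led up to the target.
    let mut cut = target;
    while cut > 0 && messages[cut - 1].is_scaffolding() {
        cut -= 1;
    }

    // The first user message must survive the truncation.
    if let Some(first_user) = messages.iter().position(|m| m.role == Role::User) {
        if cut <= first_user {
            return Err(RewindError::ProtectedPrompt);
        }
    }

    let removed = messages.len() - cut;
    messages.truncate(cut);
    Ok(removed)
}

fn msg(role: Role, content: &str, has_tool_calls: bool) -> Message {
    Message { role, content: content.to_string(), has_tool_calls }
}

fn main() {
    let mut history = vec![
        msg(Role::System, "You are a helpful analyst.", false),
        msg(Role::User, "Summarize last week.", false), // rendered index 0
        msg(Role::Assistant, "", true),                  // tool-call request
        msg(Role::Tool, "{\"rows\": 12}", false),        // tool result
        msg(Role::Assistant, "Here is the summary.", false), // rendered index 1
        msg(Role::User, "Compare to the prior week.", false), // rendered index 2
    ];
    match rewind(&mut history, 1) {
        Ok(removed) => println!("removed {} messages", removed),
        Err(e) => println!("rewind failed: {:?}", e),
    }
    for m in &history {
        println!("{:?}: {}", m.role, m.content);
    }
}
```

Rewinding rendered index 1 in this sample cuts at the tool-call request rather than at the assistant reply, so the retained history ends on a user turn with no dangling tool result. Rewinding rendered index 0 is rejected because the cut would reach the initial user prompt.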