feat(ai): chat rewind + ollama metrics logging

Rewind: POST /insights/chat/rewind truncates training_messages at a
given rendered index, dropping the target message plus any preceding
tool-call scaffolding. The initial user prompt is protected.
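
A minimal sketch of the truncation rule described above, not the code in this commit. It assumes (hypothetically) that user messages and plain assistant replies are "rendered", that tool results and assistant tool-call messages are scaffolding, and that the initial user prompt is rendered index 0. `Role`, `Message`, `RewindError`, and `rewind` are illustrative names only.

```rust
/// Hypothetical message roles; the real training_messages schema may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    System,
    User,
    Assistant,
    Tool,
}

/// Hypothetical stored message shape used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub content: String,
    /// True for assistant messages that only carry tool calls.
    pub has_tool_calls: bool,
}

#[derive(Debug, PartialEq)]
pub enum RewindError {
    /// Rendered index 0 is assumed to be the initial user prompt.
    ProtectedPrompt,
    OutOfRange,
}

impl Message {
    /// Assumption: only user messages and plain assistant replies are shown in the chat UI.
    fn is_rendered(&self) -> bool {
        match self.role {
            Role::User => true,
            Role::Assistant => !self.has_tool_calls,
            Role::System | Role::Tool => false,
        }
    }
}

/// Truncate `messages` at the given rendered index and return how many raw
/// messages were removed. The target message, any tool-call scaffolding
/// directly before it, and everything after it are dropped.
pub fn rewind(messages: &mut Vec<Message>, rendered_index: usize) -> Result<usize, RewindError> {
    if rendered_index == 0 {
        return Err(RewindError::ProtectedPrompt);
    }
    // Map the rendered index back to a position in the raw message list.
    let raw = messages
        .iter()
        .enumerate()
        .filter(|(_, m)| m.is_rendered())
        .nth(rendered_index)
        .map(|(i, _)| i)
        .ok_or(RewindError::OutOfRange)?;
    // Walk backwards over scaffolding that led up to the target message.
    let mut cut = raw;
    while cut > 0 && !messages[cut - 1].is_rendered() && messages[cut - 1].role != Role::System {
        cut -= 1;
    }
    let removed = messages.len() - cut;
    messages.truncate(cut);
    Ok(removed)
}
```

Rejecting index 0 before the lookup keeps the protected-prompt rule independent of how long the history is.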

Metrics: log prompt_eval_count/duration and eval_count/duration from
every Ollama chat response, rendered as tokens + ms + tok/s.
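
A rough sketch of how the four fields can be turned into the tokens + ms + tok/s form mentioned above. Ollama reports the duration fields in nanoseconds; the struct, the function names, and the exact output layout here are assumptions for illustration, not the logging code added by this commit.

```rust
/// Subset of the timing fields on an Ollama chat response.
/// Both duration fields are in nanoseconds.
#[derive(Debug, Clone, Copy, Default)]
pub struct OllamaMetrics {
    pub prompt_eval_count: u64,
    pub prompt_eval_duration: u64,
    pub eval_count: u64,
    pub eval_duration: u64,
}

/// Tokens per second, or None when the duration is zero (avoids dividing by zero).
pub fn tokens_per_second(count: u64, duration_ns: u64) -> Option<f64> {
    if duration_ns == 0 {
        None
    } else {
        Some(count as f64 / (duration_ns as f64 / 1e9))
    }
}

/// Format one phase as "<label>: N tokens in M ms (R tok/s)".
/// The layout is hypothetical; the rate is omitted when it cannot be computed.
pub fn format_phase(label: &str, count: u64, duration_ns: u64) -> String {
    let ms = duration_ns as f64 / 1e6;
    match tokens_per_second(count, duration_ns) {
        Some(tps) => format!("{label}: {count} tokens in {ms:.0} ms ({tps:.1} tok/s)"),
        None => format!("{label}: {count} tokens in {ms:.0} ms"),
    }
}

/// Combine the prompt-evaluation and generation phases into one log line.
pub fn format_metrics(m: &OllamaMetrics) -> String {
    format!(
        "{} | {}",
        format_phase("prompt", m.prompt_eval_count, m.prompt_eval_duration),
        format_phase("eval", m.eval_count, m.eval_duration),
    )
}
```

Keeping the rate optional matters because a response served from a cached prompt can report a zero or missing prompt evaluation duration.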

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cameron
2026-04-21 15:16:32 -04:00
parent 0b9528f61e
commit 65ab10e9a8
5 changed files with 270 additions and 4 deletions

@@ -1358,6 +1358,7 @@ fn main() -> std::io::Result<()> {
 .service(ai::get_openrouter_models_handler)
 .service(ai::chat_turn_handler)
 .service(ai::chat_history_handler)
+.service(ai::chat_rewind_handler)
 .service(ai::rate_insight_handler)
 .service(ai::export_training_data_handler)
 .service(libraries::list_libraries)