ai: add llamacpp backend (llama-swap) as third LLM client

Wires a new LlamaCppClient (OpenAI-compatible /v1 wire format) alongside
OllamaClient and OpenRouterClient. Per-slot routing for chat/vision/embed
via env (LLAMA_SWAP_URL + *_MODEL vars); capability inference uses an
env allowlist since /v1/models doesn't report modality.

InsightGenerator + InsightChatService gain three-way dispatch on
chat_backend = "local" | "hybrid" | "llamacpp". Hybrid and llamacpp
share the describe-then-inline path (text-only chat after a separate
vision describe). HYBRID_VISION_BACKEND=llamacpp lets hybrid route its
describe pass through llama-swap's vision slot while chat still goes
to OpenRouter.

Cross-replay matrix added (validate_cross_replay): local<->llamacpp
and hybrid<->llamacpp allowed; local->hybrid and llamacpp->hybrid
rejected. New /insights/llamacpp/models handler mirrors the OpenRouter
shape.
This commit is contained in:
Cameron Cordes
2026-05-20 17:52:33 -04:00
parent d04b86e32c
commit f0927f5355
9 changed files with 1468 additions and 102 deletions

View File

@@ -549,6 +549,36 @@ pub async fn get_openrouter_models_handler(
HttpResponse::Ok().json(response)
}
#[derive(serde::Serialize)]
pub struct LlamaCppModelsResponse {
pub models: Vec<String>,
pub default_model: Option<String>,
pub configured: bool,
}
/// GET /insights/llamacpp/models - Curated llama-swap model ids exposed
/// to clients for the llamacpp backend. Returned verbatim from
/// `LLAMA_SWAP_ALLOWED_MODELS`; no live call to llama-swap. Use
/// `LLAMA_SWAP_URL` plus `LLAMA_SWAP_PRIMARY_MODEL` on the server side to
/// pick the actual chat slot.
#[get("/insights/llamacpp/models")]
pub async fn get_llamacpp_models_handler(
_claims: Claims,
app_state: web::Data<crate::state::AppState>,
) -> impl Responder {
let configured = app_state.llamacpp.is_some();
let default_model = app_state
.llamacpp
.as_ref()
.map(|c| c.primary_model.clone());
let response = LlamaCppModelsResponse {
models: app_state.llamacpp_allowed_models.clone(),
default_model,
configured,
};
HttpResponse::Ok().json(response)
}
/// POST /insights/rate - Rate an insight (thumbs up/down for training data)
#[post("/insights/rate")]
pub async fn rate_insight_handler(