ai: add llamacpp backend (llama-swap) as third LLM client
Wires a new LlamaCppClient (OpenAI-compatible /v1 wire format) alongside OllamaClient and OpenRouterClient. Per-slot routing for chat/vision/embed via env (LLAMA_SWAP_URL + *_MODEL vars); capability inference uses an env allowlist since /v1/models doesn't report modality.

InsightGenerator + InsightChatService gain three-way dispatch on chat_backend = "local" | "hybrid" | "llamacpp". Hybrid and llamacpp share the describe-then-inline path (text-only chat after a separate vision describe). HYBRID_VISION_BACKEND=llamacpp lets hybrid route its describe pass through llama-swap's vision slot while chat still goes to OpenRouter.

Cross-replay matrix added (validate_cross_replay): local<->llamacpp and hybrid<->llamacpp allowed; local->hybrid and llamacpp->hybrid rejected.

New /insights/llamacpp/models handler mirrors the OpenRouter shape.
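The three-way dispatch described above could be modelled roughly as follows. This is a minimal sketch, not the code from this commit: the `ChatBackend` and `ImagePath` types and the `image_path` method are hypothetical names, and the assumption that "local" hands images to the chat model directly is not stated in the message. Only the three backend strings and the shared describe-then-inline path for hybrid and llamacpp come from the commit text.

```rust
use std::str::FromStr;

/// The three `chat_backend` values named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChatBackend {
    Local,
    Hybrid,
    LlamaCpp,
}

/// Hypothetical label for how images reach the chat model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImagePath {
    /// ASSUMPTION: "local" passes images to the chat model directly.
    Direct,
    /// A separate vision describe runs first; the chat itself is text-only.
    DescribeThenInline,
}

impl FromStr for ChatBackend {
    type Err = String;

    /// Parses the exact strings from the commit message; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "local" => Ok(ChatBackend::Local),
            "hybrid" => Ok(ChatBackend::Hybrid),
            "llamacpp" => Ok(ChatBackend::LlamaCpp),
            other => Err(format!("unknown chat_backend: {}", other)),
        }
    }
}

impl ChatBackend {
    /// Hybrid and llamacpp share one path, so they share one match arm.
    pub fn image_path(self) -> ImagePath {
        match self {
            ChatBackend::Local => ImagePath::Direct,
            ChatBackend::Hybrid | ChatBackend::LlamaCpp => ImagePath::DescribeThenInline,
        }
    }
}
```

Parsing into an enum once at the request boundary means an unknown backend string fails early, and the compiler flags any `match` that forgets one of the three variants.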
@@ -549,6 +549,36 @@ pub async fn get_openrouter_models_handler(
     HttpResponse::Ok().json(response)
 }
 
+#[derive(serde::Serialize)]
+pub struct LlamaCppModelsResponse {
+    pub models: Vec<String>,
+    pub default_model: Option<String>,
+    pub configured: bool,
+}
+
+/// GET /insights/llamacpp/models - Curated llama-swap model ids exposed
+/// to clients for the llamacpp backend. Returned verbatim from
+/// `LLAMA_SWAP_ALLOWED_MODELS`; no live call to llama-swap. Use
+/// `LLAMA_SWAP_URL` plus `LLAMA_SWAP_PRIMARY_MODEL` on the server side to
+/// pick the actual chat slot.
+#[get("/insights/llamacpp/models")]
+pub async fn get_llamacpp_models_handler(
+    _claims: Claims,
+    app_state: web::Data<crate::state::AppState>,
+) -> impl Responder {
+    let configured = app_state.llamacpp.is_some();
+    let default_model = app_state
+        .llamacpp
+        .as_ref()
+        .map(|c| c.primary_model.clone());
+    let response = LlamaCppModelsResponse {
+        models: app_state.llamacpp_allowed_models.clone(),
+        default_model,
+        configured,
+    };
+    HttpResponse::Ok().json(response)
+}
+
 /// POST /insights/rate - Rate an insight (thumbs up/down for training data)
 #[post("/insights/rate")]
 pub async fn rate_insight_handler(
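The handler above never calls llama-swap; it only echoes the configured allowlist and the client's primary model. The following std-only sketch shows one way that data could be assembled. It makes several assumptions that the diff does not confirm: that `LLAMA_SWAP_ALLOWED_MODELS` is a comma-separated list, that `parse_allowed_models` and `build_response` are reasonable helper names (both are hypothetical), and that the optional client can be reduced to its optional primary model name. The `ModelsResponse` struct mirrors the fields of `LlamaCppModelsResponse` without the `serde::Serialize` derive so the example needs no external crates.

```rust
/// Std-only mirror of the response fields shown in the diff.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelsResponse {
    pub models: Vec<String>,
    pub default_model: Option<String>,
    pub configured: bool,
}

/// ASSUMPTION: the allowlist env value is comma-separated.
/// Surrounding whitespace is trimmed and empty entries are dropped.
pub fn parse_allowed_models(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Mirrors the handler's logic: `configured` is true only when a client
/// exists, and `default_model` is that client's primary model.
/// `primary_model` stands in for the optional client (a simplification).
pub fn build_response(allowed: &[String], primary_model: Option<&str>) -> ModelsResponse {
    ModelsResponse {
        models: allowed.to_vec(),
        default_model: primary_model.map(str::to_owned),
        configured: primary_model.is_some(),
    }
}
```

With no client configured, `build_response(&allowed, None)` still returns the allowlist but reports `configured: false` and no default model, which matches the handler returning `200 OK` in both cases.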