ai: add llamacpp backend (llama-swap) as third LLM client

Wires a new LlamaCppClient (OpenAI-compatible /v1 wire format) alongside OllamaClient and OpenRouterClient. Per-slot routing for chat/vision/embed is configured via env (LLAMA_SWAP_URL + *_MODEL vars); capability inference uses an env allowlist since /v1/models doesn't report modality.

InsightGenerator + InsightChatService gain three-way dispatch on chat_backend = "local" | "hybrid" | "llamacpp". Hybrid and llamacpp share the describe-then-inline path (text-only chat after a separate vision describe). HYBRID_VISION_BACKEND=llamacpp lets hybrid route its describe pass through llama-swap's vision slot while chat still goes to OpenRouter.

Cross-replay matrix added (validate_cross_replay): local<->llamacpp and hybrid<->llamacpp allowed; local->hybrid and llamacpp->hybrid rejected.

New /insights/llamacpp/models handler mirrors the OpenRouter shape.
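
For readers skimming the message, the three-way dispatch could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this commit: the `ChatBackend` and `ChatPath` types and the `chat_path` function are hypothetical names, and treating "local" as a direct (non-describe) path is an assumption inferred from the message saying that only hybrid and llamacpp share describe-then-inline.

```rust
use std::str::FromStr;

/// Hypothetical enum for the three chat_backend values named in the message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChatBackend {
    Local,
    Hybrid,
    LlamaCpp,
}

impl FromStr for ChatBackend {
    type Err = String;

    /// Parse the chat_backend string; anything else is rejected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "local" => Ok(Self::Local),
            "hybrid" => Ok(Self::Hybrid),
            "llamacpp" => Ok(Self::LlamaCpp),
            other => Err(format!("unknown chat_backend: {other}")),
        }
    }
}

/// Hypothetical description of how a chat turn handles images.
#[derive(Debug, PartialEq, Eq)]
enum ChatPath {
    /// Assumption: the local backend handles images in the chat call itself.
    Direct,
    /// Separate vision describe pass, then text-only chat with the
    /// description inlined.
    DescribeThenInline,
}

/// Hybrid and llamacpp share one arm, mirroring the shared path in the message.
fn chat_path(backend: ChatBackend) -> ChatPath {
    match backend {
        ChatBackend::Local => ChatPath::Direct,
        ChatBackend::Hybrid | ChatBackend::LlamaCpp => ChatPath::DescribeThenInline,
    }
}
```

Collapsing hybrid and llamacpp into one match arm keeps the shared path in a single place, and the exhaustive `match` makes the compiler flag every dispatch site if a fourth backend is ever added.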

3 insertions(+), 1 deletion(-)

@@ -5,6 +5,7 @@ pub mod face_client;
 pub mod handlers;
 pub mod insight_chat;
 pub mod insight_generator;
+pub mod llamacpp;
 pub mod llm_client;
 pub mod ollama;
 pub mod openrouter;
@@ -20,7 +21,8 @@ pub use handlers::{
     chat_history_handler, chat_rewind_handler, chat_stream_handler, chat_turn_handler,
     delete_insight_handler, export_training_data_handler, generate_agentic_insight_handler,
     generate_insight_handler, get_all_insights_handler, get_available_models_handler,
-    get_insight_handler, get_openrouter_models_handler, rate_insight_handler,
+    get_insight_handler, get_llamacpp_models_handler, get_openrouter_models_handler,
+    rate_insight_handler,
 };
 pub use insight_generator::InsightGenerator;
 #[allow(unused_imports)]