Apps/ImageApi

feature/llamacpp-backend #101

Merged

cameron merged 11 commits from feature/llamacpp-backend into master

2026-05-26 18:58:48 +00:00

Author	SHA1	Message	Date
Cameron Cordes	b03ee60342	fix: prevent hybrid mode from leaking OpenRouter model to local llamacpp client When backend=hybrid with LLM_BACKEND=llamacpp, the user-selected model (an OpenRouter id like "google/gemini-3-flash-preview") was being applied to the local LlamaCppClient's primary_model and vision_model. This caused describe_image to send the OpenRouter model name to llama-swap, which returned 400 because it has no such slot. Guard the local-client model override with !is_hybrid so it only applies in local-only mode (where the user is selecting a different local model). Bump to v1.2.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 09:55:16 -04:00
Cameron Cordes	0a627f4880	Add contact name filter to SMS search tool + misc improvements - sms search tool: accept contact name, trim/validate, skip when contact_id is set, pass to API client - sms_client: new contact field in SmsSearchParams, URL-encode on wire - Tool description clarifies contact_id takes precedence when both given - Add parse_title_body helper for LLM response parsing - llamacpp backend improvements	2026-05-25 21:46:18 -04:00
Cameron Cordes	b9175e2718	image: add xlarge (4096px) on-demand preview tier New `PhotoSize::XLarge` variant sits between `Large` (2048px) and `Full` (original). On-demand generated and disk-cached at `_xlarge/<hash>.jpg`, same waterfall as `Large` (embedded RAW preview → ffmpeg → image crate). Sources below 4096px serve at native size. Reduces decoded bitmap memory from ~192MB (48MP full) to ~64MB for the mobile viewer's zoom tier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 15:33:03 -04:00
Cameron Cordes	9dba659d1e	test: add llamacpp model-slot consistency and content-null tests Cover the properties that prevent mid-turn model swaps in llama-swap exclusive mode: vision_model defaults to primary, cloned local client mirrors the user-selected model, embeddings stay on their own slot. Also test the content:null serialization for tool-calling messages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 19:29:51 -04:00
Cameron Cordes	208344ad98	ai: mirror chat model on local client to prevent mid-turn model swap When the user selects a model from the picker, the local client's primary_model and vision_model now match the chat model. Prevents llama-swap exclusive mode from swapping models when describe_photo or rerank fires during an agentic turn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 19:27:29 -04:00
Cameron Cordes	fb388c29d7	docs: update env + CLAUDE.md for direct-vision llamacpp + ResolvedBackend llamacpp models now receive images directly instead of describe-then-inline. LLAMA_SWAP_VISION_MODEL defaults to the primary model. Document the ResolvedBackend dispatch pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 15:03:12 -04:00
Cameron Cordes	a8a661f70a	ai: extract ResolvedBackend, remove ~480 lines of duplicated dispatch Replace 5 copies of the ~80-line backend resolution pattern with a single InsightGenerator::resolve_backend() builder that returns a ResolvedBackend (chat + local clients, BackendKind enum, images_inline flag). Tool dispatch now takes &ResolvedBackend instead of &OllamaClient + model + backend strings. Remove duplicated ollama/openrouter/llamacpp fields from InsightChatService — InsightGenerator owns them and resolve_backend uses them. Delete build_chat_clients (replaced by resolve_backend). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 15:00:50 -04:00
Cameron Cordes	0631820fbf	ai: send images directly to llamacpp chat models + add ResolvedBackend llamacpp models now receive images via OpenAI content-parts instead of the describe-then-inline strategy (hybrid mode unchanged). Fixes assistant messages with tool_calls emitting content: null instead of "" to satisfy strict Jinja template role-alternation checks. Adds debug logging of message role sequences on llamacpp requests. Introduces BackendKind enum, SamplingOverrides, and ResolvedBackend in a new backend.rs module. InsightGenerator::resolve_backend centralises client construction + vision capability detection — next step wires the existing inline dispatch through it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 14:00:37 -04:00
Cameron Cordes	be51421b38	ai: collapse llamacpp into LLM_BACKEND env switch Reverts the per-request backend="llamacpp" value. Chat/vision/embedding backend is now a deploy-time decision (LLM_BACKEND=ollama|llamacpp), applied globally across chat, vision describe, and embeddings, so embedding vectors stay in one space across the index. - Per-request backend whitelist back to "local"|"hybrid". A request arriving with backend="llamacpp" is rejected. - LLM_BACKEND=llamacpp swaps the entire local stack to llama-swap: chat hits the chat slot, describe hits the vision slot, embeddings hit the embed slot. Hybrid mode still routes chat to OpenRouter but uses LLM_BACKEND for the describe pass. - Drops env vars HYBRID_VISION_BACKEND, LLAMA_SWAP_VISION_MODELS, EMBEDDING_BACKEND (the last never shipped). Drops the LlamaCppClient.vision_models allowlist — capability inference now reports has_vision only for the configured vision_model slot. - Drops the /insights/llamacpp/models handler. /insights/models is the single endpoint; returns Ollama servers under LLM_BACKEND=ollama and llama-swap slots (from LLAMA_SWAP_ALLOWED_MODELS) under LLM_BACKEND=llamacpp. Same envelope shape either way. - New ai::embed_one helper routes embeddings through llama-swap when LLM_BACKEND=llamacpp (else Ollama). Wires it into the four insight_generator embedding sites. - Cross-replay matrix simplifies to pre-llamacpp shape (local↔local, hybrid↔hybrid, hybrid→local allowed; local→hybrid rejected).	2026-05-21 11:36:58 -04:00
Cameron Cordes	d14df63f19	env.example: document LLAMA_SWAP_* + HYBRID_VISION_BACKEND vars Mirrors the section added to CLAUDE.md so deploys can opt into the llamacpp backend from the template alone.	2026-05-20 17:54:08 -04:00
Cameron Cordes	f0927f5355	ai: add llamacpp backend (llama-swap) as third LLM client Wires a new LlamaCppClient (OpenAI-compatible /v1 wire format) alongside OllamaClient and OpenRouterClient. Per-slot routing for chat/vision/embed via env (LLAMA_SWAP_URL + *_MODEL vars); capability inference uses an env allowlist since /v1/models doesn't report modality. InsightGenerator + InsightChatService gain three-way dispatch on chat_backend = "local" | "hybrid" | "llamacpp". Hybrid and llamacpp share the describe-then-inline path (text-only chat after a separate vision describe). HYBRID_VISION_BACKEND=llamacpp lets hybrid route its describe pass through llama-swap's vision slot while chat still goes to OpenRouter. Cross-replay matrix added (validate_cross_replay): local<->llamacpp and hybrid<->llamacpp allowed; local->hybrid and llamacpp->hybrid rejected. New /insights/llamacpp/models handler mirrors the OpenRouter shape.	2026-05-20 17:52:33 -04:00
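
The final shape of the per-request backend rules (commit be51421b38) may be easier to follow as code. The following is a minimal sketch, not the PR's implementation: `RequestBackend` and `cross_replay_allowed` are hypothetical names, and reading `from` as the backend a conversation was recorded under and `to` as the backend requested for the replay is an assumption about what the arrows in the commit message mean.

```rust
use std::str::FromStr;

/// Per-request backend values after the whitelist went back to "local" | "hybrid".
/// Hypothetical type; the PR does not show how the value is represented.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RequestBackend {
    Local,
    Hybrid,
}

impl FromStr for RequestBackend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "local" => Ok(Self::Local),
            "hybrid" => Ok(Self::Hybrid),
            // "llamacpp" lands here: it is a deploy-time choice (LLM_BACKEND),
            // not a value a request may carry.
            other => Err(format!("unsupported backend: {other}")),
        }
    }
}

/// Returns true when a conversation recorded under `from` may be replayed
/// under `to` (assumed direction; see the note above the block).
fn cross_replay_allowed(from: RequestBackend, to: RequestBackend) -> bool {
    use RequestBackend::*;
    match (from, to) {
        (Local, Local) | (Hybrid, Hybrid) | (Hybrid, Local) => true,
        (Local, Hybrid) => false,
    }
}
```

Spelling out all four pairs in one `match` keeps the matrix exhaustive: adding a variant later would fail to compile until its pairs are decided.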
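
The hybrid-mode fix in commit b03ee60342 comes down to a single guard on the local client's model override. A minimal sketch of that guard follows, under stated assumptions: `LocalClientModels` and `apply_user_model` are hypothetical stand-ins (the real `LlamaCppClient` is not shown in this PR page), and only the field names `primary_model` and `vision_model` and the `!is_hybrid` condition come from the commit message.

```rust
/// Hypothetical stand-in for the model fields on the local client.
#[derive(Debug, Clone, PartialEq)]
struct LocalClientModels {
    primary_model: String,
    vision_model: String,
}

/// Returns the models the local client should use for this request.
/// The user-selected model is mirrored onto the local client only in
/// local-only mode; in hybrid mode the selection is an OpenRouter id and
/// must not reach llama-swap.
fn apply_user_model(
    local: &LocalClientModels,
    user_model: Option<&str>,
    is_hybrid: bool,
) -> LocalClientModels {
    let mut out = local.clone();
    if let Some(model) = user_model {
        if !is_hybrid {
            out.primary_model = model.to_string();
            out.vision_model = model.to_string();
        }
    }
    out
}
```

With this guard, a hybrid request that selects "google/gemini-3-flash-preview" leaves the local slots untouched, while a local-only request still mirrors the chosen model onto both fields, which is the behaviour commit 208344ad98 introduced to avoid mid-turn swaps.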