Add Insights Model Discovery and Fallback Handling

2026-01-03 20:27:34 -05:00
parent 1171f19845
commit cf52d4ab76
10 changed files with 419 additions and 80 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -250,8 +250,31 @@ Optional:
 WATCH_QUICK_INTERVAL_SECONDS=60        # Quick scan interval
 WATCH_FULL_INTERVAL_SECONDS=3600       # Full scan interval
 OTLP_OTLS_ENDPOINT=http://...          # OpenTelemetry collector (release builds)
+
+# AI Insights Configuration
+OLLAMA_PRIMARY_URL=http://desktop:11434        # Primary Ollama server (e.g., desktop)
+OLLAMA_FALLBACK_URL=http://server:11434        # Fallback Ollama server (optional, always-on)
+OLLAMA_PRIMARY_MODEL=nemotron-3-nano:30b       # Model for primary server (default: nemotron-3-nano:30b)
+OLLAMA_FALLBACK_MODEL=llama3.2:3b              # Model for fallback server (optional, uses primary model if not set)
+SMS_API_URL=http://localhost:8000              # SMS message API endpoint (default: localhost:8000)
+SMS_API_TOKEN=your-api-token                   # SMS API authentication token (optional)
 ```

+**AI Insights Fallback Behavior:**
+- Primary server is tried first with its configured model (5-second connection timeout)
+- On connection failure, the request automatically falls back to the fallback server with its model (if configured)
+- If `OLLAMA_FALLBACK_MODEL` is not set, the fallback server uses the same model as the primary server
+- Total request timeout is 120 seconds to accommodate slow LLM inference
+- Logs indicate which server and model were used (info level) and record failover attempts (warn level)
+- Backwards compatible: `OLLAMA_URL` and `OLLAMA_MODEL` are still supported as fallbacks
+
+**Model Discovery:**
+The `OllamaClient` provides methods to query available models:
+- `OllamaClient::list_models(url)` - Returns list of all models on a server
+- `OllamaClient::is_model_available(url, model_name)` - Checks if a specific model exists
+
+This allows runtime verification of model availability before generating insights.
+
 ## Dependencies of Note

 - **actix-web**: HTTP framework