CPU-based Ollama inference on Netcup is too slow due to server memory
pressure. Add OpenAI-compatible API support so we can use Gemini Flash
or other cloud APIs for clip analysis. Also increase the transcript
sample size to 20K characters, since cloud APIs handle that size easily.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

47-minute videos produce ~48K characters of transcript, which takes
llama3.1:8b more than 10 minutes to process on CPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>