Fixes three issues preventing summary generation via RunPod GPU:
- Extract text from vLLM's nested choices/tokens output format
- Strip LLM preamble text before JSON parsing
- Serialize lists to JSON strings for asyncpg jsonb columns
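A minimal sketch of the three fixes; the function names are illustrative, not the actual helpers in the codebase, and the vLLM payload shapes shown are the common ones, which may differ per endpoint:

```python
import json

def extract_text(response: dict) -> str:
    # vLLM nests generated text under choices[0]; depending on the
    # endpoint it arrives as a plain "text" field or as a list of
    # token strings under "tokens".
    choice = response["choices"][0]
    if isinstance(choice.get("text"), str):
        return choice["text"]
    return "".join(choice.get("tokens", []))

def strip_preamble(raw: str) -> str:
    # LLMs often prefix JSON output with chatter ("Sure, here is the
    # summary:"); keep only the substring starting at the first
    # opening brace or bracket.
    starts = [i for i in (raw.find("{"), raw.find("[")) if i != -1]
    return raw[min(starts):] if starts else raw

def to_jsonb_param(value):
    # asyncpg binds jsonb parameters as strings by default, so lists
    # and dicts must be serialized before being passed to the query.
    return json.dumps(value) if isinstance(value, (list, dict)) else value
```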
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Increase the Ollama timeout from a hardcoded 120s to a configurable 600s default.
Add optional AI Orchestrator integration for RunPod GPU acceleration with
automatic fallback to direct Ollama when orchestrator is unavailable.
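
A rough sketch of the fallback flow, using stdlib urllib for brevity; the env var names and the orchestrator's /generate route are assumptions, not the project's actual configuration:

```python
import json
import os
import urllib.error
import urllib.request

# Hypothetical env vars; only the 600s default is from this change.
OLLAMA_TIMEOUT = float(os.getenv("OLLAMA_TIMEOUT", "600"))  # was hardcoded 120s
ORCHESTRATOR_URL = os.getenv("AI_ORCHESTRATOR_URL", "")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")

def _post_json(url: str, payload: dict, timeout: float) -> dict:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

def generate(prompt: str, model: str = "llama3") -> str:
    # Prefer the RunPod orchestrator when configured; on any network
    # error fall back to calling Ollama directly.
    if ORCHESTRATOR_URL:
        try:
            out = _post_json(f"{ORCHESTRATOR_URL}/generate",
                             {"model": model, "prompt": prompt},
                             OLLAMA_TIMEOUT)
            return out["response"]
        except (urllib.error.URLError, OSError):
            pass  # orchestrator unreachable: fall through to Ollama
    out = _post_json(f"{OLLAMA_URL}/api/generate",
                     {"model": model, "prompt": prompt, "stream": False},
                     OLLAMA_TIMEOUT)
    return out["response"]
```

Ollama's own `/api/generate` endpoint with `"stream": false` returns the full completion in a single JSON object under `response`, which is what the direct path reads.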
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>