AI Orchestrator - Smart routing between Ollama (free) and RunPod (GPU)
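The "smart routing" idea from the tagline can be sketched as a simple backend selector: prefer the free local Ollama instance and fall back to the RunPod GPU endpoint when Ollama is unavailable or the request looks heavy. This is a hypothetical illustration of the concept; the function and threshold below are not taken from `server.py`.

```python
def choose_backend(prompt: str, ollama_available: bool, heavy_threshold: int = 2000) -> str:
    """Pick 'ollama' (free, local) or 'runpod' (paid GPU) for a request.

    Illustrative sketch: prompt length in characters stands in as a rough
    proxy for workload size; the real orchestrator may use other signals.
    """
    if not ollama_available or len(prompt) > heavy_threshold:
        return "runpod"  # GPU fallback for heavy or failed-over requests
    return "ollama"      # default to the free local backend
```
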
Latest commit: cc2d06bbfb by Jeff Emmett
fix: use sampling_params for RunPod vLLM max_tokens + add API key env

vLLM ignores a top-level max_tokens; it only reads the value from sampling_params.
Also adds RUNPOD_API_KEY to compose for explicit env injection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Date: 2026-03-23 10:15:12 -07:00
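The fix described above concerns where generation options live in the request body: the RunPod vLLM worker reads them from a nested `sampling_params` object, so a top-level `max_tokens` is silently ignored. A minimal sketch of the corrected payload shape, assuming a standard RunPod serverless-style `input` envelope (the prompt and option values are illustrative):

```python
import os

def build_runpod_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build a request body with max_tokens nested inside sampling_params.

    Sketch of the payload shape the fix targets: max_tokens placed at the
    top level of "input" would be ignored by the vLLM worker.
    """
    return {
        "input": {
            "prompt": prompt,
            # Generation options must live inside sampling_params
            "sampling_params": {
                "max_tokens": max_tokens,
                "temperature": 0.7,  # illustrative value
            },
        }
    }

def auth_header() -> dict:
    # RUNPOD_API_KEY is injected via the environment (see docker-compose.yml
    # and entrypoint.sh, which wire it through Infisical/compose env).
    return {"Authorization": f"Bearer {os.environ.get('RUNPOD_API_KEY', '')}"}
```
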
| File               | Last commit message                                                  | Date                       |
|--------------------|----------------------------------------------------------------------|----------------------------|
| .dockerignore      | Add .dockerignore for optimized Docker builds                        | 2026-02-21 17:59:08 -07:00 |
| .env.example       | Initial commit: AI Orchestrator with Ollama + RunPod smart routing   | 2025-11-26 19:11:58 -08:00 |
| .gitignore         | Initial commit: AI Orchestrator with Ollama + RunPod smart routing   | 2025-11-26 19:11:58 -08:00 |
| Dockerfile         | Wire Infisical secret injection for RUNPOD_API_KEY                   | 2026-02-24 09:32:34 -08:00 |
| docker-compose.yml | fix: use sampling_params for RunPod vLLM max_tokens + add API key env | 2026-03-23 10:15:12 -07:00 |
| entrypoint.sh      | Wire Infisical secret injection for RUNPOD_API_KEY                   | 2026-02-24 09:32:34 -08:00 |
| requirements.txt   | Initial commit: AI Orchestrator with Ollama + RunPod smart routing   | 2025-11-26 19:11:58 -08:00 |
| server.py          | fix: use sampling_params for RunPod vLLM max_tokens + add API key env | 2026-03-23 10:15:12 -07:00 |
| test_api.py        | Initial commit: AI Orchestrator with Ollama + RunPod smart routing   | 2025-11-26 19:11:58 -08:00 |