AI Services Deployment & Testing Guide
Complete guide for deploying and testing the AI services integration in canvas-website with Netcup RS 8000 and RunPod.
🎯 Overview
This project integrates multiple AI services with smart routing:
Smart Routing Strategy:
- Text/Code (70-80% workload): Local Ollama on RS 8000 → FREE
- Images - Low Priority: Local Stable Diffusion on RS 8000 → FREE (slow ~60s)
- Images - High Priority: RunPod GPU (SDXL) → $0.02/image (fast ~5s)
- Video Generation: RunPod GPU (Wan2.1) → $0.50/video (30-90s)
Expected Cost Savings: $86-350/month compared to persistent GPU instances
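As a rough illustration of the routing contract, the client just states a priority and lets the orchestrator decide where the job runs. The sketch below mirrors the /generate/text example later in this guide; the image and video paths and response fields are assumptions, not a confirmed API.

```typescript
// Sketch only: a client-side helper expressing the routing strategy above.
// Low-priority image jobs stay on the local CPU (free, ~60s); high-priority
// images and all video jobs are routed to RunPod GPU by the orchestrator.
type Priority = 'low' | 'normal' | 'high'

const ORCHESTRATOR_URL = 'http://159.195.32.209:8000'

async function generate(kind: 'text' | 'image' | 'video', prompt: string, priority: Priority) {
  const res = await fetch(`${ORCHESTRATOR_URL}/generate/${kind}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt, priority }),
  })
  return res.json() // expected shape: { job_id, status, message }
}
```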
📦 What's Included
AI Services:
- ✅ Text Generation (LLM)
  - RunPod integration via src/lib/runpodApi.ts
  - Enhanced LLM utilities in src/utils/llmUtils.ts
  - AI Orchestrator client in src/lib/aiOrchestrator.ts
  - Prompt shapes, arrow LLM actions, command palette
- ✅ Image Generation
  - ImageGenShapeUtil in src/shapes/ImageGenShapeUtil.tsx
  - ImageGenTool in src/tools/ImageGenTool.ts
  - Mock mode DISABLED (ready for production)
  - Smart routing: low priority → local CPU, high priority → RunPod GPU
- ✅ Video Generation (NEW!)
  - VideoGenShapeUtil in src/shapes/VideoGenShapeUtil.tsx
  - VideoGenTool in src/tools/VideoGenTool.ts
  - Wan2.1 I2V 14B 720p model on RunPod
  - Always uses GPU (no local option)
- ✅ Voice Transcription
  - WhisperX integration via src/hooks/useWhisperTranscriptionSimple.ts
  - Automatic fallback to local Whisper model
🚀 Deployment Steps
Step 1: Deploy AI Orchestrator on Netcup RS 8000
Prerequisites:
- SSH access to the Netcup RS 8000: ssh netcup
- Docker and Docker Compose installed
- RunPod API key
1.1 Create AI Orchestrator Directory:
ssh netcup << 'EOF'
mkdir -p /opt/ai-orchestrator/{services/{router,workers,monitor},configs,data/{redis,postgres,prometheus}}
cd /opt/ai-orchestrator
EOF
1.2 Copy Configuration Files:
From your local machine, copy the AI orchestrator files created in NETCUP_MIGRATION_PLAN.md:
# Copy docker-compose.yml
scp /path/to/docker-compose.yml netcup:/opt/ai-orchestrator/
# Copy service files
scp -r /path/to/services/* netcup:/opt/ai-orchestrator/services/
1.3 Configure Environment Variables:
Use an unquoted heredoc delimiter here (EOF, not 'EOF') so the $(openssl ...) command substitutions expand on your machine before the file is written; the .env file is read literally by Docker Compose and will not execute commands.
ssh netcup "cat > /opt/ai-orchestrator/.env" << EOF
# PostgreSQL
POSTGRES_PASSWORD=$(openssl rand -hex 16)
# RunPod API Keys
RUNPOD_API_KEY=your_runpod_api_key_here
RUNPOD_TEXT_ENDPOINT_ID=your_text_endpoint_id
RUNPOD_IMAGE_ENDPOINT_ID=your_image_endpoint_id
RUNPOD_VIDEO_ENDPOINT_ID=your_video_endpoint_id
# Grafana
GRAFANA_PASSWORD=$(openssl rand -hex 16)
# Monitoring
ALERT_EMAIL=your@email.com
COST_ALERT_THRESHOLD=100
EOF
1.4 Deploy the Stack:
ssh netcup << 'EOF'
cd /opt/ai-orchestrator
# Start all services
docker-compose up -d
# Check status
docker-compose ps
# View logs
docker-compose logs -f router
EOF
1.5 Verify Deployment:
# Check health endpoint
ssh netcup "curl http://localhost:8000/health"
# Check API documentation
ssh netcup "curl http://localhost:8000/docs"
# Check queue status
ssh netcup "curl http://localhost:8000/queue/status"
Step 2: Setup Local AI Models on RS 8000
2.1 Download Ollama Models:
ssh netcup << 'EOF'
# Download recommended models
docker exec ai-ollama ollama pull llama3:70b
docker exec ai-ollama ollama pull codellama:34b
docker exec ai-ollama ollama pull deepseek-coder:33b
docker exec ai-ollama ollama pull mistral:7b
# Verify
docker exec ai-ollama ollama list
# Test a model
docker exec ai-ollama ollama run llama3:70b "Hello, how are you?"
EOF
2.2 Download Stable Diffusion Models:
ssh netcup << 'EOF'
mkdir -p /data/models/stable-diffusion/sd-v2.1
cd /data/models/stable-diffusion/sd-v2.1
# Download SD 2.1 weights
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned.safetensors
# Verify
ls -lh v2-1_768-ema-pruned.safetensors
EOF
2.3 Download Wan2.1 Video Generation Model:
ssh netcup << 'EOF'
# Install huggingface-cli
pip install huggingface-hub
# Download Wan2.1 I2V 14B 720p
mkdir -p /data/models/video-generation
cd /data/models/video-generation
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P \
--include "*.safetensors" \
--local-dir wan2.1_i2v_14b
# Check size (~28GB)
du -sh wan2.1_i2v_14b
EOF
Note: The Wan2.1 model will be deployed to RunPod, not run locally on CPU.
Step 3: Setup RunPod Endpoints
3.1 Create RunPod Serverless Endpoints:
Go to RunPod Serverless and create endpoints for:
- Text Generation Endpoint (optional, fallback)
  - Model: Any LLM (Llama, Mistral, etc.)
  - GPU: Optional (we use local CPU primarily)
- Image Generation Endpoint
  - Model: SDXL or SD3
  - GPU: A4000/A5000 (good price/performance)
  - Expected cost: ~$0.02/image
- Video Generation Endpoint
  - Model: Wan2.1-I2V-14B-720P
  - GPU: A100 or H100 (required for video)
  - Expected cost: ~$0.50/video
3.2 Get Endpoint IDs:
For each endpoint, copy the endpoint ID from the URL or endpoint details.
Example: If URL is https://api.runpod.ai/v2/jqd16o7stu29vq/run, then jqd16o7stu29vq is your endpoint ID.
3.3 Update Environment Variables:
Update /opt/ai-orchestrator/.env with your endpoint IDs:
ssh netcup "nano /opt/ai-orchestrator/.env"
# Add your endpoint IDs:
RUNPOD_TEXT_ENDPOINT_ID=your_text_endpoint_id
RUNPOD_IMAGE_ENDPOINT_ID=your_image_endpoint_id
RUNPOD_VIDEO_ENDPOINT_ID=your_video_endpoint_id
# Restart services so the new endpoint IDs are picked up
ssh netcup "cd /opt/ai-orchestrator && docker-compose restart"
Step 4: Configure canvas-website
4.1 Create .env.local:
In your canvas-website directory:
cd /home/jeffe/Github/canvas-website-branch-worktrees/add-runpod-AI-API
cat > .env.local << 'EOF'
# AI Orchestrator (Primary - Netcup RS 8000)
VITE_AI_ORCHESTRATOR_URL=http://159.195.32.209:8000
# Or use domain when DNS is configured:
# VITE_AI_ORCHESTRATOR_URL=https://ai-api.jeffemmett.com
# RunPod API (Fallback/Direct Access)
VITE_RUNPOD_API_KEY=your_runpod_api_key_here
VITE_RUNPOD_TEXT_ENDPOINT_ID=your_text_endpoint_id
VITE_RUNPOD_IMAGE_ENDPOINT_ID=your_image_endpoint_id
VITE_RUNPOD_VIDEO_ENDPOINT_ID=your_video_endpoint_id
# Other existing vars...
VITE_GOOGLE_CLIENT_ID=your_google_client_id
VITE_GOOGLE_MAPS_API_KEY=your_google_maps_api_key
VITE_DAILY_DOMAIN=your_daily_domain
VITE_TLDRAW_WORKER_URL=your_worker_url
EOF
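Vite only exposes variables prefixed with VITE_ to client code, via import.meta.env. A minimal sketch of how the app can read the orchestrator URL and warn when it is missing (illustrative only; the real configuration code lives in src/lib/aiOrchestrator.ts and src/lib/runpodApi.ts):

```typescript
// Sketch: reading the AI service configuration in client code (Vite convention).
const ORCHESTRATOR_URL: string | undefined = import.meta.env.VITE_AI_ORCHESTRATOR_URL
const RUNPOD_API_KEY: string | undefined = import.meta.env.VITE_RUNPOD_API_KEY

if (!ORCHESTRATOR_URL) {
  // Without an orchestrator URL, the app can only use direct RunPod access.
  console.warn('VITE_AI_ORCHESTRATOR_URL is not set; falling back to direct RunPod calls')
}
```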
4.2 Install Dependencies:
npm install
4.3 Build and Start:
# Development
npm run dev
# Production build
npm run build
npm run start
Step 5: Register Video Generation Tool
You need to register the VideoGen shape and tool with tldraw. Find where shapes and tools are registered (likely in src/routes/Board.tsx or similar):
Add to shape utilities array:
import { VideoGenShapeUtil } from '@/shapes/VideoGenShapeUtil'
const shapeUtils = [
// ... existing shapes
VideoGenShapeUtil,
]
Add to tools array:
import { VideoGenTool } from '@/tools/VideoGenTool'
const tools = [
// ... existing tools
VideoGenTool,
]
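If the board mounts the <Tldraw> component directly, registration is just passing these arrays as props. This is a sketch under that assumption; adapt it to how Board.tsx actually sets up the editor, and note the package name may be '@tldraw/tldraw' depending on the installed version.

```tsx
// Sketch: wiring the custom shape and tool into the tldraw editor.
import { Tldraw } from 'tldraw'
import { VideoGenShapeUtil } from '@/shapes/VideoGenShapeUtil'
import { VideoGenTool } from '@/tools/VideoGenTool'

export function Board() {
  return (
    <Tldraw
      shapeUtils={[VideoGenShapeUtil /* ...existing shapes */]}
      tools={[VideoGenTool /* ...existing tools */]}
    />
  )
}
```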
🧪 Testing
Test 1: Verify AI Orchestrator
# Test health endpoint
curl http://159.195.32.209:8000/health
# Expected response:
# {"status":"healthy","timestamp":"2025-11-25T12:00:00.000Z"}
# Test text generation
curl -X POST http://159.195.32.209:8000/generate/text \
-H "Content-Type: application/json" \
-d '{
"prompt": "Write a hello world program in Python",
"priority": "normal"
}'
# Expected response:
# {"job_id":"abc123","status":"queued","message":"Job queued on local provider"}
# Check job status
curl http://159.195.32.209:8000/job/abc123
# Check queue status
curl http://159.195.32.209:8000/queue/status
# Check costs
curl http://159.195.32.209:8000/costs/summary
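The same flow from TypeScript, as a minimal sketch: submit a job, then poll /job/{id} until it finishes. Field names (job_id, status) follow the example responses above; the shape of the final output field is an assumption.

```typescript
// Sketch: submit a text job to the orchestrator and poll until it completes.
const BASE = 'http://159.195.32.209:8000'

async function generateText(prompt: string): Promise<unknown> {
  const submit = await fetch(`${BASE}/generate/text`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt, priority: 'normal' }),
  })
  const { job_id } = await submit.json()

  // Poll the job status every 2 seconds until it completes or fails.
  for (;;) {
    const job = await (await fetch(`${BASE}/job/${job_id}`)).json()
    if (job.status === 'completed') return job.output
    if (job.status === 'failed') throw new Error(`Job ${job_id} failed`)
    await new Promise((r) => setTimeout(r, 2000))
  }
}
```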
Test 2: Test Text Generation in Canvas
- Open canvas-website in browser
- Open browser console (F12)
- Look for log messages:
✅ AI Orchestrator is available at http://159.195.32.209:8000
- Create a Prompt shape or use arrow LLM action
- Enter a prompt and submit
- Verify response appears
- Check console for routing info: should see Using local Ollama (FREE)
Test 3: Test Image Generation
Low Priority (Local CPU - FREE):
- Use ImageGen tool from toolbar
- Click on canvas to create ImageGen shape
- Enter prompt: "A beautiful mountain landscape"
- Select priority: "Low"
- Click "Generate"
- Wait 30-60 seconds
- Verify image appears
- Check console: should show Using local Stable Diffusion CPU
High Priority (RunPod GPU - $0.02):
- Create new ImageGen shape
- Enter prompt: "A futuristic city at sunset"
- Select priority: "High"
- Click "Generate"
- Wait 5-10 seconds
- Verify image appears
- Check console: should show Using RunPod SDXL
- Check cost: should show ~$0.02
Test 4: Test Video Generation
- Use VideoGen tool from toolbar
- Click on canvas to create VideoGen shape
- Enter prompt: "A cat walking through a garden"
- Set duration: 3 seconds
- Click "Generate"
- Wait 30-90 seconds
- Verify video appears and plays
- Check console: should show Using RunPod Wan2.1
- Check cost: should show ~$0.50
- Test the download button
Test 5: Test Voice Transcription
- Use Transcription tool from toolbar
- Click to create Transcription shape
- Click "Start Recording"
- Speak into microphone
- Click "Stop Recording"
- Verify transcription appears
- Check if using RunPod or local Whisper
Test 6: Monitor Costs and Performance
Access monitoring dashboards:
# API Documentation
http://159.195.32.209:8000/docs
# Queue Status
http://159.195.32.209:8000/queue/status
# Cost Tracking
http://159.195.32.209:3000/api/costs/summary
# Grafana Dashboard
http://159.195.32.209:3001
# Default login: admin / admin (change this!)
Check daily costs:
curl http://159.195.32.209:3000/api/costs/summary
Expected response:
{
"today": {
"local": 0.00,
"runpod": 2.45,
"total": 2.45
},
"this_month": {
"local": 0.00,
"runpod": 45.20,
"total": 45.20
},
"breakdown": {
"text": 0.00,
"image": 12.50,
"video": 32.70,
"code": 0.00
}
}
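For in-app dashboards or alerts, the response can be typed directly from the shape above. The interface below mirrors the example JSON; it is not a published schema.

```typescript
// Sketch: typing and fetching the cost summary shown above.
interface CostBucket {
  local: number
  runpod: number
  total: number
}

interface CostSummary {
  today: CostBucket
  this_month: CostBucket
  breakdown: { text: number; image: number; video: number; code: number }
}

async function fetchCosts(): Promise<CostSummary> {
  const res = await fetch('http://159.195.32.209:3000/api/costs/summary')
  if (!res.ok) throw new Error(`Cost API returned ${res.status}`)
  return (await res.json()) as CostSummary
}
```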
🐛 Troubleshooting
Issue: AI Orchestrator not available
Symptoms:
- Console shows: ⚠️ AI Orchestrator configured but not responding
- Health check fails
Solutions:
# 1. Check if services are running
ssh netcup "cd /opt/ai-orchestrator && docker-compose ps"
# 2. Check logs
ssh netcup "cd /opt/ai-orchestrator && docker-compose logs -f router"
# 3. Restart services
ssh netcup "cd /opt/ai-orchestrator && docker-compose restart"
# 4. Check firewall
ssh netcup "sudo ufw status"
ssh netcup "sudo ufw allow 8000/tcp"
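On the client side, the availability check that produces the ✅/⚠️ console messages can be as simple as a health probe with a short timeout. This is a sketch, not the exact code in src/lib/aiOrchestrator.ts.

```typescript
// Sketch: probe the orchestrator's /health endpoint before routing requests to it.
async function isOrchestratorAvailable(baseUrl: string, timeoutMs = 3000): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/health`, { signal: AbortSignal.timeout(timeoutMs) })
    return res.ok
  } catch {
    console.warn(`⚠️ AI Orchestrator configured but not responding at ${baseUrl}`)
    return false // caller can fall back to direct RunPod access
  }
}
```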
Issue: Image generation fails with "No output found"
Symptoms:
- Job completes but no image URL returned
- Error: Job completed but no output data found
Solutions:
- Check RunPod endpoint configuration
- Verify the endpoint handler returns the expected format: {"output": {"image": "base64_or_url"}} (see the parsing sketch after this list)
- Check endpoint logs in the RunPod console
- Test endpoint directly with curl
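When debugging, it helps to log exactly what the handler returned. A defensive parser like the sketch below makes a format mismatch obvious; the {"output": {"image": ...}} shape is what this guide expects, while image_url and images[0] are common variants in custom handlers, included here as assumptions.

```typescript
// Sketch: defensively extract an image from a RunPod job result.
function extractImage(result: any): string {
  const out = result?.output
  const image = out?.image ?? out?.image_url ?? out?.images?.[0]
  if (!image) {
    console.error('Job completed but no output data found', JSON.stringify(result))
    throw new Error('Job completed but no output data found')
  }
  return image // base64 string or URL, depending on the handler
}
```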
Issue: Video generation timeout
Symptoms:
- Job stuck in "processing" state
- Timeout after 120 attempts
Solutions:
- Video generation takes 30-90 seconds; allow the job time to finish before retrying
- Check RunPod GPU availability (might be cold start)
- Increase the polling timeout in VideoGenShapeUtil if needed (see the sketch after this list)
- Check RunPod endpoint logs for errors
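If cold starts routinely push past the default window, the polling loop can take a configurable attempt budget instead of hard-coding 120 attempts. A sketch follows; the real loop lives in VideoGenShapeUtil and may be structured differently.

```typescript
// Sketch: poll a video job with a configurable attempt budget and interval.
async function pollVideoJob(baseUrl: string, jobId: string, maxAttempts = 180, intervalMs = 2000) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await (await fetch(`${baseUrl}/job/${jobId}`)).json()
    if (job.status === 'completed') return job.output
    if (job.status === 'failed') throw new Error(`Video job ${jobId} failed`)
    await new Promise((r) => setTimeout(r, intervalMs))
  }
  throw new Error(`Video job ${jobId} timed out after ${maxAttempts} attempts`)
}
```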
Issue: High costs
Symptoms:
- Monthly costs exceed budget
- Too many RunPod requests
Solutions:
# 1. Check cost breakdown
curl http://159.195.32.209:3000/api/costs/summary
# 2. Review routing decisions
curl http://159.195.32.209:8000/queue/status
# 3. Adjust routing thresholds
# Edit router configuration to prefer local more
ssh netcup "nano /opt/ai-orchestrator/services/router/main.py"
# 4. Set cost alerts
ssh netcup "nano /opt/ai-orchestrator/.env"
# COST_ALERT_THRESHOLD=50 # Alert if daily cost > $50
Issue: Local models slow or failing
Symptoms:
- Text generation slow (>30s)
- Image generation very slow (>2min)
- Out of memory errors
Solutions:
# 1. Check system resources
ssh netcup "htop"
ssh netcup "free -h"
# 2. Reduce model size
ssh netcup << 'EOF'
# Use smaller models
docker exec ai-ollama ollama pull llama3:8b # Instead of 70b
docker exec ai-ollama ollama pull mistral:7b # Lighter model
EOF
# 3. Limit concurrent workers
ssh netcup "nano /opt/ai-orchestrator/docker-compose.yml"
# Reduce worker replicas if needed
# 4. Increase swap (if low RAM)
ssh netcup "sudo fallocate -l 8G /swapfile"
ssh netcup "sudo chmod 600 /swapfile"
ssh netcup "sudo mkswap /swapfile"
ssh netcup "sudo swapon /swapfile"
📊 Performance Expectations
Text Generation:
- Local (Llama3-70b): 2-10 seconds
- Local (Mistral-7b): 1-3 seconds
- RunPod (fallback): 3-8 seconds
- Cost: $0.00 (local) or $0.001-0.01 (RunPod)
Image Generation:
- Local SD CPU (low priority): 30-60 seconds
- RunPod GPU (high priority): 3-10 seconds
- Cost: $0.00 (local) or $0.02 (RunPod)
Video Generation:
- RunPod Wan2.1: 30-90 seconds
- Cost: ~$0.50 per video
Expected Monthly Costs:
Light Usage (~100 requests/month):
- 70 text (local): $0
- 20 images (15 local + 5 RunPod): $0.10
- 10 videos: $5.00
- Total: ~$5-10/month
Medium Usage (~500 requests/month):
- 350 text (local): $0
- 100 images (60 local + 40 RunPod): $0.80
- 50 videos: $25.00
- Total: ~$25-35/month
Heavy Usage (~2000 requests/month):
- 1400 text (local): $0
- 400 images (200 local + 200 RunPod): $4.00
- 200 videos: $100.00
- Total: ~$100-120/month
Compare to persistent GPU pod: $200-300/month regardless of usage!
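To sanity-check a budget against the per-unit prices above, the estimate is straight multiplication. An illustrative sketch using this guide's price assumptions ($0.02 per RunPod image, $0.50 per video, local text and images free):

```typescript
// Sketch: estimate monthly spend from the per-unit prices used in this guide.
function estimateMonthlyCost(runpodImagesPerMonth: number, videosPerMonth: number): number {
  return runpodImagesPerMonth * 0.02 + videosPerMonth * 0.5
}

// Heavy-usage example: 200 RunPod images + 200 videos ≈ $104/month,
// versus $200-300/month for an always-on GPU pod.
console.log(estimateMonthlyCost(200, 200)) // 104
```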
🎯 Next Steps
- ✅ Deploy AI Orchestrator on Netcup RS 8000
- ✅ Setup local AI models (Ollama, SD)
- ✅ Configure RunPod endpoints
- ✅ Test all AI services
- 📋 Setup monitoring and alerts
- 📋 Configure DNS for ai-api.jeffemmett.com
- 📋 Setup SSL with Let's Encrypt
- 📋 Migrate canvas-website to Netcup
- 📋 Monitor costs and optimize routing
- 📋 Decommission DigitalOcean droplets
📚 Additional Resources
- Migration Plan: see NETCUP_MIGRATION_PLAN.md
- RunPod Setup: see RUNPOD_SETUP.md
- Test Guide: see TEST_RUNPOD_AI.md
- API Documentation: http://159.195.32.209:8000/docs
- Monitoring: http://159.195.32.209:3001 (Grafana)
💡 Tips for Cost Optimization
- Prefer low priority for batch jobs: use priority: "low" for non-urgent tasks
- Use local models first: 70-80% of workload can run locally for $0
- Monitor queue depth: Auto-scales to RunPod when local is backed up
- Set cost alerts: Get notified if daily costs exceed threshold
- Review cost breakdown weekly: Identify optimization opportunities
- Batch similar requests: Process multiple items together
- Cache results: Store and reuse common queries
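For the caching tip, even an in-memory map keyed by the prompt avoids paying twice for identical requests. A minimal sketch; a persistent cache (Redis or localStorage) would survive page reloads.

```typescript
// Sketch: reuse results for identical prompts so repeated requests cost nothing.
const resultCache = new Map<string, unknown>()

async function generateCached(kind: string, prompt: string, run: () => Promise<unknown>) {
  const key = `${kind}:${prompt}`
  if (resultCache.has(key)) return resultCache.get(key)
  const result = await run()
  resultCache.set(key, result)
  return result
}
```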
Ready to deploy? Start with Step 1 and follow the guide! 🚀