
AI Services Deployment & Testing Guide

A complete guide to deploying and testing the AI services integration in canvas-website, using a Netcup RS 8000 server and RunPod serverless GPUs.


🎯 Overview

This project integrates multiple AI services with smart routing:

Smart Routing Strategy:

  • Text/Code (70-80% of the workload): Local Ollama on the RS 8000 → FREE
  • Images - Low Priority: Local Stable Diffusion on RS 8000 → FREE (slow ~60s)
  • Images - High Priority: RunPod GPU (SDXL) → $0.02/image (fast ~5s)
  • Video Generation: RunPod GPU (Wan2.1) → $0.50/video (30-90s)

Expected Cost Savings: $86-350/month compared to persistent GPU instances
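
The routing policy above boils down to a few lines. Here is an illustrative TypeScript sketch of that table (the real decision logic lives in the orchestrator's router service; all names below are assumptions):

type JobType = 'text' | 'code' | 'image' | 'video'
type Priority = 'low' | 'normal' | 'high'
type Provider = 'local-ollama' | 'local-sd-cpu' | 'runpod-sdxl' | 'runpod-wan21'

// Mirrors the routing table above: text/code stay local, video always
// goes to GPU, and image priority trades ~$0.02 for ~10x faster results.
function routeJob(type: JobType, priority: Priority): Provider {
  if (type === 'text' || type === 'code') return 'local-ollama' // FREE
  if (type === 'video') return 'runpod-wan21' // GPU only, ~$0.50/video
  return priority === 'high' ? 'runpod-sdxl' : 'local-sd-cpu'
}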


📦 What's Included

AI Services:

  1. Text Generation (LLM)

    • RunPod integration via src/lib/runpodApi.ts
    • Enhanced LLM utilities in src/utils/llmUtils.ts
    • AI Orchestrator client in src/lib/aiOrchestrator.ts
    • Prompt shapes, arrow LLM actions, command palette
  2. Image Generation

    • ImageGenShapeUtil in src/shapes/ImageGenShapeUtil.tsx
    • ImageGenTool in src/tools/ImageGenTool.ts
    • Mock mode DISABLED (ready for production)
    • Smart routing: low priority → local CPU, high priority → RunPod GPU
  3. Video Generation (NEW!)

    • VideoGenShapeUtil in src/shapes/VideoGenShapeUtil.tsx
    • VideoGenTool in src/tools/VideoGenTool.ts
    • Wan2.1 I2V 14B 720p model on RunPod
    • Always uses GPU (no local option)
  4. Voice Transcription

    • WhisperX integration via src/hooks/useWhisperTranscriptionSimple.ts
    • Automatic fallback to local Whisper model

🚀 Deployment Steps

Step 1: Deploy AI Orchestrator on Netcup RS 8000

Prerequisites:

  • SSH access to the Netcup RS 8000 (this guide assumes an SSH config alias, so ssh netcup works)
  • Docker and Docker Compose installed
  • RunPod API key

1.1 Create AI Orchestrator Directory:

ssh netcup << 'EOF'
mkdir -p /opt/ai-orchestrator/{services/{router,workers,monitor},configs,data/{redis,postgres,prometheus}}
cd /opt/ai-orchestrator
EOF

1.2 Copy Configuration Files:

From your local machine, copy the AI orchestrator files created in NETCUP_MIGRATION_PLAN.md:

# Copy docker-compose.yml
scp /path/to/docker-compose.yml netcup:/opt/ai-orchestrator/

# Copy service files
scp -r /path/to/services/* netcup:/opt/ai-orchestrator/services/

1.3 Configure Environment Variables:

ssh netcup "cat > /opt/ai-orchestrator/.env" << 'EOF'
# PostgreSQL
POSTGRES_PASSWORD=$(openssl rand -hex 16)

# RunPod API Keys
RUNPOD_API_KEY=your_runpod_api_key_here
RUNPOD_TEXT_ENDPOINT_ID=your_text_endpoint_id
RUNPOD_IMAGE_ENDPOINT_ID=your_image_endpoint_id
RUNPOD_VIDEO_ENDPOINT_ID=your_video_endpoint_id

# Grafana
GRAFANA_PASSWORD=$(openssl rand -hex 16)

# Monitoring
ALERT_EMAIL=your@email.com
COST_ALERT_THRESHOLD=100
EOF

1.4 Deploy the Stack:

ssh netcup << 'EOF'
cd /opt/ai-orchestrator

# Start all services
docker-compose up -d

# Check status
docker-compose ps

# View logs
docker-compose logs -f router
EOF

1.5 Verify Deployment:

# Check health endpoint
ssh netcup "curl http://localhost:8000/health"

# Check API documentation
ssh netcup "curl http://localhost:8000/docs"

# Check queue status
ssh netcup "curl http://localhost:8000/queue/status"

Step 2: Set Up Local AI Models on the RS 8000

2.1 Download Ollama Models:

ssh netcup << 'EOF'
# Download recommended models
docker exec ai-ollama ollama pull llama3:70b
docker exec ai-ollama ollama pull codellama:34b
docker exec ai-ollama ollama pull deepseek-coder:33b
docker exec ai-ollama ollama pull mistral:7b

# Verify
docker exec ai-ollama ollama list

# Test a model
docker exec ai-ollama ollama run llama3:70b "Hello, how are you?"
EOF

2.2 Download Stable Diffusion Models:

ssh netcup << 'EOF'
mkdir -p /data/models/stable-diffusion/sd-v2.1
cd /data/models/stable-diffusion/sd-v2.1

# Download SD 2.1 weights
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned.safetensors

# Verify
ls -lh v2-1_768-ema-pruned.safetensors
EOF

2.3 Download Wan2.1 Video Generation Model:

ssh netcup << 'EOF'
# Install huggingface-cli
pip install huggingface-hub

# Download Wan2.1 I2V 14B 720p
mkdir -p /data/models/video-generation
cd /data/models/video-generation

huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P \
  --include "*.safetensors" \
  --local-dir wan2.1_i2v_14b

# Check size (~28GB)
du -sh wan2.1_i2v_14b
EOF

Note: The Wan2.1 model will be deployed to RunPod, not run locally on CPU.

Step 3: Set Up RunPod Endpoints

3.1 Create RunPod Serverless Endpoints:

Go to RunPod Serverless and create endpoints for:

  1. Text Generation Endpoint (optional, fallback)

    • Model: Any LLM (Llama, Mistral, etc.)
    • GPU: any modest GPU (this endpoint is only a fallback; text runs primarily on the local CPU)
  2. Image Generation Endpoint

    • Model: SDXL or SD3
    • GPU: A4000/A5000 (good price/performance)
    • Expected cost: ~$0.02/image
  3. Video Generation Endpoint

    • Model: Wan2.1-I2V-14B-720P
    • GPU: A100 or H100 (required for video)
    • Expected cost: ~$0.50/video

3.2 Get Endpoint IDs:

For each endpoint, copy the endpoint ID from the URL or endpoint details.

Example: If URL is https://api.runpod.ai/v2/jqd16o7stu29vq/run, then jqd16o7stu29vq is your endpoint ID.
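
If you want to sanity-check an endpoint before wiring it into the orchestrator, RunPod serverless endpoints share a common run/status API. A minimal direct call in TypeScript might look like this (the input payload shape depends entirely on your endpoint's handler):

// Sketch: call a RunPod serverless endpoint directly, bypassing the orchestrator.
async function runpodRun(
  endpointId: string,
  apiKey: string,
  input: Record<string, unknown>,
) {
  const res = await fetch(`https://api.runpod.ai/v2/${endpointId}/run`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ input }),
  })
  return res.json() // e.g. { "id": "...", "status": "IN_QUEUE" }
}

Poll https://api.runpod.ai/v2/{endpointId}/status/{id} with the returned id to retrieve the result.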

3.3 Update Environment Variables:

Update /opt/ai-orchestrator/.env with your endpoint IDs:

ssh netcup "nano /opt/ai-orchestrator/.env"

# Add your endpoint IDs:
RUNPOD_TEXT_ENDPOINT_ID=your_text_endpoint_id
RUNPOD_IMAGE_ENDPOINT_ID=your_image_endpoint_id
RUNPOD_VIDEO_ENDPOINT_ID=your_video_endpoint_id

# Restart services
cd /opt/ai-orchestrator && docker-compose restart

Step 4: Configure canvas-website

4.1 Create .env.local:

In your canvas-website directory:

cd /home/jeffe/Github/canvas-website-branch-worktrees/add-runpod-AI-API

cat > .env.local << 'EOF'
# AI Orchestrator (Primary - Netcup RS 8000)
VITE_AI_ORCHESTRATOR_URL=http://159.195.32.209:8000
# Or use domain when DNS is configured:
# VITE_AI_ORCHESTRATOR_URL=https://ai-api.jeffemmett.com

# RunPod API (Fallback/Direct Access)
VITE_RUNPOD_API_KEY=your_runpod_api_key_here
VITE_RUNPOD_TEXT_ENDPOINT_ID=your_text_endpoint_id
VITE_RUNPOD_IMAGE_ENDPOINT_ID=your_image_endpoint_id
VITE_RUNPOD_VIDEO_ENDPOINT_ID=your_video_endpoint_id

# Other existing vars...
VITE_GOOGLE_CLIENT_ID=your_google_client_id
VITE_GOOGLE_MAPS_API_KEY=your_google_maps_api_key
VITE_DAILY_DOMAIN=your_daily_domain
VITE_TLDRAW_WORKER_URL=your_worker_url
EOF
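
These VITE_-prefixed variables are compiled into the client bundle and read through Vite's import.meta.env. A minimal sketch of how client code might pick them up (the actual wiring lives in src/lib/aiOrchestrator.ts and src/lib/runpodApi.ts; the localhost fallback below is an assumption):

// Sketch: reading the variables above via Vite's import.meta.env.
export const AI_ORCHESTRATOR_URL: string =
  import.meta.env.VITE_AI_ORCHESTRATOR_URL ?? 'http://localhost:8000' // assumed fallback
export const RUNPOD_API_KEY: string | undefined =
  import.meta.env.VITE_RUNPOD_API_KEY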

4.2 Install Dependencies:

npm install

4.3 Build and Start:

# Development
npm run dev

# Production build
npm run build
npm run start

Step 5: Register Video Generation Tool

You need to register the VideoGen shape and tool with tldraw. Find where shapes and tools are registered (likely in src/routes/Board.tsx or similar):

Add to shape utilities array:

import { VideoGenShapeUtil } from '@/shapes/VideoGenShapeUtil'

const shapeUtils = [
  // ... existing shapes
  VideoGenShapeUtil,
]

Add to tools array:

import { VideoGenTool } from '@/tools/VideoGenTool'

const tools = [
  // ... existing tools
  VideoGenTool,
]
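
Both arrays are then passed to the Tldraw component. A minimal sketch, assuming a tldraw v2-style setup (the package may be 'tldraw' or '@tldraw/tldraw' depending on your version):

import { Tldraw } from 'tldraw'
import { VideoGenShapeUtil } from '@/shapes/VideoGenShapeUtil'
import { VideoGenTool } from '@/tools/VideoGenTool'

export function Board() {
  return (
    <Tldraw
      shapeUtils={[/* ...existing shapes */ VideoGenShapeUtil]}
      tools={[/* ...existing tools */ VideoGenTool]}
    />
  )
}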

🧪 Testing

Test 1: Verify AI Orchestrator

# Test health endpoint
curl http://159.195.32.209:8000/health

# Expected response:
# {"status":"healthy","timestamp":"2025-11-25T12:00:00.000Z"}

# Test text generation
curl -X POST http://159.195.32.209:8000/generate/text \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a hello world program in Python",
    "priority": "normal"
  }'

# Expected response:
# {"job_id":"abc123","status":"queued","message":"Job queued on local provider"}

# Check job status
curl http://159.195.32.209:8000/job/abc123

# Check queue status
curl http://159.195.32.209:8000/queue/status

# Check costs
curl http://159.195.32.209:8000/costs/summary
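
To script this submit-then-poll flow end to end, a small TypeScript helper works. This is a sketch assuming the response shapes shown above; the status values and the output/error field names are assumptions:

// Sketch: submit a text job, then poll /job/{id} until it settles.
const BASE = 'http://159.195.32.209:8000'

async function generateText(prompt: string): Promise<unknown> {
  const res = await fetch(`${BASE}/generate/text`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt, priority: 'normal' }),
  })
  const { job_id } = await res.json()

  for (;;) {
    const job = await (await fetch(`${BASE}/job/${job_id}`)).json()
    if (job.status === 'completed') return job.output
    if (job.status === 'failed') throw new Error(job.error ?? 'job failed')
    await new Promise((resolve) => setTimeout(resolve, 2000)) // poll every 2s
  }
}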

Test 2: Test Text Generation in Canvas

  1. Open canvas-website in browser
  2. Open browser console (F12)
  3. Look for log messages:
    • ✅ AI Orchestrator is available at http://159.195.32.209:8000
  4. Create a Prompt shape or use arrow LLM action
  5. Enter a prompt and submit
  6. Verify response appears
  7. Check console for routing info:
    • Should see Using local Ollama (FREE)

Test 3: Test Image Generation

Low Priority (Local CPU - FREE):

  1. Use ImageGen tool from toolbar
  2. Click on canvas to create ImageGen shape
  3. Enter prompt: "A beautiful mountain landscape"
  4. Select priority: "Low"
  5. Click "Generate"
  6. Wait 30-60 seconds
  7. Verify image appears
  8. Check console: Should show Using local Stable Diffusion CPU

High Priority (RunPod GPU - $0.02):

  1. Create new ImageGen shape
  2. Enter prompt: "A futuristic city at sunset"
  3. Select priority: "High"
  4. Click "Generate"
  5. Wait 5-10 seconds
  6. Verify image appears
  7. Check console: Should show Using RunPod SDXL
  8. Check cost: Should show ~$0.02

Test 4: Test Video Generation

  1. Use VideoGen tool from toolbar
  2. Click on canvas to create VideoGen shape
  3. Enter prompt: "A cat walking through a garden"
  4. Set duration: 3 seconds
  5. Click "Generate"
  6. Wait 30-90 seconds
  7. Verify video appears and plays
  8. Check console: Should show Using RunPod Wan2.1
  9. Check cost: Should show ~$0.50
  10. Test download button

Test 5: Test Voice Transcription

  1. Use Transcription tool from toolbar
  2. Click to create Transcription shape
  3. Click "Start Recording"
  4. Speak into microphone
  5. Click "Stop Recording"
  6. Verify transcription appears
  7. Check if using RunPod or local Whisper

Test 6: Monitor Costs and Performance

Access monitoring dashboards:

# API Documentation
http://159.195.32.209:8000/docs

# Queue Status
http://159.195.32.209:8000/queue/status

# Cost Tracking
http://159.195.32.209:3000/api/costs/summary

# Grafana Dashboard
http://159.195.32.209:3001
# Login: admin / the GRAFANA_PASSWORD value from .env (admin/admin if unset; change it!)

Check daily costs:

curl http://159.195.32.209:3000/api/costs/summary

Expected response:

{
  "today": {
    "local": 0.00,
    "runpod": 2.45,
    "total": 2.45
  },
  "this_month": {
    "local": 0.00,
    "runpod": 45.20,
    "total": 45.20
  },
  "breakdown": {
    "text": 0.00,
    "image": 12.50,
    "video": 32.70,
    "code": 0.00
  }
}

🐛 Troubleshooting

Issue: AI Orchestrator not available

Symptoms:

  • Console shows: ⚠️ AI Orchestrator configured but not responding
  • Health check fails

Solutions:

# 1. Check if services are running
ssh netcup "cd /opt/ai-orchestrator && docker-compose ps"

# 2. Check logs
ssh netcup "cd /opt/ai-orchestrator && docker-compose logs -f router"

# 3. Restart services
ssh netcup "cd /opt/ai-orchestrator && docker-compose restart"

# 4. Check firewall
ssh netcup "sudo ufw status"
ssh netcup "sudo ufw allow 8000/tcp"

Issue: Image generation fails with "No output found"

Symptoms:

  • Job completes but no image URL returned
  • Error: Job completed but no output data found

Solutions:

  1. Check the RunPod endpoint configuration
  2. Verify the endpoint handler returns the expected format: {"output": {"image": "base64_or_url"}} (see the parsing sketch after this list)
  3. Check the endpoint logs in the RunPod console
  4. Test the endpoint directly with curl
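
A defensive parser for that shape is straightforward. This sketch assumes the handler returns either a URL or raw base64 in output.image, matching the format above:

// Sketch: normalize the RunPod image output described in step 2.
function extractImage(response: { output?: { image?: string } }): string {
  const image = response.output?.image
  if (!image) throw new Error('Job completed but no output data found')
  // Handlers may return a URL or raw base64; normalize for use in an <img> tag.
  return image.startsWith('http') ? image : `data:image/png;base64,${image}`
}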

Issue: Video generation timeout

Symptoms:

  • Job stuck in "processing" state
  • Timeout after 120 attempts

Solutions:

  1. Video generation normally takes 30-90 seconds; give the job time to finish before assuming it is stuck
  2. Check RunPod GPU availability (might be cold start)
  3. Increase timeout in VideoGenShapeUtil if needed
  4. Check RunPod endpoint logs for errors

Issue: High costs

Symptoms:

  • Monthly costs exceed budget
  • Too many RunPod requests

Solutions:

# 1. Check cost breakdown
curl http://159.195.32.209:3000/api/costs/summary

# 2. Review routing decisions
curl http://159.195.32.209:8000/queue/status

# 3. Adjust routing thresholds
# Edit router configuration to prefer local more
ssh netcup "nano /opt/ai-orchestrator/services/router/main.py"

# 4. Set cost alerts
ssh netcup "nano /opt/ai-orchestrator/.env"
# COST_ALERT_THRESHOLD=50  # Alert if daily cost > $50

Issue: Local models slow or failing

Symptoms:

  • Text generation slow (>30s)
  • Image generation very slow (>2min)
  • Out of memory errors

Solutions:

# 1. Check system resources
ssh netcup "htop"
ssh netcup "free -h"

# 2. Reduce model size
ssh netcup << 'EOF'
# Use smaller models
docker exec ai-ollama ollama pull llama3:8b  # Instead of 70b
docker exec ai-ollama ollama pull mistral:7b  # Lighter model
EOF

# 3. Limit concurrent workers
ssh netcup "nano /opt/ai-orchestrator/docker-compose.yml"
# Reduce worker replicas if needed

# 4. Increase swap (if low RAM)
ssh netcup "sudo fallocate -l 8G /swapfile"
ssh netcup "sudo chmod 600 /swapfile"
ssh netcup "sudo mkswap /swapfile"
ssh netcup "sudo swapon /swapfile"

📊 Performance Expectations

Text Generation:

  • Local (Llama3-70b): 2-10 seconds
  • Local (Mistral-7b): 1-3 seconds
  • RunPod (fallback): 3-8 seconds
  • Cost: $0.00 (local) or $0.001-0.01 (RunPod)

Image Generation:

  • Local SD CPU (low priority): 30-60 seconds
  • RunPod GPU (high priority): 3-10 seconds
  • Cost: $0.00 (local) or $0.02 (RunPod)

Video Generation:

  • RunPod Wan2.1: 30-90 seconds
  • Cost: ~$0.50 per video

Expected Monthly Costs:

Light Usage (100 requests/month):

  • 70 text (local): $0
  • 20 images (15 local + 5 RunPod): $0.10
  • 10 videos: $5.00
  • Total: ~$5-10/month

Medium Usage (500 requests/month):

  • 350 text (local): $0
  • 100 images (60 local + 40 RunPod): $0.80
  • 50 videos: $25.00
  • Total: ~$25-35/month

Heavy Usage (2000 requests/month):

  • 1400 text (local): $0
  • 400 images (200 local + 200 RunPod): $4.00
  • 200 videos: $100.00
  • Total: ~$100-120/month

Compare that to a persistent GPU pod: $200-300/month regardless of usage!


🎯 Next Steps

  1. Deploy the AI Orchestrator on the Netcup RS 8000
  2. Set up local AI models (Ollama, SD)
  3. Configure RunPod endpoints
  4. Test all AI services
  5. 📋 Set up monitoring and alerts
  6. 📋 Configure DNS for ai-api.jeffemmett.com
  7. 📋 Set up SSL with Let's Encrypt
  8. 📋 Migrate canvas-website to Netcup
  9. 📋 Monitor costs and optimize routing
  10. 📋 Decommission the DigitalOcean droplets

📚 Additional Resources

  • NETCUP_MIGRATION_PLAN.md: source of the AI orchestrator service files and docker-compose configuration copied in Step 1

💡 Tips for Cost Optimization

  1. Prefer low priority for batch jobs: use priority: "low" for non-urgent tasks
  2. Use local models first: 70-80% of the workload can run locally for $0
  3. Monitor queue depth: the router spills over to RunPod when the local queue is backed up
  4. Set cost alerts: get notified when daily costs exceed your threshold
  5. Review the cost breakdown weekly: identify optimization opportunities
  6. Batch similar requests: process multiple items together
  7. Cache results: store and reuse answers to common queries (see the sketch below)
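
For the last tip, even a tiny in-memory cache pays for itself on repeated prompts. A minimal TypeScript sketch (illustrative only; a production cache would bound its size and persist across sessions):

// Sketch: reuse results for repeated prompts instead of paying twice.
const resultCache = new Map<string, string>()

async function cachedGenerate(
  prompt: string,
  generate: (p: string) => Promise<string>,
): Promise<string> {
  const key = prompt.trim().toLowerCase()
  const hit = resultCache.get(key)
  if (hit !== undefined) return hit // cache hit: instant and $0
  const result = await generate(prompt)
  resultCache.set(key, result)
  return result
}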

Ready to deploy? Start with Step 1 and follow the guide! 🚀