627 lines
15 KiB
Markdown
627 lines
15 KiB
Markdown
# AI Services Deployment & Testing Guide
|
|
|
|
Complete guide for deploying and testing the AI services integration in canvas-website with Netcup RS 8000 and RunPod.
|
|
|
|
---
|
|
|
|
## 🎯 Overview
|
|
|
|
This project integrates multiple AI services with smart routing:
|
|
|
|
**Smart Routing Strategy:**
|
|
- **Text/Code (70-80% workload)**: Local Ollama on RS 8000 → **FREE**
|
|
- **Images - Low Priority**: Local Stable Diffusion on RS 8000 → **FREE** (slow ~60s)
|
|
- **Images - High Priority**: RunPod GPU (SDXL) → **$0.02/image** (fast ~5s)
|
|
- **Video Generation**: RunPod GPU (Wan2.1) → **$0.50/video** (30-90s)
|
|
|
|
**Expected Cost Savings:** $86-350/month compared to persistent GPU instances
|
|
|
|
---
|
|
|
|
## 📦 What's Included
|
|
|
|
### AI Services:
|
|
1. ✅ **Text Generation (LLM)**
|
|
- RunPod integration via `src/lib/runpodApi.ts`
|
|
- Enhanced LLM utilities in `src/utils/llmUtils.ts`
|
|
- AI Orchestrator client in `src/lib/aiOrchestrator.ts`
|
|
- Prompt shapes, arrow LLM actions, command palette
|
|
|
|
2. ✅ **Image Generation**
|
|
- ImageGenShapeUtil in `src/shapes/ImageGenShapeUtil.tsx`
|
|
- ImageGenTool in `src/tools/ImageGenTool.ts`
|
|
- Mock mode **DISABLED** (ready for production)
|
|
- Smart routing: low priority → local CPU, high priority → RunPod GPU
|
|
|
|
3. ✅ **Video Generation (NEW!)**
|
|
- VideoGenShapeUtil in `src/shapes/VideoGenShapeUtil.tsx`
|
|
- VideoGenTool in `src/tools/VideoGenTool.ts`
|
|
- Wan2.1 I2V 14B 720p model on RunPod
|
|
- Always uses GPU (no local option)
|
|
|
|
4. ✅ **Voice Transcription**
|
|
- WhisperX integration via `src/hooks/useWhisperTranscriptionSimple.ts`
|
|
- Automatic fallback to local Whisper model
|
|
|
|
---
|
|
|
|
## 🚀 Deployment Steps
|
|
|
|
### Step 1: Deploy AI Orchestrator on Netcup RS 8000
|
|
|
|
**Prerequisites:**
|
|
- SSH access to Netcup RS 8000: `ssh netcup`
|
|
- Docker and Docker Compose installed
|
|
- RunPod API key
|
|
|
|
**1.1 Create AI Orchestrator Directory:**
|
|
|
|
```bash
|
|
ssh netcup << 'EOF'
|
|
mkdir -p /opt/ai-orchestrator/{services/{router,workers,monitor},configs,data/{redis,postgres,prometheus}}
|
|
cd /opt/ai-orchestrator
|
|
EOF
|
|
```
|
|
|
|
**1.2 Copy Configuration Files:**
|
|
|
|
From your local machine, copy the AI orchestrator files created in `NETCUP_MIGRATION_PLAN.md`:
|
|
|
|
```bash
|
|
# Copy docker-compose.yml
|
|
scp /path/to/docker-compose.yml netcup:/opt/ai-orchestrator/
|
|
|
|
# Copy service files
|
|
scp -r /path/to/services/* netcup:/opt/ai-orchestrator/services/
|
|
```
|
|
|
|
**1.3 Configure Environment Variables:**
|
|
|
|
```bash
|
|
ssh netcup "cat > /opt/ai-orchestrator/.env" << 'EOF'
|
|
# PostgreSQL
|
|
POSTGRES_PASSWORD=$(openssl rand -hex 16)
|
|
|
|
# RunPod API Keys
|
|
RUNPOD_API_KEY=your_runpod_api_key_here
|
|
RUNPOD_TEXT_ENDPOINT_ID=your_text_endpoint_id
|
|
RUNPOD_IMAGE_ENDPOINT_ID=your_image_endpoint_id
|
|
RUNPOD_VIDEO_ENDPOINT_ID=your_video_endpoint_id
|
|
|
|
# Grafana
|
|
GRAFANA_PASSWORD=$(openssl rand -hex 16)
|
|
|
|
# Monitoring
|
|
ALERT_EMAIL=your@email.com
|
|
COST_ALERT_THRESHOLD=100
|
|
EOF
|
|
```
|
|
|
|
**1.4 Deploy the Stack:**
|
|
|
|
```bash
|
|
ssh netcup << 'EOF'
|
|
cd /opt/ai-orchestrator
|
|
|
|
# Start all services
|
|
docker-compose up -d
|
|
|
|
# Check status
|
|
docker-compose ps
|
|
|
|
# View logs
|
|
docker-compose logs -f router
|
|
EOF
|
|
```
|
|
|
|
**1.5 Verify Deployment:**
|
|
|
|
```bash
|
|
# Check health endpoint
|
|
ssh netcup "curl http://localhost:8000/health"
|
|
|
|
# Check API documentation
|
|
ssh netcup "curl http://localhost:8000/docs"
|
|
|
|
# Check queue status
|
|
ssh netcup "curl http://localhost:8000/queue/status"
|
|
```
|
|
|
|
### Step 2: Setup Local AI Models on RS 8000
|
|
|
|
**2.1 Download Ollama Models:**
|
|
|
|
```bash
|
|
ssh netcup << 'EOF'
|
|
# Download recommended models
|
|
docker exec ai-ollama ollama pull llama3:70b
|
|
docker exec ai-ollama ollama pull codellama:34b
|
|
docker exec ai-ollama ollama pull deepseek-coder:33b
|
|
docker exec ai-ollama ollama pull mistral:7b
|
|
|
|
# Verify
|
|
docker exec ai-ollama ollama list
|
|
|
|
# Test a model
|
|
docker exec ai-ollama ollama run llama3:70b "Hello, how are you?"
|
|
EOF
|
|
```
|
|
|
|
**2.2 Download Stable Diffusion Models:**
|
|
|
|
```bash
|
|
ssh netcup << 'EOF'
|
|
mkdir -p /data/models/stable-diffusion/sd-v2.1
|
|
cd /data/models/stable-diffusion/sd-v2.1
|
|
|
|
# Download SD 2.1 weights
|
|
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned.safetensors
|
|
|
|
# Verify
|
|
ls -lh v2-1_768-ema-pruned.safetensors
|
|
EOF
|
|
```
|
|
|
|
**2.3 Download Wan2.1 Video Generation Model:**
|
|
|
|
```bash
|
|
ssh netcup << 'EOF'
|
|
# Install huggingface-cli
|
|
pip install huggingface-hub
|
|
|
|
# Download Wan2.1 I2V 14B 720p
|
|
mkdir -p /data/models/video-generation
|
|
cd /data/models/video-generation
|
|
|
|
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P \
|
|
--include "*.safetensors" \
|
|
--local-dir wan2.1_i2v_14b
|
|
|
|
# Check size (~28GB)
|
|
du -sh wan2.1_i2v_14b
|
|
EOF
|
|
```
|
|
|
|
**Note:** The Wan2.1 model will be deployed to RunPod, not run locally on CPU.
|
|
|
|
### Step 3: Setup RunPod Endpoints
|
|
|
|
**3.1 Create RunPod Serverless Endpoints:**
|
|
|
|
Go to [RunPod Serverless](https://www.runpod.io/console/serverless) and create endpoints for:
|
|
|
|
1. **Text Generation Endpoint** (optional, fallback)
|
|
- Model: Any LLM (Llama, Mistral, etc.)
|
|
- GPU: Optional (we use local CPU primarily)
|
|
|
|
2. **Image Generation Endpoint**
|
|
- Model: SDXL or SD3
|
|
- GPU: A4000/A5000 (good price/performance)
|
|
- Expected cost: ~$0.02/image
|
|
|
|
3. **Video Generation Endpoint**
|
|
- Model: Wan2.1-I2V-14B-720P
|
|
- GPU: A100 or H100 (required for video)
|
|
- Expected cost: ~$0.50/video
|
|
|
|
**3.2 Get Endpoint IDs:**
|
|
|
|
For each endpoint, copy the endpoint ID from the URL or endpoint details.
|
|
|
|
Example: If URL is `https://api.runpod.ai/v2/jqd16o7stu29vq/run`, then `jqd16o7stu29vq` is your endpoint ID.
|
|
|
|
**3.3 Update Environment Variables:**
|
|
|
|
Update `/opt/ai-orchestrator/.env` with your endpoint IDs:
|
|
|
|
```bash
|
|
ssh netcup "nano /opt/ai-orchestrator/.env"
|
|
|
|
# Add your endpoint IDs:
|
|
RUNPOD_TEXT_ENDPOINT_ID=your_text_endpoint_id
|
|
RUNPOD_IMAGE_ENDPOINT_ID=your_image_endpoint_id
|
|
RUNPOD_VIDEO_ENDPOINT_ID=your_video_endpoint_id
|
|
|
|
# Restart services
|
|
cd /opt/ai-orchestrator && docker-compose restart
|
|
```
|
|
|
|
### Step 4: Configure canvas-website
|
|
|
|
**4.1 Create .env.local:**
|
|
|
|
In your canvas-website directory:
|
|
|
|
```bash
|
|
cd /home/jeffe/Github/canvas-website-branch-worktrees/add-runpod-AI-API
|
|
|
|
cat > .env.local << 'EOF'
|
|
# AI Orchestrator (Primary - Netcup RS 8000)
|
|
VITE_AI_ORCHESTRATOR_URL=http://159.195.32.209:8000
|
|
# Or use domain when DNS is configured:
|
|
# VITE_AI_ORCHESTRATOR_URL=https://ai-api.jeffemmett.com
|
|
|
|
# RunPod API (Fallback/Direct Access)
|
|
VITE_RUNPOD_API_KEY=your_runpod_api_key_here
|
|
VITE_RUNPOD_TEXT_ENDPOINT_ID=your_text_endpoint_id
|
|
VITE_RUNPOD_IMAGE_ENDPOINT_ID=your_image_endpoint_id
|
|
VITE_RUNPOD_VIDEO_ENDPOINT_ID=your_video_endpoint_id
|
|
|
|
# Other existing vars...
|
|
VITE_GOOGLE_CLIENT_ID=your_google_client_id
|
|
VITE_GOOGLE_MAPS_API_KEY=your_google_maps_api_key
|
|
VITE_DAILY_DOMAIN=your_daily_domain
|
|
VITE_TLDRAW_WORKER_URL=your_worker_url
|
|
EOF
|
|
```
|
|
|
|
**4.2 Install Dependencies:**
|
|
|
|
```bash
|
|
npm install
|
|
```
|
|
|
|
**4.3 Build and Start:**
|
|
|
|
```bash
|
|
# Development
|
|
npm run dev
|
|
|
|
# Production build
|
|
npm run build
|
|
npm run start
|
|
```
|
|
|
|
### Step 5: Register Video Generation Tool
|
|
|
|
You need to register the VideoGen shape and tool with tldraw. Find where shapes and tools are registered (likely in `src/routes/Board.tsx` or similar):
|
|
|
|
**Add to shape utilities array:**
|
|
```typescript
|
|
import { VideoGenShapeUtil } from '@/shapes/VideoGenShapeUtil'
|
|
|
|
const shapeUtils = [
|
|
// ... existing shapes
|
|
VideoGenShapeUtil,
|
|
]
|
|
```
|
|
|
|
**Add to tools array:**
|
|
```typescript
|
|
import { VideoGenTool } from '@/tools/VideoGenTool'
|
|
|
|
const tools = [
|
|
// ... existing tools
|
|
VideoGenTool,
|
|
]
|
|
```
|
|
|
|
---
|
|
|
|
## 🧪 Testing
|
|
|
|
### Test 1: Verify AI Orchestrator
|
|
|
|
```bash
|
|
# Test health endpoint
|
|
curl http://159.195.32.209:8000/health
|
|
|
|
# Expected response:
|
|
# {"status":"healthy","timestamp":"2025-11-25T12:00:00.000Z"}
|
|
|
|
# Test text generation
|
|
curl -X POST http://159.195.32.209:8000/generate/text \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"prompt": "Write a hello world program in Python",
|
|
"priority": "normal"
|
|
}'
|
|
|
|
# Expected response:
|
|
# {"job_id":"abc123","status":"queued","message":"Job queued on local provider"}
|
|
|
|
# Check job status
|
|
curl http://159.195.32.209:8000/job/abc123
|
|
|
|
# Check queue status
|
|
curl http://159.195.32.209:8000/queue/status
|
|
|
|
# Check costs
|
|
curl http://159.195.32.209:8000/costs/summary
|
|
```
|
|
|
|
### Test 2: Test Text Generation in Canvas
|
|
|
|
1. Open canvas-website in browser
|
|
2. Open browser console (F12)
|
|
3. Look for log messages:
|
|
- `✅ AI Orchestrator is available at http://159.195.32.209:8000`
|
|
4. Create a Prompt shape or use arrow LLM action
|
|
5. Enter a prompt and submit
|
|
6. Verify response appears
|
|
7. Check console for routing info:
|
|
- Should see `Using local Ollama (FREE)`
|
|
|
|
### Test 3: Test Image Generation
|
|
|
|
**Low Priority (Local CPU - FREE):**
|
|
|
|
1. Use ImageGen tool from toolbar
|
|
2. Click on canvas to create ImageGen shape
|
|
3. Enter prompt: "A beautiful mountain landscape"
|
|
4. Select priority: "Low"
|
|
5. Click "Generate"
|
|
6. Wait 30-60 seconds
|
|
7. Verify image appears
|
|
8. Check console: Should show `Using local Stable Diffusion CPU`
|
|
|
|
**High Priority (RunPod GPU - $0.02):**
|
|
|
|
1. Create new ImageGen shape
|
|
2. Enter prompt: "A futuristic city at sunset"
|
|
3. Select priority: "High"
|
|
4. Click "Generate"
|
|
5. Wait 5-10 seconds
|
|
6. Verify image appears
|
|
7. Check console: Should show `Using RunPod SDXL`
|
|
8. Check cost: Should show `~$0.02`
|
|
|
|
### Test 4: Test Video Generation
|
|
|
|
1. Use VideoGen tool from toolbar
|
|
2. Click on canvas to create VideoGen shape
|
|
3. Enter prompt: "A cat walking through a garden"
|
|
4. Set duration: 3 seconds
|
|
5. Click "Generate"
|
|
6. Wait 30-90 seconds
|
|
7. Verify video appears and plays
|
|
8. Check console: Should show `Using RunPod Wan2.1`
|
|
9. Check cost: Should show `~$0.50`
|
|
10. Test download button
|
|
|
|
### Test 5: Test Voice Transcription
|
|
|
|
1. Use Transcription tool from toolbar
|
|
2. Click to create Transcription shape
|
|
3. Click "Start Recording"
|
|
4. Speak into microphone
|
|
5. Click "Stop Recording"
|
|
6. Verify transcription appears
|
|
7. Check if using RunPod or local Whisper
|
|
|
|
### Test 6: Monitor Costs and Performance
|
|
|
|
**Access monitoring dashboards:**
|
|
|
|
```bash
|
|
# API Documentation
|
|
http://159.195.32.209:8000/docs
|
|
|
|
# Queue Status
|
|
http://159.195.32.209:8000/queue/status
|
|
|
|
# Cost Tracking
|
|
http://159.195.32.209:3000/api/costs/summary
|
|
|
|
# Grafana Dashboard
|
|
http://159.195.32.209:3001
|
|
# Default login: admin / admin (change this!)
|
|
```
|
|
|
|
**Check daily costs:**
|
|
|
|
```bash
|
|
curl http://159.195.32.209:3000/api/costs/summary
|
|
```
|
|
|
|
Expected response:
|
|
```json
|
|
{
|
|
"today": {
|
|
"local": 0.00,
|
|
"runpod": 2.45,
|
|
"total": 2.45
|
|
},
|
|
"this_month": {
|
|
"local": 0.00,
|
|
"runpod": 45.20,
|
|
"total": 45.20
|
|
},
|
|
"breakdown": {
|
|
"text": 0.00,
|
|
"image": 12.50,
|
|
"video": 32.70,
|
|
"code": 0.00
|
|
}
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## 🐛 Troubleshooting
|
|
|
|
### Issue: AI Orchestrator not available
|
|
|
|
**Symptoms:**
|
|
- Console shows: `⚠️ AI Orchestrator configured but not responding`
|
|
- Health check fails
|
|
|
|
**Solutions:**
|
|
```bash
|
|
# 1. Check if services are running
|
|
ssh netcup "cd /opt/ai-orchestrator && docker-compose ps"
|
|
|
|
# 2. Check logs
|
|
ssh netcup "cd /opt/ai-orchestrator && docker-compose logs -f router"
|
|
|
|
# 3. Restart services
|
|
ssh netcup "cd /opt/ai-orchestrator && docker-compose restart"
|
|
|
|
# 4. Check firewall
|
|
ssh netcup "sudo ufw status"
|
|
ssh netcup "sudo ufw allow 8000/tcp"
|
|
```
|
|
|
|
### Issue: Image generation fails with "No output found"
|
|
|
|
**Symptoms:**
|
|
- Job completes but no image URL returned
|
|
- Error: `Job completed but no output data found`
|
|
|
|
**Solutions:**
|
|
1. Check RunPod endpoint configuration
|
|
2. Verify endpoint handler returns correct format:
|
|
```json
|
|
{"output": {"image": "base64_or_url"}}
|
|
```
|
|
3. Check endpoint logs in RunPod console
|
|
4. Test endpoint directly with curl
|
|
|
|
### Issue: Video generation timeout
|
|
|
|
**Symptoms:**
|
|
- Job stuck in "processing" state
|
|
- Timeout after 120 attempts
|
|
|
|
**Solutions:**
|
|
1. Video generation takes 30-90 seconds, ensure patience
|
|
2. Check RunPod GPU availability (might be cold start)
|
|
3. Increase timeout in VideoGenShapeUtil if needed
|
|
4. Check RunPod endpoint logs for errors
|
|
|
|
### Issue: High costs
|
|
|
|
**Symptoms:**
|
|
- Monthly costs exceed budget
|
|
- Too many RunPod requests
|
|
|
|
**Solutions:**
|
|
```bash
|
|
# 1. Check cost breakdown
|
|
curl http://159.195.32.209:3000/api/costs/summary
|
|
|
|
# 2. Review routing decisions
|
|
curl http://159.195.32.209:8000/queue/status
|
|
|
|
# 3. Adjust routing thresholds
|
|
# Edit router configuration to prefer local more
|
|
ssh netcup "nano /opt/ai-orchestrator/services/router/main.py"
|
|
|
|
# 4. Set cost alerts
|
|
ssh netcup "nano /opt/ai-orchestrator/.env"
|
|
# COST_ALERT_THRESHOLD=50 # Alert if daily cost > $50
|
|
```
|
|
|
|
### Issue: Local models slow or failing
|
|
|
|
**Symptoms:**
|
|
- Text generation slow (>30s)
|
|
- Image generation very slow (>2min)
|
|
- Out of memory errors
|
|
|
|
**Solutions:**
|
|
```bash
|
|
# 1. Check system resources
|
|
ssh netcup "htop"
|
|
ssh netcup "free -h"
|
|
|
|
# 2. Reduce model size
|
|
ssh netcup << 'EOF'
|
|
# Use smaller models
|
|
docker exec ai-ollama ollama pull llama3:8b # Instead of 70b
|
|
docker exec ai-ollama ollama pull mistral:7b # Lighter model
|
|
EOF
|
|
|
|
# 3. Limit concurrent workers
|
|
ssh netcup "nano /opt/ai-orchestrator/docker-compose.yml"
|
|
# Reduce worker replicas if needed
|
|
|
|
# 4. Increase swap (if low RAM)
|
|
ssh netcup "sudo fallocate -l 8G /swapfile"
|
|
ssh netcup "sudo chmod 600 /swapfile"
|
|
ssh netcup "sudo mkswap /swapfile"
|
|
ssh netcup "sudo swapon /swapfile"
|
|
```
|
|
|
|
---
|
|
|
|
## 📊 Performance Expectations
|
|
|
|
### Text Generation:
|
|
- **Local (Llama3-70b)**: 2-10 seconds
|
|
- **Local (Mistral-7b)**: 1-3 seconds
|
|
- **RunPod (fallback)**: 3-8 seconds
|
|
- **Cost**: $0.00 (local) or $0.001-0.01 (RunPod)
|
|
|
|
### Image Generation:
|
|
- **Local SD CPU (low priority)**: 30-60 seconds
|
|
- **RunPod GPU (high priority)**: 3-10 seconds
|
|
- **Cost**: $0.00 (local) or $0.02 (RunPod)
|
|
|
|
### Video Generation:
|
|
- **RunPod Wan2.1**: 30-90 seconds
|
|
- **Cost**: ~$0.50 per video
|
|
|
|
### Expected Monthly Costs:
|
|
|
|
**Light Usage (100 requests/day):**
|
|
- 70 text (local): $0
|
|
- 20 images (15 local + 5 RunPod): $0.10
|
|
- 10 videos: $5.00
|
|
- **Total: ~$5-10/month**
|
|
|
|
**Medium Usage (500 requests/day):**
|
|
- 350 text (local): $0
|
|
- 100 images (60 local + 40 RunPod): $0.80
|
|
- 50 videos: $25.00
|
|
- **Total: ~$25-35/month**
|
|
|
|
**Heavy Usage (2000 requests/day):**
|
|
- 1400 text (local): $0
|
|
- 400 images (200 local + 200 RunPod): $4.00
|
|
- 200 videos: $100.00
|
|
- **Total: ~$100-120/month**
|
|
|
|
Compare to persistent GPU pod: $200-300/month regardless of usage!
|
|
|
|
---
|
|
|
|
## 🎯 Next Steps
|
|
|
|
1. ✅ Deploy AI Orchestrator on Netcup RS 8000
|
|
2. ✅ Setup local AI models (Ollama, SD)
|
|
3. ✅ Configure RunPod endpoints
|
|
4. ✅ Test all AI services
|
|
5. 📋 Setup monitoring and alerts
|
|
6. 📋 Configure DNS for ai-api.jeffemmett.com
|
|
7. 📋 Setup SSL with Let's Encrypt
|
|
8. 📋 Migrate canvas-website to Netcup
|
|
9. 📋 Monitor costs and optimize routing
|
|
10. 📋 Decommission DigitalOcean droplets
|
|
|
|
---
|
|
|
|
## 📚 Additional Resources
|
|
|
|
- **Migration Plan**: See `NETCUP_MIGRATION_PLAN.md`
|
|
- **RunPod Setup**: See `RUNPOD_SETUP.md`
|
|
- **Test Guide**: See `TEST_RUNPOD_AI.md`
|
|
- **API Documentation**: http://159.195.32.209:8000/docs
|
|
- **Monitoring**: http://159.195.32.209:3001 (Grafana)
|
|
|
|
---
|
|
|
|
## 💡 Tips for Cost Optimization
|
|
|
|
1. **Prefer low priority for batch jobs**: Use `priority: "low"` for non-urgent tasks
|
|
2. **Use local models first**: 70-80% of workload can run locally for $0
|
|
3. **Monitor queue depth**: Auto-scales to RunPod when local is backed up
|
|
4. **Set cost alerts**: Get notified if daily costs exceed threshold
|
|
5. **Review cost breakdown weekly**: Identify optimization opportunities
|
|
6. **Batch similar requests**: Process multiple items together
|
|
7. **Cache results**: Store and reuse common queries
|
|
|
|
---
|
|
|
|
**Ready to deploy?** Start with Step 1 and follow the guide! 🚀
|