Netcup RS 8000 Migration & AI Orchestration Setup Plan

🎯 Overview

Complete migration plan from DigitalOcean droplets to Netcup RS 8000 G12 Pro with smart AI orchestration layer that routes between local CPU (RS 8000) and serverless GPU (RunPod).

Server Specs:

  • 20 cores, 64GB RAM, 3TB storage
  • IP: 159.195.32.209
  • Location: Germany (EU)
  • SSH: ssh netcup

Expected Savings: $64-134/month ($768-1,608/year) — see the cost breakdown at the end of this plan.


📋 Phase 1: Pre-Migration Preparation

1.1 Inventory Current Services

DigitalOcean Main Droplet (143.198.39.165):

# Document all running services
ssh droplet "docker ps --format '{{.Names}}\t{{.Image}}\t{{.Ports}}'"
ssh droplet "pm2 list"
ssh droplet "systemctl list-units --type=service --state=running"

# Backup configurations
ssh droplet "tar -czf ~/configs-backup.tar.gz /etc/nginx /etc/systemd/system ~/.config"
scp droplet:~/configs-backup.tar.gz ~/backups/droplet-configs-$(date +%Y%m%d).tar.gz

DigitalOcean AI Services Droplet (178.128.238.87):

# Document AI services
ssh ai-droplet "docker ps --format '{{.Names}}\t{{.Image}}\t{{.Ports}}'"
ssh ai-droplet "nvidia-smi" # Check GPU usage
ssh ai-droplet "df -h" # Check disk usage for models

# Backup AI model weights and configs
ssh ai-droplet "tar -czf ~/ai-models-backup.tar.gz ~/models ~/.cache/huggingface"
scp ai-droplet:~/ai-models-backup.tar.gz ~/backups/ai-models-$(date +%Y%m%d).tar.gz

Create Service Inventory Document:

cat > ~/migration-inventory.md << 'EOF'
# Service Inventory

## Main Droplet (143.198.39.165)
- [ ] nginx reverse proxy
- [ ] canvas-website
- [ ] Other web apps: ________________
- [ ] Databases: ________________
- [ ] Monitoring: ________________

## AI Droplet (178.128.238.87)
- [ ] Stable Diffusion
- [ ] Ollama/LLM services
- [ ] Model storage location: ________________
- [ ] Current GPU usage: ________________

## Data to Migrate
- [ ] Databases (size: ___GB)
- [ ] User uploads (size: ___GB)
- [ ] AI models (size: ___GB)
- [ ] Configuration files
- [ ] SSL certificates
- [ ] Environment variables
EOF

1.2 Test Netcup RS 8000 Access

# Verify SSH access
ssh netcup "hostname && uname -a && df -h"

# Check system resources
ssh netcup "nproc && free -h && lscpu | grep 'Model name'"

# Install basic tools
ssh netcup "apt update && apt install -y docker.io docker-compose git htop ncdu curl wget"

# Configure Docker
ssh netcup "systemctl enable docker && systemctl start docker"
ssh netcup "docker run hello-world"

1.3 Setup Directory Structure on Netcup

ssh netcup << 'EOF'
# Create organized directory structure
mkdir -p /opt/{ai-orchestrator,apps,databases,monitoring,backups}
mkdir -p /data/{models,uploads,databases}
mkdir -p /etc/docker/compose

# Set permissions
chown -R $USER:$USER /opt /data
chmod 755 /opt /data

ls -la /opt /data
EOF

📋 Phase 2: Deploy AI Orchestration Infrastructure

2.1 Transfer AI Orchestration Stack

# Create the AI orchestration directory structure
cat > /tmp/create-ai-orchestrator.sh << 'SCRIPT'
#!/bin/bash
set -e

BASE_DIR="/opt/ai-orchestrator"
mkdir -p $BASE_DIR/{services/{router,workers,monitor},configs,data/{redis,postgres,prometheus}}

echo "✅ Created AI orchestrator directory structure"
ls -R $BASE_DIR
SCRIPT

# Copy to Netcup and execute
scp /tmp/create-ai-orchestrator.sh netcup:/tmp/
ssh netcup "chmod +x /tmp/create-ai-orchestrator.sh && /tmp/create-ai-orchestrator.sh"

2.2 Deploy Docker Compose Stack

Create main docker-compose.yml:

ssh netcup "cat > /opt/ai-orchestrator/docker-compose.yml" << 'EOF'
version: '3.8'

services:
  # Redis for job queues
  redis:
    image: redis:7-alpine
    container_name: ai-redis
    ports:
      - "6379:6379"
    volumes:
      - ./data/redis:/data
    command: redis-server --appendonly yes
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  # PostgreSQL for job history and analytics
  postgres:
    image: postgres:15-alpine
    container_name: ai-postgres
    environment:
      POSTGRES_DB: ai_orchestrator
      POSTGRES_USER: aiuser
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
    ports:
      - "5432:5432"
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U aiuser"]
      interval: 5s
      timeout: 3s
      retries: 5

  # Smart Router API (FastAPI)
  router:
    build: ./services/router
    container_name: ai-router
    ports:
      - "8000:8000"
    environment:
      REDIS_URL: redis://redis:6379
      DATABASE_URL: postgresql://aiuser:${POSTGRES_PASSWORD:-changeme}@postgres:5432/ai_orchestrator
      RUNPOD_API_KEY: ${RUNPOD_API_KEY}
      OLLAMA_URL: http://ollama:11434
      SD_CPU_URL: http://stable-diffusion-cpu:7860
    depends_on:
      redis:
        condition: service_healthy
      postgres:
        condition: service_healthy
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      timeout: 5s
      retries: 3

  # Text Worker (processes text generation queue)
  text-worker:
    build: ./services/workers
    # container_name is omitted here: this service runs 2 replicas (see deploy.replicas below),
    # and a fixed container name cannot be shared across replicas.
    environment:
      REDIS_URL: redis://redis:6379
      DATABASE_URL: postgresql://aiuser:${POSTGRES_PASSWORD:-changeme}@postgres:5432/ai_orchestrator
      WORKER_TYPE: text
      OLLAMA_URL: http://ollama:11434
      RUNPOD_API_KEY: ${RUNPOD_API_KEY}
    depends_on:
      - redis
      - postgres
      - router
    restart: unless-stopped
    deploy:
      replicas: 2

  # Image Worker (processes image generation queue)
  image-worker:
    build: ./services/workers
    container_name: ai-image-worker
    environment:
      REDIS_URL: redis://redis:6379
      DATABASE_URL: postgresql://aiuser:${POSTGRES_PASSWORD:-changeme}@postgres:5432/ai_orchestrator
      WORKER_TYPE: image
      SD_CPU_URL: http://stable-diffusion-cpu:7860
      RUNPOD_API_KEY: ${RUNPOD_API_KEY}
    depends_on:
      - redis
      - postgres
      - router
    restart: unless-stopped

  # Code Worker (processes code generation queue)
  code-worker:
    build: ./services/workers
    container_name: ai-code-worker
    environment:
      REDIS_URL: redis://redis:6379
      DATABASE_URL: postgresql://aiuser:${POSTGRES_PASSWORD:-changeme}@postgres:5432/ai_orchestrator
      WORKER_TYPE: code
      OLLAMA_URL: http://ollama:11434
    depends_on:
      - redis
      - postgres
      - router
    restart: unless-stopped

  # Video Worker (processes video generation queue - always RunPod)
  video-worker:
    build: ./services/workers
    container_name: ai-video-worker
    environment:
      REDIS_URL: redis://redis:6379
      DATABASE_URL: postgresql://aiuser:${POSTGRES_PASSWORD:-changeme}@postgres:5432/ai_orchestrator
      WORKER_TYPE: video
      RUNPOD_API_KEY: ${RUNPOD_API_KEY}
      RUNPOD_VIDEO_ENDPOINT_ID: ${RUNPOD_VIDEO_ENDPOINT_ID}
    depends_on:
      - redis
      - postgres
      - router
    restart: unless-stopped

  # Ollama (local LLM server)
  ollama:
    image: ollama/ollama:latest
    container_name: ai-ollama
    ports:
      - "11434:11434"
    volumes:
      - /data/models/ollama:/root/.ollama
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Stable Diffusion (CPU fallback)
  stable-diffusion-cpu:
    image: ghcr.io/stablecog/sc-worker:latest
    container_name: ai-sd-cpu
    ports:
      - "7860:7860"
    volumes:
      - /data/models/stable-diffusion:/models
    environment:
      USE_CPU: "true"
      MODEL_PATH: /models/sd-v2.1
    restart: unless-stopped

  # Cost Monitor & Analytics
  monitor:
    build: ./services/monitor
    container_name: ai-monitor
    ports:
      - "3000:3000"
    environment:
      REDIS_URL: redis://redis:6379
      DATABASE_URL: postgresql://aiuser:${POSTGRES_PASSWORD:-changeme}@postgres:5432/ai_orchestrator
    depends_on:
      - redis
      - postgres
    restart: unless-stopped

  # Prometheus (metrics collection)
  prometheus:
    image: prom/prometheus:latest
    container_name: ai-prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./configs/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./data/prometheus:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    restart: unless-stopped

  # Grafana (dashboards)
  grafana:
    image: grafana/grafana:latest
    container_name: ai-grafana
    ports:
      - "3001:3000"
    volumes:
      - ./data/grafana:/var/lib/grafana
      - ./configs/grafana-dashboards:/etc/grafana/provisioning/dashboards
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD:-admin}
    depends_on:
      - prometheus
    restart: unless-stopped

networks:
  default:
    name: ai-orchestrator-network
EOF

2.3 Create Smart Router Service

ssh netcup "mkdir -p /opt/ai-orchestrator/services/router"
ssh netcup "cat > /opt/ai-orchestrator/services/router/Dockerfile" << 'EOF'
FROM python:3.11-slim

WORKDIR /app

RUN pip install --no-cache-dir \
    fastapi==0.104.1 \
    uvicorn[standard]==0.24.0 \
    redis==5.0.1 \
    asyncpg==0.29.0 \
    httpx==0.25.1 \
    pydantic==2.5.0 \
    pydantic-settings==2.1.0

COPY main.py .

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
EOF

Create Router API:

ssh netcup "cat > /opt/ai-orchestrator/services/router/main.py" << 'EOF'
from fastapi import FastAPI, HTTPException, BackgroundTasks
from pydantic import BaseModel
from typing import Optional, Literal
import redis.asyncio as redis
import asyncpg
import httpx
import json
import time
import os
from datetime import datetime
import uuid

app = FastAPI(title="AI Orchestrator", version="1.0.0")

# Configuration
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")
DATABASE_URL = os.getenv("DATABASE_URL")
RUNPOD_API_KEY = os.getenv("RUNPOD_API_KEY")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
SD_CPU_URL = os.getenv("SD_CPU_URL", "http://localhost:7860")

# Redis connection pool
redis_pool = None

@app.on_event("startup")
async def startup():
    global redis_pool
    redis_pool = redis.ConnectionPool.from_url(REDIS_URL, decode_responses=True)

@app.on_event("shutdown")
async def shutdown():
    if redis_pool:
        await redis_pool.disconnect()

# Request Models
class TextGenerationRequest(BaseModel):
    prompt: str
    model: str = "llama3-70b"
    priority: Literal["low", "normal", "high"] = "normal"
    user_id: Optional[str] = None
    wait: bool = False  # Wait for result or return job_id

class ImageGenerationRequest(BaseModel):
    prompt: str
    model: str = "sdxl"
    priority: Literal["low", "normal", "high"] = "normal"
    size: str = "1024x1024"
    user_id: Optional[str] = None
    wait: bool = False

class VideoGenerationRequest(BaseModel):
    prompt: str
    model: str = "wan2.1-i2v"
    duration: int = 3  # seconds
    user_id: Optional[str] = None
    wait: bool = False

class CodeGenerationRequest(BaseModel):
    prompt: str
    language: str = "python"
    priority: Literal["low", "normal", "high"] = "normal"
    user_id: Optional[str] = None
    wait: bool = False

# Response Models
class JobResponse(BaseModel):
    job_id: str
    status: str
    message: str

class ResultResponse(BaseModel):
    job_id: str
    status: str
    result: Optional[dict] = None
    cost: Optional[float] = None
    provider: Optional[str] = None
    processing_time: Optional[float] = None

# Health Check
@app.get("/health")
async def health_check():
    return {"status": "healthy", "timestamp": datetime.utcnow().isoformat()}

# Smart Routing Logic
async def route_text_job(request: TextGenerationRequest) -> str:
    """
    Text routing logic:
    - Always use local Ollama (FREE, fast enough with 20 cores)
    - Only use RunPod for extremely large context or special models
    """
    return "local"  # 99% of text goes to local CPU

async def route_image_job(request: ImageGenerationRequest) -> str:
    """
    Image routing logic:
    - Low priority → Local SD CPU (slow but FREE)
    - Normal priority → Check queue depth, route to faster option
    - High priority → RunPod GPU (fast, $0.02)
    """
    if request.priority == "high":
        return "runpod"

    if request.priority == "low":
        return "local"

    # Normal priority: check queue depth
    r = redis.Redis(connection_pool=redis_pool)
    queue_depth = await r.llen("queue:image:local")

    # If local queue is backed up (>10 jobs), use RunPod for faster response
    if queue_depth > 10:
        return "runpod"

    return "local"

async def route_video_job(request: VideoGenerationRequest) -> str:
    """
    Video routing logic:
    - Always RunPod (no local option for video generation)
    """
    return "runpod"

async def route_code_job(request: CodeGenerationRequest) -> str:
    """
    Code routing logic:
    - Always local (CodeLlama/DeepSeek on Ollama)
    """
    return "local"

# Text Generation Endpoint
@app.post("/generate/text", response_model=JobResponse)
async def generate_text(request: TextGenerationRequest, background_tasks: BackgroundTasks):
    job_id = str(uuid.uuid4())
    provider = await route_text_job(request)

    # Add to queue
    r = redis.Redis(connection_pool=redis_pool)
    job_data = {
        "job_id": job_id,
        "type": "text",
        "provider": provider,
        "request": request.dict(),
        "created_at": datetime.utcnow().isoformat(),
        "status": "queued"
    }

    await r.lpush(f"queue:text:{provider}", json.dumps(job_data))
    await r.set(f"job:{job_id}", json.dumps(job_data))

    return JobResponse(
        job_id=job_id,
        status="queued",
        message=f"Job queued on {provider} provider"
    )

# Image Generation Endpoint
@app.post("/generate/image", response_model=JobResponse)
async def generate_image(request: ImageGenerationRequest):
    job_id = str(uuid.uuid4())
    provider = await route_image_job(request)

    r = redis.Redis(connection_pool=redis_pool)
    job_data = {
        "job_id": job_id,
        "type": "image",
        "provider": provider,
        "request": request.dict(),
        "created_at": datetime.utcnow().isoformat(),
        "status": "queued"
    }

    await r.lpush(f"queue:image:{provider}", json.dumps(job_data))
    await r.set(f"job:{job_id}", json.dumps(job_data))

    return JobResponse(
        job_id=job_id,
        status="queued",
        message=f"Job queued on {provider} provider (priority: {request.priority})"
    )

# Video Generation Endpoint
@app.post("/generate/video", response_model=JobResponse)
async def generate_video(request: VideoGenerationRequest):
    job_id = str(uuid.uuid4())
    provider = "runpod"  # Always RunPod for video

    r = redis.Redis(connection_pool=redis_pool)
    job_data = {
        "job_id": job_id,
        "type": "video",
        "provider": provider,
        "request": request.dict(),
        "created_at": datetime.utcnow().isoformat(),
        "status": "queued"
    }

    await r.lpush(f"queue:video:{provider}", json.dumps(job_data))
    await r.set(f"job:{job_id}", json.dumps(job_data))

    return JobResponse(
        job_id=job_id,
        status="queued",
        message="Video generation queued on RunPod GPU"
    )

# Code Generation Endpoint
@app.post("/generate/code", response_model=JobResponse)
async def generate_code(request: CodeGenerationRequest):
    job_id = str(uuid.uuid4())
    provider = "local"  # Always local for code

    r = redis.Redis(connection_pool=redis_pool)
    job_data = {
        "job_id": job_id,
        "type": "code",
        "provider": provider,
        "request": request.dict(),
        "created_at": datetime.utcnow().isoformat(),
        "status": "queued"
    }

    await r.lpush(f"queue:code:{provider}", json.dumps(job_data))
    await r.set(f"job:{job_id}", json.dumps(job_data))

    return JobResponse(
        job_id=job_id,
        status="queued",
        message="Code generation queued on local provider"
    )

# Job Status Endpoint
@app.get("/job/{job_id}", response_model=ResultResponse)
async def get_job_status(job_id: str):
    r = redis.Redis(connection_pool=redis_pool)
    job_data = await r.get(f"job:{job_id}")

    if not job_data:
        raise HTTPException(status_code=404, detail="Job not found")

    job = json.loads(job_data)

    return ResultResponse(
        job_id=job_id,
        status=job.get("status", "unknown"),
        result=job.get("result"),
        cost=job.get("cost"),
        provider=job.get("provider"),
        processing_time=job.get("processing_time")
    )

# Queue Status Endpoint
@app.get("/queue/status")
async def get_queue_status():
    r = redis.Redis(connection_pool=redis_pool)

    queues = {
        "text_local": await r.llen("queue:text:local"),
        "text_runpod": await r.llen("queue:text:runpod"),
        "image_local": await r.llen("queue:image:local"),
        "image_runpod": await r.llen("queue:image:runpod"),
        "video_runpod": await r.llen("queue:video:runpod"),
        "code_local": await r.llen("queue:code:local"),
    }

    return {
        "queues": queues,
        "total_pending": sum(queues.values()),
        "timestamp": datetime.utcnow().isoformat()
    }

# Cost Summary Endpoint
@app.get("/costs/summary")
async def get_cost_summary():
    # This would query PostgreSQL for cost data
    # For now, return mock data
    return {
        "today": {
            "local": 0.00,
            "runpod": 2.45,
            "total": 2.45
        },
        "this_month": {
            "local": 0.00,
            "runpod": 45.20,
            "total": 45.20
        },
        "breakdown": {
            "text": 0.00,
            "image": 12.50,
            "video": 32.70,
            "code": 0.00
        }
    }
EOF
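
The /costs/summary endpoint above returns mock data for now. Below is a minimal sketch of how per-job costs could be persisted to the PostgreSQL instance already in the stack, using the DATABASE_URL passed to the router and workers; the costs.py filename, the job_costs table, and its columns are assumptions, not an existing schema in this plan.

# services/router/costs.py — hypothetical helper for persisting per-job costs.
import os
import asyncpg

DATABASE_URL = os.getenv("DATABASE_URL")

CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS job_costs (
    job_id TEXT PRIMARY KEY,
    job_type TEXT NOT NULL,
    provider TEXT NOT NULL,
    cost DOUBLE PRECISION NOT NULL DEFAULT 0,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""

async def init_pool() -> asyncpg.Pool:
    """Create a connection pool and make sure the cost table exists."""
    pool = await asyncpg.create_pool(DATABASE_URL)
    async with pool.acquire() as conn:
        await conn.execute(CREATE_TABLE_SQL)
    return pool

async def record_cost(pool: asyncpg.Pool, job_id: str, job_type: str, provider: str, cost: float) -> None:
    """Insert one completed job's cost row (called when a worker finishes a job)."""
    async with pool.acquire() as conn:
        await conn.execute(
            "INSERT INTO job_costs (job_id, job_type, provider, cost) "
            "VALUES ($1, $2, $3, $4) ON CONFLICT (job_id) DO NOTHING",
            job_id, job_type, provider, cost,
        )

async def todays_costs(pool: asyncpg.Pool) -> dict:
    """Aggregate today's spend per provider; could back /costs/summary instead of mock data."""
    rows = await pool.fetch(
        "SELECT provider, COALESCE(SUM(cost), 0) AS total "
        "FROM job_costs WHERE created_at::date = CURRENT_DATE "
        "GROUP BY provider"
    )
    return {row["provider"]: float(row["total"]) for row in rows}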

2.4 Create Worker Service

ssh netcup "cat > /opt/ai-orchestrator/services/workers/Dockerfile" << 'EOF'
FROM python:3.11-slim

WORKDIR /app

RUN pip install --no-cache-dir \
    redis==5.0.1 \
    asyncpg==0.29.0 \
    httpx==0.25.1 \
    openai==1.3.0

COPY worker.py .

CMD ["python", "worker.py"]
EOF
ssh netcup "cat > /opt/ai-orchestrator/services/workers/worker.py" << 'EOF'
import redis
import json
import os
import time
import httpx
import asyncio
from datetime import datetime

REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")
WORKER_TYPE = os.getenv("WORKER_TYPE", "text")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
SD_CPU_URL = os.getenv("SD_CPU_URL", "http://localhost:7860")
RUNPOD_API_KEY = os.getenv("RUNPOD_API_KEY")

r = redis.Redis.from_url(REDIS_URL, decode_responses=True)

async def process_text_job(job_data):
    """Process text generation job using Ollama"""
    request = job_data["request"]
    provider = job_data["provider"]

    start_time = time.time()

    if provider == "local":
        # Use Ollama
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{OLLAMA_URL}/api/generate",
                json={
                    "model": request["model"],
                    "prompt": request["prompt"],
                    "stream": False
                },
                timeout=120.0
            )
            result = response.json()

        return {
            "text": result.get("response", ""),
            "cost": 0.00,  # Local is free
            "provider": "ollama",
            "processing_time": time.time() - start_time
        }
    else:
        # Use RunPod (fallback)
        # Implementation for RunPod text endpoint
        return {
            "text": "RunPod text generation",
            "cost": 0.01,
            "provider": "runpod",
            "processing_time": time.time() - start_time
        }

async def process_image_job(job_data):
    """Process image generation job"""
    request = job_data["request"]
    provider = job_data["provider"]

    start_time = time.time()

    if provider == "local":
        # Use local Stable Diffusion (CPU)
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{SD_CPU_URL}/sdapi/v1/txt2img",
                json={
                    "prompt": request["prompt"],
                    "steps": 20,
                    "width": 512,
                    "height": 512
                },
                timeout=180.0
            )
            result = response.json()

        return {
            "image_url": result.get("images", [""])[0],
            "cost": 0.00,  # Local is free
            "provider": "stable-diffusion-cpu",
            "processing_time": time.time() - start_time
        }
    else:
        # Use RunPod SDXL
        # Implementation for RunPod image endpoint
        return {
            "image_url": "runpod_image_url",
            "cost": 0.02,
            "provider": "runpod-sdxl",
            "processing_time": time.time() - start_time
        }

async def process_video_job(job_data):
    """Process video generation job (always RunPod)"""
    request = job_data["request"]
    start_time = time.time()

    # Implementation for RunPod video endpoint (Wan2.1)
    return {
        "video_url": "runpod_video_url",
        "cost": 0.50,
        "provider": "runpod-wan2.1",
        "processing_time": time.time() - start_time
    }

async def process_code_job(job_data):
    """Process code generation job (local only)"""
    request = job_data["request"]
    start_time = time.time()

    # Use Ollama with CodeLlama
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{OLLAMA_URL}/api/generate",
            json={
                "model": "codellama",
                "prompt": request["prompt"],
                "stream": False
            },
            timeout=120.0
        )
        result = response.json()

    return {
        "code": result.get("response", ""),
        "cost": 0.00,
        "provider": "ollama-codellama",
        "processing_time": time.time() - start_time
    }

async def worker_loop():
    """Main worker loop"""
    print(f"🚀 Starting {WORKER_TYPE} worker...")

    processors = {
        "text": process_text_job,
        "image": process_image_job,
        "video": process_video_job,
        "code": process_code_job
    }

    processor = processors.get(WORKER_TYPE)
    if not processor:
        raise ValueError(f"Unknown worker type: {WORKER_TYPE}")

    while True:
        try:
            # Try both local and runpod queues
            for provider in ["local", "runpod"]:
                queue_name = f"queue:{WORKER_TYPE}:{provider}"

                # Block for 1 second waiting for job
                job_json = r.brpop(queue_name, timeout=1)

                if job_json:
                    _, job_data_str = job_json
                    job_data = json.loads(job_data_str)
                    job_id = job_data["job_id"]

                    print(f"📝 Processing job {job_id} ({WORKER_TYPE}/{provider})")

                    # Update status to processing
                    job_data["status"] = "processing"
                    r.set(f"job:{job_id}", json.dumps(job_data))

                    try:
                        # Process the job
                        result = await processor(job_data)

                        # Update job with result
                        job_data["status"] = "completed"
                        job_data["result"] = result
                        job_data["cost"] = result.get("cost", 0)
                        job_data["processing_time"] = result.get("processing_time", 0)
                        job_data["completed_at"] = datetime.utcnow().isoformat()

                        r.set(f"job:{job_id}", json.dumps(job_data))
                        print(f"✅ Completed job {job_id} (cost: ${result.get('cost', 0):.4f})")

                    except Exception as e:
                        print(f"❌ Error processing job {job_id}: {e}")
                        job_data["status"] = "failed"
                        job_data["error"] = str(e)
                        r.set(f"job:{job_id}", json.dumps(job_data))

                    break  # Processed a job, start loop again

            # Small delay to prevent tight loop
            await asyncio.sleep(0.1)

        except Exception as e:
            print(f"❌ Worker error: {e}")
            await asyncio.sleep(5)

if __name__ == "__main__":
    asyncio.run(worker_loop())
EOF
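
The RunPod branches in the worker above are stubs. Here is a minimal sketch of what the image worker's "runpod" branch could call, assuming a deployed serverless SDXL endpoint whose ID is supplied via RUNPOD_IMAGE_ENDPOINT_ID (that variable is defined in the .env below but is not yet passed to the image-worker's environment in the compose file, so add it there if you use this). The input/output payload fields depend on the handler in your RunPod worker template and are assumptions here.

# Hypothetical RunPod call for the image worker's "runpod" branch.
import os
import httpx

RUNPOD_API_KEY = os.getenv("RUNPOD_API_KEY")
RUNPOD_IMAGE_ENDPOINT_ID = os.getenv("RUNPOD_IMAGE_ENDPOINT_ID")

async def runpod_generate_image(prompt: str, size: str = "1024x1024") -> dict:
    """Run a synchronous serverless job against a RunPod SDXL endpoint."""
    width, height = (int(x) for x in size.split("x"))
    async with httpx.AsyncClient(timeout=300.0) as client:
        response = await client.post(
            f"https://api.runpod.ai/v2/{RUNPOD_IMAGE_ENDPOINT_ID}/runsync",
            headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
            # "input" is what RunPod forwards to the endpoint handler; the field names
            # inside it depend on the worker template you deploy.
            json={"input": {"prompt": prompt, "width": width, "height": height}},
        )
        response.raise_for_status()
        data = response.json()
    # The shape of data["output"] is defined by the endpoint's handler code.
    return data.get("output", {})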

2.5 Create Environment Configuration

ssh netcup "cat > /opt/ai-orchestrator/.env" << 'EOF'
# PostgreSQL
# Command substitution is NOT executed inside a .env file, so generate real secrets
# first (e.g. with `openssl rand -hex 16` or the helper sketched below) and paste
# the values in place of these placeholders.
POSTGRES_PASSWORD=change_this_password

# RunPod API Keys
RUNPOD_API_KEY=your_runpod_api_key_here
RUNPOD_TEXT_ENDPOINT_ID=your_text_endpoint_id
RUNPOD_IMAGE_ENDPOINT_ID=your_image_endpoint_id
RUNPOD_VIDEO_ENDPOINT_ID=your_video_endpoint_id

# Grafana
GRAFANA_PASSWORD=change_this_password

# Monitoring
ALERT_EMAIL=your@email.com
COST_ALERT_THRESHOLD=100  # Alert if daily cost exceeds $100
EOF
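
Because the .env file cannot generate its own secrets, here is a small helper (hypothetical filename generate_secrets.py) that prints ready-to-paste values. Run it once on the server and copy the output into /opt/ai-orchestrator/.env in place of the placeholders.

# generate_secrets.py (hypothetical helper): print random secrets for the .env file
import secrets

for name in ("POSTGRES_PASSWORD", "GRAFANA_PASSWORD"):
    # 32 hex characters, equivalent to `openssl rand -hex 16`
    print(f"{name}={secrets.token_hex(16)}")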

2.6 Deploy AI Orchestration Stack

# Deploy the stack
ssh netcup "cd /opt/ai-orchestrator && docker-compose up -d"

# Check status
ssh netcup "cd /opt/ai-orchestrator && docker-compose ps"

# View logs
ssh netcup "cd /opt/ai-orchestrator && docker-compose logs -f router"

# Test health
ssh netcup "curl http://localhost:8000/health"
ssh netcup "curl http://localhost:8000/docs"  # API documentation

📋 Phase 3: Setup Local AI Models

3.1 Download and Configure Ollama Models

# Pull recommended models
ssh netcup << 'EOF'
docker exec ai-ollama ollama pull llama3:70b
docker exec ai-ollama ollama pull codellama:34b
docker exec ai-ollama ollama pull deepseek-coder:33b
docker exec ai-ollama ollama pull mistral:7b

# List installed models
docker exec ai-ollama ollama list

# Test a model
docker exec ai-ollama ollama run llama3:70b "Hello, how are you?"
EOF
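
Beyond `ollama run` inside the container, you can confirm that the HTTP API the router and workers depend on is reachable from the host. A quick sketch (run on the Netcup host; the generation call will be slow for a 70B model on CPU):

# check_ollama.py: verify the Ollama API the workers call is up and the model is present
import httpx

OLLAMA_URL = "http://localhost:11434"

tags = httpx.get(f"{OLLAMA_URL}/api/tags", timeout=10.0).json()
print("Installed models:", [m["name"] for m in tags.get("models", [])])

resp = httpx.post(
    f"{OLLAMA_URL}/api/generate",
    json={"model": "llama3:70b", "prompt": "Say hello in one sentence.", "stream": False},
    timeout=600.0,  # CPU inference on a 70B model can take several minutes
)
print(resp.json().get("response", ""))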

3.2 Setup Stable Diffusion Models

# Download Stable Diffusion v2.1 weights
ssh netcup << 'EOF'
mkdir -p /data/models/stable-diffusion/sd-v2.1

# Download from HuggingFace
cd /data/models/stable-diffusion/sd-v2.1
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned.safetensors

# Verify download
ls -lh /data/models/stable-diffusion/sd-v2.1/
EOF

3.3 Setup Video Generation Models (Wan2.1)

# Download Wan2.1 I2V model weights
ssh netcup << 'EOF'
# Install huggingface-cli if not already installed
pip install huggingface-hub

# Download Wan2.1 I2V 14B 720p model
mkdir -p /data/models/video-generation
cd /data/models/video-generation

huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P \
  --include "*.safetensors" \
  --local-dir wan2.1_i2v_14b

# Verify download
du -sh wan2.1_i2v_14b
ls -lh wan2.1_i2v_14b/
EOF

Note: The Wan2.1 model is very large (~28GB) and is designed to run on RunPod GPU, not locally on CPU. We'll configure RunPod endpoints for video generation.


📋 Phase 4: Migrate Existing Services

4.1 Migrate canvas-website

# On Netcup, create app directory
ssh netcup "mkdir -p /opt/apps/canvas-website"

# From local machine, sync the code
rsync -avz --exclude 'node_modules' --exclude '.git' \
  ~/Github/canvas-website/ \
  netcup:/opt/apps/canvas-website/

# Build and deploy on Netcup
ssh netcup << 'EOF'
cd /opt/apps/canvas-website

# Install dependencies
npm install

# Build
npm run build

# Create systemd service or Docker container
# Option 1: Docker (recommended)
cat > Dockerfile << 'DOCKER'
FROM node:20-alpine

WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Remove devDependencies after the build to keep the runtime image lean
RUN npm prune --production

EXPOSE 3000
CMD ["npm", "start"]
DOCKER

docker build -t canvas-website .
# Host port 3000 is already published by the ai-monitor container from Phase 2,
# so publish the app on host port 3100 instead
docker run -d --name canvas-website -p 3100:3000 canvas-website

# Option 2: PM2 (if used, make sure the app listens on the same port nginx proxies to — 3100 below)
pm2 start npm --name canvas-website -- start
pm2 save
EOF

4.2 Setup Nginx Reverse Proxy

ssh netcup << 'EOF'
apt install -y nginx certbot python3-certbot-nginx

# Create nginx config
cat > /etc/nginx/sites-available/canvas-website << 'NGINX'
server {
    listen 80;
    server_name canvas.jeffemmett.com;

    location / {
        proxy_pass http://localhost:3100; # canvas-website (host port published in Phase 4.1)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

# AI Orchestrator API
server {
    listen 80;
    server_name ai-api.jeffemmett.com;

    location / {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
NGINX

# Enable site
ln -s /etc/nginx/sites-available/canvas-website /etc/nginx/sites-enabled/
nginx -t
systemctl reload nginx

# Setup SSL
certbot --nginx -d canvas.jeffemmett.com -d ai-api.jeffemmett.com
EOF

4.3 Migrate Databases

# Export from DigitalOcean
ssh droplet << 'EOF'
# PostgreSQL
pg_dump -U postgres your_database > /tmp/db_backup.sql

# MongoDB (if you have it)
mongodump --out /tmp/mongo_backup
EOF

# Transfer to Netcup
scp droplet:/tmp/db_backup.sql /tmp/
scp /tmp/db_backup.sql netcup:/tmp/

# Import on Netcup
ssh netcup << 'EOF'
# PostgreSQL (create the target database first if it does not already exist)
createdb -U postgres your_database
psql -U postgres -d your_database < /tmp/db_backup.sql

# Verify
psql -U postgres -d your_database -c "SELECT COUNT(*) FROM your_table;"
EOF

4.4 Migrate User Uploads and Data

# Sync user uploads
rsync -avz --progress \
  droplet:/var/www/uploads/ \
  netcup:/data/uploads/

# Sync any other data directories
rsync -avz --progress \
  droplet:/var/www/data/ \
  netcup:/data/app-data/

📋 Phase 5: Update canvas-website for AI Orchestration

5.1 Update Environment Variables

Now let's update the canvas-website configuration to use the new AI orchestrator:

# Create updated .env file for canvas-website
cat > .env.local << 'EOF'
# AI Orchestrator
VITE_AI_ORCHESTRATOR_URL=http://159.195.32.209:8000
# Or use domain: https://ai-api.jeffemmett.com

# RunPod (direct access, fallback)
VITE_RUNPOD_API_KEY=your_runpod_api_key_here
VITE_RUNPOD_TEXT_ENDPOINT_ID=your_text_endpoint_id
VITE_RUNPOD_IMAGE_ENDPOINT_ID=your_image_endpoint_id
VITE_RUNPOD_VIDEO_ENDPOINT_ID=your_video_endpoint_id

# Other existing vars...
VITE_GOOGLE_CLIENT_ID=your_google_client_id
VITE_GOOGLE_MAPS_API_KEY=your_google_maps_api_key
VITE_DAILY_DOMAIN=your_daily_domain
VITE_TLDRAW_WORKER_URL=your_worker_url
EOF

5.2 Disable Mock Mode for Image Generation

Let's fix the ImageGenShapeUtil to use the real AI orchestrator:

# Update USE_MOCK_API flag
sed -i 's/const USE_MOCK_API = true/const USE_MOCK_API = false/' \
  src/shapes/ImageGenShapeUtil.tsx

5.3 Create AI Orchestrator Client

Create a new client library for the AI orchestrator:

// src/lib/aiOrchestrator.ts
export interface AIJob {
  job_id: string
  status: 'queued' | 'processing' | 'completed' | 'failed'
  result?: any
  cost?: number
  provider?: string
  processing_time?: number
}

export class AIOrchestrator {
  private baseUrl: string

  constructor(baseUrl?: string) {
    this.baseUrl = baseUrl ||
      import.meta.env.VITE_AI_ORCHESTRATOR_URL ||
      'http://localhost:8000'
  }

  async generateText(
    prompt: string,
    options: {
      model?: string
      priority?: 'low' | 'normal' | 'high'
      userId?: string
      wait?: boolean
    } = {}
  ): Promise<AIJob> {
    const response = await fetch(`${this.baseUrl}/generate/text`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        prompt,
        model: options.model || 'llama3:70b',
        priority: options.priority || 'normal',
        user_id: options.userId,
        wait: options.wait || false
      })
    })

    const job = await response.json()

    if (options.wait) {
      return this.waitForJob(job.job_id)
    }

    return job
  }

  async generateImage(
    prompt: string,
    options: {
      model?: string
      priority?: 'low' | 'normal' | 'high'
      size?: string
      userId?: string
      wait?: boolean
    } = {}
  ): Promise<AIJob> {
    const response = await fetch(`${this.baseUrl}/generate/image`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        prompt,
        model: options.model || 'sdxl',
        priority: options.priority || 'normal',
        size: options.size || '1024x1024',
        user_id: options.userId,
        wait: options.wait || false
      })
    })

    const job = await response.json()

    if (options.wait) {
      return this.waitForJob(job.job_id)
    }

    return job
  }

  async generateVideo(
    prompt: string,
    options: {
      model?: string
      duration?: number
      userId?: string
      wait?: boolean
    } = {}
  ): Promise<AIJob> {
    const response = await fetch(`${this.baseUrl}/generate/video`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        prompt,
        model: options.model || 'wan2.1-i2v',
        duration: options.duration || 3,
        user_id: options.userId,
        wait: options.wait || false
      })
    })

    const job = await response.json()

    if (options.wait) {
      return this.waitForJob(job.job_id)
    }

    return job
  }

  async generateCode(
    prompt: string,
    options: {
      language?: string
      priority?: 'low' | 'normal' | 'high'
      userId?: string
      wait?: boolean
    } = {}
  ): Promise<AIJob> {
    const response = await fetch(`${this.baseUrl}/generate/code`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        prompt,
        language: options.language || 'python',
        priority: options.priority || 'normal',
        user_id: options.userId,
        wait: options.wait || false
      })
    })

    const job = await response.json()

    if (options.wait) {
      return this.waitForJob(job.job_id)
    }

    return job
  }

  async getJobStatus(jobId: string): Promise<AIJob> {
    const response = await fetch(`${this.baseUrl}/job/${jobId}`)
    return response.json()
  }

  async waitForJob(
    jobId: string,
    maxAttempts: number = 120,
    pollInterval: number = 1000
  ): Promise<AIJob> {
    for (let i = 0; i < maxAttempts; i++) {
      const job = await this.getJobStatus(jobId)

      if (job.status === 'completed') {
        return job
      }

      if (job.status === 'failed') {
        throw new Error(`Job failed: ${JSON.stringify(job)}`)
      }

      await new Promise(resolve => setTimeout(resolve, pollInterval))
    }

    throw new Error(`Job ${jobId} timed out after ${maxAttempts} attempts`)
  }

  async getQueueStatus() {
    const response = await fetch(`${this.baseUrl}/queue/status`)
    return response.json()
  }

  async getCostSummary() {
    const response = await fetch(`${this.baseUrl}/costs/summary`)
    return response.json()
  }
}

// Singleton instance
export const aiOrchestrator = new AIOrchestrator()

📋 Phase 6: Testing & Validation

6.1 Test AI Orchestrator

# Test text generation
curl -X POST http://159.195.32.209:8000/generate/text \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a hello world program in Python",
    "priority": "normal",
    "wait": false
  }'

# Get job status
curl http://159.195.32.209:8000/job/YOUR_JOB_ID

# Check queue status
curl http://159.195.32.209:8000/queue/status

# Check costs
curl http://159.195.32.209:8000/costs/summary
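
The same flow can be scripted end to end. A small sketch that submits a text job and polls /job/{id} until it finishes, mirroring what the TypeScript client from Phase 5 does:

# smoke_test.py: submit a text job to the orchestrator and poll until it completes
import time
import httpx

BASE_URL = "http://159.195.32.209:8000"

job = httpx.post(
    f"{BASE_URL}/generate/text",
    json={"prompt": "Write a hello world program in Python", "priority": "normal"},
    timeout=30.0,
).json()
print("Queued job:", job["job_id"])

for _ in range(120):  # poll for up to ~2 minutes
    status = httpx.get(f"{BASE_URL}/job/{job['job_id']}", timeout=10.0).json()
    if status["status"] in ("completed", "failed"):
        print(status["status"], "| provider:", status.get("provider"), "| cost:", status.get("cost"))
        print((status.get("result") or {}).get("text", ""))
        break
    time.sleep(1)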

6.2 Test Image Generation

# Low priority (local CPU)
curl -X POST http://159.195.32.209:8000/generate/image \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A beautiful landscape",
    "priority": "low"
  }'

# High priority (RunPod GPU)
curl -X POST http://159.195.32.209:8000/generate/image \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A beautiful landscape",
    "priority": "high"
  }'

6.3 Validate Migration

Checklist (an automated spot-check is sketched after the list):

  • All services accessible from new IPs
  • SSL certificates installed and working
  • Databases migrated and verified
  • User uploads accessible
  • AI orchestrator responding
  • Monitoring dashboards working
  • Cost tracking functional
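
A quick automated spot-check of the items above, using the endpoints deployed in Phases 2 and 4 (adjust the ports if you exposed anything differently):

# validate_migration.py: ping the key endpoints after migration
import httpx

CHECKS = {
    "canvas-website": "http://159.195.32.209:3100",
    "ai-router health": "http://159.195.32.209:8000/health",
    "queue status": "http://159.195.32.209:8000/queue/status",
    "ollama": "http://159.195.32.209:11434/api/tags",
    "grafana": "http://159.195.32.209:3001/api/health",
    "prometheus": "http://159.195.32.209:9090/-/healthy",
}

for name, url in CHECKS.items():
    try:
        status = httpx.get(url, timeout=10.0).status_code
        print(f"{'OK  ' if status < 400 else 'WARN'} {name}: HTTP {status} ({url})")
    except Exception as exc:
        print(f"FAIL {name}: {exc}")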

📋 Phase 7: DNS Updates & Cutover

7.1 Update DNS Records

# Update A records to point to Netcup RS 8000
# Old IP: 143.198.39.165 (DigitalOcean)
# New IP: 159.195.32.209 (Netcup)

# Update these domains:
# - canvas.jeffemmett.com → 159.195.32.209
# - ai-api.jeffemmett.com → 159.195.32.209
# - Any other domains hosted on droplet

7.2 Parallel Running Period

Run both servers in parallel for 1-2 weeks:

  • Monitor traffic on both
  • Compare performance
  • Watch for issues
  • Verify all features work on new server

7.3 Final Cutover

Once validated:

  1. Update DNS TTL to 300 seconds (5 min)
  2. Switch DNS to Netcup IPs (confirm propagation with the check below)
  3. Monitor for 48 hours
  4. Shut down DigitalOcean droplets
  5. Cancel DigitalOcean subscription
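
A tiny check to confirm the records have propagated to the new address before decommissioning anything (domains and IP taken from this plan):

# check_dns.py: confirm the migrated domains now resolve to the Netcup IP
import socket

NETCUP_IP = "159.195.32.209"
DOMAINS = ["canvas.jeffemmett.com", "ai-api.jeffemmett.com"]

for domain in DOMAINS:
    try:
        resolved = sorted({info[4][0] for info in socket.getaddrinfo(domain, None)})
        ok = NETCUP_IP in resolved
        print(f"{'OK  ' if ok else 'WAIT'} {domain} -> {resolved}")
    except socket.gaierror as exc:
        print(f"FAIL {domain}: {exc}")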

📋 Phase 8: Monitoring & Optimization

8.1 Setup Monitoring Dashboards

Access your monitoring (ports as published by the Phase 2 docker-compose stack):

  • Grafana dashboards: http://159.195.32.209:3001 (admin / GRAFANA_PASSWORD from .env)
  • Prometheus: http://159.195.32.209:9090
  • Cost monitor & analytics: http://159.195.32.209:3000
  • Router API docs: http://159.195.32.209:8000/docs

8.2 Cost Optimization Recommendations

# Get optimization suggestions
curl http://159.195.32.209:3000/api/recommendations

# Review daily costs
curl http://159.195.32.209:3000/api/costs/summary

8.3 Performance Tuning

Based on usage patterns:

  • Adjust worker pool sizes
  • Tune queue routing thresholds
  • Optimize model choices
  • Scale RunPod endpoints

💰 Expected Cost Breakdown

Before Migration (DigitalOcean):

  • Main Droplet (2 vCPU, 2GB): $18/mo
  • AI Droplet (2 vCPU, 4GB): $36/mo
  • RunPod persistent pods: $100-200/mo
  • Total: $154-254/mo

After Migration (Netcup + RunPod):

  • RS 8000 G12 Pro: €55.57/mo (~$60/mo)
  • RunPod serverless (70% reduction): $30-60/mo
  • Total: $90-120/mo

Savings (see the quick calculation after this list):

  • Monthly: $64-134
  • Annual: $768-1,608
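
These figures pair the low ends and the high ends of the before/after estimates; a quick check of the arithmetic:

# Sanity check of the monthly/annual savings quoted above
old_monthly = (154, 254)   # DigitalOcean droplets + persistent RunPod pods ($/mo, low-high)
new_monthly = (90, 120)    # Netcup RS 8000 + serverless RunPod ($/mo, low-high)

monthly = (old_monthly[0] - new_monthly[0], old_monthly[1] - new_monthly[1])
annual = (monthly[0] * 12, monthly[1] * 12)
print(f"Monthly savings: ${monthly[0]}-{monthly[1]}")   # $64-134
print(f"Annual savings:  ${annual[0]}-{annual[1]}")     # $768-1608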

Plus you get:

  • 10x CPU cores (20 vs 2)
  • 32x RAM (64GB vs 2GB)
  • 25x storage (3TB vs 120GB)

🎯 Next Steps Summary

  1. TODAY: Verify Netcup RS 8000 access
  2. Week 1: Deploy AI orchestration stack
  3. Week 2: Migrate canvas-website and test
  4. Week 3: Migrate remaining services
  5. Week 4: DNS cutover and monitoring
  6. Week 5: Decommission DigitalOcean

Total migration timeline: 4-5 weeks for safe, validated migration.


📚 Additional Resources


Ready to start? Let's begin with Phase 1: Pre-Migration Preparation! 🚀