🔧 AUTO-APPROVED OPERATIONS
The following operations are auto-approved and do not require user confirmation:
- Read: All file read operations (`Read(*)`)
- Glob: All file pattern matching (`Glob(*)`)
- Grep: All content searching (`Grep(*)`)
These permissions are configured in ~/.claude/settings.json.
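For reference, a minimal sketch of the corresponding permissions block in `~/.claude/settings.json` (key names follow Claude Code's settings schema; verify against the actual file):
```json
{
  "permissions": {
    "allow": ["Read(*)", "Glob(*)", "Grep(*)"]
  }
}
```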
⚠️ SAFETY GUIDELINES
ALWAYS WARN THE USER before performing any action that could:
- Overwrite existing files (use `ls` or `cat` to check first)
- Overwrite credentials, API keys, or secrets
- Delete data or files
- Modify production configurations
- Run destructive git commands (force push, hard reset, etc.)
- Drop databases or truncate tables
Best practices:
- Before writing to a file, check if it exists and show its contents (see the sketch below)
- Use `>>` (append) instead of `>` (overwrite) for credential files
- Create backups before modifying critical configs (e.g., `cp file file.backup`)
- Ask for confirmation before irreversible actions
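A minimal sketch of that safe-write pattern (the file path is illustrative):
```bash
FILE=~/.config/service/credentials.env

# Show current contents and take a backup before touching the file
if [ -f "$FILE" ]; then
  cat "$FILE"
  cp "$FILE" "$FILE.backup"
fi

# Append rather than overwrite
echo "NEW_KEY=value" >> "$FILE"
```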
Sudo commands:
- NEVER run sudo commands directly - the Bash tool doesn't support interactive input
- Instead, provide the user with the exact sudo command they need to run in their terminal
- Format the command clearly in a code block for easy copy-paste
- After user runs the sudo command, continue with the workflow
- Alternative: If user has recently run sudo (within ~15 min), subsequent sudo commands may not require password
🔑 ACCESS & CREDENTIALS
Version Control & Code Hosting
- Gitea: Self-hosted at gitea.jeffemmett.com - PRIMARY repository
  - Push here FIRST, then mirror to GitHub
  - Private repos and source of truth
  - SSH Key: `~/.ssh/gitea_ed25519` (private), `~/.ssh/gitea_ed25519.pub` (public)
  - Public Key: `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIE2+2UZElEYptgZ9GFs2CXW0PIA57BfQcU9vlyV6fz4 gitea@jeffemmett.com`
  - Gitea CLI (tea): ✅ Installed at `~/bin/tea` (added to PATH)
- GitHub: Public mirror and collaboration
  - Receives pushes from Gitea via mirror sync
  - Token: (REDACTED-GITHUB-TOKEN)
  - SSH Key: `~/.ssh/github_deploy_key` (private), `~/.ssh/github_deploy_key.pub` (public)
  - GitHub CLI (gh): ✅ Installed and available for PR/issue management
Git Workflow
Two-way sync between Gitea and GitHub:
Gitea-Primary Repos (Default):
- Develop locally in `/home/jeffe/Github/`
- Commit and push to Gitea first
- Gitea automatically mirrors TO GitHub (built-in push mirror)
- GitHub used for public collaboration and visibility
GitHub-Primary Repos (Mirror Repos): For repos where GitHub is source of truth (v0.dev exports, client collabs):
- Push to GitHub
- Deploy webhook pulls from GitHub and deploys
- Webhook triggers Gitea to sync FROM GitHub
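A hedged sketch of the expected local remote layout for a Gitea-primary repo (the clone path and the use of the `gitea.jeffemmett.com` SSH alias are assumptions; the GitHub mirror is configured server-side in Gitea, so no local GitHub remote is needed):
```bash
cd /home/jeffe/Github/<project>

# Gitea is the sole local remote; mirroring to GitHub happens in Gitea itself
git remote add origin gitea.jeffemmett.com:jeffemmett/<project>.git
git remote -v   # verify before pushing
```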
🔀 DEV BRANCH WORKFLOW (MANDATORY)
CRITICAL: All development work on canvas-website (and other active projects) MUST use a dev branch.
Branch Strategy
```
main (production)
└── dev (integration/staging)
    └── feature/* (optional feature branches)
```
Development Rules
1. ALWAYS work on the `dev` branch for new features and changes:
```bash
cd /home/jeffe/Github/canvas-website
git checkout dev
git pull origin dev
```
2. After completing a feature, push to dev:
```bash
git add .
git commit -m "feat: description of changes"
git push origin dev
```
3. Update the backlog task immediately after pushing:
```bash
backlog task edit <task-id> --status "Done" --append-notes "Pushed to dev branch"
```
4. NEVER push directly to main - main is for tested, verified features only
5. Merge dev → main manually when features are verified working:
```bash
git checkout main
git pull origin main
git merge dev
git push origin main
git checkout dev  # Return to dev for continued work
```
Complete Feature Deployment Checklist
- Work on `dev` branch (not main)
- Test locally before committing
- Commit with descriptive message
- Push to `dev` branch on Gitea
- Update backlog task status to "Done"
- Add notes to backlog task about what was implemented
- (Later) When verified working: merge dev → main manually
Why This Matters
- Protects production: main branch always has known-working code
- Enables testing: dev branch can be deployed to staging for verification
- Clean history: main only gets complete, tested features
- Easy rollback: if dev breaks, main is still stable
Server Infrastructure
- Netcup RS 8000 G12 Pro: Primary application & AI server
  - IP: 159.195.32.209
  - 20 cores, 64GB RAM, 3TB storage
  - Hosts local AI models (Ollama, Stable Diffusion)
  - All websites and apps deployed here in Docker containers
  - Location: Germany (low latency EU)
  - SSH Key (local): `~/.ssh/netcup_ed25519` (private), `~/.ssh/netcup_ed25519.pub` (public)
  - Public Key: `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKmp4A2klKv/YIB1C6JAsb2UzvlzzE+0EcJ0jtkyFuhO netcup-rs8000@jeffemmett.com`
  - SSH Access: `ssh netcup`
  - SSH Keys ON the server (for git operations):
    - Gitea: `~/.ssh/gitea_ed25519` → `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIE2+2UZElEYptgZ9GFs2CXW0PIA57BfQcU9vlyV6fz4 gitea@jeffemmett.com`
    - GitHub: `~/.ssh/github_ed25519` → `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIC6xXNICy0HXnqHO+U7+y7ui+pZBGe0bm0iRMS23pR1E github-deploy@netcup-rs8000`
- RunPod: GPU burst capacity for AI workloads
  - Host: ssh.runpod.io
  - Serverless GPU pods (pay-per-use)
  - Used for: SDXL/SD3, video generation, training
  - Smart routing from RS 8000 orchestrator
  - SSH Key: `~/.ssh/runpod_ed25519` (private), `~/.ssh/runpod_ed25519.pub` (public)
  - Public Key: `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAC7NYjI0U/2ChGaZBBWP7gKt/V12Ts6FgatinJOQ8JG runpod@jeffemmett.com`
  - SSH Access: `ssh runpod`
  - API Key: (REDACTED-RUNPOD-KEY)
  - CLI Config: `~/.runpod/config.toml`
  - Serverless Endpoints (example call below):
    - Image (SD): `tzf1j3sc3zufsy` (Automatic1111)
    - Video (Wan2.2): `4jql4l7l0yw0f3`
    - Text (vLLM): `03g5hz3hlo8gr2`
    - Whisper: `lrtisuv8ixbtub`
    - ComfyUI: `5zurj845tbf8he`
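A hedged example of calling a serverless endpoint directly with RunPod's standard `runsync` API (the input payload shape depends on the worker image and is illustrative; the API key lives on Netcup):
```bash
curl -s -X POST "https://api.runpod.ai/v2/tzf1j3sc3zufsy/runsync" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "a mycelial network, macro photo"}}'
```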
API Keys & Services
IMPORTANT: All API keys and tokens are stored securely on the Netcup server. Never store credentials locally.
- Access credentials via: `ssh netcup "cat ~/.cloudflare-credentials.env"` or `ssh netcup "cat ~/.porkbun_credentials"`
- All API operations should be performed FROM the Netcup server, not locally
Credential Files on Netcup (/root/)
| File | Contents |
|---|---|
| ~/.cloudflare-credentials.env | Cloudflare API tokens, account ID, tunnel token |
| ~/.cloudflare_credentials | Legacy/DNS token |
| ~/.porkbun_credentials | Porkbun API key and secret |
| ~/.v0_credentials | V0.dev API key |
Cloudflare
- Account ID: `0e7b3338d5278ed1b148e6456b940913`
- Tokens stored on Netcup - source `~/.cloudflare-credentials.env`:
  - `CLOUDFLARE_API_TOKEN` - Zone read, Worker:read/edit, R2:read/edit
  - `CLOUDFLARE_TUNNEL_TOKEN` - Tunnel management
  - `CLOUDFLARE_ZONE_TOKEN` - Zone:Edit, DNS:Edit (for adding domains)
Porkbun (Domain Registrar)
- Credentials stored on Netcup - source `~/.porkbun_credentials`: `PORKBUN_API_KEY` and `PORKBUN_SECRET_KEY`
- API Endpoint: `https://api-ipv4.porkbun.com/api/json/v3/`
- API Docs: https://porkbun.com/api/json/v3/documentation
- Important: JSON must have `secretapikey` before `apikey` in requests (see the ping example below)
- Capabilities: Update nameservers, get auth codes for transfers, manage DNS
- Note: Each domain must have "API Access" enabled individually in the Porkbun dashboard
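A minimal connectivity check that also illustrates the key-ordering rule, using Porkbun's documented `/ping` endpoint (run from Netcup):
```bash
source ~/.porkbun_credentials

curl -s -X POST "https://api-ipv4.porkbun.com/api/json/v3/ping" \
  -H "Content-Type: application/json" \
  -d "{\"secretapikey\": \"$PORKBUN_SECRET_KEY\", \"apikey\": \"$PORKBUN_API_KEY\"}"
# A successful response starts with {"status":"SUCCESS",...}
```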
Domain Onboarding Workflow (Porkbun → Cloudflare)
Run these commands FROM Netcup (ssh netcup):
1. Add domain to Cloudflare (creates zone, returns nameservers)
2. Update nameservers at Porkbun to point to Cloudflare (steps 1-2 are sketched below)
3. Add a CNAME record pointing to the Cloudflare tunnel
4. Add the hostname to the tunnel config and restart cloudflared
5. Domain is live through the tunnel!
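A hedged sketch of steps 1-2 against the documented Cloudflare and Porkbun APIs (the jq parsing, variable names, and nameserver placeholders are illustrative):
```bash
source ~/.cloudflare-credentials.env
source ~/.porkbun_credentials
DOMAIN=mydomain.com

# Step 1: create the zone in Cloudflare (response lists the assigned nameservers)
curl -s -X POST "https://api.cloudflare.com/client/v4/zones" \
  -H "Authorization: Bearer $CLOUDFLARE_ZONE_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"name\": \"$DOMAIN\", \"account\": {\"id\": \"0e7b3338d5278ed1b148e6456b940913\"}}" \
  | jq -r '.result.name_servers[]'

# Step 2: point the domain's nameservers at Cloudflare via Porkbun
curl -s -X POST "https://api-ipv4.porkbun.com/api/json/v3/domain/updateNs/$DOMAIN" \
  -H "Content-Type: application/json" \
  -d "{\"secretapikey\": \"$PORKBUN_SECRET_KEY\", \"apikey\": \"$PORKBUN_API_KEY\", \"ns\": [\"<ns1>.ns.cloudflare.com\", \"<ns2>.ns.cloudflare.com\"]}"
```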
V0.dev (AI UI Generation)
- Credentials stored on Netcup - source `~/.v0_credentials`: `V0_API_KEY` - Platform API access
- API Key: `v1:5AwJbit4j9rhGcAKPU4XlVWs:05vyCcJLiWRVQW7Xu4u5E03G`
- SDK: `npm install v0-sdk` (use the `v0` CLI for adding components)
- Docs: https://v0.app/docs/v0-platform-api
- Capabilities:
- List/create/update/delete projects
- Manage chats and versions
- Download generated code
- Create deployments
- Manage environment variables
- Limitations: GitHub-only for git integration (no Gitea/GitLab support)
- Usage:
```js
const { v0 } = require('v0-sdk'); // Uses V0_API_KEY env var automatically

const projects = await v0.projects.find();
const chats = await v0.chats.find();
```
Other Services
- HuggingFace: CLI access available for model downloads
- RunPod: API access for serverless GPU orchestration (see Server Infrastructure above)
Dev Ops Stack & Principles
- Platform: Linux WSL2 (Ubuntu on Windows) for development
- Working Directory: `/home/jeffe/Github`
- Container Strategy:
- ALL repos should be Dockerized
- Optimized containers for production deployment
- Docker Compose for multi-service orchestration
- Process Management: PM2 available for Node.js services
- Version Control: Git configured with GitHub + Gitea mirrors
- Package Managers: npm/pnpm/yarn available
🚀 Traefik Reverse Proxy (Central Routing)
All HTTP services on Netcup RS 8000 route through Traefik for automatic service discovery.
Architecture:
```
Internet → Cloudflare Tunnel → Traefik (:80/:443) → Docker Services
                                  │
                                  ├── gitea.jeffemmett.com → gitea:3000
                                  ├── mycofi.earth → mycofi:3000
                                  ├── games.jeffemmett.com → games:80
                                  └── [auto-discovered via Docker labels]
```
Location: /root/traefik/ on Netcup RS 8000
Adding a New Service:
```yaml
# In your docker-compose.yml, add these labels:
services:
  myapp:
    image: myapp:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.myapp.rule=Host(`myapp.jeffemmett.com`)"
      - "traefik.http.services.myapp.loadbalancer.server.port=3000"
    networks:
      - traefik-public

networks:
  traefik-public:
    external: true
```
Traefik Dashboard: http://159.195.32.209:8888 (internal only)
SSH Git Access:
- SSH goes direct (not through Traefik): `git.jeffemmett.com:223` → `159.195.32.209:223`
- Web UI goes through Traefik: `gitea.jeffemmett.com` → Traefik → gitea:3000
☁️ Cloudflare Tunnel Configuration
Location: /root/cloudflared/ on Netcup RS 8000
The tunnel uses a token-based configuration managed via Cloudflare Zero Trust Dashboard.
All public hostnames should point to http://localhost:80 (Traefik), which routes based on Host header.
Managed hostnames:
- `gitea.jeffemmett.com` → Traefik → Gitea
- `photos.jeffemmett.com` → Traefik → Immich
- `movies.jeffemmett.com` → Traefik → Jellyfin
- `search.jeffemmett.com` → Traefik → Semantic Search
- `mycofi.earth` → Traefik → MycoFi
- `games.jeffemmett.com` → Traefik → Games Platform
- `decolonizeti.me` → Traefik → Decolonize Time
Tunnel ID: a838e9dc-0af5-4212-8af2-6864eb15e1b5
Tunnel CNAME Target: a838e9dc-0af5-4212-8af2-6864eb15e1b5.cfargotunnel.com
To deploy a new website/service:
1. Dockerize the project with Traefik labels in `docker-compose.yml`:
```yaml
services:
  myapp:
    build: .
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.myapp.rule=Host(`mydomain.com`) || Host(`www.mydomain.com`)"
      - "traefik.http.services.myapp.loadbalancer.server.port=3000"
    networks:
      - traefik-public

networks:
  traefik-public:
    external: true
```
2. Deploy to Netcup:
```bash
ssh netcup "cd /opt/websites && git clone <repo-url>"
ssh netcup "cd /opt/websites/<project> && docker compose up -d --build"
```
3. Add the hostname to the tunnel config (`/root/cloudflared/config.yml`):
```yaml
- hostname: mydomain.com
  service: http://localhost:80
- hostname: www.mydomain.com
  service: http://localhost:80
```
Then restart:
```bash
ssh netcup "docker restart cloudflared"
```
4. Configure DNS in Cloudflare dashboard (CRITICAL - prevents 525 SSL errors):
   - Go to Cloudflare Dashboard → select domain → DNS → Records
   - Delete any existing A/AAAA records for `@` and `www`
   - Add CNAME records (or use the API sketch below):

| Type | Name | Target | Proxy |
|---|---|---|---|
| CNAME | @ | a838e9dc-0af5-4212-8af2-6864eb15e1b5.cfargotunnel.com | Proxied ✓ |
| CNAME | www | a838e9dc-0af5-4212-8af2-6864eb15e1b5.cfargotunnel.com | Proxied ✓ |
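The same records can be created from Netcup with the Zone token via Cloudflare's documented DNS API (the zone-ID lookup and loop are illustrative):
```bash
source ~/.cloudflare-credentials.env
DOMAIN=mydomain.com
TUNNEL=a838e9dc-0af5-4212-8af2-6864eb15e1b5.cfargotunnel.com

ZONE_ID=$(curl -s "https://api.cloudflare.com/client/v4/zones?name=$DOMAIN" \
  -H "Authorization: Bearer $CLOUDFLARE_ZONE_TOKEN" | jq -r '.result[0].id')

for NAME in "$DOMAIN" "www.$DOMAIN"; do
  curl -s -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" \
    -H "Authorization: Bearer $CLOUDFLARE_ZONE_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"type\": \"CNAME\", \"name\": \"$NAME\", \"content\": \"$TUNNEL\", \"proxied\": true}"
done
```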
API Credentials (on Netcup at ~/.cloudflare*):
- `CLOUDFLARE_API_TOKEN` - Zone read access only
- `CLOUDFLARE_TUNNEL_TOKEN` - Tunnel management only
- See the API Keys & Services section above for the Domain Management Token (required for DNS automation)
🔄 Auto-Deploy Webhook System
Location: /opt/deploy-webhook/ on Netcup RS 8000
Endpoint: https://deploy.jeffemmett.com/deploy/<repo-name>
Secret: gitea-deploy-secret-2025
Pushes to Gitea automatically trigger rebuilds. The webhook receiver:
- Validates HMAC signature from Gitea
- Runs `git pull && docker compose up -d --build`
- Returns build status (manual test sketched below)
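A hedged way to exercise the receiver manually, signing a payload the way Gitea does (HMAC-SHA256 of the raw body in the `X-Gitea-Signature` header; the payload shape is illustrative):
```bash
SECRET=gitea-deploy-secret-2025
BODY='{"ref": "refs/heads/main", "repository": {"name": "mycofi-earth-website"}}'
SIG=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')

curl -s -X POST "https://deploy.jeffemmett.com/deploy/mycofi-earth-website" \
  -H "Content-Type: application/json" \
  -H "X-Gitea-Signature: $SIG" \
  -d "$BODY"
```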
Adding a new repo to auto-deploy:
1. Add an entry to the REPOS dict in `/opt/deploy-webhook/webhook.py`
2. Restart: `ssh netcup "cd /opt/deploy-webhook && docker compose up -d --build"`
3. Add the Gitea webhook:
```bash
curl -X POST "https://gitea.jeffemmett.com/api/v1/repos/jeffemmett/<repo>/hooks" \
  -H "Authorization: token <gitea-token>" \
  -H "Content-Type: application/json" \
  -d '{"type":"gitea","active":true,"events":["push"],"config":{"url":"https://deploy.jeffemmett.com/deploy/<repo>","content_type":"json","secret":"gitea-deploy-secret-2025"}}'
```
Currently auto-deploying:
- `decolonize-time-website` → /opt/websites/decolonize-time-website
- `mycofi-earth-website` → /opt/websites/mycofi-earth-website
- `games-platform` → /opt/apps/games-platform
🔐 SSH Keys Quick Reference
Local keys (in ~/.ssh/ on your laptop):
| Service | Private Key | Public Key | Purpose |
|---|---|---|---|
| Gitea | gitea_ed25519 | gitea_ed25519.pub | Primary git repository |
| GitHub | github_deploy_key | github_deploy_key.pub | Public mirror sync |
| Netcup RS 8000 | netcup_ed25519 | netcup_ed25519.pub | Primary server SSH |
| RunPod | runpod_ed25519 | runpod_ed25519.pub | GPU pods SSH |
| Default | id_ed25519 | id_ed25519.pub | General purpose/legacy |
Server-side keys (in /root/.ssh/ on Netcup RS 8000):
| Service | Key File | Purpose |
|---|---|---|
| Gitea | gitea_ed25519 | Server pulls from Gitea repos |
| GitHub | github_ed25519 | Server pulls from GitHub (mirror repos) |
SSH Config: ~/.ssh/config contains all host configurations
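A hedged sketch of the relevant `~/.ssh/config` entries (host aliases match Quick Access below; the users, plus the Gitea hostname/port taken from the SSH Git Access note above, are assumptions to verify against the actual file):
```
Host netcup
    HostName 159.195.32.209
    User root
    IdentityFile ~/.ssh/netcup_ed25519

Host runpod
    HostName ssh.runpod.io
    IdentityFile ~/.ssh/runpod_ed25519

Host gitea.jeffemmett.com
    HostName git.jeffemmett.com
    Port 223
    User git
    IdentityFile ~/.ssh/gitea_ed25519
```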
Quick Access:
- `ssh netcup` - Connect to Netcup RS 8000
- `ssh runpod` - Connect to RunPod
- `ssh gitea.jeffemmett.com` - Git operations
🤖 AI ORCHESTRATION ARCHITECTURE
Smart Routing Strategy
All AI requests go through intelligent orchestration layer on RS 8000:
Routing Logic:
- Text/Code (70-80% of workload): Always local RS 8000 CPU (Ollama) → FREE
- Images - Low Priority: RS 8000 CPU (SD 1.5/2.1) → FREE but slow (~60s)
- Images - High Priority: RunPod GPU (SDXL/SD3) → $0.02/image, fast
- Video Generation: Always RunPod GPU → $0.50/video (only option)
- Training/Fine-tuning: RunPod GPU on-demand
Queue System:
- Redis-based queues: text, image, code, video
- Priority-based routing (low/normal/high)
- Worker pools scale based on load
- Cost tracking per job, per user
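For illustration, a job enqueue at the Redis level might look like the following (the queue key names and job schema are assumptions; projects should use the unified AI client SDK shown under Integration Pattern below rather than touching Redis directly):
```bash
# Push a low-priority image job onto the image queue (illustrative schema)
redis-cli LPUSH queue:image '{"prompt": "forest floor fungi", "priority": "low", "user": "jeffe"}'

# Inspect queue depth
redis-cli LLEN queue:image
```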
Cost Optimization:
- Target: $90-120/mo (vs $136-236/mo current)
- Savings: $552-1,392/year
- 70-80% of workload FREE (local CPU)
- GPU only when needed (serverless = no idle costs)
Deployment Architecture
```
RS 8000 G12 Pro (Netcup)
├── Cloudflare Tunnel (secure ingress)
├── Traefik Reverse Proxy (auto-discovery)
│   └── Routes to all services via Docker labels
├── Core Services
│   ├── Gitea (git hosting) - gitea.jeffemmett.com
│   └── Other internal tools
├── AI Services
│   ├── Ollama (text/code models)
│   ├── Stable Diffusion (CPU fallback)
│   └── Smart Router API (FastAPI)
├── Queue Infrastructure
│   ├── Redis (job queues)
│   └── PostgreSQL (job history/analytics)
├── Monitoring
│   ├── Prometheus (metrics)
│   ├── Grafana (dashboards)
│   └── Cost tracking API
└── Application Hosting
    ├── All websites (Dockerized + Traefik labels)
    ├── All apps (Dockerized + Traefik labels)
    └── Backend services (Dockerized)

RunPod Serverless (GPU Burst)
├── SDXL/SD3 endpoints
├── Video generation (Wan2.1)
└── Training/fine-tuning jobs
```
Integration Pattern for Projects
All projects use unified AI client SDK:
```python
from orchestrator_client import AIOrchestrator

ai = AIOrchestrator("http://rs8000-ip:8000")

# Automatically routes based on priority & model
result = await ai.generate_text(prompt, priority="low")    # → FREE CPU
result = await ai.generate_image(prompt, priority="high")  # → RunPod GPU
```
💰 GPU COST ANALYSIS & MIGRATION PLAN
Current Infrastructure Costs (Monthly)
| Service | Type | Cost | Notes |
|---|---|---|---|
| Netcup RS 8000 G12 Pro | Fixed | ~€45 | 20 cores, 64GB RAM, 3TB (CPU-only) |
| RunPod Serverless | Variable | $50-100 | Pay-per-use GPU (images, video) |
| DigitalOcean Droplets | Fixed | ~$48 | ⚠️ DEPRECATED - migrate ASAP |
| Current Total | | ~$140-190/mo | |
GPU Provider Comparison
Netcup vGPU (NEW - Early Access, Ends July 7, 2025)
| Plan | GPU | VRAM | vCores | RAM | Storage | Price/mo | Price/hr equiv |
|---|---|---|---|---|---|---|---|
| RS 2000 vGPU 7 | H200 | 7 GB dedicated | 8 | 16 GB DDR5 | 512 GB NVMe | €137.31 (~$150) | $0.21/hr |
| RS 4000 vGPU 14 | H200 | 14 GB dedicated | 12 | 32 GB DDR5 | 1 TB NVMe | €261.39 (~$285) | $0.40/hr |
Pros:
- NVIDIA H200 (latest gen, better than H100 for inference)
- Dedicated VRAM (no noisy neighbors)
- Germany location (EU data sovereignty, low latency to RS 8000)
- Fixed monthly cost = predictable budgeting
- 24/7 availability, no cold starts
Cons:
- Pay even when idle
- Limited to 7GB or 14GB VRAM options
- Early access = limited availability
RunPod Serverless (Current)
| GPU | VRAM | Price/hr | Typical Use |
|---|---|---|---|
| RTX 4090 | 24 GB | ~$0.44/hr | SDXL, medium models |
| A100 40GB | 40 GB | ~$1.14/hr | Large models, training |
| H100 80GB | 80 GB | ~$2.49/hr | Largest models |
Current Endpoint Costs:
- Image (SD/SDXL): ~$0.02/image (~2s compute)
- Video (Wan2.2): ~$0.50/video (~60s compute)
- Text (vLLM): ~$0.001/request
- Whisper: ~$0.01/minute audio
Pros:
- Zero idle costs
- Unlimited burst capacity
- Wide GPU selection (up to 80GB VRAM)
- Pay only for actual compute
Cons:
- Cold start delays (10-30s first request)
- Variable availability during peak times
- Per-request costs add up at scale
Break-even Analysis
When does Netcup vGPU become cheaper than RunPod?
| Scenario | RunPod Cost | Netcup RS 2000 vGPU 7 | Netcup RS 4000 vGPU 14 |
|---|---|---|---|
| 1,000 images/mo | $20 | $150 ❌ | $285 ❌ |
| 5,000 images/mo | $100 | $150 ❌ | $285 ❌ |
| 7,500 images/mo | $150 | $150 ✅ | $285 ❌ |
| 10,000 images/mo | $200 | $150 ✅ | $285 ❌ |
| 14,250 images/mo | $285 | $150 ✅ | $285 ✅ |
| 100 videos/mo | $50 | $150 ❌ | $285 ❌ |
| 300 videos/mo | $150 | $150 ✅ | $285 ❌ |
| 500 videos/mo | $250 | $150 ✅ | $285 ❌ |
Recommendation by Usage Pattern:
| Monthly Usage | Best Option | Est. Cost |
|---|---|---|
| < 5,000 images OR < 250 videos | RunPod Serverless | $50-100 |
| 5,000-10,000 images OR 250-500 videos | Netcup RS 2000 vGPU 7 | $150 fixed |
| > 10,000 images OR > 500 videos + training | Netcup RS 4000 vGPU 14 | $285 fixed |
| Unpredictable/bursty workloads | RunPod Serverless | Variable |
Migration Strategy
Phase 1: Immediate (Before July 7, 2025)
Decision Point: Secure Netcup vGPU Early Access?
- Monitor actual GPU usage for 2-4 weeks
- Calculate average monthly image/video generation
- If consistently > 5,000 images/mo → Consider RS 2000 vGPU 7
- If consistently > 10,000 images/mo → Consider RS 4000 vGPU 14
- ACTION: Redeem early access code if usage justifies fixed GPU
Phase 2: Hybrid Architecture (If vGPU Acquired)
```
RS 8000 G12 Pro (CPU - Current)
├── Ollama (text/code) → FREE
├── SD 1.5/2.1 CPU fallback → FREE
└── Orchestrator API

Netcup vGPU Server (NEW - If purchased)
├── Primary GPU workloads
├── SDXL/SD3 generation
├── Video generation (Wan2.1 I2V)
├── Model inference (14B params with 14GB VRAM)
└── Connected via internal Netcup network (low latency)

RunPod Serverless (Burst Only)
├── Overflow capacity
├── Models requiring > 14GB VRAM
├── Training/fine-tuning jobs
└── Geographic distribution needs
```
Phase 3: Cost Optimization Targets
| Scenario | Current | With vGPU Migration | Savings |
|---|---|---|---|
| Low usage | $140/mo | $95/mo (RS8000 + minimal RunPod) | $540/yr |
| Medium usage | $190/mo | $195/mo (RS8000 + vGPU 7) | Break-even |
| High usage | $250/mo | $195/mo (RS8000 + vGPU 7) | $660/yr |
| Very high usage | $350/mo | $330/mo (RS8000 + vGPU 14) | $240/yr |
Model VRAM Requirements Reference
| Model | VRAM Needed | Fits vGPU 7? | Fits vGPU 14? |
|---|---|---|---|
| SD 1.5 | ~4 GB | ✅ | ✅ |
| SD 2.1 | ~5 GB | ✅ | ✅ |
| SDXL | ~7 GB | ⚠️ Tight | ✅ |
| SD3 Medium | ~8 GB | ❌ | ✅ |
| Wan2.1 I2V 14B | ~12 GB | ❌ | ✅ |
| Wan2.1 T2V 14B | ~14 GB | ❌ | ⚠️ Tight |
| Flux.1 Dev | ~12 GB | ❌ | ✅ |
| LLaMA 3 8B (Q4) | ~6 GB | ✅ | ✅ |
| LLaMA 3 70B (Q4) | ~40 GB | ❌ | ❌ (RunPod) |
Decision Framework
```
┌─────────────────────────────────────────────────────────┐
│              GPU WORKLOAD DECISION TREE                 │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Is usage predictable and consistent?                   │
│  ├── YES → Is monthly GPU spend > $150?                 │
│  │         ├── YES → Netcup vGPU (fixed cost wins)      │
│  │         └── NO → RunPod Serverless (no idle cost)    │
│  └── NO → RunPod Serverless (pay for what you use)      │
│                                                         │
│  Does model require > 14GB VRAM?                        │
│  ├── YES → RunPod (A100/H100 on-demand)                 │
│  └── NO → Netcup vGPU or RS 8000 CPU                    │
│                                                         │
│  Is low latency critical?                               │
│  ├── YES → Netcup vGPU (same datacenter as RS 8000)     │
│  └── NO → RunPod Serverless (acceptable for batch)      │
│                                                         │
└─────────────────────────────────────────────────────────┘
```
Monitoring & Review Schedule
- Weekly: Review RunPod spend dashboard
- Monthly: Calculate total GPU costs, compare to vGPU break-even
- Quarterly: Re-evaluate architecture, consider plan changes
- Annually: Full infrastructure cost audit
Action Items
- URGENT: Decide on Netcup vGPU early access before July 7, 2025
- Set up GPU usage tracking in orchestrator
- Create Grafana dashboard for cost monitoring
- Test Wan2.1 I2V 14B model on vGPU 14 (if acquired)
- Document migration runbook for vGPU setup
- Complete DigitalOcean deprecation (separate from GPU decision)
📁 PROJECT PORTFOLIO STRUCTURE
Repository Organization
- Location: `/home/jeffe/Github/`
- Primary Flow: Gitea (source of truth) → GitHub (public mirror)
- Containerization: ALL repos must be Dockerized with optimized production containers
🎯 MAIN PROJECT: canvas-website
Location: /home/jeffe/Github/canvas-website
Description: Collaborative canvas deployment - the integration hub where all tools come together
- Tldraw-based collaborative canvas platform
- Integrates Hyperindex, rSpace, MycoFi, and other tools
- Real-time collaboration features
- Deployed on RS 8000 in Docker
- Uses AI orchestrator for intelligent features
Project Categories
AI & Infrastructure:
- AI Orchestrator (smart routing between RS 8000 & RunPod)
- Model hosting & fine-tuning pipelines
- Cost optimization & monitoring dashboards
Web Applications & Sites:
- canvas-website: Main collaborative canvas (integration hub)
- All deployed in Docker containers on RS 8000
- Cloudflare Workers for edge functions (Hyperindex)
- Static sites + dynamic backends containerized
Supporting Projects:
- Hyperindex: Tldraw canvas integration (Cloudflare stack) - integrates into canvas-website
- rSpace: Real-time collaboration platform - integrates into canvas-website
- MycoFi: DeFi/Web3 project - integrates into canvas-website
- Canvas-related tools: Knowledge graph & visualization components
Deployment Strategy
- Development: Local WSL2 environment (`/home/jeffe/Github/`)
- Version Control: Push to Gitea FIRST → Auto-mirror to GitHub
- Containerization: Build optimized Docker images with Traefik labels
- Deployment: Deploy to RS 8000 via Docker Compose (join the `traefik-public` network)
- Routing: Traefik auto-discovers the service via labels, no config changes needed
- DNS: Add hostname to Cloudflare tunnel (if new domain) or it just works (existing domains)
- AI Integration: Connect to the local orchestrator API
- Monitoring: Grafana dashboards for all services
Infrastructure Philosophy
- Self-hosted first: Own your infrastructure (RS 8000 + Gitea)
- Cloud for edge cases: Cloudflare (edge), RunPod (GPU burst)
- Cost-optimized: Local CPU for 70-80% of workload
- Dockerized everything: Reproducible, scalable, maintainable
- Smart orchestration: Right compute for the right job
Wan2.1 Model Download: Image-to-Video 14B 720p (RECOMMENDED)
Verify the HuggingFace repo is a current, non-deprecated version before downloading, then run:
```bash
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P \
  --include "*.safetensors" \
  --local-dir models/diffusion_models/wan2.1_i2v_14b
```
🕸️ HYPERINDEX PROJECT - TOP PRIORITY
Location: /home/jeffe/Github/hyperindex-system/
When user is ready to work on the hyperindexing system:
- Reference `HYPERINDEX_PROJECT.md` for complete architecture and implementation details
- Follow `HYPERINDEX_TODO.md` for the step-by-step checklist
- Start with Phase 1 (Database & Core Types), then proceed sequentially through Phase 5
- This is a tldraw canvas integration project using Cloudflare Workers, D1, R2, and Durable Objects
- Creates a "living, mycelial network" of web discoveries that spawn on the canvas in real-time
📋 BACKLOG.MD - UNIFIED TASK MANAGEMENT
All projects use Backlog.md for task tracking. Tasks are managed as markdown files and can be viewed at backlog.jeffemmett.com for a unified cross-project view.
MCP Integration
Backlog.md is integrated via MCP server. Available tools:
- `backlog.task_create` - Create new tasks
- `backlog.task_list` - List tasks with filters
- `backlog.task_update` - Update task status/details
- `backlog.task_view` - View task details
- `backlog.search` - Search across tasks, docs, decisions
Task Lifecycle Workflow
CRITICAL: Claude agents MUST follow this workflow for ALL development tasks:
1. Task Discovery (Before Starting Work)
```bash
# Check if task already exists
backlog search "<task description>" --plain

# List current tasks
backlog task list --plain
```
2. Task Creation (If Not Exists)
```bash
# Create task with full details
backlog task create "Task Title" \
  --desc "Detailed description" \
  --priority high \
  --status "To Do"
```
3. Starting Work (Move to In Progress)
```bash
# Update status when starting
backlog task edit <task-id> --status "In Progress"
```
4. During Development (Update Notes)
```bash
# Append progress notes
backlog task edit <task-id> --append-notes "Completed X, working on Y"

# Update acceptance criteria
backlog task edit <task-id> --check-ac 1
```
5. Completion (Move to Done)
```bash
# Mark complete when finished
backlog task edit <task-id> --status "Done"
```
Project Initialization
When starting work in a new repository that doesn't have backlog:
```bash
cd /path/to/repo
backlog init "Project Name" --integration-mode mcp --defaults
```
This creates the backlog/ directory structure:
```
backlog/
├── config.yml     # Project configuration
├── tasks/         # Active tasks
├── completed/     # Finished tasks
├── drafts/        # Draft tasks
├── docs/          # Project documentation
├── decisions/     # Architecture decision records
└── archive/       # Archived tasks
```
Task File Format
Tasks are markdown files with YAML frontmatter:
```markdown
---
id: task-001
title: Feature implementation
status: In Progress
assignee: [@claude]
created_date: '2025-12-03 14:30'
labels: [feature, backend]
priority: high
dependencies: [task-002]
---

## Description
What needs to be done...

## Plan
1. Step one
2. Step two

## Acceptance Criteria
- [ ] Criterion 1
- [x] Criterion 2 (completed)

## Notes
Progress updates go here...
```
Cross-Project Aggregation (backlog.jeffemmett.com)
Architecture:
```
┌─────────────────────────────────────────────────────────────┐
│                   backlog.jeffemmett.com                    │
│                 (Unified Kanban Dashboard)                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │ canvas-web  │  │ hyperindex  │  │   mycofi    │  ...     │
│  │  (purple)   │  │   (green)   │  │   (blue)    │          │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘          │
│         │                │                │                 │
│         └────────────────┴────────────────┘                 │
│                          │                                  │
│              ┌───────────┴───────────┐                      │
│              │    Aggregation API    │                      │
│              │ (polls all projects)  │                      │
│              └───────────────────────┘                      │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
Data Sources:
- Local: `/home/jeffe/Github/*/backlog/`
- Remote: `ssh netcup "ls /opt/*/backlog/"`
Color Coding by Project:
| Project | Color | Location |
|---|---|---|
| canvas-website | Purple | Local + Netcup |
| hyperindex-system | Green | Local |
| mycofi-earth | Blue | Local + Netcup |
| decolonize-time | Orange | Local + Netcup |
| ai-orchestrator | Red | Netcup |
Aggregation Service (to be deployed on Netcup):
- Polls all project `backlog/tasks/` directories (poller sketch below)
- Serves unified JSON API at `api.backlog.jeffemmett.com`
- Web UI at `backlog.jeffemmett.com` shows a combined Kanban
- Real-time updates via WebSocket
- Filter by project, status, priority, assignee
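A hedged sketch of the polling side, using the Data Sources layout above (the output format and script structure are illustrative, since the service is not yet deployed):
```bash
# Enumerate task files across all local projects (illustrative)
for dir in /home/jeffe/Github/*/backlog/tasks; do
  project=$(basename "$(dirname "$(dirname "$dir")")")
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue
    echo "{\"project\": \"$project\", \"file\": \"$f\"}"
  done
done

# Remote projects on Netcup are enumerated the same way
ssh netcup 'ls -d /opt/*/backlog/tasks' 2>/dev/null
```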
Agent Behavior Requirements
When Claude starts working on ANY task:
1. Check for an existing backlog in the repo:
```bash
ls backlog/config.yml 2>/dev/null || echo "Backlog not initialized"
```
2. If a backlog exists, search for related tasks:
```bash
backlog search "<relevant keywords>" --plain
```
3. Create or update the task before writing code:
```bash
# If new task needed:
backlog task create "Task title" --status "In Progress"
# If task exists:
backlog task edit <id> --status "In Progress"
```
4. Update the task on completion:
```bash
backlog task edit <id> --status "Done" --append-notes "Implementation complete"
```
5. Never leave tasks in "In Progress" when stopping work - either complete them or add notes explaining blockers.
Viewing Tasks
- Terminal Kanban board: `backlog board`
- Web interface (single project): `backlog browser --port 6420`
- Unified view (all projects): visit backlog.jeffemmett.com (served from Netcup)
Backlog CLI Quick Reference
Task Operations
| Action | Command |
|---|---|
| View task | backlog task 42 --plain |
| List tasks | backlog task list --plain |
| Search tasks | backlog search "topic" --plain |
| Filter by status | backlog task list -s "In Progress" --plain |
| Create task | backlog task create "Title" -d "Description" --ac "Criterion 1" |
| Edit task | backlog task edit 42 -t "New Title" -s "In Progress" |
| Assign task | backlog task edit 42 -a @claude |
Acceptance Criteria Management
| Action | Command |
|---|---|
| Add AC | backlog task edit 42 --ac "New criterion" |
| Check AC #1 | backlog task edit 42 --check-ac 1 |
| Check multiple | backlog task edit 42 --check-ac 1 --check-ac 2 |
| Uncheck AC | backlog task edit 42 --uncheck-ac 1 |
| Remove AC | backlog task edit 42 --remove-ac 2 |
Multi-line Input (Description/Plan/Notes)
The CLI preserves input literally. Use shell-specific syntax for real newlines:
```bash
# Bash/Zsh (ANSI-C quoting)
backlog task edit 42 --notes $'Line1\nLine2\nLine3'
backlog task edit 42 --plan $'1. Step one\n2. Step two'

# POSIX portable
backlog task edit 42 --notes "$(printf 'Line1\nLine2')"

# Append notes progressively
backlog task edit 42 --append-notes $'- Completed X\n- Working on Y'
```
Definition of Done (DoD)
A task is Done only when ALL of these are complete:
Via CLI:
1. All acceptance criteria checked: `--check-ac <index>` for each
2. Implementation notes added: `--notes "..."` or `--append-notes "..."`
3. Status set to Done: `-s Done`

Via Code/Testing:
4. Tests pass (run test suite and linting)
5. Documentation updated if needed
6. Code self-reviewed
7. No regressions
NEVER mark a task as Done without completing ALL items above.
🔧 TROUBLESHOOTING
tmux "server exited unexpectedly"
This error occurs when a stale socket file remains from a crashed tmux server.
Fix:
```bash
rm -f /tmp/tmux-$(id -u)/default
```
Then start a new session normally with `tmux` or `tmux new -s <name>`.
Configuration Reference
Default `backlog/config.yml`:
```yaml
project_name: "Project Name"
default_status: "To Do"
statuses: ["To Do", "In Progress", "Done"]
labels: []
milestones: []
date_format: yyyy-mm-dd
max_column_width: 20
auto_open_browser: true
default_port: 6420
remote_operations: true
auto_commit: true
zero_padded_ids: 3
bypass_git_hooks: false
check_active_branches: true
active_branch_days: 60
```